High Dependability Computing Program
Modeling Dependability
The Unified Model of Dependability
Victor Basili
Paolo Donzelli
Sima Asgari
Computer Science Department
University of Maryland
College Park, Maryland 20742
Technical Report CS-TR-4601 - UMIACS-TR-2004-43
June 2004
Abstract
Individuals and organizations increasingly use sophisticated software systems from which they demand great reliance. “Reliance” is contextually subjective and depends on the particular stakeholder’s needs; therefore, in different circumstances, the stakeholders will focus on different properties of such systems, e.g., continuity, availability, performance, real-time response, ability to avoid catastrophic failures, capability of resisting adverse conditions, and prevention of deliberate privacy intrusions. The concept of dependability enables these various concerns to be subsumed within a single conceptual framework.
Achieving dependability is a major challenge, which has spawned many efforts both at national and international levels. This work is part of the High Dependability Computing Program (HDCP), a five-year cooperative research agreement between NASA and various universities and research centers to increase NASA’s ability to engineer highly dependable software systems.
HDCP brings together, under the common goal of improving systems dependability, a large and heterogeneous group of actors, from industry and academia alike, with various perspectives, and different (sometimes even conflicting) needs. Thus, the polysemous nature of the concept of dependability, while unifying so many different efforts, brings also the risk of creating confusion, making the task of developing dependable systems even more difficult.
From this perspective, the Unified Model of Dependability (UMD) aims to establish a common language for discussing a variety of dependability attributes, and to make them measurable. To capture stakeholders' dependability needs and perspectives, UMD takes into account different aspects of a dependability attribute, including the affected system functionalities, the acceptable manifestation of a specific failure (hazard) or class of failures (hazards), the external events (adverse conditions, attacks, etc.) that can make a system unreliable, and the expected system reaction to mitigate the impact of failures (hazards) on the stakeholders.
By providing a structured approach to eliciting and organizing both functional and non-functional dependability requirements, UMD helps the stakeholders to better express their needs, understand interactions among the dependability attributes, and set the corresponding values.
In order to illustrate the features and capabilities of UMD, an Air Traffic Control System is used as a case study.
Table of Contents
1 Introduction
2 The Unified Model of Dependability (UMD)
2.1 Identifying the building blocks of dependability
2.2 “Robustness” of UMD
2.3 UMD to capture stakeholders’ dependability needs
2.4 Measuring dependability
2.5 Enhancing UMD: capturing the “System Reaction”
2.6 The UMD Tool
3 Applying UMD to build a System Dependability Model
3.1 The case study – TSAFE
3.2 Data Gathering
3.3 Data Analysis
4 Formalizing the UMD application process
4.1 The single-stakeholder scenario
4.2 The multiple-stakeholder scenario
5 Conclusions and future work
6 References
Appendix A – Description of the UMD Tool
1 Introduction
Individuals and organizations increasingly use sophisticated software systems from which they demand great reliance. “Reliance” is contextually subjective and depends on the particular stakeholders’ needs; therefore, in different circumstances, stakeholders will focus on different properties of such systems, e.g., availability, performance, real-time response, ability to avoid catastrophic failures, capability of resisting adverse conditions, and prevention of deliberate intrusions, as well as on different levels of adherence to such properties. The concept of dependability enables these various concerns to be subsumed within a single conceptual framework. The International Federation for Information Processing (IFIP) WG-10.4 [7] defines dependability as the trustworthiness of a computing system that allows reliance to be justifiably placed on the services it delivers. Achieving systems dependability is a major challenge, and it has spawned many efforts at the national and international levels, such as the European Dependability Initiative [14], the US Government strategy “Trust in cyberspace” [15], and the critical infrastructure improvement and protection initiatives adopted by various countries [10,16]. This work is part of the High Dependability Computing Program (HDCP), a five-year cooperative research agreement between NASA and various universities and research centers1, to increase NASA’s ability to engineer highly dependable software systems. The Program involves: a) understanding NASA’s dependability problems; b) developing new engineering practices and technologies to address such problems; c) empirically assessing (and iteratively improving) the capabilities of new practices and technologies, using realistic testbeds; and d) transferring technologies to technology users with clear indications about their effectiveness under varying conditions.
HDCP brings together, under the common goal of improving systems dependability, a large and heterogeneous group of actors, from government and academia alike, with various perspectives and different (sometimes even conflicting) needs. First, there are the actors directly involved in using, building, and developing systems or technologies:
- The system users, who are concerned mainly about the final system’s behavior, and who need to understand whether or not, and to what extent, they can depend upon a system to achieve their goals.
- The system developers (or technology users), who need to know which processes and/or technologies should be selected to meet the system users’ needs in the most efficient and effective way.
- The technology researchers/developers, who focus on specific means to develop dependable systems [1].
- The empiricists, whose role is to help the users define dependability needs, support the developers in selecting the right approaches, and provide empirical evidence of the technologies’ ability to meet those needs. The empirical researchers act as “observers” to support the transfer of knowledge (needs, opportunities, technologies’ capabilities and limits) among the other actors.
1 The universities and research centers involved in HDCP are: Carnegie Mellon, University of Maryland, Fraunhofer Center Maryland, University of Southern California, Massachusetts Institute of Technology, University of Washington, University of Wisconsin, and many others.
The success of the program depends on the synergic collaboration of all these actors. It would be valuable to have a common and operational definition of dependability that allows:
- The system users to express their needs (i.e., build a precise dependability model of the required system) in a way that can be understood, and eventually addressed, by the others;
- The system developers to clearly compare what they are delivering with what is requested by the users;
- The technology researchers/developers to make explicit their goals in terms of the impact of their technology on dependability;
- The empirical researchers to measure, and make explicit, what is achievable and what has been achieved, for example, the gap between users’ demands and developers’ products, or between technology developers’ claims and actual technology performance. This makes it possible to identify the “good” practices and support their transfer.
Many definitions of dependability have been provided in the literature, see for example [1,7,8,9,11]. However, they are mostly general and qualitative. It may not be possible to find a common and operational definition of dependability. To be operational, in fact, a definition needs to be strictly related to the specific context it refers to (the project and its stakeholders). For this reason, we have adopted an alternative approach. Rather than stating yet another definition of dependability, we are identifying a framework for modeling dependability that the different actors could adopt as a common language, enabling them to communicate and understand each other’s needs.
Figure 1. A framework to foster cooperation
This paper is organized as follows. Section 2 introduces the Unified Model of Dependability (UMD) by illustrating its underlying concepts. It discusses its robustness and shows how UMD can be used to capture the users’ dependability needs (or, more precisely, the stakeholders’ needs [18]) to build dependability models of individual systems. A comparison with related work is also provided. Section 3 shows how UMD can be customized to a specific context/project to obtain a system dependability model that can be used as an operational dependability definition. A case study is used for illustration. Section 4 formalizes the process for applying UMD in both a single- and a multi-stakeholder scenario. Finally, Section 5 draws conclusions and outlines future work.
2 The Unified Model of Dependability (UMD)
This Section introduces UMD by illustrating the underlying theory and discusses its robustness. It also provides a comparison with related work.
2.1 Identifying the building blocks of dependability
Dependability involves many different attributes, and each attribute can be defined in a variety of ways. In order to begin our analysis for identifying the building blocks of dependability, around which we build UMD, let us consider a standard sub-set of such attributes: reliability, accuracy, performance, availability, survivability, security, maintainability, and safety. It is important to note that this choice is purely arbitrary, and any other set could have been adopted; as we will show in the following, our results are independent of the selected set. For each of these attributes, different definitions are available in the literature. In the following, we have randomly chosen some of them from [1,4,9,13]:
o Reliability is an index of how often the system, or part of it, fails.
o Accuracy is the ability of the system to provide data within the desired range and with the required precision.
o Performance is the system’s static or dynamic capability (response time, throughput), defined in terms of an acceptable range.
o Availability is the degree to which a system or component is operational and accessible when required for use.
o Survivability is the ability of a system to provide essential services in the presence of adverse conditions that can occasionally happen within its operational environment (e.g., exceptional weather conditions, un-natural load peaks, etc.).
o Security is the system’s capability to resist attacks intentionally carried out against the system (e.g., logical breaches, data accesses, denial of service attacks, etc.).
o Maintainability is the ability of the system to undergo repairs and modifications.
o Safety is the absence of catastrophic consequences on the user(s) and the environment.
Based on the above definitions, we observe that dependability can be viewed as an index of the issues that the system can cause its users. In other terms, given two similar systems, the one that causes fewer and less severe issues is the more dependable one for the users. By carefully reading the above definitions, we can also recognize that an issue may derive from a misbehavior of the system (e.g., the system fails, is not available at a given time, or is not able to survive external adverse conditions), or from the system creating a situation that could lead to catastrophic consequences for the users or the environment (see the definition of safety). For this reason, we distinguish between two kinds of issues:
Failure: any departure of the system behavior from the user’s expectations.
Hazard: a state of the system that can lead to catastrophic consequences for the user(s) and the environment.
Note that the concepts of hazard and failure are not exclusive, but overlap: a failure may also be a hazard (i.e., a failure can lead to an accident), whereas a hazard can occur without a failure occurring. Given the chosen set of dependability attributes, we can then further distinguish failures into different failure types:
Accuracy failure: the departure of the system behavior from providing data within the desired range and with the required precision;
Performance failure: the departure of the system behavior from providing the desired static or dynamic capability (response time, throughput);
Other failure: any failure that cannot be classified as accuracy or performance failure.
In addition, having availability among the chosen dependability attributes, we can also distinguish failures according to their impact upon availability. For example, we can distinguish between:
Stopping failure: any failure that makes the system unavailable.
Non-stopping failure: any failure that does not make the system unavailable.
It is worth noting that the above classifications in terms of failure type (accuracy, performance, other) and failure impact on availability (stopping, non-stopping) are orthogonal.
The same observations can be repeated for the hazards. Based on the above definition of safety, in fact, we can distinguish different hazard types:
User(s) Hazard: a state of the system that can lead to catastrophic consequences for the user(s);
Environment Hazard: a state of the system that can lead to catastrophic consequences for the environment.
Finally, from the above definitions (see for example reliability), we can also observe that the issues caused to the users by a system could result from the misbehavior of the whole system or of part of it, for example, a service or component. Thus, we can characterize an issue in terms of the part of the system that it affects. We distinguish the scope:
The system, i.e., the whole system;
A service, i.e., a functionality delivered by the system, as perceived by the users (a human or another interacting system).
This initial analysis shows that some concepts are common across the different definitions, although with different degrees of commonality and independence from the chosen set of attributes. The concept of issue (with its more elementary components, failure and hazard) and the concept of scope are common across all the attributes and independent of the initial set: each dependability attribute can in fact be defined in terms of some kind of issue affecting the whole system or part of it. The characterizations of failure, hazard, and scope, instead, depend on the set of dependability attributes taken into account. For example, the distinction of failures into accuracy, performance, and other failures is the result of the chosen sub-set of dependability attributes. Similarly, the idea of classifying failures according to their impact on availability results from having availability among the considered attributes. In this case, in particular, the choice of distinguishing only between stopping and non-stopping failures is purely arbitrary; a finer distinction (e.g., stopping, partly stopping, and non-stopping) could be adopted in order to model gradual service degradation. The emerging concepts and their relationships are pictured in Figure 2. This structure represents the common backbone of the different dependability attribute definitions taken into account, and thus provides an initial structure for our framework. In Figure 2, to distinguish the UMD concepts with higher commonality and independence (i.e., issue, failure, hazard, and scope) from the ones with lower commonality and independence (i.e., the characterizations), the latter are shown on a darker background. In the following, we refer to them as the UMD Hardware component and the UMD Software component, respectively.
[Figure 2 depicts the concepts FAILURE (characterized by type: accuracy, performance, other; and by availability impact: stopping, non-stopping), HAZARD (characterized by type: user(s) hazard, environment hazard), and SCOPE (characterized as whole system or service).]
Figure 2. The “emerging” UMD
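To make the structure of Figure 2 concrete, the following minimal Python sketch (our illustration, not part of UMD or its tool; all names are ours) encodes the emerging concepts: an issue is either a failure or a hazard, each carrying its characterization, and every issue is bound to a scope.

    # Illustrative sketch (ours, not the authors' tool) of the "emerging"
    # UMD of Figure 2: ISSUE (failure or hazard) plus SCOPE, with the
    # extensible characterizations of the Software component as enums.
    from dataclasses import dataclass
    from enum import Enum

    class FailureType(Enum):
        ACCURACY = "accuracy"          # data outside desired range/precision
        PERFORMANCE = "performance"    # response time, throughput
        OTHER = "other"

    class AvailabilityImpact(Enum):
        STOPPING = "stopping"          # failure makes the system unavailable
        NON_STOPPING = "non-stopping"  # system remains available

    class HazardType(Enum):
        USERS = "user(s) hazard"
        ENVIRONMENT = "environment hazard"

    @dataclass
    class Scope:
        """Whole system, or a single service as perceived by the users."""
        name: str                      # e.g., "whole system", "Service X"
        is_whole_system: bool = False

    @dataclass
    class Failure:
        """Departure of the system behavior from the user's expectations."""
        scope: Scope
        type: FailureType
        availability_impact: AvailabilityImpact

    @dataclass
    class Hazard:
        """System state that can lead to catastrophic consequences."""
        scope: Scope
        type: HazardType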
By using the concepts of UMD, all the dependability attribute definitions taken into account can be reformulated. For example, we can define availability as the index of all the stopping failures, of any type (accuracy, performance, or other) (ISSUE), affecting the system or a service (SCOPE), where the definitions of stopping failures and of accuracy, performance, and other failures are the ones given above. Similarly, the definitions of the other dependability attributes introduced above become:
o Reliability: index of all the failures (ISSUE) affecting the system or a service (SCOPE).
o Accuracy: index of the accuracy failures (ISSUE) affecting the system or a service (SCOPE).
o Performance: index of the performance failures (ISSUE) affecting the system or a service (SCOPE).
o Survivability: index of the failures (ISSUE) affecting the system or a service (SCOPE), due to adverse conditions that can occasionally happen within its operational environment (e.g., exceptional weather conditions, un-natural load peaks, etc.).
o Security: index of the failures (ISSUE) affecting the system or a service (SCOPE), due to attacks intentionally carried out against the system (e.g., logical breaches, data accesses, denial of service attacks, etc.).
o Maintainability2: index of the failures (ISSUE) affecting the system or a service (SCOPE), due to actions intentionally carried out to improve the system (e.g., repairs, upgrades).
o Safety: index of the hazards (ISSUE) created by the system or a service (SCOPE).
At this point, we can start from these new definitions to refine our analysis. We recognize that some failures (see the definitions of survivability, security, and maintainability) are the result of some external events. Given our choice of the initial set of dependability attributes, we can distinguish three main external event types:
Adverse condition: any external event that may have an actual or potential harmful effect on the system or a service (e.g., extreme weather conditions, un-natural load peaks, etc.);
Attack: any intentional action carried out against the system or a service (e.g., logical breaches, data accesses, denial of service attacks, etc.);
Update: any action intentionally carried out to change the system or a service (e.g., repairs, upgrades).
[Figure 3 extends Figure 2 with the concept EVENT (characterized by type: adverse condition, attack, update), alongside FAILURE, HAZARD, and SCOPE.]
Figure 3: The “evolving” UMD
Thus, the concept of external event emerges as another common item across the different definitions. Each dependability attribute can in fact be defined in terms of some kind of issue affecting the whole system or part of it (the scope), due or not due to some external events. Figure 3 extends the framework introduced in Figure 2 by encompassing the new concept of event.
By using the new framework, the definitions of the dependability attributes become:
2 Note that with this new definition of maintainability we cover the initial one only partially. While the original definition encompasses, for example, the capability of the system to be repaired and/or upgraded within the expected budget and time, the new definition focuses only on the ease of the maintenance process, taking into account the possible issues caused by repairs and upgrades. UMD, however, also allows for the expression of the desired system behavior during maintenance, as illustrated in Section 2.5, “Enhancing UMD: capturing the ‘System Reaction’”.
o Reliability: index of all the failures (ISSUE) affecting the system or a service (SCOPE), which are due or not due to external events (EVENT).
o Accuracy: index of the accuracy failures (ISSUE) affecting the system or a service (SCOPE), which are due or not due to external events (EVENT).
o Performance: index of the performance failures (ISSUE) affecting the system or a service (SCOPE), which are due or not due to external events (EVENT).
o Availability: index of the stopping failures (ISSUE) affecting the system or a service (SCOPE), which are due or not due to external events (EVENT).
o Survivability: index of the failures (ISSUE) affecting the system or a service (SCOPE), due to adverse conditions (EVENT).
o Security: index of the failures (ISSUE) affecting the system or a service (SCOPE), due to attacks (EVENT).
o Maintainability: index of the failures (ISSUE) affecting the system or a service (SCOPE), due to updates (EVENT).
o Safety: index of the hazards (ISSUE) created by the system or a service (SCOPE), due or not due to external events (EVENT).
It is important to note that these definitions reflect only some of the possibilities. In fact, while we have defined accuracy as “an index of the accuracy failures, affecting the system or a service, which are due or not due to external events”, it would have been also possible to define it as “an index of the accuracy failures, affecting the system or a service, which are not due to external events”, clearly stating that accuracy failures due to external events are not to be considered as part of the accuracy of the system. This is also true for the other definitions. Thus, by reformulating the definitions around the identified framework, we have obtained definitions more precise than the original ones (where some aspects were left implicit). Put another way, the framework not only provides a guide to define the different attributes, but also helps to make explicit options that could have been neglected otherwise.
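Continuing the illustrative sketch above (again an assumption-laden illustration, not the authors' implementation), the event concept can be modeled as an optional trigger attached to a failure, and each reformulated attribute definition then reduces to a simple filter over a collection of recorded issues:

    # Illustrative extension for the EVENT concept of Figure 3 (reuses
    # FailureType and AvailabilityImpact from the previous sketch).
    from dataclasses import dataclass
    from enum import Enum
    from typing import List, Optional

    class EventType(Enum):
        ADVERSE_CONDITION = "adverse condition"  # extreme weather, load peaks
        ATTACK = "attack"                        # e.g., denial of service
        UPDATE = "update"                        # repairs, upgrades

    @dataclass
    class Failure:                               # as before, plus a trigger
        scope: "Scope"
        type: "FailureType"
        availability_impact: "AvailabilityImpact"
        trigger: Optional[EventType] = None      # None: no external event

    def security_index(failures: List[Failure]) -> List[Failure]:
        """Security: the failures due to attacks (EVENT)."""
        return [f for f in failures if f.trigger is EventType.ATTACK]

    def reliability_index(failures: List[Failure]) -> List[Failure]:
        """Reliability: all failures, due or not due to external events."""
        return list(failures)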
2.2 “Robustness” of UMD
In order to see if UMD can really be adopted on a larger scale, that is, determine whether it can encompass the many available definitions of dependability and its attributes, we will verify the capability of the framework to accommodate: a) different definitions of the dependability attributes already taken into account; b) other dependability attributes not included into the initial set.
Let us start by considering different possible definitions for the same dependability attributes. In the following, for each definition found in the literature (Literature Definition – LD), we provide the corresponding definition expressed by using the framework (Framework Definition – FD), together with the necessary framework adjustments and extensions (FA).
o LD: Reliability is the continuity of correct service (a service is correct when it implements the system function) [Laprie01].
FD: Reliability: index of all the failures (ISSUE) affecting the system or a service (SCOPE), which are due or not due to external events (EVENT).
FA: none.
o LD: Availability is the capability to maximize the function of time that the system will provide stakeholder-desired levels of service with respect to a system’s operational profile (probability distribution of transaction frequencies, task complexities, workload volumes, others) [Boehm03].
FD: Availability: index of the stopping failures (ISSUE) affecting the system or a service (SCOPE), which are due or not due to external events (EVENT).
FA: The adjustments concern the Software component of the framework:
- Stopping failures are defined as failures preventing the system from providing the stakeholder-desired levels of service.
- The system’s operational profile (probability distribution of transaction frequencies, task complexities, workload volumes, others) is added as a further Scope characterization.
o LD: Availability is the ability of the system to provide service at any given time. It is the probability of being operational, under given use conditions, at a given instant in time [Melhart00].
FD: Availability: index of the stopping failures (ISSUE) affecting the system or a service (SCOPE), which are due or not due to external events (EVENT).
FA: The adjustments and extensions concern the Software component of the framework:
- Use condition is added as a further Scope characterization.
o LD: Accuracy is the capability to minimize the difference between delivered computational results and the real world quantities that they represent [Boehm03].
FD: Accuracy: index of the accuracy failures (ISSUE) affecting the system or a service (SCOPE), which are due or not due to external events (EVENT).
FA: The adjustments and extensions concern the Software component of the framework:
- Accuracy failures are defined as differences between delivered computational results and the real world quantities that they represent.
o LD: Survivability is the capability of the system to minimize the expected value of the information, property, human life and health losses due to natural causes [Boehm03].
FD: Survivability: index of the hazards (ISSUE) created by the system or a service (SCOPE), due to adverse conditions (EVENT).
FA: The adjustments and extensions concern the Software component of the framework:
- Hazards are defined as system states that can lead to information, property, human life and health losses.
- Adverse conditions are defined as all natural causes.
o LD: Survivability is the capability of the system to accomplish its mission despite a man-made hostile environment, i.e., the power of the system to detect and withstand an attack [Melhart00].
FD: Survivability: index of the failures (ISSUE) affecting the system or a service (SCOPE), due to adverse conditions (EVENT).
FA: The adjustments and extensions concern the Software component of the framework:
- Adverse conditions are defined as any man-made hostile factor.
o LD: Security is the capability of the system to minimize the expected value of the information, property, human life and health losses due to adversarial causes [Boehm03].
FD: Security: index of the hazards (ISSUE) created by the system or a service (SCOPE), due to attacks (EVENT).
FA: The adjustments and extensions concern the Software component of the framework:
- Hazards are defined as system states that can lead to information, property, human life and health losses.
- Attacks are defined as adversarial causes.
o LD: Performance is concerned with quantifiable attributes of the system, such as response time (how quickly the system reacts to a user input), throughput (how much work the system can accomplish within a specified amount of time), availability (the degree to which a system or component is operational or accessible when required for use), and accuracy [Bruegge04].
FD: Performance: index of the performance failures (ISSUE) affecting the system or a service (SCOPE), which are due or not due to external events (EVENT).
FA: The adjustments and extensions concern the Software component of the framework:
- Performance failures are defined as response time failures, throughput failures, stopping failures, and accuracy failures.
o LD: Safety is the ability of the system to deliver service under given use conditions with no catastrophic effect [Melhart00].
FD: Safety: index of the hazards (ISSUE) created by the system or a service (SCOPE).
FA: The adjustments and extensions concern the Software component of the framework:
- Hazards are defined as system states that can lead to catastrophic effects.
- Use condition is added as a further Scope characterization.
o LD: Safety is the capability of the system to minimize the expected value of human life and health losses due to natural and adversarial causes [Boehm03].
FD: Safety: index of the hazards (ISSUE) created by the system or a service (SCOPE), due to adverse conditions or attacks (EVENT).
FA: The adjustments and extensions concern the Software component of the framework:
- Hazards are defined as system states that can lead to human life and health losses.
- Adverse conditions are defined as all natural causes.
- Attacks are defined as adversarial causes.
The previous analysis shows that the UMD Hardware component is stable across the various dependability attribute definitions, whereas in all cases the adjustments and extensions concerned the Software component. In particular, such adjustments concern the definitions of some of the UMD items, such as the definitions of some types of failures, or the structure of some characterization, for example, introducing the “operational profile description” as a further element of the scope’s characterization. Such extensions allow the framework to accommodate different definitions of dependability attributes in which the use conditions or the operational profile of the system or a service are taken into account.
At this point, to complete our evaluation of UMD’s robustness, we can take into account other attributes of dependability. As above, for each definition found in the literature (Literature Definition – LD), we provide the corresponding one expressed by using the framework (Framework Definition – FD), together with the necessary framework adjustments and extensions (FA).
o LD: Confidentiality is the absence of unauthorized disclosure of information [Laprie01].
FD: Confidentiality: index of the confidentiality failures (ISSUE) affecting the system or a service (SCOPE), which are due or not due to external events (EVENT).
FA: The adjustments and extensions concern the Software component of the framework:
- Confidentiality failures are introduced and defined as unauthorized disclosure of information.
o LD: Integrity is the absence of improper system state alteration [Laprie01].
FD: Integrity: index of the integrity failures (ISSUE) affecting the system or a service (SCOPE), which are due or not due to external events (EVENT).
FA: The adjustments and extensions concern the Software component of the framework:
- Integrity failures are introduced and defined as improper system state alteration.
o LD: Robustness is the degree to which a system or component can function correctly in the presence of invalid inputs or stressful environmental conditions [Bruegge04].
FD: Robustness: index of the failures (ISSUE) affecting the system or a service (SCOPE), due to adverse conditions (EVENT).
FA: The adjustments and extensions concern the Software component of the framework:
- Adverse conditions are defined as invalid inputs or stressful environmental conditions.
o LD: Correctness is the degree to which the system implementation precisely satisfies its requirements and/or design specifications [Boehm03].
FD: Correctness: index of the correctness failures (ISSUE) affecting the system or a service (SCOPE), which are due or not due to external events (EVENT).
FA: The adjustments and extensions concern the Software component of the framework:
- Correctness failures are introduced and defined as deviations of the implementation from the system’s requirements and/or design specifications.
Again, the previous analysis reveals that the Hardware component of the framework is stable across the various definitions of the dependability attributes, whereas the Software component is flexible enough to accommodate the necessary adjustments and extensions.
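This stability can be made tangible in the illustrative sketch: accommodating a new literature definition, such as Laprie’s confidentiality, only means registering a new failure type and its definition in the extensible Software component, while the core concepts stay fixed. A hypothetical illustration:

    # Illustrative: failure-type definitions live in the extensible
    # Software component; the Hardware component needs no change.
    FAILURE_TYPE_DEFINITIONS = {
        "accuracy": "departure from providing data within the desired "
                    "range and with the required precision",
        "performance": "departure from the desired static or dynamic "
                       "capability (response time, throughput)",
        "other": "any failure not classified as accuracy or performance",
    }

    # Accommodating [Laprie01] confidentiality: one new entry, nothing else.
    FAILURE_TYPE_DEFINITIONS["confidentiality"] = (
        "unauthorized disclosure of information"
    )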
2.3 UMD to capture stakeholders’ dependability needs
As illustrated in Figure 4, UMD permits the transfer of the stakeholders’ focus from the very abstract concepts of dependability and its attributes to the more concrete and manageable concepts of failure, hazard, event and scope.
[Figure 4 contrasts the abstract level (dependability and its attributes) with the concrete level (failure, hazard, event, and scope): the dependability definition problem is transferred so that the stakeholder deals with dependability at a more concrete level.]
Figure 4. Making the “dependability definition” problem more concrete
So, while for a stakeholder it could be difficult to define dependability, or to provide a clear definition of what a dependability attribute means for a specific system, it could be easier to specify which failures the system should not have. Put another way, the UMD Hardware component reduces the complexity of the problem: rather than dealing with abstract entities (dependability and its attributes), stakeholders can organize their thoughts about dependability by defining the characterization of the failures (e.g., failure types and availability impact) and the hazards (e.g., hazard types and severity) that should not affect the system, together with the possible triggering external events (e.g., event types). So, for example, stakeholders will express their views of the performance of the system, or of a service, by specifying the characteristics of the performance failures that should be avoided. Similarly, stakeholders will define the system’s or service’s security by specifying first what possible attacks the system could face, and then identifying the resulting failures that should be avoided. For example, a stakeholder could recognize “denial of service attacks” as possible attacks, and then specify the characteristics of the possible resulting performance failures that the system should avoid. During this process, the UMD Software component provides stakeholders with useful guidance: they can use the characterizations already available or, whenever necessary, extend them with their own definitions or with other definitions available in the literature. For example, a stakeholder could:
Use the same types of failures already present in the framework. That is, they could use the existing definition stating that a “performance failure is the departure of the system behavior from providing the desired static or dynamic capability”, and then declare that “Service X should not manifest performance failures”.
Use the same types of failures but provide different definitions. That is, they could say that a “performance failure is a failure in response time or throughput”, and then state that “Service X should not manifest performance failures”.
Introduce more specific failure types, that is, state that a “response time failure is when the system reacts too slowly to a user input”, and then that “Service X should not manifest response time failures”.
Introduce templates for failure definitions (perhaps designed for specific classes of systems). That is, stakeholders could introduce the template “response time failure is when the system fails responding within xx seconds”, then declare that “a response time failure is when the system fails responding within 2 seconds”, and finally state that “Service X should not manifest response time failures” (a sketch of such a template follows below).
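As a rough illustration of the last option, a failure-definition template might be represented as a parameterized object that the stakeholder instantiates per service (hypothetical sketch; the names are ours):

    # Hypothetical failure-definition template ("...within xx seconds").
    from dataclasses import dataclass

    @dataclass
    class ResponseTimeFailureTemplate:
        """Template: 'a response time failure is when the system fails
        responding within xx seconds'; xx is filled in per service."""
        threshold_seconds: float

        def definition(self) -> str:
            return ("response time failure: the system fails responding "
                    f"within {self.threshold_seconds} seconds")

    # The stakeholder instantiates the template, e.g., for Service X:
    service_x = ResponseTimeFailureTemplate(threshold_seconds=2.0)
    print(service_x.definition())
    # -> response time failure: the system fails responding within 2.0 seconds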
As shown in Figure 5, UMD can be seen as an experience base that supports the stakeholders while building a specific system dependability model: a) the knowledge embedded in UMD can be customized and provides guidance while eliciting the specific context needs; b) the new knowledge acquired while building the system dependability model can then be extracted; and c) finally analyzed and packaged for reuse in UMD.
Figure 5. Building dependability knowledge
2.4 Measuring dependability
Up to now, we have used UMD to build qualitative definitions of dependability, that is, to specify the failures and hazards that we do not want to occur. Although useful, this is only partly valuable, given that failures and hazards will always be likely to happen. For this reason, it is important to introduce the possibility of measuring dependability, allowing the stakeholders not only to identify the undesired failures and hazards for the system or a specific service, but also to quantify the manifestations they consider tolerable. Here, we want to extend the framework to enable the stakeholders to quantify their dependability needs, i.e., to express a measure of dependability.
Built around the elementary concept of issue, the framework can easily address such a need. It is, in fact, possible to introduce the concept of measure as another basic item of the framework, an item whose value defines the manifestation of the Issue.
The resulting framework is illustrated in Figure 6, where the concept of measure has been added to the Hardware component, and its characterization, defining the possible kinds of measures that the stakeholder can use, has been added to the Software component. As examples, we used the following measure types:
- Time-based (probabilistic) measures, such as Mean Time To Failure (MTTF) or probability of occurrence (in the next time unit or transaction).
- Absolute measures, such as the number of occurrences (in a given time frame).
- Ordinal measures, for example an ordinal scale such as “very rarely”/“rarely”/“sometimes”.
Thus, structured as in Figure 6, UMD allows the stakeholder to specify, for the whole system or a specific service, the acceptable manifestation of a specific failure or class of failures, together with the triggering events that can cause it (when applicable).
So, for example, stakeholders will express their views of the performance of the system, or of a service, by specifying the characteristics of the performance failures and then the tolerable manifestation of such failures. In particular, extending the example in the previous Section, stakeholders will not simply state that “Service X should not manifest response time failures”, but, more precisely, can say that “Response time failures could be tolerated for Service X when MTTF is greater than 1000 hours”, where a response time failure is when the system fails responding within 2 seconds.
Figure 6. Introducing a “measure” in UMD
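Assuming failure occurrences are simply counted over a known operating period, the tolerable manifestation in the example above (“MTTF greater than 1000 hours”) can be checked with a crude sketch like the following; this is our illustration, not the paper’s measurement procedure:

    # Illustrative check of a tolerable manifestation: "response time
    # failures are tolerated for Service X when MTTF > 1000 hours".
    def mttf_hours(operating_hours: float, failure_count: int) -> float:
        """Crude MTTF estimate: operating time divided by failure count."""
        if failure_count == 0:
            return float("inf")            # no failures observed yet
        return operating_hours / failure_count

    def within_tolerance(operating_hours: float, failure_count: int,
                         required_mttf: float = 1000.0) -> bool:
        return mttf_hours(operating_hours, failure_count) > required_mttf

    # Example: one year of operation (8760 h), 7 response time failures:
    print(within_tolerance(8760, 7))       # True: MTTF ~ 1251 h > 1000 h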
2.5 Enhancing UMD: capturing the “System Reaction”
Up to now, we have used UMD to specify “negative” non-functional requirements [9], i.e., to specify undesired system behaviors, for the system as a whole or while delivering specific services. Here, we want to extend the framework to enable the stakeholders to provide ideas about means to improve dependability; in other terms, to enable stakeholders, while expressing their views of dependability in terms of acceptable manifestations of failures and hazards, to specify also how the system should behave in order to be more dependable from their point of view.
UMD can easily address such a need. It is in fact possible to introduce the concept of reaction as another basic item, through which the stakeholder can describe the desired system behavior in case of occurrence of the issue. The resulting, and final, structure for UMD is illustrated in Figure 7. Again, while the concept of reaction has been added to the Hardware component, its characterization has been added to the Software component. As examples, we adopted the following reaction types:
- Warning Services: to warn users about what happened or is happening (the issue);
- Mitigation Services: to reduce the impact of the issue on the users (e.g., a word processor should save the open files if a crash occurs);
- Alternative Services: to help users carry on their tasks regardless of the issue (e.g., for a PC, if the floppy drive does not work, the users can export data via email);
- Guard Services: to act as a guard against the issue, i.e., to reduce its probability of occurrence (e.g., adding an extra password to reduce the probability of security breaches). This idea can be extended to capture any suggestion the stakeholder may have to prevent the issue from happening: suggestions about modifications of existing services, design changes, or new technologies;
- Recovery Behavior: the time necessary to recover from the issue (e.g., expressed as Mean Time To Recover, MTTR) and the required intervention (e.g., user or technician intervention). As already discussed in Section 2.1, with the possibility of expressing the desired system behavior during maintenance, UMD covers all the aspects normally embraced by the definitions of maintainability available in the literature.
Figure 7. Capturing “system reaction” in UMD
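Continuing the illustrative sketch, the reaction concept of Figure 7 can be modeled as a list of desired behaviors attached to an issue entry, with the reaction types above as another extensible characterization (all names are ours, not the authors’ tool):

    # Illustrative sketch of the "reaction" concept of Figure 7.
    from dataclasses import dataclass, field
    from enum import Enum
    from typing import List, Optional

    class ReactionType(Enum):
        WARNING = "warning service"
        MITIGATION = "mitigation service"
        ALTERNATIVE = "alternative service"
        GUARD = "guard service"
        RECOVERY = "recovery behavior"

    @dataclass
    class Reaction:
        type: ReactionType
        description: str
        mttr_hours: Optional[float] = None   # only for recovery behavior

    @dataclass
    class IssueRequirement:
        """One entry of a stakeholder's dependability model (illustrative)."""
        scope: str                           # e.g., "Service X"
        issue: str                           # e.g., "response time failure"
        tolerable_manifestation: str         # e.g., "MTTF > 1000 hours"
        reactions: List[Reaction] = field(default_factory=list)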
So, for example, by using the framework, stakeholders will express their views of the performance of the system, or of a service, by specifying the characteristics of the performance failures, the tolerable manifestation of such failures, and then the desired system behavior in case of failure. In particular, extending the example introduced in the previous Section, the stakeholder will not only state that “Response time failures could be tolerated for Service X when MTTF is greater than 1000 hours”, but also that, if a response time failure occurs, the system should provide: (a) a Warning Service: “the request should be rejected and an apology should be given to the user”, and (b) a Mitigation Service: “different options should be provided to the user indicating the best time to try again”.
2.6 The UMD Tool
A Web-based tool that implements UMD has been developed. The two main table frames offered by the tool to collect data from the stakeholders are:
The Table “Scope” (see Figure 9) allows the stakeholder to identify all the services of the system for which dependability could be of concern. For the system and each identified service, the stakeholder has to provide an identifier (left column), and a brief description (right column).
The Table Frame “Issue” (see Figures 10 and 11) allows the users to specify their dependability needs by selecting and defining potential issues (failures and/or hazards), their tolerable manifestations, the possible triggering events, and the desired system reactions, for the whole system or a specific service.
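As a rough picture of what one “Issue” entry might hold (our illustration as plain data, mirroring the IssueRequirement sketch of Section 2.5; the tool’s actual schema is not given in this report extract):

    # Illustrative content of one "Issue" entry (plain data; mirrors the
    # IssueRequirement sketch of Section 2.5).
    entry = {
        "scope": "Service X",
        "issue": "response time failure: the system fails responding "
                 "within 2 seconds",
        "tolerable manifestation": "MTTF > 1000 hours",
        "triggering event": None,
        "reactions": [
            ("warning service", "reject the request and apologize to the user"),
            ("mitigation service", "suggest the best time to try again"),
        ],
    }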
3 Applying UMD to build a System Dependability Model
This Section shows how UMD can be customized to a specific system/project to obtain a dependability model that can be used as the operational dependability definition of the system. A case study is used for illustration.
3.1 The case study – TSAFE
The Tactical Separation Assisted Flight Environment, or TSAFE, is a tool designed to aid air traffic controllers in detecting and resolving short-term conflicts between aircraft. To introduce the case study, we present the following extract from [Dennis03]:
“In today’s Air Traffic Control (ATC) system, air traffic controllers are primarily responsible for maintaining aircraft separation. Controllers accomplish this by surveilling radar data for potential conflicts and issuing clearances to pilots to alter their trajectories accordingly. Ground-based automated tools play only a supportive role in this process. Under this current system, the airspace within the United States operates at only half its potential capacity. Experience has shown controllers’ workload limits to be the fundamental limiting factor in the current model. (...) Exploiting the full airspace capacity requires a new paradigm, the Automated Airspace Concept (AAC).
Under the AAC framework, automated mechanisms would play a primary role in maintaining aircraft separation. Aircraft would remain in direct connection with a ground-based automated system, which would transmit conflict alerts and air traffic control clearances via a persistent two-way data link. By shifting much of the responsibility for aircraft separation from controllers to automated systems, AAC will allow controllers to focus more on long-term strategic traffic management, and thereby allow for a safe increase in the volume of aircraft per sector.
The role of TSAFE is as an independent monitor of this AAC Computing System. It is to act as a reliable safety net—a last line of defense against inevitable imperfections in the AAC model. Its job is to probe for short-term conflicts and issue avoidance maneuvers
accordingly. As an independent monitor, it must sit outside of the primary system, on separate hardware, and on a separate software process, yet be privy to the same data as the AAC.
TSAFE differs in purpose and functionality from the existing conflict avoidance systems CTAS and TCAS. Whereas CTAS performs long-term conflict prediction on the order of 20-40 minutes ahead, and whereas TCAS detects conflicts only seconds away, TSAFE is intended to detect conflicts somewhere between 3 and 7 minutes in the future. Because TCAS operates on the order of seconds, it only considers aircraft state information (velocities, headings, altitudes, etc.). TSAFE and CTAS, on the other hand, must also take intent information into account, including flight routes, cruise altitudes, and cruise speeds. But due to TSAFE’s shorter time horizon, its algorithms must be simpler and less computationally intensive than those of CTAS”.
TSAFE provides the air traffic controller with a graphical representation of the flight conditions (position, planned route, forecast route) and of the status (conformance or non-conformance with the planned route) of the flights within a selected geographical area. A snapshot of the TSAFE display is given in Figure 11.
Figure 11 – TSAFE display - example
The main functionalities provided by TSAFE are described in the following table:

Display current aircraft position and signal route conformance: Display a dot on the map to show the current aircraft position. The A/C dot color is white/red/yellow, depending on conformance/non-conformance/absence of planned route (flight plan). The air traffic controller can select the flights to be displayed and the conformance parameters.

Display aircraft planned route: Display the aircraft planned route (flight plan). Color is blue. The air traffic controller can select the flights to be displayed.

Display aircraft synthesized route: Display the synthesized aircraft route. Color is pink. The air traffic controller may select the synthesized routes to be displayed.

Conflict Detection & Warning*: Probe along the synthesized routes, searching for points at which two flights break legal separation. Provide timely and reliable warnings to controllers should any imminent loss of separation be detected. The conflict warnings are relayed to the controller in the form of visual and aural signals.

Conflict Avoidance Maneuvers*: Issue avoidance maneuvers to keep aircraft clear of danger for about three minutes, in response to high-risk conflict warnings. The maneuvers are relayed to the controller in the form of visual and aural signals.

(Note) * Functionality not implemented by the TSAFE version in [Dennis03]
3.2 Data Gathering
For the case study, a small group of computer science researchers and students acted as stakeholders (specifically, as air traffic controllers) after being given a short introduction to TSAFE and its purposes. The aim of this initial case study was in fact to evaluate the feasibility of the suggested approach, rather than to identify the correct dependability requirements for TSAFE. However, in order to better evaluate the UMD tool’s capabilities, and to represent real-life situations in which stakeholders might be unfamiliar with automated tools, all the acting stakeholders interacted with the UMD tool through an analyst.
Figure 9. The UMD Tool “Scope” table