Dependable and Secure Computing Concepts
References
Algirdas Avizienis, Jean-Claude Laprie, Brian Randell, and Carl Landwehr. Basic Concepts and Taxonomy of Dependable and Secure Computing. IEEE Transactions on Dependable and Secure Computing, vol. 1, no. 1, January-March 2004.
Threats to Dependability
Fault.
The cause of a failure is a fault that ranges from specification and design defects to physical or human factors.
Error.
An error is a deviation from the desired or intended state of a system that may lead to a failure.
Failure.
A failure is defined as the manner in which a component, subsystem, or system could potentially fail to meet or deliver the intended function.
Effect.
The effect is the actual consequence of a system behavior in the presence of a failure.
Dependability Attributes
Availability
readiness for correct service.
Availability = MTTF / (MTTF + MTTR), where MTTF is the mean time to failure and MTTR is the mean time to repair.
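The availability ratio can be illustrated with a short calculation. This is a minimal sketch; the function name `availability` and the example figures (990 hours to failure, 10 hours to repair) are assumptions chosen for illustration, not values from the source.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Return MTTF / (MTTF + MTTR) for non-negative inputs."""
    if mttf_hours < 0 or mttr_hours < 0:
        raise ValueError("times must be non-negative")
    total = mttf_hours + mttr_hours
    if total == 0:
        raise ValueError("MTTF + MTTR must be greater than zero")
    return mttf_hours / total


# Hypothetical system: fails on average every 990 h, takes 10 h to repair.
print(availability(990.0, 10.0))
```

A longer repair time lowers availability even when the mean time to failure is unchanged, which is why the two attributes are listed separately.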
Reliability
continuity of correct service.
Measured by the mean time to failure (MTTF).
Safety
absence of catastrophic consequences on the user(s) and the environment.
Integrity
absence of improper system alterations.
Maintainability
ability to undergo modifications and repairs.
Projecting failure modes in software development is used to establish a fault hypothesis and to estimate the presence of faults, their future incidence, and their likely consequences.
Fault prevention / avoidance
tries to identify complex structures that are likely to become a source of faults.
Fault tolerance
Fault projection
Anticipate potential scenarios of failure as soon as possible
Focus on high-risk components
Identification of Failure Modes, Effects, Causes, Design Controls
Assessment with Risk Priority Number (RPN)
RPN = Severity × Occurrence × Detection
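The RPN assessment can be sketched as a small ranking routine. This sketch assumes the conventional FMEA practice of rating severity, occurrence, and detection on a 1 to 10 scale and multiplying them; the class name `FailureMode` and the three example failure modes are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # 1 (negligible) .. 10 (catastrophic), assumed scale
    occurrence: int  # 1 (rare) .. 10 (frequent), assumed scale
    detection: int   # 1 (almost certainly detected) .. 10 (undetectable)

    def rpn(self) -> int:
        """Risk Priority Number: severity * occurrence * detection."""
        for rating in (self.severity, self.occurrence, self.detection):
            if not 1 <= rating <= 10:
                raise ValueError("ratings must be in the range 1..10")
        return self.severity * self.occurrence * self.detection


# Hypothetical failure modes, for illustration only.
modes = [
    FailureMode("stale sensor value used", 8, 3, 6),
    FailureMode("log file fills disk", 4, 5, 2),
    FailureMode("unchecked null input", 7, 4, 3),
]

# Highest RPN first, so that high-risk items are addressed first.
ranked = sorted(modes, key=lambda m: m.rpn(), reverse=True)
for mode in ranked:
    print(mode.rpn(), mode.name)
```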
Fault avoidance
use of formal methods, semi-formal methods, structured methods and object-oriented methods.
Fault Removal
Software does not wear out over time. It is therefore reasonable to assume that, as long as errors are uncovered, reliability increases with each error that is eliminated. The failure rate of software decreases when errors are removed. However, new errors may be introduced when the software is changed.
Structural check
internal data structure is as it should be
Coding checks
E.g. with checksums
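A coding check with a checksum might look like the following. This is a minimal sketch that assumes a CRC-32 (from Python's standard `zlib` module) appended to each payload; the function names `protect` and `check` are hypothetical.

```python
import zlib


def protect(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 of the payload as redundant check data."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check(frame: bytes) -> bytes:
    """Return the payload if its checksum matches, else raise ValueError."""
    if len(frame) < 4:
        raise ValueError("frame too short to contain a checksum")
    payload, stored = frame[:-4], int.from_bytes(frame[-4:], "big")
    if zlib.crc32(payload) != stored:
        raise ValueError("checksum mismatch: error detected")
    return payload


frame = protect(b"speed=42")
print(check(frame))
```

The check only detects the error; what happens next (confinement, recovery) is a separate decision.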
Damage confinement
Error detected
Prevent error from propagating through the system
Firewalls
Design firewalls into the system to ensure that no information flow takes place across the walls.
Error recovery
Backward recovery
system state is restored to an earlier state, hoping that the earlier state is error-free.
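Backward recovery can be sketched with a checkpoint taken before an operation and restored if the operation fails. This sketch assumes the system state is a plain dictionary and that an error is signalled by an exception; the function name `apply_with_rollback` is hypothetical.

```python
import copy


def apply_with_rollback(state: dict, operation) -> bool:
    """Run operation(state); on error, restore the earlier checkpoint.

    Returns True if the operation succeeded, False if the state was
    rolled back.
    """
    checkpoint = copy.deepcopy(state)  # earlier state, hoped to be error-free
    try:
        operation(state)
        return True
    except Exception:
        state.clear()
        state.update(checkpoint)       # backward recovery
        return False


def withdraw_150(state: dict) -> None:
    # Modifies the state first, then detects the error.
    state["balance"] -= 150
    state["log"].append("withdraw 150")
    if state["balance"] < 0:
        raise ValueError("negative balance")


account = {"balance": 100, "log": []}
print(apply_with_rollback(account, withdraw_150), account)
```

The deep copy matters here: a shallow copy would share the `log` list with the live state, so the checkpoint itself would be corrupted by the failed operation.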
Independent checkpointing
Domino effect
Forward recovery
No previous state is available. Instead, the system attempts to go forward, trying to make the system error-free by taking corrective actions.
Redundant design
Information Redundancy:
For example, checksums and doubly linked lists make use of redundant information. Data structures that make use of redundant information are usually referred to as robust data structures. If, for example, a doubly linked list is used and one link is corrupted, the list can be regenerated using the other link.
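The doubly linked list case can be sketched as follows. The sketch assumes that the backward (`prev`) chain is intact and that the tail node is still known; the names `Node`, `build`, `repair_next_links`, and `to_list` are hypothetical.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None


def build(values):
    """Build a doubly linked list and return (head, tail)."""
    head = tail = None
    for value in values:
        node = Node(value)
        if tail is None:
            head = node
        else:
            tail.next = node
            node.prev = tail
        tail = node
    return head, tail


def repair_next_links(tail):
    """Regenerate all `next` links by walking `prev` links from the tail.

    Returns the head of the repaired list (None for an empty list).
    """
    if tail is None:
        return None
    node = tail
    node.next = None
    while node.prev is not None:
        node.prev.next = node
        node = node.prev
    return node


def to_list(head):
    """Collect values by following `next` links from the head."""
    values = []
    while head is not None:
        values.append(head.value)
        head = head.next
    return values


head, tail = build([1, 2, 3, 4])
head.next.next = None            # simulate a corrupted forward link
repair_next_links(tail)
print(to_list(head))
```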
Time Redundancy
Redundancy in time can be realized, for example, by allowing a function to execute again if a previous execution failed.
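Re-execution after a failed attempt can be sketched as a simple retry wrapper. This sketch assumes a failure is signalled by an exception and that the fault is transient, so a later attempt may succeed; the names `retry` and `flaky` are hypothetical.

```python
def retry(func, attempts: int = 3):
    """Call func() up to `attempts` times and return the first result.

    If every attempt raises, the last exception is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return func()
        except Exception as exc:
            last_error = exc
    raise last_error


calls = {"n": 0}


def flaky():
    # Hypothetical operation that fails on its first two executions.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


print(retry(flaky), "after", calls["n"], "executions")
```

Retrying does not help against a permanent fault, since each execution would fail in the same way.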
Physical Redundancy
Redundancy in space is called replication. The concept is founded on the assumption that parts that are replicated fail independently. A common use of replication is, for example, to use several sensors, networks, or computers in parallel.
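One way to combine replicated sensors is to vote on their readings. This sketch assumes at least three replicas that fail independently and uses the median as the voter, which is one possible choice rather than a method prescribed by the source; the function name `vote` and the readings are hypothetical.

```python
import statistics


def vote(readings):
    """Return the median of redundant readings.

    With three or more replicas, the median masks a minority of
    faulty values.
    """
    if len(readings) < 3:
        raise ValueError("need at least three replicated readings")
    return statistics.median(readings)


# Hypothetical temperature sensors; the third replica has failed high.
print(vote([20.1, 20.3, 95.0]))
```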
Model Redundancy
Model-based redundancy uses properties of a known model, e.g., physical laws. If, for example, a revolution counter for a wheel in a four-wheel-drive vehicle fails, it is possible to estimate the revolution speed based on the other wheels' speeds.
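The wheel-speed example can be sketched with a deliberately simple model. The sketch assumes straight-line driving without wheel slip, so that all wheels turn at roughly the same speed and the mean of the working wheels is a usable estimate; the function name `estimate_wheel_speed` and the wheel labels are hypothetical.

```python
def estimate_wheel_speed(speeds: dict, failed: str) -> float:
    """Estimate the speed of the failed wheel from the remaining wheels.

    Assumed model: straight-line driving, no slip, so the mean of the
    other wheels' speeds approximates the missing value.
    """
    others = [
        value for wheel, value in speeds.items()
        if wheel != failed and value is not None
    ]
    if not others:
        raise ValueError("no working wheel sensors left")
    return sum(others) / len(others)


# Hypothetical readings in revolutions per minute; front-left has failed.
rpm = {"FL": None, "FR": 100.0, "RL": 102.0, "RR": 98.0}
print(estimate_wheel_speed(rpm, "FL"))
```

A more realistic model would account for cornering, where inner and outer wheels legitimately turn at different speeds.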
Faults
Failures
Content failures
The content of the information delivered at the service interface (i.e., the service content) deviates from implementing the system function.
Timing failures
The time of arrival or the duration of the information delivered at the service interface (i.e., the timing of service delivery) deviates from implementing the system function.
Halt failure
Also called simply a halt: the service is halted (the external state becomes constant, i.e., system activity, if there is any, is no longer perceptible to the users).
Erratic failures
The service is delivered (not halted) but is erratic (e.g., babbling).
Failure Consistency
Consistent failures
The incorrect service is perceived identically by all system users.
Inconsistent failures
Some or all system users perceive the incorrect service differently (some users may actually perceive correct service); inconsistent failures are usually called Byzantine failures.
Development Failure
Budget failure
The allocated funds are exhausted before the system passes acceptance testing.
Schedule failure
The projected delivery schedule slips to a point in the future where the system would be technologically obsolete or functionally inadequate for the users' needs.
Downgrading
The developed system is delivered with less functionality, lower performance, or is predicted to have lower dependability or security than was required in the original system specification.
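Static verification (verifying a system without executing it) can be conducted:
on the system itself, in the form of static analysis or theorem proving;
on a model of the system behavior, where the model is usually a state-transition model (Petri nets, finite or infinite state automata), leading to model checking.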
Failure severity
for availability
the outage duration
for safety
the possibility of human lives being endangered;
for confidentiality
the type of information that may be unduly disclosed
for integrity
the extent of the corruption of data and the ability to recover from these corruptions.