Dependability

• Dependability is the ability of a system to deliver a specified service.
• System service is classified as proper if it is delivered as specified; otherwise it is improper.
• System failure is a transition from proper to improper service.
• System restoration is a transition from improper to proper service.

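[Figure: two-state diagram. "Proper service" moves to "improper service" on failure; "improper service" moves back to "proper service" on restoration.]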

⇒ The “properness” of service depends on the user’s viewpoint!

Reference: J.C. Laprie (ed.), Dependability: Basic Concepts and Terminology, Springer-Verlag, 1992.

ECE/CS 541: Computer System Analysis, Instructor William H. Sanders. ©2006 William H. Sanders. All rights reserved. Do not duplicate without permission of the author. Module 1.

Examples of Specifications of Proper Service

• k out of N components are functioning.
• every working processor can communicate with every other working processor.
• every message is delivered within t milliseconds from the time it is sent.
• all messages are delivered in the same order to all working processors.
• the system does not reach an unsafe state.
• 90% of all remote procedure calls return within x seconds with a correct result.
• 99.999% of all telephone calls are correctly routed.

⇒ Notion of “proper service” provides a specification by which to evaluate a system’s dependability.
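Two of the specifications above can be written as checkable predicates. This is a minimal sketch, not part of the original slides: the function names, the reading of "k out of N" as "at least k", and the treatment of an empty call log as proper are all assumptions made for illustration.

```python
def k_out_of_n(component_up, k):
    """Proper service if at least k of the N components are functioning.

    component_up: list of booleans, one per component (hypothetical input).
    Assumption: "k out of N" is read as "at least k".
    """
    return sum(component_up) >= k


def rpc_spec_met(latencies_s, correct, x_seconds, fraction=0.90):
    """Proper service if at least `fraction` of all remote procedure calls
    returned within x_seconds with a correct result.

    latencies_s: per-call latency in seconds.
    correct:     per-call flag, True if the result was correct.
    Assumption: an empty call log counts as proper service.
    """
    if not latencies_s:
        return True
    good = sum(
        1 for latency, ok in zip(latencies_s, correct)
        if ok and latency <= x_seconds
    )
    return good / len(latencies_s) >= fraction


# Hypothetical observations.
units = [True, True, False, True]           # 3 of 4 components working
latencies = [0.2, 0.3, 0.1, 0.4, 0.2, 0.3, 0.2, 0.1, 0.3, 2.5]
results = [True] * 10                       # all results correct
spec_k = k_out_of_n(units, k=3)
spec_rpc = rpc_spec_met(latencies, results, x_seconds=1.0)
```

Whether the same observations count as proper or improper depends entirely on the parameters chosen (k, x, the required fraction), which is the sense in which the specification defines dependability.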

Dependability Concepts
• Measures - properties expected from a dependable system
– Availability
– Reliability
– Safety
– Confidentiality
– Integrity
– Maintainability
– Coverage
• Impairments - causes of undependable operation
– Faults
– Errors
– Failures
• Means - methods to achieve dependability
– Fault Avoidance
– Fault Tolerance
– Fault Removal
– Dependability Assessment

Faults, Errors, and Failures can Cause Improper Service

• Failure - transition from proper to improper service
• Error - that part of system state that is liable to lead to subsequent failure
• Fault - the hypothesized cause of error(s)

[Figure: chains of Fault → Error → Failure drawn for Module 1, Module 2, and Module 3.]
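The three terms can be illustrated with a toy service. This is a sketch only: the `EventCounter` class, its specification ("the reported total is never negative"), and the chosen defect are hypothetical and do not come from the slides.

```python
class EventCounter:
    """Toy service that counts events and reports the total on request."""

    def __init__(self):
        self.total = 0  # system state

    def record(self, n):
        # FAULT (hypothetical defect): a negative n should be rejected,
        # but this code accepts it. The fault stays dormant until a
        # negative n actually arrives.
        self.total += n

    def report(self):
        # Service interface seen by the user.
        return self.total


def is_proper(reported_total):
    """Assumed specification: the reported total is never negative."""
    return reported_total >= 0


counter = EventCounter()
counter.record(3)
before = counter.report()   # proper service; the fault is still dormant
counter.record(-5)          # fault activated: the state (-2) is now an ERROR
after = counter.report()    # FAILURE: an improper value reaches the user
```

The error exists as soon as the state becomes negative, but the failure only occurs when that state affects the delivered service.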

Dependability Measures: Availability
Availability - quantifies the alternation between deliveries of proper and improper service.
– A(t) is 1 if service is proper at time t, 0 otherwise.
– E[A(t)] (Expected value of A(t)) is the probability that service is proper at time t.
– A(0,t) is the fraction of time the system delivers proper service during [0,t].
– E[A(0,t)] is the expected fraction of time service is proper during [0,t].
– P[A(0,t) > t*] (0 ≤ t* ≤ 1) is the probability that service is proper more than 100·t*% of the time during [0,t].
– lim_{t→∞} A(0,t) is the fraction of time that service is proper in steady state.
– E[lim_{t→∞} A(0,t)], P[lim_{t→∞} A(0,t) > t*] as above.
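The availability measures above can be evaluated on recorded or simulated traces. The sketch below assumes each trace is a list of half-open intervals [start, end) during which service was proper. The trace data, function names, and the sample-average estimators for E[A(0,t)] and P[A(0,t) > t*] are illustrative assumptions, not part of the slides.

```python
def instantaneous_availability(up_intervals, t):
    """A(t): 1 if t lies in an interval of proper service, 0 otherwise."""
    return 1 if any(start <= t < end for start, end in up_intervals) else 0


def interval_availability(up_intervals, t):
    """A(0,t): fraction of [0,t] during which service was proper."""
    if t <= 0:
        raise ValueError("t must be positive")
    up_time = sum(
        max(0.0, min(end, t) - max(start, 0.0))
        for start, end in up_intervals
    )
    return up_time / t


def estimate_from_traces(traces, t, t_star):
    """Sample estimates of E[A(0,t)] and P[A(0,t) > t*] over several traces."""
    values = [interval_availability(trace, t) for trace in traces]
    expected = sum(values) / len(values)
    prob_above = sum(1 for v in values if v > t_star) / len(values)
    return expected, prob_above


# Hypothetical traces: intervals of proper service, in hours.
trace_a = [(0.0, 40.0), (45.0, 90.0), (92.0, 100.0)]   # 93 h up out of 100
trace_b = [(0.0, 100.0)]                               # no failure
trace_c = [(0.0, 50.0), (70.0, 100.0)]                 # 80 h up out of 100

a_at_42 = instantaneous_availability(trace_a, 42.0)    # inside an outage
a_0_100 = interval_availability(trace_a, 100.0)
expected, prob_above = estimate_from_traces(
    [trace_a, trace_b, trace_c], t=100.0, t_star=0.9
)
```

With only three traces the estimates are crude; in practice they would come from many simulation runs or from an analytical model.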

Other Dependability Measures
• Reliability - a measure of the continuous delivery of service
– R(t) is the probability that a system delivers proper service throughout [0,t].
• Safety - a measure of the time to catastrophic failure
– S(t) is the probability that no catastrophic failures occur during [0,t].
– Analogous to reliability, but concerned with catastrophic failures.
• Time to Failure - measure of the time to failure from last restoration. (Expected value of this measure is referred to as MTTF - Mean time to failure.)
• Maintainability - measure of the time to restoration from last experienced failure. (Expected value of this measure is referred to as MTTR - Mean time to repair.)
• Coverage - the probability that, given a fault, the system can tolerate the fault and continue to deliver proper service.
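As a rough illustration of the time-to-failure and maintainability measures, the sketch below estimates MTTF and MTTR from an event log. The log format, the function names, and the assumption that the system starts in proper service at time 0 are all hypothetical. The reliability function shown additionally assumes exponentially distributed times to failure, which the slides do not state.

```python
import math


def mean_times(events):
    """Estimate (MTTF, MTTR) from a time-ordered log of
    (timestamp, kind) pairs, kind being "failure" or "restoration".

    Assumption: the system is in proper service at time 0 and the two
    event kinds strictly alternate, starting with a failure.
    """
    up_durations, down_durations = [], []
    last_time, expected_kind = 0.0, "failure"
    for timestamp, kind in events:
        if kind != expected_kind:
            raise ValueError(f"unexpected {kind!r} at t={timestamp}")
        if kind == "failure":
            up_durations.append(timestamp - last_time)    # time to failure
            expected_kind = "restoration"
        else:
            down_durations.append(timestamp - last_time)  # time to restoration
            expected_kind = "failure"
        last_time = timestamp
    if not up_durations or not down_durations:
        raise ValueError("need at least one failure and one restoration")
    mttf = sum(up_durations) / len(up_durations)
    mttr = sum(down_durations) / len(down_durations)
    return mttf, mttr


def exponential_reliability(t, mttf):
    """R(t) = exp(-t / MTTF), valid only under the exponential assumption."""
    return math.exp(-t / mttf)


# Hypothetical log, times in hours.
log = [(100.0, "failure"), (102.0, "restoration"),
       (302.0, "failure"), (306.0, "restoration")]
mttf, mttr = mean_times(log)           # up times 100 and 200, down times 2 and 4
r_50 = exponential_reliability(50.0, mttf)
```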

Illustration of the Impact of Coverage on Dependability
• Consider two well-known architectures: simplex and duplex.

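[Figure: block diagrams of the simplex system (one unit with failure rate λ) and the duplex system (two units, each with failure rate λ).]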

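• The Markov model for both architectures is:

[Figure: Markov models.
Simplex: state 1 → failure at rate λ.
Duplex: state 2 → state 1 at rate 2cλ; state 1 → state 2 at rate µ; state 2 → failure at rate 2(1-c)λ; state 1 → failure at rate λ.]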

• The analytical expression of the MTTF can be calculated for each architecture
using these Markov models.
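A sketch of that calculation, under a stated reading of the diagram: the duplex model starts in state 2, moves to state 1 at rate 2cλ, fails directly at rate 2(1-c)λ, returns from state 1 to state 2 at rate µ, and fails from state 1 at rate λ; the simplex model fails from its single state at rate λ. The symbols T_1 and T_2 (expected time to failure starting from state 1 or 2) are introduced here and do not appear in the slides.

```latex
% T_k: expected time to failure starting from state k of the duplex model
\begin{align*}
\text{Simplex:}\quad & \mathrm{MTTF}_{\text{simplex}} = \frac{1}{\lambda} \\[4pt]
\text{Duplex:}\quad & T_2 = \frac{1}{2\lambda} + c\,T_1, \qquad
                      T_1 = \frac{1}{\lambda+\mu} + \frac{\mu}{\lambda+\mu}\,T_2 \\[4pt]
\Rightarrow\quad & \mathrm{MTTF}_{\text{duplex}} = T_2
   = \frac{(1+2c)\lambda + \mu}{2\lambda\,\bigl(\lambda + (1-c)\mu\bigr)} \\[4pt]
\Rightarrow\quad & \frac{\mathrm{MTTF}_{\text{duplex}}}{\mathrm{MTTF}_{\text{simplex}}}
   = \frac{(1+2c)\lambda + \mu}{2\,\bigl(\lambda + (1-c)\mu\bigr)}
\end{align*}
```

For perfect coverage (c = 1) the duplex expression reduces to (3λ + µ)/(2λ²), the familiar result for a repairable two-unit system.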

Illustration of the Impact of Coverage, cont.
• The following plot shows the ratio of MTTF (duplex)/MTTF (simplex) for
different values of coverage (all other parameter values being the same).
• The ratio shows the dependability gain by the duplex architecture.

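[Figure: plot of MTTF (duplex)/MTTF (simplex) on a logarithmic vertical axis (1E+00 to 1E+04) against the ratio of failure rate to repair rate, λ/µ, on a logarithmic horizontal axis (1E-04 to 1E-02), with one curve each for c = 1, c = 0.999, c = 0.99, and c = 0.95.]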
• We observe that the coverage of the detection mechanism has a significant impact on the gain: a change in coverage of only 10^-3 reduces the dependability gain of the duplex system by a full order of magnitude.
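The order-of-magnitude effect can be checked numerically. The sketch below assumes the duplex model has transitions 2 → 1 at rate 2cλ, 2 → failure at rate 2(1-c)λ, 1 → 2 at rate µ, and 1 → failure at rate λ, for which standard Markov analysis gives the closed-form ratio used in the code. The function name and the symbol rho = λ/µ are introduced here for illustration.

```python
def mttf_ratio(c, rho):
    """MTTF(duplex) / MTTF(simplex) for coverage c and rho = lambda / mu.

    Closed form under the model assumed above:
        ((1 + 2c) * rho + 1) / (2 * (rho + (1 - c)))
    """
    if not 0.0 <= c <= 1.0:
        raise ValueError("coverage must lie in [0, 1]")
    if rho <= 0.0:
        raise ValueError("rho must be positive")
    return ((1.0 + 2.0 * c) * rho + 1.0) / (2.0 * (rho + (1.0 - c)))


# Gain at the left edge of the plotted range, rho = 1E-04.
rho = 1e-4
gains = {c: mttf_ratio(c, rho) for c in (1.0, 0.999, 0.99, 0.95)}
drop = gains[1.0] / gains[0.999]   # effect of losing 10^-3 of coverage

for c, gain in gains.items():
    print(f"c = {c:<5}  gain = {gain:10.1f}")
```

At rho = 1E-04 the gain is about 5000 for c = 1 and about 450 for c = 0.999, a factor of roughly 11. Once 1 - c is much larger than rho, the uncovered failures dominate and the gain is close to 1/(2(1 - c)) regardless of how fast repair is.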
Failure Sources and Frequencies
Non-Fault-Tolerant Systems
– Japan, 1383 organizations (Watanabe 1986, Siewiorek & Swarz 1992)
– USA, 450 companies (FIND/SVP 1993)
– Mean time to failure: 6 to 12 weeks
– Average outage duration after failure: 1 to 4 hours

Fault-Tolerant Systems
– Tandem Computers (Gray 1990)
– Bell Northern Research (Cramp et al. 1992)
– Mean time to failure: 21 years (Tandem)

Failure Sources:
[Figure: pie charts of failure sources for the two classes of systems. Categories: Hardware, Software, Communications/Environment, Operations/Procedures. Percentage labels shown: 10%, 15%, 50%, 25%, 10%, 8%, 7%, 65%.]
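To put the non-fault-tolerant figures in perspective, the sketch below converts them into an implied long-run availability. It assumes that the quoted outage duration plays the role of the mean down time and uses the standard relation, availability = mean up time / (mean up time + mean down time); neither the formula nor the resulting numbers appear in the slides.

```python
HOURS_PER_WEEK = 7 * 24


def implied_availability(mttf_hours, outage_hours):
    """Long-run fraction of time in proper service for a system that
    alternates between up periods (mean mttf_hours) and outages
    (mean outage_hours)."""
    return mttf_hours / (mttf_hours + outage_hours)


# Extremes of the quoted ranges: MTTF 6 to 12 weeks, outage 1 to 4 hours.
worst = implied_availability(6 * HOURS_PER_WEEK, 4.0)    # short MTTF, long outage
best = implied_availability(12 * HOURS_PER_WEEK, 1.0)    # long MTTF, short outage

print(f"implied availability between {worst:.4f} and {best:.4f}")
```

Under these assumptions the non-fault-tolerant systems fall between roughly 99.6% and 99.95% availability.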