Chapter 5
Computer Reliability
Learning Objectives
1. Introduction
2. What is Software Reliability?
3. Software Reliability and Hardware Reliability
4. Need for software reliability measurement
5. Increasing reliability
6. Software Metrics for Reliability
7. Two Kinds of Data-related Failure
Introduction
• Computer systems are sometimes unreliable
– Erroneous information in databases
– Misinterpretation of database information
– Malfunction of embedded systems
• Effects of computer errors
– Inconvenience
– Bad business decisions
– Injuries or Fatalities
What is Software Reliability?
• According to ANSI, “Software Reliability is
defined as the probability of failure-free software
operation for a specified period of time in a specified
environment”.
Hardware reliability
• In hardware reliability, the first phase of manufacturing (burn-in) may show a high number of faults.
• As faults are discovered and removed, this number decreases, and in the second phase (useful life) only a few faults remain.
• After this phase comes the wear-out phase, in which the physical components wear out with time and usage, and the number of faults increases again.
(Figure: the hardware "bathtub curve", showing the Burn-In, Useful Life, and Wear-Out phases.)
Distinct Characteristics of
Software and Hardware
• Software is not manufactured: software is developed, not manufactured like hardware. It depends on the individual skills and creative abilities of the developers, which are very difficult to specify, even more difficult to quantify, and virtually impossible to standardize.
• Time dependency and life cycle: software reliability is not a function of operational time, whereas hardware reliability is.
• Environmental factors: environmental factors do not affect software reliability, but they do affect hardware reliability.
Need for software reliability
measurement
• In any software industry, system quality plays an important role.
• Hardware quality is generally consistently high, so if system quality varies, the variation is due mainly to software quality.
• Software quality can be measured in many ways; reliability is a user-oriented measure of software quality.
Need for software reliability
measurement
• As an example, assume there are three programs executing together to solve a problem.
• By measuring the reliability of each program, we can identify the one with the lowest reliability and put more effort into improving it, raising the overall reliability of the system (see the sketch below).
• So there is always a need to measure reliability.
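A minimal sketch of this idea (illustrative only, not from the slides): assuming the three programs run independently and the system fails if any one of them fails, system reliability is the product of the individual reliabilities. The program names and reliability values below are hypothetical.

    # Hypothetical per-program reliabilities (probability of failure-free operation).
    reliabilities = {"program_A": 0.99, "program_B": 0.95, "program_C": 0.80}

    # Series assumption: the system works only if every program works.
    system_reliability = 1.0
    for r in reliabilities.values():
        system_reliability *= r
    print(f"system reliability = {system_reliability:.3f}")   # ~0.752

    # The least reliable program is where improvement effort pays off most.
    weakest = min(reliabilities, key=reliabilities.get)
    print(f"focus improvement effort on: {weakest}")           # program_C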
Increasing reliability
• Reliability can be increased by preventing the errors described above and by developing quality software throughout all stages of the software life cycle. To do this:
– Ensure that the requirements clearly specify the functionality of the final product. (Requirements phase)
– Among the phases of software reliability, the second one (useful life) is the most important, so the software product must be maintained carefully. Ensure that the generated code supports maintainability, so that no additional errors are introduced. (Coding phase)
Software Metrics for Reliability
Requirements Reliability Metrics
• Requirements indicate what features the software must contain.
• A clear understanding of the requirements document must therefore exist between client and developer; without it, writing these requirements is difficult.
• The requirements must have a valid structure to avoid the loss of valuable information.
• The requirements should be thorough and detailed so that the design phase can proceed easily.
• The requirements should not contain inadequate information.
Requirements Reliability Metrics
• The requirements should also communicate easily: they should contain no ambiguous statements. If the requirements are ambiguous, it is difficult for the developer to implement the specification.
• Requirements reliability metrics evaluate these quality factors of the requirements document.
Design and Code Reliability Metrics
• The quality factors that apply to the design and the code are complexity, size, and modularity.
• Highly complex modules are difficult to understand and have a high probability of containing errors, so module complexity should be kept low.
• Size depends on factors such as total lines, comments, and executable statements.
• According to SATC, the most effective evaluation is a combination of size and complexity (see the sketch below).
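As a rough, purely illustrative sketch of combining size and complexity (the branch-counting proxy for complexity and the thresholds below are assumptions, not SATC's actual method):

    # Toy module evaluation: flag modules whose size/complexity combination looks risky.
    BRANCH_KEYWORDS = ("if", "elif", "for", "while", "except", "case")

    def rough_complexity(source: str) -> int:
        # Very rough proxy: 1 + number of branching keywords in the source text.
        tokens = source.split()
        return 1 + sum(tokens.count(kw) for kw in BRANCH_KEYWORDS)

    def evaluate_module(source: str, high_cc: int = 10, large_loc: int = 500, tiny_loc: int = 20) -> dict:
        loc = len(source.splitlines())          # total lines as a simple size measure
        cc = rough_complexity(source)
        risky = cc > high_cc and (loc > large_loc or loc < tiny_loc)
        return {"loc": loc, "complexity": cc, "risky": risky}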
Design and Code Reliability Metrics
• Reliability decreases when modules combine high complexity with large size, or high complexity with small size. In the latter combination reliability also decreases because the small size results in very terse code that is difficult to alter.
• These metrics also apply to object-oriented code, but additional metrics are required there to evaluate quality.
Testing Reliability Metrics
• Testing reliability metrics use two approaches to evaluate reliability.
• First, they ensure that the system provides all the functions specified in the requirements; this reduces errors caused by missing functionality.
• Second, they evaluate the code itself, finding errors and fixing them.
Basic Reliability Metrics
• Some reliability metrics that can be used to quantify the reliability of a software product are discussed below:
• MEAN TIME TO FAILURE (MTTF): The first metric to understand is the time during which a system is not failed, i.e. is available. Often referred to as "uptime" in the IT industry, the length of time that a system is online between outages or failures can be thought of as the "time to failure" for that system.
Basic Reliability Metrics
• For example, if I bring my RAID array online on Monday at
noon and the system functions normally until a disk failure
Friday at noon, it was “available” for exactly 96 hours.
• If this happens every week, with repairs lasting from Friday
noon until Monday noon, I could average these numbers to
reach a “mean time to failure” or “MTTF” of 96 hours.
• I would probably also call my system vendor and demand that
they replace this horribly unreliable device.
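A minimal sketch of that calculation, treating the repeated 96-hour uptimes as observed data:

    # Average the observed uptimes (hours of normal operation before each failure).
    uptimes_hours = [96, 96, 96]                  # Monday noon to Friday noon, week after week
    mttf = sum(uptimes_hours) / len(uptimes_hours)
    print(f"MTTF = {mttf:.0f} hours")             # MTTF = 96 hours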
Basic Reliability Metrics
• MEAN TIME BETWEEN FAILURES (MTBF): We can combine the MTTF and MTTR metrics to obtain the MTBF metric.
• MTBF = MTTF + MTTR
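Continuing the RAID example above, the repair window runs from Friday noon to Monday noon, i.e. 72 hours, so a quick check of the formula gives:

    mttf = 96            # hours of operation before a failure (from the example above)
    mttr = 72            # hours to repair: Friday noon to Monday noon
    mtbf = mttf + mttr
    print(mtbf)          # 168 hours -> exactly one week between successive failures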
Basic Reliability Metrics
• MEAN TIME TO REPAIR (MTTR): the average time taken to diagnose and repair a failure and restore the system to service. (Used together with MTTF above to compute MTBF.)
Basic Reliability Metrics
• RATE OF OCCURRENCE OF FAILURE (ROCOF): the frequency with which unexpected failures occur; for example, a ROCOF of 0.002 means that about 2 failures are likely in every 1000 operational time units. (See the sketch below.)
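An illustrative sketch (the counts are assumed, not from the slides): ROCOF can be estimated as the number of failures observed over the total operational time.

    failures = 2                 # failures observed (hypothetical)
    operational_hours = 1000     # total time in operation (hypothetical)
    rocof = failures / operational_hours
    print(rocof)                 # 0.002 -> about 2 failures per 1000 hours of operation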
Basic Reliability Metrics
• PROBABILITY OF FAILURE ON DEMAND (POFOD)
• POFOD is defined as the probability that the system will fail when a service is requested: the number of system failures divided by the number of demands (service requests) made on the system.
• A POFOD of 0.1 means that one out of every ten service requests may result in failure.
• POFOD is an important measure for safety-critical systems and is appropriate for protection systems, where services are demanded only occasionally (see the sketch below).
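A minimal sketch of the POFOD calculation, with assumed counts:

    demands = 1000      # service requests made to the system (hypothetical)
    failures = 3        # requests on which the system failed (hypothetical)
    pofod = failures / demands
    print(pofod)        # 0.003 -> roughly 3 failures in every 1000 demands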
Basic Reliability Metrics
• AVAILABILITY (AVAIL)
• Availability is the probability that the system is available for use at a given time. It takes into account the repair time and the restart time for the system.
• An availability of 0.995 means that in every 1000 time units, the system is likely to be available for 995 of them.
• Equivalently, it is the percentage of time that a system is available for use, taking into account planned and unplanned downtime. If a system is down an average of four hours out of every 100 hours of operation, its AVAIL is 96% (see the check below).
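A quick check of the 96% figure above, computing availability as uptime divided by total time:

    total_hours = 100
    downtime_hours = 4                   # average planned plus unplanned downtime
    avail = (total_hours - downtime_hours) / total_hours
    print(f"AVAIL = {avail:.0%}")        # AVAIL = 96%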
Two Kinds of Data-related Failure
Disfranchised Voters
• November 2000 general election
• Florida disqualified thousands of voters
• Reason: People identified as felons
• Cause: Incorrect records in voter database
• Consequence: May have affected outcome
of national presidential election
False Arrests
• Sheila Jackson was arrested and spent five days in detention after being mistaken for Shirley Jackson
Accuracy of NCIC Records
• March 2003: Justice Dept. announces FBI not
responsible for accuracy of National Crime
Information Center (NCIC) information
Dept. of Justice Position
• Impractical for FBI to be responsible for data’s
accuracy
• Much information provided by other law
enforcement and intelligence agencies
• Agents should be able to use discretion
• If Privacy Act strictly followed, much less
information would be in NCIC
• Result: fewer arrests
Position of Privacy Advocates
• Number of records is increasing
• More erroneous records → more false arrests
• Accuracy of NCIC records more important
than ever
Errors When Data Are Correct
• Assume data correctly fed into
computerized system
• System may still fail if there is an error in
its programming
Errors Leading to System
Malfunctions
• Qwest sent incorrect bills to cell phone customers
• Faulty U.S. Department of Agriculture (USDA) beef price reports
• U.S. Postal Service returned mail addressed to Patent
and Trademark Office
• New York City Housing authority overcharged renters
• About 450 California prison inmates mistakenly
released
Errors Leading to System
Failures
• Ambulance dispatch system in London
• Japan’s air traffic control system
• Comair’s Christmas Day shutdown (The 2004 crash
of a critical legacy system at Comair is a classic risk
management mistake that cost the airline $20 million
and badly damaged its reputation)
• NASDAQ stock exchange shut down
• Insulin pump demo at Black Hat conference
Comair Cancelled All Flights on
Christmas Day, 2004
Analysis: E-Retailer Posts Wrong
Price, Refuses to Deliver
• Amazon.com in Britain offered the iPAQ handheld computer for £7 instead of £275
• Orders flooded in
• Amazon.com shut down site, refused to deliver
unless customers paid true price
• Was Amazon.com wrong to refuse to fill the
orders?