Chapter 3
Reliability, Maintainability, and Availability of
Facilities
and
Chapter 4
The Assessment and Control of Product Reliability
Definition, Quality versus Reliability; Failure (Hazard) rate, basic
derivations, MTBF, MTTF, Bathtub curve or mortality curve, availability,
maintainability, system effectiveness, graphic evaluation.
Introduction
Until the 1960s, quality targets were deemed to have been reached
when the item considered was found to be free of defects or
systematic failures at the time it left the manufacturer.
The growing complexity of equipment and systems, as well as the
rapidly increasing cost incurred by loss of operation as a consequence
of failures, have brought to the forefront the aspects of reliability,
maintainability, availability, and safety.
The expectation today is that complex equipment and systems are not
only free from defects and systematic failures at time t = 0 (when they
are put into operation), but also perform the required function
failure-free for a stated time interval and exhibit fail-safe behavior in
case of critical or catastrophic failures.
Definition
If n statistically identical and independent items are put into operation
at time t = 0 to perform a given mission and ν ≤ n of them accomplish
it successfully, then the ratio ν / n is a random variable which
converges for increasing n to the true value of the reliability.
Reliability is a characteristic of the item, expressed by the probability
that it will perform its required function under given conditions for a
stated time interval.
Qualitatively, it is the ability of the item to remain functional.
Quantitatively, it is the probability that no operational interruptions will
occur during a stated time interval.
Based on this definition, the following characteristics of reliability can
be identified.
1. It is expressed as a probability, so it ranges from zero to one.
Values closer to 1 imply higher reliability.
2. The required (intended) function specifies the item's task. It is the
starting point for any reliability analysis, because it defines what
counts as a failure.
3. The operating conditions must be stated. Experience shows, for
instance, that the failure rate of semiconductor devices doubles for an
operating temperature increase of 10 to 20°C.
4. The time period for which the device has to work satisfactorily must
be stated. It can be expressed in operating hours, days, months, or years.
History of Reliability
• As a discipline it may be traced back to the 1930s when probability
concepts were applied to electric power generation-related
problems.
• During World War II, Germans used basic reliability concepts to
improve reliability of their V1 and V2 missiles.
• In 1954, a National Symposium on Reliability and Quality Control
was held for the first time in the United States.
• In 1956, the first commercially available book, entitled Reliability
Factors for Ground Electronic Equipment, was published.
• In 1962, a graduate degree program in reliability engineering was
started by the Air Force Institute of Technology, Dayton, Ohio.
• Today, there are many publications available on the discipline of
reliability, and each year many conferences are held around the world
that deal directly or indirectly with the field. In addition, many
academic institutions offer programs in reliability engineering.
Reliability and Quality
Reliability and quality go hand in hand and are complementary to each
other.
• The quality of a product is the degree of conformance to
specifications or standards. It is not concerned with the elements of
time and environment.
• Reliability is the ability of a product to maintain its quality for a
specified period of time under specified working conditions.
• Another difference between quality and reliability is that it is
possible to build a reliable complex system from less reliable components
(by using redundant components), whereas it is not possible to
improve the quality of a product once it has been produced (unless it is
reworked).
Objectives of Reliability
The objectives of Reliability are to ensure
1. Trouble free running of equipment
2. Adequate performance for specified period of time
3. The equipment works under specified conditions.
4. Minimization of down time of equipment.
5. Maintainability of equipment.
Failure and Failure (hazard) rate
Failure: The inability of an item to perform its required function.
Some components fail suddenly, while others fail gradually.
Failure rate: The frequency with which a component fails, i.e., the
average number of failures per unit time over a given time interval. It
is denoted by λ.
Basic Derivations: MTTF, MTBF
Let us assume that n statistically identical, new, and independent
items are put into operation at time t = 0, under the same conditions,
and at time t a subset $\bar{\nu}(t)$ of these items have not yet failed.
$\bar{\nu}(t)$ is a right-continuous decreasing step function. t1, ..., tn,
measured from t = 0, are the observed failure-free times (operating times
to failure) of the n items considered, i.e., n realizations of the
failure-free time τ.
$\hat{E}[\tau] = \frac{t_1 + \dots + t_n}{n}$ (3.1)
is the empirical mean (empirical expected value) of τ.
For n → ∞, $\hat{E}[\tau]$ converges to the true mean E[τ] = MTTF.
The function $\hat{R}(t) = \bar{\nu}(t) / n$ is the empirical reliability
function, which converges to R(t) for n → ∞.
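The empirical quantities described above can be sketched in a few lines of Python. This is only an illustration: the function names and the failure times are hypothetical, and the reliability estimate is taken as the fraction of items whose failure-free time exceeds t.

```python
from bisect import bisect_right

def empirical_mean(failure_times):
    """Empirical mean of the observed failure-free times (estimate of MTTF)."""
    return sum(failure_times) / len(failure_times)

def empirical_reliability(failure_times, t):
    """Fraction of the n items that have not yet failed at time t."""
    ordered = sorted(failure_times)
    failed = bisect_right(ordered, t)  # number of items with failure time <= t
    return (len(ordered) - failed) / len(ordered)

# Hypothetical failure-free times in hours (illustrative data only).
times = [120.0, 340.0, 560.0, 810.0, 1170.0]
mttf_estimate = empirical_mean(times)           # 600.0 hours
r_at_500 = empirical_reliability(times, 500.0)  # 3 of 5 items survive: 0.6
```

With only five items the estimate is coarse; it approaches R(t) only as n grows.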
Mean time to failure (MTTF) is used when the components are
non-repairable. When n items are tested until all fail, and Ti is
the time at which the ith item fails, then
$MTTF = \frac{1}{n} \sum_{i=1}^{n} T_i$ (3.2)
Mean time between failures (MTBF) is used to indicate the frequency
of failure for repairable items. If a component fails after a period T1,
is repaired and put back into operation, fails again after a period T2,
is repaired and put back into operation, fails after a period T3, and so
on for n failures, then
$MTBF = \frac{T_1 + T_2 + \dots + T_n}{n}$ (3.3)
The relation between the failure rate and the MTBF is
$\lambda = \frac{1}{MTBF}$ (3.4)
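As a small numerical sketch of these definitions, the snippet below averages a set of operating periods and takes the reciprocal to get the failure rate. The data and the helper name `mean_time` are hypothetical, and a constant failure rate is assumed so that λ = 1/MTBF applies.

```python
def mean_time(periods):
    """Average of the operating periods Ti (used for both MTTF and MTBF)."""
    return sum(periods) / len(periods)

# Hypothetical operating periods between successive failures, in hours.
operating_periods = [210.0, 180.0, 240.0, 170.0]

mtbf = mean_time(operating_periods)  # 200.0 hours
failure_rate = 1.0 / mtbf            # 0.005 failures per hour
```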
Failure rate curve (Bathtub curve)
In reliability analysis of engineering systems it is often assumed that
the hazard or time-dependent failure rate of items follows the
shape of a bathtub as shown below.
The curve shown in the figure below has three distinct regions:
1. Burn-in period (Early Failures)
2. Useful life period, and
3. Wear-out period
Burn-in period
The burn-in period is also known as the mortality period, break-in
period, or debugging period. During this period the hazard rate
decreases. The failures occurring in this period are due to
poor design, poor manufacturing, poor quality control, poor
debugging, human error, and substandard material and
workmanship.
Useful life period
The hazard rate is constant and at its minimum, and failures occur
randomly or unpredictably. Some of the causes of failures in this
region include insufficient design margins, use in incorrect
environments, undetectable defects, human error, and unavoidable
failures.
Wear-out period
It begins when the item passes its useful life period. During this
period the hazard rate increases. Some causes for the occurrence
of wear-out region failures are wear due to aging and friction,
inadequate or improper preventive maintenance, limited-life
components, misalignments, corrosion and creep, and incorrect
overhaul practices.
The failures can be reduced significantly by executing effective
replacement and preventive maintenance policies and procedures.
Reliability Measures
The reliability of an item can be obtained by using the following
equations:
$R(t) = 1 - F(t) = 1 - \int_0^t f(t)\,dt$ (3.5)
where
R(t) = reliability at time t,
F(t) = cumulative distribution function,
f(t) = failure density function.
$R(t) = e^{-\int_0^t \lambda(t)\,dt}$ (3.6)
where λ(t) = hazard rate or time-dependent failure rate.
For a constant hazard rate λ, as in the useful life period, the
reliability reduces to
$R(t) = e^{-\lambda t}$ (3.7)
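As a short check on (3.7), assume a constant hazard rate λ(t) = λ. The integral in the exponent of (3.6) then evaluates directly. Integrating R(t) over all time also gives the link MTTF = 1/λ; this last step is a standard result for the constant-rate case and is not stated on the slides.

```latex
\begin{align*}
\int_0^t \lambda \, dx &= \lambda t
  \quad\Rightarrow\quad R(t) = e^{-\lambda t}, \\
\mathrm{MTTF} &= \int_0^\infty R(t)\, dt
  = \int_0^\infty e^{-\lambda t}\, dt = \frac{1}{\lambda}.
\end{align*}
```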
Example 1:
Consider a component that has a MTTF of 1000 hours. Find its
reliability for 100, 1000 and 2000 hours and offer your
comments.
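One possible way to work Example 1 is sketched below. It assumes a constant failure rate λ = 1/MTTF, so that R(t) = e^(-t/MTTF); the function name `reliability` is illustrative.

```python
import math

def reliability(t, mttf):
    """R(t) = exp(-t / MTTF), assuming a constant failure rate 1 / MTTF."""
    return math.exp(-t / mttf)

MTTF_HOURS = 1000.0
results = {hours: reliability(hours, MTTF_HOURS) for hours in (100, 1000, 2000)}

for hours, r in results.items():
    print(f"R({hours}) = {r:.4f}")
# R(100) is about 0.9048, R(1000) about 0.3679, R(2000) about 0.1353
```

Under this assumption only about 37% of components survive to the MTTF itself, so the MTTF should not be read as a guaranteed life.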
Example 2:
Determine the reliability of an equipment having a MTBF of 50
hours for an operating period of 45 hours. If the reliability has to
be improved by 20% what percent change in MTBF is required?
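A possible solution sketch for Example 2 follows. It assumes a constant failure rate and reads "improved by 20%" as a 20% relative increase in reliability (R_target = 1.2 R); other readings would give different numbers.

```python
import math

mtbf = 50.0     # hours
mission = 45.0  # hours

r_current = math.exp(-mission / mtbf)          # about 0.4066
r_target = 1.2 * r_current                     # assumed: 20% relative improvement
mtbf_required = -mission / math.log(r_target)  # about 62.7 hours
percent_change = 100.0 * (mtbf_required - mtbf) / mtbf  # about 25.4 %
```

The required MTBF follows from solving R_target = e^(-t/MTBF) for MTBF.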
System Reliability
The reliability evaluation of standard systems/networks is
important in engineering systems.
When two or more components are used to form a system, the
components can be arranged in series or in parallel.
Components in series: If three components A, B, and C with
reliabilities RA, RB, and RC are arranged in series, then the system
reliability is
$R_s = R_A R_B R_C$ (3.8)
If any one of the units fails, the system fails. All system units must
work normally for successful operation of the system.
Generally,
$R_s = \prod_{i=1}^{m} R_i$ (3.9)
where Ri is the reliability of unit i, for i = 1, 2, 3, …, m.
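When every unit reliability is close to 1, the system reliability can be
approximated by using
$R_s \approx 1 - \sum_{i=1}^{m} (1 - R_i)$ (3.10)
For identical units (i.e. Ri = R),
$R_s \approx 1 - m(1 - R)$ (3.11)
The failure rate of a system is the sum of the failure rates of all the
components if they are arranged in series:
$\lambda_s = \sum_{i=1}^{m} \lambda_i$ (3.12)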
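The series relations can be sketched as below. The exact product of unit reliabilities is compared with the first-order approximation 1 - Σ(1 - Ri), which is assumed here to be the approximation the slide intends. Function names and unit values are hypothetical.

```python
import math

def series_reliability(reliabilities):
    """Exact series reliability: product of the unit reliabilities."""
    return math.prod(reliabilities)

def series_reliability_approx(reliabilities):
    """First-order approximation, reasonable only when every Ri is near 1."""
    return 1.0 - sum(1.0 - r for r in reliabilities)

def series_failure_rate(failure_rates):
    """Series system failure rate: sum of the component failure rates."""
    return sum(failure_rates)

units = [0.99, 0.98, 0.97]                 # hypothetical unit reliabilities
exact = series_reliability(units)          # about 0.9411
approx = series_reliability_approx(units)  # about 0.9400
```

The approximation is slightly pessimistic, and the gap widens as the unit reliabilities drop.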
Components in parallel: If three components A, B, and C with
reliabilities RA, RB, and RC are arranged in parallel, then the system
reliability is
$R_s = 1 - (1 - R_A)(1 - R_B)(1 - R_C)$ (3.13)
Generally,
$R_s = 1 - F_s = 1 - \prod_{i=1}^{m} (1 - R_i)$ (3.14)
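A minimal sketch of the parallel case, assuming independent units that all operate simultaneously: the system fails only if every unit fails, so the system unreliability is the product of the unit unreliabilities. The unit values are hypothetical.

```python
import math

def parallel_reliability(reliabilities):
    """Rs = 1 - product of (1 - Ri) for independent, active parallel units."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)

units = [0.90, 0.80, 0.70]               # hypothetical unit reliabilities
r_parallel = parallel_reliability(units)  # about 0.994
```

Even with modest unit reliabilities, the parallel system is more reliable than its best unit.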
Example 3:
Assume that an automobile has four independent and identical tires. The tire
reliability is 0.95. If any one of the tires is punctured, the automobile cannot be
driven. Calculate the automobile reliability with respect to tires.
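A possible worked solution for Example 3, treating the four tires as a series system of independent, identical units:

```latex
\begin{align*}
R_s = R^4 = (0.95)^4 \approx 0.8145
\end{align*}
```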
Example 4:
A step-down transformer, rectifier, and filter comprise a series system with the
following failure rates:
Transformer = 1.56% failures/10000 hours
Rectifier = 2% failures/10000 hours
Filter = 1.7% failures/10000 hours
The equipment has to operate for 1500 hours. What is the probability that
the system does not survive?
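One way to approach Example 4 is sketched below. It assumes that "x % failures/10000 hours" means a constant failure rate of x/100 failures per 10000 hours, and that the series failure rates add. Variable names are illustrative.

```python
import math

HOURS_PER_RATE_UNIT = 10_000.0
MISSION_HOURS = 1500.0

# Given figures, in percent failures per 10000 hours.
percent_failures = {"transformer": 1.56, "rectifier": 2.0, "filter": 1.7}

# Assumed conversion to failures per hour.
rates = {name: pct / 100.0 / HOURS_PER_RATE_UNIT
         for name, pct in percent_failures.items()}

system_rate = sum(rates.values())                  # 5.26e-6 failures per hour
r_system = math.exp(-system_rate * MISSION_HOURS)  # about 0.9921
p_not_survive = 1.0 - r_system                     # about 0.0079
```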
Example 5:
A computer has two independent and identical CPUs operating simultaneously.
At least one CPU must operate normally for the computer to function
successfully. If the CPU reliability is 0.96, calculate the computer reliability with
respect to CPUs.
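A possible worked solution for Example 5, treating the two CPUs as an active parallel system of independent, identical units:

```latex
\begin{align*}
R_s = 1 - (1 - R)^2 = 1 - (0.04)^2 = 0.9984
\end{align*}
```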
Concept of Standby Redundancy
Redundant systems consist of two or more components connected in
parallel. When one fails, the required function is performed by
another (called the redundant component), thus improving
reliability.
In some systems, the redundant components are not
continuously operating but remain in the system in standby mode.
When the main component fails, the redundant component is
switched on.
Standby redundancy is more appropriate for mechanical systems
such as motors and pumps.
Problem: Consider the following arrangement of components; three
transformers each with reliability of 0.95, two rectifiers each
with reliability of 0.94, and four filters each with reliability
of 0.90. Determine the reliability of the system.
[Figure: block diagram of the arrangement; blocks labeled A (three), B (two), and C (four)]
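The diagram for this problem did not survive extraction, so the sketch below rests on an explicit assumption: the three transformers form one parallel group, the two rectifiers a second, and the four filters a third, with the three groups connected in series. If the actual arrangement differs, the structure of the calculation must change accordingly.

```python
def parallel_identical(r, count):
    """Reliability of `count` identical, independent units in active parallel."""
    return 1.0 - (1.0 - r) ** count

# ASSUMED layout: parallel groups of like components, groups in series.
transformers = parallel_identical(0.95, 3)  # 0.999875
rectifiers = parallel_identical(0.94, 2)    # 0.9964
filters = parallel_identical(0.90, 4)       # 0.9999

system_reliability = transformers * rectifiers * filters  # about 0.9962
```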
Maintainability
• No equipment or system can be perfectly reliable over a very long
period, in spite of the designer’s best efforts. It is bound to fail during
operation, and such failures are sometimes dangerous and costly.
• Maintenance therefore becomes an important consideration in the
long-term performance of the equipment. The system requires preventive
maintenance to eliminate or slow down failures during its operation.
• Maintainability is the characteristic of an equipment design and
installation that is expressed in terms of ease and economy of
maintenance, availability, and safety of equipment.
• Maintainability can be defined as the probability that failed
equipment is restored to operable condition within a specified time
(called the downtime) when maintenance is performed under stated
conditions.
• Maintainability is a design parameter intended to reduce repair
time, as opposed to maintenance, which is the act of repairing or
servicing an item or equipment.
• Maintainability engineering: The application of scientific knowledge
and skill to develop equipment or items that are inherently able to
be maintained, as measured by favorable maintenance
characteristics.
• Maintainability function: A plot of the probability of completing a
repair within a given time (y-axis) against maintenance time (x-axis).
It is useful for predicting the probability that a repair will be
completed in a specified time.
Objectives of Maintainability
• To maximize equipment and facility availability.
• To reduce projected maintenance time and costs by
simplifying maintenance through design.
• To determine labor-hours and other resources needed to
perform the projected maintenance.
• To use maintainability data to determine item
availability/unavailability.
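Maintainability can be expressed as
$M = 1 - e^{-\mu T}$
where M = maintainability, i.e., the probability of completing a repair within time T,
T = maximum allowable time to repair,
μ = maintenance action rate, or average number of maintenance actions per
period of time:
$\mu = \frac{\text{Repaired Items}}{\text{Total Repair Hours}}$
For example, if 15 items are repaired in 150 hours, then
$\mu = 15 / 150 = 0.1$ repairs per hour.
Another form of maintainability is
$M = 1 - e^{-T/\varphi}$
where φ = average hours per maintenance action (φ = 1/μ).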
Example 6:
Consider a component with a total number of failures of 86. The
corresponding total number of maintenance hours required for these 86
failures is 540. Compute the maintainability for 1, 5 and 10 hours.
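A possible solution sketch for Example 6 uses the exponential maintainability form M = 1 - e^(-μT). It assumes that each of the 86 failures corresponds to one completed repair, so that μ = 86/540 repairs per hour. The function name is illustrative.

```python
import math

def maintainability(t, mu):
    """Probability that a repair is completed within time t at repair rate mu."""
    return 1.0 - math.exp(-mu * t)

repairs = 86          # assumed equal to the number of failures
repair_hours = 540.0
mu = repairs / repair_hours  # about 0.159 repairs per hour

m = {t: maintainability(t, mu) for t in (1, 5, 10)}
# m[1] is about 0.147, m[5] about 0.549, m[10] about 0.797
```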
Mean time to repair (MTTR)
• Mean time to repair (MTTR) is probably the most widely used
maintainability measure. It is the mean time required to perform a
given maintenance activity of a system.
• If T1, T2, T3, …, Tn are the times required for n repairs, then MTTR is
expressed by:
$MTTR = \frac{T_1 + T_2 + \dots + T_n}{n}$
• MTTR can also be computed using the following equation:
$MTTR = \frac{MTBF \, (1 - A_i)}{A_i}$
where MTBF is the mean time between failures and Ai is the
inherent availability.
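Both routes to MTTR can be sketched as follows. The second function rearranges the inherent availability relation Ai = MTBF / (MTBF + MTTR); the function names and all numbers are hypothetical.

```python
def mttr_from_repairs(repair_times):
    """MTTR as the average of the observed repair times."""
    return sum(repair_times) / len(repair_times)

def mttr_from_availability(mtbf, inherent_availability):
    """MTTR obtained by solving Ai = MTBF / (MTBF + MTTR) for MTTR."""
    return mtbf * (1.0 - inherent_availability) / inherent_availability

# Hypothetical data, in hours.
mttr_observed = mttr_from_repairs([2.0, 3.0, 4.0, 3.0])  # 3.0
mttr_derived = mttr_from_availability(97.0, 0.97)        # about 3.0
```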
Optimizing Maintainability
Considering the importance of maintainability, it is imperative to
maximize it. The following are the basic strategies for doing so.
1. Fault location and isolation: This is frequently the most
time-consuming step, especially in complex systems. It can be minimized
by preventive maintenance in which likely trouble spots are noted and
logged.
2. Repair time: This can be reduced by providing well-trained maintenance
personnel, proper tooling, repair facilities, and spare parts.
3. Accessibility: A part that is easier to reach reduces maintenance time
and downtime.
4. Interchangeability: This refers to plug-in devices where spares are
instantly interchangeable.
5. Redundancy: In large complex systems, parallel components and
subsystems can be built in to be used while failures are repaired.
Availability
• Availability is the probability that a system or equipment will be up
and ready for use. To be ready for use, the system must either
not have had a failure or, if a failure has occurred, have had it repaired.
• Thus availability includes both reliability and maintainability. On the
basis of this the availability can be defined as
“the probability that a stated percentage of equipment will have no
downtime in excess of t the mission time”.
• This definition implies that poor reliability can be offset by a good
maintainability.
Types of Availability
There are three different types of availability, depending on which downtime elements are included.
1. Inherent Availability
2. Achieved Availability
3. Operational Availability
Inherent Availability
• It is “the probability that a system or equipment when used under
stated conditions in an ideal support environment (i.e., readily
available tools, spares, maintenance personnel etc) will operate
satisfactorily at any point in time as required”.
• It excludes preventive or scheduled maintenance actions, logistics
delay time and administrative delay times. It includes corrective
maintenance downtime.
• Inherent availability is expressed as
$A_i = \frac{MTBF}{MTBF + MTTR}$
where MTBF = mean time between failures and
MTTR = mean time to repair.
Achieved Availability
• Achieved Availability is “the probability that a system or
equipment when used under stated conditions in an ideal support
environment (i.e., readily available tools, spares, maintenance
personnel etc) will operate satisfactorily at any point in time”
• This definition is similar to that of inherent availability, except that
preventive (i.e., scheduled) maintenance is included.
• It excludes logistic delay time and administrative delay time and is
expressed as
$A_a = \frac{MTBM}{MTBM + M}$
• where MTBM is the mean time between maintenance and M is the
mean active maintenance time, both being functions of
corrective (unscheduled) and preventive (scheduled) maintenance
actions and times.
Operational Availability
• Operational Availability is “the probability that a system or
equipment when used under stated conditions in an actual
operational environment will operate satisfactorily when called
upon”.
• It is expressed as
$A_o = \frac{MTBM}{MTBM + MDT}$
• where MDT is the mean maintenance downtime. The reciprocal of
MTBM is the frequency of maintenance, which in turn is significant in
determining logistic support requirements.
• MDT includes active maintenance time (M), logistic delay time,
and administrative delay time.
Example 7:
Consider a system that has the following information based on its
historical data.
Total uptime = 3000 hrs
CM downtime = 100 hrs
PM downtime = 30 hrs
Number of system failures = 10
Number of system-downing PMs = 2
Mean logistics delay = 2 hrs
Determine the inherent (Ai), achieved (Aa), and operational (Ao)
availability of the system.
CM = Corrective maintenance, PM = Preventive maintenance
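One way to work Example 7 is sketched below. It assumes that the mean logistics delay of 2 hours applies to every maintenance action (corrective or preventive) and that there is no separate administrative delay. Variable names are illustrative.

```python
uptime = 3000.0         # hours
cm_downtime = 100.0     # corrective maintenance downtime, hours
pm_downtime = 30.0      # preventive maintenance downtime, hours
failures = 10
downing_pms = 2
logistics_delay = 2.0   # hours per maintenance action (assumed)

# Inherent availability: corrective maintenance only.
mtbf = uptime / failures            # 300 hours
mttr = cm_downtime / failures       # 10 hours
a_inherent = mtbf / (mtbf + mttr)   # about 0.9677

# Achieved availability: corrective plus preventive maintenance.
actions = failures + downing_pms                  # 12 maintenance actions
mtbm = uptime / actions                           # 250 hours
m_active = (cm_downtime + pm_downtime) / actions  # about 10.83 hours
a_achieved = mtbm / (mtbm + m_active)             # about 0.9585

# Operational availability: active maintenance time plus delays.
mdt = m_active + logistics_delay      # about 12.83 hours
a_operational = mtbm / (mtbm + mdt)   # about 0.9512
```

Each added downtime element lowers the figure, so Ai ≥ Aa ≥ Ao under these assumptions.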