Learning objectives

Dr. Devendra Choudhary
Department of Mechanical Engineering
Govt. Engineering College Ajmer

When you complete Unit V you should be able to

• Define the terms reliability, maintainability and availability
• Understand the importance of reliability, maintainability and availability
• Understand the concepts of failure and its causes
• Differentiate among MTTF, MTBF and MTTR
• Define lifetime related functions such as density, failure, survival and hazard rate
• Calculate lifetime functions from time to failure data
• Analyze the failure data for exponential and Weibull distributions

Learning objectives (continued)

When you complete Unit V you should be able to

• Sketch the bathtub curve and describe its three phases
• Calculate system-wide reliability of series, parallel, mixed and m-out-of-n systems
• Explain various reliability improvement techniques
• Explain the concept of redundancy
• Differentiate between active and standby redundant systems
• Differentiate between low-level and high-level redundancy
• Calculate the maintainability and the availability

Reliability & Quality

• While quality is conformance to customer requirements only at a given point of time, reliability is conformance to customer requirements over a period of time.
• Reliability is quality over time.

Reliability Definition

Reliability is the probability that a product, equipment or process will perform its intended function, without failure, under specific conditions for a specific period of time.

Reliability Definition (continued)

• Probability: Reliability is a probability, a probability of performing without failure; thus, reliability is a number between zero and one.
• Failure: A failure is defined as any functioning of the device or component which is not considered within the prescribed limits of satisfactory functioning. For example, if the function of a pump is to deliver at least 500 liters of fluid per minute and it is now delivering 400 liters per minute, the pump has failed, by this definition.
• Function: The device whose reliability is in question must perform a specific function. For example, if a lawn-mower blade breaks while the mower is being used to trim hedges, this should not be charged as a failure, because trimming hedges is not the mower's intended function.


Reliability Definition (continued)

• Conditions: The device must perform its function under given conditions. For example, if electrical generators intended for use in ambient temperatures of 0-120 degrees Fahrenheit are operated below zero degrees Fahrenheit and fail, we should not charge the failures to these units.
• Time: The device must perform for a period of time. One should never cite a reliability figure without specifying the time in question. The exception to this rule is for one-shot devices such as ammunition, rockets and automobile air-bags.

Non-repairable system

• A non-repairable system (or component, unit, part, etc.) is discarded upon first failure.
• For example, bulbs, satellites, microchips and many electronic components.
• The lifetime T of a non-repairable system is a random variable and is described by a single time to failure (TTF).
• By noting values of TTF for a large number of identical and independent items, one can fit an appropriate distribution.
• For most items, the TTF assumes an exponential or Weibull distribution.
• For a given TTF distribution, one can determine the mean time to failure (MTTF).
• The average time to failure of the non-repairable system after entering into service is known as the mean time to failure (MTTF).

Repairable system

• A repairable system is a system which, upon failure, is restored to operation by any repair action other than replacing the entire system.
• For example, autos and appliances can be repaired.
• There are two random variables of interest in case of repairable systems.
  - The number of survivals or failures at a given point of time t
  - The operating time between successive failures (TBF)
• The mean operating time between failures (abbreviated as MTBF) is the expected length of time between successive failures of a repairable component or system.

Relationship among MTTF, MTBF and MTTR

• MTTF is reserved for a non-repairable component or system; MTBF is used as a reliability measure for the study of repairable systems.
• Sometimes, MTTF is also used for repairable systems, where it represents the mean time to first failure of the item.
• The term MTTR is mean time to repair. It is defined as the mean time elapsed in restoring a process or system to a stated operational condition after a failure through maintenance and repair.

Relationship among MTTF, MTBF and MTTR (continued)

[Figure: timeline of a repairable system starting at t = 0, marking the time of the 1st failure, the time of the 2nd failure and the down periods, each labelled MTTR.]

Product lifetime related functions

• These functions are widely used in measuring reliability.
• Let T be the random variable defining the lifetime of the product, which is the time the product will operate before failure.
• Sometimes T can take only discrete countable values, for example, a number of cycles.
• Most of the time, however, time to failure T will be a continuous random variable.


Failure density function

• Failure density function f(t) is the probability density function (PDF) of the TTF.
• It approximates the probability of failure in a small time interval Δt around an age of t. That is

$$f(t) = \lim_{\Delta t \to 0} \frac{P\left(t - \frac{\Delta t}{2} \le T \le t + \frac{\Delta t}{2}\right)}{\Delta t}$$

• For a product starting at age t = 0, the probability to fail up to an age t > 0 is given by

$$P(T \le t) = \int_0^t f(t)\,dt$$

Failure density function (continued)

• The PDF of TTF satisfies the following two conditions:

$$f(t) \ge 0 \ \text{for all } t, \qquad \int_0^\infty f(t)\,dt = 1$$

• The probability that the lifetime ends at an age between t1 and t2, t1 < t2, is

$$P(t_1 \le T \le t_2) = \int_{t_1}^{t_2} f(t)\,dt$$

Failure density function (continued)

• Assume that there are a large number of independent and identical items that started operating in a common environment at t = 0, N(0) = N. Failure times of items are recorded, and therefore the number of operating items N(t) at each instant of time t ≥ 0 is known. Then

$$f(t) = \frac{\text{Number of failures during a unit interval}}{\text{Total number of items}}$$

• If n_k is the number of failures during the kth Δt interval, we have

$$f(t_k) = \frac{n_k}{N\,\Delta t}$$

Failure (or unreliability) function

• The probability that a process, device or system will not perform its intended function for a given interval of time under specified operating conditions is called unreliability.
• The failure distribution function F(t), also known as unreliability, gives the probability of failing up to age t or of having a life span of at most length t, that is

$$F(t) = P(T \le t)$$

• The value of the failure function increases as the time increases.

Failure (or unreliability) function (continued)

• F(t) is identical to the cumulative distribution function (CDF) in probability theory and hence gives the probability that a measured value of TTF will fall between 0 and t.
• In probability theory, the CDF and PDF of a lifetime variable are related as follows:

$$f(t) = \lim_{\Delta t \to 0} \frac{F(t + \Delta t) - F(t)}{\Delta t} = \frac{dF(t)}{dt} \quad \text{or} \quad F(t) = \int_0^t f(t)\,dt$$

• Thus, the failure function is equal to the area beneath the probability density function between 0 and t.

Failure (or unreliability) function (continued)

• In case of testing a large number N of independent and identical items, if the number of operating items n(t) and the number of failed items m(t) at each instant of time t ≥ 0 are known, then F(t) is the fraction of items that fail by time t. That is

$$F(t) = \frac{m(t)}{N} = \frac{m(t)}{m(t) + n(t)}$$


Reliability (or Survival) function

• The reliability function is the probability of no failures in the interval between 0 and t or, equivalently, the probability of failure after time t, that is

$$R(t) = P(T > t), \qquad t \ge 0$$

• In case of testing a large number N of independent and identical items, if the number of operating items n(t) and the number of failed items m(t) at each instant of time t ≥ 0 are known, then R(t) is the fraction of items in a population that survive up to time t. That is

$$R(t) = \frac{n(t)}{N} = \frac{n(t)}{m(t) + n(t)}$$

Relationship between R(t) and F(t)

• The failure function and the reliability function are complementary functions, so

$$R(t) = 1 - F(t) = 1 - \int_0^t f(t)\,dt = \int_t^\infty f(t)\,dt$$

• R(t) is the probability that the lifetime exceeds t and F(t) is the probability that it does not.
• In other words, R(t) gives the probability of the item functioning at time t and F(t) is the probability of its being down at time t.

Hazard rate function

• It quantifies the risk of failure as the age of the system increases.
• The hazard rate function is the conditional probability of a failure, per unit time, in the time interval from t to t + Δt given that the system has survived to time t, that is

$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T \le t + \Delta t \mid T \ge t)}{\Delta t} = \lim_{\Delta t \to 0} \frac{F(t + \Delta t) - F(t)}{R(t)\,\Delta t} = \lim_{\Delta t \to 0} \frac{R(t) - R(t + \Delta t)}{R(t)\,\Delta t}$$

• Note that the hazard function h(t) is not a probability and hence can be greater than 1.

Hazard rate function (continued)

$$h(t) = \lim_{\Delta t \to 0} \frac{F(t + \Delta t) - F(t)}{R(t)\,\Delta t} = \frac{f(t)}{R(t)} = -\frac{dR(t)}{dt} \cdot \frac{1}{R(t)}$$

• The hazard rate function is the ratio of the probability density function to the reliability function.
• Integrating both sides of the above equation, we get:

$$\int_0^t h(t)\,dt = -\int_0^t \frac{dR(t)}{R(t)} = -\ln R(t) \quad \text{or} \quad R(t) = \exp\left(-\int_0^t h(t)\,dt\right) = e^{-\int_0^t h(t)\,dt}$$

Failure rate function

• The hazard function can be increasing, decreasing or constant.
• Whenever the hazard function is constant, we call it the failure rate λ.

$$\lambda = \frac{\text{Total number of failures in a population}}{\text{Cumulative operating time of the population}}$$

Failure rate function (continued)

• Assume that there are N independent and identical items that started operating in a common environment at t = 0. If, after a time t > 0, the number of operating items is N(t), then

$$\lambda = \frac{1}{t} \cdot \frac{N - N(t)}{N}$$

• For example, suppose 1000 controllers started operating in a common environment at t = 0. If 950 controllers are still operating after 10 hours, then λ = 50/(1000*10) = 0.005 failures per hour.
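
The controller example can be reproduced with a few lines of Python. This is only a sketch of the estimate λ = (1/t)(N − N(t))/N; the function name `constant_failure_rate` is an illustrative choice, not something defined in the slides.

```python
def constant_failure_rate(n_initial, n_surviving, elapsed_time):
    """Estimate lambda as (1/t) * (N - N(t)) / N for a constant hazard."""
    if n_initial <= 0 or elapsed_time <= 0:
        raise ValueError("population and elapsed time must be positive")
    failed = n_initial - n_surviving
    return failed / (n_initial * elapsed_time)


# 1000 controllers, 950 still operating after 10 hours
lam = constant_failure_rate(1000, 950, 10)
print(f"failure rate = {lam:.4f} failures per hour")
```

The estimate treats every item as if it had operated for the full 10 hours, which is the same simplification the slide makes.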


MTTF

• The mean time to failure (MTTF) is the mean lifespan or expected time to failure of a component or system.

$$MTTF = E(T) = \int_0^\infty t\,f(t)\,dt = \int_0^\infty t\left(-\frac{dR(t)}{dt}\right)dt = -\Big[t\,R(t)\Big]_0^\infty + \int_0^\infty R(t)\,dt$$

Since R(t) = 1 at t = 0 and R(t) → 0 as t → ∞ (fast enough that t R(t) → 0), the bracketed term vanishes, so

$$MTTF = \int_0^\infty R(t)\,dt$$

• Thus, the mean life or MTTF is equal to the area beneath the reliability function.

Expected number of failures

• If N(t) is the total number of failures by time t, then the expected number of failures E[N(t)] by time t can be obtained by integrating the hazard function from 0 to t. That is

$$E[N(t)] = \int_0^t h(t)\,dt$$

Example 1

• Consider that one hundred identical products are installed and the number of products that fail during each one-year interval is noted. The number of products that failed during year 1, year 2, year 3 and so on is given below.

Year                 1   2   3   4   5   6   7   8   9  10  11
Number of failures  22  16  12  10   8   7   5   4   4   3   9

• Compute failure density, failure function, hazard rate, reliability and MTTF.

Example 1 (continued)

• Let N be the total initial population and n_k be the number of failures during the kth unit interval. The failure density is obtained using f(t_k) = n_k/(N Δt):

$$f(t_1) = \frac{n_1}{N\,\Delta t} = \frac{22}{100} = 0.22, \qquad f(t_2) = \frac{n_2}{N\,\Delta t} = \frac{16}{100} = 0.16$$

• Since the failure function is the cumulative distribution function of time to failure, we have (with Δt = 1 year)

$$F(t_k) = \sum_{i=1}^{k} f(t_i) = f(t_1) + f(t_2) + \dots + f(t_k)$$

• For example,

$$F(t_1) = 0.22, \qquad F(t_2) = 0.22 + 0.16 = 0.38$$

Example 1 (continued)

• As the failure function and the reliability function are complementary functions,

R(t_k) = 1 - F(t_k)

• For example,

R(t_1) = 1 - F(t_1) = 1 - 0.22 = 0.78 and
R(t_2) = 1 - F(t_2) = 1 - 0.38 = 0.62

Example 1 (continued)

• Hazard rate is the ratio of the number of failures during the kth unit interval to the average population in that particular interval, that is

$$h(t_k) = \frac{n_k}{\left[N\left(t_k - \frac{\Delta t}{2}\right) + N\left(t_k + \frac{\Delta t}{2}\right)\right] / 2}$$

• For example,

$$h(t_1) = \frac{22}{(100 + 78)/2} = 0.2472, \qquad h(t_2) = \frac{16}{(78 + 62)/2} = 0.2286$$


Example 1 (continued)

Interval  Number of  Cumulative  Number of  Failure  Failure   Hazard  Reliability
          failures   failures    survivors  density  function  rate    function
1         22         22          78         0.22     0.22      0.2472  0.78
2         16         38          62         0.16     0.38      0.2286  0.62
3         12         50          50         0.12     0.50      0.2143  0.50
4         10         60          40         0.10     0.60      0.2222  0.40
5         8          68          32         0.08     0.68      0.2222  0.32
6         7          75          25         0.07     0.75      0.2456  0.25
7         5          80          20         0.05     0.80      0.2222  0.20
8         4          84          16         0.04     0.84      0.2222  0.16
9         4          88          12         0.04     0.88      0.2857  0.12
10        3          91          9          0.03     0.91      0.2857  0.09
11        9          100         0          0.09     1.00      2.0000  0.00

[Figure: density function and hazard function of Example 1 plotted against the interval number (1 to 11).]
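
The lifetime table of Example 1 can be regenerated from the yearly failure counts alone. The sketch below assumes unit intervals (Δt = 1 year) and uses the average population of each interval as the hazard rate denominator, as in the worked figures for h(t1) and h(t2); the function and key names are illustrative.

```python
def lifetime_table(failures, dt=1.0):
    """Build density, failure function, hazard rate and reliability per interval."""
    n_total = sum(failures)
    rows = []
    cumulative = 0
    survivors = n_total
    for k, n_k in enumerate(failures, start=1):
        at_start = survivors              # operating items at interval start
        cumulative += n_k
        survivors = n_total - cumulative  # operating items at interval end
        average_population = (at_start + survivors) / 2
        rows.append({
            "interval": k,
            "failures": n_k,
            "cumulative": cumulative,
            "survivors": survivors,
            "density": n_k / (n_total * dt),
            "failure_function": cumulative / n_total,
            "hazard": n_k / (average_population * dt),
            "reliability": survivors / n_total,
        })
    return rows


failures = [22, 16, 12, 10, 8, 7, 5, 4, 4, 3, 9]
table = lifetime_table(failures)
for row in table:
    print(f"{row['interval']:>2} {row['density']:.2f} {row['failure_function']:.2f} "
          f"{row['hazard']:.4f} {row['reliability']:.2f}")
```

The hazard rate of 2.0 in the last interval is a reminder that h(t) is a rate, not a probability.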

Example 1 (continued)

[Figure: failure function F(t) and reliability function R(t) of Example 1 plotted against time (intervals 1 to 11).]

Example 1 (continued)

• Let n_k (k = 1, 2, ..., l) be the number of failures during the kth Δt interval; then the mean time to failure will be

$$MTTF = \frac{1}{N}\sum_{k=1}^{l} k\,n_k = \frac{n_1 + 2n_2 + 3n_3 + \dots + l\,n_l}{N}$$

• In the given problem, we have

MTTF = (1*22 + 2*16 + 3*12 + 4*10 + 5*8 + 6*7 + 7*5 + 8*4 + 9*4 + 10*3 + 11*9) / 100
MTTF = 444 / 100 = 4.44 years
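
As a quick check of the arithmetic, the weighted sum can be evaluated directly. This sketch assumes, as the slide does, that every failure in year k is counted as a life of k years.

```python
failures = [22, 16, 12, 10, 8, 7, 5, 4, 4, 3, 9]

# Sum of k * n_k over all intervals, divided by the initial population N
weighted_sum = sum(k * n_k for k, n_k in enumerate(failures, start=1))
mttf_years = weighted_sum / sum(failures)

print(weighted_sum, mttf_years)
```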

Example 2

• The failure density function for a class of components is given by

$$f(t) = 0.25 - \left(\frac{0.25}{8}\right)t$$

where t is in years. Find failure distribution, reliability and hazard rate functions. Sketch the four functions and also find MTTF.

Example 2 (continued)

• Given,

$$f(t) = 0.25 - \left(\frac{0.25}{8}\right)t$$

• Failure distribution function is given by

$$F(t) = \int_0^t f(t)\,dt = \int_0^t \left[0.25 - \left(\frac{0.25}{8}\right)t\right]dt = 0.25\,t - \left(\frac{0.25}{16}\right)t^2$$

• We know,

$$R(t) = 1 - F(t) = 1 - 0.25\,t + \left(\frac{0.25}{16}\right)t^2$$

• Since the hazard rate function is the ratio of the probability density function to the reliability function, that is

$$h(t) = \frac{f(t)}{R(t)} = \frac{0.25 - \left(\frac{0.25}{8}\right)t}{1 - 0.25\,t + \left(\frac{0.25}{16}\right)t^2} = \frac{2 - 0.25\,t}{8 - 2t + 0.125\,t^2}$$


Example 2 (continued)

[Figure: four panels sketching the lifetime functions of Example 2 against t (0 to 10 years).]

Example 2 (continued)

• Integrate the reliability function from 0 to 8 to get MTTF, as it is equal to the area under the reliability function, that is

$$MTTF = \int_0^8 \left[1 - 0.25\,t + \left(\frac{0.25}{16}\right)t^2\right]dt = \left[t - \frac{0.25}{2}\,t^2 + \frac{0.25}{48}\,t^3\right]_0^8 = 2.667 \text{ years}$$
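
The closed-form results of Example 2 can be cross-checked numerically. The sketch below assumes the density applies on 0 ≤ t ≤ 8 years (the range over which it is non-negative and integrates to one) and uses a hand-written trapezoidal rule, so no external library is needed; all function names are illustrative.

```python
def density(t):
    return 0.25 - (0.25 / 8) * t


def failure(t):
    return 0.25 * t - (0.25 / 16) * t ** 2


def reliability(t):
    return 1.0 - failure(t)


def hazard(t):
    # Valid for 0 <= t < 8; R(8) = 0 makes the ratio undefined at t = 8
    return density(t) / reliability(t)


def trapezoid(func, lower, upper, steps=10_000):
    """Composite trapezoidal rule for the integral of func on [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for i in range(1, steps):
        total += func(lower + i * width)
    return total * width


mttf = trapezoid(reliability, 0.0, 8.0)
print(f"F(8) = {failure(8.0):.3f}, h(4) = {hazard(4.0):.3f}, MTTF = {mttf:.3f} years")
```

The numerical area under R(t) agrees with the analytical value 8/3 ≈ 2.667 years.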

Example 3

• The hazard rate function for a class of components is given by

$$h(t) = 3t^2 - 2t$$

where t is in hours. Find failure density and reliability functions. Also find reliability at t = 2.

Example 3 (continued)

• Given,

$$h(t) = 3t^2 - 2t$$

• The following equation is used to derive the reliability function from a known hazard rate function.

$$R(t) = \exp\left(-\int_0^t h(t)\,dt\right) = e^{-\int_0^t h(t)\,dt}$$

$$R(t) = \exp\left(-\int_0^t \left(3t^2 - 2t\right)dt\right) = \exp\left(-t^3 + t^2\right) = e^{t^2(1 - t)}$$

• At t = 2, R(2) = e^{-4} = 0.0183
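
The route from hazard rate to reliability can also be followed numerically. This sketch assumes the hazard rate of Example 3 is h(t) = 3t² − 2t, the sign that reproduces R(2) = 0.0183, and integrates it with a simple trapezoidal rule; the function names are illustrative.

```python
import math


def hazard(t):
    return 3 * t ** 2 - 2 * t


def reliability_from_hazard(t, steps=10_000):
    """R(t) = exp(-integral of h from 0 to t), integral by the trapezoidal rule."""
    width = t / steps
    area = 0.5 * (hazard(0.0) + hazard(t))
    for i in range(1, steps):
        area += hazard(i * width)
    return math.exp(-area * width)


r_numeric = reliability_from_hazard(2.0)
r_closed_form = math.exp(2.0 ** 2 * (1 - 2.0))  # e^{t^2 (1 - t)} at t = 2
print(f"numeric = {r_numeric:.4f}, closed form = {r_closed_form:.4f}")
```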

Example 3 (continued)

• The following equation is used to derive the failure density function from a known reliability function or hazard rate function.

$$f(t) = h(t)\,R(t) = h(t)\exp\left(-\int_0^t h(t)\,dt\right) = \left(3t^2 - 2t\right)e^{t^2(1 - t)}$$

Lifetime functions under exponential failure distribution

• The failure density function under exponential distribution is given by:

$$f(t) = \lambda e^{-\lambda t}$$

• The remaining lifetime functions under exponential distribution are given by:

$$F(t) = 1 - e^{-\lambda t}, \qquad R(t) = e^{-\lambda t}$$

$$h(t) = \lambda, \qquad MTTF = \frac{1}{\lambda}, \qquad E[N(t)] = \lambda t$$
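
The exponential lifetime functions are simple enough to collect in one small class. This is a sketch only: the class name `ExponentialLifetime` is hypothetical, and the rate of 0.005 failures per hour is borrowed from the controller example purely for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ExponentialLifetime:
    rate: float  # constant failure rate, lambda

    def density(self, t):
        return self.rate * math.exp(-self.rate * t)

    def failure(self, t):
        return 1.0 - math.exp(-self.rate * t)

    def reliability(self, t):
        return math.exp(-self.rate * t)

    def hazard(self, t):
        return self.rate  # constant for every age t

    def mttf(self):
        return 1.0 / self.rate

    def expected_failures(self, t):
        return self.rate * t


model = ExponentialLifetime(rate=0.005)
print(f"R(100) = {model.reliability(100):.4f}, MTTF = {model.mttf():.0f} hours")
```

Because the hazard rate does not depend on t, an item that has survived to any age is as likely to fail in the next hour as a new one.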


System reliability models

• A system consists of a complex configuration of multiple components.
• System reliability models are usually studied to determine
  - The reliability of a system for a given configuration of several components, or
  - The number of components and their structure to achieve a target reliability.

Series system

• In a series reliability system, all the components must be working for the system to function, as these are connected in series.

[Figure: block diagram of components 1, 2, ..., n connected in series.]

• The word series does not imply the physical arrangement of the components; rather it describes the response of the system to the failure of one of its components.

Series system (continued)

• If there are n components (i = 1, 2, ..., n) in series in a system and R_i(t) are the corresponding reliabilities, then the reliability of the system R_s(t) is given by:

$$R_s(t) = \prod_{i=1}^{n} R_i(t) = R_1(t) \cdot R_2(t) \cdots R_n(t)$$

• The reliability of a series system is the product of the component reliabilities.

Series system (continued)

• If the components have exponential failure distribution, then system reliability reduces to:

$$R_s(t) = e^{-\lambda_1 t} \cdot e^{-\lambda_2 t} \cdots e^{-\lambda_n t} = e^{-\left(\sum_{i=1}^{n} \lambda_i\right)t}$$

• If λ_s is the failure rate of a series system, you can note from the above equation that it is the sum of the component failure rates. That is

$$\lambda_s = \sum_{i=1}^{n} \lambda_i \quad \text{and} \quad MTTF_s = \frac{1}{\sum_{i=1}^{n} \lambda_i}$$

Series system (continued)

• In a series system, the system reliability is a function of
  - Individual component reliabilities and
  - The number of components in series.

[Figure: series system reliability plotted against the number of components n (1 to 21), with curves for component reliability R(t) = 0.95, 0.90 and 0.80.]

Parallel system

• In a parallel reliability system, at least one of the components must be working for the system to function, as these are connected in parallel.


Parallel system (continued)

• If there are n components (i = 1, 2, ..., n) in parallel in a system and R_i(t) are the corresponding reliabilities, then the reliability of the parallel system R_s(t) is determined by first calculating the probability that the system will fail. That is

$$R_s(t) = 1 - F_s(t) = 1 - \prod_{i=1}^{n} F_i(t) = 1 - \prod_{i=1}^{n} \left[1 - R_i(t)\right]$$

$$R_s(t) = 1 - \left[1 - R_1(t)\right] \cdot \left[1 - R_2(t)\right] \cdots \left[1 - R_n(t)\right]$$

Parallel system (continued)

• If similar components (i.e., λ1 = λ2 = ... = λn = λ), having exponential time to failure distribution, are connected in parallel, then system reliability and mean time to failure are given by:

$$R_s(t) = 1 - \left(1 - e^{-\lambda t}\right)^n \quad \text{and} \quad MTTF_s = \frac{1}{\lambda}\sum_{i=1}^{n} \frac{1}{i}$$

Parallel system (continued)

• The reliability of a parallel system increases with increase in the number of components, or increase in individual component reliabilities, or both.

[Figure: parallel system reliability plotted against the number of components (1 to 5), vertical axis from 0.80 to 1.00.]

Example 4

• Three components X, Y and Z have reliabilities of 0.92, 0.95 and 0.96 respectively. Compare the reliability of a series system with a parallel system made up of these components.
• Solution
• Given: R_X(t) = 0.92, R_Y(t) = 0.95 and R_Z(t) = 0.96
• The reliability of a series system is the product of the component reliabilities, that is

$$R_s(t)_{Series} = R_X(t) \cdot R_Y(t) \cdot R_Z(t) = 0.92 \times 0.95 \times 0.96 = 0.839$$

Example 4 (continued)

• The reliability of the parallel system is given by

$$R_s(t)_{Parallel} = 1 - \left[1 - R_X(t)\right] \cdot \left[1 - R_Y(t)\right] \cdot \left[1 - R_Z(t)\right]$$

$$R_s(t)_{Parallel} = 1 - (1 - 0.92)(1 - 0.95)(1 - 0.96) = 0.9998$$

• We observe that the system-wide reliability of the parallel system is higher than that of the series system. Therefore, a higher reliability can be achieved by connecting components in parallel.

Example 5

• A series system is composed of four components with failure rates of 0.002, 0.001, 0.0025 and 0.0005. What is the 100 hours system reliability? Also, compute MTTF.
• Solution
• The series system reliability is given by

$$R_s(t) = e^{-\lambda_1 t} \cdot e^{-\lambda_2 t} \cdots e^{-\lambda_n t} = e^{-\left(\sum_{i=1}^{n} \lambda_i\right)t}$$


Example 5 (continued)

• For the given problem, we have

$$R_s(t) = e^{-(0.002 + 0.001 + 0.0025 + 0.0005)t} = e^{-0.006\,t}$$

• At t = 100 hours, we have

$$R_s(100) = e^{-0.006 \times 100} = 0.5488$$

$$MTTF_s = \frac{1}{\sum_{i=1}^{n} \lambda_i} = \frac{1}{0.006} = 166.67 \text{ hours}$$

Example 6

• The reliability of a communication channel is 0.60. How many identical channels should be placed in parallel so as to achieve a communication system reliability of 0.93?
• Solution
• The reliability of a parallel system is given by

$$R_s(t) = 1 - \prod_{i=1}^{n} \left[1 - R_i(t)\right]$$

Example 6 (continued)

• We rearrange the above equation as follows:

$$\prod_{i=1}^{n} \left[1 - R_i(t)\right] = 1 - R_s(t)$$

• As the components are identical, we can write

$$\left[1 - R_i(t)\right]^n = 1 - R_s(t) \quad \text{or} \quad n = \frac{\ln\left[1 - R_s(t)\right]}{\ln\left[1 - R_i(t)\right]}$$

• For the given R_s(t) = 0.93 and R_i(t) = 0.60

$$n = \frac{\ln(1 - 0.93)}{\ln(1 - 0.60)} = 2.90 \approx 3$$

• Therefore, three channels should be placed in parallel to achieve the desired reliability.

Reliability improvement techniques

• There are several ways to improve system reliability:
  - Product design
  - Redundancy
  - Maintenance
• The performance of maintenance is measured in terms of maintainability and availability.
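
Returning to Examples 4 to 6, the series and parallel formulas can be checked with a short script. This is a sketch under the slides' own assumptions (independent components, exponential lifetimes in Example 5, identical channels in Example 6); the function names are illustrative, and `math.prod` requires Python 3.8 or later.

```python
import math


def series_reliability(reliabilities):
    """Product of the component reliabilities."""
    return math.prod(reliabilities)


def parallel_reliability(reliabilities):
    """One minus the product of the component unreliabilities."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)


def series_exponential(failure_rates, t):
    """Reliability at time t and MTTF of a series system of exponential parts."""
    total_rate = sum(failure_rates)
    return math.exp(-total_rate * t), 1.0 / total_rate


def channels_needed(channel_reliability, target_reliability):
    """Smallest number of identical parallel channels meeting the target."""
    if not (0.0 < channel_reliability < 1.0 and 0.0 < target_reliability < 1.0):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    n = math.log(1.0 - target_reliability) / math.log(1.0 - channel_reliability)
    return math.ceil(n)


# Example 4: components X, Y and Z
r_series = series_reliability([0.92, 0.95, 0.96])
r_parallel = parallel_reliability([0.92, 0.95, 0.96])

# Example 5: four components in series, evaluated at 100 hours
r_100, mttf_s = series_exponential([0.002, 0.001, 0.0025, 0.0005], 100)

# Example 6: channels of reliability 0.60, target 0.93
n_channels = channels_needed(0.60, 0.93)

print(f"{r_series:.3f} {r_parallel:.4f} {r_100:.4f} {mttf_s:.2f} {n_channels}")
```

Rounding n up rather than to the nearest integer matters in general: any fractional result means the smaller whole number of channels falls short of the target.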

Design for reliability

• Design for reliability is a process which is performed during the design of the product so as to ensure that the product is able to perform to a required level of reliability.
• The important product characteristics like failure rate, failure mode, failure mechanism, availability, life of the product and maintenance ease are used in developing the product design.

Design for reliability (continued)

• The following activities are performed to ensure design for reliability:
  - Specify product reliability targets before any design work is undertaken.
  - Include reliability requirements as per likely service conditions in the problem definition.
  - Designs are to be assessed based on ease of inspection, ease of maintenance and cost of maintenance.
  - Select reliable parts and components in new product design based on estimating failure rate, failure mode, MTTF and others.
  - Check for quality assurance during production.
  - Take feedback related to service failures, MTTF and MTTR, which is to be used to improve the design.


Redundancy

• In reliability engineering, redundancy refers to the use of more than one component or system for the same function.
• The increase in the reliability value depends on:
  - reliability values of the individual components,
  - number of components and
  - type of configuration in which these components are connected with one another.

Active (or dynamic) redundancy

• The parallel reliability systems are often called active redundant systems as all the components are functioning simultaneously.

[Figure: block diagrams of redundant arrangements of components 1, 2, ..., n, with labels A and Divider.]

• The component level redundancy is known as low-level redundancy, whereas the system level redundancy is known as high-level redundancy.

Standby (or passive) redundancy

• A standby unit cuts in and takes over when the current operating unit fails.
• The unit should be provided with sensors and switching mechanisms to sense the failure and to place the standby unit in service.

[Figure: standby arrangement with unit A and switch SW.]

Maintenance

• Maintenance includes activities such as cleaning, lubrication, topping up, adjustment and calibration, condition assessment, repairs and replacement.
• Maintenance makes it possible to detect and prevent failures before they occur and hence increases system reliability and availability.
• The effectiveness of maintenance is mainly assessed by maintainability and availability.

Maintainability

• Maintainability M(t) is the probability that a system that has failed can be retained in or restored to a specified operable condition within a specified interval of time, when maintenance is performed in accordance with prescribed procedures.
• Maintainability is a characteristic of design, installation, and operation of system and equipment.
• It is quantified by mean time to repair (MTTR).

Maintainability (continued)

• Assume that an equipment has m failures during a certain period of time selected for our analysis.
• Obviously, the time to repair (TTR) these failures would be a random variable. Let g(t) be the probability density function of TTR, then

$$MTTR = \frac{\sum_{i=1}^{m} TTR_i}{m} = \int_0^\infty t\,g(t)\,dt$$

$$M(t) = P(TTR \le t) = \int_0^t g(t)\,dt$$


Maintainability (continued)

• For an exponential time to repair distribution, we have

$$M(t) = 1 - e^{-t/MTTR} = 1 - e^{-\mu t}$$

where μ = 1/MTTR is called the repair rate.

• Suppose a system failed 50 times during its lifetime. Assume that the maintenance hours used to repair these failures total 250 hours. Then, MTTR = 250/50 = 5 hours or μ = 0.2 repairs per hour.

Maintainability (continued)

• Maintainability for repair times of 1 hour, 2 hours and 10 hours is computed below:

$$M(1) = 1 - e^{-0.2 \times 1} = 0.1813, \qquad M(2) = 1 - e^{-0.2 \times 2} = 0.3297, \qquad M(10) = 1 - e^{-0.2 \times 10} = 0.8647$$

• The interpretation of these results is as follows:
  - A failure has only an 18% chance of being repaired in 1 hour; however, it has an 86% chance of being repaired in 10 hours.
  - In other words, 18 failures out of 100 will be repaired in 1 hour, but 86 can be repaired in 10 hours.
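
The maintainability figures can be reproduced as follows. This is a minimal sketch assuming an exponential time to repair distribution, with MTTR taken from the 50 failures and 250 maintenance hours of the example; the function name is illustrative.

```python
import math


def maintainability(t, mttr):
    """M(t) = 1 - exp(-t / MTTR) for exponentially distributed repair times."""
    if mttr <= 0:
        raise ValueError("MTTR must be positive")
    return 1.0 - math.exp(-t / mttr)


mttr_hours = 250 / 50            # 5 hours
repair_rate = 1.0 / mttr_hours   # 0.2 repairs per hour

results = {hours: maintainability(hours, mttr_hours) for hours in (1, 2, 10)}
for hours, value in results.items():
    print(f"M({hours}) = {value:.2%}")
```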

Availability

• An equipment (process or system) may be either in a working state or in a non-working state during its specified life.
• The running or working time of a system is called up time.
• The time period during which a system is not able to deliver requested services is called down time.

Availability (continued)

• The system may be down due to the following reasons:
  - Equipment failures
  - Tooling damage
  - Unplanned maintenance
  - Process warm up
  - Machine changeovers
  - Material shortage
• The time elapsed in setup, planned maintenance and any scheduled shut down is not part of the down time.
• Operational availability Ao(t) is the proportion of the up time to the sum of up and down times. That is

$$A_o(t) = \frac{\text{Up time}}{\text{Up time} + \text{Down time}}$$

• Note that up time is MTBF (or MTTF) and down time is MTTR.

$$A_o(t) = \frac{MTBF}{MTBF + MTTR}$$

• If the TTR and TTF distributions are exponential, then λ = 1/MTTF and μ = 1/MTTR and we get

$$A_o(t) = \frac{\mu}{\lambda + \mu}$$

• Operational availability is also called steady-state availability or inherent availability.
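
The two forms of the availability ratio can be compared numerically. In this sketch the MTTR of 5 hours comes from the maintainability example, while the MTBF of 200 hours is a hypothetical value chosen only for illustration; the function names are also illustrative.

```python
def availability_from_times(mtbf, mttr):
    """Up time divided by the sum of up time and down time."""
    return mtbf / (mtbf + mttr)


def availability_from_rates(failure_rate, repair_rate):
    """Equivalent form mu / (lambda + mu) for exponential TTF and TTR."""
    return repair_rate / (failure_rate + repair_rate)


mtbf_hours = 200.0   # hypothetical value, not taken from the slides
mttr_hours = 5.0     # from the maintainability example

a_times = availability_from_times(mtbf_hours, mttr_hours)
a_rates = availability_from_rates(1.0 / mtbf_hours, 1.0 / mttr_hours)
print(f"{a_times:.4f} {a_rates:.4f}")
```

Both forms give the same number, since substituting λ = 1/MTBF and μ = 1/MTTR into the second expression recovers the first.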


