Chapter 1-1




Course Outline
• Reliability:
• failure probability and density functions
• component reliability
• measures of reliability
• reliability in the system life cycle
• reliability analysis methods
• design review and evaluation of reliability
• Maintainability:
• Maintenance management organization and scheduling
• measures of maintainability
• maintainability in the system life cycle
• maintainability analysis methods
• Design review and evaluation of maintainability

• Core Reading Material

1. Patrick O. 2002, Practical Reliability Engineering, Wiley, New Delhi, India
2. Charles E.E 2009, An Introduction to Reliability and Maintainability Engineering, Waveland, New London
3. Nicholas S. 2004, Basic Reliability: An Introduction to Reliability Engineering, Author House, New York

Recommended Reference Material

1. Igor B. 2004, Reliability Theory and Practice, Dover Publications, Mineola, New York
2. Ricky Smith, R. Keith 2007, Rules of Thumb for Maintenance and Reliability Engineers, Butterworth-Heinemann, London
3. Ushakov ed. Igor 1994, Handbook of Reliability Engineering, Wiley, New Delhi, India

Definition of Reliability:
“The ability of an item to perform a required function under stated
conditions for a stated period of time”
• “quality over time”

• Monitored according to BS4778 standard

• Has both quantitative and qualitative aspects;

• Measurements of reliability are necessary for customer requirements compliance.
• “Measuring reliability does not make a product reliable, only by
designing in reliability can a product achieve its reliability targets”
The Reliability definition has four important elements:
• Probability
(A value between 0 and 1: the number of times that an event occurs (success) divided by the total number of trials)
e.g. a probability of 0.91 means that 91 of 100 items will still be working at the stated time under the stated conditions
• Performance
(Some criteria to define when and how product fails, which also describes
what is considered to be satisfactory system operation)
e.g. amount of beam collisions, etc
• Time
(system working until time (t), used to predict probability of an item
surviving without failure for a designated period of time)
• Operating conditions
These describe the operating conditions (environmental factors, humidity,
vibration, shock, temperature cycle, operational profile, etc.) that correspond to
the stated product life.

Introduction to Reliability Engineering 5 e-Learning course.

Conflicts with real world.

There are “Real World” conflicts with this definition that we need to
keep in mind…
• Probability – Customers expect a probability of 1, “It Works”
• Intended Function – The product may be used in unintended ways and still be
expected to work
• Under Stated Conditions – The product may be operated outside of the stated
conditions and still be expected to work
• Prescribed Procedures – Customers may not have the required tools or skill level
and may not follow procedures and still expect the product to work

Customers are looking for Quality over Time


• Reliability, Availability, Maintainability, Safety and Quality are what the Customer says they are, not what the Engineers or the Designers say they are.
• Only companies that control the Reliability of their products can survive in business in the future, as today's consumer is more “intelligent” and product aware.
• Liability for unreliable products can be very high.
• Complexity of products is ever increasing, and thus the challenge to Reliability Engineering is also increasing.
• Products are being advertised by their Reliability Ratings.

“PRIDE = Put Reliability In Daily Efforts”


Objectives of Reliability Engineering
• To apply engineering knowledge to prevent or reduce the
likelihood or frequency of failures;
• To identify and correct the causes of failure that do occur;
• To determine ways of coping with failures that do occur;
• To apply methods of estimating the likely reliability of new
designs, and for analysing reliability data.

When Should Reliability Be
“From the cradle to the grave.”
i.e. The entire life cycle of the product.

“any event or collection of events that causes the system to lose
its functionability”

• The inherent characteristic of a product related to its ability to
perform a specified function according to the specified
requirements under the specified operating conditions
• Transition from reliability to failure can be instantaneous (tyre
burst, transformer explosion, transistor blowing)
• Can also be gradual (cracks in insulation, bearing wears, cable
• Health monitoring can prevent failure

Definition of Concepts
• Failure - A failure is an event when an item is not available
to perform its function at specified conditions when
scheduled or is not capable of performing functions to
• Failure Rate - The number of failures per unit of gross operating period, measured in time, events or cycles.
• MTBF - Mean Time Between Failures - The average time between failure occurrences: the total operating time of all items divided by the total number of failures. For repairable items.
• MTTF - Mean Time To Failure - The average time to failure occurrence: the total operating time of all items divided by the total number of failures. For repairable and non-repairable items.
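
As a minimal sketch of the two ratios defined above, the following Python snippet computes a failure rate and an MTBF from the total operating time and the failure count. The function names and the fleet figures (10 items, 500 hours each, 4 failures) are hypothetical and chosen only for illustration.

```python
def failure_rate(total_failures, total_operating_hours):
    """Failures per hour of accumulated operating time."""
    if total_operating_hours <= 0:
        raise ValueError("total operating time must be positive")
    return total_failures / total_operating_hours


def mtbf(total_failures, total_operating_hours):
    """Accumulated operating time divided by the number of failures."""
    if total_failures <= 0:
        raise ValueError("at least one failure is needed to estimate MTBF")
    return total_operating_hours / total_failures


# Hypothetical fleet: 10 items, each operated for 500 hours, 4 failures in total.
items, hours_each, failures = 10, 500.0, 4
total_hours = items * hours_each               # 5000 item-hours

rate = failure_rate(failures, total_hours)     # failures per hour
mean_time = mtbf(failures, total_hours)        # hours between failures
print(f"failure rate = {rate:.4f} per hour, MTBF = {mean_time:.0f} hours")
```

With these assumed figures the two quantities are reciprocals of each other, which holds whenever both are computed from the same totals.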
• Hazard - The potential to cause harm. Harm includes ill health and injury, damage to property, plant, products or the environment, production losses or increased liabilities.
• Risk - The likelihood that a specified undesired event will occur due to the realisation of a hazard by, or during, work activities or by the products and services created by work activities.

Quality, Reliability and Safety
• Reliability can be considered as “Quality over time”. Customers frequently use the terms “quality” and “reliability”. We need to understand what they expect.

• Measurement of reliability is related to failure rates, number of failures, warranty cost, etc. Thus, reliability is experienced by the customers when they use the product.

• Quality Level is measured in terms of defect levels (such as ppm) when the product is received as new.

• Quality and reliability both can have significant impact on


• Quality defects and failures both can adversely affect the safety of the user, bystanders and equipment.

• Some quality defects can lead to unreliable and/or

• Some examples of how unreliability can affect safety:

• Failure of an automobile steering system, brake system, axles, etc. can result in serious accidents.
• A short circuit in electrical equipment can result in a shock or death.
• Failure of the safety valve in a pressure cooker, or leakage of the regulator of an LPG cylinder, can result in an explosion.
• Poor reliability of a bridge can result in an accident and

• However, not all failures are safety issues, and not all safety issues are due to failures.
• As Reliability Engineering is concerned with analyzing failures and providing feedback to design and production to prevent future failures, it is only natural that a rigorous classification of failure types must be agreed upon.

How Do Products Really Fail


The basic reliability functions include
• Reliability function,
• Cumulative failure distribution function,
• Failure density function, and
• Hazard rate function.
In this section, we also obtain expressions for the mean and median of the random variable called the time to failure or failure time.

i. Reliability Function
• If reliability is a probability, then a random variable has to be associated with it, because probability is always associated with a random variable.
• The random variable in the case of reliability is generally time: the time at which a component/system fails.
• Let this random variable, called time to failure or failure time of the component/system, be denoted by T. Then, by definition, the reliability R(t) of the component/system at time t is given by

$R(t) = P(T > t), \quad t \ge 0$

• The probability of survival of a component decreases as the lifetime of the component increases. Ultimately this probability will approach zero, since no component can perform its intended function forever. A typical shape of the reliability function is shown in Fig.
• From the figure, you can see that the reliability function is a non-increasing function of time t.
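
To make the definition concrete, here is a small Python sketch that estimates R(t) as the fraction of a sample still working after time t. The ten lifetimes are hypothetical, and the helper name `empirical_reliability` is introduced only for this illustration.

```python
def empirical_reliability(lifetimes, t):
    """Fraction of items whose time to failure exceeds t (estimate of P(T > t))."""
    return sum(1 for x in lifetimes if x > t) / len(lifetimes)


# Hypothetical times to failure, in hours, for ten identical components.
lifetimes_h = [120, 340, 410, 560, 610, 700, 820, 900, 1100, 1500]
times = [0, 250, 500, 750, 1000, 2000]

curve = [empirical_reliability(lifetimes_h, t) for t in times]
for t, r in zip(times, curve):
    print(f"R({t}) ~ {r:.1f}")
```

The estimated values start at 1, never increase, and reach 0 once every item in the sample has failed.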

ii. Cumulative Failure Distribution Function

• If we want to calculate the probability of failure of a component by time t (known as the unreliability of the component), then we simply have to obtain the value of the function $F(t) = P(T \le t)$. Therefore, if we denote the reliability and unreliability of a component by R and Q, respectively, then they satisfy the relation:

$R(t) + Q(t) = 1, \quad Q(t) = F(t)$

• Note that the probability of failure of a component increases after its useful life period is over. Ultimately this probability will approach 1. A typical shape of the cumulative failure distribution is shown in Fig. 13.2.

iii. Failure Density Function
• From probability theory, you learnt that the derivative of the cumulative distribution function of a random variable is known as the probability density function (pdf) of the random variable.
• In the terminology of reliability, however, the probability density function is known as the failure density function (fdf).
• As explained above, the cumulative distribution function in reliability terminology is the cumulative failure distribution function. So if f(t) denotes the failure density function (lifetime density function), then

$f(t) = \dfrac{dF(t)}{dt}$

• Recall that the following relationship exists between f(t) and F(t):

$F(t) = \int_0^t f(u)\,du$

• Further manipulations give us:

$R(t) = 1 - F(t) = \int_t^\infty f(u)\,du, \qquad f(t) = -\dfrac{dR(t)}{dt}$

iv. Hazard Rate
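• The instantaneous rate of failure at time t is known as the hazard rate and is generally denoted by λ(t) (written h(t) in the examples that follow).
• It is the conditional probability of failure, per unit time, in the interval (t, t + dt) given that no failure has occurred by time t:

$\lambda(t) = \lim_{dt \to 0} \dfrac{P(t < T \le t + dt \mid T > t)}{dt} = \dfrac{f(t)}{R(t)}$

v. Relationship between the Functions R(t), F(t), f(t) and λ(t)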
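
The relationship table from the original slide is not reproduced in this text. As a hedged summary, the standard identities that follow from the definitions of the four functions, writing λ(t) for the hazard rate, are:

```latex
\begin{align*}
F(t) &= 1 - R(t) = \int_0^t f(u)\,du \\
f(t) &= \frac{dF(t)}{dt} = -\frac{dR(t)}{dt} \\
\lambda(t) &= \frac{f(t)}{R(t)} = -\frac{d}{dt}\ln R(t) \\
R(t) &= \exp\!\left(-\int_0^t \lambda(u)\,du\right)
\end{align*}
```

Knowing any one of the four functions is therefore enough to recover the other three.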

vi. Mean Time to Failure and Median of the Random Variable T
• In reliability terminology, the mean of the random variable T in the absence of repair and replacement is known as the mean time to failure (MTTF).
• Usually the users of a product/component are interested in knowing the average life of the product/component rather than the complete failure details.
• MTTF is a measure that gives a single number (in the units of time in which the life of the product/component is measured) telling us how long, on average, the product/component performs its intended function successfully.
• It is calculated by taking the mean of the lifetimes obtained from a sample of identical products/components tested under stated conditions.
• For example, suppose we put 10 identical components on test under the stated conditions. Let $t_i$ (i = 1, 2, ..., 10) denote the time for which the ith component performs its intended function successfully. The results of the sample are shown in Fig. 13.4. Thus, the mean time to failure (MTTF) of such components on the basis of the results of this sample is given by

$\text{MTTF} = \dfrac{1}{10}\sum_{i=1}^{10} t_i$

• However, if we are given the basic functions of a component, then by definition MTTF is given by

$\text{MTTF} = E(T) = \int_0^\infty t\,f(t)\,dt = \int_0^\infty R(t)\,dt$

• The median is the value of the variate that divides the distribution of the variate into two equal parts. Therefore, if $t_{md}$ denotes the median of the random variable T, then

$F(t_{md}) = \int_0^{t_{md}} f(t)\,dt = 0.5$
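
The sample calculation of MTTF and the median can be sketched in Python as follows. The ten lifetimes are hypothetical (they are not the data of Fig. 13.4), and the exponential case with a constant failure rate of 0.01 per hour is included only as an assumed illustration of the closed forms MTTF = 1/λ and median = ln 2 / λ.

```python
import math
import statistics

# Hypothetical times to failure (hours) of 10 identical components on test.
lifetimes_h = [410, 520, 575, 600, 640, 700, 730, 810, 905, 1110]

sample_mttf = statistics.mean(lifetimes_h)      # arithmetic mean of the lifetimes
sample_median = statistics.median(lifetimes_h)  # middle of the ordered lifetimes


def exponential_mttf(rate):
    """MTTF when R(t) = exp(-rate * t): the integral of R(t) from 0 to infinity."""
    return 1.0 / rate


def exponential_median(rate):
    """Time t_md at which F(t_md) = 0.5 for the exponential distribution."""
    return math.log(2.0) / rate


rate = 0.01  # assumed constant failure rate, per hour
print(sample_mttf, sample_median)
print(exponential_mttf(rate), round(exponential_median(rate), 2))
```

For a skewed lifetime distribution such as the exponential, the median is noticeably smaller than the MTTF, so the two should not be used interchangeably.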

Example 4

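Consider the pdf for the uniform random variable given below:

$f(t) = \dfrac{1}{100}, \quad 0 \le t \le 100$

where t is time-to-failure in hours. Draw the pdf, cdf and the reliability function.

$F(t) = \int_0^t f(\tau)\,d\tau = \int_0^t \dfrac{1}{100}\,d\tau = \dfrac{t}{100}$

$R(t) = 1 - F(t) = 1 - \dfrac{t}{100}$

[Figures: pdf f(t), constant at 1/100 for 0 ≤ t ≤ 100; cdf F(t) and reliability function R(t) for 0 ≤ t ≤ 100]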

Example 5
Given the probability density function

$f(t) = \dfrac{t}{5000}, \quad 0 \le t \le 100$

where t is time-to-failure in hours and the pdf is shown below:

[Figure: pdf f(t) versus t, for t from 0 to 100 hours]

Graph the cdf and the reliability function.

$F(t) = \int_0^t f(\tau)\,d\tau = \int_0^t \dfrac{\tau}{5000}\,d\tau = \dfrac{t^2}{10000}$

$R(t) = 1 - F(t) = 1 - \dfrac{t^2}{10000}$
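
The integration in this example, for the pdf f(t) = t/5000 on 0 ≤ t ≤ 100, can be checked numerically. The sketch below uses a simple trapezoidal rule; the function names and the step count are choices made for this illustration, not part of the original solution.

```python
def pdf(t):
    """Failure density f(t) = t / 5000 on [0, 100], zero elsewhere."""
    return t / 5000.0 if 0.0 <= t <= 100.0 else 0.0


def cdf_numeric(t, steps=1000):
    """Trapezoidal approximation of the integral of pdf from 0 to t."""
    t = min(max(t, 0.0), 100.0)   # the density is zero outside [0, 100]
    if t == 0.0:
        return 0.0
    h = t / steps
    total = 0.5 * (pdf(0.0) + pdf(t))
    for k in range(1, steps):
        total += pdf(k * h)
    return total * h


def cdf_exact(t):
    """Closed form F(t) = t^2 / 10000."""
    return t * t / 10000.0


def reliability(t):
    """R(t) = 1 - F(t)."""
    return 1.0 - cdf_exact(t)


for t in (0.0, 25.0, 50.0, 100.0):
    print(t, round(cdf_numeric(t), 6), cdf_exact(t), reliability(t))
```

Because the density is linear, the trapezoidal rule agrees with the closed form up to floating-point rounding; for other densities the step count would need more care.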
Example 6
For the reliability function

$R(t) = e^{-(t/800)^2}, \quad t \ge 0$

where t is time-to-failure in hours.

(1) What is the 200 hr reliability?
(2) What is the 500 hr reliability?
(3) If this item has been working for 200 hrs, what is the reliability at 500 hrs?

Solution

$R(200) = e^{-(200/800)^2} \approx 0.9394$

$R(500) = e^{-(500/800)^2} \approx 0.6766$

$R(500 \mid 200) = \dfrac{P(T \ge 500)}{P(T \ge 200)} = \dfrac{R(500)}{R(200)} \approx 0.7203$
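
The three answers can be evaluated with a few lines of Python, taking the reliability function as R(t) = exp(-(t/800)^2). The helper names below are introduced for this sketch only.

```python
import math


def reliability(t, scale_h=800.0):
    """R(t) = exp(-(t / scale_h)^2), with t in hours."""
    return math.exp(-((t / scale_h) ** 2))


def conditional_reliability(t, age):
    """P(T > t | T > age) = R(t) / R(age), for t >= age."""
    if t < age:
        raise ValueError("t must not be smaller than the current age")
    return reliability(t) / reliability(age)


r200 = reliability(200.0)
r500 = reliability(500.0)
r500_given_200 = conditional_reliability(500.0, 200.0)
print(round(r200, 4), round(r500, 4), round(r500_given_200, 4))
```

The conditional value is larger than R(500) because the item is already known to have survived the first 200 hours.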

Example 4
Given the following time to failure probability density function (pdf):

$f(t) = 0.01\,e^{-0.01t}, \quad t \ge 0$

where t is time-to-failure in hours. What is the reliability function?

$R(t) = \int_t^\infty 0.01\,e^{-0.01u}\,du = e^{-0.01t}$

Example 7
Given the cumulative distribution function (cdf):

$F(t) = 1 - e^{-(t/800)^2}, \quad t \ge 0$

where t is time-to-failure in hours.

(1) What is the reliability function?
(2) What is the probability that a device will survive for 70 hr?

Example 8
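Consider the pdf used in Example 5, given by

$f(t) = \dfrac{t}{5000}, \quad 0 \le t \le 100$

Calculate the hazard function.

$h(t) = \dfrac{f(t)}{R(t)} = \dfrac{t/5000}{1 - t^2/10000} = \dfrac{2t}{10000 - t^2}, \quad 0 \le t < 100$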
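
A short Python sketch of this hazard calculation, for the pdf f(t) = t/5000 with R(t) = 1 - t²/10000, is given below. The function names are illustrative; the hazard is left undefined at t = 100, where R(t) reaches zero.

```python
def pdf(t):
    """f(t) = t / 5000 for 0 <= t <= 100."""
    return t / 5000.0


def reliability(t):
    """R(t) = 1 - t^2 / 10000 for 0 <= t <= 100."""
    return 1.0 - t * t / 10000.0


def hazard(t):
    """h(t) = f(t) / R(t), defined while R(t) > 0, i.e. for 0 <= t < 100."""
    if not 0.0 <= t < 100.0:
        raise ValueError("hazard is only defined for 0 <= t < 100")
    return pdf(t) / reliability(t)


for t in (10.0, 50.0, 90.0, 99.0):
    print(t, round(hazard(t), 5))
```

The hazard grows without bound as t approaches 100 hours, which reflects the fact that every item has failed by then.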

Example 9
Given h(t) = 18t, find R(t), F(t), and f(t).

Solution

$R(t) = e^{-\int_0^t 18\tau\,d\tau} = e^{-9t^2}$

$F(t) = 1 - R(t) = 1 - e^{-9t^2}$

$f(t) = \dfrac{dF(t)}{dt} = 18t\,e^{-9t^2}$
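
As a rough numerical check of this example, the sketch below integrates the hazard h(t) = 18t with a trapezoidal rule and compares the result with the closed form R(t) = exp(-9t²). The helper names, the evaluation point t = 0.5 and the step count are assumptions made for illustration.

```python
import math


def cumulative_hazard(hazard, t, steps=1000):
    """Trapezoidal approximation of the integral of hazard(u) from 0 to t."""
    if t <= 0.0:
        return 0.0
    h = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    for k in range(1, steps):
        total += hazard(k * h)
    return total * h


def reliability_from_hazard(hazard, t, steps=1000):
    """R(t) = exp(-integral of the hazard from 0 to t)."""
    return math.exp(-cumulative_hazard(hazard, t, steps))


def linear_hazard(u):
    """The hazard of this example, h(u) = 18u."""
    return 18.0 * u


t = 0.5
r_numeric = reliability_from_hazard(linear_hazard, t)
r_closed = math.exp(-9.0 * t ** 2)        # R(t) = exp(-9 t^2)
f_closed = 18.0 * t * r_closed            # f(t) = h(t) * R(t)
print(round(r_numeric, 6), round(r_closed, 6), round(f_closed, 6))
```

The same `reliability_from_hazard` helper works for any non-negative hazard function, although the trapezoidal rule is only exact for linear hazards such as this one.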



• Consider a population of new identical components and suppose that all of them enter the field at some point in time, say, t = 0.
• In general, the curve of the relative failure rate of the entire population of components, without replacement, has the typical shape of a bathtub. Therefore, it is known as the bathtub curve, as shown in the figure below.
• The bathtub curve is simply a graphical representation of the failure rate of a population of identical components versus time.

I. Early Failure or Infant Mortality
• When a population of new identical components starts to perform its intended function, a high failure rate is observed in the initial stages due to many factors such as manufacturing errors, poor quality or substandard items, incorrect adjustment or positioning, bad assembly, human factors, improper design, etc.
• This phase of the bathtub curve is known as the period of early failure or infant mortality. It is also known as the burn-in, decreasing failure rate, or debugging period.
• Most failures in this period are due to manufacturing errors, design problems or poor quality, for which the manufacturer may be responsible. So this period is generally covered by a warranty from the manufacturer. The duration of this period varies from component to component. However, it is typically for the period of time till failure rate
• The hazard rate in this phase can be reduced by increasing quality control at the production level.
• However, even increased quality control at the production stage cannot completely eliminate infant mortality. Therefore, components should be tested at the factory before they are delivered to customers. That is why good companies generally test components before supplying them to users.

II. Normal Life, Useful Life or Chance Failure Period

• Most components with manufacturing errors, improper design, bad assembly, etc. have already failed in the early failure period. So for a long period of time (whose length may vary from component to component), fewer failures are reported and the failure rate remains almost constant. This phase of the life of the component is known as normal life, useful life or chance failure period.
• Usually failures in this period are due to stress-related causes, random fluctuations, etc. These failures cannot be predicted exactly and happen randomly, which is why this phase is known as the chance failure period.
• The appropriate distribution for this phase is the exponential distribution, because a constant failure rate implies that the failure distribution of the component is exponential.
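
A constant failure rate gives the exponential reliability function R(t) = exp(-λt), and with it the "chance failure" behaviour described above: an item that has survived so far is no more likely to fail in the next interval than a new one. The sketch below illustrates this with an assumed rate of 0.002 failures per hour; the rate and the time points are hypothetical.

```python
import math


def exp_reliability(t, rate):
    """R(t) = exp(-rate * t) for a constant failure rate."""
    return math.exp(-rate * t)


rate = 0.002  # assumed constant failure rate, per hour

# Probability that a new item survives its first 100 hours.
fresh = exp_reliability(100.0, rate)

# Probability that an item already 500 hours old survives 100 more hours.
aged = exp_reliability(600.0, rate) / exp_reliability(500.0, rate)

print(round(fresh, 4), round(aged, 4))
```

Both probabilities come out the same, which is why age-based replacement brings no benefit during the useful life period.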

III. Wear out Failures

• Since no component is perfect, it cannot last forever. So after the span of useful life, the failure rate of components starts increasing due to ageing of the components.
• This phase of the bathtub curve, when the components begin to deteriorate, is known as the period of wear out failures.
• In the wear out phase, the failure rate increases with time. So if we wish to minimize interruptions to the smooth working of a real-life system, one helpful strategy is to replace the component when it reaches its mean time to failure.
• The probability distributions generally used to study the failure characteristics in this phase are the normal (due to the shape of the failure curve in this phase), Weibull and Gamma (due to the presence of a shape parameter in these distributions).
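
The role of the shape parameter can be illustrated with the two-parameter Weibull hazard, h(t) = (β/η)(t/η)^(β-1), which decreases for β < 1, is constant for β = 1 and increases for β > 1. The following Python sketch uses hypothetical values (scale η = 1000 hours and three sample times); the function names are introduced only for this example.

```python
def weibull_hazard(t, shape, scale):
    """Two-parameter Weibull hazard: (shape / scale) * (t / scale)^(shape - 1)."""
    if t <= 0.0 or shape <= 0.0 or scale <= 0.0:
        raise ValueError("t, shape and scale must all be positive")
    return (shape / scale) * (t / scale) ** (shape - 1.0)


def hazard_trend(shape, scale=1000.0, times=(100.0, 500.0, 900.0)):
    """Classify the hazard over the sample times as decreasing, constant or increasing."""
    rates = [weibull_hazard(t, shape, scale) for t in times]
    if all(b < a for a, b in zip(rates, rates[1:])):
        return "decreasing"
    if all(b > a for a, b in zip(rates, rates[1:])):
        return "increasing"
    return "constant"


for shape in (0.5, 1.0, 3.0):
    print(shape, hazard_trend(shape))
```

A single family of distributions can therefore describe each phase of the bathtub curve in turn, depending on the fitted shape value.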

Bathtub Curve: Summary Table
Phase | Failure Rate | Possible Causes | Possible Improvement
Burn-in (A-B) | Decreasing (DFR) | Manufacturing defects, welding, soldering, assembly errors, part defects, poor QC, poor workmanship, etc. | Better QC, acceptance testing, burn-in testing, screening, Highly Accelerated Stress Screening, etc.
Useful Life (B-C) | Constant (CFR) | Environment, random loads, human errors, chance events, 'Acts of God', etc. | Excess strength, redundancy, robust design, etc.
Wear-out (C-D) | Increasing (IFR) | Fatigue, corrosion, aging, friction, etc. | Derating, preventive maintenance, parts replacement, better material, improved designs, technology, etc.


• It is a real challenge for people working in the field of reliability to estimate the basic reliability functions (reliability, cumulative failure distribution, failure density and hazard rate) from the failure data of a sample of identical components, obtained either from test-generated failures or from the actual field operation of these components.

Estimate and plot the reliability, cumulative failure distribution, failure density and failure rate
• In this example, the number of components in the sample is large. So instead of recording the time to failure of each component individually, it is common practice to record the number of failures in suitable intervals of time.
• In this example, the interval of time is taken as 10 hours. The estimates of the reliability, cumulative failure distribution, failure density and failure/hazard rate functions are given in columns 5, 6, 7 and 8 of Table 13.3. The calculations for the different columns are explained below.
Calculations of Columns 1 and 2
• In column 1, we simply enter time in hours, starting from the lower bound of the first class interval up to the upper bound of the last class interval for the data given in Table 13.2.
• Entries in the second column represent the number of failures observed during the test in each interval. The number of failures (frequency) listed against each class interval of time in Table 13.2 is the sum of the numbers of all components that fail during that time interval.
• In the second column of Table 13.3, this frequency is entered against the upper bound of the class interval during which it is observed.
Calculation of Column 3
• Entries in this column are the cumulative frequencies of the frequencies written in column 2.
• If $N_f(t)$ denotes the number of components that have failed by time t, then the first three entries of the column are calculated as follows:

Calculation of Column 4
• Entries of column 4 represent the number of components that are performing their intended function adequately at time t. Therefore, if $N_s(t)$ denotes the number of components that are performing their intended function at time t, and N is the total number of components in the sample, then

$N_s(t) = N - N_f(t)$
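
The column-by-column procedure can be sketched in Python as below. The failure counts are hypothetical (they are not the data of Table 13.2), and the estimators used for the last two columns, failures in the interval divided by N·Δt for the density and by the survivors at the start of the interval times Δt for the hazard rate, are common textbook choices assumed here rather than taken from this section.

```python
def life_table(failures_per_interval, interval_h, n_total):
    """Build rows of time, failures, N_f, N_s, R, F, f and hazard estimates."""
    rows = []
    failed = 0
    for k, n_fail in enumerate(failures_per_interval, start=1):
        survivors_at_start = n_total - failed
        failed += n_fail                      # column 3: cumulative failures N_f(t)
        surviving = n_total - failed          # column 4: survivors N_s(t)
        rows.append({
            "t": k * interval_h,              # upper bound of the class interval
            "failures": n_fail,
            "N_f": failed,
            "N_s": surviving,
            "R": surviving / n_total,         # reliability estimate
            "F": failed / n_total,            # cumulative failure estimate
            "f": n_fail / (n_total * interval_h),
            "hazard": (n_fail / (survivors_at_start * interval_h)
                       if survivors_at_start else 0.0),
        })
    return rows


# Hypothetical test of 100 components, failures counted in 10-hour intervals.
rows = life_table([30, 20, 15, 10, 25], interval_h=10.0, n_total=100)
for row in rows:
    print(row["t"], row["N_f"], row["N_s"],
          round(row["R"], 2), round(row["f"], 4), round(row["hazard"], 4))
```

With these assumed counts the hazard estimate first falls and then rises again in the last interval, since few survivors remain while failures continue.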

