Overview of Reliability Engineering: Eric Marsden

Reliability engineering aims to ensure that systems function as required over time. Reliability engineers address questions about when failures occur, why they occur, and how to reduce failures. Key concepts include failures, faults, errors, failure modes, failure classification (safe/dangerous, detected/undetected), common cause failures, and reliability definitions. Reliability is quantified using metrics like mean time between failures and survival functions, with the goal of modeling and improving system dependability.

Overview of reliability engineering

Eric Marsden
<[email protected]>
Context

▷ I have a fleet of airline engines and want to anticipate when they may fail

▷ I am purchasing pumps for my refinery and want to understand the MTBF, lambda etc. provided by the manufacturers

▷ I want to compare different system designs to determine the impact of architecture on availability
Reliability engineering

▷ Reliability engineering is the discipline of ensuring that a system will function as required over a specified time period when operated and maintained in a specified manner.

▷ Reliability engineers address 3 basic questions:

• When does something fail?
• Why does it fail?
• How can the likelihood of failure be reduced?
Failure

Failure
Loss of ability to perform as required.

▷ A failure is always related to a required function. The function is often specified together with a performance requirement (e.g. “must handle up to 3 tonnes per minute”, “must respond within 0.1 seconds”).

▷ A failure occurs when the function cannot be performed or has a performance that falls outside the performance requirement.

Source: International Electrotechnical Vocabulary (IEV) part 192 on dependability

Fault

Fault
Inability to perform as required, due to an internal state [IEV 192-04-01]

▷ While a failure is an event that occurs at a specific point in time, a fault is a state that will last for a shorter or longer period.

▷ When a failure occurs, the item enters the failed state. A failure may
occur:
• while running
• while in standby
• due to demand

Source: International Electrotechnical Vocabulary (IEV) part 192 on dependability

Error

Error
Discrepancy between a computed, observed, or measured value or condition and
the true, specified, or theoretically correct value or condition.

▷ An error is present when the performance of a function deviates from the target performance, but still satisfies the performance requirement

▷ An error will often, but not always, develop into a failure

Source: International Electrotechnical Vocabulary (IEV) part 192 on dependability, item 192-03-02
Failure mode

Failure mode
The way a failure is observed on a failed item.

▷ An item can fail in many different ways: a failure mode is a description of a possible state of the item after it has failed

Source: International Electrotechnical Vocabulary (IEV) part 192 on dependability, item 192-03-17
Failure classification

IEC 61508 classifies failures according to their:

▷ Causes:
• random (hardware) faults
• systematic faults (including software faults)

▷ Effects:
• safe failures
• dangerous failures

▷ Detectability:
• detected: revealed by online diagnostics
• undetected: revealed by functional tests or upon a real demand for activation

IEC 61508: Functional Safety of Electrical/Electronic/Programmable Electronic Safety-related Systems

Markovian models

Models the transitions between correct state and failed state.

Assumption: nothing in the past determines future events except for current state.

Failure and repair are stochastic processes.

Availability = proportion of time spent in correct state.

[Figure: two-state diagram. Correct state → failed state at failure rate λ; failed state → correct state at repair rate μ.]
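For the two-state model above, the long-run availability can be worked out from the balance between failures and repairs. The sketch below assumes constant rates; the numerical values of λ and μ are illustrative assumptions, not data from these slides.

```python
# Two-state Markov model: "correct" <-> "failed".
# Both rates are illustrative assumptions.
failure_rate = 1e-3   # λ, failures per hour (assumed)
repair_rate = 1e-1    # μ, repairs per hour (assumed)

# In steady state the flow out of each state balances the flow in:
#   λ · P(correct) = μ · P(failed),  with P(correct) + P(failed) = 1
availability = repair_rate / (failure_rate + repair_rate)
unavailability = failure_rate / (failure_rate + repair_rate)

print(f"steady-state availability:   {availability:.4f}")
print(f"steady-state unavailability: {unavailability:.4f}")
```

With constant rates, 1/λ is the mean time spent in the correct state and 1/μ the mean time spent in the failed state, so the same number can be read as a ratio of mean times.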
The “safe failure fraction”

Not all failures are dangerous: the system may have been designed to tolerate them.

Importance of the coverage of the error detection mechanisms, measured by the “safe failure fraction”: conditional probability that a failure will be safe, or dangerous-but-detected.

[Figure: state diagram. Service OK → dangerous state at the rate of non-detected and dangerous failures 𝜆𝐷; service OK → service degraded but safe at the rate of safe or dangerous-and-detected failures 𝜆𝑆; repair at rate μ returns the system to service OK.]
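As a rough illustration of the definition above, the safe failure fraction can be computed as the share of the total failure rate that leads to the “degraded but safe” state. The function name and the two rate values are hypothetical, chosen only for this sketch.

```python
def safe_failure_fraction(rate_safe_or_detected, rate_dangerous_undetected):
    """Conditional probability that a failure is safe or dangerous-but-detected.

    With two competing constant failure rates, the probability that the
    next failure is of a given kind is that kind's share of the total rate.
    """
    total = rate_safe_or_detected + rate_dangerous_undetected
    if total <= 0:
        raise ValueError("total failure rate must be positive")
    return rate_safe_or_detected / total


# Assumed example rates, in failures per hour
lambda_s = 9e-6   # safe, or dangerous and detected
lambda_d = 1e-6   # dangerous and non-detected

sff = safe_failure_fraction(lambda_s, lambda_d)
print(f"safe failure fraction: {sff:.0%}")
```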
Failure classification

▷ Safe undetected (SU): A spurious (untimely) activation of a component when not demanded

▷ Safe detected (SD): A non-critical alarm raised by the component

▷ Dangerous detected (DD): A critical diagnostic alarm reported by the component, which will, as long as it is not corrected, prevent the safety function from being executed

▷ Dangerous undetected (DU): A critical dangerous failure which is not reported and remains hidden until the next test or demanded activation of the safety function
Common cause failures

Common cause failure

A failure that is the result of one or more events, causing concurrent failures of
two or more separate channels in a multiple channel system, leading to system
failure [IEC 61508]

▷ Typical examples: loss of electricity supply, massive physical destruction

▷ More subtle examples: loss of clock function (electronics), common maintenance procedure
Reliability: definitions

Reliability [ISO 8402]

The ability of an item to perform a required function, under given environmental
and operational conditions for a stated period of time.

▷ The reliability 𝑅(𝑡) of an item at time 𝑡 is the probability that the item performs the required function throughout the interval [0, 𝑡], given the stress and environmental conditions in which it operates
Reliability: definitions

▷ If 𝑋 is a random variable representing time to failure of an item, the survival function (or reliability function) 𝑅(𝑡) is

𝑅(𝑡) = Pr(𝑋 > 𝑡)

▷ 𝑅(𝑡) represents the probability that the item is working correctly at time 𝑡

▷ Properties:
• 𝑅(𝑡) is non-increasing (no rising from the dead)
• 𝑅(0) = 1 (no immediate death/failure)
• lim_{𝑡→∞} 𝑅(𝑡) = 0 (no eternal life)
Interpreting the reliability function

[Figure: two plots against time to failure (T), each with a vertical axis from 0 to 1. Left: probability F(t) = P(T ≤ t). Right: survival function R(t) = P(T > t).]

▷ Cumulative distribution function: tells you the probability that lifetime is ≤ 𝑡

𝐹(𝑡) = 𝑃(𝑇 ≤ 𝑡)

▷ Reliability function: tells you the probability that lifetime is > 𝑡

𝑅(𝑡) = 𝑃(𝑇 > 𝑡) = 1 − 𝐹(𝑡)
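The relation between the two functions can be checked numerically. The sketch below uses an exponential lifetime with a mean of 1000 hours, which is an assumption made only for illustration; `sf` is the name `scipy.stats` gives to the survival function.

```python
import scipy.stats

# Assumed lifetime model: exponential with a mean of 1000 hours
dist = scipy.stats.expon(scale=1000)

times = [0, 100, 500, 1000, 5000]
survival = [dist.sf(t) for t in times]    # R(t) = P(T > t)
failed = [dist.cdf(t) for t in times]     # F(t) = P(T <= t)

for t, r, f in zip(times, survival, failed):
    # R(t) and F(t) always sum to 1
    print(f"t={t:>5}  R(t)={r:.4f}  F(t)={f:.4f}  sum={r + f:.4f}")
```

The printed values start at 𝑅(0) = 1 and only decrease as 𝑡 grows.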

Exercise

Problem
The lifetime of a modern low-wattage electronic light bulb is known to be
exponentially distributed with a mean of 8000 hours.

Q1 Find the proportion of bulbs that may be expected to fail before 7000
hours use.

Q2 What is the lifetime that we have 95% confidence will be exceeded?

For more on the reliability of solid-state lamps, see energy.gov

Exercise

Solution
The time to failure of our light bulbs can be modelled by the distribution

import scipy.stats
dist = scipy.stats.expon(scale=8000)

Q1: The CDF gives us the probability that the lifetime is ≤ 𝑡. We want
dist.cdf(7000) which is 0.583137. So about 58% of light bulbs will fail
before they reach 7000 hours of operation.

Q2: We need the 0.05 quantile of the lifetime distribution, dist.ppf(0.05), which is around 410 hours.
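The two answers can be cross-checked against the closed-form expressions for the exponential distribution, 𝐹(𝑡) = 1 − exp(−𝑡/mean) and its inverse. This is a minimal sketch; the variable names are chosen for illustration.

```python
import math
import scipy.stats

mean_life = 8000.0                          # hours, from the problem statement
dist = scipy.stats.expon(scale=mean_life)

# Q1: proportion of bulbs failing before 7000 hours
p_fail_7000 = dist.cdf(7000)
p_closed = 1 - math.exp(-7000 / mean_life)

# Q2: lifetime exceeded by 95% of bulbs, i.e. the 0.05 quantile
life_95 = dist.ppf(0.05)
life_closed = -mean_life * math.log(0.95)

print(p_fail_7000, p_closed)
print(life_95, life_closed)
```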
Exercise

Problem
A particular electronic device will only function correctly if two essential
components both function correctly. The lifetime of the first component
is known to be exponentially distributed with a mean of 5000 hours and
the lifetime of the second component (whose failures can be assumed to be
independent of those of the first component) is known to be exponentially
distributed with a mean of 7000 hours. Find the proportion of devices that
may be expected to fail before 6000 hours use.
Exercise

Solution
The device will only be working after 6000 hours if both components are
operating. The probability of the first component still working is

> pa = 1 - scipy.stats.expon(scale=5000).cdf(6000)
> pa
0.3011942119122022

and likewise for the second component

> pb = 1 - scipy.stats.expon(scale=7000).cdf(6000)
> pb
0.42437284567695

The probability of both working is pa × pb = 0.127818, so the proportion of devices that can be expected to fail before 6000 hours use is around 87%.
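The same calculation can be written with the survival function `sf` instead of `1 - cdf`. For independent exponential components in series, the result should also match a single exponential whose rate is the sum of the component rates, which gives a quick sanity check.

```python
import math
import scipy.stats

# Probability that each component is still working at 6000 hours
pa = scipy.stats.expon(scale=5000).sf(6000)
pb = scipy.stats.expon(scale=7000).sf(6000)

# The device fails unless both independent components survive
p_device_fails = 1 - pa * pb

# Cross-check: a series of exponentials behaves like one exponential
# whose failure rate is the sum of the component rates
combined_rate = 1 / 5000 + 1 / 7000
p_closed = 1 - math.exp(-combined_rate * 6000)

print(p_device_fails, p_closed)
```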
Hazard function

Hazard function
The hazard function or failure rate function ℎ(𝑡) gives the conditional probability per unit time of failure in the interval 𝑡 to 𝑡 + 𝑑𝑡, given that no failure has occurred by 𝑡. The probability of failure in that interval is approximately ℎ(𝑡) 𝑑𝑡.

ℎ(𝑡) = 𝑓(𝑡) / 𝑅(𝑡)

where 𝑓(𝑡) is the probability density function (failure density function) and 𝑅(𝑡) is the reliability function.

Informally, it is the rate of quitting a given state after having spent a given time in that state.
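The ratio 𝑓(𝑡)/𝑅(𝑡) can be evaluated directly for any `scipy.stats` distribution. In the sketch below, the helper name `hazard` and the exponential model with a mean of 8000 hours are assumptions for illustration; for an exponential lifetime the hazard should come out constant and equal to 1/mean.

```python
import scipy.stats


def hazard(dist, t):
    """Hazard rate h(t) = f(t) / R(t) for a frozen scipy.stats distribution."""
    return dist.pdf(t) / dist.sf(t)


expo = scipy.stats.expon(scale=8000)
rates = [hazard(expo, t) for t in (100, 1000, 5000)]

# Each value is 1/8000 = 0.000125 per hour, whatever the age of the item
print(rates)
```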
Bathtub curve

▷ Early failure (“burn-in”, “infant mortality” period): high hazard rate due to manufacturing and design problems

▷ Useful life period: probability of failure is roughly constant

▷ Wearout period: hazard rate starts to increase due to aging (corrosion, wear, fatigue)

[Figure: bathtub curve of failure rate against time, showing the observed failure rate together with early “infant mortality” failures (decreasing failure rate), constant (random) failures (constant failure rate) and wear-out failures (increasing failure rate).]
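One common way to reproduce the three regimes is the Weibull hazard, whose shape parameter controls whether the rate falls, stays flat or rises. Using a Weibull model here is an assumption of this sketch, as are the shape and scale values.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard rate: h(t) = (shape / scale) * (t / scale) ** (shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


times = [100, 1000, 5000]   # hours
scale = 1000                # assumed characteristic life, hours

early = [weibull_hazard(t, 0.5, scale) for t in times]     # shape < 1: decreasing
useful = [weibull_hazard(t, 1.0, scale) for t in times]    # shape = 1: constant
wearout = [weibull_hazard(t, 3.0, scale) for t in times]   # shape > 1: increasing

print("early failures:", early)
print("useful life:   ", useful)
print("wearout:       ", wearout)
```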
Reliability measures

▷ Mean time to failure (MTTF) = 𝔼(𝑇) = ∫₀^∞ 𝑅(𝑡) 𝑑𝑡

▷ Often calculated by dividing the total operating time of the units tested by the total number of failures encountered

▷ Time to failure is often modelled by a Weibull distribution (systems affected by wear) or an exponential distribution (systems not affected by wear, such as electronics)
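Both ways of obtaining the MTTF can be sketched in a few lines. The Weibull parameters and the test figures below are hypothetical values chosen for illustration.

```python
import math
import scipy.integrate
import scipy.stats

# Assumed lifetime model: Weibull with shape 2 and scale 1000 hours
dist = scipy.stats.weibull_min(2.0, scale=1000)

# MTTF as the integral of the reliability function R(t) from 0 to infinity
mttf_integral, _abs_error = scipy.integrate.quad(dist.sf, 0, math.inf)

# The same quantity obtained as the mean of the distribution
mttf_mean = dist.mean()

# Field estimate: total operating time divided by number of failures
total_operating_hours = 50_000   # hypothetical test data
failures_observed = 8            # hypothetical test data
mttf_estimate = total_operating_hours / failures_observed

print(mttf_integral, mttf_mean, mttf_estimate)
```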
Availability

Availability
The ability of an item (under combined aspects of its reliability, maintainability
and maintenance support) to perform its required function at a stated instant of
time or over a stated period of time [BS 4778]

▷ The availability 𝐴(𝑡) of an item at time 𝑡 is the probability that the item is
correctly working at time 𝑡

▷ Mean availability = MTTF / (MTTF + MTTR)
Reliability ≠ availability

Note the important difference between:

▷ reliability (failure-free operation during an interval), measured by the MTTF

▷ availability (instantaneous failure-free operation on demand, independently of number of failure/repair cycles), measured by

𝐴 = MTTF / (MTTF + MTTR)

Also note that reliability ≠ safety

[Figure: two timelines of alternating MTTF and MTTR periods. Top: reliable system with poor availability. Bottom: available system with poor reliability.]
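The contrast between the two measures shows up in a small numerical example. The MTTF and MTTR values for the two systems are invented for illustration.

```python
def mean_availability(mttf, mttr):
    """Mean availability A = MTTF / (MTTF + MTTR), both in the same time unit."""
    return mttf / (mttf + mttr)


# System A: fails rarely but takes a long time to repair (assumed, in hours)
availability_a = mean_availability(mttf=10_000, mttr=2_000)

# System B: fails often but is repaired quickly (assumed, in hours)
availability_b = mean_availability(mttf=100, mttr=1)

print(f"A: MTTF 10000 h, availability {availability_a:.3f}")
print(f"B: MTTF   100 h, availability {availability_b:.3f}")
```

System A is the more reliable of the two (longer MTTF), yet system B is the more available.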
Maintainability

Maintainability
The ability of an item, under stated conditions of use, to be retained in, or restored
to, a state in which it can perform its required functions, when maintenance is
performed under stated conditions and using prescribed procedures and resources
[BS 4778]

▷ Measured by MTTR: mean time to repair

▷ Commonly modelled by a lognormal distribution
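A lognormal repair-time model can be set up with `scipy.stats.lognorm`, where the `scale` argument is the median of the distribution. The median of 60 minutes and the shape parameter 0.8 are assumptions made for this sketch.

```python
import math
import scipy.stats

median_repair = 60.0   # minutes (assumed)
sigma = 0.8            # std. dev. of log repair time (assumed)

# In scipy's parametrisation, s is sigma and scale is exp(mu), i.e. the median
repair = scipy.stats.lognorm(s=sigma, scale=median_repair)

mttr = repair.mean()               # mean time to repair
p_within_2h = repair.cdf(120)      # share of repairs done within 120 minutes

print(f"MTTR: {mttr:.1f} minutes, P(repair <= 120 min): {p_within_2h:.2f}")
```

Because the lognormal distribution has a long right tail, the MTTR is larger than the median repair time.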

Reliability measures

MTBF = MTTR + MTTF

MTBF: mean time between failures

[Figure: timeline showing one MTBF cycle, made up of an operational period (MTTF) and an under-repair period (MTTR). Annotations: “fault” and “multiple errors are possible in this period”.]
Exercise

Problem
For a large computer installation, the maintenance logbook shows that over a
period of a month there were 15 unscheduled maintenance actions or downtimes,
and a total of 1200 minutes in emergency maintenance status. Based upon
prior data on this equipment, the reliability engineer expects repair times to be
exponentially distributed. A warranty contract between the computer company
and the customer calls for a penalty payment for any downtime exceeding 100
minutes. Find the following:
1 The MTTR and repair rate

2 The probability that the warranty requirement is being met

3 The median time to repair

4 The time within which 95% of the maintenance actions can be completed
Exercise

Solution
1 MTTR = 1200/15 = 80 minutes and the repair rate μ is 1/80 = 0.0125 per minute. Our probability distribution for repair times is dist = scipy.stats.expon(scale=80).

2 The probability of time to repair not exceeding 100 minutes is dist.cdf(100) = 71%.

3 The median time to repair is dist.ppf(0.5) = 55 minutes.

4 The time within which 95% of the maintenance actions can be completed is
dist.ppf(0.95) = 240 minutes.
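The four answers can be reproduced in one short script. This is a sketch of the calculation described above; the variable names are chosen for illustration.

```python
import math
import scipy.stats

downtime_minutes = 1200
maintenance_actions = 15

# 1: MTTR and repair rate
mttr = downtime_minutes / maintenance_actions   # minutes
repair_rate = 1 / mttr                          # repairs per minute

dist = scipy.stats.expon(scale=mttr)

# 2: probability that a repair takes no more than 100 minutes
p_warranty_met = dist.cdf(100)

# 3: median time to repair (closed form for the exponential: MTTR * ln 2)
median_repair = dist.ppf(0.5)

# 4: time within which 95% of repairs are completed
t_95 = dist.ppf(0.95)

print(mttr, repair_rate, p_warranty_met, median_repair, t_95)
```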
Exercise

Problem
From field data in an oil field, the time to failure of a pump, 𝑋, is known to be
normally distributed. The mean and standard deviation of the time to failure are
estimated from historical data as 3200 and 600 hours, respectively.
1 What is the probability that a pump will fail after it has worked for 2000 hours?

2 If two pumps work in parallel (the system can meet performance requirements with a single operating pump), what is the probability that the system will fail after it has worked for 2000 hours? Assume that pump failures are independent events.
Exercise

Solution

1 Random variable 𝑋 can be represented by the model scipy.stats.norm(3200, 600). We want to assess Pr(𝑋 > 2000), which is 1 − Pr(𝑋 ≤ 2000), or 1 - scipy.stats.norm(3200, 600).cdf(2000), or 0.977.

2 Let’s call 𝑌 the random variable representing time to failure of the redundant pump system, and 𝑋1 and 𝑋2 the time to failure of pumps 1 and 2 respectively. We want to determine Pr(𝑌 > 2000), which is 1 − Pr(𝑌 ≤ 2000).
This is 1 − Pr(𝑋1 ≤ 2000 ∧ 𝑋2 ≤ 2000) (given the parallel configuration of the pumps, the system fails when both of the pumps fail).
Given that pump failures are independent, that’s 1 − Pr(𝑋1 ≤ 2000) × Pr(𝑋2 ≤ 2000). It’s 1 - scipy.stats.norm(3200, 600).cdf(2000)**2, which is 0.9995.
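Both parts of the solution can be computed from one frozen distribution object. This is a minimal sketch of the calculation described above; the variable names are chosen for illustration.

```python
import scipy.stats

# Time to failure of one pump: normal, mean 3200 h, standard deviation 600 h
pump = scipy.stats.norm(3200, 600)

# 1: a single pump survives beyond 2000 hours
p_single = pump.sf(2000)            # same as 1 - pump.cdf(2000)

# 2: the parallel system fails by 2000 hours only if both independent
#    pumps have failed by then
p_both_failed = pump.cdf(2000) ** 2
p_parallel = 1 - p_both_failed

print(p_single, p_parallel)
```

Adding the second pump cuts the probability of failure before 2000 hours from about 2.3% to about 0.05%.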
Further reading
This presentation is distributed under the terms of the
Creative Commons Attribution – Share Alike licence

▷ Wired.com article Why Things Fail: From Tires to Helicopter Blades, Everything Breaks Eventually from 2010

For more free content on risk engineering, visit risk-engineering.org
Feedback welcome!

Was some of the content unclear? Which parts were most useful to you? Your comments to [email protected] (email) or @LearnRiskEng (Twitter) will help us to improve these materials. Thanks!

@LearnRiskEng

fb.me/RiskEngineering

https://risk-engineering.org/reliability-engineering/
