Lecture 9 - System Reliability
Overview
• What is reliability?
• What is MTBF?
• How do you calculate reliability from MTBF?
• Accelerated testing & historical analogy
• How do you combine the reliability of components?
– In parallel
– In series
• Reducing a complex system
• Using reliability to drive system design
• Components traditionally assumed to have reliability = 1
• These topics are important for the Fundamentals of
Engineering (FE) and Professional Engineer (PE)
examination
What is Reliability?
• Reliability is the probability that a system will successfully
complete its function
• Probability of Failure = 1-Reliability
• If you can observe a system for a sufficiently large number of operations, you can compute reliability directly:
$$\text{Reliability} = \frac{\text{number of successful operations}}{\text{number of total operations}}$$
– To get statistical significance, you would have to observe the system long enough for many failures to occur
• 30 occurrences is a good working rule of thumb
• This is hard to do in complex systems with many components
because you have to observe all of the components failing
• particularly difficult if they are in some redundant arrangement
and you have to observe multiple failures or combinations of
failures
• A better approach is to compute the reliability from the mean time between failures (MTBF) of each component and combine the results to form a system reliability
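A minimal Python sketch of the direct counting estimate above. The function names and the observation counts (1,970 successes in 2,000 operations) are hypothetical; the failure check simply encodes the 30-occurrence rule of thumb.

```python
def empirical_reliability(successes: int, total: int) -> float:
    """Reliability = number of successful operations / number of total operations."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= successes <= total:
        raise ValueError("successes must be between 0 and total")
    return successes / total


def enough_failures(successes: int, total: int, rule_of_thumb: int = 30) -> bool:
    """True if the observed failure count meets the ~30-occurrence rule of thumb."""
    return total - successes >= rule_of_thumb


# Hypothetical observation log: 1,970 successful operations out of 2,000
r = empirical_reliability(1970, 2000)
print(f"Reliability = {r:.3f}, probability of failure = {1 - r:.3f}")
print("Enough failures for a meaningful estimate:", enough_failures(1970, 2000))
```

Collecting 30 failures took 2,000 operations here; for a component with a reliability of 0.9999 the same rule would require about 300,000 operations, which is why combining component MTBFs is preferred.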
MTBF
• MTBF is the Mean Time Between Failures
• Relatively easy to determine on a component basis from tests of components that are run continuously or for fixed periods of time
– Best to perform multiple tests and get a significant number
of failures to get a distribution
– Compute a mean and standard deviation (sigma)
• From this type of test we can determine a failure rate per hour of operation (average failure rate = 1/MTBF)
• What if
– You are early in design and components haven’t been built yet?
– You can’t afford to wait around to accumulate failure data?
– You don’t have enough components to test?
• You can perform accelerated life testing
– For electronics this is performed at high temperature
• MIL-STD-810 describes tests
• Good overview at www.weibull.com
• MIL-HDBK-217 has models for predicting electronics failure rates based on Parts Stress Analysis and Parts Count Analysis
– Highly Accelerated Life Testing (HALT) is an extreme test to determine the weakest part of a design
– Highly Accelerated Stress Screening (HASS) or Environmental
Stress Screening (ESS) is a less rigorous environmental test used
to screen every production part
• You can use historical reliabilities for like components
– NASA has a database
– Aerospace Corp has a database
– Various published articles
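A minimal sketch of turning continuous-run test results into an MTBF, a sigma, and an average failure rate, using Python's standard `statistics` module. The six hours-to-failure values are hypothetical, and six failures is well short of the 30-occurrence rule of thumb; the point is only the arithmetic.

```python
import statistics

# Hypothetical hours-to-failure from repeated tests of one component type
hours_to_failure = [820.0, 1040.0, 955.0, 1210.0, 890.0, 1085.0]

mtbf = statistics.mean(hours_to_failure)    # mean time between failures (hours)
sigma = statistics.stdev(hours_to_failure)  # sample standard deviation (hours)
failure_rate = 1.0 / mtbf                   # average failures per hour of operation

print(f"MTBF = {mtbf:.0f} h, sigma = {sigma:.0f} h")
print(f"Failure rate = {failure_rate:.4f} failures/h "
      f"= {failure_rate * 1e6:.0f} failures per million hours")
```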
MIL-HDBK-217 Prediction Model
• Typical MIL-217 Failure Rate Model
• A sample MIL-217 failure rate model for a simple semiconductor component
is shown below. Many components, especially microcircuits, have
significantly different and more complex models.
• Failure rate λp = λb × πT × πA × πR × πS × πC × πQ × πE failures/million hours
• Where:
λb = Base failure rate
πT = Temperature factor
πA = Application factor (linear, switching, etc.)
πR = Power rating factor
πS = Electrical (voltage) stress factor
πC = Contact construction factor
πQ = Quality factor
πE = Operating environment factor
• The pi factors listed above are for a simple component and are shown as an example. There are also pi factors for items such as learning factor, die complexity, manufacturing process, device complexity, programming cycles, package type, etc. Each component or part group and its associated subgroups have a base failure rate plus numerous pi factor tables, unique to that component or part, listing the factors the model uses to adjust the base failure rate.
http://www.reliabilityeducation.com/intro_mil217.html
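The MIL-217 part stress model is a product of a base failure rate and its pi factors, so it is easy to sketch in Python. The factor values below are made up for illustration and are not taken from the handbook tables; `part_failure_rate` is a hypothetical helper name.

```python
import math


def part_failure_rate(base_rate: float, pi_factors: dict[str, float]) -> float:
    """Base failure rate multiplied by every pi adjustment factor.

    The result has the same units as base_rate (failures per million hours here).
    """
    return base_rate * math.prod(pi_factors.values())


# Illustrative values only; real ones come from the handbook's tables for the part
lambda_b = 0.0012  # hypothetical base failure rate, failures per million hours
factors = {
    "piT": 1.5,  # temperature
    "piA": 1.0,  # application
    "piR": 0.8,  # power rating
    "piS": 1.2,  # electrical stress
    "piC": 1.0,  # contact construction
    "piQ": 2.4,  # quality
    "piE": 6.0,  # environment
}

lambda_p = part_failure_rate(lambda_b, factors)
mtbf_hours = 1e6 / lambda_p  # failures per million hours -> hours between failures
print(f"lambda_p = {lambda_p:.5f} failures/million hours, MTBF = {mtbf_hours:,.0f} h")
```

Because the factors multiply, doubling any single factor doubles the predicted failure rate and halves the MTBF.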
MIL-217 Stress Example
• A solid tantalum fixed electrolytic capacitor, for example, has the following MIL-217 model:
• Failure rate λp = λb × πCV × πSR × πQ × πE failures/million hours
• Where:
λb = Base failure rate
πCV = Capacitance factor
πSR = Series resistance factor
πQ = Quality factor (quality levels of D, C, S, B, R, P, M, L, Lower)
πE = Environment factor
• MIL-217 has models for
– Microcircuits
– Discrete semiconductors
– Tubes
– Lasers
– Resistors
– Capacitors
– Inductive devices
– Rotating devices (motors)
– Relays
– Switches
– Connectors
– Interconnection assemblies
– Meters, quartz crystals, lamps, fuses
http://www.reliabilityeducation.com/intro_mil217.html
MIL-217 Parts Count Analysis (from MIL-HDBK-217)
Example of Historical Databases – Launch Vehicle Failures by Country and Subsystem
Launch Vehicle Subsystem Failures, 1980–1999
U.S. 15 4 8 1 1 1 30
CIS/USSR 33 3 2 1 19 58
Europe 7 1 8
China 3 1 2 6
Japan 2 1 3
India 1 1 1 1 1 5
Israel 1 1
Brazil 2 2
N. Korea 1 1
Total 64 11 11 2 3 3 20 114
http://www.aero.org/publications/crosslink/winter2001/03.html
Mean
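For reference, the mean used in the variance and standard deviation formulas is the arithmetic mean of the N observations X_1 through X_N:

```latex
\bar{X} = \frac{1}{N}\sum_{n=1}^{N} X_n
```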
Variance of a population
$$\sigma_x^2 = \frac{1}{N}\sum_{n=1}^{N}\left(X_n - \bar{X}\right)^2$$
The standard deviation is the square root of the variance
Standard Deviation
• The standard deviation of a population is defined as
$$\sigma_x = S_N = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(X_n - \bar{X}\right)^2}$$
This is the equation to use if you have 100 percent sampled the entire population
• Use
$$\sigma_x \approx S_{N-1} = \sqrt{\frac{1}{N-1}\sum_{n=1}^{N}\left(X_n - \bar{X}\right)^2}$$
with failure test data. This is the equation to use if you are dealing with a sample of the population and trying to estimate the entire population’s characteristics
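A short sketch contrasting the two formulas, written out directly and cross-checked against Python's `statistics.pstdev` (divides by N) and `statistics.stdev` (divides by N - 1). The four hours-to-failure values are hypothetical.

```python
import math
import statistics


def std_dev(xs: list[float], sample: bool) -> float:
    """S_N (sample=False) divides by N; S_{N-1} (sample=True) divides by N - 1."""
    n = len(xs)
    mean = sum(xs) / n
    sum_sq = sum((x - mean) ** 2 for x in xs)
    return math.sqrt(sum_sq / (n - 1 if sample else n))


# Hypothetical hours-to-failure for four units drawn from a larger population
hours = [950.0, 1010.0, 1040.0, 1000.0]

s_n = std_dev(hours, sample=False)         # whole population was measured
s_n_minus_1 = std_dev(hours, sample=True)  # estimating the population from a sample

# The standard library implements the same two estimators
assert math.isclose(s_n, statistics.pstdev(hours))
assert math.isclose(s_n_minus_1, statistics.stdev(hours))

print(f"S_N = {s_n:.1f} h, S_(N-1) = {s_n_minus_1:.1f} h")
```

The N - 1 divisor makes the variance estimate unbiased when working from a sample; the difference between the two shrinks as N grows.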
(Figure: normal distribution probabilities, from “Statistics For Dummies” by Deborah Rumsey)
Caution: these probabilities are for normal distributions only. Not all data fits a normal distribution; it may be log-normal, exponential, etc., which have different interpretations.
Standard Deviations In A Normal Distribution
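For a normal distribution, the fraction of values within plus or minus k standard deviations of the mean is erf(k/√2). A minimal sketch using `math.erf` reproduces the familiar 68 / 95 / 99.7 percent figures; as the caution above notes, these numbers do not carry over to log-normal or exponential data.

```python
import math


def fraction_within(k: float) -> float:
    """Probability that a normally distributed value lies within +/- k sigma of the mean."""
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2, 3):
    print(f"within +/-{k} sigma: {fraction_within(k):.4f}")
```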
MTBF to Reliability
Assuming a constant failure rate, i.e., exponentially distributed times to failure (standard practice)
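$$\text{Reliability} = e^{-\lambda t}$$
$$\text{Probability of Failure} = 1 - e^{-\lambda t}$$
where
$\lambda$ = mean value for failures/unit time (= 1/MTBF)
$t$ = time period over which the reliability is calculated
Example:
if MTBF = 100 hours, then the average failure rate is $\lambda$ = 1/100 per hour, and the Probability of Failure over 200 hours of operation is
$$\text{Probability of Failure}(200\ \text{hrs}) = 1 - e^{-\frac{1}{100}\times 200} = 1 - e^{-2} \approx .865$$
$$\text{Reliability} \approx .135$$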
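The MTBF-to-reliability conversion as a small Python function, assuming a constant failure rate λ = 1/MTBF. The function name is illustrative; the call reproduces the 100-hour MTBF, 200-hour example.

```python
import math


def reliability(mtbf_hours: float, t_hours: float) -> float:
    """R(t) = exp(-lambda * t), with constant failure rate lambda = 1 / MTBF."""
    failure_rate = 1.0 / mtbf_hours
    return math.exp(-failure_rate * t_hours)


r = reliability(mtbf_hours=100.0, t_hours=200.0)
print(f"Reliability over 200 h = {r:.3f}, probability of failure = {1 - r:.3f}")
# Reliability over 200 h = 0.135, probability of failure = 0.865
```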
Behavior of Reliability and Failure Rate over Time
Specified Failure Rate | Time | Probability of Failure | Reliability
0.01 | 1 | 0.009950166 | 0.99005
(Figure: reliability curves for average failure rate = .01)
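To see how reliability decays at the table's specified failure rate of 0.01 per hour, a short sketch can tabulate R(t) = e^(−λt) at several times. Only the t = 1 h row comes from the table; the other time points are arbitrary choices, and `reliability_table` is a hypothetical helper.

```python
import math


def reliability_table(failure_rate, times):
    """Return (time, probability of failure, reliability) rows for a constant failure rate."""
    rows = []
    for t in times:
        r = math.exp(-failure_rate * t)
        rows.append((t, 1.0 - r, r))
    return rows


rows = reliability_table(0.01, [1, 10, 50, 100, 200, 300])
for t, pf, r in rows:
    print(f"t = {t:>3} h   P(failure) = {pf:.9f}   R = {r:.5f}")
```

At t = 100 h, which equals the MTBF for this rate, reliability is e^(−1), about 0.368: operating for a full MTBF gives only about a 37 percent chance of no failure.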
Parallel versus Series
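The two combination rules behind the examples that follow, assuming components fail independently: in series every component must work, while a parallel (redundant) group fails only if all of its components fail.

```latex
R_{\text{series}} = \prod_{i=1}^{n} R_i
\qquad
R_{\text{parallel}} = 1 - \prod_{i=1}^{n} \left(1 - R_i\right)
```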
Parallel versus Series - Example
Reliability series a & b = (.99) × (.99) = .9801
(Figure: components a and b, each with reliability .99, connected in series and in parallel)
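A minimal sketch of both rules applied to components a and b (reliability .99 each). The helper names `series` and `parallel` are hypothetical, and failures are assumed independent.

```python
import math


def series(*rs: float) -> float:
    """All components must work: multiply the reliabilities."""
    return math.prod(rs)


def parallel(*rs: float) -> float:
    """Group works if any component works: 1 - product of failure probabilities."""
    return 1.0 - math.prod(1.0 - r for r in rs)


print(f"series a & b:   {series(0.99, 0.99):.4f}")
print(f"parallel a & b: {parallel(0.99, 0.99):.4f}")
```

Starting from one component (failure probability 1 percent), two in parallel cut the failure probability to 0.01 percent, while two in series roughly double it to about 2 percent.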
To calculate reliability of more complex systems
Example System Reduction
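(Figure: block diagram of components A (.99), B (.80), C (.80), D (.99), E (.80), and F (.80); B and C are in parallel, as are E and F)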
To reduce:
• Combine B and C: 1 - (1 - .8)(1 - .8) = .96
• Combine E and F: 1 - (1 - .8)(1 - .8) = .96
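The reduction can be carried through to a single system reliability. This sketch assumes that A, the B/C pair, D, and the E/F pair are all in series (the order of series blocks does not change the product) and that failures are independent; that topology is read from the diagram labels and is an assumption.

```python
import math


def series(*rs: float) -> float:
    """All blocks must work: multiply the reliabilities."""
    return math.prod(rs)


def parallel(*rs: float) -> float:
    """Block works if any branch works: 1 - product of branch failure probabilities."""
    return 1.0 - math.prod(1.0 - r for r in rs)


R = {"A": 0.99, "B": 0.80, "C": 0.80, "D": 0.99, "E": 0.80, "F": 0.80}

bc = parallel(R["B"], R["C"])            # combine B and C -> .96
ef = parallel(R["E"], R["F"])            # combine E and F -> .96
system = series(R["A"], bc, R["D"], ef)  # assumed: remaining blocks all in series

print(f"B||C = {bc:.2f}, E||F = {ef:.2f}, system = {system:.4f}")
```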
Using reliability as a driver in system design
Component Versus System Redundancy
(Figure: example coolant system, Pump → Evap → Load, shown with component redundancy and with system redundancy; the system-redundant version adds a complete System B with Pump B and Evap B)
• Component redundancy
– Pros: lowest cost and weight; lowest system impact; best when one component has a significantly higher failure rate than the other components
– Cons: does not provide full coverage
• System redundancy
– Pros: best possible coverage
– Cons: highest cost and weight; highest system impact; may not achieve the full effect if there are common-mode failures in supporting systems (power, instrumentation) or a common failure location (a single failure takes out both systems, as on Apollo 13)
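A rough numerical comparison for the coolant example, with hypothetical reliabilities: the pump (0.90) is the weak link and the evaporator is 0.99. Component redundancy is taken here to mean adding a second pump only; system redundancy duplicates the whole pump and evaporator loop. A shared power supply (0.995, also hypothetical) illustrates the common-mode caveat. Failures are otherwise assumed independent, and the load is not modeled.

```python
import math


def series(*rs: float) -> float:
    """All blocks must work: multiply the reliabilities."""
    return math.prod(rs)


def parallel(*rs: float) -> float:
    """Block works if any branch works (independent branches assumed)."""
    return 1.0 - math.prod(1.0 - r for r in rs)


PUMP, EVAP = 0.90, 0.99  # hypothetical component reliabilities
POWER = 0.995            # hypothetical power supply shared by both loops

single_loop = series(PUMP, EVAP)
component_redundancy = series(parallel(PUMP, PUMP), EVAP)  # second pump only
system_redundancy = parallel(single_loop, single_loop)     # second complete loop
# A common-mode dependency sits in series with everything it supports
system_with_shared_power = series(POWER, system_redundancy)

for name, r in [
    ("single loop", single_loop),
    ("component redundancy", component_redundancy),
    ("system redundancy", system_redundancy),
    ("system redundancy + shared power", system_with_shared_power),
]:
    print(f"{name:<34} {r:.4f}")
```

Doubling only the weak pump recovers most of the gain with far less added hardware; the full second loop does slightly better, but the shared supply caps it below 0.995 no matter how reliable the loops are.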
Assumed Reliability=1.0
• Margin: design capability exceeds requirements
• Factor of safety