Lecture 9 - System Reliability


Overview

• What is reliability ?
• What is MTBF ?
• How do you calculate reliability from MTBF ?
• Accelerated testing & Historical analogy
• How do you combine reliability of components
– In parallel
– In series
• Reducing a complex system
• Using reliability to drive system design
• Components traditionally assumed to have reliability = 1
• These topics are important for the Fundamentals of
Engineering (FE) and Professional Engineer (PE)
examinations

What is Reliability ?
• Reliability is the probability that a system will successfully
complete its function
• Probability of Failure = 1-Reliability
• If you can observe a system for a sufficiently large number of
operations you can directly compute reliability
$$\text{Reliability} = \frac{\text{number of successful operations}}{\text{number of total operations}}$$
– To get statistical significance would have to observe the
system long enough for many failures to occur
• 30 occurrences is a good working rule of thumb
• This is hard to do in complex systems with many components
because you have to observe all of the components failing
• particularly difficult if they are in some redundant arrangement
and you have to observe multiple failures or combinations of
failures
• A better approach is to compute the reliability from the mean time
between failures (MTBF) of each component and combine them
to form a system reliability

MTBF
• MTBF is the Mean Time Between Failures
• Relatively easy to determine on a component basis from
tests in which components are run continuously or for fixed
periods of time
– Best to perform multiple tests and get a significant number
of failures to get a distribution
– Compute a mean and standard deviation (sigma)
• From this type of test we can determine a failure rate per
number of hours of operation

[Figures: Tire/Road simulator tester, Shock absorber tester, Multi-Axis Load Tester]
Source: http://www.actminc.com/products/Machines.html

Accelerated Testing and Historical Analogy

• What if
– You are early in design and components haven’t been built yet ?
– You can’t afford to wait around to accumulate failure data ?
– You don’t have enough components to test ?
• You can perform accelerated life testing
– For electronics this is performed at high temperature
• MIL-STD-810 describes tests
• Good overview at www.weibull.com
• MIL-HDBK-217 has models for predicting electronics failure
based on Parts Stress Analysis and Parts Count Analysis
– Highly Accelerated Life Testing (HALT) is an extreme test to
determine the weakest part of a design
– Highly Accelerated Stress Screening (HASS) or Environmental
Stress Screening (ESS) is a less rigorous environmental test used
to screen every production part
• You can use historical reliabilities for like components
– NASA has a database
– Aerospace Corp has a database
– Various published articles

MIL-HDBK-217 Prediction Model
• Typical MIL-217 Failure Rate Model
• A sample MIL-217 failure rate model for a simple semiconductor component
is shown below. Many components, especially microcircuits, have
significantly different and more complex models.
• Failure rate = pib * piT * piA * piR * piS * piC * piQ * piE failures per million
hours
• Where:
pib = Base failure rate
piT = Temperature factor
piA = Application factor (linear, switching, etc)
piR = Power rating factor
piS = Electrical (voltage) Stress factor
piC = Contact construction factor
piQ = Quality factor
piE = Operating environment factor
• The above listed pi factors are based on a simple component and are
shown for example. There are also pi factors for items such as learning
factor, die complexity factor, manufacturing process factor, device
complexity factor, programming cycles factor, package type factor, etc.
Each component or part group and its associated subgroup has a base
failure rate plus numerous pi factor tables, unique to that component or part,
that list factors that are used in the model to adjust the base failure rate.
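
As a rough illustration of how such a model is evaluated, the sketch below multiplies a base failure rate by a set of pi factors and converts the result to an MTBF. All numeric values are hypothetical placeholders chosen for the arithmetic, not entries from the MIL-HDBK-217 tables, and the function name `part_failure_rate` is an assumption made for this example.

```python
from math import prod

def part_failure_rate(base_rate, pi_factors):
    """Return failures per million hours: base rate times every pi factor."""
    return base_rate * prod(pi_factors.values())

# Hypothetical placeholder values, NOT taken from the handbook tables.
pi_factors = {
    "piT": 2.0,  # temperature factor
    "piA": 1.5,  # application factor
    "piR": 1.0,  # power rating factor
    "piS": 0.5,  # electrical (voltage) stress factor
    "piC": 1.0,  # contact construction factor
    "piQ": 2.0,  # quality factor
    "piE": 4.0,  # operating environment factor
}
base_rate = 0.001  # hypothetical base failure rate, failures per million hours

rate = part_failure_rate(base_rate, pi_factors)
mtbf_hours = 1e6 / rate  # the rate is per million hours, so MTBF = 1e6 / rate

print(f"failure rate = {rate:.4f} failures per million hours")
print(f"MTBF = {mtbf_hours:,.0f} hours")
```

In practice each factor would be looked up in the handbook table for the specific part type, quality level and environment.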

Source: http://www.reliabilityeducation.com/intro_mil217.html

MIL 217 Stress Example
• A solid tantalum fixed electrolytic capacitor, for example, has a MIL-217 model as follows:
• Failure rate = pib * piCV * piSR * piQ * piE failures per million hours
• Where:
pib = Base failure rate for component
piCV = Capacitance factor
piSR = Series resistance factor
piQ = Quality factor (quality levels of D, C, S, B, R, P, M, L, Lower)
piE = Environment factor
• MIL-217 has models for
– Microcircuits
– Discrete semi-conductors
– Tubes
– Lasers
– Resistors
– Capacitors
– Inductive devices
– Rotating devices (motors)
– Relays
– Switches
– Connectors
– Interconnection assemblies
– Meters, Quartz crystals, Lamps, fuses
Source: http://www.reliabilityeducation.com/intro_mil217.html

MIL-217 Parts Count Analysis from MIL-HDBK-217

Example of Historical Databases: Launch vehicle failures by vehicle and subsystem
Launch Vehicle Subsystem Failures, 1980–1999

Country Propulsion Avionics Separation Electrical Structural Other Unknown Total

U.S. 15 4 8 1 1 1 30

CIS/USSR 33 3 2 1 19 58

Europe 7 1 8

China 3 1 2 6

Japan 2 1 3

India 1 1 1 1 1 5

Israel 1 1

Brazil 2 2

N. Korea 1 1

Total 64 11 11 2 3 3 20 114

Source: http://www.aero.org/publications/crosslink/winter2001/03.html

Mean

$$\bar{X} = \frac{1}{N}\sum_{n=1}^{N} X_n$$

Variance of a population

$$\sigma_x^2 = \frac{1}{N}\sum_{n=1}^{N}\left(X_n - \bar{X}\right)^2$$

The standard deviation is the square root of the variance

Standard Deviation
• The standard deviation of a population is defined as

$$\sigma_x = S_N = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(X_n - \bar{X}\right)^2}$$

This is the equation to use if you have sampled 100 percent of the population

• The standard deviation of a sample is defined as

$$\sigma_x \approx S_N = \sqrt{\frac{1}{N-1}\sum_{n=1}^{N}\left(X_n - \bar{X}\right)^2}$$

This is the equation to use if you are dealing with a sample of the population and
trying to estimate the entire population’s characteristics. Use with failure test data.

From “Statistics For Dummies” by Deborah Rumsey

Caution: these probabilities are for normal distributions only. Not all data fits
a normal distribution. It may be log-normal, exponential, etc.,
which have different interpretations.

Standard Deviations In A Normal Distribution

The probability of a value being between mean plus 3 sigma and
mean minus 3 sigma in a normal distribution is 99.7%

Note: this is two sided (values can vary the same on either
side of the mean). Not all populations of data are two sided.
Source: Wikipedia

We can choose
• To compute the failure rate used in the reliability calculation based
on the
– Mean observed failure rate
– Mean observed failure rate + 1 sigma
– Mean observed failure rate + 2 sigma
– Mean observed failure rate + 3 sigma (most conservative)
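
The choices above can be sketched in a few lines of Python. The observed failure rates below are hypothetical test results invented for the example, and `conservative_failure_rate` is an assumed helper name. The sample standard deviation (the N-1 form) is used because the tests are a sample of the population.

```python
from statistics import mean, stdev

def conservative_failure_rate(observed_rates, n_sigma=0):
    """Mean observed failure rate plus n_sigma sample standard deviations."""
    return mean(observed_rates) + n_sigma * stdev(observed_rates)

# Hypothetical failure rates (failures per hour) from five repeated tests.
observed = [0.008, 0.010, 0.012, 0.009, 0.011]

for k in range(4):
    rate = conservative_failure_rate(observed, k)
    print(f"mean + {k} sigma: {rate:.5f} failures per hour")
```

A higher sigma multiple gives a higher assumed failure rate and therefore a lower, more conservative, predicted reliability.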

MTBF to Reliability
Assuming a constant failure rate, i.e. exponentially distributed times
between failures (standard practice)

$$\text{Reliability} = e^{-\lambda t}$$
$$\text{Probability of Failure} = 1 - e^{-\lambda t}$$

where
$\lambda$ = mean value for failures/unit time = 1/MTBF
$t$ = time period over which the reliability is calculated

Example: if MTBF = 100 hours then the average failure rate is 1/100 per hour, and
the Probability of Failure over 200 hours of operation would be

$$\text{Probability of Failure}_{200\,\text{hrs}} = 1 - e^{-\frac{1}{100} \times 200} = .865$$
$$\text{Reliability} = .135$$
Behavior of Reliability and Failure Rate over Time

Specified
Failure Rate   Time   Probability of failure   Reliability
0.01           1      0.009950166              0.99005
0.01           5      0.048770575              0.951229
0.01           10     0.095162582              0.904837
0.01           50     0.39346934               0.606531
0.01           100    0.632120559              0.367879
0.01           200    0.864664717              0.135335
0.01           300    0.950212932              0.049787
0.01           400    0.981684361              0.018316
0.01           500    0.993262053              0.006738

[Figure: Reliability Curves for average failure rate = .01; series: Failure Rate, Reliability; x-axis 0 to 600, y-axis 0 to 1.2]
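
The table values can be regenerated with a short loop. This is a minimal sketch that reuses the same failure rate and time points; the variable names are assumptions for the example.

```python
import math

failure_rate = 0.01  # failures per unit time, as in the table
times = [1, 5, 10, 50, 100, 200, 300, 400, 500]

# Each row: (failure rate, time, probability of failure, reliability)
rows = []
for t in times:
    rel = math.exp(-failure_rate * t)
    rows.append((failure_rate, t, 1.0 - rel, rel))

for rate, t, p_fail, rel in rows:
    print(f"{rate:<6} {t:<5} {p_fail:<12.9f} {rel:.6f}")
```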

Parallel versus Series

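$$\text{Reliability}_{\text{Series}\,A\&B} = (\text{Rel}_A) \times (\text{Rel}_B)$$

[Diagram: A and B connected in series]
System reliability decreases with more components in series

$$\text{Reliability}_{\text{Parallel}\,A\&B} = 1 - \left((1 - \text{Rel}_A) \times (1 - \text{Rel}_B)\right) = 1 - (\text{Fail}_A) \times (\text{Fail}_B)$$

[Diagram: A and B connected in parallel]
System reliability increases with components in parallel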

Parallel versus Series - Example
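$$\text{Reliability}_{\text{Series}\,a\&b} = (.99) \times (.99) = .9801$$

[Diagram: two .99 blocks in series]

$$\text{Reliability}_{\text{Parallel}\,a\&b} = 1 - (1 - .99) \times (1 - .99) = 1 - .0001 = .9999$$

[Diagram: two .99 blocks in parallel]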
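
The two combination rules generalize to any number of components. The sketch below is one possible way to write them in Python (the function names `series` and `parallel` are assumptions); it reproduces the two-component example and assumes component failures are independent.

```python
from math import prod

def series(reliabilities):
    """All components must work: multiply the reliabilities."""
    return prod(reliabilities)

def parallel(reliabilities):
    """Only one component must work: 1 - product of failure probabilities."""
    return 1.0 - prod(1.0 - r for r in reliabilities)

print(round(series([0.99, 0.99]), 4))    # 0.9801
print(round(parallel([0.99, 0.99]), 4))  # 0.9999
```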

To calculate reliability of more complex systems

• Reduce the system
• Combine all series elements by multiplying
the reliabilities
• Combine parallel elements by 1-(product
of the failure probabilities)
• Keep repeating the process until reduced
to a single block

Example System Reduction
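[Diagram: A (.99) in series with the parallel pair B (.80) and C (.80), then D (.99), then the parallel pair E (.80) and F (.80)]

To reduce
Combine B and C: 1-(1-.8)(1-.8) = .96
Combine E and F: = .96

[Diagram: A (.99), BC (.96), D (.99), EF (.96) in series]

Combine remaining in series
= A*BC*D*EF = .99*.96*.99*.96 = .903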
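
The reduction procedure lends itself to a recursive sketch. The nested-tuple representation and the name `reduce_system` below are assumptions made for illustration, not a standard notation: a block is either a plain reliability value or a ("series" | "parallel", [children]) pair.

```python
from math import prod

def reduce_system(block):
    """Reduce a nested series/parallel block description to one reliability."""
    if isinstance(block, (int, float)):
        return float(block)
    kind, children = block
    values = [reduce_system(child) for child in children]
    if kind == "series":
        return prod(values)
    if kind == "parallel":
        return 1.0 - prod(1.0 - v for v in values)
    raise ValueError(f"unknown block type: {kind}")

# A, (B parallel C), D, (E parallel F) all in series, as in the example
system = ("series", [
    0.99,                          # A
    ("parallel", [0.80, 0.80]),    # B, C
    0.99,                          # D
    ("parallel", [0.80, 0.80]),    # E, F
])

print(round(reduce_system(system), 3))  # 0.903
```

Deeper nesting is handled the same way, since each inner group is reduced to a single value before its parent is combined.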

Using reliability as a driver in system design

• Reliability may be a driver in system design in several ways
– You may decide to add redundancy to
components that drive system reliability low
– You may add redundant systems where
component redundancy is not enough
– You may choose to design for maintenance or
repair of lower reliability components
– You may architect the system to fail safe on
failures of lower reliability components

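Component Versus System Redundancy

Example: a coolant system
[Diagram, baseline: Pump and Evap serving the Load]
[Diagram, Component Redundancy: Pump A and Pump B feeding a single Evap serving the Load]
[Diagram, System Redundancy: System A (Pump A, Evap A) and System B (Pump B, Evap B) serving the Load]
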
Component Redundancy
Pros:
– Lowest cost, weight
– Lowest system impact
– Best when one component has a significantly higher failure rate than other
components
Cons:
– Does not provide full coverage

System Redundancy
Pros:
– Best possible coverage
Cons:
– Highest cost, weight
– Highest system impact
– May not achieve full effect if there are common mode failures in supporting
systems (power, instrumentation) or the systems are subject to a common failure
location (a single failure takes out both systems, as on Apollo 13)
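
To make the comparison concrete, the sketch below evaluates the coolant example with hypothetical reliabilities (pump 0.90, evaporator 0.99, chosen only for illustration) and assumes all failures are independent, so it does not capture the common mode failures noted above.

```python
# Hypothetical component reliabilities, chosen only for illustration.
pump = 0.90
evap = 0.99

# Baseline: one pump and one evaporator in series.
baseline = pump * evap

# Component redundancy: two pumps in parallel, then one evaporator in series.
redundant_pumps = 1.0 - (1.0 - pump) ** 2
component_redundancy = redundant_pumps * evap

# System redundancy: two complete pump + evaporator strings in parallel.
system_redundancy = 1.0 - (1.0 - baseline) ** 2

print(f"baseline:             {baseline:.4f}")
print(f"component redundancy: {component_redundancy:.4f}")
print(f"system redundancy:    {system_redundancy:.4f}")
```

With these numbers, duplicating only the weaker pump recovers most of the benefit (about 0.980 versus 0.988 for full system redundancy), which is consistent with component redundancy being most attractive when one component dominates the failure rate.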

Assumed Reliability=1.0

• It is not uncommon to assume certain types of systems and devices
have reliability = 1.0, i.e. they are assumed to never fail. Typical
systems and their rationale are:
• Primary and Secondary Structure – rationale is design/analysis/test
demonstrating design capacity above design limit load*factor of
safety
– This is also called “Design For Minimum Risk or DFMR”
• Pyrotechnic devices such as stage separation mechanisms in
launch vehicles, ejection seats in aircraft or automobile airbag
deployments: when properly designed these devices essentially
never fail
– Requires testing to establish minimum energy to initiate the
device and then demonstrating factor of safety above that in
initiating devices
– Initiating systems (electrical circuits that fire the initiators)
typically have reliability less than one and are the most common
failure source in pyrotechnics

[Figure: Relationship between Design Limit Load, Design Capability, Factor
of Safety and Design Margin. Labels: Design Limit Load; Factor of Safety;
Margin (Design Capability exceeds requirements); Design Capability does
not meet requirements]

Reliability =1 continued
• Pressure Vessels and plumbing – when designed/tested/analyzed
to a factor of safety (typically 4) above the design limit load.
Fatigue/fracture must also be tracked due to pressure cycles
• Fittings or joints as well as active components such as valves can
have a reliability associated with leakage, blockage or failure to function
• Wiring –
– when properly installed with
• protection against mechanical damage
• derated for current and numbers of wires in a bundle
• fusing/circuit breakers to prevent shorts in loads from
stressing the insulation system
– Often wiring is tested with high potential (HiPot), around 500V
placed across the wire, to find any breakdowns in insulation
– HiPot tests can occur prior to and sometimes after installation.
Such testing must be carefully performed, with enough
isolation to avoid paths that overstress other electrical
equipment

Any Questions ?
