Lecture 10 Reliability
Reliability
October 26, 2004
2
Today
DFDC (Design for a Developing Country)
HW November 2
detailed design
Parts list
Trade-off
Midterm November 4
Factory Visit November 16th
3
Midterm
Presentation purpose: a midcourse correction
Less than 15 minutes, with 5 minutes of discussion
Approx. 7 PowerPoint slides; all should participate in the presentation
Show what you have done
Show what you are going to do
Discuss issues, barriers, and plans for overcoming them (procedural, team, subject matter, etc.)
Scored on originality, candor, thoughtfulness, etc., not on total amount accomplished
Schedule today from 1:00 to 4:00 (speaker at 4:00 PM)
4
Reliability
The probability that no (system) failure will occur
in a given time interval
A reliable system is one that meets the specifications.
Do you accept this?
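The probability definition on this slide can be written compactly. In the sketch below, T_f (the time of first failure) and F (its cumulative distribution) are notation assumed here for illustration; f is the failure probability density.

```latex
R(T) = P(T_f > T) = 1 - F(T) = 1 - \int_0^{T} f(t)\,dt
```

Read this way, reliability is always tied to a stated interval: the same system has a different R for a different T.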
5
What do Reliability Engineers Do?
Implement Reliability Engineering
Programs across all functions
Engineering
Research
manufacturing
Testing
Packaging
field service
6
Reliability as a Process Module
INPUT
Reliability Goals
Schedule time
Budget Dollars
Test Units
Design Data
RELIABILITY ASSURANCE MODULE (Internal Methods)
Design Rules
Components Testing
Subsystem Testing
Architectural Strategy
Life Testing
Prototype Testing
Field Testing
Reliability Predictions (models)
OUTPUT
Product Assurance
7
Early product failure
Strongest effect on customer satisfaction
A field day for competitors
The most expensive to repair
Why?
Rings through the entire production system
High volume
Long C/T (cycle time)
Examples from GE (but problem not confined to GE!)
GE Variable Power module for House Air Conditioning
GE Refrigerators
GE Cellular
8
Early Product Failure
Can be catastrophic for human life
Challenger, Columbia
Titanic
DC 10
Auto design
Aircraft Engine
Military equipment
9
# of components    System reliability (%)     System reliability (%)
in series          at component               at component
                   reliability = 99.999%      reliability = 99.99%
100                99.9                       99.01
250                99.75                      97.53
500                99.50                      95.12
1000               99.01                      90.48
10,000             90.48                      36.79
100,000            36.79                      0.01
Reliability as a function of System Complexity
Why computers made of tubes (or discrete transistors)
cannot be made to work
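The table values follow from multiplying component reliabilities together. A minimal sketch, assuming identical components that fail independently (the function name is illustrative, not from the slides):

```python
def series_reliability(component_reliability: float, n_components: int) -> float:
    """Reliability of n identical, independent components in series.

    A series system works only if every component works, so the
    component reliabilities multiply.
    """
    return component_reliability ** n_components


# Print system reliability (%) for the two component reliabilities on the slide.
for n in (100, 250, 500, 1000, 10_000, 100_000):
    r_high = series_reliability(0.99999, n)
    r_low = series_reliability(0.9999, n)
    print(f"{n:>7,d}  {100 * r_high:7.2f}  {100 * r_low:7.2f}")
```

The exponent is what makes complexity so costly: a tenfold increase in part count raises the system reliability to the tenth power.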
10
Three Classifications of Reliability Failure
Type                             Old remedy (repair mentality)
Early (infant mortality)         Burn-in
Wearout (physical degradation)   Maintenance
Chance (overstress)              In-service testing
11
Bathtub Curve
[Figure: bathtub curve. Failure rate (#/million hours) versus time, in three regions]
Infant mortality
Useful life: no memory, no improvement, no wear-out, random causes
Wear out
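One way to reproduce the bathtub shape numerically is to add three failure-rate contributions: a decreasing one, a constant one, and an increasing one. The sketch below uses Weibull hazard rates for the first and last; that modeling choice and every parameter value are assumptions made for illustration, not data from the lecture.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Weibull hazard rate h(t) = (shape/scale) * (t/scale)**(shape - 1).

    shape < 1 gives a decreasing rate, shape > 1 an increasing rate.
    Requires t > 0.
    """
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t: float) -> float:
    """Total failure rate per hour as the sum of three contributions."""
    early = weibull_hazard(t, shape=0.5, scale=2_000.0)     # infant mortality
    chance = 1.0e-4                                          # constant, no memory
    wearout = weibull_hazard(t, shape=5.0, scale=20_000.0)  # wear out
    return early + chance + wearout


for hours in (10, 100, 1_000, 5_000, 20_000, 30_000):
    print(f"{hours:>6d} h  {bathtub_hazard(hours):.6f} failures/hour")
```

The printed rates fall at first, flatten in mid-life, and rise again late in life.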
12
Reliability
[Chart: probability of dying in the next year (deaths per 1000) versus age, 0 to 90 years]
From the Statistical Bulletin 79, no 1, Jan-Mar 1998
13
Early failure causes or infant mortality
(Occur at the beginning of life and then disappear)
Manufacturing Escapes
workmanship/handling
process control
materials
contamination
Improper installation
14
Chance Failures
(Occur throughout the life of a product at a constant rate)
Insufficient safety factors in design
Higher than expected random loads
Human errors
Misapplication
Developing world concerns
15
Wear-out
(Occur late in life and increase with age)
Aging
degradation in strength
Materials Fatigue
Creep
Corrosion
Poor maintenance
Developing World Concerns
16
Failure Types
Catastrophic
Degradation
Drift
Intermittent
17
Failure Effects
(What customer experiences)
Noise
Erratic operation
Inoperability
Instability
Intermittent operation
Impaired Control
Impaired operation
Roughness
Excessive effort requirements
Unpleasant or unusual odor
Poor appearance
18
Failure Modes
Cracking
Deformation
Wear
Corrosion
Loosening
Leaking
Sticking
Electrical shorts
Electrical opens
Oxidation
Vibration
Fracturing
19
Reliability Remedies
Failure type   Remedy
Early          Quality manufacture / robust design
Wearout        Physically-based models, preventative maintenance, robust design (FMEA)
Chance         Tight customer linkages, testing, HAST
20
Reliability
semi-empirical formulae
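Wear out:
$$f(T) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(T-M)^2 / (2\sigma^2)}$$
(M = mean life, \sigma = standard deviation)
Chance failure:
$$f(T) = k\, e^{-kT} = \frac{1}{m}\, e^{-T/m}$$
k = constant failure rate
m = MTBF
Early failure: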
) ( 1
2
1
) ( ) (
k T
e k T k T f
=pdf
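Two of the densities on this slide have standard closed forms if wear-out is read as a normal distribution and chance failure as an exponential distribution with MTBF m. A minimal sketch under that reading; the function names and the numbers in the usage lines are illustrative assumptions:

```python
import math


def chance_pdf(t: float, mtbf: float) -> float:
    """Exponential failure density f(T) = (1/m) * exp(-T/m)."""
    return math.exp(-t / mtbf) / mtbf


def chance_reliability(t: float, mtbf: float) -> float:
    """Probability of no chance failure up to time T: R(T) = exp(-T/m)."""
    return math.exp(-t / mtbf)


def wearout_pdf(t: float, mean_life: float, sigma: float) -> float:
    """Normal failure density centered on the mean wear-out life."""
    z = (t - mean_life) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# With a constant failure rate, only about 37% of units survive to the MTBF.
print(chance_reliability(1_000.0, mtbf=1_000.0))
# The wear-out density peaks at the mean life.
print(wearout_pdf(5_000.0, mean_life=5_000.0, sigma=500.0))
```

The first result is a useful reminder that MTBF is not a guaranteed life: under a constant failure rate most units have failed by the time T equals m.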
21
Failures vs. Time as a Function of Stress
[Figure: failures versus time, with separate curves for high stress, medium stress, and low stress]
22
Highly Accelerated Stress Testing (HAST)
Test to Failure
Fix Failed component
Continue to Test
Appropriate for developing world?
23
Duane Plot
Reinertsen, p. 237
[Figure: Duane plot. Log failures per 100 hours versus log cumulative operating hours. Plotted points show actual reliability; a predicted line is extended toward the required reliability at introduction.]
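A Duane plot is a straight-line fit on log-log axes, so the prediction can be sketched with ordinary least squares. The version below assumes the usual Duane form, cumulative failure rate = K * (cumulative hours)^(-alpha), and works in failures per hour rather than per 100 hours; the test data are hypothetical, not taken from the lecture.

```python
import math


def fit_duane(cum_hours, cum_failures):
    """Fit log(N/t) = log(K) - alpha * log(t) by least squares.

    Returns (K, alpha), where N/t is the cumulative failure rate.
    """
    xs = [math.log(t) for t in cum_hours]
    ys = [math.log(n / t) for t, n in zip(cum_hours, cum_failures)]
    count = len(xs)
    x_mean = sum(xs) / count
    y_mean = sum(ys) / count
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), -slope


def predicted_cum_rate(k: float, alpha: float, hours: float) -> float:
    """Extrapolate the fitted line to a later cumulative test time."""
    return k * hours ** (-alpha)


# Hypothetical test log: cumulative hours and cumulative failures observed.
hours = [100, 300, 1_000, 3_000]
failures = [8, 16, 32, 60]
k, alpha = fit_duane(hours, failures)
print(f"K = {k:.3f}, growth slope alpha = {alpha:.3f}")
print(f"predicted cumulative rate at 10,000 h: {predicted_cum_rate(k, alpha, 10_000):.5f} per hour")
```

Comparing the extrapolated rate with the required rate at introduction shows whether the current test-and-fix pace is enough.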
24
Integration into the Product Development Process
FMEA- Failure Modes and Effects Analysis
[Flow diagram: FMEA in the product development process]
Inputs: customer requirements; baseline data from previous products
Brainstorm potential failures (probabilities developed through analysis)
Summarize results (FMEA)
Uses of the FMEA: use at design reviews; feed results to the risk assessment process; develop failure compensation provisions
Update FMEA: test activity uncovers new failure modes; failure probabilities developed through test/field data; baseline data from previous products
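One common way to summarize FMEA results is a risk priority number, RPN = severity x occurrence x detection, with each factor scored from 1 to 10. That convention is not shown on the slide, so treat it as an assumption about typical practice; the scores below are hypothetical and only the failure mode names come from this lecture.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # 1 = negligible effect, 10 = hazardous
    occurrence: int  # 1 = very unlikely, 10 = almost certain
    detection: int   # 1 = almost surely caught, 10 = undetectable

    @property
    def rpn(self) -> int:
        """Risk priority number: higher means address it sooner."""
        return self.severity * self.occurrence * self.detection


# Hypothetical scores for three failure modes.
modes = [
    FailureMode("Cracking", severity=9, occurrence=2, detection=6),
    FailureMode("Corrosion", severity=6, occurrence=5, detection=4),
    FailureMode("Leaking", severity=5, occurrence=6, detection=2),
]

ranked = sorted(modes, key=lambda mode: mode.rpn, reverse=True)
for mode in ranked:
    print(f"{mode.name:<10} RPN = {mode.rpn}")
```

Updating the scores as test and field data arrive is one concrete form of the "Update FMEA" step.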
25
Risk Assessment process
Assess risk
Program Risk
Market Risk
Technology Risk
Reliability Risk
Systems Integration Risk
Devise mitigation Strategy
Re-assess
26
Fault Tree analysis
Top event: Seal regulator valve fails
Failure events in the tree (numbered 1 to 9 in the diagram; event 1 continues on the next page):
Valve fails open when commanded closed (1)
Fails to meet response time (appears three times in the tree)
Excessive leakage
Regulates high
Regulates low
Fails closed when commanded open
Excessive hysteresis
Excessive port leakage
Excessive case leakage
27
Fault Tree analysis (cont)
Event 1: Valve fails open when commanded closed
Mechanical failure of solenoid
Electrical failure of solenoid
Open circuit
Coil short
Insulation
Solder joint failure
Wire broken
Corrosion
Armature seals
Material selection
Wear
Material selection
Contamination
Valve orientation
Insufficient filtering
Wire broken
Transient electromechanical force
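Once basic-event probabilities are estimated, a fault tree can be evaluated from the bottom up. The sketch below assumes independent events and uses OR gates throughout; the grouping of causes under each gate and every probability value are hypothetical, chosen only to show the arithmetic.

```python
from math import prod


def or_gate(probabilities):
    """Probability that at least one independent input event occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)


def and_gate(probabilities):
    """Probability that all independent input events occur together."""
    return prod(probabilities)


# Hypothetical basic-event probabilities over one mission.
p_open_circuit = or_gate([1e-4, 2e-4])            # solder joint failure, wire broken
p_electrical = or_gate([p_open_circuit, 5e-5])    # open circuit, coil short
p_mechanical = or_gate([1e-4, 3e-4, 2e-4])        # corrosion, wear, contamination
p_valve_fails_open = or_gate([p_mechanical, p_electrical])

print(f"P(valve fails open when commanded closed) = {p_valve_fails_open:.6f}")
```

An AND gate is included because redundancy appears in a fault tree as an AND: the branch fails only if every redundant path fails.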
28
FMEA
29
FMEA Root Cause Analysis
30
Fault Tree Analysis: example
Example: A solar cell driven LED
31
Reliability Management
Redundancy
Examples
Computers
memory chips?
Aircraft
What are the problems with this approach?
1. Design inelegance
expensive
heavy
slow
complex
2. Suboptimization
Can take the eye off the ball of improving component and system reliability by reducing defects
Where should the redundancy be allocated?
system
subsystem
board
chip
device
software module
operation
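The allocation question can be explored numerically. The sketch below compares duplicating a whole series chain with duplicating each component inside it, assuming independent failures and perfect switching between copies; the reliability 0.90 and chain length 10 are hypothetical values.

```python
def parallel(r: float, copies: int) -> float:
    """Redundant copies fail only if every copy fails."""
    return 1.0 - (1.0 - r) ** copies


def series(r: float, n: int) -> float:
    """A chain of n identical components works only if all of them work."""
    return r ** n


r, n = 0.90, 10

# Option A: build the whole 10-component system twice.
system_level = parallel(series(r, n), copies=2)

# Option B: duplicate each component, then chain the 10 redundant pairs.
component_level = series(parallel(r, copies=2), n)

print(f"no redundancy:              {series(r, n):.4f}")
print(f"system-level redundancy:    {system_level:.4f}")
print(f"component-level redundancy: {component_level:.4f}")
```

Both options use the same number of parts, yet redundancy at the lower level gives the higher reliability under these assumptions. In practice the switching and voting hardware needed at each level adds its own failure modes, which is part of the cost and complexity listed above.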
32
Other best practices
Fewer Components
Small Batch Size (why)
Better material selection
Parallel Testing
Starting Earlier
Module to systems test allocation
Predictive (Duane) testing
Look for past experience
emphasize re-use
over-design
e.g. power modules
Best: Understand the physics of the failure and model
e.g. Crack propagation in airframes or nuclear reactors
33
Other suggestions?