Reliability Notes
1. Introduction
   1. Why important
   2. Basic terms and models
   3. Common difficulties
   4. Modeling "physical acceleration"
   5. Common acceleration models
   6. Basic non-repairable lifetime distributions
   7. Basic models for repairable systems
   8. Evaluate reliability "bottom-up"
   9. Modeling reliability growth
   10. Bayesian methodology
2. Assumptions/Prerequisites
   1. Choosing appropriate life distribution
   2. Plotting reliability data
   3. Testing assumptions
   4. Choosing a physical acceleration model
   5. Models and assumptions for Bayesian methods
8. Assessing Product Reliability
1. Introduction [8.1.]
1. Why is the assessment and control of product reliability important? [8.1.1.]
1. Quality versus reliability [8.1.1.1.]
2. Competitive driving factors [8.1.1.2.]
3. Safety and health considerations [8.1.1.3.]
2. What are the basic terms and models used for reliability
evaluation? [8.1.2.]
1. Repairable systems and non-repairable populations - lifetime
distribution models [8.1.2.1.]
2. Reliability or survival function [8.1.2.2.]
3. Failure (or hazard) rate [8.1.2.3.]
4. "Bathtub" curve [8.1.2.4.]
5. Repair rate or ROCOF [8.1.2.5.]
3. What are some common difficulties frequently found with reliability data
and how are they overcome? [8.1.3.]
1. Censoring [8.1.3.1.]
2. Lack of failures [8.1.3.2.]
4. What is "physical acceleration" and how do we model it? [8.1.4.]
5. What are some common acceleration models? [8.1.5.]
1. Arrhenius [8.1.5.1.]
2. Eyring [8.1.5.2.]
3. Other models [8.1.5.3.]
6. What are the basic lifetime distribution models used for non-repairable
populations? [8.1.6.]
1. Exponential [8.1.6.1.]
2. Weibull [8.1.6.2.]
3. Extreme value distributions [8.1.6.3.]
4. Lognormal [8.1.6.4.]
5. Gamma [8.1.6.5.]
6. Fatigue life (Birnbaum-Saunders) [8.1.6.6.]
7. Proportional hazards model [8.1.6.7.]
7. What are some basic repair rate models used for repairable
systems? [8.1.7.]
1. Homogeneous Poisson Process (HPP) [8.1.7.1.]
2. Non-Homogeneous Poisson Process (NHPP) - power law [8.1.7.2.]
3. Exponential law [8.1.7.3.]
8. How can you evaluate reliability from the "bottom - up" (component failure
mode to system failure rates)? [8.1.8.]
1. Competing risk model [8.1.8.1.]
2. Series model [8.1.8.2.]
3. Parallel or redundant model [8.1.8.3.]
4. R out of N model [8.1.8.4.]
5. Standby model [8.1.8.5.]
6. Complex systems [8.1.8.6.]
9. How can you model reliability growth? [8.1.9.]
1. NHPP power law [8.1.9.1.]
2. Duane plots [8.1.9.2.]
3. NHPP exponential law [8.1.9.3.]
10. How can Bayesian methodology be used for reliability
evaluation? [8.1.10.]
2. Assumptions/Prerequisites [8.2.]
1. How do you choose an appropriate life distribution model? [8.2.1.]
1. Based on failure mode [8.2.1.1.]
2. Extreme value argument [8.2.1.2.]
3. Multiplicative degradation argument [8.2.1.3.]
4. Fatigue life (Birnbaum-Saunders) model [8.2.1.4.]
8.1. Introduction
This section introduces the terminology and models that will be used to
describe and quantify product reliability. The terminology, probability
distributions and models used for reliability analysis differ in many
cases from those used in other statistical applications.
Detailed contents of Section 1

1. Introduction
1. Why is the assessment and control of product reliability important?
1. Quality versus reliability
2. Competitive driving factors
3. Safety and health considerations
2. What are the basic terms and models used for reliability
evaluation?
1. Repairable systems, non-repairable populations and
lifetime distribution models
2. Reliability or survival function
3. Failure (or hazard) rate
4. "Bathtub" curve
5. Repair rate or ROCOF
3. What are some common difficulties frequently found with
reliability data and how are they overcome?
1. Censoring
2. Lack of failures
4. What is "physical acceleration" and how do we model it?
5. What are some common acceleration models?
1. Arrhenius
2. Eyring
3. Other models
6. What are the basic lifetime distribution models used for
non-repairable populations?
1. Exponential
2. Weibull
3. Extreme value distributions
4. Lognormal
5. Gamma
6. Fatigue life (Birnbaum-Saunders)
7. Proportional hazards model
7. What are some basic repair rate models used for repairable
systems?
1. Homogeneous Poisson Process (HPP)
2. Non-Homogeneous Poisson Process (NHPP) with
power law
3. Exponential law
8. How can you evaluate reliability from the "bottom - up"
(component failure mode to system failure rates)?
1. Competing risk model
2. Series model
3. Parallel or redundant model
4. R out of N model
5. Standby model
6. Complex systems
9. How can you model reliability growth?
1. NHPP power law
2. Duane plots
3. NHPP exponential law
10. How can Bayesian methodology be used for reliability
evaluation?
Shipping unreliable products can destroy a company's reputation

It takes a long time for a company to build up a reputation for reliability, and only a short time to be branded as "unreliable" after shipping a flawed product. Continual assessment of new product reliability, and on-going control of the reliability of everything shipped, is a critical necessity in today's competitive business arena.
A motion picture instead of a snapshot

But how many of these units still meet specifications after a week of operation? Or after a month, or at the end of a one-year warranty period? That is where "reliability" comes in. Quality is a snapshot at the start of life and reliability is a motion picture of the day-by-day operation. Time zero defects are manufacturing mistakes that escaped final test. The additional defects that appear over time are "reliability defects" or reliability fallout.
Any continuous PDF defined for non-negative time can be a lifetime distribution model

A lifetime distribution model can be any probability density function (or PDF) f(t) defined over the range of time from t = 0 to t = infinity. The corresponding cumulative distribution function (or CDF) F(t) is a very useful function, as it gives the probability that a randomly selected unit will fail by time t. The figure below shows the relationship between f(t) and F(t) and gives three descriptions of F(t).
Note that the PDF f(t) has only non-negative values and eventually either becomes 0 or
decreases towards 0. The CDF F(t) is monotonically increasing and goes from 0 to 1 as t
approaches infinity. In other words, the total area under the curve is always 1.
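The referenced figure is not reproduced here; the basic relationship between the PDF and the CDF that it illustrates can be written directly as

$$F(t) = \int_0^t f(s)\, ds, \qquad f(t) = \frac{dF(t)}{dt}, \qquad F(t) = P(\text{a randomly chosen unit fails by time } t).$$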
The Weibull model is a good example of a life distribution

The 2-parameter Weibull distribution is an example of a popular F(t). It has the CDF and PDF equations given by:

$$F(t) = 1 - e^{-(t/\alpha)^{\gamma}}, \qquad f(t) = \frac{\gamma}{\alpha}\left(\frac{t}{\alpha}\right)^{\gamma - 1} e^{-(t/\alpha)^{\gamma}}, \qquad t > 0,$$

where γ is the "shape" parameter and α is a scale parameter called the characteristic life.
Example: A company produces automotive fuel pumps that fail according to a Weibull
life distribution model with shape parameter γ = 1.5 and scale parameter 8,000 (time
measured in use hours). If a typical pump is used 800 hours a year, what proportion are
likely to fail within 5 years?
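As a quick numerical check of this example (a Python sketch; the Handbook itself works with Dataplot and EXCEL), the Weibull CDF can be evaluated at t = 5 years × 800 hours/year = 4000 hours:

import math

gamma_shape = 1.5      # Weibull shape parameter from the example
alpha_scale = 8000.0   # characteristic life in use-hours from the example
t = 5 * 800            # 5 years at 800 hours per year = 4000 hours

# Weibull CDF: F(t) = 1 - exp(-(t/alpha)^gamma)
fraction_failed = 1 - math.exp(-((t / alpha_scale) ** gamma_shape))
print(f"Proportion failing within 5 years: {fraction_failed:.3f}")  # about 0.30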
The reliability of the system is the product of the reliability functions of the components

... since both must survive in order for the system to survive. This building up to the system from the individual components will be discussed in detail when we look at the "Bottom-Up" method. The general rule is: to calculate the reliability of a system of independent components, multiply the reliability functions of all the components together.
The Bathtub Curve also applies (based on much empirical evidence) to Repairable Systems.
In this case, the vertical axis is the Repair Rate or the Rate of Occurrence of Failures
(ROCOF).
The Repair Rate (or ROCOF) is the mean rate of failures per unit time

The derivative of M(t), denoted m(t), is known as the Repair Rate or the Rate Of Occurrence Of Failures at time t, or ROCOF.

Models for N(t), M(t) and m(t) will be described in the section on Repair Rate Models.
8.1.3.1. Censoring
When not all units on test fail, we have censored data

Consider a situation where we are reliability testing n (non-repairable) units taken randomly from a population. We are investigating the population to determine if its failure rate is acceptable. In the typical test scenario, we have a fixed time T to run the units to see if they survive or fail. The data obtained are called Censored Type I data.

Censored Type I Data
During the T hours of test we observe r failures (where r can be any number from 0 to n). The
(exact) failure times are t1, t2, ..., tr and there are (n - r) units that survived the entire T hour
test without failing. Note that T is fixed in advance and r is random, since we don't know how
many failures will occur until the test is run. Note also that we assume the exact times of
failure are recorded when there are failures.
This type of censoring is also called "right censored" data, since the times of failure to the
right (i.e. larger than T) are missing.
Another (much less common) way to test is to decide in advance you want to see exactly r
failure times and then test until they occur. For example, you might put 100 units on test and
decide you want to see at least half of them fail. Then r = 50, but T is unknown until the 50th
fail occurs. This is called Censored Type II data.
Censored Type II Data
We observe t1, t2, ..., tr, where r is specified in advance. The test ends at time T = tr, and (n-r)
units have survived. Again we assume it is possible to observe the exact time of failure for
failed units.
Type II censoring has the significant advantage that you know in advance how many failure
times your test will yield - this helps enormously when planning adequate tests. However, an
open-ended random test time is generally impractical from a management point of view and
this type of testing is rarely seen.
Multicensored Data
In the most general case, every unit observed yields exactly one of the following three types of information:
● a run-time if the unit did not fail while under observation
● an exact time of failure
● an interval of time during which the unit failed (readout or interval data)
The units may all have different run-times and/or readout intervals.
Time to Fail:          tu = AF × ts
Failure Probability:   Fu(t) = Fs(t/AF)
Reliability:           Ru(t) = Rs(t/AF)
Each failure mode has its own acceleration factor

Note: Acceleration requires a stress-dependent physical process causing change or degradation that leads to failure. Therefore, it is unlikely that a single acceleration factor will apply to more than one failure mechanism. In general, different failure modes will be affected differently by stress and have different acceleration factors. Separate out different types of failure when analyzing failure data.
Failure data should be separated by failure mode when analyzed, if acceleration is relevant

Data from different stress cells have the same slope on probability paper (if there is acceleration)

Also, a consequence of the linear acceleration relationships shown above (which follow directly from "true acceleration") is the following:

    The shape parameter for the key life distribution models (Weibull, lognormal) does not change for units operating under different stresses. Plots on probability paper of data from different stress cells will line up roughly parallel.

These distributions, and probability plotting, will be discussed in later sections.
8.1.5.1. Arrhenius
The Arrhenius model predicts failure acceleration due to temperature increase

One of the earliest and most successful acceleration models predicts how time to fail varies with temperature. This empirically based model is known as the Arrhenius equation. It takes the form

$$t_f = A \exp\!\left(\frac{\Delta H}{kT}\right)$$

where T is temperature measured in degrees Kelvin (273.16 + degrees Centigrade) at the point where the failure process takes place and k is Boltzmann's constant (8.617 × 10^-5 in eV/K). The constant A is a scaling factor that drops out when calculating acceleration factors, while ΔH (pronounced "Delta H") is also called the activation energy, and is the critical parameter in the model.
The Arrhenius activation energy ΔH is all you need to know to calculate temperature acceleration

The value of ΔH depends on the failure mechanism and the materials involved, and typically ranges between .3 or .4 up to 1.5 or even higher. Acceleration factors between two temperatures increase exponentially as ΔH increases.

The acceleration factor between a higher temperature T2 and a lower temperature T1 is given by

$$AF = \exp\!\left[\frac{\Delta H}{k}\left(\frac{1}{T_1} - \frac{1}{T_2}\right)\right].$$
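As an illustrative sketch (the activation energy and temperatures below are made-up values, not from the text), the acceleration factor formula can be evaluated as follows:

import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann's constant in eV/K

def arrhenius_af(delta_h_ev, t1_celsius, t2_celsius):
    """Acceleration factor between a lower (use) temperature T1 and a
    higher (stress) temperature T2, both converted to Kelvin."""
    t1_k = 273.16 + t1_celsius
    t2_k = 273.16 + t2_celsius
    return math.exp((delta_h_ev / BOLTZMANN_EV) * (1.0 / t1_k - 1.0 / t2_k))

# Hypothetical example: activation energy 0.7 eV, use at 55 C, stress at 125 C
print(round(arrhenius_af(0.7, 55.0, 125.0), 1))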
8.1.5.2. Eyring
The Eyring model has a theoretical basis in chemistry and quantum mechanics

Henry Eyring's contributions to chemical reaction rate theory have led to a very general and powerful model for acceleration known as the Eyring Model. This model has several key features:
● It has a theoretical basis from chemistry and quantum mechanics.
● If a chemical process (chemical reaction, diffusion, corrosion,
Models with multiple stresses generally have no interaction terms - which means you can multiply acceleration factors due to different stresses

Note that the general Eyring model includes terms that have stress and temperature interactions (in other words, the effect of changing temperature varies, depending on the levels of other stresses). Most models in actual use do not include any interaction terms, so that the relative change in acceleration factors when only one stress changes does not depend on the level of the other stresses.

In models with no interaction, you can compute acceleration factors for each stress and multiply them together. This would not be true if the physical mechanism required interaction terms - but, at least to first approximations, it seems to work for most examples in the literature.
The Exponential Voltage Model
In some cases, voltage dependence fits an exponential model better.
Again, these are just simplified two stress Eyring models with the
appropriate choice of constants and functions of voltage.
The Electromigration Model

Electromigration is a semiconductor failure mechanism where open failures occur in metal thin film conductors due to the movement of ions toward the anode. This ionic movement is accelerated by high temperatures and high current density. The (modified Eyring) model takes the form

$$t_f = A\, J^{-n} \exp\!\left(\frac{\Delta H}{kT}\right)$$

with J denoting the current density.
The Coffin-Manson Mechanical Crack Growth Model

A modified Eyring model commonly used for failures caused by thermal cycling and mechanical fatigue takes the form

$$N_f = A\, f^{-\alpha}\, \Delta T^{-\beta}\, G(T_{max})$$

with
● Nf = the number of cycles to fail
● f = the cycling frequency
● ΔT = the temperature range during a cycle
and G(Tmax) is an Arrhenius term evaluated at the maximum temperature reached in each cycle.
8.1.6.1. Exponential
● Formulas and Plots
● Uses of the Exponential Distribution Model
● DATAPLOT and EXCEL Functions for the Exponential
Note that the failure rate reduces to the constant λ for any time. The exponential distribution is the only distribution to have a constant failure rate. Also, another name for the exponential mean is the Mean Time To Fail or MTTF, and we have MTTF = 1/λ.

The Cum Hazard function for the exponential is just the integral of the failure rate, or H(t) = λt.
The PDF for the exponential has the familiar shape shown below.
The Exponential distribution 'shape'

The Exponential CDF
Below is an example of typical exponential lifetime data displayed in Histogram form with
corresponding exponential PDF drawn through the histogram.
Histogram of Exponential Data

Dataplot Exponential probability plot
EXCEL also has built-in functions for the exponential PDF and CDF. The PDF is given by EXPONDIST(x, λ, false) and the CDF is given by EXPONDIST(x, λ, true). Putting in 100 for x and .01 for λ will give the same answers as given by Dataplot.
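For readers without EXCEL or Dataplot, the same two numbers can be checked with a short Python sketch (scipy's expon is parameterized by the scale 1/λ):

from scipy.stats import expon

lam = 0.01          # failure rate lambda
x = 100.0           # time at which to evaluate

dist = expon(scale=1.0 / lam)   # scipy uses scale = 1/lambda = MTTF
print(dist.pdf(x))  # exponential PDF at x, same role as EXPONDIST(x, lambda, FALSE)
print(dist.cdf(x))  # exponential CDF at x, same role as EXPONDIST(x, lambda, TRUE)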
8.1.6.2. Weibull
● Formulas and Plots
● Uses of the Weibull Distribution Model
● DATAPLOT and EXCEL Functions for the Weibull
where α is a scale parameter (the Characteristic Life), γ (gamma) is known as the Shape Parameter, and Γ is the Gamma function with Γ(N) = (N-1)! for integer N.

The Cum Hazard function for the Weibull is the integral of the failure rate, or

$$H(t) = \left(\frac{t}{\alpha}\right)^{\gamma}.$$
A more general 3 parameter form of the Weibull includes an additional waiting time
parameter µ (sometimes called a shift or location parameter). The formulas for the 3
parameter Weibull are easily obtained from the above formulas by replacing t by (t - µ)
wherever t appears. No failure can occur before µ hours, so the time scale starts at µ, and
not 0. If a shift parameter µ is known (based, perhaps, on the physics of the failure mode)
then all you have to do is subtract µ from all the observed failure times and/or readout
times and analyze the resulting shifted data with a 2 parameter Weibull.
NOTE: Various texts and articles in the literature use a variety of different symbols for the same Weibull parameters. For example, the characteristic life is sometimes called c (or ν = nu or η = eta) and the shape parameter is also called m (or β = beta). To add to the confusion, EXCEL calls the characteristic life β and the shape α, and some authors even parameterize the density function differently, using a scale parameter θ = α^γ.

Special Case: When γ = 1, the Weibull reduces to the Exponential Model, with α = 1/λ = the mean time to fail (MTTF).
Depending on the value of the shape parameter γ, the Weibull model can empirically fit a wide range of data histogram shapes. This is shown by the PDF example curves below.

Weibull data 'shapes'

From a failure rate model viewpoint, the Weibull is a natural extension of the constant failure rate exponential model, since the Weibull has a polynomial failure rate with exponent {γ - 1}. This makes all the failure rate curves shown in the following plot possible.

Weibull failure rate 'shapes'
For example, if T = 1000, γ = 1.5 and α = 5000, the above commands will produce a PDF of .000123 and a CDF of .08556.
NOTE: Whenever using Dataplot for Weibull analysis, you must start by setting
MINMAX equal to 1.
To generate Weibull random numbers from a Weibull with shape 1.5 and characteristic
life 5000 use the following commands:
SET MINMAX 1
LET GAMMA = 1.5
LET SAMPLE = WEIBULL RANDOM NUMBERS FOR I = 1 1 100
LET SAMPLE = 5000*SAMPLE
Next, to see how well these "random Weibull data points" actually fit a Weibull, we plot
them on "Weibull" paper to check whether they line up following a straight line. The
commands (following the last commands above) are:
X1LABEL LOG TIME;Y1LABEL CUM PROBABILITY
PLOT WEIBULL SAMPLE
The resulting plot is below. Note the log scale used is base 10
Dataplot
Weibull
Probability
Plot
EXCEL also has Weibull CDF and PDF built-in functions. EXCEL calls the shape parameter γ "alpha" and the characteristic life α "beta". The following command evaluates the Weibull PDF for time 1000 when the shape is 1.5 and the characteristic life is 5000:
WEIBULL(1000,1.5,5000,FALSE)
For the corresponding CDF
WEIBULL(1000,1.5,5000,TRUE)
The returned values (.000123 and .085559, respectively) are the same as calculated by
Dataplot.
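The same two values can also be reproduced outside of Dataplot or EXCEL; the sketch below uses scipy's weibull_min, which takes the shape γ directly and the characteristic life α as its scale:

from scipy.stats import weibull_min

gamma_shape = 1.5
alpha_scale = 5000.0
t = 1000.0

dist = weibull_min(gamma_shape, scale=alpha_scale)
print(round(dist.pdf(t), 6))  # approximately .000123
print(round(dist.cdf(t), 6))  # approximately .085559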
Extreme Value Distribution formulas and PDF shapes
If the x values are bounded below (as is the case with times of failure) then the limiting
distribution is the Weibull. Formulas and uses of the Weibull have already been discussed.
PDF Shapes for the (minimum) Extreme Value Distribution (Type I) are shown in the
following figure.
Because of this relationship, computer programs and graph papers designed for the extreme value distribution can be used to analyze Weibull data. The situation exactly parallels using normal distribution programs to analyze lognormal data, after first taking natural logarithms of the data points.
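The relationship referred to above can be stated explicitly (using µ for the extreme value location parameter and β for its scale parameter): if T has a Weibull distribution with shape γ and characteristic life α, then ln T has the (minimum) extreme value distribution with

$$\mu = \ln \alpha, \qquad \beta = \frac{1}{\gamma}.$$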
Data from an extreme value distribution will line up approximately along a straight line when this kind of plot is done. The slope of the line is an estimate of β, and the "y-axis" value on the line corresponding to the "x-axis" 0 point is an estimate of µ. For the graph above, these turn out to be very close to the actual values of β and µ.
8.1.6.4. Lognormal
● Formulas and Plots
● Uses of the Lognormal Distribution Model
● DATAPLOT and EXCEL Functions for the Lognormal
Note: If time to failure, tf, has a lognormal distribution, then the (natural) logarithm of time to failure has a normal distribution with mean µ = ln T50 and standard deviation σ. This makes lognormal data convenient to work with; just take natural logarithms of all the failure times and censoring times and analyze the resulting normal data. Later on, convert back to real time and lognormal parameters, using σ as the lognormal shape and T50 = e^µ as the (median) scale parameter.
Below is a summary of the key formulas for the lognormal.
Note: A more general 3-parameter form of the lognormal includes an additional waiting time parameter θ (sometimes called a shift or location parameter). The formulas for the 3-parameter lognormal are easily obtained from the above formulas by replacing t by (t - θ) wherever t appears. No failure can occur before θ hours, so the time scale starts at θ and not 0. If a shift parameter θ is known (based, perhaps, on the physics of the failure mode), then all you have to do is subtract θ from all the observed failure times and/or readout times and analyze the resulting shifted data with a 2-parameter lognormal.
Examples of lognormal PDF and failure rate plots are shown below. Note that lognormal shapes for small sigmas are very similar to Weibull shapes when the shape parameter γ is large, and large sigmas give plots similar to small Weibull γ's. Both distributions are very flexible and it is often difficult to choose which to use based on empirical fits to small samples of (possibly censored) data.
Lognormal data 'shapes'

Lognormal failure rate 'shapes'

Dataplot lognormal probability plot
Finally, we note that EXCEL has a built-in function to calculate the lognormal CDF. The command is =LOGNORMDIST(5000,9.903487553,0.5) to evaluate the CDF of a lognormal at time T = 5000 with σ = .5, T50 = 20,000 and ln T50 = 9.903487553. The answer returned is .002781. There is no lognormal PDF function in EXCEL. The normal PDF can be used as follows:
=(1/5000)*NORMDIST(8.517193191,9.903487553,0.5,FALSE)
where 8.517193191 is ln 5000 and "FALSE" is needed to get PDF's instead of CDF's. The
answer returned is 3.42E-06.
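An equivalent check in Python (scipy's lognorm takes σ as its shape parameter and the median T50 as its scale):

from scipy.stats import lognorm

sigma = 0.5
t50 = 20000.0
t = 5000.0

dist = lognorm(sigma, scale=t50)
print(round(dist.cdf(t), 6))  # approximately .002781
print(dist.pdf(t))            # approximately 3.42E-06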
8.1.6.5. Gamma
● Formulas and Plots
● Uses of the Gamma Distribution Model
● DATAPLOT and EXCEL Functions for the Gamma
The exponential is a special case of the gamma

Note: When a = 1, the gamma reduces to an exponential distribution with b = λ.

Another well-known statistical distribution, the Chi Square, is also a special case of the gamma. A Chi Square distribution with n degrees of freedom is the same as a gamma with a = n/2 and b = .5 (or, equivalently, a scale of 1/b = 2).
The following plots give examples of gamma PDF, CDF and failure rate shapes.
Shapes for Gamma data

Gamma CDF shapes

Gamma failure rate shapes
EXCEL also has built in functions to evaluate the gamma pdf and cdf. The syntax is:
=GAMMADIST(t,a,1/b,FALSE) for the PDF
=GAMMADIST(t,a,1/b,TRUE) for the CDF
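The scipy equivalent of these EXCEL calls (scipy's gamma takes the shape a and a scale equal to 1/b) would look like this; the parameter values below are placeholders, not values from the text:

from scipy.stats import gamma

a = 2.0        # shape parameter (placeholder value)
b = 0.5        # rate parameter (placeholder value); scale = 1/b
t = 3.0

dist = gamma(a, scale=1.0 / b)
print(dist.pdf(t))  # same role as =GAMMADIST(t, a, 1/b, FALSE)
print(dist.cdf(t))  # same role as =GAMMADIST(t, a, 1/b, TRUE)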
Formulas and shapes for the Fatigue Life model

The PDF, CDF, mean and variance for the Birnbaum-Saunders Distribution are shown below. The parameters are γ, a shape parameter, and µ, a scale parameter. These are the parameters used in Dataplot, but there are other choices also common in the literature (see the parameters used for the derivation of the model).
PDF shapes for the model vary from highly skewed and long tailed (small gamma values) to
nearly symmetric and short tailed as gamma increases. This is shown in the figure below.
Since there are many cycles, each lasting a very short time, we can replace the discrete
number of cycles N needed to reach failure by the continuous time tf needed to reach failure.
The cdf F(t) of tf is given by

$$F(t) = \Phi\!\left[\frac{1}{\alpha}\left(\sqrt{\frac{t}{\beta}} - \sqrt{\frac{\beta}{t}}\right)\right].$$

Here Φ denotes the standard normal cdf. The parameters α and β are an alternative way of writing the Birnbaum-Saunders distribution that is often used (α = γ and β = µ, as compared to the way the formulas were parameterized earlier in this section).
Note:
The critical assumption in the derivation, from a physical point of view, is that the crack
growth during any one cycle is independent of the growth during any other cycle. Also, the
growth has approximately the same random distribution, from cycle to cycle. This is a very
different situation from the proportional degradation argument used to derive a log normal
distribution model, where the rate of degradation at any point in time depends on the total
amount of degradation that has occurred up to that time.
This kind of physical degradation is consistent with Miner's Rule

The Birnbaum-Saunders assumption, while physically restrictive, is consistent with a deterministic model from materials physics known as Miner's Rule (Miner's Rule implies that the damage that occurs after n cycles, at a stress level that produces a fatigue life of N cycles, is proportional to n/N). So, when the physics of failure suggests Miner's Rule applies, the Birnbaum-Saunders model is a reasonable choice for a life distribution model.
Dataplot commands for the Fatigue Life model

Dataplot Functions for the Birnbaum-Saunders Model

The PDF for a Birnbaum-Saunders (Fatigue Life) distribution with parameters µ, γ is evaluated at time t by:

LET PDF = (1/µ)*FLPDF((t/µ), γ)

The CDF is

LET CDF = FLCDF((t/µ), γ)
To generate 100 random numbers, when µ = 5000 and γ = 2, for example, type the following Dataplot commands:
LET GAMMA = 2
LET DATA = FATIGUE LIFE RANDOM NUMBERS FOR
I = 1 1 100
LET DATA = 5000*DATA
Finally, we can do a Fatigue Life Probability Plot of the 100 data points in DATA by
LET GAMMA = 2
FATIGUE LIFE PROBABILITY PLOT DATA
and the points on the resulting plot (shown below) line up roughly on a straight line, as
expected for data correctly modeled by the Birnbaum-Saunders distribution.
Notes
1. We set GAMMA equal to 2 before doing the probability plot because we knew its
value. If we had real experimental data (with no censoring), first we would run PPCC
to estimate gamma. The command is: FATIGUE LIFE PPCC PLOT DATA. To see the
estimated value of gamma we would type WRITE SHAPE. Then, we would type LET
GAMMA = SHAPE before running the Fatigue Life Probability Plot.
2. The slope of the line through the points on the probability plot is an estimate of the
scale parameter µ.
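A rough Python counterpart of the Dataplot commands above, using scipy's fatiguelife distribution (its shape argument plays the role of γ and its scale the role of µ; the evaluation point t = 4000 is an arbitrary choice for illustration):

import numpy as np
from scipy.stats import fatiguelife

gamma_shape = 2.0
mu_scale = 5000.0

dist = fatiguelife(gamma_shape, scale=mu_scale)
print(dist.pdf(4000.0))   # PDF at t = 4000, like (1/mu)*FLPDF(t/mu, gamma)
print(dist.cdf(4000.0))   # CDF at t = 4000, like FLCDF(t/mu, gamma)

rng = np.random.default_rng(0)
sample = dist.rvs(size=100, random_state=rng)   # 100 random fatigue-life failure times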
The proportional hazards model assumes changing a stress variable (or explanatory variable) has the effect of multiplying the hazard rate by a constant

The proportional hazards model assumes we can write the changed hazard function for a new value of z as

hz(t) = g(z)h0(t)

In other words, changing z, the explanatory variable vector, results in a new hazard function that is proportional to the nominal hazard function, and the proportionality constant is a function of z, g(z), independent of the time variable t.

A common and useful form for g(z) is the Log Linear Model, which has the equation g(x) = e^(ax) for one variable, g(x,y) = e^(ax+by) for two variables, etc.

Properties and Applications of the Proportional Hazards Model
1. The proportional hazards model is equivalent to the acceleration factor concept if and only if the life distribution model is a Weibull (which includes the exponential model as a special case). For a Weibull with shape parameter γ, and an acceleration factor AF between nominal use fail time t0 and high stress fail time ts (with t0 = AF·ts), we have g(s) = AF^γ. In other words, hs(t) = AF^γ · h0(t).
2. Under a log linear model assumption for g(z), without any
further assumptions about the life distribution model, it is
possible to analyze experimental data and compute maximum
likelihood estimates and use likelihood ratio tests to determine
which explanatory variables are highly significant. In order to do
this kind of analysis, however, special software is needed.
More details on the theory and applications of the proportional hazards
model may be found in Cox and Oakes (1984).
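As an illustration of the kind of "special software" mentioned above (this sketch assumes the open-source Python package lifelines and its bundled example data set are available; it is not part of the Handbook's own examples), a proportional hazards fit under a log-linear g(z) looks roughly like this:

from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi

df = load_rossi()                    # example data: 'week' is the duration, 'arrest' the event flag
cph = CoxPHFitter()
cph.fit(df, duration_col="week", event_col="arrest")
cph.print_summary()                  # coefficient estimates and likelihood-based significance tests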
The simplest useful model for M(t) is M(t) = λt and the repair rate (or ROCOF) is the constant m(t) = λ. This model comes about when the interarrival times between failures are independent and identically distributed according to the exponential distribution, with parameter λ. This basic model is also known as a Homogeneous Poisson Process (HPP). The following formulas apply:
HPP model fits flat portion of "bathtub" curve

Despite the simplicity of this model, it is widely used for repairable equipment and systems throughout industry. Justification for this comes, in part, from the shape of the empirical Bathtub Curve. Most systems (or complex tools or equipment) spend most of their "lifetimes" operating in the long flat constant repair rate portion of the Bathtub Curve. The HPP is the only model that applies to that portion of the curve, so it is the most popular model for system reliability evaluation and reliability test planning.
Planning reliability assessment tests (under the HPP assumption) is
covered in a later section, as is estimating the MTBF from system
failure data and calculating upper and lower confidence limits.
Poisson relationship and Dataplot and EXCEL functions

Note that in the HPP model, the probability of having exactly k failures by time T is given by the Poisson distribution with mean λT (see formula for P(N(T) = k) above). This can be evaluated by the Dataplot expression:

LET Y = POIPDF(k, λT)

or by the EXCEL expression:

POISSON(k, λT, FALSE)
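The same probability can be sketched in Python; the values of λ, T and k below are arbitrary placeholders:

from scipy.stats import poisson

lam = 0.002   # constant repair rate lambda (placeholder value)
T = 1000.0    # observation time (placeholder value)
k = 3         # number of failures

print(poisson.pmf(k, lam * T))   # P(exactly k failures by time T) under the HPP model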
The Power Law model is very flexible and contains the HPP (exponential) model as a special case

The HPP model has the constant repair rate m(t) = λ. If we substitute an arbitrary function λ(t) for λ, we have a Non-Homogeneous Poisson Process (NHPP) with Intensity Function λ(t). If

$$m(t) = \lambda(t) = a\, b\, t^{\,b-1}$$

then we have an NHPP with a Power Law intensity function (the "intensity function" is another name for the repair rate m(t)).

Because of the polynomial nature of the ROCOF, this model is very flexible and can model both increasing (b > 1 or β < 0) and decreasing (0 < b < 1 or 0 < β < 1) failure rates. When b = 1 or β = 0, the model reduces to the HPP constant repair rate model.
Probabilities of failure for all NHPP processes can easily be calculated based on the Poisson formula

Probabilities of a given number of failures for the NHPP model are calculated by a straightforward generalization of the formulas for the HPP. Thus, for any NHPP

$$P\{N(T) = k\} = \frac{\left[M(T)\right]^k e^{-M(T)}}{k!}$$

and for the Power Law model:

$$P\{N(T) = k\} = \frac{\left(a T^{\,b}\right)^k e^{-a T^{\,b}}}{k!}.$$
The Power Law model is also called the Duane Model and the AMSAA model

Other names for the Power Law model are the Duane Model and the AMSAA model. AMSAA stands for the United States Army Materiel Systems Analysis Activity, where much theoretical work describing the Power Law model was done in the 1970's.
It is also called a Weibull Process - but this name is misleading

The time to the first failure for a Power Law process has a Weibull distribution with shape parameter b and characteristic life a^(-1/b). For this reason, the Power Law model is sometimes called a Weibull Process. However, this name is confusing and should be avoided, since it mixes a life distribution model applicable to the lifetimes of a non-repairable population with a model for the inter-arrival times of failures of a repairable population.
Once a failure occurs, the waiting time to the next failure for an NHPP has a simple CDF formula

For any NHPP process with intensity function m(t), the distribution function (CDF) for the interarrival time to the next failure, given a failure just occurred at time T, is given by

$$F_T(t) = 1 - e^{-\left[M(T+t) - M(T)\right]} = 1 - e^{-\int_T^{T+t} m(s)\, ds}.$$

In particular, for the Power Law the waiting time to the next failure, given a failure at time T, has distribution function

$$F_T(t) = 1 - e^{-a\left[(T+t)^{\,b} - T^{\,b}\right]}.$$

This interarrival time CDF can be used to derive a simple algorithm for simulating NHPP Power Law data.
Multiply reliabilities and add failure rates

For a series model, the system reliability is the product of the component reliability functions and the system failure (hazard) rate is the sum of the component failure rates:

$$R_S(t) = \prod_{i=1}^{n} R_i(t), \qquad h_S(t) = \sum_{i=1}^{n} h_i(t)$$

where the subscript S refers to the entire system and the subscript i refers to the i-th component.

Note that the above holds for any arbitrary component life distribution models, as long as "independence" and "first component failure causes the system to fail" both hold.

The analogy to a series circuit is useful. The entire system has n components in series. The system fails when current no longer flows, and each component operates or fails independently of all the others. The schematic below shows a system with 5 components in series "replaced" by an "equivalent" (as far as reliability is concerned) system with one component.
Multiply component CDF's to get the system CDF for a parallel model

For a parallel model, the CDF Fs(t) for the system is just the product of the CDF's Fi(t) for the components, or

$$F_S(t) = \prod_{i=1}^{n} F_i(t).$$
Rs(t) and hs(t) can be evaluated using basic definitions, once we have Fs(t).
The schematic below represents a parallel system with 5 components and the (reliability)
equivalent 1 component system with a CDF Fs equal to the product of the 5 component
CDF's.
Note: If we relax the assumption that all the components are identical,
then Rs(t) would be the sum of probabilities evaluated for all possible
terms that could be formed by picking at least r survivors and the
corresponding failures. The probability for each term is evaluated as a
product of R(t)'s and F(t)'s. For example, for n = 4 and r = 2, the system
reliability would be (abbreviating the notation for R(t) and F(t) by using
only R and F)
Rs = R1R2F3F4 + R1R3F2F4 + R1R4F2F3 + R2R3F1F4
+ R2R4F1F3 + R3R4F1F2 + R1R2R3F4 + R1R3R4F2
+ R1R2R4F3 + R2R3R4F1 + R1R2R3R4
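When the components are identical, the general rule sketched in the note above (sum, over all ways of choosing at least r survivors out of n, of products of R's and F's) reduces to a binomial sum. A small Python sketch of that identical-component case:

from math import comb

def r_out_of_n_reliability(r, n, component_r):
    """System reliability for an r-out-of-n system of identical, independent
    components, each with reliability component_r at the time of interest:
    the sum of binomial terms with at least r survivors."""
    return sum(
        comb(n, k) * component_r**k * (1.0 - component_r) ** (n - k)
        for k in range(r, n + 1)
    )

# Hypothetical example: 2-out-of-4 system, each component 90% reliable
print(round(r_out_of_n_reliability(2, 4, 0.90), 4))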
Identical backup Standby model leads to convolution formulas

In other words, Tn = t1 + t2 + ... + tn, where each ti has CDF F(t) and Tn has a CDF we denote by Fn(t). This can be evaluated using convolution formulas:
Note: The reduction methods described above will work for many, but not all, systems.
Some systems with a complicated operational logic structure will need a more formal
structural analysis methodology. This methodology deals with subjects such as event
trees, boolean representations, coherent structures, cut sets and decompositions, and is
beyond the present scope of this Handbook.
Repair rates should show an improvement trend during the course of a reliability improvement test, and this can be modeled using an NHPP model

Another name for reliability improvement testing is TAAF testing, standing for Test, Analyze And Fix. In the semiconductor industry, another common name for a reliability test (trademarked by Motorola) is an IRONMAN™. The acronym IRONMAN™ stands for "Improve Reliability Of New Machines At Night" and emphasizes the "around the clock" nature of the testing process.

While only one model applies when a repairable system has no improvement or degradation trends (the constant repair rate HPP model), there are infinitely many models that could be used to describe a system with a decreasing repair rate (reliability growth models).

Fortunately, one or two relatively simple models have been very successful in a wide range of industrial applications. Two models that have previously been described will be used in this section. These models are the NHPP Power Law Model and the NHPP Exponential Law Model. The Power Law Model underlies the frequently used graphical technique known as Duane Plotting.
Use of the Power Law model for reliability growth test data generally assumes the following:

1. While the test is on-going, system improvements are introduced that produce continual improvements in the rate of system repair.

2. Over a long enough period of time, the effect of these improvements can be modeled by the NHPP Power Law repair rate m(t) = a b t^(b-1).

3. When the improvement test ends at test time T, and no further improvement actions are on-going, the repair rate has been reduced to m(T) = a b T^(b-1). The repair rate remains constant from then on at this new (improved) level.

When an improvement test ends, the MTBF stays constant at its last achieved value

Assumption 3 means that when the test ends, the HPP constant repair rate model takes over and the MTBF for the system from then on is the reciprocal of the final repair rate, or 1/(a b T^(b-1)). If we estimate the expected number of failures up to time T by the actual number observed, the estimated MTBF at the end of a reliability test (following the Power Law) is:

$$\widehat{MTBF} = \frac{T}{r\,(1 - \hat{\beta})}$$

where T is the test time, r is the total number of test failures and β is the reliability growth slope. A formula for estimating β from system failure times is given in the Analysis Section for the Power Law model.
The n numbers Y1, Y2, . . ., Yn are the desired repair times simulated from an NHPP Power Law process with parameters a, b (or β = 1 - b and α = ab, writing the intensity as m(t) = α t^(-β)).

The Dataplot macro powersim.dp will ask the user to input the number N of repair times desired and the parameters a and b. The program will output the N simulated repair times and a plot of these repair times.
Example
Below powersim.dp is used to generate 13 random repair times from the NHPP Power
Law process with a = .2 and b = .4.
CALL powersim.dp
Enter number N of simulated repair times desired
13
Enter value for shape parameter a (a > 0)
.2
Enter value for shape parameter b (b > 0)
.4
FAILNUM FAILTIME
1 26
2 182
3 321
4 728
5 896
6 1268
7 1507
8 2325
9 3427
10 11871
11 11978
12 13562
13 15053
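A Python sketch of the same simulation idea, inverting the inter-arrival-time CDF given earlier one failure at a time (this is an illustration of the algorithm under the Power Law intensity m(t) = a·b·t^(b-1) reconstructed above, not a transcription of powersim.dp):

import numpy as np

def simulate_power_law_repair_times(n, a, b, seed=0):
    """Simulate n repair times from an NHPP with Power Law intensity
    m(t) = a*b*t**(b-1), by inverse-transform sampling of the waiting-time
    CDF F(t) = 1 - exp(-a[(T+t)^b - T^b])."""
    rng = np.random.default_rng(seed)
    times = []
    T = 0.0
    for _ in range(n):
        u = rng.uniform()
        # Solve 1 - exp(-a[(T+w)^b - T^b]) = u for the next failure time T + w
        T = (T**b - np.log(1.0 - u) / a) ** (1.0 / b)
        times.append(T)
    return times

print(simulate_power_law_repair_times(13, a=0.2, b=0.4))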
Points on a Duane plot line up following a straight line if the Power Law model applies

Plotting a Duane Plot is simple. If the i-th failure occurs at time ti, then plot ti divided by i (the "y"-axis value) versus the time ti (the "x"-axis value) on log-log graph paper. Do this for all the test failures and draw the best straight line you can following all these points.

Why does this "work"? Following the notation for repairable system models, we are plotting estimates of {t/M(t)} versus the time of failure t. If M(t) follows the Power Law (also described in the last section), then we are plotting estimates of t/(at^b) versus the time of fail t. This is the same as plotting (1/a)t^(1-b) versus t, where β = 1 - b. On log-log paper this will be a straight line with slope β and intercept (when t = 1) of -log10 a.

In other words, a straight line on a Duane plot is equivalent to the NHPP Power Law Model with a reliability growth slope of β = 1 - b and an "a" parameter equal to 10^(-intercept).
Note: A useful empirical rule of thumb based on Duane plots made from many reliability improvement tests across many industries is the following:

Duane plot reliability growth slopes should lie between .3 and .6

    The reliability improvement slope for virtually all reliability improvement tests will lie between .3 and .6. The lower end (.3) describes a minimally effective test - perhaps the cross-functional team is inexperienced or the system has many failure mechanisms that are not well understood. The higher end (.6) approaches the empirical state of the art for reliability improvement activities.
Failure Number   Failure Time (hours)   Cum MTBF (= time/failure number)
1                33                     33.0
2                76                     38.0
3                145                    48.3
4                347                    86.8
5                555                    111.0
6                811                    135.2
7                1212                   173.1
8                1499                   187.3
The Duane plot indicates a reasonable fit to a Power Law NHPP model. The reliability improvement slope (slope of the line on the Duane plot) is β = .437 (using the formula given in the section on reliability data analysis for the Power Law model) and the estimated MTBF achieved by the end of the 1500-hour test is 1500/(8 × [1 - .437]), or 333 hours.
Duane Plot Example 2:
For the simulated Power Law data used in the Example in the preceding section, the following
Dataplot commands (executed immediately after running powersim.dp) produce the Duane Plot
shown below.
XLOG ON
YLOG ON
LET MCUM = FAILTIME/FAILNUM
PLOT MCUM FAILTIME
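The same kind of Duane plot can be sketched with matplotlib instead of Dataplot; the data used here are the Example 1 failure times from the table above (the regression slope printed is the graphical estimate and need not equal the .437 obtained from the Power Law analysis formula):

import numpy as np
import matplotlib.pyplot as plt

fail_times = np.array([33, 76, 145, 347, 555, 811, 1212, 1499], dtype=float)
fail_nums = np.arange(1, len(fail_times) + 1)
mcum = fail_times / fail_nums          # cumulative MTBF estimates t_i / i

plt.loglog(fail_times, mcum, "o")      # Duane plot: cum MTBF vs failure time, log-log scale
slope, intercept = np.polyfit(np.log10(fail_times), np.log10(mcum), 1)
print("graphical reliability growth slope estimate:", round(slope, 3))
plt.xlabel("Failure time (hours)")
plt.ylabel("Cum MTBF (hours)")
plt.show()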
Rule of thumb: First try a Duane plot and the Power law model. If that
shows obvious lack of fit try the Exponential Law model, estimating
parameters using the formulas in the Analysis Section for the
Exponential law. A plot of cum fails versus time, along with the
estimated M(t) curve, can be used to judge goodness of fit.
Bayesian analysis considers population parameters to be random, not fixed

Old information, or subjective judgment, is used to come up with a prior distribution for these population parameters

It makes a great deal of practical sense to use all the information available, old or new, objective or subjective, when making decisions under uncertainty. This is especially true when the consequences of the decisions can have a significant impact, financial or otherwise. Most of us make everyday personal decisions this way, using an intuitive process based on our experience and subjective judgments.

Mainstream statistical analysis, however, seeks objectivity by generally restricting the information used in an analysis to that obtained from a current set of clearly relevant data. Prior knowledge is not used except to suggest the choice of a particular population model to "fit" to the data, and this choice is later checked against the data for reasonableness.

Lifetime or repair models, as we saw earlier when we looked at repairable and non-repairable reliability population models, have one or more unknown parameters. The classical statistical approach considers these parameters as fixed but unknown constants to be estimated (i.e., "guessed at") using sample data taken randomly from the population of interest. A confidence interval for an unknown parameter is really a frequency statement about the likelihood that numbers calculated from a sample capture the true parameter. Strictly speaking, one cannot make probability statements about the true parameter since it is fixed, not random.
The Bayesian approach, on the other hand, treats these population model parameters as
random, not fixed, quantities. Before looking at the current data, we use old information, or
even subjective judgments, to construct a prior distribution model for these parameters.
This model expresses our starting assessment about how likely various values of the
unknown parameters are. We then make use of the current data (via Bayes formula) to
revise this starting assessment, deriving what is called the posterior distribution model for
the population model parameters. Parameter estimates, along with confidence intervals
(known as credibility intervals), are calculated directly from the posterior distribution.
Credibility intervals are legitimate probability statements about the unknown parameters,
since these parameters now are considered random, not fixed.
It is unlikely in most applications that data will ever exist to validate a chosen prior
distribution model. Parametric Bayesian prior models are chosen because of their flexibility
and mathematical convenience. In particular, conjugate priors (defined below) are a natural
and popular choice of Bayesian prior distribution models.
Bayes Formula, Prior and Posterior Distribution Models, and Conjugate Priors
Bayes formula provides the mathematical tool that combines prior knowledge with current data to produce a posterior distribution

Bayes formula is a useful equation from probability theory that expresses the conditional probability of an event A occurring, given that the event B has occurred (written P(A|B)), in terms of unconditional probabilities and the probability the event B has occurred, given that A has occurred. In other words, Bayes formula inverts which of the events is the conditioning event. The formula is

$$P(A|B) = \frac{P(B|A)\, P(A)}{P(B)}$$

and P(B) in the denominator is further expanded by using the so-called "Law of Total Probability" to write

$$P(B) = \sum_{i=1}^{n} P(B|A_i)\, P(A_i)$$

where the events Ai are mutually exclusive and exhaust all possibilities, and usually include the event A as one of the Ai.
The same formula, written in terms of probability density function models, takes the form:

$$g(\lambda|x) = \frac{f(x|\lambda)\, g(\lambda)}{\int f(x|\lambda)\, g(\lambda)\, d\lambda}$$

where f(x|λ) is the probability model, or likelihood function, for the observed data x given the unknown parameter (or parameters) λ, g(λ) is the prior distribution model for λ, and g(λ|x) is the posterior distribution model for λ given that the data x have been observed.

When g(λ|x) and g(λ) both belong to the same distribution family, then g(λ) and f(x|λ) are called conjugate distributions and g(λ) is the conjugate prior for f(x|λ). For example, the Beta distribution model is a conjugate prior for the proportion of successes p when samples have a binomial distribution. And the Gamma model is a conjugate prior for the failure rate λ when sampling failure times or repair times from an exponentially distributed population. This latter conjugate pair (gamma, exponential) is used extensively in Bayesian system reliability applications.
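A small numerical sketch of that gamma/exponential conjugate pair (the prior parameters and test results below are made-up values, not from the text): with a gamma prior on the failure rate λ having shape a and rate b, observing r failures in a total accumulated test time of T hours gives a gamma posterior with shape a + r and rate b + T.

from scipy.stats import gamma

# Hypothetical gamma prior for the failure rate lambda: shape a, rate b
a_prior, b_prior = 2.0, 1000.0        # prior "pseudo" failures and test hours
r_observed, T_observed = 3, 2500.0    # hypothetical test result: 3 failures in 2500 hours

a_post = a_prior + r_observed         # conjugate update of the shape
b_post = b_prior + T_observed         # conjugate update of the rate

posterior = gamma(a_post, scale=1.0 / b_post)
print("posterior mean failure rate:", posterior.mean())
print("80% credibility interval:", posterior.interval(0.8))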
How Bayes Methodology is used in System Reliability Evaluation
Bayesian system reliability evaluation assumes the system MTBF is a random quantity "chosen" according to a prior distribution model

Models and assumptions for using Bayes methodology will be described in a later section. Here we compare the classical paradigm versus the Bayesian paradigm when system reliability follows the HPP or exponential model (i.e., the flat portion of the Bathtub Curve).

Classical Paradigm For System Reliability Evaluation:
● The MTBF is one fixed unknown value - there is no "probability" associated with it.
● Failure data from a test or observation period allow you to make inferences about the value of the true unknown MTBF.
● No other data are used and no "judgment" - the procedure is objective and based solely on the test data and the assumed HPP model.

Bayesian Paradigm For System Reliability Evaluation:
● The MTBF is a random quantity with a probability distribution.
● The particular piece of equipment or system you are testing "chooses" an MTBF from this distribution and you observe failure data that follow an HPP model with that MTBF.
● Prior to running the test, you already have some idea of what the MTBF probability distribution looks like based on prior test data or a consensus engineering judgment.
Pro's and con's for using Bayesian methods

While the primary motivation to use Bayesian reliability methods is typically a desire to save on test time and materials cost, there are other factors that should also be taken into account. The table below summarizes some of these "good news" and "bad news" considerations.

Bayesian Paradigm: Advantages and Disadvantages
Pro's | Con's
8.2. Assumptions/Prerequisites
This section describes how life distribution models and acceleration
models are typically chosen. Several graphical and analytical methods
for evaluating model fit are also discussed.
Detailed contents of Section 2

2. Assumptions/Prerequisites
1. How do you choose an appropriate life distribution model?
1. Based on failure mode
2. Extreme value argument
3. Multiplicative degradation argument
4. Fatigue life (Birnbaum-Saunders) argument
5. Empirical model fitting - distribution free (Kaplan-Meier)
approach
2. How do you plot reliability data?
1. Probability plotting
2. Hazard and cum hazard plotting
3. Trend and growth plotting (Duane plots)
3. How can you test reliability model assumptions?
1. Visual tests
2. Goodness of fit tests
3. Likelihood ratio tests
4. Trend tests
4. How do you choose an appropriate physical acceleration model?
5. What models and assumptions are typically made when Bayesian
methods are used for reliability evaluation?
Models like the lognormal and the Weibull are so flexible that it is not
uncommon that both will fit a small set of failure data equally well. Yet,
especially when projecting via acceleration models to a use condition far
removed from the test data, these two models may predict failure rates
that differ by orders of magnitude. That is why it is more than an
academic exercise to try to find a theoretical justification for using one
particular distribution over another.
There are several useful theoretical arguments to help guide the choice of a model

We will consider three well-known arguments of this type:
● Extreme value argument
● Multiplicative degradation argument
● Fatigue life (Birnbaum-Saunders) model

Note that physical/statistical arguments for choosing a life distribution model are typically based on individual failure modes.
For some questions, an "empirical" distribution-free approach can be used

The Kaplan-Meier technique can be used when it is appropriate to just "let the data points speak for themselves" without making any model assumptions. However, you generally need lots of data for this approach to be useful, and acceleration modeling is much more difficult.
On the other hand, there are many cases where failure occurs at the
weakest link of a large number of similar degradation processes or
defect flaws. One example of this occurs when modeling catastrophic
failures of capacitors caused by dielectric material breakdown. Typical
dielectric material has many "flaws" or microscopic sites where a
breakdown will eventually take place. These sites may be thought of as
competing with each other to reach failure first. The Weibull model,
as extreme value theory would predict, has been very successful as a
life distribution model for this failure mechanism.
Example of K-M estimate calculations

A simple example will illustrate the K-M procedure. Assume 20 units are on life test and 6 failures occur at the following times: 10, 32, 56, 98, 122, and 181 hours. There were 4 unfailed units removed from test for other experiments at the following times: 50, 100, 125 and 150 hours. The remaining 10 unfailed units were removed from test at 200 hours. The K-M estimates for this life test are:
R(10) = 19/20
R(32) = 19/20 x 18/19
R(56) = 19/20 x 18/19 x 16/17
R(98) = 19/20 x 18/19 x 16/17 x 15/16
R(122) = 19/20 x 18/19 x 16/17 x 15/16 x 13/14
R(181) = 19/20 x 18/19 x 16/17 x 15/16 x 13/14 x 10/11
A General Expression for K-M Estimates
A general expression for the K-M estimates can be written. Assume we have n units on test and order the observed times for these n units from t1 to tn. Some of these are actual failure times and some are running times for units taken off test before they fail. Keep track of all the indices corresponding to actual failure times. Then the K-M estimates are given by:

$$\hat{R}(t_i) = \prod_{\substack{j \in S \\ t_j \le t_i}} \frac{n - j}{n - j + 1}$$

where the "hat" over R indicates it is an estimate and S is the set of all subscripts j such that tj is an actual failure time. The notation j ∈ S and tj less than or equal to ti means we only form products for indices j that are in S and also correspond to times of failure less than or equal to ti.
Once values for R(ti) are calculated, the CDF estimates are
F(ti) = 1 - R(ti)
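A short Python sketch of the general expression, reproducing the example estimates above:

def km_estimates(event_times, is_failure):
    """Kaplan-Meier reliability estimates R(ti) at each failure time.
    event_times: all n observed times (failures and removals), sorted;
    is_failure: parallel list of booleans, True where the time is a failure."""
    n = len(event_times)
    r_hat, product = {}, 1.0
    for j, (t, failed) in enumerate(zip(event_times, is_failure), start=1):
        if failed:
            product *= (n - j) / (n - j + 1)
            r_hat[t] = product
    return r_hat

times = [10, 32, 50, 56, 98, 100, 122, 125, 150, 181] + [200] * 10
failed = [True, True, False, True, True, False, True, False, False, True] + [False] * 10
print(km_estimates(times, failed))   # e.g. R(10) = 19/20 = 0.95, R(181) ≈ 0.670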
The kinds of plots we will consider for failure data from non repairable
populations are:
● Probability (CDF) plots
Later on (Section 8.4.2.1) we will also look at plots that can be used to
check acceleration model assumptions.
Note: Many of the plots discussed in this section can also be used to
get quick estimates of model parameters. This will be covered in later
sections. While there may be other more accurate ways of estimating
parameters, simple graphical estimates can be very handy, especially
when other techniques require software programs that are not readily
available.
Plot each failure mode separately

Remember that different failure modes can and should be separated out and individually analyzed. When analyzing failure mode A, for example, treat failure times from failure modes B, C, etc., as censored run times. Then repeat for failure mode B, and so on.
Data points line up roughly on a straight line when the model chosen is reasonable

When the points are plotted, the analyst fits a straight line through them (either by eye, or with the aid of a least squares fitting program). Every straight line on, say, Weibull paper uniquely corresponds to a particular Weibull life distribution model, and the same is true for lognormal or exponential paper. If the points follow the line reasonably well, then the model is consistent with the data - if it was your previously chosen model, there is no reason to question the choice. Depending on the type of paper, there will be a simple way to find the parameter estimates that correspond to the fitted straight line.
It is not difficult to do probability plotting for many reliability models even without specially constructed graph papers

The general idea is to take the model CDF equation and write it in such a way that a function of F(t) is a linear equation of a function of t. This will be clear after a few examples. In the formulas that follow, "ln" always means "take natural logarithms", while "log" always means "take base 10 logarithms".

a) Exponential Model: Take the exponential CDF and rewrite it as

$$\frac{1}{1 - F(t)} = e^{\lambda t}.$$

If we let y = 1/{1 - F(t)} and x = t, then log y is linear in x, with slope λ/ln10. This shows we can make our own special exponential probability paper by using standard semilog paper (with a logarithmic y-axis). Use the plotting position estimates for F(ti) described above (without the 100 × multiplier) to calculate pairs of (xi, yi) points to plot.

If the data are consistent with an exponential model, the resulting plot will have points that line up roughly on a straight line going through the origin with slope λ/ln10.
b) Weibull Model: Take the Weibull CDF and rewrite it as

$$\ln\left[\frac{1}{1 - F(t)}\right] = \left(\frac{t}{\alpha}\right)^{\gamma}.$$

If we let y = ln [1/{1-F(t)}] and x = t, then log y is linear in log x with slope γ. This shows we can make our own Weibull probability paper by using log log paper. Use the usual plotting position estimates for F(ti) (without the 100 × multiplier) to calculate pairs of (xi, yi) points to plot.

If the data are consistent with a Weibull model, the resulting plot will have points that line up roughly on a straight line with slope γ. This line will cross the log x-axis at time t = α and the log y-axis (i.e., the intercept) at -γ log α.
c) Lognormal Model: Take the lognormal CDF and rewrite it as

$$t = T_{50}\, e^{\sigma\, \Phi^{-1}\left(F(t)\right)}$$

where Φ^(-1) is the inverse function for the standard normal distribution (taking a probability as an argument and returning the corresponding "z" value).

If we let y = t and x = Φ^(-1){F(t)}, then log y is linear in x with slope σ/ln10 and intercept (when F(t) = .5) of log T50. We can make our own lognormal probability paper by using semilog paper (with a logarithmic y-axis). Use the usual plotting position estimates for F(ti) (without the 100 × multiplier) to calculate pairs of (xi, yi) points to plot.

If the data are consistent with a lognormal model, the resulting plot will have points that line up roughly on a straight line with slope σ/ln10 and intercept T50 on the y-axis.
d) Extreme Value Distribution (Type I - for minimum): Take the extreme value distribution CDF and rewrite it as

$$\ln\left[\frac{1}{1 - F(x)}\right] = e^{(x - \mu)/\beta}.$$

If we let y = -ln(1 - F(x)), then ln y is linear in x with slope 1/β and intercept -µ/β. We can use semilog paper (with a logarithmic y-axis) and plot y vs x. The points should follow a straight line with a slope of 1/(β ln10) and an intercept of -µ/(β ln10). The ln10 factors are needed because commercial log paper uses base 10 logarithms.
DATAPLOT Example
A Dataplot Weibull example of probability plotting

Using the Dataplot commands to generate Weibull random failure times, we generate 20 Weibull failure times with a shape parameter of γ = 1.5 and α = 500. Assuming a test time of T = 500 hours, only 10 of these failure times would have been observed. They are, to the nearest hour: 54, 187, 216, 240, 244, 335, 361, 373, 375, and 386. First we will compute plotting position CDF estimates based on these failure times, and then a probability plot using the "make our own paper" method.
(1)          (2)             (3)               (4)
Fail # = i   Time of Fail    F(ti) estimate    ln{1/(1-F(ti))}
             (x)             (i-.3)/20.4       (y)
1            55              .034              .035
2            187             .083              .087
3            216             .132              .142
Of course, with commercial Weibull paper we would plot pairs of points from column (2) and
column (3). With ordinary log log paper we plot (2) vs (4).
The Dataplot sequence of commands and resulting plot follow:
LET X = DATA 55 187 216 240 244 335 361 373 375 386
LET Y = DATA .035 .087 .142 .2 .262 .328 .398 .474 .556 .645
XLOG ON
YLOG ON
XLABEL LOG TIME
YLABEL LOG LN (1/(1-F))
PLOT Y X
Note that the points appear to have some curvature. This is mostly due to the very first point on the plot (the earliest time of failure). The first few points on a probability plot have more variability than points in the central range, and less attention should be paid to them when visually testing for "straightness".
Use of least squares (regression) technique to fit a line through the points on probability paper

We could go on to use Dataplot to fit a straight line through the points via the commands

LET YY = LOG10(Y)
LET XX = LOG10(X)
FIT YY XX

This would give a slope estimate of 1.46, which is close to the 1.5 value used in the simulation. The intercept is -4.114 and, setting this equal to -γ log α, we estimate α = 657 (the "true" value used in the simulation was 500).
Dataplot has a special Weibull probability paper function for complete data samples (no censoring)

Finally, we note that Dataplot has a built-in Weibull probability paper command that can be used whenever we have a complete sample (i.e., no censoring and exact failure times). First you have to run PPCC to get an estimate of γ = GAMMA. This is stored under SHAPE. The full sequence of commands (with XLOG and YLOG both set to OFF) is

SET MINMAX = 1
WEIBULL PPCC PLOT SAMPLE
SET GAMMA = SHAPE
WEIBULL PLOT SAMPLE
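For readers working outside Dataplot, the "make our own paper" fit above can be sketched in Python with the same ten (time, y) pairs; the slope and intercept should come out close to the 1.46 and -4.114 reported above.

import numpy as np

x = np.array([55, 187, 216, 240, 244, 335, 361, 373, 375, 386], dtype=float)
y = np.array([.035, .087, .142, .2, .262, .328, .398, .474, .556, .645])

# Fit a straight line to log10(y) vs log10(x), as on "home-made" Weibull paper
slope, intercept = np.polyfit(np.log10(x), np.log10(y), 1)
gamma_hat = slope                          # shape estimate
alpha_hat = 10 ** (-intercept / slope)     # from intercept = -gamma * log10(alpha)
print(round(gamma_hat, 2), round(alpha_hat, 0))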
A life test cum hazard plotting example

Example: Ten units were tested at high stress for up to 250 hours. Six failures took place at 37, 73, 132, 195, 222 and 248 hours. Four units were taken off test without failing at the following run times: 50, 100, 200 and 250 hours. Cum hazard values were computed in the following table:
(1)             (2)           (3)    (4)            (5)           (6)
Time of Event   1 = failure   Rank   Reverse Rank   Haz Val       Cum Hazard Value
                0 = runtime                         (2) x 1/(4)
37              1             1      10             1/10          .10
50              0             2      9
73              1             3      8              1/8           .225
100             0             4      7
132             1             5      6              1/6           .391
195             1             6      5              1/5           .591
200             0             7      4
222             1             8      3              1/3           .924
248             1             9      2              1/2           1.424
250             0             10     1
Next ignore the rows with no cum hazard value and plot column (1) vs column (6).
The cum hazard for the Weibull is H(t) = (t/α)^γ, so a plot of y vs x on log log paper should line up along a straight line with slope γ, if the Weibull model is appropriate. The Dataplot commands and graph are shown below:
LET X = DATA 37 73 132 195 222 248
LET Y = DATA .10 .225 .391 .591 .924 1.424
XLOG ON
YLOG ON
PLOT Y X
The equation of the least squares line fit through these points can be found from
LET YY = LOG10(Y)
LET XX = LOG10(X)
FIT YY XX
The Weibull fit looks better, although the slope estimate is 1.27, which is not far from an
exponential model slope of 1. Of course, with a sample of just 10, and only 6 failures, it
is difficult to pick a model from the data alone.
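The cum hazard calculation and slope estimate can be checked with a short Python sketch (numpy assumed; an illustration, not the Handbook's Dataplot session):

import numpy as np

# Illustrative cumulative hazard calculation for the life test above
# (1 = failure, 0 = censored run time), following the table's recipe.
times  = np.array([37, 50, 73, 100, 132, 195, 200, 222, 248, 250])
events = np.array([1, 0, 1, 0, 1, 1, 0, 1, 1, 0])

reverse_rank = np.arange(len(times), 0, -1)                       # 10, 9, ..., 1
cum_haz = np.cumsum(np.where(events == 1, 1.0 / reverse_rank, 0.0))

# Keep only the failure rows and fit a log-log line; the slope estimates the Weibull shape.
x, y = times[events == 1], cum_haz[events == 1]
slope, _ = np.polyfit(np.log10(x), np.log10(y), 1)
print(round(slope, 2))                                             # roughly 1.27, as quoted above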
Repair rates are typically either nearly constant over time or have good or bad trends:
Models for repairable systems were described earlier. These models look at how the cumulative number of fails (or the repair rate) changes over time. The two models used with most success throughout industry are the HPP (constant repair rate or "exponential" system model) and the NHPP Power Law process (the repair rate is the polynomial m(t) = abt^(b-1), the derivative of M(t) = at^b, the expected cumulative number of failures by time t).
Before doing a Duane Plot, there are a few simple trend plots that often
convey strong evidence of the presence or absence of a trend in the
repair rate over time. If there is no trend, an HPP model is reasonable.
If there is an apparent improvement or degradation trend, a Duane Plot
will provide a visual check for whether the NHPP Power law model is
consistent with the data.
A few simple plots can help us decide whether trends are present:
These simple visual graphical tests for trends are (see the sketch after this list):
1. Plot cumulative failures vs system age (a step function that goes up every time there is a new failure). If this plot looks linear, there is no obvious improvement (or degradation) trend. A bending downward indicates improvement; bending upward indicates degradation.
2. Plot the inter arrival times between new failures (in other words, the waiting times between failures, with the time to the first failure used as the first "inter arrival" time). If these trend up, there is improvement; if they trend down, there is degradation.
3. Plot the reciprocals of the inter arrival times. Each reciprocal is a new failure rate estimate based only on the waiting time since the last fail. If these trend down, there is improvement; an upward trend indicates degradation.
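As an illustration (not part of the Handbook), the quantities behind these three plots can be computed in a few lines of Python (numpy assumed), using the Case Study failure times that follow:

import numpy as np

# Illustrative computation of the three trend-plot quantities.
fail_times = np.array([5, 40, 43, 175, 389, 712, 747, 795, 1299, 1478])   # system ages at failure

cum_fails = np.arange(1, len(fail_times) + 1)     # plot vs fail_times for test 1
interarrival = np.diff(fail_times, prepend=0)     # waiting times between failures, test 2
rates = 1.0 / interarrival                        # reciprocal inter arrival times, test 3

print(interarrival)    # [  5  35   3 132 214 323  35  48 504 179] -- generally trending up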
Trend plots and a Duane Plot for actual Reliability Improvement Test data:
Case Study 1: Reliability Improvement Test Data - Use of Trend Plots and Duane Plots
A prototype of a new, complex piece of equipment went through a 1500 operational hour Reliability Improvement Test. During the test there were 10 fails. As a part of the improvement process, a cross functional Failure Review Board made sure every failure was analyzed down to root cause, and design and parts selection fixes were implemented on the prototype. The observed failure times were: 5, 40, 43, 175, 389, 712, 747, 795, 1299 and 1478 hours, with the test ending at 1500 hours. The reliability engineer on the Failure Review Board first made trend plots as described above, then made a Duane plot. These plots (using EXCEL) follow.
Duane Plot data (cumulative MTBF = cumulative test time divided by cumulative number of fails):
Cum Fail Time (hours)   Cum MTBF (hours)
5                       5.0
40                      20.0
43                      14.3
175                     43.8
389                     77.8
712                     118.67
747                     106.7
795                     99.4
1299                    144.3
1478                    147.8
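A minimal sketch (numpy assumed; not the EXCEL workbook used above) of how the Duane plot quantities are formed from the same failure times:

import numpy as np

# Cumulative MTBF = system age at each failure divided by the cumulative number of fails.
fail_times = np.array([5, 40, 43, 175, 389, 712, 747, 795, 1299, 1478])
cum_mtbf = fail_times / np.arange(1, len(fail_times) + 1)

# On a Duane plot both axes are logarithmic; a straight line with positive slope
# is consistent with reliability growth under the NHPP power law model.
slope, intercept = np.polyfit(np.log10(fail_times), np.log10(cum_mtbf), 1)
print(np.round(cum_mtbf, 2), round(slope, 2))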
Comments: The three trend plots all show an improvement trend. The reason it might help to try all three is that there are examples where trends show up more clearly on one of these plots than on another. Formal statistical tests on the significance of this visual evidence of a trend will be shown in the section on Trend Tests.
Details of the Likelihood Ratio Test procedure (in general, calculations are difficult and need to be built into the software you use):
Let L1 be the maximum value of the likelihood of the data without the additional assumption. In other words, L1 is the likelihood of the data with all the parameters unrestricted and maximum likelihood estimates substituted for these parameters.
Let L0 be the maximum value of the likelihood when the parameters are restricted (and reduced in number) based on the assumption. Assume k parameters were lost (i.e. L0 has k fewer parameters than L1).
Form the ratio λ = L0/L1. This ratio is always between 0 and 1 and the less likely the assumption is, the smaller λ will be. This can be quantified at a given confidence level as follows: -2 ln λ has approximately a chi-square distribution with k degrees of freedom, so the assumption is rejected at a given confidence level when -2 ln λ exceeds the corresponding chi-square percentile.
Use this formula when there are more than 12 repairs in the data set:
If z > 1.282, we have at least 90% significance. If z > 1.645, we have 95% significance, and z > 2.33 indicates 99% significance. Since z has an approximate standard normal distribution, the Dataplot command
LET PERCENTILE = 100*NORCDF(z)
will return the percentile corresponding to z.
That covers the (one-sided) test for significant improvement trends. If,
on the other hand, we believe there may be a degradation trend (the
system is wearing out or being over stressed, for example) and we want
to know if the data confirms this, then we expect a low value for R and
we need a table to determine when the value is low enough to be
significant. The table below gives these critical values for R.
Value of R Indicating Significant Degradation Trend (One-Sided Test)

Number of   Maximum R for       Maximum R for       Maximum R for
Repairs     90% Evidence of     95% Evidence of     99% Evidence of
            Degradation         Degradation         Degradation
4           0                   0                   -
5           1                   1                   0
6           3                   2                   1
7           5                   4                   2
8           8                   6                   4
9           11                  9                   6
10          14                  12                  9
11          18                  16                  12
12          23                  20                  16
As before, we have r times of repair T1, T2, T3, ...Tr with the observation period ending at time Tend > Tr. Calculate the test statistic
χ² = 2 Σ ln(Tend/Ti), with the sum running from i = 1 to r.
Under the no-trend (HPP) hypothesis this statistic has a chi-square distribution with 2r degrees of freedom; large values of the statistic indicate a significant improvement trend.
Formal tests generally confirm the subjective information conveyed by trend plots:
Case Study 1: Reliability Improvement Test Data (Continued from earlier)
The failure data, trend plots and Duane plot were shown earlier. The observed failure times were: 5, 40, 43, 175, 389, 712, 747, 795, 1299 and 1478 hours, with the test ending at 1500 hours.
Reverse Arrangement Test: The inter arrival times are: 5, 35, 3, 132, 214, 323, 35, 48, 504 and 179. The number of reversals is 33, which, according to the table above, is just significant at the 95% level.
The Military Handbook Test: The Chi Square test statistic using the formula given above is 37.23 with 20 degrees of freedom. The Dataplot expression
LET PERCENTILE = 100*CHSCDF(37.23,20)
yields a significance level of 98.9%. Since the Duane Plot looked very reasonable, this test probably gives the most accurate significance assessment of how unlikely it is that sheer chance produced such an apparent improvement trend (only about 1.1% probability).
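A sketch of the same two calculations in Python (numpy and scipy assumed; the chi-square form 2·Σ ln(Tend/Ti) with 2r degrees of freedom is my reading of the Military Handbook statistic, which reproduces the 37.23 quoted above):

import numpy as np
from scipy.stats import chi2

fail_times = np.array([5, 40, 43, 175, 389, 712, 747, 795, 1299, 1478])
t_end = 1500.0

# Reverse Arrangement Test: count pairs of inter arrival times that occur in increasing order.
gaps = np.diff(fail_times, prepend=0)
reversals = sum(gaps[i] < gaps[j] for i in range(len(gaps)) for j in range(i + 1, len(gaps)))

# Military Handbook test (assumed form): 2*sum(ln(Tend/Ti)), compared to a chi-square with 2r d.f.
stat = 2.0 * np.sum(np.log(t_end / fail_times))
print(reversals, round(stat, 2), round(100 * chi2.cdf(stat, 2 * len(fail_times)), 1))
# 33, 37.23 and 98.9 respectively, matching the text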
Simple models are often the best:
In general, use the simplest model (fewest parameters) you can. When you have chosen a model, use visual tests and formal statistical fit tests to confirm the model is consistent with your data. Continue to use the model as long as it gives results that "work," but be quick to look for a new model when it is clear the old one is no longer adequate.
There are some good quotes that apply here:
Quotes from experts on models:
"All models are wrong, but some are useful." - George Box, and the principle of Occam's Razor (attributed to the 14th century logician William of Occam who said "Entities should not be multiplied unnecessarily" - or something equivalent to that in Latin).
A modern version of Occam's Razor is: If you have two theories that both explain the observed facts, then you should use the simplest until more evidence comes along - also called the Law of Parsimony.
Finally, for those who feel the above quotes place too much emphasis on simplicity, there are several appropriate quotes from Albert Einstein:
"Make your theory as simple as possible, but no simpler"
"For every complex question there is a simple and wrong solution."
Bayesian assumptions for the gamma exponential system model:
Assumptions:
1. Failure times for the system under investigation can be adequately modeled by the exponential distribution. For repairable systems, this means the HPP model applies and the system is operating in the flat portion of the bathtub curve. While Bayesian methodology can also be applied to non repairable component populations, we will restrict ourselves to the system application in this Handbook.
2. The MTBF for the system can be thought of as chosen from a prior distribution model which is an analytic representation of our previous information or judgments about the system's reliability. The form of this prior model is the gamma distribution (the conjugate prior for the exponential model). The prior model is actually defined for λ = 1/MTBF, since it is easier to do the calculations this way.
3. Our prior knowledge is used to choose the gamma parameters a and b for the prior distribution model for λ. There are many possible ways to convert "knowledge" to gamma parameters, depending on the form of the "knowledge" - we will describe three approaches.
Several ways to choose the prior gamma parameter values:
i) If you have actual data from previous testing done on the system (or a system believed to have the same reliability as the one under investigation), this is the most credible prior knowledge, and the easiest to use. Simply set the gamma parameter a equal to the total number of failures from all the previous data, and set the parameter b equal to the total of all the previous test hours.
ii) A consensus method for determining a and b that works well is the following: Assemble a group of engineers who know the system and its sub components well from a reliability viewpoint.
❍ Call the group's consensus "reasonable" MTBF the MTBF50, and the low MTBF they are 95% confident the system will exceed the MTBF05. These two numbers uniquely determine gamma parameters a and b that have percentile values at the right locations
iii) A third approach is to use a "weak" prior: set a = 1 and choose b so that the prior 50th percentile falls at the consensus "even money" MTBF (this works out to b = (ln 2) × MTBF50, as used in the Bayesian test planning section later on).
Note: As we will see when we plan Bayesian tests, this weak prior is actually a very friendly prior in terms of saving test time.
Many variations are possible, based on the above three methods. For example,
you might have prior data from sources that you don't completely trust. Or you
might question whether the data really applies to the system under
investigation. You might decide to "weight" the prior data by .5, to "weaken" it.
This can be implemented by setting a = .5 x the number of fails in the prior data
and b = .5 times the number of test hours. That spreads out the prior distribution more and lets it react more quickly to new test data.
Consequences
After a new test is run, the posterior gamma parameters are easily obtained from the prior parameters by adding the new number of fails to "a" and the new test time to "b":
No matter how you arrive at values for the gamma prior parameters a and b, the method for incorporating new test information is the same. The new information is combined with the prior model to produce an updated or posterior distribution model for λ.
Under assumptions 1 and 2, when a new test is run with T system operating hours and r failures, the posterior distribution for λ is still a gamma, with new parameters:
a' = a + r, b' = b + T
In other words, add to a the number of new failures and add to b the number of new test hours to get the new parameters for the posterior distribution.
Use of the posterior distribution to estimate the system MTBF (with confidence, or prediction, intervals) is described in the section on estimating reliability using the Bayesian gamma model.
Using EXCEL To Obtain Gamma Parameters
EXCEL can easily solve for gamma prior parameters when using the "50/95" consensus method:
We will describe how to obtain a and b for the 50/95 method and indicate the minor changes needed when any 2 other MTBF percentiles are used. The step by step procedure is
1. Calculate the ratio RT = MTBF50/MTBF05.
2. Open an EXCEL spreadsheet and put any starting value guess for a in A1 - say 2.
3. Move to B1 and type the following expression:
= GAMMAINV(.95,A1,1)/GAMMAINV(.5,A1,1)
4. Press enter and a number will appear in B1. We are going to use the
"Goal Seek" tool EXCEL has to vary A1 until the number in B1 equals
RT.
5. Click on "Tools" (on the top menu bar) and then on "Goal Seek". A box
will open. Click on "Set cell" and highlight cell B1. $B$1 will appear in
the "Set Cell" window. Click on "To value" and type in the numerical
value for RT. Click on "By changing cell" and highlight A1 ($A$1 will
appear in "By changing cell"). Now click "OK" and watch the value of
the "a" parameter appear in A1.
6. Go to C1 and type
= .5*MTBF50*GAMMAINV(.5, A1, 2)
with the numerical value of MTBF50 (600 in this example) substituted for MTBF50.
After clicking "OK" in the Goal Seek box, the value in A1 changes from 2 to 2.862978. This new value is the prior a parameter. (Note: if the group felt 250 was a MTBF10 value, instead of a MTBF05 value, then the only change needed would be to replace 0.95 in the B1 equation by 0.90. This would be the "50/90" method.)
The C1 formula then returns the prior "b" parameter value of 1522.46.
The gamma prior with parameters a = 2.863 and b = 1522.46 will have (approximately) a probability of 50% of λ being below 1/600 = .001667 and a probability of 95% of λ being below 1/250 = .004. This can be checked by typing
=GAMMADIST(.001667,2.863,(1/1522.46),TRUE)
and
=GAMMADIST(.004,2.863,(1/1522.46),TRUE)
as described when gamma EXCEL functions were introduced in Section 1.
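The same Goal Seek calculation can be reproduced with a root finder; a sketch (scipy assumed, not part of the Handbook's EXCEL procedure):

from scipy.stats import gamma
from scipy.optimize import brentq

MTBF50, MTBF05 = 600.0, 250.0
RT = MTBF50 / MTBF05

# Find the shape a whose 95th/50th gamma percentile ratio equals RT (the B1 formula above),
# then get b from the 50th percentile condition (the C1 formula above, written with scale 1).
a = brentq(lambda s: gamma.ppf(0.95, s) / gamma.ppf(0.50, s) - RT, 0.1, 50.0)
b = MTBF50 * gamma.ppf(0.50, a)
print(round(a, 3), round(b, 2))    # approximately 2.863 and 1522.46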
This example will be continued in Section 3, where the Bayesian test time needed to confirm a 500 hour MTBF at 80% confidence will be derived.
corresponding to the r-th row and the desired confidence level column. For example, to confirm a 200 hour MTBF objective at 90% confidence, allowing up to 4 failures on the test, the test length must be 200 × 7.99 = 1598 hours. If this is unacceptably long, try allowing only 3 fails for a test length of 200 × 6.68 = 1336 hours. The shortest test would allow no fails and last 200 × 2.3 = 460 hours. All these tests guarantee a 200 hour MTBF at 90% confidence, when the equipment passes. However, the shorter tests are much less "fair" to the supplier, in that they have a large chance of failing a marginally acceptable piece of equipment.
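The factors quoted here (7.99, 6.68, 2.3) appear to be chi-square based; the following Python sketch reproduces them (scipy assumed, and the factor formula chi-square(confidence; 2(r+1))/2 is my assumption, not a statement from the Handbook):

from scipy.stats import chi2

# Assumed test-length factor: chi-square percentile with 2(r+1) degrees of freedom, divided by 2.
def factor(r_allowed, confidence=0.90):
    return chi2.ppf(confidence, 2 * (r_allowed + 1)) / 2.0

for r in (0, 3, 4):
    print(r, round(200 * factor(r)))    # about 460, 1336 and 1598 hours, matching the text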
Lognormal test plans, assuming sigma and the acceleration factor are known:
The goal is to come up with a test plan (put n units on stress test for T hours and accept the lot if no more than r failures occur). The following assumptions are made:
● The life distribution model is lognormal
● Sigma (σ) is known from past testing and does not vary appreciably from lot to lot
● Lot reliability varies because T50's (the lognormal median or 50th percentile) differ from lot to lot
● The acceleration factor from high stress to use stress is a known quantity "A"
● A stress time of T hours is practical as a line monitor
Weibull test plans, assuming gamma and the acceleration factor are known:
Weibull Case (shape parameter known): The assumptions and calculations are similar to those made for the lognormal:
● The life distribution model is Weibull
● Gamma (γ) is known from past testing and does not vary appreciably from lot to lot
● Lot reliability varies because α's (the Weibull characteristic life, or 63.2 percentile) differ from lot to lot
● The acceleration factor from high stress to use stress is a known quantity "A"
● A stress time of T hours is practical as a line monitor
● A nominal use value of the characteristic life, αu (combined with γ), produces an acceptable use CDF (or use reliability function). This is equivalent to specifying an acceptable use CDF at, say, 100,000 hours to be a given value p0 and calculating αu
Rules of thumb for general lognormal or Weibull life test planning:
All that can be said here are some general rules of thumb:
1. If you can observe at least 10 exact times of failure, estimates are usually reasonable - below 10 fails the critical shape parameter may be hard to estimate accurately. Below 5 failures, estimates are often very inaccurate.
2. With readout data, even with more than 10 total failures, you need failures in three or more readout intervals for accurate estimates.
3. When guessing how many units to put on test for how long, try out various reasonable combinations of distribution parameters to see if the corresponding calculated proportion of failures expected during the test, multiplied by the sample size, gives a reasonable number of failures.
4. As an alternative to the last rule, simulate test data from reasonable combinations of distribution parameters and see if your estimates from the simulated data are close to the parameters used in the simulation. If a test plan doesn't work well with simulated data, it is not likely to work well with real data (a minimal simulation sketch follows this list).
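A minimal sketch of the simulation idea in rule 4 (numpy assumed; the parameter values below are only illustrative):

import numpy as np

rng = np.random.default_rng(1)
gamma_true, alpha_true = 1.5, 500.0      # assumed Weibull shape and characteristic life
n_units, test_hours = 20, 500.0          # candidate test plan

# Draw lifetimes, censor at the planned test length, and see how many failures the plan yields.
lifetimes = alpha_true * rng.weibull(gamma_true, size=n_units)
n_failures = int(np.sum(lifetimes <= test_hours))
print(n_failures, "failures out of", n_units, "units")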
Make sure test time makes engineering sense:
The reason the above is just a first pass estimate is that it will give unrealistic (too short) test times when a high growth slope is assumed. A very short reliability improvement test makes little sense because a minimal number of failures must be observed before the improvement team can determine design and parts changes that will "grow" reliability. And it takes time to implement these changes and observe an improved repair rate.
Iterative simulation is an aid for test planning:
Simulation methods can also be used to see if a planned test is likely to generate data that will demonstrate an assumed growth rate.
Test planning means picking stress levels and sample sizes and test times to produce enough data to fit models and make projections:
Test planning and operation for a (multiple) stress cell life test experiment consists of the following:
● Pick several combinations of the relevant stresses (the stresses that accelerate the failure mechanism under investigation). Each combination is a "stress cell". Note that you are planning for only one mechanism of failure at a time. Failures on test due to any other mechanism will be considered censored run times.
● Make sure stress levels used are not too high - to the point where new failure mechanisms that would never occur at use stress are introduced. Picking a maximum allowable stress level requires experience and/or good engineering judgment.
● Put random samples of components in each stress cell and run the components in each cell for fixed (but possibly different) lengths of time.
● Gather the failure data from each cell and use it to fit an acceleration model and a life distribution model, and use these models to project reliability at use stress conditions.
Normal use conditions are 4 volts and 25 degrees Celsius, and the high stress levels under consideration are 6, 8, 12 volts and 85°, 105° and 125°. It probably would be a waste of resources to test at (6v, 85°), or even possibly (8v, 85°) or (6v, 105°), since these cells are not likely to have enough stress acceleration to yield a reasonable number of failures within typical test times.
If you write all 9 possible stress cell combinations in a 3x3 matrix with voltage increasing by rows and temperature increasing by columns, the result would look like the matrix below:
Matrix Leading to "Backward L Design"
6v, 85°     6v, 105°     6v, 125°
8v, 85°     8v, 105°     8v, 125°
12v, 85°    12v, 105°    12v, 125°
"Backwards L" designs are common in accelerated life testing - put more experimental units in lower stress cells:
The combinations forming the bottom row and right hand column of the matrix (the cells with the highest voltage or the highest temperature) are the most likely design choices, covering the full range of both stresses but still hopefully having enough acceleration to produce failures. This is the so-called "backwards L" design commonly used for acceleration modeling experiments.
Note: It is good design practice to put more of your test units in the lower stress cells, to make up for the fact that these cells will have a smaller proportion of units failing.
and passing the test means that the upper 100×(1-α) percentile of the posterior gamma distribution for the failure rate λ has to equal the target failure rate 1/M. But this percentile is, by definition, G-1(1-α; a', b'), where G-1 is the inverse of the gamma distribution with parameters a', b'. We can find the value of T that satisfies G-1(1-α; a', b') = 1/M by trial and error, or by using "Goal Seek" in EXCEL. However, based on the properties of the gamma distribution, it turns out that we can calculate T directly by using
T = .5M × G-1(1-α; a', 2) - b
where G-1(1-α; a', 2) is the 100×(1-α) percentile of a gamma distribution with shape parameter a' and scale parameter 2 (equivalently, of a chi-square distribution with 2a' degrees of freedom); this is what the EXCEL GAMMAINV expression below computes.
Excel will easily do the required calculations:
Solving For T = Bayesian Test Time Using EXCEL or Dataplot
The EXCEL expression for the required Bayesian test time to confirm a goal of M at 100×(1-α)% confidence, allowing r failures and assuming gamma prior parameters of a and b is
= .5*M*GAMMAINV((1-α),((a+r)),2) - b
and the equivalent Dataplot expression is
LET BAYESTIME = M*GAMPPF((1-α),(a+r)) - b
Special Case: The Prior Has a = 1 (The "Weak" Prior)
When the prior is a weak prior with a = 1, the Bayesian test is always shorter than the classical test:
There is a very simple way to calculate the required Bayesian test time when the prior is a weak prior with a = 1. Just use the Test Length Guide Table to calculate the classical test time. Call this Tc. The Bayesian test time T is just Tc minus the prior parameter b (i.e. T = Tc - b). If the b parameter was set equal to (ln 2) × MTBF50 (where MTBF50 is the consensus choice for an "even money" MTBF), then
T = Tc - (ln 2) × MTBF50
This shows that when a weak prior is used, the Bayesian test time is always less than the corresponding classical test time. That is why this prior is also known as a friendly prior.
Note: In general, Bayesian test times can be shorter, or longer, than the corresponding classical test times, depending on the choice of prior parameters. However, the Bayesian time will always be shorter when the prior parameter a is less than, or equal to, 1.
Example: Calculating a Bayesian Test Time
EXCEL example:
A new piece of equipment has to meet a MTBF requirement of 500 hours at 80% confidence. A group of engineers decide to use their collective experience to come up with a Bayesian gamma prior using the 50/95 method described in Section 2. They think 600 hours is a likely MTBF value and they are very confident that the MTBF will exceed 250. Following the example in Section 2, they determine that the gamma prior parameters are a = 2.863 and b = 1522.46.
Now they want to determine an appropriate test time, so that they can confirm a MTBF of 500 with at least 80% confidence, provided they have no more than 2 failures.
Using an EXCEL spreadsheet, type the expression
= .5*500*GAMMAINV(.8,((2.863+2)),2) - 1522.46
and the required test time of 1756 hours will appear (as shown below).
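The same calculation as a Python sketch (scipy assumed; this simply evaluates the gamma percentile expression given earlier, including the posterior update a + r and b + T):

from scipy.stats import gamma

def bayes_test_time(M, confidence, r, a, b):
    # T such that the upper percentile of the posterior gamma(a + r, b + T) failure-rate
    # distribution equals 1/M; equivalent to the EXCEL GAMMAINV expression above.
    return M * gamma.ppf(confidence, a + r) - b

print(round(bayes_test_time(M=500.0, confidence=0.80, r=2, a=2.863, b=1522.46)))   # about 1756 hours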
Using a computer generated line fitting routine removes subjectivity and can lead directly to computer parameter estimates based on the plotting positions:
To remove the subjectivity of drawing a line through the points, a least squares (regression) fit can be done using the equations described in the section on how special papers work. An example of this for the Weibull, using the Dataplot FIT program, was also shown in that section. A SAS JMP™ example of a Weibull plot for the same data is shown later in this section.
Finally, if you have exact times and complete samples (no censoring), Dataplot has built in Probability Plotting functions and a built in Weibull paper - examples were shown in the sections on the various life distribution models.
Perhaps the worst drawback of graphical estimation is you cannot get legitimate confidence intervals for the estimates:
The statistical properties of graphical estimates (i.e. how accurate are they on the average) are not good:
● they are biased
● even with large samples, they do not become minimum variance (i.e. most accurate) estimates
● graphical methods do not give confidence intervals for the parameters
With small samples, MLE's may not be very accurate and may even generate a line that lies above or below the data points:
There are only two drawbacks to MLE's, but they are important ones:
● With small numbers of failures (less than 5, and sometimes less than 10 is small), MLE's can be heavily biased and the large sample optimality properties do not apply
● Calculating MLE's often requires solving complex non-linear equations requiring specialized software. This is less of a problem as time goes by, since more statistical packages are adding MLE analysis capability every year.
where C is a constant that plays no role when solving for the MLE's.
Note that with no censoring, the likelihood reduces to just the product of
the densities, each evaluated at a failure time. For Type II Censored
Data, just replace T above by the random end of test time tr.
MLE for the exponential model parameter λ turns out to be just (total # of failures) divided by (total unit test time):
MLE's for the Exponential Model (Type I Censoring):
the MLE of λ is r / [t1 + t2 + ... + tr + (n - r)T]
where r is the number of failures observed among the n units on test, the ti are the exact failure times and T is the fixed end-of-test time.
Note: The MLE of the failure rate (or repair rate) in the exponential case turns out to be the total number of failures observed divided by the total unit test time. For the MLE of the MTBF, take the reciprocal of this or use the total unit test hours divided by the total observed failures.
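A sketch of this calculation (numpy assumed; the failure times and censoring are borrowed from the earlier plotting example purely for illustration):

import numpy as np

fail_times = np.array([55, 187, 216, 240, 244, 335, 361, 373, 375, 386], dtype=float)
n_units, test_end = 20, 500.0            # Type I censoring: 10 unfailed units run to 500 hours

# MLE of the exponential failure rate: (number of failures) / (total unit test hours)
total_unit_hours = fail_times.sum() + (n_units - len(fail_times)) * test_end
lambda_hat = len(fail_times) / total_unit_hours
print(round(lambda_hat, 5), round(1.0 / lambda_hat))    # failure rate per hour, MTBF estimate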
There are examples of Weibull and lognormal MLE analysis, using SAS
JMP™ software, later in this section.
You can obtain a copy of this JMP worksheet by clicking here mleex.jmp (if your
browser is configured to bring up JMP automatically, you can try out the example as you
read about it).
2. Click on Analyze, choose "Survival" and then choose "Kaplan - Meier Method". Note:
Some software packages (and other releases of JMP) might use the name "Product Limit
Method" or "Product Limit Survival Estimates", instead of the equivalent name
"Kaplan-Meier".
3. In the Box that appears, select the columns from mleex that correspond to "Time",
"Censor" and "Freq", put them in the corresponding slots on the right (see below) and
click "OK".
4. Click "OK" and the analysis results appear. You may have to use the "check mark" tab
on the lower left to select Weibull Plot (other choices are Lognormal and Exponential).
You may also have to open the tab next to the words "Weibull Plot" and select "Weibull
Estimates". The results are shown below.
Note: JMP uses the parameter alpha for the Weibull characteristic life (as does Dataplot), and the parameter beta for the shape (Dataplot uses gamma). The Extreme Value distribution parameter estimates are for the distribution of "ln time to fail" and have the relationship: the extreme value location parameter is ln(alpha) and the scale parameter is 1/beta.
5. There is an alternate way to get some of the same results that can also be used to fit
models when there are additional "effects" such as temperature differences or vintage or
plant of manufacture differences. Instead of clicking "Kaplan - Meier Method" in step 2, choose "Parametric Model" after selecting "Survival" from the "Analysis" choices. The screen below appears. Repeat step 3 and make sure "Weibull" appears as the "Get Model" choice. In this example, there are no other effects to "Add" (the acceleration
model example later on will illustrate how to add a temperature effect). Click "Run
Model" to get the results below. This time, you need to use the check symbol tab to
obtain confidence limits. Only the Extreme Value distribution parameter estimates are
displayed.
Limitations and a warning about the Likelihood calculation in JMP:
Notes:
1. The built in reliability analysis routines that are currently a part of JMP only handle exact time of failure data with possible right censoring. However, use of templates (provided later in the Handbook) for either Weibull or lognormal data extends JMP analysis capabilities to handle readout (interval) data and any type of censoring or truncation. This will be described in the acceleration model example later on.
2. The "Model Fit" screen for the Weibull model gives a value for -Loglikelihood for the Weibull fit. This should be the negative of the maximized log likelihood function. However,
JMP leaves out a term consisting of the sum of all the ln time of failures in the data set.
This does not affect the calculation of MLE's or confidence bounds but can be confusing
when comparing results between different software packages. In the example above, the
sum of the ln times is ln 55 + ln 187 + . . . + ln 386 = 55.099 and the correct maximum
log likelihood is - (20.023 + 55.099) = - 75.122.
3. The omission of the sum of the ln times of failure in the likelihood also occurs when
fitting lognormal and exponential models.
4. Different releases of JMP may, of course, operate somewhat differently. The analysis
shown here used release 3.2.2.
Conclusions
MLE analysis is an accurate and easy way to estimate life distribution parameters,
provided a good software analysis package is available. The package should also
calculate confidence bounds and loglikelihood values. JMP has this capability, as do several other commercial statistical analysis packages.
So, if we run several stress cells and compute T50's for each cell, a plot of the natural log of these T50's vs the corresponding 1/kT values should be roughly linear with a slope of ΔH and an intercept of ln A. In practice, a computer fit of a line through these points is typically used to get the Arrhenius model estimates. There are even commercial Arrhenius graph papers that have a temperature scale in 1/kT units and a T50 scale in log units, but it is easy enough to make the transformations and then use linear or log linear papers. Remember that T is in Kelvin in the above equations. For temperature in Celsius, use the following for 1/kT: 11605/(TCELSIUS + 273.16)
An example will illustrate the procedure.
Graphical Estimation: An Arrhenius Model Example
Arrhenius model example:
Component life tests were run at 3 temperatures: 85°C, 105°C and 125°C. The lowest temperature cell was populated with 100 components; the 105° cell had 50 components and the highest stress cell had 25 components. All tests were run until either all the units
in the cell had failed or 1000 hours was reached. Acceleration was assumed to follow an
Arrhenius model and the life distribution model for the failure mode was believed to be
lognormal. The normal operating temperature for the components is 25°C, and it is
desired to project the use CDF at 100,000 hours.
Test results:
Cell 1 (85°C): 5 failures at: 401, 428, 695, 725 and 738 hours. 95 units were censored at
1000 hours running time.
Cell 2 (105°C): 35 failures at 171, 187, 189, 266, 275, 285, 301, 302, 305, 316, 317, 324,
349, 350, 386, 405, 480, 493, 530, 534, 536, 567, 589, 598, 599, 614, 620, 650, 668,
685, 718, 795, 854, 917, and 926 hours. 15 units were censored at 1000 hours running
time.
Cell 3 (125°C): 24 failures at 24, 42, 92, 93, 141, 142, 143, 159, 181, 188, 194, 199, 207,
213, 243, 256, 259, 290, 294, 305, 392, 454, 502 and 696. 1 unit was censored at 1000
hours running time.
Failure analysis confirmed that all failures were due to the same failure mechanism (if
any failures due to another mechanism had occurred, they would have been considered
censored run times in the Arrhenius analysis).
Steps to Fitting the Distribution Model and the Arrhenius Model:
● Do graphical plots for each cell and estimate T50's and sigma's as previously
discussed.
● Put all the plots on the same sheet of graph paper and check whether the lines are
roughly parallel (a necessary consequence of true acceleration).
● If satisfied from the plots that both the lognormal model and the constant sigma
from cell to cell are consistent with the data, plot the cell ln T50's versus the
11605/(TCELSIUS + 273.16) cell values, check for linearity and fit a straight line
through the points. Since the points have different degrees of precision, because
different numbers of fails went into their calculation, it is recommended that the
number of fails in each cell be used as weights in a regression program, when
fitting a line through the points.
● Use the slope of the line as the ΔH estimate and calculate the Arrhenius A constant from the intercept, using A = e^intercept.
● Estimate the common sigma across all the cells by the weighted average of the individual cell sigma estimates. Use the number of failures in a cell divided by the total number of failures in all cells as that cell's weight. This will allow cells with more failures to play a bigger role in the estimation process.
Note that the lines are reasonably straight (a check on the lognormal model) and the
slopes are approximately parallel (a check on the acceleration assumption).
The cell ln T50 and sigma estimates are obtained from the FIT function as follows:
FIT Y1 X1
FIT Y2 X2
FIT Y3 X3
Each FIT will yield a cell Ao = the ln T50 estimate and A1 = the cell sigma estimate.
These are summarized in the table below.
Summary of Least Squares Estimation of Cell Lognormal Parameters
Cell Number ln T50 Sigma
1 (T = 85) 8.168 .908
2 (T = 105) 6.415 .663
3 (T = 125) 5.319 .805
The three cells have 11605/(T + 273.16) values of 32.40, 30.69 and 29.15 in cell number
order. The Dataplot commands to generate the Arrhenius plot are:
LET YARRH = DATA 8.168 6.415 5.319
LET XARRH = DATA 32.4 30.69 29.15
TITLE = ARRHENIUS PLOT OF CELL T50'S
PLOT YARRH XARRH
With only three cells, it is unlikely a straight line through the points will present obvious
visual lack of fit. However, in this case, the points appear to line up very well.
Finally, the model coefficients are computed from
LET SS = DATA 5 35 24
WEIGHT = SS
FIT YARRH XARRH
This will yield a ln A estimate of -18.312 (A = e^-18.312 = .1115x10^-7) and a ΔH estimate of .808. With this value of ΔH, the acceleration between the lowest stress cell of 85°C and the highest of 125°C is
e^(.808 × (32.40 - 29.15)) = e^2.63
which is almost 14× acceleration. Acceleration from 125 to the use condition of 25°C is 3708×. The use T50 is e^-18.312 × e^(.808 × 11605 × 1/298.16) = e^13.137 = 507380.
A single sigma estimate for all stress conditions can be calculated as a weighted average
of the 3 sigma estimates obtained from the experimental cells. The weighted average is
(5/64) × .908 + (35/64) × .663 + (24/64) × .805 = .74.
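A sketch of the same weighted fit and use projection in Python (numpy assumed; the results should be close to the Dataplot values quoted above):

import numpy as np

ln_t50  = np.array([8.168, 6.415, 5.319])      # cell ln T50 estimates
inv_kt  = np.array([32.40, 30.69, 29.15])      # 11605/(T + 273.16) for 85, 105 and 125 degrees C
n_fails = np.array([5, 35, 24])                # weights = number of failures per cell

# np.polyfit weights multiply the residuals, so sqrt(n) gives weights proportional to n.
slope, intercept = np.polyfit(inv_kt, ln_t50, 1, w=np.sqrt(n_fails))
delta_h, ln_a = slope, intercept

use_t50 = np.exp(ln_a + delta_h * 11605.0 / (25.0 + 273.16))    # projected T50 at 25 degrees C
print(round(delta_h, 3), round(ln_a, 2), round(use_t50))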
Fitting More Complicated Models
Models involving several stresses can be fit using multiple regression:
Two stress models, such as the temperature/voltage model (ln T50 linear in both 1/kT and ln V), need at least 4 or 5 carefully chosen stress cells to estimate all the parameters. The Backwards L design previously described is an example of a design for this model. The bottom row of the "backward L" could be used for a plot testing the Arrhenius temperature dependence, similar to the above Arrhenius example. The right hand column could be plotted using y = ln T50 and x = ln V, to check the voltage term in the model. The overall model estimates should be obtained from fitting the multiple regression model
ln T50 = b0 + b1 X1 + b2 X2, with X1 = 1/kT and X2 = ln V
The Dataplot command for fitting this model, after setting up the Y (= ln T50), X1 (= 1/kT) and X2 (= ln V) data vectors, is simply
FIT Y X1 X2
and the output gives the estimates for b0, b1 and b2.
Three stress models, and even Eyring models with interaction terms, can be fit by a
direct extension of these methods. Graphical plots to test the model, however, are less
likely to be meaningful as the model gets more complex.
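A sketch of the two-stress regression in Python (numpy assumed; the model form ln T50 = b0 + b1·(1/kT) + b2·ln V is my reconstruction from the text above, and the five cell values below are purely hypothetical):

import numpy as np

ln_t50 = np.array([8.1, 7.2, 6.4, 5.9, 5.1])            # hypothetical cell ln T50 estimates
inv_kt = np.array([32.4, 32.4, 30.7, 29.2, 29.2])       # hypothetical 11605/(T + 273.16) values
ln_v   = np.log([6.0, 8.0, 8.0, 8.0, 12.0])             # hypothetical ln(voltage) values

# Multiple regression with an intercept, a temperature term and a voltage term.
X = np.column_stack([np.ones_like(inv_kt), inv_kt, ln_v])
(b0, b1, b2), *_ = np.linalg.lstsq(X, ln_t50, rcond=None)
print(round(b0, 2), round(b1, 3), round(b2, 2))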
The maximum likelihood method can be used to estimate distribution and acceleration model parameters at the same time:
The Likelihood equation for a multicell acceleration model starts by computing the Likelihood functions for each cell, as was described earlier. Each cell will have unknown life distribution parameters that, in general, are different. For example, if a lognormal model is used, each cell might have its own T50 and σ.
Under an acceleration assumption, however, all the cells contain samples from populations that have the same value of σ (the slope does not change for different stress cells). Also, the T50's are related to one another by the acceleration model; they all can be written using the acceleration model equation with the proper cell stresses put in.
To form the Likelihood equation under the acceleration model assumption, simply rewrite each cell Likelihood by replacing each cell T50 by its acceleration model equation equivalent and replacing each cell sigma by the same one overall σ. Then, multiply all these modified cell Likelihoods together to get the overall Likelihood equation.
Once you have the overall Likelihood equation, the maximum likelihood estimates of
sigma and the acceleration model parameters are the values that maximize this
Likelihood. In most cases, these values are obtained by setting partial derivatives of the
log Likelihood to zero and solving the resulting (non-linear) set of equations.
The method is complicated and requires specialized software:
As you can see, the procedure is complicated and computationally intensive, and only practical if appropriate software is available. It does have many desirable features such as:
● the method can, in theory at least, be used for any distribution model and acceleration model and type of censored data
● estimates have "optimal" statistical properties as sample sizes (i.e. numbers of failures) get large
● approximate confidence bounds can be calculated
● statistical tests of key assumptions can be made using the likelihood ratio test. Some common tests are:
❍ the life distribution model vs another simpler model with fewer parameters
(i.e. a 3-parameter Weibull versus a 2-parameter Weibull, or a 2-parameter
Weibull vs an exponential)
❍ the constant slope from cell to cell consequence of typical acceleration
models
❍ the fit of a particular acceleration model
Arrhenius example comparing graphical and MLE method results:
The data from the 3 stress cell Arrhenius example given in the preceding section were analyzed using a proprietary MLE program that could fit individual cells and also do an overall Arrhenius fit. The tables below compare results.
[Table: cell-by-cell Graphical Estimates vs MLE's of ln T50 and Sigma]
Note that when there are a lot of failures and little censoring, the two methods are in fairly close agreement. Both methods were also in close agreement on the Arrhenius model results. However, even small differences can be important when projecting reliability numbers at use conditions. In this example, the CDF at 25°C and 100,000 hours projects to .014 using the graphical estimates, and only .003 using the MLE estimates.
MLE method tests models and gives confidence intervals:
The Maximum Likelihood program also tested whether parallel lines (a single sigma) were reasonable, and whether the Arrhenius model was acceptable. The three cells of data passed both of these Likelihood Ratio tests easily. In addition, the MLE program output included confidence intervals for all estimated parameters.
SAS JMP™ software (previously used to find single cell Weibull MLE's) can also be used for fitting acceleration models. This is shown next.
Using SAS JMP™Software To Fit Reliability Models
Detailed explanation of how to use JMP software to fit an Arrhenius model:
If you have JMP on your computer, set up to run as a browser application, click here to load a lognormal template JMP spreadsheet named arrex.jmp. This template has the Arrhenius example data already entered. The template extends JMP's analysis capabilities beyond the standard JMP routines by making use of JMP's powerful "Nonlinear Fit" option (links to blank templates for both Weibull and lognormal data are provided at the end of this page).
First a standard JMP reliability model analysis for this data will be shown. By working with windows showing JMP and the Handbook, you can try out the steps in this analysis as you read them.
The first part of the spreadsheet should look as illustrated below.
Steps For Fitting The Arrhenius Model Using JMP's "Survival" Options
1. The "Start Time" column has all the fail and censor times and "Censor" and "Freq"
were entered as shown previously. In addition, the temperatures in degrees C
corresponding to each row were entered in "Temp in C". That is all that has to be
entered on the template; all other columns are calculated as needed. In particular, the
"1/kT" column contains the standard Arrhenius 1/kT values for the different temperature
cells.
2. To get a plot of all three cells, along with individual cell lognormal parameter
estimates, choose "Kaplan - Meier" (or "Product Limit") from the "Analysis" menu and
fill in the screen as shown below.
Column names are transferred to the slots on the right by highlighting them and clicking
on the tab for the slot. Note that the "Temp in C" column is transferred to the
"Grouping" slot in order to analyze and plot each of the three temperature cells
separately.
Clicking "OK" brings up the analysis screen below. All plots and estimates are based on
individual cell data, without the Arrhenius model assumption. Note: to obtain the
lognormal plots and parameter estimates and confidence bounds, it was necessary to
click on various "tabs" or "check" marks - this may depend on the software release
level.
This screen does not give -LogLikelihood values for the cells. These are obtained from
the "Parametric Model" option in the "Survival" menu (after clicking "Analyze").
3. First we will use the "Parametric Model" option to obtain individual cell estimates.
On the JMP data spreadsheet (arrex.jmp), select all rows except those corresponding to
cell 1 (the 85 degree cell) and choose "Exclude" from the "Row" button options (or do
"ctrl+E). Then click "Analyze" followed by "Survival" and "Parametric Model". Enter
the appropriate columns, as shown below, make sure you use "Get Model" to select
"lognormal" and click "Run Model".
This will generate a model fit screen for cell 1. Repeat for cells 2 and 3. The three
resulting model fit screens are shown below.
Note that the model estimates and bounds are the same as obtained in step 2, but these
screens also give -LogLikelihood values. Unfortunately, as previously noted, these
values are off by the sum of the {ln times of failure} for each cell. These sums for the
three cells are 31.7871, 213.3097 and 371.2155, respectively. So the correct cell
-LogLikelihood values for comparing with other MLE programs are 53.3546, 265.2323
and 156.5250, respectively. Adding them together yields a total -LogLikelihood of
475.1119 for all the data fit with separate lognormal parameters for each cell (no
Arrhenius model assumption).
4. To fit the Arrhenius model across the three cells go back to the survival model
screen, this time with all the data rows included and the "1/kT" column selected and put
into the "Effects in Model" box via the "Add" button. This adds the Arrhenius
temperature effect to the MLE analysis of all the cell data. The screen looks like:
The MLE estimates agree with those shown in the tables earlier on this page. The
-LogLikelihood for the model is given under "Full" in the output screen (and should be
adjusted by adding the sum of all the ln failure times from all three cells if comparisons
to other programs might be made). This yields a model -LogLikelihood of 105.4934 +
371.2155 = 476.7089.
5. The likelihood ratio test statistic for the Arrhenius model fit (which also incorporates the single sigma acceleration assumption) is -2Log λ, where Log λ is the difference between the LogLikelihoods with and without the Arrhenius model assumption. Using the results from steps 3 and 4, we have -2Log λ = 2 × (476.709 - 475.112) = 3.194.
The degrees of freedom for the Chi Square test statistic are 6 - 3 = 3, since six
parameters were reduced to three under the acceleration model assumption. The chance
of getting a value 3.194 or higher is 36.3% for a Chi Square distribution with 3 D.O.F.,
which indicates an acceptable model (no significant lack of fit).
This completes a JMP Arrhenius model analysis of the three cells of data. Since the
Survival Modeling screen allows any "effects" to be included in the model, if different
cells of data had different voltages, the "ln V" column could be added as an effect to fit
the Inverse Power Law voltage model. In fact, several effects can be included at once, if
more than one stress varies across cells. Cross product stress terms could also be
included by adding these columns to the spreadsheet and adding them in the model as
additional "effects".
Steps For Fitting The Arrhenius Model Using the "Nonlinear Fit" Option and Special JMP Templates
Arrhenius example using special JMP template and "Nonlinear Fit":
There is another powerful and flexible tool included within JMP that can use MLE methods to fit reliability models. While this method requires some simple programming of JMP calculator equations, it offers the advantage of extending JMP's analysis capabilities to readout data (or truncated data, or any combination of different types of data). Templates (available below) have been set up to cover lognormal and Weibull data. The spreadsheet used above (arrex.jmp) is just the lognormal template, with the Arrhenius data entered.
The following steps work with arrex.jmp because the "loss" columns have been set up to calculate -LogLikelihoods for each row.
1. Load the arrex.jmp spreadsheet and Click "Analyze" on the Tool Bar and choose
"Nonlinear Fit".
2. Select the Loss (w/Temp) column and click "Loss" to put "Loss (w/Temp)" in the
box. This column on the spreadsheet automatically calculates the - LogLikelihood
values at each data point for the Arrhenius/lognormal model. Click "OK" to run the
Nonlinear Analysis.
3. You will next get a "Nonlinear Fit" screen. Select "Loss is -LogLikelihood" and click
the "Reset" and "Go" buttons to make sure you get a new analysis. The parameter
values for the constant Ln A (labeled "Con"), ∆H and sig will appear and the value of -
LogLikelihood is given under the heading "SSE". These numbers are -19.91, 0.863,
0.77 and 476.709, respectively. You can now click on "Confid Limits" to get upper and
lower confidence limits for these parameters. The stated value of "Alpha = .05" means
that the interval between the limits is a 95% confidence interval. At this point your
"Nonlinear Fit" screen looks as follows
4. Next you can run each cell separately by excluding all data rows corresponding to
other cells and repeating steps 1 through 3. For this analysis, select the "Loss (w/o Stress)" column to put in "Loss" in step 2, since a single cell fit does not use temperature. The numbers should match the table shown earlier on this page. The three
cell -LogLikelihood values are 53.355, 265.232 and 156.525. These add to 475.112,
which is the minimum -log likelihood possible, since it uses 2 independent parameters
to fit each cell separately (for a total of six parameters, overall).
The likelihood ratio test statistic for the Arrhenius model fit (which also incorporates
the single sigma acceleration assumption) is - 2Log λ = 2 x (476.709 - 475.112) =
3.194. Degrees of freedom for the Chi Square test statistic are 6 - 3 = 3, since six
parameters were reduced to three under the acceleration model assumption. The chance
of getting a value 3.194 or higher is 36.3% for a Chi Square distribution with 3 D.O.F.,
which indicates an acceptable model (no significant lack of fit).
For further examples of JMP reliability analysis there is an excellent collection of JMP
statistical tutorials put together by Professor Ramon Leon and one of his students, Barry
Eggleston, available on the Web at
http://www.nist.gov/cgi-bin/exit_nist.cgi?url=http://web.utk.edu/~leon/jmp/.
How To Use JMP Templates For Lognormal or Weibull Data (Including Acceleration Model Analysis)
Data entry on JMP templates for general reliability data:
With JMP installed to run as a browser application, you can click on weibtmp.jmp or lognmtmp.jmp and load (and save for later use) blank templates similar to the one shown above, for either Weibull or lognormal data analysis. Here's how to enter any kind of data on either of the templates.
Typical Data Entry
1. Any kind of censored or truncated or readout data can be entered. The rules are as
follows for the common case of (right) censored reliability data:
i) Enter exact failure times in the "Start Time" column, with "0" in the
"Cens" column and the number of failures at that exact time in the "Freq"
column.
ii) Enter temperature in degrees Celsius for the row entry in "Temp in C",
whenever data from several different operating temperatures are present
and an Arrhenius model fit is desired.
iii) Enter voltages in "Volt" for each row entry whenever data from several
different voltages are present and an Inverse Power Law model fit is
desired. If both temperatures and voltages are entered for all data rows, a
combined two stress model can be fit.
iv) Put censor times (where unfailed units are removed from test, or no
longer observed) in both the "Start Time" and "Stop Time" column, and
enter "1" in the "Cens" column. Put the number of censored units in the
"Freq" column.
v) If readout (also known as interval) data is present, put the interval start
time and stop time in the corresponding columns and "2" in the "Cens"
column. Put the number of failures during the interval in the "Freq"
column. If the number of failures is zero, it doesn't matter if you include
the interval, or not.
Using The Templates For Model Fitting
Pick the appropriate template; weibtmp.jmp for a Weibull fit, or lognmtmp.jmp for a
lognormal fit.
After all the data have been entered, pick the appropriate "Loss" column and use the
Non Linear Fit analysis option (steps 1 - 4 above).
The "Loss (w/o Stress)" column is appropriate when only a straight life
distribution fit is desired, with no acceleration model.
The "Loss (w/Temp)" column handles an Arrhenius model (no other stress
taken into account). The "Loss (w/volts)" column handles an Inverse
Voltage Power Law model (no other stress taken into account).
The "Loss (Temp/Volt)" column handles a combined Arrhenius/Inverse
Voltage Power Law model fit.
All analyses are done the same as the 4 step "Non Linear Fit" example shown above.
A few tricks are needed to handle the rare cases of truncated data or left censored data -
these are described next.
JMP Template Data Entry For Truncated or Left Censored Weibull or Lognormal Data
How to handle truncated or left censored data using JMP templates:
Left censored data means all exact times of failure below a lower cut off time T0 are unknown, but the number of these failures is known. Merely enter an interval with start time 0 and stop time T0 on the appropriate template and put "2" in the "Cens" column and the number of failures in the "Freq" column.
Left truncated data means all data points below a lower cut off point T0 are unknown,
and even the number of such points is unknown. This situation occurs commonly for
measurement data, when the measuring instrument has a lower threshold detection limit
at T0. Assume there are n data points (all above T0) actually observed. Enter the n
points as you normally would on the appropriate template ("Cens" gets 0 and "Freq"
gets 1) and add a start time of T0 with a "Cens" value of 1 and a "Freq" value of -n (yes,
minus n!).
Right truncated data means all data points above an upper cut off point T1 are
unknown, and even the number of such points is unknown. Assume there are n data
points (all below T1) actually observed. Enter the n points as you normally would on the
appropriate template ("Cens" gets 0 and "Freq" gets 1) and add a start time of 0 and a
stop time of T1 with a "Cens" value of 2 and a "Freq" value of -n (yes, minus n!)
● Advantages
● Drawbacks
● A simple method
● A more accurate approach for a special case
● Example
More details can be found in Nelson (1990, pages 521-544) or Tobias and Trindade (1995,
pages 197-203).
You need a measurable parameter that drifts (degrades) linearly to a critical failure value:
Two common assumptions typically made when degradation data are modeled are the following:
1. A parameter D, that can be measured over time, drifts monotonically (upwards, or downwards) towards a specified critical value DF. When it reaches DF, failure occurs.
2. The drift, measured in terms of D, is linear over time with a slope (or rate of degradation) R, that depends on the relevant stress the unit is operating under and also the (random) characteristics of the unit being measured. Note: it may be necessary to define D as a transformation of some standard parameter in order to obtain linearity - logarithms or powers are sometimes needed.
The figure below illustrates these assumptions by showing degradation plots of 5 units on test. Degradation readings for each unit are taken at the same four time points and straight lines fit through these readings on a unit by unit basis. These lines are then extended up to a critical (failure) degradation value. The projected times of failure for these units are then read off the plot. They are: t1, t2, ..., t5.
[Figure: Plot of linear degradation trends for 5 units read out at four time points]
In many practical situations, D starts at 0 at time zero, and all the linear theoretical
degradation lines start at the origin. This is the case when D is a "% change" parameter, or
failure is defined as a change of a specified magnitude in a parameter, regardless of its
starting value. Lines all starting at the origin simplify the analysis, since we don't have to
characterize the population starting value for D, and the "distance" any unit "travels" to reach
failure is always the constant DF. For these situations, the degradation lines would look as
follows:
[Figure: Often, the degradation lines go through the origin - as when % change is the measurable parameter increasing to a failure level]
It is also common to assume the effect of measurement error, when reading values of D, has
relatively little impact on the accuracy of model estimates.
Advantages of Modeling Based on Degradation Data
Modeling based on complete samples of measurement data, even with low stress cells, offers many advantages:
1. Every degradation readout for every test unit contributes a data point. This leads to large amounts of useful data, even if there are very few failures.
2. You don't have to run tests long enough to obtain significant numbers of failures.
3. You can run low stress cells that are much closer to use conditions and obtain meaningful degradation data. The same cells would be a waste of time to run if failures were needed for modeling. Since these cells are more typical of use conditions, it makes sense to have them influence model parameters.
4. Simple plots of degradation vs time can be used to visually test the linear degradation assumption.
Drawbacks to Modeling Based on Degradation Data
failure data. This probably means that the failure mechanism depends on more than a
simple continuous degradation process.
Because of the last listed drawback, it is a good idea to have at least one high stress cell
where enough real failures occur to do a standard life distribution model analysis. The
parameters obtained can be compared to the predictions from the degradation data analysis,
as a "reality" check.
A Simple Method For Modeling Degradation Data
A simple approach is to extend each unit's degradation line until a projected "failure time" is obtained:
1. As shown in the figures above, fit a line through each unit's degradation readings. This can be done by hand, but using a least squares regression program is better (like Dataplot's "LINEAR FIT Y X" or EXCEL's line fitting routines).
2. Take the equation of the fitted line, substitute DF for Y and solve for X. This value of X is the "projected time of fail" for that unit.
3. Repeat for every unit in a stress cell until a complete sample of (projected) times of failure is obtained for the cell.
4. Use the failure times to compute life distribution parameters for a cell. Under the fairly typical assumption of a lognormal model, this is very simple. Take natural logarithms of all failure times and treat the resulting data as a sample from a normal distribution. Compute the sample mean and the sample standard deviation. These are estimates of ln T50 and σ, respectively, for the cell.
5. Assuming there are k cells with varying stress, fit an appropriate acceleration model using the cell ln T50's, as described in the graphical estimation section. A single sigma estimate is obtained by taking the square root of the average of the cell σ² estimates (assuming the same number of units in each cell). If the cells have nj units on test, where the nj's are not all equal, use the pooled sum of squares estimate across all k cells calculated by
A More Accurate Regression Approach For the Case Where D = 0 at time 0 and the
"Distance To Fail" DF is the Same for All Units
Models can be fit using all the degradation readings and linear regression.

Let the degradation measurement for the i-th unit at the j-th readout time in the k-th stress cell be given by Dijk, and let the corresponding readout time be denoted by tjk. That readout gives a degradation rate (or slope) estimate of Dijk/tjk. This follows from the linear assumption, or:

(Rate of degradation) × (Time on test) = (Amount of degradation)

Based on that readout alone, an estimate of the natural logarithm of the time to fail for that unit is

yijk = ln DF - (ln Dijk - ln tjk).

This follows from the basic formula connecting linear degradation with failure time

(rate of degradation) × (time of failure) = DF

by solving for (time of failure) and taking natural logarithms.
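For example, using the first unit in the 65 degrees C cell of the tables below (0.87% degradation at the 200 hour readout), the single-readout estimate is yijk = ln 30 - (ln .87 - ln 200) = ln(30 × 200/.87), which is about 8.84, corresponding to a projected failure time of roughly 6,900 hours.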
For an Arrhenius model analysis, fit the regression equation

yijk = a + b xk + error

where the xk values are 1/KT. Here T is the temperature of the k-th cell, measured in Kelvin (273.16 + degrees Celsius) and K is Boltzmann's constant (8.617 × 10^-5 in eV/unit Kelvin). Use a linear regression program to estimate a = ln A and b = ΔH. If we further assume tf has a lognormal distribution, the mean square residual error from the regression fit is an estimate of σ² (where σ is the lognormal shape parameter).
An example using the regression approach to fit an Arrhenius model

A component has a critical parameter that studies show degrades linearly over time at a rate that varies with operating temperature. A component failure based on this parameter occurs when the parameter value changes by 30% or more. Fifteen components were tested under 3 different temperature conditions (5 at 65°C, 5 at 85°C and the last 5 at 105°C). Degradation percent values were read out at 200, 500 and 1000 hours. The readings are given by unit in the following three temperature cell tables.
65 Degrees C
200 hr 500 hr 1000 hr
Unit 1 .87 1.48 2.81
Unit 2 .33 .96 2.13
Unit 3 .94 2.91 5.67
Unit 4 .72 1.98 4.28
Unit 5 .66 .99 2.14
85 Degrees C
200 hr 500 hr 1000 hr
Unit 1 1.41 2.47 5.71
Unit 2 3.61 8.99 17.69
Unit 3 2.13 5.72 11.54
Unit 4 4.36 9.82 19.55
Unit 5 6.91 17.37 34.84
105 Degrees C
200 hr 500 hr 1000 hr
Unit 1 24.58 62.02 124.10
Unit 2 9.73 24.07 48.06
Unit 3 4.74 11.53 23.72
Unit 4 23.61 58.21 117.20
Unit 5 10.90 27.85 54.97
Note that one unit failed in the 85 degree cell and 4 units failed in the 105 degree cell.
Because there were so few failures, it would be impossible to fit a life distribution model in
any cell but the 105 degree cell, and therefore no acceleration model can be fit using failure
data. We will fit an Arrhenius/Lognormal model, using the degradation data.
Dataplot Solution:
Dataplot easily fits the model to the degradation data (other regression programs would work equally well)

From the above tables, first create a text file called DEGDAT containing the 45 degradation values. Take the three temperature tables in order and, within each table, list the five 200-hour readings, then the five 500-hour readings, then the five 1000-hour readings. The 45 numbers look like the following: .87, .33, .94, .72, .66, 1.48, .96, 2.91, 1.98, .99, . . . , 124.10, 48.06, 23.72, 117.20, 54.97.

Next, create a text file TEMPDAT containing the corresponding 45 temperatures. TEMPDAT has 15 repetitions of 65, followed by 15 repetitions of 85 and then 15 repetitions of 105.

Finally, create a text file TIMEDAT containing the corresponding readout times. These are 200, 200, 200, 200, 200, 500, 500, 500, 500, 500, 1000, 1000, 1000, 1000, 1000, repeated 3 times.

Assuming the data files just created are placed in the Dataplot directory, the following commands will complete the analysis:
READ DEGDAT. DEG
READ TEMPDAT. TEMP
READ TIMEDAT. TIME
LET YIJK = LOG(30) - (LOG(DEG) - LOG(TIME))
LET XIJK = 100000/(8.617*(TEMP + 273.16))
LINEAR FIT YIJK XIJK
The output is (with unnecessary items edited out):

LEAST SQUARES POLYNOMIAL FIT
SAMPLE SIZE N = 45
DEGREE = 1

   PARAMETER ESTIMATES    (APPROX. ST. DEV.)    T VALUE
1  A0   -18.9434               (1.833)            -10
2  A1     .818774              (.5641e-01)         15
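For readers without Dataplot, the same regression can be sketched in Python (numpy assumed; this is an illustrative translation, not part of the handbook's procedure). The intercept and slope should agree with the A0 and A1 values in the output above.

# Python sketch of the Arrhenius/lognormal degradation regression on the example data.
import numpy as np

deg = np.array([   # 45 readings: for each cell, the 200 hr, 500 hr, then 1000 hr readouts
    0.87, 0.33, 0.94, 0.72, 0.66,  1.48, 0.96, 2.91, 1.98, 0.99,
    2.81, 2.13, 5.67, 4.28, 2.14,                                    # 65 C cell
    1.41, 3.61, 2.13, 4.36, 6.91,  2.47, 8.99, 5.72, 9.82, 17.37,
    5.71, 17.69, 11.54, 19.55, 34.84,                                # 85 C cell
    24.58, 9.73, 4.74, 23.61, 10.90,  62.02, 24.07, 11.53, 58.21, 27.85,
    124.10, 48.06, 23.72, 117.20, 54.97,                             # 105 C cell
])
temp = np.repeat([65.0, 85.0, 105.0], 15)                # degrees C for each reading
time = np.tile(np.repeat([200.0, 500.0, 1000.0], 5), 3)  # readout hours for each reading

DF = 30.0                                         # 30% change defines failure
y = np.log(DF) - (np.log(deg) - np.log(time))     # yijk, log projected failure time
x = 1.0 / (8.617e-5 * (temp + 273.16))            # xk = 1/KT

b, a = np.polyfit(x, y, 1)                        # y = a + b*x; a estimates ln A, b estimates delta H
resid = y - (a + b * x)
sigma = np.sqrt(np.sum(resid ** 2) / (len(y) - 2))  # lognormal sigma estimate from residuals
print(a, b, sigma)                                # a and b should be close to -18.94 and .819 above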
Arrhenius model projection example with Dataplot commands

The Arrhenius example from the graphical estimation and the MLE estimation sections ended by comparing use projections of the CDF at 100,000 hours. This is a projection of the first type. We know from the Arrhenius model assumption that the T50 at 25°C is just

T50 at use = A × e^(ΔH/(K × 298.16)) = e^(ln A) × e^(ΔH × 11605/298.16)

since 1/K = 11605 in units of Kelvin per eV. Using the graphical model estimates for ln A and ΔH we have

T50 at use = e^(-18.312) × e^(.808 × 11605/298.16) = e^(13.137) = 507380

and combining this T50 with the common sigma of .74 allows us to easily estimate the CDF or failure rate after any number of hours of operation at use conditions. In particular, the Dataplot command

LET Y = LGNCDF((T/T50),sigma)

evaluates a lognormal CDF at time T, and

LET Y = LGNCDF((100000/507380),.74)

returns the answer .014 given in the MLE estimation section as the graphical projection of the CDF at 100,000 hours at a use temperature of 25°C.
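The same projection can be sketched in Python with scipy (an illustrative equivalent of the Dataplot commands above):

# Python sketch of the use-condition projection (scipy assumed).
import numpy as np
from scipy.stats import lognorm

ln_A, dH, sigma = -18.312, 0.808, 0.74          # graphical estimates quoted above
T50_use = np.exp(ln_A + dH * 11605.0 / 298.16)  # Arrhenius T50 at 25 C (298.16 K)
cdf_100k = lognorm.cdf(100000.0 / T50_use, s=sigma)
print(T50_use, cdf_100k)                        # roughly 507,000 hours and 0.014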
If the life distribution model had been Weibull, the same type of analysis would be done by letting the characteristic life parameter vary with stress according to the acceleration model, while the shape parameter (gamma) stays fixed.

Projections of this kind require, among other assumptions, that:

● the shape parameter (sigma for the lognormal, gamma for the Weibull) is also known and does not change significantly from lot to lot.

With these assumptions, we can take any proportion of failures we see from a high stress test and project a use CDF or failure rate. For a T hour high stress test and an acceleration factor of A from high stress to use stress, an observed proportion p failing is converted to a use CDF at 100,000 hours for a lognormal model as follows:

LET T50STRESS = T/LGNPPF(p, σ)
LET CDF = LGNCDF((100000/(A*T50STRESS)), σ)

If the model is Weibull, the Dataplot commands are

LET ASTRESS = T/WEIPPF(p, γ)
LET CDF = WEICDF((100000/(A*ASTRESS)), γ)
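An equivalent calculation can be sketched in Python with scipy; the values of p, T, A and the shape parameters below are arbitrary placeholders chosen only to illustrate the conversion, not values from the handbook:

# Python sketch: convert an observed stress-test proportion failing to a use CDF (scipy assumed).
from scipy.stats import lognorm, weibull_min

p, T, A = 0.20, 168.0, 40.0      # placeholder proportion failing, test hours, acceleration factor
sigma, gamma = 0.74, 2.0         # placeholder lognormal and Weibull shape parameters

# Lognormal model: T is the p-th percentile at stress, so T50 at stress = T / LGNPPF(p, sigma)
T50_stress = T / lognorm.ppf(p, s=sigma)
use_cdf_lognormal = lognorm.cdf(100000.0 / (A * T50_stress), s=sigma)

# Weibull model: characteristic life at stress = T / WEIPPF(p, gamma)
alpha_stress = T / weibull_min.ppf(p, c=gamma)
use_cdf_weibull = weibull_min.cdf(100000.0 / (A * alpha_stress), c=gamma)
print(use_cdf_lognormal, use_cdf_weibull)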
This approach requires that a lifetime regression capability be part of the analysis software package being used and that this software covers the models and situations of interest to the analyst.
Lifetime Regression Comparisons
Lifetime regression is similar to maximum likelihood and likelihood
ratio methods. Each sample is assumed to come from a population with
the same shape parameter and a wide range of questions about the scale
parameter (which is often assumed to be a "measure" of lot to lot or
vendor to vendor quality) can be formulated and tested for significance.
For a complicated but realistic example, assume a company manufactures memory chips and can use chips with some known defects ("partial goods") in many applications. However, there is a question of whether the reliability of "partial good" chips is equivalent to that of "all good" chips. A large amount of customer reliability data exists to answer this question; however, the data are difficult to analyze because they contain several different vintages with known reliability differences, as well as chips manufactured at many different locations. How can the partial good versus all good question be resolved?

A lifetime regression model can be set up with variables included that change the scale parameter based on vintage, location, partial versus all good, and any other relevant variables. Then, a good lifetime regression program will sort out which, if any, of these factors are significant and, in particular, whether there is a significant difference between "partial good" and "all good".
Software that will do lifetime regression is not widely available at this
time, however.
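When a packaged lifetime regression program is not available, a model of this kind can be fit directly by maximum likelihood. The following Python sketch (illustrative only: the simulated data, the single "partial good" covariate, and all parameter values are hypothetical) fits a lognormal lifetime regression with right censoring using numpy and scipy.

# Hypothetical sketch: lognormal lifetime regression with a covariate that shifts
# the log scale parameter, mu_i = b0 + b1*partial_i, with right-censored units.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 200
partial = rng.integers(0, 2, n)                      # 1 = "partial good" chip, 0 = "all good"
true_mu = 8.0 - 0.3 * partial                        # assumed effect, for simulation only
t = np.exp(true_mu + 0.7 * rng.standard_normal(n))   # simulated lifetimes
censor_time = 4000.0
event = t <= censor_time                             # observed failure vs right censored
obs = np.minimum(t, censor_time)

def neg_log_lik(theta):
    b0, b1, log_s = theta
    s = np.exp(log_s)
    z = (np.log(obs) - (b0 + b1 * partial)) / s
    # failures contribute the lognormal density, censored units the survival function
    ll = np.where(event,
                  norm.logpdf(z) - np.log(s * obs),
                  norm.logsf(z))
    return -ll.sum()

fit = minimize(neg_log_lik, x0=np.array([7.0, 0.0, 0.0]), method="Nelder-Mead")
b0, b1, log_s = fit.x
print(b0, b1, np.exp(log_s))   # b1 measures the "partial good" effect on log life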
                 90%                                95%
Num Fails   Lower for MTBF    Upper for MTBF   Lower for MTBF    Upper for MTBF
0              0.3338              -               0.2711             -
Formulas for confidence bound factors - even for "zero fails" case

Confidence bounds for the typical Type I censoring situation are obtained from chi-square distribution tables or programs. The formula for calculating a 100×(1-α)% confidence interval is:

2T/χ²(1-α/2; 2r+2)  ≤  MTBF  ≤  2T/χ²(α/2; 2r)

where χ²(p; d) denotes the p-th percentile (left tail probability p) of the chi-square distribution with d degrees of freedom.
These bounds are exact for the case of one or more repairable systems on test for a fixed time. They are also exact when non-repairable units are on test for a fixed time, and failures are replaced with new units during the course of the test. For other situations, they are approximate.
When there are zero failures during the test or operation time, only a (one-sided) MTBF lower bound exists, and this is given by

MTBFlower = T/(-ln α)

The interpretation of this bound is the following: if the true MTBF were any lower than MTBFlower, we would have seen at least one failure during T hours of test with probability at least 1-α. Therefore, we are 100×(1-α)% confident that the true MTBF is not lower than MTBFlower.
Dataplot and EXCEL calculation of confidence limits

A lower 100×(1-α/2)% confidence bound for the MTBF is given by

LET LOWER = T*2/CHSPPF([1-α/2],[2*(r+1)])

where T is the total unit or system test time and r is the total number of failures.

The upper 100×(1-α/2)% confidence bound is

LET UPPER = T*2/CHSPPF(α/2,[2*r])

and (LOWER, UPPER) is a 100×(1-α)% confidence interval for the true MTBF.

The same calculations can be done with EXCEL built-in functions with the commands

=T*2/CHINV([α/2],[2*(r+1)]) for the lower bound and
=T*2/CHINV([1-α/2],[2*r]) for the upper bound.
Note that the Dataplot CHSPPF function requires left tail probability inputs (i.e., 1-α/2 for the lower bound and α/2 for the upper bound), while the EXCEL CHINV function requires right tail inputs (i.e., α/2 for the lower bound and 1-α/2 for the upper bound).
Example
Example showing how to calculate confidence limits

A system was observed for two calendar months of operation, during which time it was in operation for 800 hours and it had 2 failures.

The MTBF estimate is 800/2 = 400 hours. A 90% confidence interval is given by (400×.3177, 400×5.6281) = (127, 2251). The same interval could have been obtained using the Dataplot commands

LET LOWER = 1600/CHSPPF(.95,6)
LET UPPER = 1600/CHSPPF(.05,4)

or the EXCEL commands

=1600/CHINV(.05,6) for the lower limit
=1600/CHINV(.95,4) for the upper limit.
Note that 127 is a 95% lower limit for the true MTBF. The customer is
usually only concerned with the lower limit and one-sided lower limits
are often used for statements of contractual requirements.
Zero fails confidence limit calculation

What could we have said if the system had had no failures? For a 95% lower confidence limit on the true MTBF, we either use the 0 fails factor from the 90% confidence interval table and calculate 800 × .3338 = 267, or we use T/(-ln α) = 800/(-ln .05) = 267.
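The same confidence limit calculations can be sketched in Python with scipy, using the example numbers above:

# Python sketch of the chi-square MTBF bounds (scipy assumed), using the example above.
from math import log
from scipy.stats import chi2

T, r, alpha = 800.0, 2, 0.10                            # 800 test hours, 2 failures, 90% interval
lower = 2 * T / chi2.ppf(1 - alpha / 2, 2 * (r + 1))    # about 127 hours
upper = 2 * T / chi2.ppf(alpha / 2, 2 * r)              # about 2251 hours

# Zero-failure case: one-sided 95% lower bound
lower_zero_fail = T / -log(0.05)                        # about 267 hours
print(lower, upper, lower_zero_fail)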
The estimated MTBF at the end of the test (or observation) period is
where z(1-α/2) is the upper 100×(1-α/2) percentile point of the standard normal distribution.
● TIMES = the actual system times of failure (the start of test is time 0)
Dataplot macro used to fit Power Law to Case Study 1 data

This case study was introduced in section 2, where we did various plots of the data, including a Duane Plot. The case study was continued when we discussed trend tests and verified that significant improvement had taken place. Now we will use Powerest.dp to complete the case study data analysis.

The observed failure times were: 5, 40, 43, 175, 389, 712, 747, 795, 1299 and 1478 hours, with the test ending at 1500 hours. The Dataplot session looks as follows:

LET T = 1500
LET R = 10
LET TIMES = DATA 5 40 43 175 389 712 747 795 1299 1478
CALL POWEREST.DP
IF NO, ENTER 0
IF YES, ENTER 1
1
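For reference, the maximum likelihood estimates for an NHPP power law (Crow-AMSAA) model with a time-terminated test can also be sketched in Python using the standard formulas; this is an independent illustration, not a reproduction of the Powerest.dp macro, so its output may differ in detail from the macro's.

# Python sketch: NHPP power law (Crow-AMSAA) MLEs for a time-terminated test (numpy assumed).
import numpy as np

T = 1500.0
times = np.array([5, 40, 43, 175, 389, 712, 747, 795, 1299, 1478], dtype=float)
n = len(times)

beta_hat = n / np.sum(np.log(T / times))       # shape (growth) parameter MLE
lambda_hat = n / T ** beta_hat                 # scale parameter MLE, with M(t) = lambda * t**beta
mtbf_end = 1.0 / (lambda_hat * beta_hat * T ** (beta_hat - 1.0))   # instantaneous MTBF at T
print(beta_hat, lambda_hat, mtbf_end)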
Review of Bayesian procedure for the gamma exponential system model

The goal of Bayesian reliability procedures is to obtain as accurate a posterior distribution as possible, and then use this distribution to calculate failure rate (or MTBF) estimates with confidence intervals (called credibility intervals by Bayesians). The figure below summarizes the steps in this process.
How to estimate the MTBF with bounds, based on the posterior distribution

Once the test has been run, and r failures observed, the posterior gamma parameters are:

a' = a + r, b' = b + T

and a (median) estimate for the MTBF, using EXCEL, is calculated by

= 1/GAMMAINV(.5, a', (1/b'))

Some people prefer to use the reciprocal of the mean of the posterior distribution as their estimate for the MTBF. The mean is the minimum mean square error (MSE) estimator of λ, but using the reciprocal of the mean to estimate the MTBF is always more conservative than the "even money" 50% estimator.

A lower 80% bound for the MTBF is obtained from

= 1/GAMMAINV(.8, a', (1/b'))

and, in general, a lower 100×(1-α)% bound is given by

= 1/GAMMAINV((1-α), a', (1/b')).

A two-sided 100×(1-α)% credibility interval for the MTBF is

[{= 1/GAMMAINV((1-α/2), a', (1/b'))}, {= 1/GAMMAINV((α/2), a', (1/b'))}].

Finally, = GAMMADIST((1/M), a', (1/b'), TRUE) calculates the probability the MTBF is greater than M.
Example
A Bayesian example using EXCEL to estimate the MTBF and calculate upper and lower bounds

A system has completed a reliability test aimed at confirming a 600 hour MTBF at an 80% confidence level. Before the test, a gamma prior with a = 2, b = 1400 was agreed upon, based on testing at the vendor's location. Bayesian test planning calculations (allowing up to 2 new failures) called for a test of 1909 hours. When that test was run, there actually were exactly two failures. What can be said about the system?

The posterior gamma CDF has parameters a' = 4 and b' = 3309. The plot below shows CDF values on the y-axis, plotted against 1/λ = MTBF on the x-axis. By going from probability, on the y-axis, across to the curve and down to the MTBF, we can read off any MTBF percentile point we want. (The EXCEL formulas above will give more accurate MTBF percentile values than can be read off a graph.)
The test has confirmed a 600 hour MTBF at 80% confidence, a 495 hour MTBF at 90% confidence, and (495, 1897) is an 80 percent credibility interval for the MTBF. A single number (point) estimate for the system MTBF would be 901 hours. Alternatively, you might want to use the reciprocal of the mean of the posterior distribution (b'/a') = 3309/4 = 827 hours as a single estimate. The reciprocal mean is more conservative - in this case it is a 57% lower bound, as =GAMMADIST((4/3309),4,(1/3309),TRUE) shows.
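The numbers in this example can be checked with a short Python sketch using scipy's gamma distribution (an illustrative equivalent of the EXCEL formulas above):

# Python sketch: posterior gamma calculations for the example (scipy assumed).
from scipy.stats import gamma

a_post, b_post = 4.0, 3309.0                  # posterior shape a' and "rate" b' from above
scale = 1.0 / b_post                          # scipy's gamma uses a scale parameter = 1/b'

mtbf_median = 1.0 / gamma.ppf(0.5, a_post, scale=scale)     # about 901 hours
mtbf_lower_80 = 1.0 / gamma.ppf(0.8, a_post, scale=scale)   # about 600 hours
mtbf_lower_90 = 1.0 / gamma.ppf(0.9, a_post, scale=scale)   # about 495 hours
prob_mtbf_above_827 = gamma.cdf(4.0 / 3309.0, a_post, scale=scale)   # about 0.57
print(mtbf_median, mtbf_lower_80, mtbf_lower_90, prob_mtbf_above_827)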
Crow, L.H. (1982), "Confidence Interval Procedures for the Weibull Process With
Applications to Reliability Growth," Technometrics, 24(1):67-72.
Crow, L.H. (1990), "Evaluating the Reliability of Repairable Systems," Proceedings
Annual Reliability and Maintainability Symposium, pp. 275-279.
Crow, L.H. (1993), "Confidence Intervals on the Reliability of Repairable Systems,"
Proceedings Annual Reliability and Maintainability Symposium, pp. 126-134
Duane, J.T. (1964), "Learning Curve Approach to Reliability Monitoring," IEEE
Transactions On Aerospace, 2, pp. 563-566.
Gumbel, E. J. (1954), Statistical Theory of Extreme Values and Some Practical
Applications, National Bureau of Standards Applied Mathematics Series 33, U.S.
Government Printing Office, Washington, D.C.
Hahn, G.J., and Shapiro, S.S. (1967), Statistical Models in Engineering, John Wiley &
Sons, Inc., New York
Hoyland, A., and Rausand, M. (1994), System Reliability Theory, John Wiley & Sons,
Inc., New York
Johnson, N.L., Kotz, S. and Balakrishnan, N. (1994), Continuous Univariate
Distributions Volume 1, 2nd edition, John Wiley & Sons, Inc., New York
Johnson, N.L., Kotz, S. and Balakrishnan, N. (1995), Continuous Univariate
Distributions Volume 2, 2nd edition, John Wiley & Sons, Inc., New York
Kaplan, E.L., and Meier, P. (1958), "Nonparametric Estimation From Incomplete
Observations," Journal of the American Statistical Association, 53: 457-481.
Kalbfleisch, J.D., and Prentice, R.L. (1980), The Statistical Analysis of Failure Data,
John Wiley & Sons, Inc., New York
Kielpinski, T.J., and Nelson, W. (1975), "Optimum Accelerated Life-Tests for the
Normal and Lognormal Life Distributions," IEEE Transactions on Reliability, Vol. R-24,
5, pp. 310-320
Klinger, D.J., Nakada, Y., and Menendez, M.A. (1990), AT&T Reliability Manual, Van
Nostrand Reinhold, Inc., New York
Kolmogorov, A.N. (1941), "On a Logarithmic Normal Distribution Law of the
Dimensions of Particles Under Pulverization," Dokl. Akad. Nauk USSR, 31, 2, pp.
99-101.
Kovalenko, I.N., Kuznetsov, N.Y., and Pegg, P.A. (1997), Mathematical Theory of
Reliability of Time Dependent Systems with Practical Applications, John Wiley & Sons,
Inc., New York
Landzberg, A.H., and Norris, K.C. (1969), "Reliability of Controlled Collapse