Weibull Analysis
http://www.mathpages.com/home/kmath122/kmath122.htm
Given any function g(x) with g(0) = 0 and increasing monotonically to infinity as x goes to
infinity, we can define a cumulative probability function by

        F(t) = 1 - e^(-g(t))                    (1)
As discussed in Failure Rates, MTBFs, and All That, the "rate" of occurrence for a given
density distribution f(t) is

        R(t) = f(t)/(1 - F(t)) = g'(t)                    (2)
This enables us to define a probability density distribution with any specified rate function
R(t). One useful two-parameter family of rate functions is given by

        R(t) = (b/a)(t/a)^(b-1)                    (3)
where a and b are constants. This is the so-called "Weibull" family of distributions, named
after the Swedish engineer Waloddi Weibull (1887-1979) who popularized its use for
reliability analysis, especially for metallurgical failure modes. Weibull's first paper on the
subject was published in 1939, but the method didn't attract much attention until the 1950's.
Interestingly, the "Weibull distribution" had already been studied in the 1920's by the
statistician Emil Gumbel (1891-1966), who is best remembered today for his confrontation
with the Nazis in 1931 when they organized a campaign to force him out of his
professorship at Heidelberg University for his outspoken pacifist and anti-Nazi views.
The constant a is called the scale parameter, because it scales the t variable, and the
constant b is called the shape parameter, because it determines the shape of the rate
function. (Occasionally the variable t in the above definition is replaced with t-g, where g is
a third parameter, used to define a suitable zero point.) If b is greater than 1 the rate
increases with t, whereas if b is less than 1 the rate decreases with t. If b = 1 the rate is
constant, in which case the Weibull distribution equals the exponential distribution. The
shapes of the rate functions for the Weibull family of distributions are illustrated in the
figure below
Clearly for any positive a and b parameters the function g(t) = (t/a)^b monotonically
increases from 0 to infinity as t increases from 0 to infinity, so it yields a valid cumulative
probability distribution, namely

        F(t) = 1 - e^(-(t/a)^b)

The expected number N(t) of failures by the time t, for a coherent population of n widgets,
is just n times this cumulative distribution, so we have

        N(t) = n[1 - e^(-(t/a)^b)]                    (4)

Dividing both sides by n, and re-arranging terms, this can be written in the form

        1 - N(t)/n = e^(-(t/a)^b)
2 of 8 5/10/2010 11:37 AM
Weibull Analysis http://www.mathpages.com/home/kmath122/kmath122.htm
Taking the natural log of both sides and negating, we have (t/a)^b = ln(1/(1 - N(t)/n)),
and taking the natural log once more gives

        ln(ln(1/(1 - N(t)/n))) = b ln(t) - b ln(a)                    (5)
If the number N(t) of failures is very small compared with the total number n of widgets,
the inner natural log on the left hand side can be approximated by the first-order term of the
power series expansion, which gives simply N(t)/n, so we have

        ln(N(t)/n) = b ln(t) - b ln(a)                    (6)
However, it's usually just as easy to use the exact equation (5) rather than the approximate
equation (6).
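To see how close the two forms are, here is a small sketch comparing the exact left-hand side ln(ln(n/(n - N))) of equation (5) with the approximation ln(N/n) of equation (6) for a small failed fraction (the population size n = 1000 and failure counts below are just illustrative values):

```python
import math

# Compare the exact nested-log term of equation (5) with the
# first-order approximation ln(N/n) of equation (6).
n = 1000
for N in (1, 5, 20, 100):
    exact = math.log(math.log(n / (n - N)))   # ln(ln(1/(1 - N/n)))
    approx = math.log(N / n)                  # ln(N/n)
    print(N, round(exact, 4), round(approx, 4))
```

As expected, the two agree closely while N/n is small and drift apart as the failed fraction grows.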
In general, given a population of n widgets, each with the same failure density f(t), the
probability for each individual widget being failed at time tm is F(tm) = N(tm)/n. Denoting
this value by f, the probability that exactly j widgets are failed and n - j are not failed at time
tm is

        P[j;n] = C(n,j) f^j (1 - f)^(n-j)

where C(n,j) denotes the binomial coefficient n!/(j!(n-j)!).
Summing these probabilities for all numbers of failures from j up to n gives

        P[≥j;n] = SUM(i = j to n) C(n,i) f^i (1 - f)^(n-i)

This represents the probability that the jth (of n) failure has occurred by the time tm, and of
course the complement is the probability that the jth failure has not yet occurred by the time
tm. Therefore, given that the jth failure occurs at tm, the "median" value of F(tm) = f is
given by putting P[≥j;n] = 0.5 in the above equation and solving for f. This value is called
the median rank, and can be computed numerically. An alternative approach is to use the
remarkably good approximate formula

        f = (j - 0.3)/(n + 0.4)
This is the value (rather than j/n) that should be assigned to N(tj)/n for the jth failure.
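A minimal sketch of both approaches, solving P[≥j;n] = 0.5 numerically by bisection and comparing against the (j - 0.3)/(n + 0.4) approximation (the function names here are just illustrative):

```python
import math

def binom_ge(j, n, f):
    """P[>=j; n]: probability that at least j of n widgets are failed
    when each widget is failed with probability f."""
    return sum(math.comb(n, i) * f**i * (1 - f)**(n - i)
               for i in range(j, n + 1))

def median_rank(j, n, tol=1e-10):
    """Solve P[>=j; n] = 0.5 for f by bisection (P[>=j;n] increases with f)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_ge(j, n, mid) < 0.5:
            lo = mid   # f too small: jth failure unlikely to have occurred
        else:
            hi = mid
    return (lo + hi) / 2

def approx_rank(j, n):
    """The (j - 0.3)/(n + 0.4) approximation."""
    return (j - 0.3) / (n + 0.4)

for j in range(1, 6):
    print(j, round(median_rank(j, 5), 4), round(approx_rank(j, 5), 4))
```

For j = 1, n = 5 the exact median rank is 1 - 0.5^(1/5) ≈ 0.1294, while the approximation gives 0.7/5.4 ≈ 0.1296.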
By simple linear regression we can perform a least-squares fit of this sequence of k = 5 data
points to a line. In terms of the variables xj = ln(tj) and vj = ln(ln(1/(1 - (j-0.3)/(n+0.4)))),
the estimated Weibull parameters are given by

        b = [k SUM(xj vj) - (SUM xj)(SUM vj)] / [k SUM(xj^2) - (SUM xj)^2]

        a = exp[ (SUM xj)/k - (SUM vj)/(k b) ]
For our example with k = 5 data points, we get b = 0.609 and a = 4.14×10^6 hours. Using
equation (4), we can then forecast the expected number of failures into the future, as shown
in the figure below.
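The fitting procedure can be sketched as follows. The original table of five failure times is not reproduced here, so the times in the example call are hypothetical; the regression itself follows the linearized form v = b·x - b·ln(a):

```python
import math

def weibull_fit(failure_times, n):
    """Least-squares fit of median-ranked failure data to the linearized
    Weibull form v = b*x - b*ln(a), where x_j = ln(t_j) and
    v_j = ln(ln(1/(1 - (j - 0.3)/(n + 0.4)))).  Returns (a, b)."""
    ts = sorted(failure_times)
    k = len(ts)
    xs = [math.log(t) for t in ts]
    vs = [math.log(math.log(1 / (1 - (j - 0.3) / (n + 0.4))))
          for j in range(1, k + 1)]
    xbar = sum(xs) / k
    vbar = sum(vs) / k
    b = (sum(x * v for x, v in zip(xs, vs)) - k * xbar * vbar) / \
        (sum(x * x for x in xs) - k * xbar * xbar)
    a = math.exp(xbar - vbar / b)   # intercept of the line is -b*ln(a)
    return a, b

# Hypothetical example: 5 observed failure times (hours) in a
# population of n = 100 widgets.
a, b = weibull_fit([1200, 4800, 16000, 52000, 190000], n=100)
print(a, b)
```

Note that only the k failure times enter the regression directly; the population size n enters through the median ranks assigned to each failure.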
Notice that at the time t = a we expect 63.21% of the population to have failed, regardless
of the value of b. The estimated failure rate for each individual widget as a function of time
is

        R(t) = (b/a)(t/a)^(b-1)   with b = 0.609 and a = 4.14×10^6 hours
A distribution of this kind is said to exhibit an "infant mortality" characteristic, because the
failure rate is initially very high, and then drops off.
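A quick numerical check of the infant-mortality behavior, using the fitted parameters from the example above (the sample times are arbitrary):

```python
# Weibull failure rate R(t) = (b/a)*(t/a)**(b - 1).  With the fitted
# shape parameter b = 0.609 < 1, the exponent b - 1 is negative, so
# the rate starts very high and falls off with age.
a, b = 4.14e6, 0.609

def rate(t):
    return (b / a) * (t / a) ** (b - 1)

for t in (10.0, 100.0, 1000.0, 10000.0):
    print(t, rate(t))
```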
For an alternative derivation of equation (6) for small N(t)/n, notice that the overall failure
rate for the whole population of n widgets is essentially nR(t), because the number of
functioning widgets does not change much as long as N(t)/n is small. Therefore, the
expected number of failures by the time T for a coherent population of n widgets is

        N(T) = INTEGRAL(0 to T) nR(t) dt = n(T/a)^b
Dividing both sides by n and taking the natural log of both sides gives, again, equation (6).
However, it should be noted that this applies only for times t such that N(t) is small
compared with n. This is because we are analyzing the failure rate without replacement, so
the value of n (the size of the unfailed population) is actually decreasing with time. By
limiting our considerations to small values of N(t) (compared with n) we make this effect
negligible. Now, it might seem that we could just as well assume replacement of the failed
units, so that the size of the overall population remains constant, and indeed this is typically
how analyses are conducted for devices with exponential failure distributions, for which the
failure rate is constant. However, for the Weibull distribution it is not so easy, because the
failure rate of each widget is a function of the age of that particular widget. If we replace a
unit that has failed at 10000 hours with a new unit, the overall failure rate of the total
population changes abruptly, either up or down, depending on whether b is less than or
greater than 1.
The dependence of the failure rate R(t) on the time of each individual widget is also the
reason we considered a coherent population of widgets whose ages are all synchronized.
This greatly simplifies the analysis. In more realistic situations the population of widgets will
be changing, and the "age" of each widget in the population will be different, as will the rate
at which it accumulates operational hours as a function of calendar time. More generally
we could consider a population of widgets such that each widget has its own "proper time"
tj given by tj = mj(t -gj) for all t greater than gj, where t is calendar time, gj is the birth date
of the jth widget, and mj is the operational usage factor. This proper time is then the time
variable for the Weibull density function for the jth widget, and the overall failure rate for
the whole population at any given calendar time is composed of all the individual failure
rates. In this non-coherent population, each widget has its own distinct failure distribution.
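The composition of individual rates can be sketched as follows. The parameter values and widget list are hypothetical, and the conversion of each widget's proper-time rate to calendar time by the factor mj (the chain rule on tj = mj(t - gj)) is an assumption of this sketch:

```python
a, b = 1000.0, 1.5   # hypothetical Weibull parameters (proper-time hours)

def rate(tau):
    """Weibull failure rate in the widget's own proper time."""
    return (b / a) * (tau / a) ** (b - 1)

# (birth date g_j, usage factor m_j) for each widget, in calendar hours
widgets = [(0.0, 1.0), (200.0, 0.5), (500.0, 2.0)]

def population_rate(t):
    """Overall failure rate per calendar hour at calendar time t."""
    total = 0.0
    for g, m in widgets:
        if t > g:                    # widget exists only after its birth date
            tau = m * (t - g)        # proper time t_j = m_j*(t - g_j)
            total += m * rate(tau)   # d(tau)/dt = m_j converts to calendar time
    return total

print(population_rate(600.0))
```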
So far there have been three failures, widgets 2, 4, and 5. The other seven widgets are
continuing to accumulate operational hours at their respective rates. These are sometimes
called "right censored" data points, or "suspensions", because we imagine that the testing
has been suspended on these seven units prior to failure. We don't know when they will
fail, so we can't directly use them as data points to fit the distribution, but we would still like
to make some use of the fact that they accumulated their respective hours without failure.
The usual approach in reliability analysis is to first rank all the data points according to their
accumulated hours, as shown below.
Next we assign adjusted ranks to the widgets that have actually failed. Letting kj denote the
overall rank of the jth failure, and letting r(j) denote the adjusted rank of the jth failure (with
r(0) defined as 0), the adjusted rank of the jth failure is given by the formula

        r(j) = r(j-1) + (n + 1 - r(j-1))/(n + 2 - kj)
Fitting these three points using linear regression (as discussed above), we get the Weibull
parameters a = 1169 and b = 4.995. The expected number of failures (which is just n times
the cumulative distribution function) is shown below.
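The adjusted-rank bookkeeping can be sketched as follows. The recurrence used is the standard one for failures interspersed with suspensions; since the example's data table is not reproduced here, the overall rank positions in the example call are hypothetical:

```python
def adjusted_ranks(n, failure_positions):
    """Adjusted ranks via the recurrence
    r(j) = r(j-1) + (n + 1 - r(j-1)) / (n + 2 - k_j),
    where k_j is the overall rank (by accumulated hours) of the j-th
    failure among all n units, failures and suspensions together."""
    ranks = []
    r = 0.0
    for k in failure_positions:
        r = r + (n + 1 - r) / (n + 2 - k)
        ranks.append(r)
    return ranks

# Hypothetical example: 3 failures at overall positions 2, 5, and 7
# among n = 10 units (the remaining 7 are suspensions).
rs = adjusted_ranks(10, [2, 5, 7])
# Median ranks from the adjusted ranks, via (r - 0.3)/(n + 0.4):
median_ranks = [(r - 0.3) / (10 + 0.4) for r in rs]
print(rs, median_ranks)
```

As a sanity check, with no suspensions the recurrence reduces to the ordinary ranks 1, 2, ..., n.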
This shows a clear "wear out" characteristic, consistent with the observed failures (and
survivals). The failure rate is quite low until the unit reaches about 500 hours, at which
point the rate begins to increase, as shown in the figure below.
The examples discussed above are classical applications of the Weibull distribution, but the
Weibull distribution is also sometimes used more loosely to model the "maturing system"
effect of a high-level system being introduced into service. In such a context the variation
in the failure rate is attributed to a gradual increase in the operators' familiarity with the
system, improvements in maintenance, incorporation of retrofit modifications to the design
or manufacturing processes to fix unforeseen problems, and so on. Whether the Weibull
distribution is strictly suitable to model such effects is questionable, but it seems to have
become an accepted practice. In such applications it is common to lump all the
accumulated hours of the entire population together, as if every operating unit has the same
failure rate at any given time. This makes the analysis fairly easy, since it avoids the need
to consider "censored" data, but, again, the validity of this assumption is questionable.