20 Reliability Testing and Verification
Jaroslav Menčík
http://dx.doi.org/10.5772/62377
Abstract
This chapter describes various methods for reduction of uncertainties in the determination of characteristic values of random quantities (quantiles of normal and Weibull distribution, tolerance limits, linearly correlated data, interference method, Monte Carlo method, bootstrap method).
Reliability tests are often indispensable. The material properties, needed in design, can only sometimes be found in data sheets. If they are not available, they must be obtained by testing, for example the strength of a new alloy or concrete or the fatigue resistance of a vehicle part. Also, the manufacturers of electrical components must provide the reliability data for catalogs (e.g. the failure rate and the data characterizing the influence of some factors, such as temperature or vibrations). It is also impossible to predict with 100% accuracy the properties of a new bridge, an engine or a complex system consisting of many parts, whose properties vary more or less around the nominal values. In all these cases, tests are often necessary to verify whether the object has the demanded properties or if it conforms to the standards. Also, the information on loads (e.g. wind velocities in an unknown area) must often be obtained by measurement.
The reliability tests can be divided into two groups: those for providing detailed information on properties of new materials or components, and those for the verification of the expected values. The former are more extensive, as they must provide the mean value and statistical parameters characterizing the random variability. The extent of verification tests is smaller.
In this chapter, the reliability tests of mass-produced components will be described first,
followed by the tests of large or complex structures or components and the tests of strength
and fatigue resistance.
The most important reliability characteristics are the mean failure rate and the mean time to
failure or between failures. The tests can be done so that several components are loaded in the usual way (e.g. by electric current), and the times to failure of the individual pieces are measured.
As the time to failure of some samples can be very long, the test is sometimes terminated after
failure of several pieces, at time tt. The total cumulated time of operation to failure is calculated
generally as
$$t_{tot} = \sum_{j=1}^{r} t_{f,j} + m\,t_t, \qquad (1)$$
where tf,j is the time to failure of j-th piece, r is the number of failed specimens, and m is the
number of pieces that have survived the test, whose duration was tt. The total number of all
checked samples is n = r + m. If all pieces have failed during the test, m = 0 and the term mtt
falls out. (Also other test arrangements are possible, for example with replacement of the failed pieces by good ones; see [1] or the corresponding IEC standards listed in Appendix 2.) The mean time to failure is calculated as
$$\mathrm{MTTF} = \frac{t_{tot}}{r}. \qquad (2)$$
The individual times to failure vary, and this variability must also be characterized. If failures occur for various random reasons, an exponential distribution of the times to failure is often assumed. A simple check uses the standard deviation σ: for the exponential distribution, the standard deviation ideally equals the mean μ (in real tests it can differ somewhat). If the difference between μ and σ is large, a statistical test should be made to check whether the exponential distribution is suitable. Goodness-of-fit tests (e.g. the Kolmogorov–Smirnov or the χ² test) are common for this purpose; see [2–4]. If the exponential distribution is not suitable, another distribution, e.g. Weibull, may fit better.
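A minimal sketch of such a check, assuming Python with NumPy and SciPy, could look as follows (the failure times are the six values from Example 1 below; the variable names are chosen freely):

```python
# Quick check of exponentiality: compare mean and standard deviation, then run
# a Kolmogorov-Smirnov goodness-of-fit test against an exponential distribution
# whose mean is estimated from the data (this makes the p-value only approximate).
import numpy as np
from scipy import stats

t = np.array([65.0, 75.0, 90.0, 120.0, 250.0, 410.0])  # times to failure [h]

mu = t.mean()
sigma = t.std(ddof=1)
print(f"mean = {mu:.1f} h, std = {sigma:.1f} h  (similar values hint at exponentiality)")

res = stats.kstest(t, "expon", args=(0, mu))            # loc = 0, scale = mean
print(f"K-S statistic D = {res.statistic:.3f}, p-value = {res.pvalue:.3f}")
# A very small p-value (e.g. below 0.05) would speak against the exponential model.
```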
If an exponential distribution is acceptable, the estimate of the mean failure rate can be obtained
easily as the reciprocal value of the mean time to failure,
$$\lambda = 1 / \mathrm{MTTF}. \qquad (3)$$
The two-sided confidence interval for the true mean failure rate λ is [1]:
$$\lambda_L = \bar{\lambda}\,\frac{\chi^2_{1-\alpha/2}(2r)}{2r} \;\le\; \lambda \;\le\; \bar{\lambda}\,\frac{\chi^2_{\alpha/2}(2r)}{2r} = \lambda_U, \qquad (4)$$
if the test was terminated at the rth failure, or
$$\lambda_L = \bar{\lambda}\,\frac{\chi^2_{1-\alpha/2}(2r)}{2r} \;\le\; \lambda \;\le\; \bar{\lambda}\,\frac{\chi^2_{\alpha/2}(2r+2)}{2r} = \lambda_U, \qquad (5)$$
if the testing continued some time after the rth failure. In Equations (4) and (5), λ̄ is the calculated mean value of λ, the subscripts L and U denote the lower and upper confidence limit, $\chi^2_{1-\alpha/2}(2r)$ is the (1 − α/2)-critical value of the chi-square distribution for 2r degrees of freedom, $\chi^2_{\alpha/2}(2r)$ is the α/2-critical value for 2r degrees of freedom, and $\chi^2_{\alpha/2}(2r+2)$ is the α/2-critical value for 2r + 2 degrees of freedom. The probability that λ will lie within this confidence interval is γ = 1 − α. Often, we are interested only in the maximum expectable failure rate; the pertinent formula for the upper limit of the one-sided interval is
$$\lambda_U = \bar{\lambda}\,\frac{\chi^2_{\alpha}(2r)}{2r}; \qquad (6)$$
the probability that the actual failure rate will be higher is now α. As the mean time to failure is the reciprocal of the failure rate, the corresponding two-sided confidence interval for the mean time to failure is obtained as
$$t_L = \frac{2r}{\chi^2_{\alpha/2}(2r)}\,\mathrm{MTTF} \;\le\; t \;\le\; \frac{2r}{\chi^2_{1-\alpha/2}(2r)}\,\mathrm{MTTF} = t_U, \qquad (7)$$
if the test was terminated after the rth failure (and analogously for a longer test).
The determination and importance of confidence limits will be illustrated by the following examples.
Example 1
Ten electrical components were tested to determine the failure rate. The test was terminated after tt = 500 h. During this time, six components failed (r = 6), at times 65, 75, 90, 120, 250, and 410 h. Four components survived the test. Estimate the mean time to failure and the failure rate and construct two-sided confidence intervals (for confidence level γ = 90%, i.e. α = 10%).
Solution. The mean value and standard deviation of the times to failure of the six failed components were 168.3 and 136.3 h, respectively. As these two values do not differ greatly, it is possible to assume an exponential distribution.
The cumulated duration of tests, calculated after [1], was:
$$t_{tot} = (65 + 75 + 90 + 120 + 250 + 410) + 4 \times 500 = 3010 \ \mathrm{h}.$$
The mean time to failure is tmean = ttot/r = 3010/6 = 501.67 h, and the mean failure rate is λmean = λ̄ = 1/tmean = 1/501.67 = 1.993 × 10⁻³ h⁻¹.
The lower and upper confidence limits for λ were calculated according to Equation (5), which applies because the test was terminated before all samples had failed. With r = 6 and α = 10%, the critical values are $\chi^2_{0.95}(12) = 5.226$ and $\chi^2_{0.05}(14) = 23.685$. Inserting them, together with λmean = 1.993 × 10⁻³ h⁻¹, into (5) gives λL = 8.68 × 10⁻⁴ h⁻¹ and λU = 3.93 × 10⁻³ h⁻¹. The confidence limits for the mean time to failure are tL = 1/λU = 254.4 h and tU = 1/λL = 1152.1 h. The mean time to failure thus can lie within the interval tmean ∈ (254 h; 1152 h).
As this example shows, the confidence interval obtained from only six failures is very wide. If a narrower interval (i.e. a more accurate estimate) is needed, it is necessary either to run the test longer so that more parts of the tested group fail, or to increase the number of parts tested simultaneously, or both.
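The computations of this example can be reproduced by a short script; the following is only a sketch, assuming Python with SciPy, with freely chosen variable names:

```python
# Point estimates of MTTF and failure rate from a time-terminated test, and the
# two-sided confidence limits of Equations (5) and (7).
from scipy.stats import chi2

t_fail = [65, 75, 90, 120, 250, 410]   # times to failure [h]
m, t_t = 4, 500                        # survivors and test duration [h]
r = len(t_fail)
alpha = 0.10                           # gamma = 1 - alpha = 90 % confidence

t_tot = sum(t_fail) + m * t_t          # Eq. (1): cumulated time of operation
mttf = t_tot / r                       # Eq. (2)
lam = 1.0 / mttf                       # Eq. (3)

# Eq. (5): test terminated at time t_t (after, not at, the r-th failure)
lam_L = lam * chi2.ppf(alpha / 2, 2 * r) / (2 * r)
lam_U = lam * chi2.ppf(1 - alpha / 2, 2 * r + 2) / (2 * r)

print(f"t_tot = {t_tot} h, MTTF = {mttf:.1f} h, lambda = {lam:.3e} 1/h")
print(f"lambda in ({lam_L:.2e}; {lam_U:.2e}) 1/h")
print(f"MTTF   in ({1/lam_U:.0f}; {1/lam_L:.0f}) h")   # Eq. (7) by reciprocals
```

The same sketch, run with the data of Example 2 below (two more failures, tt = 1000 h), yields the interval reported there.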
Example 2
The above test continued until the time tt = 1000 h. During this time, two more pieces failed, at the times t7 = 520 h and t8 = 760 h.
Solution. The same procedure as above gives the following results: ttot = 4290 h and r = 8, so that the mean time to failure is now tmean = ttot/r = 4290/8 = 536 h and the mean failure rate λmean = 1/536 = 1.865 × 10⁻³ h⁻¹. The confidence interval now also reflects that more pieces have failed. The critical values are $\chi^2_{0.95}(16) = 7.962$ and $\chi^2_{0.05}(18) = 28.869$. With all these values, the lower and upper limits of the failure rate are λL = 9.28 × 10⁻⁴ h⁻¹ and λU = 3.4 × 10⁻³ h⁻¹. The mean time to failure tmean can thus be expected to lie within the interval (297 h; 1078 h).
The whole test lasted twice as long as the previous one, but the new confidence interval is only slightly narrower. If significantly more accurate estimates are to be achieved, much longer tests, or tests with a substantially higher number of pieces, must be made. Thus, when preparing tests for the determination of the failure rate, one should estimate in advance the duration of the test, the number of tested pieces, and the number of pieces that may fail, all for the acceptable probability α that the actual maximum failure rate would be higher than that obtained from the test.
The rearrangement of the expression for the upper limit of the confidence interval for λ gives the following relationship between the expected failure rate λ0, the number of tested samples n, the test duration tt, and the number of failed components r [1]:
$$n\,t_t = \frac{\chi^2_{\alpha}(2r)}{2\lambda_0}. \qquad (8)$$
If the number of failed samples does not exceed r, the actual failure rate is not higher than λ0,
the risk of wrong prediction being α.
As follows from the product n × tt in Equation (8), the number of tested parts n is equivalent
to the test duration tt. This means that the same information can be obtained by testing, for
example, 10 specimens for 1000 h or 1000 specimens for 10 h. If the tested objects are expensive,
one would prefer testing fewer specimens for longer time. However, at least several pieces
should always be tested to reduce the risk that the only piece chosen at random for the test
was especially good or especially bad.
The following table, based on Equation (8), shows the values of the product n × tt for the various
numbers of failed parts during the tests; the probability of a wrong result is α = 10%.
Table 1. Extent of tests for various failure rates and numbers of failed pieces.
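The entries of such a table can also be computed directly from Equation (8). The following sketch assumes Python with SciPy; the chosen values of r and λ0 are only examples:

```python
# Required test extent n * t_t (piece-hours) according to Eq. (8), for a given
# expected failure rate lambda_0, allowed number of failures r and risk alpha.
from scipy.stats import chi2

def test_extent(lambda_0, r, alpha=0.10):
    # chi^2_alpha(2r) in the notation of the text is the value exceeded with
    # probability alpha, i.e. the (1 - alpha) quantile.
    return chi2.ppf(1 - alpha, 2 * r) / (2 * lambda_0)

for r in (1, 2, 5, 10):
    print(f"r = {r:2d}:  n*t_t = {test_extent(1e-4, r):,.0f} piece-hours")
# For lambda_0 = 1e-4 1/h and r = 5 this gives about 79,936 piece-hours,
# matching the value quoted in the text.
```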
For example, the reliability testing of components with an assumed exponential distribution and failure rate λ = 10⁻⁴ h⁻¹, with the test terminated after the fifth failure, needs n × tt = 79,936 ≈ 80,000 pieces × hour. Thus, for example, 100 components should be tested for 800 h or 800 components for 100 h. If the expected failure rate were λ = 10⁻⁶ h⁻¹, then n × tt ≈ 8,000,000 pieces × hour, so that 10,000 components must be tested for 800 h or 100 components for 80,000 h. One can see that testing for proving the reliability of very reliable components becomes very difficult or impracticable. Therefore, various accelerated tests are often used. One way, suitable for items that work periodically with pauses between operations, such as switches or valves, eliminates the idle times: the switch is switched on and off permanently.
Another way to obtain the demanded reliability information sooner uses a higher intensity of load (e.g. higher mechanical load, higher electric stress or current) or a more severe environment (e.g. higher temperature or vibrations). For this approach to be effective, one must know the mechanism of degradation and the relationship between the load intensity and the rate of degradation. For example, the rate of chemical processes, which are the cause of some failures, often depends on the temperature according to the Arrhenius equation:
$$r = C \exp\!\left(-\frac{\Delta E}{kT}\right), \qquad (9)$$
C is a constant, ΔE is the activation energy, k is the Boltzmann constant, and T is the absolute temperature (K). If the times to failure have an exponential distribution, the failure rates or times to failure are related to the absolute temperatures as follows [1]:
$$\frac{\lambda_1}{\lambda_2} = \frac{t_2}{t_1} = \exp\!\left[\frac{\Delta E}{k}\left(\frac{1}{T_2} - \frac{1}{T_1}\right)\right]. \qquad (10)$$
Equation (10) can be used for the determination of the necessary temperature change from T1 to T2 if the test duration is to be reduced from t1 to t2.
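A small sketch of such a calculation follows; it assumes Python, an activation energy expressed in electron-volts, and purely illustrative numbers (the temperatures, times, and ΔE are not taken from the text):

```python
# Using the Arrhenius relation (10) to estimate the test temperature T2 needed
# to shorten a test from duration t1 at temperature T1 to duration t2.
import math

k = 8.617e-5          # Boltzmann constant [eV/K]

def accelerated_temperature(T1, t1, t2, dE):
    """Solve Eq. (10) for T2; dE is the activation energy in eV (assumed known)."""
    # t2/t1 = exp[(dE/k)*(1/T2 - 1/T1)]  ->  1/T2 = 1/T1 - (k/dE)*ln(t1/t2)
    return 1.0 / (1.0 / T1 - (k / dE) * math.log(t1 / t2))

# Illustrative values: dE = 0.7 eV, nominal test 350 K / 1000 h, desired 100 h.
T2 = accelerated_temperature(T1=350.0, t1=1000.0, t2=100.0, dE=0.7)
print(f"required test temperature T2 = {T2:.0f} K")
```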
Similarly, the number of cycles to fatigue failure of periodically loaded components can be
reduced by increasing the characteristic stress or load amplitude P. The basic relationship,
based on the Wöhler-like curve [Equation (1) in Chapter 6], is
$$t_1 / t_2 = C \left( P_1 / P_2 \right)^{n}; \qquad (11)$$
C and n are constants for a given material and environment. Similar relationships can be used
for finding the increased load for shortened tests of components exposed to creep or static
fatigue (stress enhanced corrosion), with rates depending on some power of the load.
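The effect of Equation (11) can be illustrated by a short sketch; the constants C and n below are purely illustrative assumptions, not material data from the text:

```python
# Shortening of a fatigue test by raising the load amplitude from P1 to P2,
# according to Eq. (11): t1/t2 = C * (P1/P2)**n.
def test_time_ratio(P1, P2, n, C=1.0):
    return C * (P1 / P2) ** n

# Example: raising the load amplitude by 25 % with an assumed exponent n = -5
# (with the form (P1/P2)**n, the exponent must be negative for a higher load
# to shorten the test).
ratio = test_time_ratio(P1=1.0, P2=1.25, n=-5)
print(f"t1/t2 = {ratio:.2f}, i.e. the test at P2 is about {ratio:.1f} times shorter")
```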
Today, mass-produced electronic and electrical components are tested in special chambers and
under special conditions enabling acceptably short duration of the tests. More about these tests,
denoted HALT (for highly accelerated life testing) or HASS (for highly accelerated stress
screening), can be found in the literature, for example [5].
Sorting tests
These tests aim at sorting out “weak” items that could fail shortly after being put into service.
However, they must not cause excessive degradation of properties in “good” components (i.e.
they should not shorten their life significantly). Sorting tests can be nondestructive or destructive. Nondestructive tests use visual observation, X-ray, ultrasound or magnetic inspection,
and special electrical or other measurements. Destructive tests can be arranged in several
ways, for example proof tests that use short-time overloading by mechanical or electrical stress
exceeding the nominal value so that the weak parts are destroyed during the test. Other ways of revealing weak parts include artificial aging at increased temperature; cyclic loading by varying temperatures (the additional thermal stresses can reveal hidden defects or weak joints); a burn-in period with 75% to 100% of the nominal load acting for several tens of hours before putting into service; and special kinds of mechanical loading, such as impacts, vibrations of certain amplitude and frequency, overloading of rotating parts by centrifugal forces, and others.
2. Acceptance sampling
This operation, common in series production, ensures that only those batches of items are released to the customer or to the next operation that are either perfect or contain only a very small proportion of out-of-tolerance parts. Before this control is introduced, an inspection plan must be prepared. The inspection can be arranged in several ways:
a. 100% control. Every component or item must pass the inspection, and those that do not fulfill certain parameters are discarded. This control is the most expensive, but it should be the safest. Nevertheless, if the evaluation depends on human senses (a visual check, for example), a small probability of erroneous decisions exists even here. The inspector can overlook a defect or, vice versa, denote a good item as defective, especially if the number of inspected items is very high, which can lead to fatigue. An example is the check for internal flaws using X-rays, with the images interpreted by observation with the naked eye. Also, 100% control cannot be done if every test ends with the destruction of the tested piece, even if it is good (e.g. the check of the airbag deployment system in cars).
b. Random inspection. Only several pieces, chosen at random, are tested (e.g. 1% of the
batch). The entire lot is accepted or rejected according to the result of the inspection. This
kind of acceptance is much less demanding than 100% control, but it has been criticized
that it is rather subjective and not sufficiently reliable. If, for example, a batch of 10,000
pieces contains 1% of defective pieces and it was decided that 1% will be tested, then 100
pieces must be checked. One percent of 100 is one piece. However, it can happen that the
checked sample will contain not exactly one defective piece, but two or three or even none.
This uncertainty has led to the development of the following method based on the
probability theory.
c. Statistical acceptance. Several variants exist. The principle will be explained on the so-called single sampling plan. A sample of n pieces is taken at random from the lot and tested. The number z of out-of-tolerance pieces found in the sample is compared with the so-called decisive number c. If z ≤ c, the whole lot is accepted; if z > c, it is rejected. The values of c for various expected proportions p of unsuitable pieces and extent n of the tested sample can be found in the standards for statistical acceptance [6] or calculated, with consideration of two further important parameters, AQL (acceptable quality level) and LQL (limiting quality level, also called the lot tolerance percent defective, LTPD). LQL gives the maximum fraction of defectives acceptable, on average, in the batches denoted as good. The principle of determination of the decisive number c is as follows. If the fraction of defectives in the population is p, the number z of defectives that can appear in a random sample of size n has a binomial distribution or, for low probabilities p, approximately a Poisson distribution. It is thus possible to calculate the cumulative probabilities for z = 0, 1, 2, 3,... The decisive number c is chosen so that only a very low probability β exists that a lot whose test has given z ≤ c will contain a higher percentage of defectives than LQL. The probability β is called the customer's risk and means the risk that an unsatisfactory lot will be accepted as good. On the other hand, a producer's risk α also exists: the risk that a good lot, with fewer defective pieces than AQL, will be rejected. Usually, 5% or 1% is chosen for both α and β.
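The probabilities behind such a plan can be computed directly from the binomial distribution. The following sketch assumes Python with SciPy; the plan (n = 100, c = 2) is only an illustration, not a value from any standard:

```python
# Probability of accepting a lot with defective fraction p when a sample of
# n pieces is inspected and at most c defectives are tolerated (z <= c).
from scipy.stats import binom

def prob_accept(p, n, c):
    """P(z <= c) for a lot with defective fraction p and sample size n."""
    return binom.cdf(c, n, p)

n, c = 100, 2                      # illustrative plan
for p in (0.005, 0.01, 0.02, 0.05):
    print(f"p = {p:.1%}:  P(accept) = {prob_accept(p, n, c):.3f}")
# Plotting P(accept) against p yields the operating characteristic curve (OCC);
# c is chosen so that P(accept) at p = LQL does not exceed the customer's risk
# beta, while P(accept) at p = AQL stays above 1 - alpha (producer's risk).
```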
The curve showing how the probability of accepting the lot decreases with increasing proportion of defectives in the population is called the operating characteristic curve (OCC). Figure 1 shows examples of OCCs for two different decisive numbers.
The rejected batch is either discarded or 100% checked. In the latter case, the good pieces are
added to other good items. This makes the average quality of the batches composed in this
way better, so that the quality demands in the tests may be slightly reduced.
Figure 1. Operating characteristic curve (OCC). P – probability of acceptance; p – percentage of defectives in the population; α – producer's risk; β – customer's risk. Subscripts 1 and 2 denote curves OCC1 and OCC2.
Other schemes also exist. For example, a double sampling scheme uses two decisive numbers, c1 and c2. If the number z of defectives in the first sample is not larger than c1 (z ≤ c1), the lot is accepted, and if it is larger than c2, it is rejected. If c1 < z ≤ c2, another sample is taken and the total number of defectives in both samples is checked, etc. Further modifications, such as multiple sampling or sequential sampling, exist as well. For more, see [6].
However, doubts are sometimes cast on the cost-effectiveness of statistical control. On the one hand, this control costs money. On the other hand, losses can arise due to possible defective pieces hidden in the batches passed as good. Deming [7] has pointed out that if the cost of inspecting one piece is k1, the average cost of a failure caused by not inspecting is k2, and the average fraction of defectives is p, then, if pk2 < k1, the lowest total costs (control costs plus costs caused by failures) will be achieved without any testing. If pk2 > k1, full (100%) inspection should be used, especially for higher ratios pk2/k1. However, the situation is often not so simple: the fraction p of defectives can vary, 100% testing can be impracticable because of too high investment costs or because all tests end with destruction, etc.
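Deming's rule can be stated in a few lines of code; the costs and defective fractions below are purely illustrative assumptions:

```python
# Deming's rule as described above: compare the expected failure cost per
# uninspected piece (p * k2) with the inspection cost per piece (k1).
def inspection_policy(p, k1, k2):
    """Return the cheaper policy: 'no inspection' or '100% inspection'."""
    return "no inspection" if p * k2 < k1 else "100% inspection"

print(inspection_policy(p=0.002, k1=1.0, k2=200.0))   # 0.4 < 1  -> no inspection
print(inspection_policy(p=0.02,  k1=1.0, k2=200.0))   # 4.0 > 1  -> 100% inspection
```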
The statistical acceptance was very popular in the second half of the 20th century but not so
much today. There are two reasons: the demands on quality and reliability are much higher
today than 50 years ago and the allowable probabilities are often of the order 1:10⁶, much lower
than the degree of confidence common in statistical sampling. Moreover, the controlling
devices are much more powerful today. The incorporation of automated test equipment (ATE)
into the production line enables 100% control.
The tests of large structures and components will be illustrated on two cases: bridges and large components exposed to fatigue, such as parts of heavy vehicles (e.g. locomotives).
The assumed service life of road and railway bridges is many tens of years and sometimes
more. During this time, the structure deteriorates and its safety decreases. The loading pattern can also change over such a long time (new kinds of vehicles and changed traffic demands). For these reasons, bridges must sometimes be repaired or reconstructed. In such cases, thorough inspections are made at suitable times, including load tests in important cases. In these tests, the bridge is usually loaded by a group of trucks, loaded with sand or concrete blocks as much as possible, so that the load-carrying capacity of the bridge is attained. During the tests,
deformations and stresses at selected points are measured and compared with the values
obtained by computer analysis of the structure – to see if the actual response (e.g. deflection
of some parts of the bridge) corresponds to the assumed response. In some cases, dynamic
properties are also studied (i.e. the response to periodic or dynamic loading). If the actual
condition is worse than allowed, measures must be taken for improvement.
Large parts of mechanical structures, such as vehicles or aircraft (sometimes these objects as a whole), are mechanically loaded in order to find whether the actual response (deformations and stresses at selected points) corresponds to the values assumed in design. The dynamic response is also investigated. Exceptionally, the object is loaded until destruction. In the past, measurements were often the only reliable source of information about the stresses and behavior. Today, the methods of stress analysis are much better and much information can be obtained by computer simulation as early as in the design stage. Therefore, the tests today serve rather to confirm whether the demanded parameters have been achieved.
The test loads are often imposed by electrohydraulic cylinders attached to the tested object.
Often, special test stands are used, consisting of a massive frame with hydraulic cylinders,
clamping equipment, and a controlling unit. The work of the stand is controlled by a computer.
This enables one to program the demanded loading sequences. Sometimes, the load program
is based on a record made during a test vehicle driving on real roads or on a test track containing
typical examples of road surfaces. The test vehicle is equipped with sensors (usually strain
gauges fixed at certain points of the car body) and the measured data are recorded. These data
must be transformed to the data for the control of the load cylinders of the testing stand. The
reason is that these cylinders are often attached to the tested structure at different points than
were those used in the test vehicle driven on the track. Also, the data recorded with one test
vehicle are sometimes used for the testing of other types of vehicles. The test stand can repeat
the recorded load sequence again and again, so that fatigue resistance can also be tested in this way.
Strength and fatigue tests are often arranged according to various standards. Here, we thus limit our attention to some probabilistic aspects of these tests.
Strength tests. The individual values vary, so that the number of tests should be adjusted to
the purpose of the measurement and to the scatter of individual values. If only approximate
information on the average strength is needed, three tests may be sufficient; the standard
deviation can serve for the estimation of the confidence interval of the mean strength. However,
especially for brittle materials with high scatter of individual values, the knowledge of
the "minimum" strength is often demanded. This is determined as a low-probability quantile. For this purpose, more tests must be done, often several tens or more. From these tests, the parameters of the strength distribution are determined. Often, a Weibull distribution is assumed, but a log-normal distribution is also used. The determination of parameters and quantiles of Weibull
distribution was described in Chapters 11 and 18. The parameters of log-normal distribution
are found in several steps. In the first step, logarithms are taken from the measured values,
then the average and standard deviation are calculated from the transformed data, and finally
they are transformed back to the original system of units. The question of which distribution
is better can be solved by means of statistical tests of goodness of fit [2 – 4].
Generally, many values are necessary to obtain reliable values of low-probability quantiles of
strength. (Remember that 1% quantile corresponds to the minimum of 100 values.)
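The log-normal procedure described above can be sketched as follows; Python with NumPy and SciPy is assumed, and the strength values are invented for illustration only:

```python
# Log-normal fit of strength data: take logarithms, compute mean and standard
# deviation of the transformed data, and transform a low-probability quantile
# back to the original units.
import numpy as np
from scipy.stats import norm

strength = np.array([412.0, 455.0, 398.0, 470.0, 430.0, 385.0, 442.0, 460.0])  # MPa

log_s = np.log(strength)
mu_log, sigma_log = log_s.mean(), log_s.std(ddof=1)

# 1 % quantile of the fitted log-normal distribution ("minimum" strength estimate)
q01 = np.exp(mu_log + norm.ppf(0.01) * sigma_log)
print(f"estimated 1 % quantile of strength: {q01:.0f} MPa")
# With only 8 values this estimate is very uncertain; as noted in the text,
# many more measurements are needed for reliable low-probability quantiles.
```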
Fatigue tests. The main purpose of fatigue tests is the determination of fatigue limit (if it exists)
and finding the relationship between the characteristic stress (S) and the time or number of
cycles to failure (Nf). As for the fatigue limit, everything from the above paragraph on strength
tests remains valid. The S–N relationship is obtained by making the tests under various
characteristic stress amplitudes and fitting the data by a suitable function, for example [8, 9]:
$$N_f = A\,S^{-w}, \qquad (12)$$
or a similar expression. Now, two possibilities exist depending on the number of tests that
were done or could be done with respect to the available money and time. If only several tests
have been performed, all measured Nf(S) values are fitted by the regression function (12). The
consequences of the scatter of individual values are depicted in Figure 1 in Chapter 18. The regression function, obtained by the least-squares method, gives Nf values such that a 50% probability exists that the true number of cycles to failure under a chosen stress will be lower (!) than the number obtained from the regression function. The "safe" Nf,α values, for which an acceptably low probability α would exist that the component or construction can fail earlier, may be found as boundary values of the pertinent confidence band for all S–N data; see Chapter 18.
If more values (e.g. tens) are available for each stress level, a more accurate procedure can be
used. The data for individual stress levels are rank-ordered in ascending order. Each value
corresponds to some quantile of time to failure for a given stress level. For example, the shortest
time of 10 values obtained for the same stress corresponds approximately to the 10% quantile of
the time to failure. Now, only the Nf,α(S) values, corresponding to the same quantile α, are fitted
by regression function (12). The “safety” of the prediction of the time to failure with this
function equals 1 – α. It is also possible to fit all measured data by function of type (12) with
additional parameters characterizing the probability that the actual number of cycles to failure
will be lower than that calculated via modified Equation (12).
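A least-squares fit of Equation (12) is usually made in logarithmic coordinates. The following sketch assumes Python with NumPy; the S–N data points are invented for illustration:

```python
# Fit Nf = A * S**(-w) by least squares in log-log coordinates:
# log Nf = log A - w * log S is a straight line.
import numpy as np

S  = np.array([300.0, 350.0, 400.0, 450.0, 500.0])   # stress amplitudes [MPa]
Nf = np.array([2.1e6, 6.5e5, 2.4e5, 9.0e4, 4.2e4])    # cycles to failure

slope, intercept = np.polyfit(np.log(S), np.log(Nf), 1)
w, A = -slope, np.exp(intercept)
print(f"fitted exponent w = {w:.2f}, constant A = {A:.3e}")

# The fitted curve passes roughly through the median of the data; "safe" values
# for a low failure probability must be taken from the lower confidence band or
# from a quantile-based fit, as discussed in the text.
N_pred = A * 380.0 ** (-w)
print(f"predicted (median) life at 380 MPa: {N_pred:.2e} cycles")
```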
Author details
Jaroslav Menčík
Department of Mechanics, Materials and Machine Parts, Jan Perner Transport Faculty,
University of Pardubice, Czech Republic
References
[1] Bednařík J et al. Reliability techniques in electronic practice (In Czech: Technika spolehlivosti v elektronické praxi). Praha: SNTL; 1990. 336 p.
[2] Freund J E, Perles B E. Modern elementary statistics. 12th ed. New Jersey: Prentice-
Hall; 2006. 576 p.
[3] Suhir E. Applied Probability for Engineers and Scientists. New York: McGraw-Hill;
1997. 593 p.
[4] Montgomery D C, Runger G C. Applied Statistics and Probability for Engineers. 4th
ed. New York: John Wiley; 2006. 784 p.
[7] Deming W E. Out of the Crisis. Reprint Edition. Cambridge MA: The MIT Press;
2000. 485 p.
[8] Fuchs H O, Stephens R I. Metal Fatigue in Engineering. New York: Wiley and Sons;
1980. 336 p.