Robustness Validation
– step by step
Published by:
Robustness Validation Forum
ZVEI - German Electrical and Electronic Manufacturers' Association e.V.
Electronic Components and Systems (ECS) Division
Lyoner Straße 9
60528 Frankfurt am Main, Germany
Phone: +49 69 6302 402
Fax: +49 69 6302 407
E-mail: [email protected]
www.zvei.org/ecs
Contact person:
Dr.-Ing. Rolf Winter
This document may be reproduced free of charge in any format or medium provided it is
reproduced accurately and not used in a misleading context. The material must be
acknowledged as ZVEI copyright, and the title of the document must be specified. A
complimentary copy of any document in which ZVEI material is quoted has to be provided.
Every effort was made to ensure that the information given herein is accurate, but no legal
responsibility is accepted for any errors, omissions or misleading statements in this
information. The document and supporting materials can be found on the ZVEI website at
www.zvei.org under the rubric "Publikationen" or directly under
www.zvei.org/RobustnessValidation
Foreword
The quality of the electronic products we buy and the competitiveness of the electronics
industry depend on the ability to make sound quality and reliability predictions.
Qualification measures must deliver useful and accurate data to provide added value.
Increasingly, manufacturers of semiconductor components must be able to show that they
produce meaningful results for the reliability of their products under mission profiles
defined by the whole supply chain.
In recent years the Robustness Validation methodology has become more and more
widely applied in the electronics industry. Reliability experts at the component
manufacturers are often in charge of creating meaningful data for these lifetime predictions.
For non-experts it is often difficult to understand the basic workflow and the basic
mathematics behind this data.
I would like to thank all RV Forum members and colleagues for actively supporting the
Robustness Validation approach with this step-by-step brochure.
Accelerated testing
The goal is to generate knowledge about the lifetime of a component within a reasonable
timeframe by accelerating its degradation, so that failures which might occur after 12 years
of continuous operation in the field show up after, e.g., 15 hours of stress. The concept of
accelerated testing is based on knowledge of the failure mechanisms and their acceleration models.
In the following example we use voltage as the identified stressor. The lifetime now has to
be evaluated with respect to the failure mechanisms stimulated by this stressor. In this
example we chose the leakage of a dielectric layer as the failure mechanism.
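The idea of accelerated testing can be sketched numerically. The snippet below is only an illustration: it assumes the Eyring-type voltage model used later in this brochure, and the parameter values (alpha, voltages) are invented for the example, not taken from the text.

```python
import math

# Hypothetical illustration of acceleration: an Eyring-type voltage model
# T = A * exp(-alpha * V), as used later in this brochure.
alpha = 4.0                   # voltage acceleration parameter (1/V), assumed
v_use, v_stress = 3.0, 5.5    # field voltage vs. accelerated test voltage, assumed

# Acceleration factor: how much faster degradation runs at the stress voltage
af = math.exp(alpha * (v_stress - v_use))

field_years = 12
field_hours = field_years * 365 * 24
test_hours = field_hours / af  # equivalent stress time in the test

print(f"acceleration factor: {af:.0f}")
print(f"{field_years} years in the field correspond to {test_hours:.1f} h under stress")
```

With these assumed numbers, 12 years of field operation collapse into a few hours of stress time, which is the whole point of the method.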
1 J.W. McPherson: Reliability Physics and Engineering - Time-To-Failure Modeling, Springer 2010,
ISBN 978-1-4419-6347-5
2 Robustness Validation Handbook, ZVEI 2007 (for details regarding stressors see page 15 ff.)
Outcome:
What is to be investigated
Product parameter which is measured
Stressor on product
Measurement value
What are the minimum (resolution) and maximum values to be measured during or after
stress? When is the measurement stopped (at which time or maximum value)?
Measurement Intervals
The parameter must be measured at sufficiently small time intervals to capture the change of
the parameter as it happens. So if a parameter change is expected within 5 hours, a
measurement interval of 1 hour is too large; an interval of 10 min is more appropriate.
More detailed information is gained during the pre-evaluation; several tests may be
necessary until the right conditions are found. It is also essential to know whether the
degradation behaviour is typically linear, logarithmic or exponential.
For example, for a molded product the temperature must not exceed the melting point of the
molding compound.
The degradation of parameter x is measured over time at V1. Figure 1 shows the resulting
degradation curve in a linear plot. After 36 h a marked increase of the parameter value x
(= leakage current of the insulating layer) can be seen. The typical value of an undegraded
device is assumed to be I < 0.1 µA.
The leakage current should be limited so that the device is not completely destroyed and can
still be analysed after the end of the test. In this case the test ends once I = 60 µA is reached,
with I = 55 µA as the last measured value. At the end a stress voltage has been found which
generates sufficient degradation in a feasible amount of stress time. From the data of Figure 1
one can also conclude that a measurement every hour is sufficient to resolve the degradation
behaviour.
Outcome
What is the general measurement parameter behaviour
What are the best measurement intervals
What are the max stressor values
Topic: Defining the complete test setup and performing the test
After the typical product behaviour under a stress parameter has been established by the
pre-evaluation, the investigation can be carried out on a higher number of samples. This
results in a better characterisation of the product and its statistical behaviour. Not every
part degrades in the same way; there is some part-to-part variation, and characterising it is
extremely important for the product, e.g. for defining the lifetime t63 and the Weibull slope β.
On the other hand the sample size should not be too high, as the information gained does not
increase linearly with sample size. Depending on the exact situation, a 10x higher sample
size may only double the data accuracy. Increasing the sample size also increases the testing
costs, so a sound cost-benefit ratio is required.
So the question „What is the right sample size?" cannot be answered in general. In any case
the sample must be large enough to characterise the behaviour. If it turns out to be too
small, additional samples have to be taken.
Back to the example
In this case the test is started with a sample size of 16. (Usually there is some experience
about which sample size is appropriate; here it is assumed that 16 devices are sufficient for
the investigation, and at the end of the experiment it will show whether this sample size was
the right one.) Each device is labelled with a number from D1 to D16.
The stress conditions are the same as for the pre-evaluation and are defined as:
Stress voltage: V1
tmeas,max = 40 h
fmeas = 1 h-1
Imax = 60 µA
Remark: Imax = 60 µA is not the failure criterion. It is the maximum current which is
detected; this may be limited by the measurement tool, the device itself, etc. It is the value
up to which the current is logged.
Under these conditions the rest of the complete sample of 16 devices, D2-D16, is measured
and the results are transferred to the database (see Table 1). For each device the parameter x
is measured once per hour, starting with the first data point after 1 h of constant voltage
stress. A rough check of the data shows that the devices behave slightly differently.
Table 1: Measurement data for 16 devices (the complete table can be found in the appendix)
t (h) D1 D2 D3 D4 D5 D6 D7 D8 D9 D10 D11 D12 D13 D14 D15 D16
1 0,1 0,2 0,1 0,3 0,1 0,1 0,1 0,1 0,1 0,1 0,1 0,1 0,1 0,1 0,1 0,1
2 0,1 0,2 0,1 0,3 0,1 0,1 0,1 0,1 0,1 0,1 0,1 0,1 0,1 0,1 0,1 0,1
3 0,2 0,2 0,1 0,3 0,1 0,1 0,2 0,1 0,1 0,1 0,1 0,1 0,1 0,1 0,1 0,1
4 0,1 0,2 0,1 0,3 0,2 0,2 0,1 0,2 0,1 0,1 0,1 0,1 0,1 0,1 0,1 0,1
5 ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
6 ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Outcome
All necessary boundary conditions for the tests are defined
The test is performed
Measurement data is available
The above graph shows the change of the leakage current over time. Up to now the failure
criterion, the value which separates good devices from defective ones, has not been
determined. Assumption for this example: the failure criterion is defined for now as I = 25 µA
(a more detailed discussion of the failure criterion follows in chapter 5).
The measurement should be continued beyond the defined failure criterion to see the further
behaviour of the device. In this example 60 µA (Imax) was chosen.
The goal is now to generate a time-to-fail lifetime distribution. For this we determine the
fail time of each device from Table 1, i.e. the time when each device has reached the leakage
current I = 25 µA.
The next step is to build the so-called cdf (cumulative distribution function of the failures).
The failures are summed up to 100 %; as we have 16 devices, each device contributes 6,3 %.
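The two steps above can be sketched in a few lines of code. The measurement values below are invented for illustration (they are not the brochure's Table 1); only the procedure, reading the first threshold crossing per device and accumulating the failure fraction, is the point.

```python
# Sketch: derive time-to-failure values from degradation measurements and
# build the empirical cumulative failure function (cdf).
i_fail = 25.0  # failure criterion in µA

# per-device leakage current (µA), one value per hour -- hypothetical data
measurements = {
    "D1": [0.1, 0.5, 3.0, 12.0, 26.0, 60.0],
    "D2": [0.1, 0.2, 0.8, 5.0, 18.0, 30.0],
    "D3": [0.2, 1.0, 9.0, 27.0, 55.0, 60.0],
}

def time_to_fail(currents, limit):
    """Return the first hour at which the current reaches the limit."""
    for t, i in enumerate(currents, start=1):
        if i >= limit:
            return t
    return None  # device did not fail within the test time

ttf = sorted(time_to_fail(c, i_fail) for c in measurements.values())
n = len(ttf)
# each of the n devices contributes 1/n to the cumulative failure fraction
cdf = [(t, (k + 1) / n * 100) for k, t in enumerate(ttf)]
print(cdf)
```

With 16 devices each step of the cdf would be 1/16 = 6,25 %, rounded to the 6,3 % quoted above.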
For extrapolating the data to a target failure rate, the statistical model and the measured
slope would be needed.
Outcome
A condensed, general description of the lifetime behaviour
Lifetime distribution
„What is a failure?" To answer this question we need a criterion: the failure
criterion.
The criterion at which degradation becomes a failure has to be defined based on the
specification, the application or customer requirements. In some cases a different failure
criterion results in a different failure distribution and therefore a different lifetime.
To demonstrate the effect of different failure criteria an example with three cases is
generated. The three criteria are:
Ifail = 1µA
Ifail = 5µA
Ifail = 25µA
Table 7: Time-to-fail values (h) for the three failure criteria
TTF (1µA): 5 6 11 13 13 16 20 22 22 26 32 35 36 37 37 40
TTF (5µA): 15 22 22 25 30 31 33 34 34 36 36 37 37 37 39 40
TTF (25µA): 34 34 34 34 35 35 35 35 37 37 38 38 38 39 40 40
From Table 7 and Figure 5 you can conclude after which time a certain percentage of the
sample would have failed, depending on the failure criterion. In the figure above the data is
plotted as a cumulative failure distribution without applying any further statistical model.
You can see that, depending on the failure criterion, the failure distribution of one and the
same data set can look quite different.
From this we see that choosing different failure criteria leads to different conclusions (to
anticipate: different lifetimes and different slopes, see next chapter). So it is very important
which failure criterion is used.
The following discussion is based on data with I=25µA, which is the assumed specification in
this scenario.
A failure criterion cannot be derived purely from the distribution. It must be defined by
given criteria based on the application.
Please keep in mind that in real life there is no freedom to choose the failure criterion,
because it has to be deduced from the performance specification.
Outcome
having the correct failure criterion
Slope β
It was found that one specific slope characterises one specific failure mode; different
slopes indicate different failure modes.
Both parameters, t63 and β, are needed to answer the question „What is the survival
probability or reliability?".
A high slope indicates that the failures occur in a narrow band around t63. A high t63 means
that the devices fail only after a long stress time. Therefore high t63 and high β together
demonstrate high and stable reliability.
With a high t63 but low β, some portion of the devices will fail quite soon due to the low β.
A lower t63 combined with a high β may therefore in some cases deliver a higher reliability
in the field.
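The interplay of t63 and β can be made concrete with the Weibull survival function. The parameter values in this sketch are assumptions chosen only to show the effect of the slope; they are not data from this brochure.

```python
import math

# Sketch: survival probability R(t) from the two Weibull parameters t63 and beta.
def reliability(t, t63, beta):
    """Weibull survival probability R(t) = exp(-(t/t63)**beta)."""
    return math.exp(-((t / t63) ** beta))

t = 20.0  # stress time of interest (h), assumed

# high t63, high beta: failures concentrated well after t -> high reliability
r_steep = reliability(t, t63=40.0, beta=10.0)
# same t63, low beta: a noticeable fraction already fails early
r_flat = reliability(t, t63=40.0, beta=1.0)

print(f"R(20 h), beta=10: {r_steep:.3f}")   # ~0.999
print(f"R(20 h), beta=1:  {r_flat:.3f}")    # ~0.607
```

With identical t63, the steep distribution keeps almost all devices alive at half of t63, while the flat one has already lost almost 40 % of them, which is exactly the point made above.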
[Figure: Bathtub curve and corresponding Weibull diagram. The cumulative failure rate over
time splits into three areas: early life (extrinsic fails, β<1), useful life (random fails, β=1)
and wear-out (intrinsic fails, β>1).]
Remark: mathematically, setting β=1 in the Weibull formula results in an exponential
distribution with its constant failure rate. So area 2 (useful life) can also be described by
the exponential distribution.
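This remark can be checked numerically: with β=1 the Weibull hazard (instantaneous failure) rate is constant, which is the defining property of the exponential distribution. The scale value eta below is an arbitrary assumption.

```python
# Numeric check of the remark: with beta = 1 the Weibull distribution reduces
# to the exponential distribution, whose failure (hazard) rate is constant.
eta, beta = 100.0, 1.0   # eta is an arbitrary assumed scale (h)

def hazard(t):
    """Weibull hazard rate h(t) = (beta/eta) * (t/eta)**(beta-1)."""
    return (beta / eta) * (t / eta) ** (beta - 1)

# constant over time, equal to 1/eta -> exponential behaviour
print([hazard(t) for t in (1.0, 10.0, 50.0)])  # [0.01, 0.01, 0.01]
```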
extrinsic failure
[lat. extrinsecus = coming from the outside]
A failure mechanism that is directly attributable to unintended defects and deviations
created during manufacturing.
With increasing time the extrinsic failures become fewer and fewer; there is simply a limited
number of devices manufactured with such defects. At the same time the wear-out mechanism
becomes more and more prominent.
If the extrinsic and intrinsic failure rates are fairly far apart, a straight line results which,
together with the random failure rate, defines the useful life.
If the decaying extrinsic and the rising intrinsic curve are close together, there is only a
small or no useful life. This indicates a wrong design, wrong materials, inappropriate
manufacturing tools or a combination of those.
There are always extrinsic, intrinsic and random failures. But at the beginning of the
lifecycle the extrinsic failures can be much more dominant, so not many intrinsic or random
failures will be seen; for longer lifetimes it is vice versa. The following two curves
illustrate this.
3
Handbook of Robustness Validation for Semiconductor Components, ZVEI 2007
A manufacturer always has an interest in his device functioning over the expected lifetime,
so as not to receive complaints from the customer. His main interest is therefore in the
useful-life timeframe (after, of course, having eliminated the extrinsic fails). In this
timeframe the failure rate is typically low and constant, and the design has to be adjusted
to guarantee these requirements.
For lifetime investigations we are mainly interested in wear-out mechanisms, which can be
identified by steep slopes β > 1. Whether a real wear-out behaviour is present follows from
the Weibull analysis together with a physical analysis. Note that a physical analysis is
always needed to support and verify the triggering failure mechanism and the associated
statistical model.
To illustrate this, Figure 13 shows a Weibull distribution. For the failure criteria I=5µA and
I=25µA the slope is the same, as it is for the second half of the I=1µA data. But I=1µA also
shows a different slope in its first half, indicating another failure mode. The analysis
reveals that both slopes have β > 1, defining a wear-out behaviour. As a comparison, Figure 5
with linear scaling is shown again.
[Figure 13: Weibull graph for the different failure criteria (corresponding to Figure 5);
cumulative percent over time.]
[Figure 5 (again): cumulative time-to-fail graph for the different failure criteria.]
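A Weibull plot like Figure 13 linearises the cdf, and the same linearisation can be used to estimate β and t63 directly from a TTF list. The sketch below does this for the I=25µA column of Table 7 with a simple rank regression (Bernard's median-rank approximation plus ordinary least squares); a real analysis would typically use a statistics package, so treat this as a minimal illustration of the method.

```python
import math

# TTF values (h) for the I=25µA failure criterion, from Table 7
ttf = sorted([34, 34, 34, 34, 35, 35, 35, 35, 37, 37, 38, 38, 38, 39, 40, 40])
n = len(ttf)

# Linearised Weibull cdf: ln(-ln(1-F)) = beta*ln(t) - beta*ln(t63)
# F estimated per rank with Bernard's median-rank approximation
xs = [math.log(t) for t in ttf]
ys = [math.log(-math.log(1 - (i + 1 - 0.3) / (n + 0.4))) for i in range(n)]

# ordinary least-squares fit of the straight line
mx, my = sum(xs) / n, sum(ys) / n
beta = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        / sum((x - mx) ** 2 for x in xs))
t63 = math.exp(mx - my / beta)  # intercept gives -beta*ln(t63)

print(f"beta = {beta:.1f}, t63 = {t63:.1f} h")
```

On this data the fit gives a steep slope of roughly 17-18 and a t63 of about 37-38 h, i.e. a clear wear-out signature (β well above 1), consistent with the discussion above.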
Figure 14 shows a typical trumpet-like curve for the 90 % confidence interval of the failure
distribution measured for the I=25µA failure criterion.
[Figure 14: failure distribution for I=25µA with 90 % confidence interval; percent over
time to fail (h).]
Outcome
interpretation of the test results
behaviour of system is known
lifetime and failure prediction possible
Changing the stress intensity results in a parallel shift of the distribution with the same
slope β (acceleration or deceleration of the degradation). The slope β MUST be the same, since
identical β values indicate the same failure mechanism. Conversely, if the slopes differ from
each other, most likely the failure modes are different too. If one is interested in the
lifetime behaviour at different stress levels, it is of course crucial that all degradation
data is caused by the same failure mode.
For the different stressor types it has been found that distinct extrapolation models and
descriptions can be applied, and with these models the field behaviour can be described.
If it is not known whether the above models can describe the device behaviour, this has to be
evaluated; in specific cases they may not be applicable, and a phenomenological approach can
then be chosen. The established models, however, are the first choice.
The curves for different stress intensities must have the same slope!
The same slope indicates the same failure mode and mechanism
Different slopes indicate different failure modes and so the results are not
comparable
Example
The stressor in the experiment below (Table 9) is voltage. After evaluating the failure
distribution at 5 V, two further curves were measured at 4 V and 6 V. The 4 V curve has the
same slope as the 5 V one, but at 6 V the failure mode seems to have changed. Therefore
another data point at 5.5 V was measured, showing behaviour consistent with the data
measured at 4 V and 5 V.
[Figure: Voltage acceleration. Weibull plot with 90 % confidence level for the variables
V1=4,0V, V2=5,0V, V3=5,5V and V4=6,0V; cumulative percent over TTF (h) from 0,01 to
10000 h.]
Figure 16: Lifetimes for different voltage stresses, data from Figure 15 in Weibull plot.
From the graph and the table you can see that the slopes for V1, V2 and V3 are similar, but
the slope for V4 is not. The test results from V4 therefore cannot be used for further
investigation and calculation and must be excluded.
Outcome
lifetime vs. stress quantified
which stress results can be used for further evaluation and which cannot
there are different lifetime models
a Weibull distribution describes how devices fail for one given stressor value, a
lifetime model describes the sensitivity to the stressor (each stressor value has
a specific Weibull-distribution), both are necessary to describe the lifetime
behaviour
The next step is to choose a lifetime model. This is often already known from basic
investigations using generic devices, or a first-choice model is taken to start with.
So for the above we choose the Eyring model. Here the lifetime T (= T63) is defined by
T = A · exp(−α·V)
with the constant A and the voltage acceleration parameter α.
The formula states that the higher the voltage, the shorter the lifetime (which makes sense!).
Taking the logarithm results in
ln T = ln A − α·V
which is a straight line with the slope −α and the y-axis intercept ln A. Therefore, plotting
the logarithm of T63 on the y-axis against the voltage on the x-axis, one can determine the
Eyring parameter α.
Using the fitting curve from Excel we obtain the Eyring parameter α = 4. Furthermore we see
that the correlation coefficient R is very good (R² in this case is 1). If R is smaller than
approx. 0,9 the fit is quite weak. This can for example mean that the chosen lifetime
acceleration model is the wrong one and a different one has to be chosen, or that there is
some error in the experiment or calculation which leads to wrong values.
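The Excel fit can be reproduced with a short least-squares sketch. The t63 values below are illustrative numbers chosen to be consistent with the fitted model T = 1,776×10^10 h × exp(−4V); they are not quoted from the brochure.

```python
import math

# Sketch: extracting the Eyring parameter alpha from t63 values at several
# stress voltages via the linearised form ln(T) = ln(A) - alpha*V.
volts = [4.0, 5.0, 5.5]
t63 = [2000.0, 36.6, 4.95]   # hours -- assumed, consistent with alpha = 4

xs = volts
ys = [math.log(t) for t in t63]
n = len(xs)

# ordinary least-squares straight-line fit
mx, my = sum(xs) / n, sum(ys) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
alpha = -slope                  # the slope of the line is -alpha
A = math.exp(my - slope * mx)   # the y-axis intercept is ln(A)

print(f"alpha = {alpha:.2f} 1/V, A = {A:.2e} h")
```

With these inputs the fit recovers α ≈ 4 and A ≈ 1,8×10^10 h, matching the lifetime formula given below.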
So we now have the formula for the lifetime depending on Vuse:
T = 1,776×10^10 h × exp(−4·Vuse)
In most cases such a model is not valid over an unlimited voltage range; there are
limitations, as seen in the graph below.
Outcome
correct lifetime model defined
characteristic values for the lifetime model calculated
Topic: How to determine the lifetime in the field with the data gained so far?
Now we have all the information needed to calculate the device behaviour in the field using
the field conditions. In our example this is Vuse = 3 V.
The question is: after which time in operation at Vuse = 3 V has the cumulative 10 ppm target
of the devices failed?
HALT tests have now been done with voltage acceleration at 4 V, 5 V and 5,5 V. This data can
be used to calculate the lifetime of each device at Vuse = 3 V using the above acceleration
factors.
Stress voltage: 4 V / 5 V / 5,5 V
Acceleration factor for 3 V: 54,6 / 2981 / 22026
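These acceleration factors follow directly from the Eyring model with α = 4, and scaling a measured TTF to the use voltage is just a multiplication. A minimal sketch (the 37 h example TTF is an assumption for illustration):

```python
import math

# Sketch: Eyring acceleration factor AF = exp(alpha*(V_stress - V_use))
# and scaling a measured TTF from the stress voltage to the use voltage.
alpha, v_use = 4.0, 3.0

afs = {}
for v_stress in (4.0, 5.0, 5.5):
    afs[v_stress] = math.exp(alpha * (v_stress - v_use))
    print(f"AF({v_stress} V -> {v_use} V) = {afs[v_stress]:.1f}")

# e.g. a device failing after 37 h at 5 V (hypothetical) corresponds to
ttf_field = 37.0 * afs[5.0]
print(f"equivalent field TTF at 3 V: {ttf_field:.0f} h")
```

The printed factors reproduce the 54,6 / 2981 / 22026 values in the table above.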
With this we now have 45 TTF (time-to-failure) values with which the Weibull calculation for
Vuse = 3 V can be done.
[Figure: Weibull plot of the 45 TTF values scaled to Vuse = 3 V (shape 18,72, scale
113230 h, N = 45); percent over TTF (h) from 80000 to 140000 h.]
[Figure: the same Weibull distribution extrapolated down to 0,01 % cumulative failures
(shape 18,72, scale 113230 h, N = 45).]
From the extrapolation we see that cumulative 0,01 % is reached after 69200 hours.
Considering the 90 % confidence interval, the range is from 61700 to 77700 hours. This wide
variation is caused by the sample size: the first failure corresponds to about 2 % for the
chosen sample size, and 0,01 % is a factor of 200 away from this 2 %, so there is a greater
uncertainty on the real value.
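The 69200 h figure can be checked by inverting the Weibull cdf with the fitted shape and scale values quoted in the plot above:

```python
import math

# Sketch: time at a small cumulative failure fraction from fitted Weibull
# parameters, by inverting F(t) = 1 - exp(-(t/eta)**beta).
beta, eta = 18.72, 113230.0   # shape and scale (h) from the fit above
F = 0.0001                    # cumulative 0,01 %

t = eta * (-math.log(1 - F)) ** (1 / beta)
print(f"t(0.01 %) = {t:.0f} h")   # close to the 69200 h read from the plot
```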
Sample size
Accuracy is limited by sample size: the smaller the sample, the less accurate the results.
Finding the proper sample size for the given test is the first prerequisite to avoid
accuracy limitations and a waste of time and money. In the worst case the test has to be
extended with further samples.
Representative sample
The samples used should cover all variations and fluctuations of the process, for example
lot-to-lot variations, within-lot variations and tool influences. These variations are also
reflected in the slope and width of the distribution, respectively.
Extrapolation
Using 50-100 samples, a cumulative failure of 1 % can be identified directly. But every
extrapolation to lower failure fractions carries a certain uncertainty, because it rests on
the assumption that the behaviour of the tested devices is representative of the whole
production. This, however, is just an assumption.
Outcome
lifetime in the field calculated
can device fulfil requirements or not
For further reading on more specific questions of lifetime measurement, the following books
and publications are recommended:
AIAG: Potential Failure Mode and Effects Analysis (FMEA), 4th Edition,
ISBN 978-1-60534-136-1
Robert B. Abernethy: The New Weibull Handbook, 5th Edition. Reliability and Statistical
Analysis for Predicting Life, Safety, Supportability, Risk, Cost and Warranty Claims.
ISBN-10: 0965306232, ISBN-13: 978-0965306232
Alex Porter: Accelerated Testing and Validation. Testing, Engineering, and Management Tools
for Lean Development. Elsevier 2004, ISBN 0-7506-7653-1
For the cdf graph one usually expects a single straight line. But sometimes this is not the
case and deviations from the straight line are encountered. Often these deviations are not
caused by measurement errors but have a physical background.
A typical reason for such outlier behaviour is single extrinsic (early-life) failures within
a failure distribution of intrinsic wear-out.
Figure 8: Two different failure modes with different slopes β and lifetimes t63
Figure 10: Failures in early life (a potential extrinsic distribution before wear-out). The
cdf reaches a certain value before the intrinsic failures occur.