Greg Larsen
G. A. Larsen Consulting
Ops A La Carte
G.A. Larsen Consulting 2/23/2012 1
Discriminate among products
Monitor performance of production process
Manufacturing process improvement
Specification setting
Acquitting the guilty and condemning the
innocent – the Lord detests them both.
Proverbs 17:15
A cell phone manufacturer uses Agilent equipment to
test whether each phone is functioning properly just
prior to shipment.
The test system consists of a fixture which secures the
phone and a rack of measurement equipment which
tests over 40 functions.
System Configuration Layout

[Rack diagram: 8920B test set, 83206A TDMA cellular adapter, 83236A
up-converter, 6626A power supply, 437B power meter, and E1301A "B" size
mainframe mounted in an instrument rack with a 15" monitor and retractable
keyboard-mouse tray; the work surface holds a barcode reader and a pneumatic
fixture drawer with two-hand tie-down buttons that close the fixture and start
final alignment, plus an emergency stop button and pneumatic shutoff.]
Questions arose on an installed base of 20 test
systems as to whether the measurement process had
excessive noise.
In particular, the phone manufacturer felt the false
failure rate was too high.
A measurement system analysis was conducted.
24 phones were randomly selected
6 test systems were selected
2 repeated measurements were taken on each
phone and test system
42 phone parameters were measured on each of
the 288 design points
The data were used to assess the adequacy of the
test process.
Classic truth table
Augmented with measurement error distributions
Warranted product specs versus production test limits
                          TEST RESULT
                    Fail              Pass
DUT    Good     False Failure     Valid Yield       (1 - P)
       Bad      Valid Defect      Missed Fault      (P)
                Rate

The two types of test errors: false failures and missed faults.
[Diagram: the distribution of the production process centered at Nominal, with
the test limits set inside the product specs (Prod. Spec. – Test Limit –
Nominal – Test Limit – Prod. Spec.), and a distribution of test errors
centered at each test limit.]
Different ways to assess the performance of a
measurement process
Classical “Gauge R&R”
ISO (International Organization for Standardization)
DOE/ANOVA
Typically 2-factor random effects model without
interaction
“Parts” and “Operators” plus replication
Uses ranges to estimate sources of variation
Not as precise
Normality required since the factor (d2) to convert from
a range to a standard deviation depends on it
Typically no interval estimates
Sample sizes tend to be small (e.g. 10 parts, 3
operators, 3 reps)
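The range method above can be sketched in a few lines; the data are made up, and d2 = 1.693 is the standard range-to-sigma factor for subgroups of size 3:

```python
# Range-based repeatability estimate, a minimal sketch with hypothetical data.
# Each row: one part measured 3 times under identical conditions.
measurements = [
    [10.1, 10.3, 10.2],
    [9.8, 9.9, 10.0],
    [10.4, 10.2, 10.5],
]
ranges = [max(row) - min(row) for row in measurements]
r_bar = sum(ranges) / len(ranges)     # average range
d2 = 1.693                            # range-to-sigma factor for n = 3
sigma_repeat = r_bar / d2             # estimated repeatability std. dev.
```

The d2 factor assumes normality, which is why the range method is sensitive to that assumption.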
International Organization for Standardization
Evaluates measurement uncertainty using a
mathematical model
Law of propagation of uncertainty (1st order Taylor
series)
Usually no measurements are taken directly on the
“measurand” (i.e., the response of interest)
To avoid understating the estimated uncertainty, all
major sources of variation must be included
Measurement error estimated by direct observation on
the response
ANOVA model is general in that it can accommodate
multiple factors in different designs
Lends itself to both point and interval estimates of the
measurement variation
Fixed or random effects
Measurement:
The value obtained from measuring the device
under test (DUT) a single time.
Measurement = True Value + Bias + Measurement Error
True Value:
The accepted correct value of the parameter being
measured.
Bias:
The difference between the average of repeated
measurements of a parameter on a single DUT, and
the true value.
Measurement Error:
The deviation between an individual measurement of a
parameter on a DUT, and the average of many repeated
measurements of the same DUT. The standard deviation
of measurement error is often called precision.
Standard:
A measurement standard for which the true value is
known.
Accuracy:
The difference between an individual measurement and
the true value. For a measurement to be accurate, both
the bias and the measurement error must be small.
[Diagram: a single measurement, the average of repeated measurements, and the
true value on a number line. Bias is the distance from the true value to the
average; precision is the scatter of single measurements about the average;
accuracy is the distance from the true value to a single measurement.]
[Four X–Y scatter plots (axes from -30 to 30) illustrating the combinations of
low/high bias and low/high precision.]
Reproducibility:
The variation in measurements due to all sources of
measurement variation except repeatability.
Repeatability:
The variation of repeated measurements using a single
test system and an individual DUT.
Measurement System Variation:
Variation arising from all sources in the measurement
system. For example, variation among test systems,
among fixtures, among repeated measurements.
Total Variation
   DUT to DUT
   Measurement System
      Test System to Test System
      Interaction
      Repeatability
Repeatability
Static:
Measurements taken in succession without changing
the setup.
Dynamic:
The DUT is removed from the measurement device
and repeated measurements are separated in time.
Within DUT variability
If present, included in repeatability
To resolve, substitute a standard for the DUT and
isolate the pure repeatability of the measurement
system
A division of Hewlett-Packard that made tape drives for
computer backup designed a drive with higher storage
capacity that used a new head design.
The head is the part of the drive that makes contact
with the tape and magnetically records information on
the tape.
The heads are manufactured by an outside supplier and
must be tested to make sure they meet performance
specifications.
To do this, a head tester was developed by HP and
installed at the supplier’s production facility to test the
heads prior to shipment.
Nine heads were selected at random from the
manufacturing process that produces the heads.
Three different tape cartridges were selected at
random to record measurements in the head tester.
Thus, heads are “parts”, and the tapes are
“operators”.
Six repeated measurements were made on each
head/tape combination for a total of 162 points in
the design.
The data were used to determine whether the
tester had sufficient performance to discriminate
among heads.
16 response variables were measured, one of which
is “reverse resolution”
In the process of writing information to magnetic
tape, voltage is measured at two different
frequencies.
The reverse resolution is a ratio of these two
voltages when the drive writes in the reverse
direction.
Ideally, this ratio should be 100%. The specification
limits are from 90% to 110%.
Y_ijk = μ + P_i + O_j + (PO)_ij + E_ijk

i = 1,...,I;  j = 1,...,J;  k = 1,...,K

μ is a constant

P_i, O_j, (PO)_ij, E_ijk are iid normal with means of zero and
variances σ²_P, σ²_O, σ²_PO, σ²_E

These are called variance components
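A minimal sketch of the ANOVA (method-of-moments) estimates for this model, fit to simulated data; the design sizes and true sigmas below are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
I, J, K = 12, 6, 2                       # parts, operators, reps (hypothetical)
sP, sO, sPO, sE = 2.0, 0.5, 0.3, 0.4     # assumed true standard deviations

# Simulate the two-factor random effects model Y = mu + P + O + PO + E
P = rng.normal(0, sP, I)[:, None, None]
O = rng.normal(0, sO, J)[None, :, None]
PO = rng.normal(0, sPO, (I, J))[:, :, None]
E = rng.normal(0, sE, (I, J, K))
Y = 100.0 + P + O + PO + E               # mu = 100

ybar = Y.mean()
yb_i = Y.mean(axis=(1, 2))               # part means
yb_j = Y.mean(axis=(0, 2))               # operator means
yb_ij = Y.mean(axis=2)                   # cell means

# Balanced two-factor ANOVA mean squares
MSP = J * K * np.sum((yb_i - ybar) ** 2) / (I - 1)
MSO = I * K * np.sum((yb_j - ybar) ** 2) / (J - 1)
MSPO = K * np.sum((yb_ij - yb_i[:, None] - yb_j[None, :] + ybar) ** 2) \
       / ((I - 1) * (J - 1))
MSE = np.sum((Y - yb_ij[:, :, None]) ** 2) / (I * J * (K - 1))

# Method-of-moments variance-component estimates, truncated at zero
var_E = MSE
var_PO = max((MSPO - MSE) / K, 0.0)
var_O = max((MSO - MSPO) / (I * K), 0.0)
var_P = max((MSP - MSPO) / (J * K), 0.0)
```

The truncation at zero is the usual fix when a mean-square difference comes out negative.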
Measure               Symbol
Repeatability         σ²_E
Measurement Error     σ²_M = σ²_O + σ²_PO + σ²_E
S/N Ratio             σ_P / σ_M
P/T Ratio             σ_M / (USL - LSL)
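The metrics follow directly from the variance components; a sketch with assumed component values, reading S/N as σ_P/σ_M and P/T as σ_M/(USL - LSL):

```python
import math

# Hypothetical variance components and spec limits (illustrative only)
var_P, var_O, var_PO, var_E = 4.0, 0.04, 0.02, 0.06
USL, LSL = 110.0, 90.0

var_M = var_O + var_PO + var_E           # measurement error variance
snr = math.sqrt(var_P / var_M)           # S/N ratio = sigma_P / sigma_M
pt = math.sqrt(var_M) / (USL - LSL)      # P/T ratio
```

With these values S/N clears 5 and P/T is under .05, the criteria on the next slide.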
S/N Ratio
Criterion: lower bound of a 90% confidence interval for σ_P/σ_M > 5
P/T Ratio
Criterion: upper bound of a 90% confidence interval for σ_M/(USL - LSL) < .05
Randomization
Alternative designs
Fixed vs. random operators
Sample size
Comparison across time/location
Completely randomized design provides protection
from the effects of uncontrolled variables which
may be changing
However, a random sequence is likely to require the
same DUT to be measured multiple times in a short
time period. If heat affects the measurement, a
wait time has to be introduced which lengthens the
total time to run the experiment
Regardless of the degree of randomization,
probably best to spread the repeated
measurements out in time (dynamic)
Single factor
Multiple DUTs and repeated measurements on each
DUT
Two factor crossed (parts, operators)
Three factor crossed (parts, operators, fixtures)
Nested
DUTs are circuit boards. Measurements vary by
location and so location is nested within board.
Different test systems and fixtures, but fixtures cannot
be moved among systems. Thus, fixture is nested within
system.
Nested factorial
A factor is considered random if its levels are a random
sample from the population of all levels
A factor is fixed if the levels are a non-random sample
or if the levels consist of the entire population
In this context DUTs should be representative of the
whole population which is likely large, so DUT is a
random factor
The other factors (e.g., operators) may be considered
random or fixed depending on the situation
Methods for computing CI’s depend on whether effects
are fixed or random
Requires a modification for the degrees of freedom for
the fixed effect
Usually impacted by practical considerations
The “classic” 10 parts, 3 operators and 3 reps probably
gives poor precision on the variance component
estimates
Single factor design
Use a lot more DUTs than reps and get the total design
size > 100
Two factor design
Use 2 reps
Use at least 6 levels for the other factors (if random)
Use at least 30 DUTs if objective is to estimate the
manufacturing process capability in the absence of
measurement error
Depending on the application, a 12x6x2 factorial design
might be a good starting point
Compare “before and after” to verify an improvement
of the measurement process
Compare a competitor’s test process to an Agilent
solution
Compare the same measurement process used at two
different locations
Compare using the ratio of S/N ratios
Confidence interval can be constructed using
Cochran’s interval (Satterthwaite)
In our head test example, the initial MSA revealed
poor head tester performance on 2 of 16 test variables
After some changes, a 2nd MSA was done to determine
whether the tester had actually improved
The 2nd MSA had the same 9 heads, 3 tapes, and 3
reps
Results showed significant improvement and the
performance on all 16 test variables was now adequate
Several additional testers were produced, qualified,
and installed in the head supplier’s production process
If this interval excludes 1, then reject the
hypothesis that the S/N ratios are the same.
False failures and missed faults
Bi-variate normal integration
Setting test limits
Simulation of more complex test processes
Estimating the Cost of Measurement Error
Estimating the two test errors
• False failures
• Missed faults
                   FAIL                        PASS
GOOD     FF = P(FAIL|GOOD)         P(PASS|GOOD) = 1 - FF      P(GOOD)
BAD      P(FAIL|BAD) = 1 - MF      MF = P(PASS|BAD)           P(BAD)
         P(FAIL)                   P(PASS)

Product specs (SL, SU); test limits (TL, TU)
Y = True Value ~ N(μ_P, σ²_P)          Product Specs (SL, SU)
X = Measured Value
X = Y + E,  E ~ N(0, σ²_m)             Test Limits (TL, TU)
f(x,y) = joint pdf
f(x) = marginal pdf for x
f(y) = marginal pdf for y
[Diagram: the (X, Y) plane with X = Y + Error (measured value) on the
horizontal axis and Y (true value) on the vertical axis. Vertical test limits
Tl, Tu and horizontal specs LSpec, USpec partition the plane into valid
pass/fail regions, false-failure (FF) strips where good units fail, and
missed-fault (MF) strips where bad units pass.]
P(BAD) = ∫_{-∞}^{SL} f(y) dy + ∫_{SU}^{∞} f(y) dy

P(FAIL) = ∫_{-∞}^{TL} f(x) dx + ∫_{TU}^{∞} f(x) dx

P(PASS, BAD) = ∫_{-∞}^{SL} ∫_{TL}^{TU} f(x,y) dx dy + ∫_{SU}^{∞} ∫_{TL}^{TU} f(x,y) dx dy

P(FAIL|GOOD) = [P(FAIL) - P(BAD) + P(PASS, BAD)] / [1 - P(BAD)]

P(PASS|BAD) = P(PASS, BAD) / P(BAD)

P(FAIL, GOOD) = (1 - P(BAD)) · P(FAIL|GOOD)
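These integrals can be evaluated numerically, e.g. with scipy's dblquad; a sketch in which the process parameters and guard-banded test limits are all assumed values:

```python
from scipy import integrate, stats

mu, sP, sm = 100.0, 4.0, 0.8        # process mean/sigma, error sigma (assumed)
SL, SU = 90.0, 110.0                # product specs
TL, TU = 91.0, 109.0                # test limits (assumed guard band)

dist_y = stats.norm(mu, sP)                        # true values
dist_x = stats.norm(mu, (sP**2 + sm**2) ** 0.5)    # marginal of X = Y + E

def f(x, y):                        # joint pdf: f(y) * f(x | y)
    return dist_y.pdf(y) * stats.norm(y, sm).pdf(x)

p_bad = dist_y.cdf(SL) + dist_y.sf(SU)
p_fail = dist_x.cdf(TL) + dist_x.sf(TU)

# P(PASS, BAD): x inside (TL, TU) while y lies outside (SL, SU);
# +/- 8 sigma stands in for the infinite limits
lo_tail = integrate.dblquad(f, mu - 8 * sP, SL, TL, TU)[0]
hi_tail = integrate.dblquad(f, SU, mu + 8 * sP, TL, TU)[0]
p_pass_bad = lo_tail + hi_tail

p_fail_good = p_fail - p_bad + p_pass_bad
ff_rate = p_fail_good / (1 - p_bad)       # P(FAIL | GOOD)
mf_rate = p_pass_bad / p_bad              # P(PASS | BAD)
```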
Earlier a pair of metrics were suggested to quantify the adequacy of
the measurement process: S/N and P/T.
Criteria have also been suggested for the test error rates. The idea is
to compare the computed error rates with those that would be
expected by random chance. If the computed error rates are lower,
there is evidence that the measurement process is a better
discriminator than pure chance.
Two indices are defined:
FF_index = P(FAIL, GOOD) / [P(BAD) · (1 - P(BAD))]

MF_index = P(PASS, BAD) / [P(BAD) · (1 - P(BAD))]

• Values < 1 suggest the measurement process is capable.
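A sketch of the indices with assumed input probabilities (such as would come from the numerical integration); the denominator is the joint error rate a pure-chance test with fail rate P(BAD) would produce:

```python
# Hypothetical probabilities, assumed for illustration
p_bad, p_fail_good, p_pass_bad = 0.0124, 0.0100, 0.0019

chance = p_bad * (1 - p_bad)      # joint error rate of a pure-chance test
ff_index = p_fail_good / chance
mf_index = p_pass_bad / chance
capable = ff_index < 1 and mf_index < 1
```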
Criterion: minimize the sum of the test error costs

Cost Ratio = CR = (Cost of MF) / (Cost of FF)

SN = S_P / S_m

K = (SU - SL) / (2·S_P)

Z = PROBIT(1 - 1/(1 + CR))

B = [Z·√(1 + SN²) - K] / SN

TL = SL + B·S_m   and   TU = SU - B·S_m
Production test limits can also be set so that a target test
error rate is achieved
For example, the limits and the false failure rate can be
determined such that the missed fault rate is equal to 1%
This is done with an iterative application of the
numerical integration described earlier
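One way to sketch the iteration: bisect on the guard band b until P(PASS|BAD) reaches the target; the process parameters below are assumed:

```python
from scipy import integrate, stats

mu, sP, sm = 100.0, 4.0, 2.0      # hypothetical process and error sigmas
SL, SU = 90.0, 110.0              # product specs
TARGET_MF = 0.01                  # target missed fault rate of 1%

dist_y = stats.norm(mu, sP)
p_bad = dist_y.cdf(SL) + dist_y.sf(SU)

def miss_rate(b):                 # P(PASS | BAD) with TL = SL + b, TU = SU - b
    TL, TU = SL + b, SU - b
    f = lambda x, y: dist_y.pdf(y) * stats.norm(y, sm).pdf(x)
    joint = integrate.dblquad(f, mu - 8 * sP, SL, TL, TU)[0] + \
            integrate.dblquad(f, SU, mu + 8 * sP, TL, TU)[0]
    return joint / p_bad

# Bisection: the missed fault rate decreases as the guard band b widens
lo, hi = 0.0, 5.0
for _ in range(25):
    mid = (lo + hi) / 2
    if miss_rate(mid) > TARGET_MF:
        lo = mid                  # still too many missed faults: widen
    else:
        hi = mid
b = (lo + hi) / 2
```

The resulting false failure rate then follows from the same integration with the solved limits.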
So far we have been discussing a single test instance on
a single test parameter
The test process often involves re-test, and/or test and
repair loops.
Multiple measurements may be averaged.
There may be many parameters tested on the same
DUT
For these situations, FF, MF and reject rates are
simulated.
[Flow chart: units passing Production Test ship. Failures are re-tested
without removal from the fixture; units that pass ship. Remaining failures go
to test & repair on a different test system (NTF screening); units that still
fail go to repair & test, and units that fail there are scrapped.]
Software application to aid sales
Qualification of test solution prior to implementation
Characterize/improve existing test process
Monitor stability of the measurement process
“Measurement Systems Analysis”, Automotive Industry Action Group
(AIAG), 2nd Edition, 1995.
“Confidence Intervals on Measures of Variability in R&R Studies”,
Burdick and Larsen, Journal of Quality Technology, 29(3), 1997.
“Analysis of a Two-Factor R&R Study With Fixed Operators”, Dolezal,
Burdick and Birch, Journal of Quality Technology, 30(2), 1998.
“Two-Way Random-Effects Analyses and Gauge R&R Studies”,
Vardeman and Van Valkenburg, Technometrics, 41(3), 1999.
“Comparing Variability of Two Measurement Processes using R&R
Studies”, Burdick, Allen and Larsen, Journal of Quality Technology,
34(1), 2002.
“The Economic Impact of Measurement Error”, Mader, Prins and
Lampe, Quality Engineering, 11(4), 1999.
“On Setting Test Limits Relative to Specification Limits”, Grubbs and
Coon, Industrial Quality Control, 10(5), 1954.
“Measurement System Analysis - the Usual Metrics can be Non-
Informative”, Larsen, Quality Engineering, 15(2), 2002.
“Measurement System Analysis in a Production Environment with
Multiple Test Parameters”, Larsen, Quality Engineering, 16(2), 2003.
“A Review of Methods for Measurement System Capability Analysis”,
Burdick, Borror and Montgomery, Journal of Quality Technology, 35(4),
2003.
“Confidence Intervals for Misclassification Rates in a Gauge R&R
Study”, Burdick, Park, Montgomery, and Borror, Journal of Quality
Technology, 37(4), 2005.
Design and Analysis of Gauge R&R Studies, Burdick, Borror and
Montgomery, ASA-SIAM Series on Statistics and Applied Probability,
SIAM, Philadelphia, ASA, Alexandria, VA, 2005.
“Capability Measures for Measurement Systems Analysis”, Larsen, in
Encyclopedia of Statistics in Quality and Reliability, Ruggeri, F.,
Kenett, R. and Faltin, F.W. (eds). John Wiley & Sons Ltd, Chichester,
UK, pp 1070-1074. 2007.
MSA on Pass/Fail Data
“Gauge Capability for Pass-Fail Inspection”, Boyles, Technometrics,
43(2), 2001.
MSA when Measurements are Destructive
• “Improving and Applying Destructive Gauge Capability”, Bergeret,
Maubert, Sourd, and Puel, Quality Engineering, 14(1), 2001.
• “Gauge R&R Studies for Destructive Measurements”, De Mast and Trip,
Journal of Quality Technology, 37(1), 2005.
MSA when Metric of Interest is Derived from Other Variables
• “Approving Measurement Systems when Using Derived Values”,
Majeske and Gearhart, Quality Engineering, 18(4), 2006.
Effect of Measurement Error on Control Charts
• “Effect of Measurement Error on Shewhart Control Charts”, Linna and
Woodall, Journal of Quality Technology, 33(2), 2001.