Non-Replicable GRR Case Study
June 5, 2002
ABSTRACT
Gage studies provide an estimate of how much of the observed process variation is
due to measurement system variation. This is typically done by a methodical
procedure of measuring, then re-measuring the same parts by different appraisers.
This cannot be done with a non-replicable (destructive) measurement system
because the measurement procedure cannot be replicated on a given part after it
has been destroyed. Here an ANOVA-based method is applied in a case study which
demonstrates one possible way to determine measurement variation in a non-replicable system.
This paper is intended to serve as additional guidelines for the analysis of measurement systems.
www.aiag.org/publications/quality/msa3.html
[Figure: standard GRR study layout, Part # 1 to 10 by appraiser and trial (Trial 1 to Trial 3)]
Study Approach
The first thing that must be done before tackling a non-replicable GRR
study is to ensure that all the conditions surrounding the measurement
testing atmosphere are defined, standardized and controlled: appraisers
should be similarly qualified and trained, lighting should be adequate and
consistently controlled, work instructions should be detailed and
operationally defined, environmental conditions should be controlled to
an adequate degree, equipment should be properly maintained and
calibrated, failure modes should be understood, etc. Figure 2 in the MSA manual,
Measurement System Variability Cause and Effect Diagram, p. 15, and
the Suggested Elements for a Measurement System Development
Checklist, pp. 36-38, may assist in this endeavor.
Second, there is a good deal of prerequisite work that must be done
before doing a non-replicable study. The production process must be
stable and the nature of its variation understood to the extent that units
may be appropriately sampled for the non-replicable study: where is the
process homogeneous and where is it heterogeneous?
Another consideration: if the overall process appears to be stable AND
CAPABLE, and all the surrounding prerequisites have been met, it may
not make sense to spend the effort to do a non-replicable study, since the
overall capability includes measurement error. If the total product
variation and location are OK, the measurement system may be considered
acceptable.
Standard GRR procedures and analysis methods must be changed and
certain other assumptions must be made before conducting a non-replicable
measurement systems analysis. The plan for sampling parts to
be used in a non-replicable GRR needs some structure. Since the original
part cannot be re-measured due to its destruction, other similar
(homogeneous) parts must be chosen for the study (for the other trials
and other appraisers) and an assumption must be made that they are
duplicate or identical parts. In other words, as the duplicate parts are
re-measured across other trials and by other appraisers, we will pretend
that the same part is being measured. Refer to Figure 2. Part 1 is now
Part 1-1, 1-2, 1-3, 1-4, 1-5, 1-6, for this 10-2-3 layout. Six very similar,
assumed to be identical, parts are used to represent Part 1, and so on for
all 10 parts. The assumption must be made that all the parts sampled
consecutively (within one batch) are identical enough that they can be
treated as if they are the same. If the particular process of interest does
not satisfy this assumption, this method will not work.
Figure 2 (layout for the 10-2-3 non-replicable study; each cell is one duplicate part):

              Appraiser 1                  Appraiser 2
Part #        Trial 1  Trial 2  Trial 3    Trial 1  Trial 2  Trial 3
1A to 1F      1-1      1-2      1-3        1-4      1-5      1-6
2A to 2F      2-1      2-2      2-3        2-4      2-5      2-6
3A to 3F      3-1      3-2      3-3        3-4      3-5      3-6
4A to 4F      4-1      4-2      4-3        4-4      4-5      4-6
5A to 5F      5-1      5-2      5-3        5-4      5-5      5-6
6A to 6F      6-1      6-2      6-3        6-4      6-5      6-6
7A to 7F      7-1      7-2      7-3        7-4      7-5      7-6
8A to 8F      8-1      8-2      8-3        8-4      8-5      8-6
9A to 9F      9-1      9-2      9-3        9-4      9-5      9-6
10A to 10F    10-1     10-2     10-3       10-4     10-5     10-6
Care must be taken in choosing these duplicate parts. Typically for the
parts that represent part number 1 in a study, each duplicate is selected
in a way that makes it as much like the original part as possible. Likewise
for part number 2, and number 3, 4, 5, etc. These parts should be
produced under production conditions that are as similar as possible. Consider
the 5 Ms + E (footnote 4) and make them all as alike as possible. Generally, if
parts are taken from production in a consecutive manner, this
requirement is met.
However, the parts chosen to represent part number 2, for example, must
be chosen to be unlike part number 1, part number 3, 4, 5, etc. So
between part numbers, the 5 Ms + E must be unlike each other. These
differences must be forced to be between part numbers. The total
number of duplicate parts selected for each row must equal the number
of appraisers times the number of trials (footnote 5). In Figure 2, groups of parts
within each row are assumed to be identical, but groups of parts between
rows are assumed to be different.
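The row-size rule above (duplicates per row = appraisers x trials) can be sketched in Python. This is only an illustration and not part of the original study; the function name `build_layout` and the tuple format are assumptions made for the example, and the randomized presentation order is deliberately left out.

```python
from itertools import product


def build_layout(n_groups=10, n_appraisers=2, n_trials=3):
    """Return one (group, part_id, appraiser, trial) tuple per duplicate part."""
    # Every (appraiser, trial) cell in a row needs its own duplicate part.
    cells = list(product(range(1, n_appraisers + 1), range(1, n_trials + 1)))
    layout = []
    for group in range(1, n_groups + 1):
        for dup, (appraiser, trial) in enumerate(cells, start=1):
            layout.append((group, f"{group}-{dup}", appraiser, trial))
    return layout


layout = build_layout()
print(len(layout))   # total number of parts the study needs
print(layout[:6])    # the six duplicates standing in for "Part 1"
```

With the defaults, the six duplicates of each row are labeled in the same way as the 10-2-3 layout in Figure 2 (1-1 through 1-6 for the first row, and so on).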
Part variation may be expressed as part-to-part, shift-to-shift, day-to-day,
lot-to-lot, batch-to-batch, week-to-week, etc. With parts, the minimum
variation would be part-to-part; this represents the minimum possible
amount of time between each part. When parts are not sampled
consecutively (i.e., part-to-part), there is more opportunity for variation
to occur: different production operators, different raw material, different
components, changes in environment, etc.
So, within a row it is desirable to minimize variation by taking parts
consecutively, thus representing part-to-part variation. Between rows it
is desirable to maximize variation by taking parts from different lots,
batches, etc. There may be economic, time or other constraints involved
which will impose limits on the length of time we can wait to take
samples for the between-row data: the process must be run, the PPAP
must be submitted, etc. When constraints arise and interfere with doing
things the right way, the results may be subject to modified
interpretation.

Footnote 3. The necessary randomized presentation to the appraiser is not shown here for the sake of clarity.
Footnote 4. Man, Machine, Material, Method, Measurement plus Environment. Measurement may seem redundant here, but there may be times where
two or more identical measurement systems are used to gain the same information and this should be considered in any study.
Footnote 5. When the source of measurement variation is thought to be due to equipment only, using different appraisers may not be required.
Another statistical assumption that must be made for this type of study is
that the measurement error is normally distributed. This is a prerequisite
for any ANOVA (Analysis of Variance).
ANOVA is a better analysis tool for a non-replicable measurement
systems analysis than the average and range method. ANOVA has the
power to examine interactions that the average and range method will not
catch.
As a precautionary note, the results from this type of study will
contain some process variation because the identical parts are
not really identical. This may come into play when interpreting the
results in terms of the error percentage related to process variation
or tolerance. The better the methodology used to achieve an
understanding of the production process and its corresponding
measurement system, the more meaningful this non-replicable
measurement study will be.
CASE STUDY
A stamped part goes through a critical weld assembly process which
must be destructively tested on an ongoing basis. The process has an
in-house progressive stamping die which produces the steel stampings.
This process is followed by robotic MIG welding (attaching an outside
purchased steel rod to the stamping) in one of 6 different weld stations,
each of which has 4 parallel weld fixtures (only one of which will be
used per assembly). Each weld fixture is assigned a letter designation, A
through X. This process has been in production long enough that it has
been studied and analyzed for stability and capability. Each of the 24
weld fixtures is providing a stable and capable process; however, some
are better than others. In an effort to improve the overall process, the
measurement system was to be analyzed using this non-replicable MSA
methodology, which had been recently introduced to the supplier.
Study Format
represent the full range of the process. There was enough early
confidence in the measurement system to make this judgment.
Similarity (homogeneity) within each row was created by taking 6
consecutively produced stampings (6 is chosen to meet the 2 appraisers x
3 trials requirement), then welding those 6 parts consecutively through
the same weld fixture. Dissimilarity (heterogeneity) between rows was
created by taking groups of 6 consecutive stampings from different coils
of steel at a time separated by a few hours, then running them
consecutively through a different weld fixture at a different time.
The rod component which is welded to the stamping is received in bulk
and has already been determined to not play a major role in pull test
variation. Therefore, in this study there was no effort made to maintain
similarity and dissimilarity issues with the rod component.
Previous studies using common problem solving tools had shown that a
manual positioning and clamping system used on the testing machine
was appraiser dependent, so a new and better positioning system with
hydraulic clamps was installed. Parts are located into the machine with
positive locators and hydraulic clamps. A hook on the testing machine
grabs the rod and mechanically pulls on the rod to destruction. A digital
readout on the machine displays the peak pulloff force in pounds and
reads to one decimal place. From this readout, the data is recorded and
the failure mode noted (weld must pull metal from the stamping).
Although the appraiser dependency was assumed to be resolved, this
study still used two appraisers to verify that assumption.
A total of 60 parts were required to do this study. There were 10 groups
of similar parts, 2 appraisers and 3 trials; 10 x 2 x 3 = 60. Parts were
first gathered off the stamping operation, carefully numbered and
quarantined until all 60 parts had been collected. These 10 groups of
parts were selected at 3-hour intervals, over 3 days of production, in
order to force some difference between each group of parts.
Then, each similar stamped group of parts was run through a different
weld fixture. Parts introduced to each weld fixture were presented in
random order within each group of 6.
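Figure 3 (study layout; the number in parentheses is the randomized test order, shown as a subscript in the original figure):

                  Appraiser 1                          Appraiser 2
Part #            Trial 1     Trial 2     Trial 3      Trial 1     Trial 2     Trial 3
1-1R to 1-6R      1-6R (53)   1-5R (25)   1-3R (7)     1-4R (40)   1-1R (34)   1-2R (32)
2-1P to 2-6P      2-2P (27)   2-6P (35)   2-5P (57)    2-4P (12)   2-3P (43)   2-1P (17)
3-1H to 3-6H      3-1H (21)   3-2H (36)   3-6H (1)     3-5H (56)   3-4H (10)   3-3H (26)
4-1G to 4-6G      4-4G (46)   4-6G (42)   4-3G (8)     4-2G (28)   4-1G (55)   4-5G (30)
5-1E to 5-6E      5-4E (5)    5-3E (20)   5-1E (13)    5-6E (54)   5-2E (39)   5-5E (50)
6-1F to 6-6F      6-1F (52)   6-3F (3)    6-4F (37)    6-5F (29)   6-2F (51)   6-6F (45)
7-1M to 7-6M      7-6M (16)   7-4M (11)   7-1M (23)    7-2M (6)    7-3M (15)   7-5M (14)
8-1O to 8-6O      8-6O (49)   8-3O (60)   8-1O (33)    8-5O (41)   8-2O (44)   8-4O (19)
9-1Q to 9-6Q      9-5Q (31)   9-6Q (59)   9-3Q (24)    9-2Q (4)    9-4Q (9)    9-1Q (2)
10-1T to 10-6T    10-2T (22)  10-5T (18)  10-3T (47)   10-4T (58)  10-1T (48)  10-6T (38)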
Referring to Figure 3, each row shows the similar parts. "1-1R" stands
for Stamping #1 of the first group of 6 stampings, which was run through
weld fixture R. "1-2R" stands for Stamping #2 of the first group of
stampings, which was run through weld fixture R. "10-1T" stands for
Stamping #1 of the tenth group of stampings, which was run through
weld fixture T (footnote 6).
The parts were numbered with the identification shown in Figure 3.
Parts within each row were presented in a random order to both the weld
assembly operation and to the weld test operation, each with a different
random order. The order shown above is for the weld test operation; the
order for the weld assembly operation within each row was a different
random order and is not shown here. Such randomization reduces the
possibility of any bias present in the order of manufacturing and/or
testing.
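The randomization described above can be sketched as follows. This is a hedged illustration, not the procedure actually used in the study: the function name `randomize_study`, the seed, and the choice of Python's `random` module are all assumptions. It draws one random order per group for the weld operation, an independent random assignment of duplicates to appraiser and trial cells, and one overall random sequence for presenting all parts to the tester.

```python
import random
from itertools import product


def randomize_study(n_groups=10, n_appraisers=2, n_trials=3, seed=None):
    """Return (weld_order, assignment, test_order) for a nested GRR study."""
    rng = random.Random(seed)
    per_group = n_appraisers * n_trials
    cells = list(product(range(1, n_appraisers + 1), range(1, n_trials + 1)))
    weld_order, assignment = {}, {}
    for g in range(1, n_groups + 1):
        members = [f"{g}-{k}" for k in range(1, per_group + 1)]
        # Order in which this group's stampings go through its weld fixture.
        weld_order[g] = rng.sample(members, k=per_group)
        # Independent shuffle: which duplicate fills which (appraiser, trial) cell.
        for part, cell in zip(rng.sample(members, k=per_group), cells):
            assignment[part] = cell
    # One overall random sequence for presenting all parts to the test machine.
    sequence = rng.sample(list(assignment), k=len(assignment))
    test_order = {part: i for i, part in enumerate(sequence, start=1)}
    return weld_order, assignment, test_order


weld_order, assignment, test_order = randomize_study(seed=2002)
print(weld_order[1])                                     # weld sequence for group 1
print(sorted(test_order, key=test_order.get)[:5])        # first five parts to be tested
```

Fixing the seed only makes the example repeatable; in practice any documented random draw would serve the same purpose.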
Once all assembly was completed, the parts were presented to the
appraisers for destructive testing and the data were recorded. Parts were
saved, preserving the original part numbers, in case any post-analysis
needed to be done.
RESULTS
The data were put into a Minitab Gage R&R (nested) routine which
generated a nested ANOVA. A nested (vs. crossed) ANOVA is required
for this type of study because all parts are not tested by (crossed with) all
appraisers across all trials; they cannot be, because they are destroyed
after one test. Each appraiser cannot be crossed with each part. Other
charts were also generated by Minitab.
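For readers without Minitab, the nested decomposition can be sketched in plain Python. This is an assumption-laden illustration rather than a reproduction of Minitab's routine: it assumes a balanced layout, the function name `nested_anova` and the data structure are invented for the example, and the toy readings are made up because the raw pulloff data are not listed in this paper. Appraisers are tested against parts within appraisers, and parts within appraisers against repeatability.

```python
from statistics import fmean


def nested_anova(data):
    """data maps appraiser -> list of parts, each part a list of trial readings.

    Assumes a balanced layout: the same number of parts per appraiser and
    the same number of trials per part.
    """
    first = next(iter(data.values()))
    a, p, t = len(data), len(first), len(first[0])
    grand = fmean(x for parts in data.values() for part in parts for x in part)

    ss_appr = ss_part = ss_rep = 0.0
    for parts in data.values():
        appr_mean = fmean(x for part in parts for x in part)
        ss_appr += p * t * (appr_mean - grand) ** 2
        for part in parts:
            cell_mean = fmean(part)
            ss_part += t * (cell_mean - appr_mean) ** 2       # parts within appraiser
            ss_rep += sum((x - cell_mean) ** 2 for x in part)  # trials within part

    df_appr, df_part, df_rep = a - 1, a * (p - 1), a * p * (t - 1)
    ms_appr, ms_part, ms_rep = ss_appr / df_appr, ss_part / df_part, ss_rep / df_rep
    return {
        "appraiser": {"df": df_appr, "ss": ss_appr, "ms": ms_appr, "f": ms_appr / ms_part},
        "part(appraiser)": {"df": df_part, "ss": ss_part, "ms": ms_part, "f": ms_part / ms_rep},
        "repeatability": {"df": df_rep, "ss": ss_rep, "ms": ms_rep},
    }


# Hypothetical toy readings: 2 appraisers, 2 parts each, 2 trials per part.
toy = {"A": [[1, 3], [5, 7]], "B": [[2, 4], [6, 8]]}
result = nested_anova(toy)
for source, row in result.items():
    print(source, row)
```

For the 10-2-3 study in this paper the same degrees-of-freedom formulas give 1, 18 and 40.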
The first thing to look at is the Gage Run Chart.
Footnote 6. For the sake of clarity here, the randomized order of total presentation to the appraisers running the test machine is shown as a subscript.
[Figure: Gage Run Chart of PULLOFF by "Part" for Appr1 and Appr2; vertical axis from 8200 to 10200]
[Figure 5 panels, as recoverable from the original graphic:
1. Components of Variation: Percent (%Contribution and %Study Var) for Gage R&R, Repeat, Reprod and Part-to-Part.
2. R Chart by Appr No (Appr1, Appr2): UCL=928.6, R=360.7, LCL=0.
3. Xbar Chart by Appr No (Appr1, Appr2): UCL=9525, Mean=9156, LCL=8787.
4. By Part2 (Appr No): Parts 1 to 10 for Appr1 and Appr2; vertical axis from 8000 to 10000.
5. By Appr No (Appr1, Appr2); vertical axis from 8000 to 10000.]
Figure 5: GRR Summary Graphics, GRR Study (Nested), Weld Tester, Rod Pulloff
A graphical summary, generated by Minitab, of this study is presented in
Figure 5. Five graphs are shown and are numbered from 1 to 5.
Presentation and analysis of these charts falls in line with a standard
GRR.
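Graph No. 1 shows the components of variation. Four sources are
displayed here: Gage R&R, Repeatability, Reproducibility and Part-to-Part,
each as %Contribution and %Study Var (footnote 7).
Footnote 7. It is possible in Minitab to also display % Tolerance and % Process, but these options were not chosen here.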
Graph No. 2 is a Range (R) Chart showing the range of the readings for
each appraiser on each part. Note that there is an upper control limit
(UCL) and lower control limit (LCL) and that to be acceptable all points
should be within these limits. An out-of-control condition means there is
some sort of inconsistency occurring, which should be investigated before
going further.
The data in this case study are all within the control limits and are
considered acceptable with respect to the Range Chart.
Graph No. 3 is an Xbar Chart showing the average of each
appraiser's 3 readings on each part. Note that there is an upper control
limit (UCL) and lower control limit (LCL) and that to be acceptable,
approximately 50% of the points should be outside these control limits.
The distance between the control limits represents the band of
measurement system variation. If all points were within the control
limits it would mean that no significant distinction could be made between
any of the parts in the study.
In the data for this case study, 11 of 20 points (55%) are out of control,
and this is considered acceptable for the Xbar Chart.
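The control limits on Graphs 2 and 3 can be checked with the usual Shewhart formulas for subgroups of size 3. The sketch below is an illustration under stated assumptions: the constants d2 = 1.6926 and d3 = 0.8884 are the standard tabulated values for n = 3 rather than anything given in this paper, the function name `control_limits` is invented, and the inputs are the average range (360.7) and grand mean (9156) read from Figure 5.

```python
import math

D2_N3 = 1.6926  # d2 for subgroups of size 3 (standard SPC constant, assumed here)
D3_N3 = 0.8884  # d3 for subgroups of size 3 (standard SPC constant, assumed here)


def control_limits(grand_mean, r_bar, n=3, d2=D2_N3, d3=D3_N3):
    """Return (LCL, UCL) pairs for the Xbar chart and the R chart."""
    a2 = 3.0 / (d2 * math.sqrt(n))              # Xbar chart factor A2
    upper_factor = 1.0 + 3.0 * d3 / d2          # R chart factor D4
    lower_factor = max(0.0, 1.0 - 3.0 * d3 / d2)  # R chart factor D3, floored at 0
    return {
        "xbar": (grand_mean - a2 * r_bar, grand_mean + a2 * r_bar),
        "range": (lower_factor * r_bar, upper_factor * r_bar),
    }


limits = control_limits(grand_mean=9156, r_bar=360.7)
print(limits["xbar"])
print(limits["range"])
```

The results land within rounding of the limits printed on the charts (8787 and 9525 for the Xbar chart, 0 and 928.6 for the R chart); small differences come from the rounded inputs and constants.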
Graph No. 4 is a chart of the part-to-part variation across the study. For
this particular study, Graph 4 does not add much value.
Graph No. 5 shows the grand average for each appraiser as well as
points representing the individual readings. The horizontal line between
these points gives a visual reference for the difference in the grand
average: the flatter this line, the less difference there is between these
grand averages.
In the data for this case study, the reference line is quite flat indicating
very little difference in the grand average for each appraiser.
Graphs 4 and 5 may be used as clue generators for further analysis.
Also, other graphics, such as histograms, may be used for more detailed
analysis.
While reviewing the graphics it is a good idea to at the same time review
the ANOVA summary of the GRR data. Again, this summary is the
same as a standard ANOVA table and what would be generated by a
standard GRR study.
Nested ANOVA table:

SOURCE              DF          SS         MS          F          P
Appraiser            1       15636      15636     0.0202    0.88869
Part (Appraiser)    18    13965615     775868    16.7447    0
Repeatability       40     1853410      46335
Total               59    15834661
Gage R&R

SOURCE              VARCOMP   STD DEVIATION   % CONTRIBUTION   STUDY VARIATION   % STUDY VARIATION
                                  (SD)         (OF VARCOMP)       (6 * SD)            (% SV)
Total Gage R&R        46335      215.256           16.00           1291.54             40.01
  Repeatability       46335      215.256           16.00           1291.54             40.01
  Reproducibility         0        0                0                 0                 0
Part-To-Part         243177      493.130           84.00           2958.78             91.65
Total Variation      289513      538.064          100.00           3228.38            100.00
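The Gage R&R figures follow from the ANOVA mean squares reported above. The sketch below shows one way to reproduce them in Python, assuming the usual expected-mean-square relations for a balanced nested design and setting negative variance estimates to zero; the function name `nested_grr` and the dictionary keys are assumptions for illustration, and Minitab's internal steps may differ in detail.

```python
import math


def nested_grr(ms_appr, ms_part, ms_rep, n_parts, n_trials):
    """Variance components for a balanced nested GRR study.

    n_parts is the number of part groups per appraiser; negative variance
    estimates are set to zero.
    """
    repeatability = float(ms_rep)
    reproducibility = max(0.0, (ms_appr - ms_part) / (n_parts * n_trials))
    part_to_part = max(0.0, (ms_part - ms_rep) / n_trials)
    grr = repeatability + reproducibility
    total = grr + part_to_part
    varcomps = {
        "Total Gage R&R": grr,
        "Repeatability": repeatability,
        "Reproducibility": reproducibility,
        "Part-To-Part": part_to_part,
        "Total Variation": total,
    }
    return {
        name: {
            "varcomp": v,
            "sd": math.sqrt(v),
            "pct_contribution": 100.0 * v / total,
            "study_var": 6.0 * math.sqrt(v),          # 6 * SD, as in the table
            "pct_study_var": 100.0 * math.sqrt(v / total),
        }
        for name, v in varcomps.items()
    }


# Mean squares taken from the ANOVA summary in this case study.
table = nested_grr(ms_appr=15636, ms_part=775868, ms_rep=46335, n_parts=10, n_trials=3)
for name, row in table.items():
    print(f"{name:16s} {row['varcomp']:10.0f} {row['sd']:9.3f} "
          f"{row['pct_contribution']:7.2f} {row['study_var']:9.2f} {row['pct_study_var']:7.2f}")
```

Because the appraiser mean square is smaller than the part (appraiser) mean square, the reproducibility estimate is negative and is reported as 0, which is why Total Gage R&R equals Repeatability in this study.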
CONCLUSIONS
Given all the above information, one must make some decisions about the
measurement system: is it acceptable, is it usable, is more study
needed, does an appraiser need better training, is customer approval
required for use, etc.? For acceptability, generally the GRR% is reviewed
for suitability (footnote 8).
In this particular case study example the overall GRR% = 40.01% which
does not make for a clean interpretation. Is this acceptable given that the
traditional upper limit for even a marginally acceptable measurement
system is 30%?
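To put the 40.01% figure next to the 30% limit, it can help to ask how large the part-to-part standard deviation would have to be for the same measurement error to fall at a given GRR%. The sketch below is a back-of-the-envelope illustration only: it assumes GRR% is the ratio of the Gage R&R SD to the total SD and that the measurement SD (215.256) stays fixed; the function name `part_sd_needed` is hypothetical.

```python
import math


def part_sd_needed(grr_sd, target_pct):
    """Part-to-part SD at which grr_sd would equal target_pct of total SD.

    Uses total_sd**2 = grr_sd**2 + part_sd**2 and GRR% = 100 * grr_sd / total_sd.
    """
    ratio = target_pct / 100.0
    return grr_sd * math.sqrt(1.0 / ratio**2 - 1.0)


needed = part_sd_needed(grr_sd=215.256, target_pct=30.0)
print(round(needed, 1))   # compare with the observed part-to-part SD of 493.130
```

Under these assumptions, the process variation represented in the study would need a part-to-part SD of roughly 684 instead of the observed 493 for the same measurement system to read 30%.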
Consider:
1. The overall process to which the GRR% is compared is actually
represented only by the parts (and weld fixtures) chosen for this
study. Only 10 of the 24 weld fixtures were included in this
study. Not all of the process is necessarily represented here. If
the process variation due to all the fixtures is much larger than
that of the 10 selected for this study, then the measurement
system may be acceptable based on MSA guidelines.
2. The data in any non-replicable study such as this will necessarily
include SOME process variation. So some portion of the 40.01
GRR% is actually process variation. It is impossible to separate
all process variation from measurement system variation with
this scheme.
3. The machine used to do the destruct pulloffs is a relatively
sophisticated and expensive piece of equipment. How much
Footnote 8. ndc is also typically reviewed at this time; however, for the sake of brevity it will not be shown here.
Acceptability Issues
Ironically, the better the process Cp (footnote 11), the more difficult it may be to
establish acceptable non-replicable measurement system analysis
results using the methods here. Recall that a major requirement for
this method to work successfully is that one can knowingly produce
similar (homogeneous) parts (to be used within parts and operator
across trials) and dissimilar (heterogeneous) parts (to be used
between parts and operators). If the production process has a very
Footnote 9. The average run length (ARL) at a given quality level is the average number of samples (subgroups) taken before an action signal is given.
Footnote 10. This multiplier value would be chosen depending on the amount of risk one chooses to accept.
Footnote 11. Technically a process with unilateral tolerance (such as demonstrated in the case study here) has no Cp. What is meant here is a process
with a relatively small amount of variation.
Summary
Bias and linearity are not evaluated by this method. As with any
GRR study, only repeatability and reproducibility are considered.
The measurement equipment calibration plan becomes critical to the
overall acceptability of the non-replicable measurement system.
_____________________________________________________________________________________________
Dave Benham works at DaimlerChrysler Corporation in Auburn Hills, Michigan, as a Senior Consultant in Supplier
Development. He has a B.S. in Psychology and M.A. in Education from Michigan State University. Dave is an
ASQ Certified Quality Engineer and Certified Reliability Engineer. He has worked in the automotive industry in the
field of quality for 25 years.
ADDITIONAL READING
Bergeret, F.; Maubert, S.; Sourd, P.; Puel, F.; Improving and Applying Destructive Gauge Capability,
Quality Engineering, Vol. 14, No. 1, September 2001, pp. 59-66.
(http://www.asq.org/info/library/faq/gagerr/articles.html#destructive)
Conklin, Joseph D., Assessing Measurement Error in a Destructive Test, 47th Annual Quality Congress,
May 1993, Boston MA, Vol. 0, No. 0, May 1993, pp. 400-405.
(http://www.asq.org/info/library/faq/gagerr/articles.html#destructive)
Ingram, David J.; Taylor, Wayne A.; Measurement System Analysis, Annual Quality Congress
Proceedings, Philadelphia, PA, Vol. 52, No. 0, May 1998, pp. 931-941.
(http://www.asq.org/info/library/faq/gagerr/articles.html#destructive)
Phillips, Aaron R.; Jeffries, Rella; Schneider, Jan; Frankoski, Stanley P.; Using Repeatability and
Reproducibility Studies to Evaluate a Destructive Test Method, Quality Engineering, Vol. 10, No. 2,
December 1997, pp. 283-290. (http://www.asq.org/info/library/faq/gagerr/articles.html#destructive)
Spiers, Beryl, Analysis of Destructive Measuring Systems, 43rd Annual Quality Congress, May 1989,
pp. 22-27. (http://www.asq.org/info/library/faq/gagerr/articles.html#destructive)
Wheeler, D. J., Evaluating the Measurement Process When the Testing is Destructive, TAPPI, 1990
Polymers Lamination & Coatings Conference Proceedings, pp. 805-807 (www.tappi.org;
plc90905.PDF).