
Interpret the key results for Attribute Agreement Analysis
Complete the following steps to interpret an attribute agreement analysis. Key output
includes kappa statistics, Kendall's statistics, and the attribute agreement graphs.


Step 1: Evaluate the appraiser agreement visually


To determine the consistency of each appraiser's ratings, evaluate the Within Appraisers
graph. Compare the percentage matched (blue circle) with the confidence interval for the
percentage matched (red line) for each appraiser.

To determine the correctness of each appraiser's ratings, evaluate the Appraiser vs Standard
graph. Compare the percentage matched (blue circle) with the confidence interval for the
percentage matched (red line) for each appraiser.

NOTE
Minitab displays the Within Appraisers graph only when you have multiple trials.
This Within Appraisers graph indicates that Amanda has the most consistent ratings and Eric has the least
consistent ratings. The Appraiser vs Standard graph indicates that Amanda has the most correct ratings
and Eric has the least correct ratings.

Step 2: Assess the consistency of responses for each appraiser
To determine the consistency of each appraiser's ratings, evaluate the kappa statistics in the
Within Appraisers table. When the ratings are ordinal, you should also evaluate the Kendall's
coefficients of concordance. Minitab displays the Within Appraisers table when each
appraiser rates an item more than once.

Use kappa statistics to assess the degree of agreement of the nominal or ordinal ratings
made by multiple appraisers when the appraisers evaluate the same samples.

Kappa values range from –1 to +1. The higher the value of kappa, the stronger the
agreement, as follows:

• When Kappa = 1, perfect agreement exists.

• When Kappa = 0, agreement is the same as would be expected by chance.


• When Kappa < 0, agreement is weaker than expected by chance; this rarely occurs.

The AIAG suggests that a kappa value of at least 0.75 indicates good agreement. However,
larger kappa values, such as 0.90, are preferred.

When you have ordinal ratings, such as defect severity ratings on a scale of 1–5, Kendall's
coefficients, which account for ordering, are usually more appropriate statistics to determine
association than kappa alone.

NOTE
Remember that the Within Appraisers table indicates whether the appraisers' ratings are
consistent, but not whether the ratings agree with the reference values. Consistent ratings
aren't necessarily correct ratings.

Within Appraisers
Assessment Agreement

Appraiser # Inspected # Matched Percent 95% CI


Amanda 50 50 100.00 (94.18, 100.00)
Britt 50 48 96.00 (86.29, 99.51)
Eric 50 43 86.00 (73.26, 94.18)
Mike 50 45 90.00 (78.19, 96.67)

# Matched: Appraiser agrees with him/herself across trials.

Fleiss’ Kappa Statistics

Appraiser Response Kappa SE Kappa Z P(vs > 0)


Amanda 1 1.00000 0.141421 7.0711 0.0000
2 1.00000 0.141421 7.0711 0.0000
3 1.00000 0.141421 7.0711 0.0000
4 1.00000 0.141421 7.0711 0.0000
5 1.00000 0.141421 7.0711 0.0000
Overall 1.00000 0.071052 14.0741 0.0000
Britt 1 1.00000 0.141421 7.0711 0.0000
2 0.89605 0.141421 6.3360 0.0000
3 0.86450 0.141421 6.1129 0.0000
4 1.00000 0.141421 7.0711 0.0000
5 1.00000 0.141421 7.0711 0.0000
Overall 0.94965 0.071401 13.3002 0.0000
Eric 1 0.83060 0.141421 5.8733 0.0000
2 0.84000 0.141421 5.9397 0.0000
3 0.70238 0.141421 4.9666 0.0000
4 0.70238 0.141421 4.9666 0.0000
5 1.00000 0.141421 7.0711 0.0000
Overall 0.82354 0.071591 11.5034 0.0000
Mike 1 1.00000 0.141421 7.0711 0.0000
2 0.83060 0.141421 5.8733 0.0000
3 0.81917 0.141421 5.7924 0.0000
4 0.86450 0.141421 6.1129 0.0000
5 0.86450 0.141421 6.1129 0.0000
Overall 0.87472 0.070945 12.3295 0.0000
Kendall’s Coefficient of Concordance

Appraiser Coef Chi-Sq DF P


Amanda 1.00000 98.0000 49 0.0000
Britt 0.99448 97.4587 49 0.0000
Eric 0.98446 96.4769 49 0.0001
Mike 0.98700 96.7256 49 0.0001

Key Results: Kappa, Kendall's coefficient of concordance


Many of the kappa values are 1, which indicates perfect agreement within an appraiser between trials.
Some of Eric's kappa values are close to 0.70. You might want to investigate why Eric's ratings of those
samples were inconsistent. Because the data are ordinal, Minitab provides the Kendall's coefficient of
concordance values. These values are all greater than 0.98, which indicates a very strong association
within the appraiser ratings.
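
If you want to cross-check this kind of within-appraiser calculation outside Minitab, the following Python sketch computes an overall Fleiss' kappa for one appraiser's two trials using statsmodels. The ratings array is hypothetical, and Minitab's per-response kappas and standard errors come from its own formulas, so treat the result as an approximation of the overall value only.

```python
# Sketch: overall within-appraiser agreement via Fleiss' kappa (requires numpy and statsmodels).
# The data are hypothetical; Minitab also reports a kappa for each response category.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

rng = np.random.default_rng(1)
trial1 = rng.integers(1, 6, size=50)          # one appraiser, 50 samples, ratings 1-5
trial2 = trial1.copy()
trial2[:3] = rng.integers(1, 6, size=3)       # a few between-trial disagreements

ratings = np.column_stack([trial1, trial2])   # treat the two trials as two raters
counts, _ = aggregate_raters(ratings)         # sample-by-category count table
print("Within-appraiser Fleiss' kappa:", round(fleiss_kappa(counts, method="fleiss"), 4))
```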

Step 3: Assess the correctness of responses for each appraiser
To determine the correctness of each appraiser's ratings, evaluate the kappa statistics in the
Each Appraiser vs Standard table. When the ratings are ordinal, you should also evaluate the
Kendall's correlation coefficients. Minitab displays the Each Appraiser vs Standard table
when you specify a reference value for each sample.

Use kappa statistics to assess the degree of agreement of the nominal or ordinal ratings
made by multiple appraisers when the appraisers evaluate the same samples.

Kappa values range from –1 to +1. The higher the value of kappa, the stronger the
agreement, as follows:

• When Kappa = 1, perfect agreement exists.


• When Kappa = 0, agreement is the same as would be expected by chance.
• When Kappa < 0, agreement is weaker than expected by chance; this rarely occurs.

The AIAG suggests that a kappa value of at least 0.75 indicates good agreement. However,
larger kappa values, such as 0.90, are preferred.
When you have ordinal ratings, such as defect severity ratings on a scale of 1–5, Kendall's
coefficients, which account for ordering, are usually more appropriate statistics to determine
association than kappa alone.

Each Appraiser vs Standard

Assessment Agreement

Appraiser # Inspected # Matched Percent 95% CI


Amanda 50 47 94.00 (83.45, 98.75)
Britt 50 46 92.00 (80.77, 97.78)
Eric 50 41 82.00 (68.56, 91.42)
Mike 50 45 90.00 (78.19, 96.67)

# Matched: Appraiser’s assessment across trials agrees with the known standard.

Fleiss’ Kappa Statistics

Appraiser Response Kappa SE Kappa Z P(vs > 0)


Amanda 1 1.00000 0.100000 10.0000 0.0000
2 0.83060 0.100000 8.3060 0.0000
3 0.81917 0.100000 8.1917 0.0000
4 1.00000 0.100000 10.0000 0.0000
5 1.00000 0.100000 10.0000 0.0000
Overall 0.92476 0.050257 18.4006 0.0000
Britt 1 1.00000 0.100000 10.0000 0.0000
2 0.83838 0.100000 8.3838 0.0000
3 0.80725 0.100000 8.0725 0.0000
4 1.00000 0.100000 10.0000 0.0000
5 1.00000 0.100000 10.0000 0.0000
Overall 0.92462 0.050396 18.3473 0.0000
Eric 1 0.91159 0.100000 9.1159 0.0000
2 0.81035 0.100000 8.1035 0.0000
3 0.72619 0.100000 7.2619 0.0000
4 0.84919 0.100000 8.4919 0.0000
5 1.00000 0.100000 10.0000 0.0000
Overall 0.86163 0.050500 17.0622 0.0000
Mike 1 1.00000 0.100000 10.0000 0.0000
2 0.91694 0.100000 9.1694 0.0000
3 0.90736 0.100000 9.0736 0.0000
4 0.92913 0.100000 9.2913 0.0000
5 0.93502 0.100000 9.3502 0.0000
Overall 0.93732 0.050211 18.6674 0.0000

Kendall’s Correlation Coefficient

Appraiser Coef SE Coef Z P


Amanda 0.967386 0.0690066 14.0128 0.0000
Britt 0.967835 0.0690066 14.0193 0.0000
Eric 0.951863 0.0690066 13.7879 0.0000
Mike 0.975168 0.0690066 14.1256 0.0000

Key Results: Kappa, Kendall's correlation coefficient


Most of the kappa values are larger than 0.80, which indicates good agreement between each appraiser
and the standard. A few of the kappa values are close to 0.70, which indicates that you may need to
investigate certain samples or certain appraisers further. Because the data are ordinal, Minitab provides
the Kendall's correlation coefficients. These values range from 0.951863 to 0.975168, which indicates a
strong association between the ratings and the standard values.
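
Minitab's Kendall's correlation coefficient pools all trials and uses its own standard-error formula, but you can get a rough cross-check of the ordinal association between one trial's ratings and the standard with an ordinary Kendall rank correlation. A hedged sketch with hypothetical data:

```python
# Sketch: ordinal association between one appraiser's ratings and the known standard,
# using Kendall's tau-b from scipy. This approximates, but does not reproduce, Minitab's
# pooled Kendall's correlation coefficient. The arrays are hypothetical.
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(2)
standard = rng.integers(1, 6, size=50)               # reference ratings, 1-5 scale
appraiser = standard.copy()
appraiser[:4] = np.clip(appraiser[:4] + 1, 1, 5)     # a few off-by-one ratings

tau, p_value = kendalltau(appraiser, standard)
print(f"Kendall tau-b vs standard: {tau:.3f} (p = {p_value:.4f})")
```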

Step 4: Assess the consistency of responses between appraisers
To determine the consistency between the appraisers' ratings, evaluate the kappa statistics
in the Between Appraisers table. When the ratings are ordinal, you should also evaluate the
Kendall's coefficient of concordance.

Use kappa statistics to assess the degree of agreement of the nominal or ordinal ratings
made by multiple appraisers when the appraisers evaluate the same samples.

Kappa values range from –1 to +1. The higher the value of kappa, the stronger the
agreement, as follows:

• When Kappa = 1, perfect agreement exists.


• When Kappa = 0, agreement is the same as would be expected by chance.
• When Kappa < 0, agreement is weaker than expected by chance; this rarely occurs.

The AIAG suggests that a kappa value of at least 0.75 indicates good agreement. However,
larger kappa values, such as 0.90, are preferred.

When you have ordinal ratings, such as defect severity ratings on a scale of 1–5, Kendall's
coefficients, which account for ordering, are usually more appropriate statistics to determine
association than kappa alone.

NOTE
Remember that the Between Appraisers table indicates whether the appraisers' ratings are
consistent, but not whether the ratings agree with the reference values. Consistent ratings
aren't necessarily correct ratings.

Between Appraisers

Assessment Agreement

# Inspected # Matched Percent 95% CI


50 37 74.00 (59.66, 85.37)

# Matched: All appraisers’ assessments agree with each other.

Fleiss’ Kappa Statistics

Response Kappa SE Kappa Z P(vs > 0)


1 0.954392 0.0267261 35.7101 0.0000
2 0.827694 0.0267261 30.9695 0.0000
3 0.772541 0.0267261 28.9058 0.0000
4 0.891127 0.0267261 33.3429 0.0000
5 0.968148 0.0267261 36.2248 0.0000
Overall 0.881705 0.0134362 65.6218 0.0000

Kendall’s Coefficient of Concordance

Coef Chi-Sq DF P
0.976681 382.859 49 0.0000

Key Results: Kappa, Kendall's coefficient of concordance


All the kappa values are larger than 0.77, which indicates minimally acceptable agreement between
appraisers. The appraisers have the most agreement for samples 1 and 5, and the least agreement for
sample 3. Because the data are ordinal, Minitab provides the Kendall's coefficient of concordance
(0.976681), which indicates a very strong association between the appraiser ratings.
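
The between-appraisers statistics treat every appraiser, and every trial, as a rater of the same samples. As with the earlier sketch, the code below is only an approximation of the overall kappa, with hypothetical data:

```python
# Sketch: between-appraisers agreement, treating 4 appraisers x 2 trials as 8 raters
# of the same 50 samples. Hypothetical data; Minitab also reports per-response kappas.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

rng = np.random.default_rng(3)
sample_rating = rng.integers(1, 6, size=50)                 # 1-5 scale
ratings = np.repeat(sample_rating[:, None], 8, axis=1)      # start from full agreement
disagree = rng.random(ratings.shape) < 0.05                 # ~5% of ratings differ
ratings[disagree] = rng.integers(1, 6, size=disagree.sum())

counts, _ = aggregate_raters(ratings)
print("Between-appraisers Fleiss' kappa:", round(fleiss_kappa(counts, method="fleiss"), 4))
```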

Step 5: Assess the correctness of responses for all appraisers
To determine the correctness of all the appraisers' ratings, evaluate the kappa statistics in
the All Appraisers vs Standard table. When the ratings are ordinal, you should also evaluate
the Kendall's correlation coefficient.

Use kappa statistics to assess the degree of agreement of the nominal or ordinal ratings
made by multiple appraisers when the appraisers evaluate the same samples.

Kappa values range from –1 to +1. The higher the value of kappa, the stronger the
agreement, as follows:

• When Kappa = 1, perfect agreement exists.


• When Kappa = 0, agreement is the same as would be expected by chance.
• When Kappa < 0, agreement is weaker than expected by chance; this rarely occurs.

The AIAG suggests that a kappa value of at least 0.75 indicates good agreement. However,
larger kappa values, such as 0.90, are preferred.

When you have ordinal ratings, such as defect severity ratings on a scale of 1–5, Kendall's
coefficients, which account for ordering, are usually more appropriate statistics to determine
association than kappa alone.

All Appraisers vs Standard

Assessment Agreement
# Inspected # Matched Percent 95% CI
50 37 74.00 (59.66, 85.37)

# Matched: All appraisers’ assessments agree with the known standard.

Fleiss’ Kappa Statistics

Response Kappa SE Kappa Z P(vs > 0)


1 0.977897 0.0500000 19.5579 0.0000
2 0.849068 0.0500000 16.9814 0.0000
3 0.814992 0.0500000 16.2998 0.0000
4 0.944580 0.0500000 18.8916 0.0000
5 0.983756 0.0500000 19.6751 0.0000
Overall 0.912082 0.0251705 36.2362 0.0000

Kendall’s Correlation Coefficient

Coef SE Coef Z P
0.965563 0.0345033 27.9817 0.0000

Key Results: Kappa, Kendall's correlation coefficient


These results show that all the appraisers correctly matched the standard ratings on 37 of the 50 samples.
The overall kappa value is 0.912082, which indicates strong agreement with the standard values. Because
the data are ordinal, Minitab provides the Kendall's correlation coefficient (0.965563), which indicates a
strong association between the ratings and the standard values.

***************

The %Contribution table can be convenient because all sources of variability add up nicely to 100%.
The %Study Variation table doesn’t have the advantage of having all sources add up nicely to 100%,
but it has other positive attributes. Because standard deviation is expressed in the same units as the
process data, it can be used to form other metrics, such as Study Variation (6 * standard deviation),
%Tolerance (if you enter specification limits for your process), and %Process (if you enter a
historical standard deviation). Of course, there are guidelines for levels of acceptability from AIAG as
well:

If the Total Gage R&R contribution in the %Study Var column (%Tolerance, %Process) is:

• Less than 10% - the measurement system is acceptable.


• Between 10% and 30% - the measurement system is acceptable depending on the
application, the cost of the measuring device, cost of repair, or other factors.
• Greater than 30% - the measurement system is unacceptable and should be improved.

If you are looking at the %Contribution column, the corresponding standards are:

• Less than 1% - the measurement system is acceptable.


• Between 1% and 9% - the measurement system is acceptable depending on the application,
the cost of the measuring device, cost of repair, or other factors.
• Greater than 9% - the measurement system is unacceptable and should be improved.

We field a lot of questions about %Tolerance as well. %Tolerance is just comparing estimates of
variation (part-to-part, and total gage) to the spread of the tolerance.

When you enter a tolerance, the output from your gage study will be exactly the same as if you
hadn't entered a tolerance, with the exception that your output will now contain a %Tolerance
column. Your results will still be accurate if you don't put in a tolerance range; however, including
the tolerance will provide you more information.

For example, you could have a high percentage in %Study Var for part-to-part, and a high number of
distinct categories. However, when you compare the variation to your tolerance, it might show that
in reference to your spec limits, the variation due to gage is high. The %Tolerance column may be
more important to you than the %Study Var column, since the %Tolerance is more specific to your
product and its spec limits.

Think of it this way: Your total variation comprises part-to-part variation and the gage (Reproducibility and
Repeatability). After adding a tolerance, we get to see what percentage of variation really dominates
within the tolerance bounds specified. If the ratio between the Total Gage R&R and the tolerance is
high (%Tolerance > 30%), that provides insight about the types of parts being selected. It’s telling us
that the measurement tool cannot effectively tell whether a part is good or bad, because too much
measurement system variation is showing up between the specifications.
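
To make the arithmetic behind these percentages concrete, here is a hedged Python sketch that reproduces the %Contribution, %Study Var, and %Tolerance values from the Gage R&R output shown later in this document (total gage R&R variance 0.09143, part-to-part variance 1.08645, tolerance 2.5), and then applies the AIAG guidelines above:

```python
# Sketch: %Contribution, %Study Var, and %Tolerance from variance components.
# The numbers come from the Gage R&R output later in this document.
import math

var_gage, var_part = 0.09143, 1.08645   # Total Gage R&R and Part-to-Part variance components
var_total = var_gage + var_part
tolerance = 2.5                         # upper spec - lower spec
k = 6                                   # study variation multiplier (6 standard deviations)

sd_gage, sd_total = math.sqrt(var_gage), math.sqrt(var_total)
pct_contribution = 100 * var_gage / var_total        # ~7.76
pct_study_var = 100 * sd_gage / sd_total             # ~27.86
pct_tolerance = 100 * (k * sd_gage) / tolerance      # ~72.57
print(f"%Contribution = {pct_contribution:.2f}, %Study Var = {pct_study_var:.2f}, "
      f"%Tolerance = {pct_tolerance:.2f}")

# AIAG guideline for %Study Var and %Tolerance: <10 acceptable, 10-30 conditional, >30 unacceptable.
for name, value in [("%Study Var", pct_study_var), ("%Tolerance", pct_tolerance)]:
    verdict = ("acceptable" if value < 10
               else "acceptable depending on the application" if value <= 30
               else "unacceptable")
    print(f"{name}: {verdict}")
```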

I hope the answers to these common questions help you next time you’re doing Gage R&R in
Minitab!

***************

All statistics and graphs for Crossed Gage R&R Study
DF
The degrees of freedom (DF) for each SS (sums of squares). In general, DF measures how
much information is available to calculate each SS.

SS
The sum of squares (SS) is the sum of squared distances, and is a measure of the variability
that is from different sources. Total SS indicates the amount of variability in the data from
the overall mean. SS Operator indicates the amount of variability between the average
measurement for each operator and the overall mean.

SS Total = SS Part + SS Operator + SS Operator * Part + SS Repeatability

MS
The mean square (MS) is a measure of the variability in the data from each source. MS accounts for
the fact that different sources have different numbers of levels or possible values.
MS = SS/DF for each source of variability

F
The F-statistic is used to determine whether the effects of Operator, Part, or Operator*Part
are statistically significant.

The larger the F statistic, the more likely it is that the factor contributes significantly to the
variability in the response or measurement variable.
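
As a concrete illustration, the sketch below recomputes MS = SS/DF and the F-statistics from the two-way ANOVA table (with interaction) shown later in this topic. In that model, Part and Operator are tested against the Part*Operator mean square, and the interaction is tested against repeatability.

```python
# Sketch: mean squares and F-statistics from the two-way ANOVA table (with interaction)
# shown later in this topic.
ss = {"Part": 88.3619, "Operator": 3.1673, "Part*Operator": 0.3590, "Repeatability": 2.7589}
df = {"Part": 9, "Operator": 2, "Part*Operator": 18, "Repeatability": 60}

ms = {source: ss[source] / df[source] for source in ss}          # MS = SS / DF
f_stats = {
    "Part": ms["Part"] / ms["Part*Operator"],                    # ~492.29
    "Operator": ms["Operator"] / ms["Part*Operator"],            # ~79.41
    "Part*Operator": ms["Part*Operator"] / ms["Repeatability"],  # ~0.43
}
for source, f_value in f_stats.items():
    print(f"{source}: MS = {ms[source]:.5f}, F = {f_value:.3f}")
```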

P
The p-value is the probability of obtaining a test statistic (such as the F-statistic) that is at
least as extreme as the value that is calculated from the sample, if the null hypothesis is true.

Interpretation
Use the p-value in the ANOVA table to determine whether the average measurements are
significantly different. Minitab displays an ANOVA table only if you select the ANOVA option
for Method of Analysis.

A low p-value indicates that the assumption of all parts, operators, or interactions sharing
the same mean is probably not true.

To determine whether the average measurements are significantly different, compare the p-
value to your significance level (denoted as α or alpha) to assess the null hypothesis. The
null hypothesis states that the group means are all equal. Usually, a significance level of 0.05
works well. A significance level of 0.05 indicates a 5% risk of concluding that a difference
exists when it does not.

P-value ≤ α: At least one mean is statistically different

If the p-value is less than or equal to the significance level, you reject the null
hypothesis and conclude that at least one of the means is significantly different from
the others. For example, at least one operator measures differently.

P-value > α: The means are not significantly different


If the p-value is greater than the significance level, you fail to reject the null
hypothesis because you do not have enough evidence to conclude that the
population means are different. For example, you cannot conclude that the
operators measure differently.

However, you also cannot conclude that the means are the same. A difference might
exist, but your test might not have enough power to detect it.
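
For reference, the p-values in the ANOVA table are upper-tail probabilities of the F distribution, which you can verify with scipy; the F-values and degrees of freedom below come from the two-way ANOVA table (with interaction) shown later in this topic.

```python
# Sketch: p-values as upper-tail F probabilities, using values from the ANOVA table
# (with interaction) shown later in this topic.
from scipy.stats import f

# Part*Operator: F = 0.434 with 18 and 60 degrees of freedom -> p ~ 0.974 (> 0.05),
# so the interaction term is removed from the model.
print(f"p-value for Part*Operator: {f.sf(0.434, 18, 60):.3f}")

# Part: F = 492.291 with 9 and 18 degrees of freedom -> p-value effectively 0.
print(f"p-value for Part: {f.sf(492.291, 9, 18):.6f}")
```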

VarComp
VarComp is the estimated variance components for each source in an ANOVA table.

Interpretation
Use the variance components to assess the variation for each source of measurement
error.

In an acceptable measurement system, the largest component of variation is Part-to-Part
variation. If repeatability and reproducibility contribute large amounts of
variation, you need to investigate the source of the problem and take corrective
action.

%Contribution (of VarComp)


%Contribution is the percentage of overall variation from each variance component.
It is calculated as the variance component for each source divided by the total
variation, then multiplied by 100 to express as a percentage.

Interpretation
Use the %Contribution to assess the variation for each source of measurement error.

In an acceptable measurement system, the largest component of variation is Part-to-Part
variation. If repeatability and reproducibility contribute large amounts of
variation, you need to investigate the source of the problem and take corrective
action.
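
As an illustration of where these values come from, the sketch below derives the variance components and %Contribution shown later in this topic from the reduced (no interaction) ANOVA table, assuming the standard expected-mean-squares formulas for a crossed study with 10 parts, 3 operators, and 3 replicates:

```python
# Sketch: variance components from the reduced (no-interaction) ANOVA table later in
# this topic, using the usual expected-mean-squares formulas for a crossed design.
n_parts, n_operators, n_replicates = 10, 3, 3
ms_part, ms_operator, ms_repeat = 9.81799, 1.58363, 0.03997

var_repeatability = ms_repeat                                               # 0.03997
var_reproducibility = (ms_operator - ms_repeat) / (n_parts * n_replicates)  # ~0.05146
var_part_to_part = (ms_part - ms_repeat) / (n_operators * n_replicates)     # ~1.08645
var_gage = var_repeatability + var_reproducibility                          # ~0.09143
var_total = var_gage + var_part_to_part                                     # ~1.17788

for name, vc in [("Total Gage R&R", var_gage), ("Repeatability", var_repeatability),
                 ("Reproducibility", var_reproducibility), ("Part-To-Part", var_part_to_part),
                 ("Total Variation", var_total)]:
    print(f"{name:16s} VarComp = {vc:.5f}  %Contribution = {100 * vc / var_total:.2f}")
```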

StdDev (SD)
StdDev (SD) is the standard deviation for each source of variation. The standard
deviation is equal to the square root of the variance component for that source.
The standard deviation is a convenient measure of variation because it has the same
units as the part measurements and tolerance.

Study Var (6 * SD)


The study variation is calculated as the standard deviation for each source of
variation multiplied by 6 or the multiplier that you specify in Study variation.

Usually, process variation is defined as 6s, where s is the standard deviation as an
estimate of the population standard deviation (denoted by σ or sigma). When data
are normally distributed, approximately 99.73% of the data fall within 6 standard
deviations of the mean. To define a different percentage of data, use another
multiplier of standard deviation. For example, if you want to know where 99% of the
data fall, you would use a multiplier of 5.15, instead of the default multiplier of 6.
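
Both multipliers follow directly from the normal distribution, as this small sketch shows:

```python
# Sketch: where the study-variation multipliers come from. A 6-standard-deviation spread
# (+/- 3) covers about 99.73% of a normal population; covering 99% takes about 5.15.
from scipy.stats import norm

print(f"Coverage of +/- 3 standard deviations: {norm.cdf(3) - norm.cdf(-3):.4%}")  # ~99.73%
print(f"Multiplier for 99% coverage: {2 * norm.ppf(0.995):.2f}")                   # ~5.15
```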

%Study Var (%SV)


The %study variation is calculated as the study variation for each source of variation,
divided by the total variation and multiplied by 100.

Because %Study Var is based on the standard deviation, which is the square root of the
calculated variance component (VarComp) for that source, the %Contribution of VarComp
values sum to 100, but the %Study Var values do not.

Interpretation
Use %Study Var to compare the measurement system variation to the total variation.
If you use the measurement system to evaluate process improvements, such as
reducing part-to-part variation, %Study Var is a better estimate of measurement
precision. If you want to evaluate the capability of the measurement system to
evaluate parts compared to specification, %Tolerance is the appropriate metric.

%Tolerance (SV/Toler)
%Tolerance is calculated as the study variation for each source, divided by the
process tolerance and multiplied by 100.

If you enter the tolerance, Minitab calculates %Tolerance, which compares
measurement system variation to the specifications.
Interpretation
Use %Tolerance to evaluate parts relative to specifications. If you use the
measurement system for process improvement, such as reducing part-to-part
variation, %StudyVar is the appropriate metric.

%Process (SV/Proc)
If you enter a historical standard deviation but use the parts in the study to estimate
the process variation, then Minitab calculates %Process. %Process compares
measurement system variation to the historical process variation. %Process is
calculated as the study variation for each source, divided by the historical process
variation and multiplied by 100. By default, the process variation is equal to 6 times
the historical standard deviation.

If you use a historical standard deviation to estimate process variation, then Minitab
does not show %Process because %Process is identical to %Study Var.

95% CI
95% confidence intervals (95% CI) are the ranges of values that are likely to contain
the true value of each measurement error metric.

Minitab provides confidence intervals for the variance components, the
%contribution of the variance components, the standard deviation, the study
variation, the %study variation, the %tolerance, and the number of distinct
categories.

Interpretation
Because samples of data are random, two gage studies are unlikely to yield identical
confidence intervals. But, if you repeat your studies many times, a certain percentage
of the resulting confidence intervals contain the unknown true measurement error.
The percentage of these confidence intervals that contain the parameter is the
confidence level of the interval.

For example, with a 95% confidence level, you can be 95% confident that the
confidence interval contains the true value. The confidence interval helps you assess
the practical significance of your results. Use your specialized knowledge to
determine whether the confidence interval includes values that have practical
significance for your situation. If the interval is too wide to be useful, consider
increasing your sample size.

Suppose that the VarComp for Repeatability is 0.044727 and the corresponding 95%
CI is (0.035, 0.060). The estimate of variation for repeatability is calculated from the
data to be 0.044727. You can be 95% confident that the interval of 0.035 to 0.060
contains the true variation for repeatability.

Number of distinct categories


The number of distinct categories is a metric that is used in gage R&R studies to
identify a measurement system's ability to detect a difference in the measured
characteristic. The number of distinct categories represents the number of non-
overlapping confidence intervals that span the range of product variation, as defined
by the samples that you chose. The number of distinct categories also represents the
number of groups within your process data that your measurement system can
discern.

Interpretation
The Measurement Systems Analysis Manual [1], published by the Automotive Industry
Action Group (AIAG), states that 5 or more categories indicates an acceptable
measurement system. If the number of distinct categories is less than 5, the
measurement system might not have enough resolution.

Usually, when the number of distinct categories is less than 2, the measurement
system is of no value for controlling the process, because it cannot distinguish
between parts. When the number of distinct categories is 2, you can split the parts
into only two groups, such as high and low. When the number of distinct categories
is 3, you can split the parts into 3 groups, such as low, middle, and high.

For more information, go to Using the number of distinct categories.
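
Minitab derives this metric from the part-to-part and total gage R&R standard deviations. A common formulation, consistent with the output later in this topic, is sqrt(2) times the ratio of the two, truncated to an integer; the sketch below is a cross-check under that assumption rather than Minitab's exact calculation.

```python
# Sketch: number of distinct categories from the study's standard deviations
# (part-to-part SD 1.04233, total gage R&R SD 0.30237, from the output later in this topic).
import math

sd_part, sd_gage = 1.04233, 0.30237
ndc = math.sqrt(2) * sd_part / sd_gage
print(f"Raw value: {ndc:.2f} -> Number of Distinct Categories = {math.trunc(ndc)}")  # ~4.88 -> 4
```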

Probabilities of misclassification
When you specify at least one specification limit, Minitab can calculate the
probabilities of misclassifying product. Because of the gage variation, the measured
value of the part does not always equal the true value of the part. The discrepancy
between the measured value and the actual value creates the potential for
misclassifying the part.

Minitab calculates both the joint probabilities and the conditional probabilities of
misclassification.

Joint probability

Use the joint probability when you don't have prior knowledge about the
acceptability of the parts. For example, you are sampling from the line and don't
know whether each particular part is good or bad. There are two misclassifications
that you can make:

• The probability that the part is bad, and you accept it.
• The probability that the part is good, and you reject it.
Conditional probability

Use the conditional probability when you do have prior knowledge about the
acceptability of the parts. For example, you are sampling from a pile of rework or
from a pile of product that will soon be shipped as good. There are two
misclassifications that you can make:

• The probability that you accept a part that was sampled from a pile of bad
product that needs to be reworked (also called false accept).
• The probability that you reject a part that was sampled from a pile of good
product that is about to be shipped (also called false reject).

Interpretation
Three operators measure ten parts, three times per part. The following graph
shows the spread of the measurements compared to the specification limits. In
general, the probabilities of misclassification are higher with a process that has
more variation and produces more parts close to the specification limits.
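
Minitab computes these probabilities analytically from the fitted distributions. The Monte Carlo sketch below illustrates the same idea with a hypothetical process mean and specification limits: simulate true part values, add gage error, and compare the true and measured classifications.

```python
# Sketch: joint and conditional misclassification probabilities by simulation.
# The gage and part standard deviations come from the output later in this topic;
# the process mean and specification limits are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
sd_part, sd_gage = 1.04233, 0.30237
mean, lsl, usl = 5.0, 3.0, 7.0                       # hypothetical mean and spec limits

true = rng.normal(mean, sd_part, n)                  # true part values
measured = true + rng.normal(0.0, sd_gage, n)        # measured = true + gage error
good = (lsl <= true) & (true <= usl)
accepted = (lsl <= measured) & (measured <= usl)

print("Joint P(part is bad and accepted):", np.mean(~good & accepted))
print("Joint P(part is good and rejected):", np.mean(good & ~accepted))
print("P(accepted | part is bad):", np.mean(accepted[~good]))
print("P(rejected | part is good):", np.mean(~accepted[good]))
```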

Gage R&R Study - ANOVA Method

Two-Way ANOVA Table With Interaction

Source DF SS MS F P

Part 9 88.3619 9.81799 492.291 0.000

Operator 2 3.1673 1.58363 79.406 0.000

Part * Operator 18 0.3590 0.01994 0.434 0.974

Repeatability 60 2.7589 0.04598

Total 89 94.6471

α to remove interaction term = 0.05

Two-Way ANOVA Table Without Interaction

Source DF SS MS F P

Part 9 88.3619 9.81799 245.614 0.000


Operator 2 3.1673 1.58363 39.617 0.000

Repeatability 78 3.1179 0.03997

Total 89 94.6471

Gage R&R

Variance Components

%Contribution

Source VarComp (of VarComp)

Total Gage R&R 0.09143 7.76

Repeatability 0.03997 3.39

Reproducibility 0.05146 4.37

Operator 0.05146 4.37

Part-To-Part 1.08645 92.24

Total Variation 1.17788 100.00

Process tolerance = 2.5

Gage Evaluation

Study Var %Study Var %Tolerance

Source StdDev (SD) (6 × SD) (%SV) (SV/Toler)


Total Gage R&R 0.30237 1.81423 27.86 72.57

Repeatability 0.19993 1.19960 18.42 47.98

Reproducibility 0.22684 1.36103 20.90 54.44

Operator 0.22684 1.36103 20.90 54.44

Part-To-Part 1.04233 6.25396 96.04 250.16

Total Variation 1.08530 6.51180 100.00 260.47

Number of Distinct Categories = 4

Probabilities of Misclassification

Joint Probability

Description Probability

A randomly selected part is bad but accepted 0.037

A randomly selected part is good but rejected 0.055

Conditional Probability

Description Probability

A part from a group of bad products is accepted 0.151

A part from a group of good products is rejected 0.073


Gage R&R for Measurement

The joint probability that a part is bad and you accept it is 0.037. The joint probability that a part is good
and you reject it is 0.055.

The conditional probability of a false accept, that you accept a part during reinspection when it is really
out-of-specification, is 0.151. The conditional probability of a false reject, that you reject a part during
reinspection when it is really in-specification, is 0.073.

Components of variation graph


The components of variation chart is a graphical summary of the results of a gage R&R
study.

The sources of variation that are represented in the graph are:

• Total Gage R&R: The variability from the measurement system that includes multiple
operators using the same gage.
• Repeatability: The variability in measurements when the same operator measures the same
part multiple times.
• Reproducibility: The variability in measurements when different operators measure the same
part.
• Part-to-Part: The variability in measurements due to different parts.

Interpretation

Separate colored bars represent:

%Contribution

%Contribution is the percentage of overall variation from each variance component.
It is calculated as the variance component for each source divided by the total
variation, then multiplied by 100.

%Study Variation
%Study Variation is the percentage of study variation from each source. It is
calculated as the study variation for each source divided by the total study variation,
then multiplied by 100.
%Tolerance
%Tolerance compares measurement system variation to specifications. It is calculated
as the study variation for each source divided by the process tolerance, then
multiplied by 100.

Minitab calculates this value when you specify a process tolerance range or
specification limit.

%Process
%Process compares measurement system variation to the historical process variation. It is
calculated as the study variation for each source divided by the historical process
variation, then multiplied by 100.

Minitab calculates this value when you specify a historical standard deviation and
select Use parts in the study to estimate process variation.

In an acceptable measurement system, the largest component of variation is
part-to-part variation.

R chart
The R chart is a control chart of ranges that displays operator consistency.
The R chart contains the following elements.

Plotted points

For each operator, the difference between the largest and smallest measurements of
each part. The R chart plots the points by operator so you can see how consistent
each operator is.

Center line (Rbar)


The grand average for the process (that is, average of all the sample ranges).

Control limits (LCL and UCL)


The amount of variation that you can expect for the sample ranges. To calculate the
control limits, Minitab uses the variation within samples.

NOTE
If each operator measures each part 9 times or more, Minitab displays
an S chart instead of an R chart.

Interpretation
A small average range indicates that the measurement system has low
variation. A point that is higher than the upper control limit (UCL)
indicates that the operator does not measure parts consistently. The
calculation of the UCL includes the number of measurements per part
by each operator, and part-to-part variation. If the operators measure
parts consistently, then the range between the highest and lowest
measurements is small, relative to the study variation, and the points
should be in control.
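
For reference, one common way to compute range-chart limits is with the standard control chart constants for the subgroup size (here, 3 measurements per part per operator). Minitab's exact calculation may differ; the ranges below are hypothetical.

```python
# Sketch: R chart center line and control limits using standard range-chart constants
# for subgroups of size 3 (D3 = 0, D4 = 2.574). The ranges are hypothetical.
import numpy as np

ranges = np.array([0.05, 0.12, 0.08, 0.10, 0.06, 0.09, 0.11, 0.07, 0.04, 0.10])
d3, d4 = 0.0, 2.574

r_bar = ranges.mean()
print(f"Rbar = {r_bar:.4f}, LCL = {d3 * r_bar:.4f}, UCL = {d4 * r_bar:.4f}")
```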
Xbar chart
The Xbar chart compares the part-to-part variation to the repeatability component.

The Xbar chart contains the following elements.

Plotted points

The average measurement of each part, plotted by each operator.

Center line (Xbar)


The overall average for all part measurements by all operators.

Control limits (LCL and UCL)


The control limits are based on the repeatability estimate and the number of measurements
in each average.

Interpretation
The parts that are chosen for a Gage R&R study should represent the entire range of
possible parts. Thus, this graph should indicate more variation between part averages than
what is expected from repeatability variation alone.
Ideally, the graph has narrow control limits with many out-of-control points that indicate a
measurement system with low variation.

By Part graph
This graph shows the differences between factor levels. Gage R&R studies usually arrange
measurements by part and by operator. However, with an expanded gage R&R study, you
can graph other factors.

In the graph, dots represent the measurements, and circle-cross symbols represent the
means. The connect line connects the average measurements for each factor level.

NOTE
If there are more than 9 observations per level, Minitab displays a boxplot instead of an
individual value plot.

Interpretation
Multiple measurements for each individual part that vary as minimally as possible (the dots
for one part are close together) indicate that the measurement system has low variation.
Also, the average measurements of the parts should vary enough to show that the parts are
different and represent the entire range of the process.

By Operator graph
The By Operator chart displays all the measurements that were taken in the study, arranged
by operator. This graph shows the differences between factor levels. Gage R&R studies
usually arrange measurements by part and by operator. However, with an expanded gage
R&R study, you can graph other factors.

NOTE
If there are fewer than 10 observations per operator, Minitab displays an individual value plot
instead of a boxplot.

Interpretation
A straight horizontal line across operators indicates that the mean measurements for each
operator are similar. Ideally, the measurements for each operator vary an equal amount.
Operator*Part Interaction graph
The Operator*Part Interaction graph displays the average measurements by each operator
for each part. Each line connects the averages for a single operator (or for a term that you
specify).

Interaction plots display the interaction between two factors. An interaction occurs when the
effect of one factor is dependent on a second factor. This plot is the graphical analog of the
F-test for an interaction term in the ANOVA table.

Interpretation
Lines that are coincident indicate that the operators measure similarly. Lines that are not
parallel or that cross indicate that an operator's ability to measure a part consistently
depends on which part is being measured. A line that is consistently higher or lower than
the others indicates that an operator adds bias to the measurement by consistently
measuring high or low.
[1] Automotive Industry Action Group (AIAG) (2010). Measurement Systems Analysis Reference
Manual, 4th edition. Chrysler, Ford, General Motors Supplier Quality Requirements Task
Force.
