Statistical Intervals:
Confidence, Prediction, Enclosure
José G. Ramírez, PhD, W.L. Gore and Associates Inc.
Table of Contents
Abstract
1. Coaxial Cable Manufacturing Example
2. Confidence Intervals
2.1. The Meaning of “Degree of Confidence”
2.2. Examining Confidence Intervals
3. Prediction Intervals
4. Tolerance Intervals
5. How Much Can We Trust Our Claims?
6. Summary
References
Abstract
Statistical intervals can be confusing, even in the minds of those who use
them often. This paper uses an example to describe the differences between
confidence intervals, prediction intervals and tolerance intervals.
1. Coaxial Cable Manufacturing Example
Let’s say you are in charge of a coaxial cable manufacturing operation, and that
one type of coaxial cable you make has a target resistance of 50 Ohms with a
standard deviation of 2 Ohms. You recently took a random and representative
sample of 40 cables from your production process over the course of a month to
characterize the resistance measurements of the coaxial cable population. Using
JMP® (Analyze > Distribution) you generate a histogram (Figure 1) along with
basic statistics.
The average resistance is 49.86 Ohms and the standard deviation is 1.96 Ohms.
Based on these estimates of the mean and the standard deviation, it seems that
you are meeting the target values of 50 Ohms and 2 Ohms, respectively. But
what else can we say about the resistance data?
2. Confidence Intervals
$$\bar{X} \pm t_{1-\alpha/2,\,n-1}\; s\,\sqrt{\frac{1}{n}} \tag{1}$$
As you can see, the confidence interval adds a margin of error to X̄, the estimate
of the center of the population, which is a function of a given t distribution quantile
with degrees of freedom equal to the number of observations minus one (n−1), the
estimate of the standard deviation s, and the sample size n. Similarly, a confidence
interval for the standard deviation gives lower and upper bounds for the variation of
the standard deviation estimate. Yes, even the estimate of noise, s, has noise in it!
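As a rough illustration of Equation (1), the sketch below recomputes both intervals from the summary statistics quoted earlier (n = 40, mean 49.86 Ohms, s = 1.96 Ohms). Several pieces are assumptions rather than anything taken from the paper or from JMP: the t quantile is hard-coded from standard tables, the interval for the standard deviation uses the usual chi-square form for normal data, and the chi-square quantile comes from the Wilson-Hilferty approximation. The helper name chi2_quantile is made up for this example.

```python
import math
from statistics import NormalDist

n, xbar, s = 40, 49.86, 1.96   # sample size, mean and std dev quoted in the text
alpha = 0.05                    # 95 percent confidence

# 0.975 quantile of Student's t with n - 1 = 39 degrees of freedom,
# rounded from standard tables (hard-coded to avoid extra dependencies).
T_975_DF39 = 2.0227


def chi2_quantile(p, df):
    """Wilson-Hilferty approximation to the chi-square quantile (assumption)."""
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * math.sqrt(a)) ** 3


# Equation (1): mean +/- t * s * sqrt(1/n)
margin = T_975_DF39 * s * math.sqrt(1.0 / n)
ci_mean = (xbar - margin, xbar + margin)

# Usual chi-square interval for sigma: s * sqrt((n - 1) / chi-square quantile)
df = n - 1
ci_sd = (
    s * math.sqrt(df / chi2_quantile(1.0 - alpha / 2.0, df)),
    s * math.sqrt(df / chi2_quantile(alpha / 2.0, df)),
)

print(f"mean:    {ci_mean[0]:.2f} to {ci_mean[1]:.2f} Ohms")
print(f"std dev: {ci_sd[0]:.2f} to {ci_sd[1]:.2f} Ohms")
```

Because the inputs are rounded summary statistics and the chi-square quantile is approximate, the printed bounds may differ slightly in the last digit from what JMP reports.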
To generate a confidence interval in JMP, click on the red arrow next to the
Resistance title above the histogram (see Figure 1) and select Confidence Interval.
The default is 95 percent confidence intervals for both the mean and standard
deviation, as shown in Figure 2. The 1-Alpha = 0.95 indicates the 95 percent degree
of confidence of the intervals.
Figure 2. 95 Percent Confidence Intervals for the Mean and Standard Deviation.
JMP provides a nice simulator for the confidence interval for the mean that helps us
visualize and understand the interpretation of degree of confidence, as described
above. The simulation, modified to mimic the resistance situation, generates 100
samples of size 40 from a normal distribution with mean 50 Ohms and standard
deviation 2 Ohms. For each sample, a 95 percent confidence interval for the mean
is calculated, using Equation (1), and the “yield” of this process (how many intervals
out of a hundred contain the true mean of 50 Ohms) is calculated. Figure 3 shows
one of these simulations for which 96 percent of the intervals include the true
population mean of 50, in agreement with the 95 percent statement. Clicking on the
New Sample button generates a new set of intervals. By doing this repeatedly, you
will see that sometimes 94 percent of the intervals contain the true mean, other times
97 percent, and so on, but in the long run the average yield turns out to be 95 percent.
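The same experiment can be imitated outside JMP. The sketch below is not the JMP simulator; it is a minimal stand-in that draws samples of size 40 from a normal distribution with mean 50 Ohms and standard deviation 2 Ohms, builds the Equation (1) interval for each, and counts how many contain 50. The function name coverage, the seeds, and the hard-coded t quantile (rounded from standard tables) are choices made for this example.

```python
import math
import random
import statistics

T_975_DF39 = 2.0227   # 0.975 t quantile, 39 degrees of freedom (table value)


def coverage(n_samples, n=40, mu=50.0, sigma=2.0, seed=1):
    """Count how many 95 percent intervals for the mean contain mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.mean(sample)
        s = statistics.stdev(sample)
        margin = T_975_DF39 * s / math.sqrt(n)   # Equation (1)
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits


# One "New Sample" click: 100 intervals.
print(coverage(100, seed=1), "of 100 intervals contain 50 Ohms")

# Many intervals: the long-run yield approaches the stated confidence.
total = 5000
print(f"long-run yield: {coverage(total, seed=2) / total:.3f}")
```

Changing the seed plays the role of the New Sample button: the count out of 100 moves around, while the long-run proportion stays close to 0.95.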
Figure 3. Confidence Interval for the Mean Simulation.
3. Prediction Intervals
What if you want to make a claim about the resistance of a future cable, or the
average resistance of a group of cables that you are going to manufacture in the
future? The confidence intervals for the mean and standard deviation (Figure 2) that
you calculated refer to the population of cables manufactured during the month
in which the 40 cables were sampled, not to an individual observation or group
of observations in the future. A prediction interval for a single future observation
resembles a confidence interval for the mean, but it is wider because it takes into
account the prediction noise by adding a 1 to the expression inside the square root:
$$\bar{X} \pm t_{1-\alpha/2,\,n-1}\; s\,\sqrt{1+\frac{1}{n}} \tag{2}$$
Clicking again on the red arrow next to Resistance and selecting Prediction Interval
allows you to generate a 95 percent prediction interval for one future observation
(the default), as shown in Figure 4.
Figure 4. 95 Percent Prediction Interval for One Future Observation.
The 95 percent lower prediction bound is 45.84 Ohms, and the 95 percent upper
prediction bound is 53.88 Ohms. The claim you can make is that, with 95 percent
confidence, you expect a future coax cable to have a resistance between 45.84
Ohms and 53.88 Ohms. Note that since the prediction is for just 1 observation the
“Individual” and “Mean” entries agree.
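The bounds quoted above can be checked by hand from Equation (2). The sketch below plugs in the rounded summary statistics from the text and a t quantile hard-coded from standard tables (an assumption of this example), so the result agrees with the quoted 45.84 and 53.88 Ohms only up to rounding.

```python
import math

n, xbar, s = 40, 49.86, 1.96   # summary statistics quoted in the text
T_975_DF39 = 2.0227             # 0.975 t quantile, 39 degrees of freedom (table value)

# Equation (2): the extra 1 under the square root is the prediction noise.
half_width = T_975_DF39 * s * math.sqrt(1.0 + 1.0 / n)
lower, upper = xbar - half_width, xbar + half_width

# Equation (1) for comparison: the margin for the mean alone.
ci_half_width = T_975_DF39 * s * math.sqrt(1.0 / n)

print(f"prediction interval: {lower:.2f} to {upper:.2f} Ohms")
print(f"prediction half-width {half_width:.2f} vs confidence half-width {ci_half_width:.2f}")
```

The comparison in the last line shows why the prediction interval is so much wider: most of its width comes from the variability of a single cable, not from the uncertainty in the mean.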
Let’s say you get an order for a batch of 10 cables. What can we tell the customer to
expect? In other words,
You expect the resistance values of ALL 10 future cables to be between 43.95
Ohms and 55.77 Ohms. This is a simultaneous interval in the sense that the bounds
are derived so they contain the resistance values of ALL 10 future cables with the
specified confidence, 95% in this case. Because of this, the interval is wider than
the prediction interval for 1 future observation. If you are interested in a claim not
about the 10 individual cables but the performance of the batch, then you expect
the average resistance of the batch of 10 future cables to fall within 48.46 Ohms and
51.26 Ohms, and the standard deviation to be within 1.05 Ohms and 3.08 Ohms.
Note that both the prediction intervals for the mean and the standard deviation of
10 future observations contain the targets of 50 Ohms and 2 Ohms.
What about a batch of 100 or 1000 cables? What about claims about the capability
of your process? Although we can construct simultaneous prediction intervals for
100, 200,…, 1000 observations, as the number of future observations increases so
does the width of the simultaneous prediction interval. Fortunately, there is another
type of interval that helps us make claims about large batches or the capability of
our process.
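The batch claims can be approximated with a short calculation. The sketch below rests on two assumptions that the paper does not state: the simultaneous interval is built with a conservative Bonferroni-style adjustment (replacing α/2 by α/(2m) in the t quantile of Equation (2)), and the interval for the average of m future cables uses the square root of 1/m + 1/n. The helpers t_cdf, t_quantile, simultaneous_half_width and mean_half_width are hypothetical names, and the t quantile is obtained numerically so the example needs only the standard library.

```python
import math


def t_cdf(x, df, steps=2000):
    """Student's t CDF for x >= 0, by Simpson's rule on the density."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    c /= math.sqrt(df * math.pi)

    def pdf(u):
        return c * (1.0 + u * u / df) ** (-(df + 1) / 2)

    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3.0


def t_quantile(p, df):
    """Invert t_cdf by bisection; valid for 0.5 <= p < 1."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


n, xbar, s, alpha = 40, 49.86, 1.96, 0.05
df = n - 1


def simultaneous_half_width(m):
    """Bonferroni-style bound meant to hold for ALL m future cables."""
    t = t_quantile(1.0 - alpha / (2.0 * m), df)
    return t * s * math.sqrt(1.0 + 1.0 / n)


def mean_half_width(m):
    """Bound for the average resistance of m future cables."""
    t = t_quantile(1.0 - alpha / 2.0, df)
    return t * s * math.sqrt(1.0 / m + 1.0 / n)


for m in (1, 10, 100, 1000):
    w = simultaneous_half_width(m)
    print(f"m = {m:4d}: all cables within {xbar - w:.2f} to {xbar + w:.2f} Ohms")

w10 = mean_half_width(10)
print(f"average of 10 cables: {xbar - w10:.2f} to {xbar + w10:.2f} Ohms")
```

Under these assumptions the m = 10 results land close to the bounds quoted in the text, and the printed widths grow steadily with m, which is the behavior that motivates switching to a tolerance interval for large batches.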
4. Tolerance Intervals
$$\bar{X} \pm g_{(1-\alpha/2,\;p,\;n)}\; s \tag{3}$$
A 95 percent ±3sigma-equivalent tolerance interval is then given by the equation
X̄ ± g(0.975, 0.9973, n) × s. Note that the 0.975 refers to the confidence, while the
0.9973 (3sigma equivalent) refers to the proportion of the population covered by the
tolerance bounds. These two values make the tolerance interval a little confusing.
Clicking on the red arrow next to the Resistance histogram and selecting Tolerance
Interval brings up the Tolerance Intervals window to generate a 95 percent (Specify
Confidence (1-Alpha)) tolerance interval to enclose 99.73 percent (Specify
Proportion to cover) of the resistance population. Figure 6 shows a 95 percent
tolerance interval that covers 99.73 percent of the resistance data population.
Figure 6. 95 Percent Tolerance Interval to Cover 99.73 Percent of the Resistance Population.
You expect 99.73 percent of your resistance measurements to be within 42.52 Ohms
and 57.20 Ohms. The “traditional” mean ± 3sigma interval (49.86 ± 3 × 1.96), [43.98;
55.74], is narrower because it does not take into account the sample size (40) and the
estimation noise. Based on these tolerance bounds you can develop specification
limits for Resistance of [42 Ohms; 58 Ohms], which will contain more than 99.73 percent of
the resistance population measurements.
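To see where the extra width comes from, the tolerance factor can be approximated by hand. The sketch below uses Howe's approximation for a two-sided normal tolerance factor together with the Wilson-Hilferty approximation for the chi-square quantile. Both are assumptions of this example and are not necessarily what JMP computes, so agreement with the quoted bounds is only to about the second decimal. The names chi2_quantile and howe_k are made up for illustration.

```python
import math
from statistics import NormalDist


def chi2_quantile(p, df):
    """Wilson-Hilferty approximation to the chi-square quantile (assumption)."""
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * math.sqrt(a)) ** 3


def howe_k(n, coverage, confidence):
    """Approximate two-sided normal tolerance factor (Howe's method)."""
    df = n - 1
    z = NormalDist().inv_cdf((1.0 + coverage) / 2.0)
    chi2_low = chi2_quantile(1.0 - confidence, df)
    return z * math.sqrt(df * (1.0 + 1.0 / n) / chi2_low)


n, xbar, s = 40, 49.86, 1.96   # summary statistics quoted in the text

k = howe_k(n, coverage=0.9973, confidence=0.95)
lower, upper = xbar - k * s, xbar + k * s
naive = (xbar - 3 * s, xbar + 3 * s)   # the "traditional" mean +/- 3sigma

print(f"tolerance factor: {k:.3f} (compared with 3 for the traditional interval)")
print(f"tolerance interval: {lower:.2f} to {upper:.2f} Ohms")
print(f"mean +/- 3sigma:    {naive[0]:.2f} to {naive[1]:.2f} Ohms")
```

The factor exceeds 3 because it pays for two sources of estimation noise at once: the uncertainty in the mean (the 1 + 1/n term) and the uncertainty in s (the chi-square term).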
From the summary in Table 1, we can see that, with 95 percent confidence, the
corresponding tolerance interval to enclose 99.73 percent of the population is narrower
than the simultaneous prediction intervals for 100 and 1000 observations. As
the number of future observations increases, so will the width of the simultaneous
prediction interval. Therefore, for batch sizes greater than 50, it is better to use a
tolerance interval than a simultaneous prediction interval.
5. How Much Can We Trust Our Claims?
Our calculated statistical intervals represent the state of our process at a given time;
i.e., they are snapshots in time of what the process is doing. How do we guarantee
the claims that we are going to make based on the statistical intervals? Statistical
analyses usually depend on the homogeneity assumption that the data comes
from a single universe rather than multiple ones. Another way of thinking about the
homogeneity assumption is by asking the question: how stable is the “process” that
generates our data?
The Lower Control Limit (LCL), 43.48 Ohms, and Upper Control Limit (UCL),
56.24 Ohms, define the natural process variation for the resistance data. So as long
as the process remains stable, we expect future resistance readings to be within
43.48 Ohms and 56.24 Ohms. Sounds like a tolerance interval, doesn’t it? In fact,
the natural process limits can be thought of as a 95 percent tolerance interval but, since
they are calculated using ±3sigma and not Equation (3), the coverage is usually less
than 99.73 percent. For the LCL and UCL displayed in Figure 7 the tolerance interval
coverage is about 98.38 percent.
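One way to make sense of the 98.38 percent figure is to run a tolerance-factor calculation backwards: fix the multiplier at 3 and ask what proportion of the population a 95 percent tolerance interval with that multiplier would cover for n = 40. The sketch below does this with Howe's approximation and a Wilson-Hilferty chi-square quantile. Both are assumptions of this example, and it is only a guess that the quoted figure was obtained in a comparable way. The names chi2_quantile and coverage_for_k are hypothetical.

```python
import math
from statistics import NormalDist


def chi2_quantile(p, df):
    """Wilson-Hilferty approximation to the chi-square quantile (assumption)."""
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * math.sqrt(a)) ** 3


def coverage_for_k(k, n, confidence):
    """Proportion covered by mean +/- k*s at the given confidence (Howe, inverted)."""
    df = n - 1
    chi2_low = chi2_quantile(1.0 - confidence, df)
    z = k / math.sqrt(df * (1.0 + 1.0 / n) / chi2_low)
    return 2.0 * NormalDist().cdf(z) - 1.0


cov = coverage_for_k(k=3.0, n=40, confidence=0.95)
print(f"coverage guaranteed by a multiplier of 3 with n = 40: {cov:.2%}")
```

The multiplier of 3 only delivers 99.73 percent coverage when the mean and standard deviation are known exactly; with 40 observations, the coverage that can be claimed with 95 percent confidence is lower.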
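6. Summary
Statistical intervals help us to quantify the uncertainty surrounding the estimates that
we calculate from our data, such as the mean and standard deviation. The three types
of intervals presented here (confidence, prediction and tolerance) are particularly
relevant for applications found in science and engineering because they allow us to
make very practical claims about our sampled data, as shown in Table 1. Because JMP
makes it easy for us to obtain these intervals, all we have to do is apply them correctly.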
References
Ramírez, José G., and Ramírez, Brenda S. To be published 2008. Analyzing and Interpreting
Continuous Data Using JMP: A Step-by-Step Guide. Cary, NC: SAS Institute Inc.
Hahn, Gerald J., and William Q. Meeker. 1991. Statistical Intervals: A Guide for Practitioners.
Hoboken, NJ: John Wiley and Sons, Inc.
Wheeler, Donald J. 2005. The Six Sigma Practitioner’s Guide to Data Analysis.
Knoxville, TN: SPC Press.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies. Copyright © 2009, SAS Institute Inc. All rights reserved. 103423_530537.0209