              Estimate   Standard Error
   phi1       -0.3198       0.1202
   phi2        0.1797       0.1202

   delta = 51.1286
   Residual standard deviation = 10.9599

   Test randomness of residuals:
     Standardized Runs Statistic Z = 0.4887, p-value = 0.625
Forecasting   Using our AR(2) model, we forecast values six time periods into the future.
   Period   Prediction   Standard Error
     71      60.6405        10.9479
     72      43.0317        11.4941
     73      55.4274        11.9015
     74      48.2987        12.0108
     75      52.8061        12.0585
     76      50.0835        12.0751
The "historical" data and forecasted values (with 90 %
confidence limits) are shown in the graph below.
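For readers who want to reproduce this kind of analysis, a minimal R sketch is given below. It assumes the historical observations are stored in a numeric vector z (a hypothetical name); note that arima() reports the constant as an "intercept" (the series mean) rather than as delta.

   ## Sketch: fit an AR(2) model and forecast six periods ahead.
   ## Assumes the observations are in the numeric vector z.
   fit <- arima(z, order = c(2, 0, 0))   # AR(2) with a mean term
   fit                                   # phi1, phi2, intercept, sigma^2

   ## Forecast six time periods into the future with standard errors.
   fc <- predict(fit, n.ahead = 6)
   cbind(Prediction = fc$pred, Standard.Error = fc$se)

   ## Approximate 90 % confidence limits for the forecasts.
   fc$pred - 1.645 * fc$se
   fc$pred + 1.645 * fc$se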
6.4.4.10. Box-Jenkins Analysis on Seasonal Data
Series G This example illustrates a Box-Jenkins time series analysis for
seasonal data using the series G data set in Box, Jenkins, and
Reinsel, 1994. A plot of the 144 observations is shown below.
Non-constant variance can be removed by performing a
natural log transformation.
Next, we remove trend in the series by taking first differences.
The resulting series is shown below.
Analyzing Autocorrelation Plot for Seasonality
To identify an appropriate model, we plot the ACF of the time series.
If very large autocorrelations are observed at lags spaced n
periods apart, for example at lags 12 and 24, then there is
evidence of periodicity. That effect should be removed, since
the objective of the identification stage is to reduce the
autocorrelation throughout. So if simple differencing is not
enough, try seasonal differencing at a selected period, such as
4, 6, or 12. In our example, the seasonal period is 12.
A plot of Series G after taking the natural log, first
differencing, and seasonal differencing is shown below.
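A minimal R sketch of these transformations is shown below; it assumes the 144 Series G observations are stored in a monthly time series object named seriesG (a hypothetical name).

   ## Sketch: variance-stabilizing and differencing transformations.
   lg    <- log(seriesG)          # natural log stabilizes the variance
   d1    <- diff(lg)              # first difference removes the trend
   d1d12 <- diff(d1, lag = 12)    # seasonal difference at period 12

   plot(lg); plot(d1); plot(d1d12)   # plots corresponding to the figures

   ## ACF plot used to identify an appropriate model.
   acf(d1d12, lag.max = 36)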
The number of seasonal terms is rarely more than one. If you
know the shape of your forecast function, or you wish to
assign a particular shape to the forecast function, you can
select the appropriate number of terms for seasonal AR or
seasonal MA models.
The book by Box and Jenkins, Time Series Analysis: Forecasting and Control (the later edition is Box, Jenkins, and Reinsel, 1994), discusses these forecast functions on pages 326-328. Again, if you have only a faint notion of the shape, but you do know that there was an upward trend before differencing, pick a seasonal MA term and see what comes out in the diagnostics.
An ACF plot of the seasonal and first differenced natural log
of series G is shown below.
The plot has a few spikes, but most autocorrelations are near
zero, indicating that a seasonal MA(1) model is appropriate.
Model Fitting   We fit a seasonal MA(1) model to the data, where θ_1 represents the MA(1) parameter and Θ_1 represents the seasonal MA(1) parameter. The model fitting results are shown below.
                              Seasonal
   Estimate         MA(1)      MA(1)
   ----------------------------------
   Parameter       -0.4018    -0.5569
   Standard Error   0.0896     0.0731

   Residual standard deviation = 0.0367
   Log likelihood = 244.7
   AIC = -483.4
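The results above can be reproduced, at least approximately, with R's arima() function; the sketch below (continuing from the seriesG object assumed earlier) expresses the model as an ARIMA(0,1,1)(0,1,1)12 fit to the logged series.

   ## Sketch: seasonal MA(1) model for the differenced log of Series G.
   fit <- arima(log(seriesG), order = c(0, 1, 1),
                seasonal = list(order = c(0, 1, 1), period = 12))
   fit                 # MA(1) and seasonal MA(1) estimates, log likelihood, AIC
   sqrt(fit$sigma2)    # residual standard deviation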
Test the randomness of the residuals up to 30 lags using the
Box-Ljung test. Recall that the degrees of freedom for the
critical region must be adjusted to account for two estimated
parameters.
   H0:  The residuals are random.
   Ha:  The residuals are not random.

   Test statistic:      Q = 29.4935
   Significance level:  α = 0.05
   Degrees of freedom:  h = 30 - 2 = 28
   Critical value:      χ²(1-α, h) = 41.3371
   Critical region:     Reject H0 if Q > 41.3371
Since the null hypothesis of the Box-Ljung test is not rejected
we conclude that the fitted model is adequate.
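A sketch of the Box-Ljung test in R is shown below; fitdf = 2 adjusts the degrees of freedom for the two estimated parameters.

   ## Sketch: Box-Ljung test of the residuals up to lag 30.
   Box.test(residuals(fit), lag = 30, type = "Ljung-Box", fitdf = 2)

   ## Critical value for the test (chi-square with 28 degrees of freedom).
   qchisq(0.95, df = 30 - 2)    # 41.3371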
Forecasting   Using our seasonal MA(1) model, we forecast values 12 periods into the future and compute 90 % confidence limits.
             Lower                  Upper
   Period    Limit     Forecast     Limit
   ----------------------------------------
    145     424.0234   450.7261   478.4649
    146     396.7861   426.0042   456.7577
    147     442.5731   479.3298   518.4399
    148     451.3902   492.7365   537.1454
    149     463.3034   509.3982   559.3245
    150     527.3754   583.7383   645.2544
    151     601.9371   670.4625   745.7830
    152     595.7602   667.5274   746.9323
    153     495.7137   558.5657   628.5389
    154     439.1900   497.5430   562.8899
    155     377.7598   430.1618   489.1730
    156     417.3149   477.5643   545.7760
All the analyses on this page can be generated using R code.
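A minimal sketch of the forecasting step in R, continuing from the fit above, is shown below; the forecasts are made on the log scale and then exponentiated back to the original scale.

   ## Sketch: forecast 12 periods ahead and back-transform to the
   ## original scale with approximate 90 % limits.
   fc <- predict(fit, n.ahead = 12)
   cbind(Period   = 145:156,
         Lower    = exp(fc$pred - 1.645 * fc$se),
         Forecast = exp(fc$pred),
         Upper    = exp(fc$pred + 1.645 * fc$se))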
6.4.5. Multivariate Time Series Models
If each time series observation is a vector of numbers, you can model them using a multivariate form of the Box-Jenkins model
The multivariate form of the Box-Jenkins univariate models is sometimes called the ARMAV model, for AutoRegressive Moving Average Vector, or simply the vector ARMA process.
The ARMAV model for a stationary multivariate time series, with a zero mean vector, represented by
   x_t' = (x_1t, x_2t, ..., x_nt),   -∞ < t < ∞
is of the form
   x_t = φ_1 x_(t-1) + φ_2 x_(t-2) + ... + φ_p x_(t-p) + a_t - θ_1 a_(t-1) - θ_2 a_(t-2) - ... - θ_q a_(t-q)
where
   x_t and a_t are n x 1 column vectors, with a_t representing multivariate white noise
   φ_1, ..., φ_p and θ_1, ..., θ_q are n x n matrices for autoregressive and moving average parameters
   E[a_t] = 0
   E[a_t a_t'] = Σ_a, where Σ_a is the dispersion or covariance matrix of a_t
As an example, for a bivariate series with n = 2, p = 2, and q
= 1, the ARMAV(2,1) model is:
with
Estimation of parameters and covariance matrix difficult
The estimation of the matrix parameters and covariance matrix is complicated and very difficult without computer software. The estimation of the Moving Average matrices is especially an ordeal. If we opt to ignore the MA component(s), we are left with the ARV model given by:
   x_t = φ_1 x_(t-1) + φ_2 x_(t-2) + ... + φ_p x_(t-p) + a_t
where
   x_t is a vector of observations, x_1t, x_2t, ..., x_nt, at time t
   a_t is a vector of white noise, a_1t, a_2t, ..., a_nt, at time t
   φ_k is an n x n matrix of autoregressive parameters, k = 1, 2, ..., p
   E[a_t] = 0
   E[a_t a_t'] = Σ_a, where Σ_a is the dispersion or covariance matrix
A model with p autoregressive matrix parameters is an ARV(p) model or a vector AR model.
The parameter matrices may be estimated by multivariate least squares, but there are other methods such as maximum likelihood estimation.
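As an illustration of one estimation route, the R sketch below fits a vector AR model by least squares using the base function ar(), which accepts a multivariate time series; the data object xy is a hypothetical two-column matrix.

   ## Sketch: fit a vector AR (ARV) model of order 2 to a bivariate series.
   fit <- ar(xy, order.max = 2, aic = FALSE, method = "ols")
   fit$ar         # estimated AR coefficient matrices phi_1 and phi_2
   fit$var.pred   # estimated covariance (dispersion) matrix of the noise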
Interesting properties of parameter matrices
There are a few interesting properties associated with the φ or AR parameter matrices. Consider the following example for a bivariate series with n = 2, p = 2, and q = 0. The ARMAV(2,0) model is:
   x_t = φ_1 x_(t-1) + φ_2 x_(t-2) + a_t,
where each φ_k is a 2 x 2 matrix with elements φ_k.ij (row i, column j).
Without loss of generality, assume that the X series is input and the Y series is output, and that the mean vector is (0,0).
Therefore, transform the observations by subtracting their respective averages.
Diagonal terms of Phi matrix
The diagonal terms of each φ matrix are the scalar estimates for each series, in this case:
   φ_1.11, φ_2.11 for the input series X,
   φ_1.22, φ_2.22 for the output series Y.
Transfer mechanism
The lower off-diagonal elements represent the influence of the input on the output.
This is called the "transfer" mechanism or transfer-function model, as discussed by Box and Jenkins in Chapter 11. The terms here correspond to the transfer-function terms in their notation.
The upper off-diagonal terms represent the influence of the output on the
input.
Feedback This is called "feedback". The presence of feedback can also be seen as a
high value for a coefficient in the correlation matrix of the residuals. A "true"
transfer model exists when there is no feedback.
This can be seen by writing the matrix form in scalar form:
   x_1t = φ_1.11 x_1,t-1 + φ_1.12 x_2,t-1 + φ_2.11 x_1,t-2 + φ_2.12 x_2,t-2 + a_1t
   x_2t = φ_1.21 x_1,t-1 + φ_1.22 x_2,t-1 + φ_2.21 x_1,t-2 + φ_2.22 x_2,t-2 + a_2t
Delay   Finally, delay or "dead" time can be measured by studying the lower off-diagonal elements again.
If, for example, φ_1.21 is non-significant, the delay is 1 time period.
6.4.5.1. Example of Multivariate Time Series Analysis
Bivariate Gas Furnace Example
The gas furnace data from Box, Jenkins, and Reinsel, 1994, is used to illustrate the analysis of a bivariate time series. Inside the gas furnace, air and methane were combined in order to obtain a mixture of gases containing CO2 (carbon dioxide). The input series is the methane gas feedrate described by
   Methane Gas Input Feed = 0.60 - 0.04 X(t)
and the CO2 concentration was the output series, Y(t). In this experiment 296 successive pairs of observations (X_t, Y_t) were collected from continuous records at 9-second intervals. For the analysis described here, only the first 60 pairs were used. We fit an ARV(2) model as described in Section 6.4.5.
Plots of
input and
output
series
The plots of the input and output series are displayed below.
Model Fitting
The scalar form of the ARV(2) model is the following.
The equation for x_t corresponds to gas rate, while the equation for y_t corresponds to CO2 concentration.
The parameter estimates for the equation associated with gas rate are the following.
               Estimate    Std. Err.   t value   Pr(>|t|)
   a_1t        0.003063    0.035769     0.086     0.932
   φ_1.11      1.683225    0.123128    13.671    < 2e-16
   φ_2.11     -0.860205    0.165886    -5.186    3.44e-06
   φ_1.12     -0.076224    0.096947    -0.786     0.435
   φ_2.12      0.044774    0.082285     0.544     0.589

   Residual standard error: 0.2654 based on 53 degrees of freedom
   Multiple R-squared: 0.9387
   Adjusted R-squared: 0.9341
   F-statistic: 203.1 based on 4 and 53 degrees of freedom
   p-value: < 2.2e-16
The parameter estimates for the equation associated with CO2 concentration are the following.
               Estimate    Std. Err.   t value   Pr(>|t|)
   a_2t       -0.03372     0.01615     -2.088    0.041641
   φ_1.22      1.22630     0.04378     28.013    < 2e-16
   φ_2.22     -0.40927     0.03716    -11.015    2.57e-15
   φ_1.21      0.22898     0.05560      4.118    0.000134
   φ_2.21     -0.80532     0.07491    -10.751    6.29e-15

   Residual standard error: 0.1198 based on 53 degrees of freedom
   Multiple R-squared: 0.9985
   Adjusted R-squared: 0.9984
   F-statistic: 8978 based on 4 and 53 degrees of freedom
   p-value: < 2.2e-16
Box-Ljung tests performed for each series to test the randomness of the first 24 residuals were not significant. The p-values for the tests using CO2 concentration residuals and gas rate residuals were 0.4 and 0.6, respectively.
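One way to reproduce this kind of output in R is to fit each scalar equation by least squares with lm(), as in the hedged sketch below; it assumes the first 60 gas-rate values are in the vector x and the first 60 CO2 concentrations are in the vector y (hypothetical names).

   ## Sketch: ARV(2) fit for the gas furnace example, one lm() per equation.
   n  <- length(x)
   x0 <- x[3:n]; x1 <- x[2:(n - 1)]; x2 <- x[1:(n - 2)]
   y0 <- y[3:n]; y1 <- y[2:(n - 1)]; y2 <- y[1:(n - 2)]

   fit.x <- lm(x0 ~ x1 + x2 + y1 + y2)   # equation for gas rate
   fit.y <- lm(y0 ~ y1 + y2 + x1 + x2)   # equation for CO2 concentration
   summary(fit.x)
   summary(fit.y)

   ## Box-Ljung tests of the first 24 residual autocorrelations.
   Box.test(residuals(fit.x), lag = 24, type = "Ljung-Box")
   Box.test(residuals(fit.y), lag = 24, type = "Ljung-Box")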
Forecasting The forecasting method is an extension of the model and follows the theory
outlined in the previous section. The forecasted values of the next six
observations (61-66) and the associated 90 % confidence limits are shown
below for each series.
                 90% Lower   Concentration   90% Upper
   Observation     Limit        Forecast       Limit
   ----------------------------------------------------
       61           51.0          51.2          51.4
       62           51.0          51.3          51.6
       63           50.6          51.0          51.4
       64           49.8          50.5          51.1
       65           48.7          50.0          51.3
       66           47.6          49.7          51.8

                 90% Lower        Rate        90% Upper
   Observation     Limit        Forecast       Limit
   ----------------------------------------------------
       61           0.795         1.231         1.668
       62           0.439         1.295         2.150
       63           0.032         1.242         2.452
       64          -0.332         1.128         2.588
       65          -0.605         1.005         2.614
       66          -0.776         0.908         2.593
6.5. Tutorials
Tutorial contents
1. What do we mean by "Normal" data?
2. What do we do when data are "Non-normal"?
3. Elements of Matrix Algebra
   1. Numerical Examples
   2. Determinant and Eigenstructure
4. Elements of Multivariate Analysis
   1. Mean Vector and Covariance Matrix
   2. The Multivariate Normal Distribution
   3. Hotelling's T²
      1. Example of Hotelling's T² Test
      2. Example 1 (continued)
      3. Example 2 (multiple groups)
      4. Hotelling's T² Chart
5. Principal Components
   1. Properties of Principal Components
   2. Numerical Example
6.5.1. What do we mean by "Normal" data?
The Normal distribution model
"Normal" data are data that are drawn from (come from) a population that has a normal distribution. This distribution is inarguably the most important and the most frequently used distribution in both the theory and application of statistics. If X is a normal random variable, then the probability distribution of X is

Normal probability distribution
   f(x) = (1 / (σ √(2π))) exp( -(x - μ)² / (2σ²) ),   -∞ < x < ∞
Parameters of normal distribution
The parameters of the normal distribution are the mean μ and the standard deviation σ (or the variance σ²). A special notation is employed to indicate that X is normally distributed with these parameters, namely
   X ~ N(μ, σ) or X ~ N(μ, σ²).
Shape is
symmetric
and unimodal
The shape of the normal distribution is symmetric and
unimodal. It is called the bell-shaped or Gaussian
distribution after its inventor, Gauss (although De Moivre
also deserves credit).
The visual appearance is given below.
Property of probability distributions is that area under curve equals one
A property of a special class of non-negative functions,
called probability distributions, is that the area under the
curve equals unity. One finds the area under any portion of
the curve by integrating the distribution between the specified
limits. The area under the bell-shaped curve of the normal
distribution can be shown to be equal to 1, and therefore the
normal distribution is a probability distribution.
Interpretation of σ
There is a simple interpretation of σ:
   68.27% of the population fall between μ +/- 1σ
   95.45% of the population fall between μ +/- 2σ
   99.73% of the population fall between μ +/- 3σ
The cumulative normal distribution
The cumulative normal distribution is defined as the probability that the normal variate is less than or equal to some value v, or
   P(X ≤ v) = F(v) = ∫ from -∞ to v of f(x) dx
Unfortunately this integral cannot be evaluated in closed form and one has to resort to numerical methods. But even so, tables for all possible values of μ and σ would be required. A change of variables rescues the situation. We let
   z = (x - μ) / σ
Now the evaluation can be made independently of μ and σ; that is,
   P(X ≤ v) = P( z ≤ (v - μ)/σ ) = Φ( (v - μ)/σ )
where Φ(.) is the cumulative distribution function of the standard normal distribution (μ = 0, σ = 1).
Tables for the cumulative standard normal distribution
Tables of the cumulative standard normal distribution are given in every statistics textbook and in the handbook. A rich variety of approximations can be found in the literature on numerical methods.
For example, if μ = 0 and σ = 1, then the area under the curve from μ - 1σ to μ + 1σ is the area from 0 - 1 to 0 + 1, which is 0.6827. Since most standard normal tables give area to the left of the lookup value, they will have for z = 1 an area of .8413 and for z = -1 an area of .1587. By subtraction we obtain the area between -1 and +1 to be .8413 - .1587 = .6826.
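These areas can be computed directly in R with pnorm(), as the short sketch below shows.

   ## Sketch: standard normal areas quoted above.
   pnorm(1)               # 0.8413, area to the left of z =  1
   pnorm(-1)              # 0.1587, area to the left of z = -1
   pnorm(1) - pnorm(-1)   # 0.6827, area between -1 and +1
   pnorm(2) - pnorm(-2)   # 0.9545
   pnorm(3) - pnorm(-3)   # 0.9973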
6.5.2. What to do when data are non-normal
Often it is possible to transform non-normal data into approximately normal data
Non-normality is a way of life, since no characteristic (height,
weight, etc.) will have exactly a normal distribution. One
strategy to make non-normal data resemble normal data is by
using a transformation. There is no dearth of transformations in
statistics; the issue is which one to select for the situation at
hand. Unfortunately, the choice of the "best" transformation is
generally not obvious.
This was recognized in 1964 by G.E.P. Box and D.R. Cox. They
wrote a paper in which a useful family of power transformations
was suggested. These transformations are defined only for
positive data values. This should not pose any problem because
a constant can always be added if the set of observations
contains one or more negative values.
The Box-Cox Transformation
The Box-Cox power transformations are given by
   x(λ) = (x^λ - 1) / λ,   for λ ≠ 0
   x(λ) = ln(x),           for λ = 0
Given the vector of data observations x = x_1, x_2, ..., x_n, one way to select the power λ is to use the λ that maximizes the logarithm of the likelihood function

The logarithm of the likelihood function
   f(x, λ) = -(n/2) ln[ (1/n) Σ_i (x_i(λ) - x̄(λ))² ] + (λ - 1) Σ_i ln(x_i)
where
   x̄(λ) = (1/n) Σ_i x_i(λ)
is the arithmetic mean of the transformed data.
Confidence bound for λ
In addition, a confidence bound (based on the likelihood ratio statistic) can be constructed for λ as follows: A set of λ values that represent an approximate 100(1-α) % confidence bound for λ is formed from those λ that satisfy
   f(x, λ) ≥ f(x, λ̂) - 0.5 χ²(1-α, 1)
where λ̂ denotes the maximum likelihood estimator for λ and χ²(1-α, 1) is the 100(1-α) percentile of the chi-square distribution with 1 degree of freedom.
Example of the
Box-Cox
scheme
To illustrate the procedure, we used the data from Johnson and
Wichern's textbook (Prentice Hall 1988), Example 4.14. The
observations are microwave radiation measurements.
Sample data
.15 .09 .18 .10 .05 .12 .08
.05 .08 .10 .07 .02 .01 .10
.10 .10 .02 .10 .01 .40 .10
.05 .03 .05 .15 .10 .15 .09
.08 .18 .10 .20 .11 .30 .02
.20 .20 .30 .30 .40 .30 .05
Table of log-likelihood values for various values of λ
The values of the log-likelihood function obtained by varying λ from -2.0 to 2.0 are given below.

     λ      LLF         λ      LLF         λ      LLF
   -2.0    7.1146     -0.6   89.0587     0.7   103.0322
   -1.9   14.1877     -0.5   92.7855     0.8   101.3254
   -1.8   21.1356     -0.4   96.0974     0.9    99.3403
   -1.7   27.9468     -0.3   98.9722     1.0    97.1030
   -1.6   34.6082     -0.2  101.3923     1.1    94.6372
   -1.5   41.1054     -0.1  103.3457     1.2    91.9643
   -1.4   47.4229      0.0  104.8276     1.3    89.1034
   -1.3   53.5432      0.1  105.8406     1.4    86.0714
   -1.2   59.4474      0.2  106.3947     1.5    82.8832
   -1.1   65.1147      0.3  106.5069     1.6    79.5521
   -0.9   75.6471      0.4  106.1994     1.7    76.0896
   -0.8   80.4625      0.5  105.4985     1.8    72.5061
   -0.7   84.9421      0.6  104.4330     1.9    68.8106
This table shows that λ = 0.3 maximizes the log-likelihood function (LLF). This becomes λ = 0.28 if a second digit of accuracy is calculated.
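One way to reproduce the profile log-likelihood in R is the boxcox() function in the MASS package, as in the sketch below; it assumes the 42 measurements are stored in a vector named rad (a hypothetical name). The absolute log-likelihood values may differ from the table above by an additive constant, but the maximizing λ is the same.

   ## Sketch: Box-Cox profile log-likelihood for the radiation data.
   library(MASS)
   bc <- boxcox(rad ~ 1, lambda = seq(-2, 2, by = 0.1))   # plots the profile
   bc$x[which.max(bc$y)]   # lambda maximizing the log-likelihood, about 0.3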
The Box-Cox transform is also discussed in Chapter 1 under the Box-Cox Linearity Plot and the Box-Cox Normality Plot. The Box-Cox normality plot discussion provides a graphical method for choosing λ to transform a data set to normality. The criterion used to choose λ for the Box-Cox linearity plot is the value of λ that maximizes the correlation between the transformed x-values and the y-values when making a normal probability plot of the (transformed) data.
6.5.3. Elements of Matrix Algebra
Elementary Matrix Algebra
Basic definitions and operations of matrix algebra - needed for multivariate analysis
Vectors and matrices are arrays of numbers. The algebra
for symbolic operations on them is different from the
algebra for operations on scalars, or single numbers. For
example there is no division in matrix algebra, although
there is an operation called "multiplying by an inverse". It
is possible to express the exact equivalent of matrix algebra
equations in terms of scalar algebra expressions, but the
results look rather messy.
It can be said that the matrix algebra notation is shorthand
for the corresponding scalar longhand.
Vectors A vector is a column of numbers
The scalars a_i are the elements of vector a.
Transpose The transpose of a, denoted by a', is the row arrangement
of the elements of a.
Sum of two
vectors
The sum of two vectors (say, a and b) is the vector of sums
of corresponding elements.
The difference of two vectors is the vector of differences of
corresponding elements.
Product of a'b
The product a'b is a scalar formed by
   a'b = a_1 b_1 + a_2 b_2 + ... + a_p b_p
which may be written in shortcut notation as
   a'b = Σ_i a_i b_i
where a_i and b_i are the ith elements of vectors a and b, respectively.
Product of
ab'
The product ab' is a square matrix
Product of
scalar times a
vector
The product of a scalar k times a vector a is k times each element of a
A matrix is a
rectangular
table of
numbers
A matrix is a rectangular table of numbers, with p rows and
n columns. It is also referred to as an array of n column
vectors of length p. Thus
is a p by n matrix. The typical element of A is a_ij, denoting the element of row i and column j.
Matrix addition and subtraction
Matrices are added and subtracted on an element-by-element basis. Thus
Matrix
multiplication
Matrix multiplication involves the computation of the sum
of the products of elements from a row of the first matrix
(the premultiplier on the left) and a column of the second
matrix (the postmultiplier on the right). This sum of
products is computed for every combination of rows and
columns. For example, if A is a 2 x 3 matrix and B is a 3 x
2 matrix, the product AB is
Thus, the product is a 2 x 2 matrix. This came about as
follows: The number of columns of A must be equal to the
number of rows of B. In this case this is 3. If they are not
equal, multiplication is impossible. If they are equal, then
the number of rows of the product AB is equal to the
number of rows of A and the number of columns is equal to
the number of columns of B.
Example of
3x2 matrix
multiplied by
a 2x3
It follows that the result of the product BA is a 3 x 3 matrix
General case
for matrix
multiplication
In general, if A is a k x p matrix and B is a p x n matrix, the
product AB is a k x n matrix. If k = n, then the product BA
can also be formed. We say that matrices conform for the
operations of addition, subtraction or multiplication when
their respective orders (numbers of row and columns) are
such as to permit the operations. Matrices that do not
conform for addition or subtraction cannot be added or
subtracted. Matrices that do not conform for multiplication
cannot be multiplied.
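A brief R sketch of these rules is given below.

   ## Sketch: matrix conformity and multiplication in R.
   A <- matrix(1:6, nrow = 2, ncol = 3)   # a 2 x 3 matrix
   B <- matrix(1:6, nrow = 3, ncol = 2)   # a 3 x 2 matrix

   A %*% B    # the 2 x 2 product AB
   B %*% A    # the 3 x 3 product BA
   A + A      # element-by-element addition (same order required)
   t(A)       # the transpose of A, a 3 x 2 matrix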
6.5.3.1. Numerical Examples
Numerical
examples of
matrix
operations
Numerical examples of the matrix operations described on
the previous page are given here to clarify these operations.
Sample
matrices
If
then
Matrix addition, subtraction, and multiplication
and
Multiply
matrix by a
scalar
To multiply a matrix by a given scalar, each element of the matrix is multiplied by that scalar
Pre-
multiplying
matrix by
transpose of
a vector
Pre-multiplying a p x n matrix by the transpose of a p-element vector yields an n-element row vector
Post-
multiplying
matrix by
vector
Post-multiplying a p x n matrix by an n-element vector yields a p-element vector
Quadratic
form
It is not possible to pre-multiply a matrix by a column
vector, nor to post-multiply a matrix by a row vector. The
matrix product a'Ba yields a scalar and is called a quadratic
form. Note that B must be a square matrix if a'Ba is to
conform to multiplication. Here is an example of a quadratic
form
Inverting a
matrix
The matrix analog of division involves an operation called
inverting a matrix. Only square matrices can be inverted.
Inversion is a tedious numerical procedure and it is best
performed by computers. There are many ways to invert a
matrix, but ultimately whichever method is selected by a
program is immaterial. If you wish to try one method by
hand, a very popular numerical method is the Gauss-Jordan
method.
Identity matrix
To augment the notion of the inverse of a matrix, A⁻¹ (A inverse), we notice the following relation
   A⁻¹ A = A A⁻¹ = I
I is a matrix of the form
   I = [ 1  0  ...  0
         0  1  ...  0
         ...
         0  0  ...  1 ]
I is called the identity matrix and is a special case of a diagonal matrix. Any matrix that has zeros in all of the off-diagonal positions is a diagonal matrix.
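The R sketch below illustrates inversion, the identity matrix, and a quadratic form for a small example.

   ## Sketch: inversion, the identity matrix, and a quadratic form.
   B <- matrix(c(2, 1, 1, 3), nrow = 2)   # a square, invertible matrix
   Binv <- solve(B)                       # the matrix inverse of B
   round(Binv %*% B, 10)                  # recovers the 2 x 2 identity I
   diag(2)                                # the identity matrix directly

   a <- c(1, 2)
   t(a) %*% B %*% a                       # the quadratic form a'Ba (a scalar)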
6.5.3.2. Determinant and Eigenstructure
A matrix
determinant is
difficult to
define but a
very useful
number
Unfortunately, not every square matrix has an inverse
(although most do). Associated with any square matrix is a
single number that represents a unique function of the
numbers in the matrix. This scalar function of a square
matrix is called the determinant. The determinant of a
matrix A is denoted by |A|. A formal definition for the determinant of a square matrix A = (a_ij) is somewhat beyond the scope of this Handbook. Consult any good linear algebra textbook if you are interested in the mathematical details.
Singular
matrix
As is the case of inversion of a square matrix, calculation
of the determinant is tedious and computer assistance is
needed for practical calculations. If the determinant of the
(square) matrix is exactly zero, the matrix is said to be
singular and it has no inverse.
Determinant
of variance-
covariance
matrix
Of great interest in statistics is the determinant of a square
symmetric matrix D whose diagonal elements are sample
variances and whose off-diagonal elements are sample
covariances. Symmetry means that the matrix and its
transpose are identical (i.e., A = A'). An example is
where s_1 and s_2 are sample standard deviations and r_ij is the sample correlation.
D is the sample variance-covariance matrix for
observations of a multivariate vector of p elements. The
determinant of D, in this case, is sometimes called the
generalized variance.
Characteristic equation
In addition to a determinant and possibly an inverse, every square matrix has associated with it a characteristic equation. The characteristic equation of a matrix is formed by subtracting some particular value, usually denoted by the Greek letter λ (lambda), from each diagonal element of the matrix, such that the determinant of the resulting matrix is equal to zero. For example, the characteristic equation of a second order (2 x 2) matrix A may be written as

Definition of the characteristic equation for 2x2 matrix
   |A - λI| = (a_11 - λ)(a_22 - λ) - a_12 a_21 = 0
Eigenvalues of a matrix
For a matrix of order p, there may be as many as p different values for λ that will satisfy the equation. These different values are called the eigenvalues of the matrix.

Eigenvectors of a matrix
Associated with each eigenvalue is a vector, v, called the eigenvector. The eigenvector satisfies the equation
   Av = λv
Eigenstructure of a matrix
If the complete set of eigenvalues is arranged in the diagonal positions of a diagonal matrix L, the following relationship holds
   AV = VL
This equation specifies the complete eigenstructure of A. Eigenstructures and the associated theory figure heavily in multivariate procedures, and the numerical evaluation of L and V is a central computing problem.
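In R, det() and eigen() carry out these computations, as the sketch below illustrates for a small symmetric matrix.

   ## Sketch: determinant and eigenstructure of a square symmetric matrix.
   D <- matrix(c(4, 2, 2, 3), nrow = 2)   # a small variance-covariance matrix
   det(D)            # the determinant; a value of 0 means D is singular
   e <- eigen(D)
   e$values          # the eigenvalues (the lambdas)
   e$vectors         # the eigenvectors, one column per eigenvalue
   D %*% e$vectors - e$vectors %*% diag(e$values)   # AV - VL, essentially zero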
6.5.4. Elements of Multivariate Analysis
Multivariate
analysis
Multivariate analysis is a branch of statistics concerned
with the analysis of multiple measurements, made on one or
several samples of individuals. For example, we may wish
to measure length, width and weight of a product.
Multiple measurement, or observation, as row or column vector
A multiple measurement or observation may be expressed
as
x = [4 2 0.6]
referring to the physical properties of length, width and
weight, respectively. It is customary to denote multivariate
quantities with bold letters. The collection of measurements
on x is called a vector. In this case it is a row vector. We
could have written x as a column vector.
Matrix to
represent
more than
one multiple
measurement
If we take several such measurements, we record them in a
rectangular array of numbers. For example, the X matrix
below represents 5 observations, on each of three variables.
By convention, rows typically represent observations and columns represent variables
In this case the number of rows, (n = 5), is the number of observations, and the number of columns, (p = 3), is the number of variables that are measured. The rectangular array is an assembly of n row vectors of length p. This array is called a matrix, or, more specifically, an n by p matrix. Its name is X. The names of matrices are usually written in bold, uppercase letters, as in Section 6.5.3. We could just as well have written X as a p (variables) by n (measurements) matrix as follows:
Definition of
Transpose
A matrix with rows and columns exchanged in this manner
is called the transpose of the original matrix.
6.5.4.1. Mean Vector and Covariance Matrix
The first step in analyzing multivariate data is computing the
mean vector and the variance-covariance matrix.
Sample data matrix
Consider the following matrix:

        [ 4.0  2.0  0.60 ]
        [ 4.2  2.1  0.59 ]
   X =  [ 3.9  2.0  0.58 ]
        [ 4.3  2.1  0.62 ]
        [ 4.1  2.2  0.63 ]

The set of 5 observations, measuring 3 variables, can be described by its mean vector and variance-covariance matrix. The three variables, from left to right, are length, width, and height of a certain object, for example. Each row vector X_i is another observation of the three variables (or components).
Definition
of mean
vector and
variance-
covariance
matrix
The mean vector consists of the means of each variable and
the variance-covariance matrix consists of the variances of the
variables along the main diagonal and the covariances between
each pair of variables in the other matrix positions.
The formula for computing the covariance of the variables X and Y is
   cov(X, Y) = Σ_i (X_i - X̄)(Y_i - Ȳ) / (n - 1)
with X̄ and Ȳ denoting the means of X and Y, respectively.
Mean vector and variance-covariance matrix for sample data matrix
The results are:

   x̄ = (4.10  2.08  0.604)

        [ 0.025    0.0075   0.00175 ]
   S =  [ 0.0075   0.007    0.00135 ]
        [ 0.00175  0.00135  0.00043 ]
where the mean vector contains the arithmetic averages of the three variables, and the (unbiased) variance-covariance matrix S is calculated by
   S = (1 / (n - 1)) Σ_i (X_i - x̄)(X_i - x̄)'
where n = 5 for this example.
Thus, 0.025 is the variance of the length variable, 0.0075 is the
covariance between the length and the width variables,
0.00175 is the covariance between the length and the height
variables, 0.007 is the variance of the width variable, 0.00135
is the covariance between the width and height variables and
.00043 is the variance of the height variable.
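The mean vector and variance-covariance matrix for this example can be reproduced in R with colMeans() and cov(), as in the sketch below.

   ## Sketch: mean vector and (unbiased) variance-covariance matrix.
   X <- matrix(c(4.0, 2.0, 0.60,
                 4.2, 2.1, 0.59,
                 3.9, 2.0, 0.58,
                 4.3, 2.1, 0.62,
                 4.1, 2.2, 0.63), ncol = 3, byrow = TRUE)
   colMeans(X)   # mean vector: 4.10 2.08 0.604
   cov(X)        # variance-covariance matrix S (divides by n - 1)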
Centroid, dispersion matrix
The mean vector is often referred to as the centroid and the
variance-covariance matrix as the dispersion or dispersion
matrix. Also, the terms variance-covariance matrix and
covariance matrix are used interchangeably.
6.5.4.2. The Multivariate Normal Distribution
Multivariate
normal
model
When multivariate data are analyzed, the multivariate normal model is
the most commonly used model.
The multivariate normal distribution model extends the univariate normal
distribution model to fit vector observations.
Definition of multivariate normal distribution
A p-dimensional vector of random variables
   X = (X_1, X_2, ..., X_p)
is said to have a multivariate normal distribution if its density function f(X) is of the form
   f(X) = (2π)^(-p/2) |Σ|^(-1/2) exp( -(1/2) (X - m)' Σ⁻¹ (X - m) )
where m = (m_1, ..., m_p) is the vector of means and Σ is the variance-covariance matrix of the multivariate normal distribution. The shortcut notation for this density is
   X ~ N_p(m, Σ)
Univariate normal distribution
When p = 1, the one-dimensional vector X = X_1 has the normal distribution with mean m and variance σ².
Bivariate normal distribution
When p = 2, X = (X_1, X_2) has the bivariate normal distribution with a two-dimensional vector of means, m = (m_1, m_2), and covariance matrix
   Σ = [ σ²_1   σ_12
         σ_12   σ²_2 ]
The correlation between the two random variables is given by
   ρ = σ_12 / (σ_1 σ_2)
6.5.4.3. Hotelling's T squared
Hotelling's T² distribution
A multivariate method that is the multivariate counterpart of Student's t, and which also forms the basis for certain multivariate control charts, is based on Hotelling's T² distribution, which was introduced by Hotelling (1947).
Univariate t-test for mean
Recall, from Section 1.3.5.2,
   t = (x̄ - μ) / (s / √n)
has a t distribution provided that X is normally distributed, and can be used as long as X doesn't differ greatly from a normal distribution. If we wanted to test the hypothesis that μ = μ_0, we would then have
   t = (x̄ - μ_0) / (s / √n)
so that
   t² = (x̄ - μ_0)² / (s²/n) = n (x̄ - μ_0) (s²)⁻¹ (x̄ - μ_0)
Generalize to p variables
When t² is generalized to p variables it becomes
   T² = n (x̄ - μ_0)' S⁻¹ (x̄ - μ_0)
with
   x̄' = (x̄_1, x̄_2, ..., x̄_p),   μ_0' = (μ_01, μ_02, ..., μ_0p).
S⁻¹ is the inverse of the sample variance-covariance matrix, S, and n is the sample size upon which each x̄_i, i = 1, 2, ..., p, is based. (The diagonal elements of S are the variances and the off-diagonal elements are the covariances for the p variables. This is discussed further in Section 6.5.4.3.1.)
Distribution of T²
It is well known that when μ = μ_0,
   T² ~ [ p(n-1) / (n-p) ] F_(p, n-p)
with F_(p, n-p) representing the F distribution with p degrees of freedom for the numerator and n - p for the denominator. Thus, if μ were specified to be μ_0, this could be tested by taking a single p-variate sample of size n, then computing T² and comparing it with [p(n-1)/(n-p)] times the 100(1-α) percentile of the F distribution with p and n - p degrees of freedom, for a suitably chosen α.
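A minimal R sketch of this test is shown below; it assumes the sample is the n x p data matrix X and that mu0 is the p-element vector of hypothesized means (both hypothetical names).

   ## Sketch: one-sample Hotelling T-squared test of mu = mu0.
   n <- nrow(X); p <- ncol(X)
   xbar <- colMeans(X)
   T2 <- n * t(xbar - mu0) %*% solve(cov(X)) %*% (xbar - mu0)

   alpha <- 0.05
   crit  <- (p * (n - 1) / (n - p)) * qf(1 - alpha, p, n - p)
   c(T2 = T2, critical.value = crit)   # reject mu = mu0 if T2 > crit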
Result does not apply directly to multivariate Shewhart-type charts
Although this result applies to hypothesis testing, it does not apply directly to multivariate Shewhart-type charts (for which there is no μ_0), although the result might be used as an approximation when a large sample is used and data are in subgroups, with the upper control limit (UCL) of a chart based on the approximation.
Three-
sigma limits
from
univariate
control
chart
When a univariate control chart is used for Phase I (analysis
of historical data), and subsequently for Phase II (real-time
process monitoring), the general form of the control limits is
the same for each phase, although this need not be the case.
Specifically, three-sigma limits are used in the univariate
case, which skirts the relevant distribution theory for each
Phase.
Selection of
different
control
limit forms
for each
Phase
Three-sigma units are generally not used with multivariate
charts, however, which makes the selection of different
control limit forms for each Phase (based on the relevant
distribution theory), a natural choice.
6.5.4.3.1. T² Chart for Subgroup Averages -- Phase I
Estimate μ with x̄
Since μ is generally unknown, it is necessary to estimate μ analogous to the way that μ is estimated when an x̄ chart is used. Specifically, when there are rational subgroups, μ is estimated by x̄, with
   x̄' = (x̄_1, x̄_2, ..., x̄_p)
Obtaining the x̄_i
Each x̄_i, i = 1, 2, ..., p, is obtained the same way as with an x̄ chart, namely, by taking k subgroups of size n and computing
   x̄_i = (1/k) Σ_{l=1}^{k} x̄_il.
Here x̄_il is used to denote the average for the lth subgroup of the ith variable. That is,
   x̄_il = (1/n) Σ_{r=1}^{n} x_ilr
with x_ilr denoting the rth observation (out of n) for the ith variable in the lth subgroup.
Estimating the variances and covariances
The variances and covariances are similarly averaged over the subgroups. Specifically, the s_ij elements of the variance-covariance matrix S are obtained as
   s_ij = (1/k) Σ_{l=1}^{k} s_ijl
with s_ijl for i ≠ j denoting the sample covariance between variables X_i and X_j for the lth subgroup, and s_ij for i = j denoting the sample variance of X_i. The variances s_iil for subgroup l and for variables i = 1, 2, ..., p are computed as
   s_iil = (1/(n-1)) Σ_{r=1}^{n} (x_ilr - x̄_il)².
Similarly, the covariances s_ijl between variables X_i and X_j for subgroup l are computed as
   s_ijl = (1/(n-1)) Σ_{r=1}^{n} (x_ilr - x̄_il)(x_jlr - x̄_jl).
Compare T² against control values
As with an x̄ chart (or any other chart), the k subgroups would be tested for control by computing k values of T² and comparing each against the UCL. If any value falls above the UCL (there is no lower control limit), the corresponding subgroup would be investigated.
Formula for plotted T² values
Thus, one would plot
   T²_j = n (x̄_j - x̄)' S_p⁻¹ (x̄_j - x̄)
for the jth subgroup (j = 1, 2, ..., k), with x̄_j denoting a vector with p elements that contains the subgroup averages for each of the p characteristics for the jth subgroup. (S_p⁻¹ is the inverse matrix of the "pooled" variance-covariance matrix, S_p, which is obtained by averaging the subgroup variance-covariance matrices over the k subgroups.)
Formula for the upper control limit
Each of the k values of T²_j given in the equation above would be compared with
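A rough R sketch of the Phase I computations is given below. It assumes the data sit in a hypothetical array xdat with dimensions k (subgroups) by n (observations per subgroup) by p (characteristics); the UCL line uses the commonly quoted Phase I form based on the F distribution and should be treated as an assumption rather than as the handbook's exact expression.

   ## Rough sketch: Phase I T-squared chart for subgroup averages.
   k <- dim(xdat)[1]; n <- dim(xdat)[2]; p <- dim(xdat)[3]

   subgroup.means <- apply(xdat, c(1, 3), mean)     # k x p matrix of averages
   xbarbar <- colMeans(subgroup.means)              # overall mean vector
   Sp <- Reduce("+", lapply(1:k, function(l) cov(xdat[l, , ]))) / k  # pooled S

   T2 <- apply(subgroup.means, 1, function(m)
     n * t(m - xbarbar) %*% solve(Sp) %*% (m - xbarbar))

   ## Assumed Phase I limit: p(k-1)(n-1)/(kn-k-p+1) times an F percentile.
   alpha <- 0.05
   UCL <- (p * (k - 1) * (n - 1) / (k * n - k - p + 1)) *
          qf(1 - alpha, p, k * n - k - p + 1)
   which(T2 > UCL)   # subgroups to investigate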
Lower control limits
A lower control limit is generally not used in multivariate control chart applications, although some control chart methods do utilize an LCL. Although a small value for T²_j might seem desirable, a value that is very small would likely indicate a problem of some type, as we would not expect every element of x̄_j to be virtually equal to every element in x̄.
Delete out-of-control points once cause discovered and corrected
As with any Phase I control chart procedure, if there are any points that plot above the UCL and can be identified as corresponding to out-of-control conditions that have been corrected, the point(s) should be deleted and the UCL recomputed. The remaining points would then be compared with the new UCL and the process continued as long as necessary, remembering that points should be deleted only if their correspondence with out-of-control conditions can be identified and the cause(s) of the condition(s) were removed.
6.5.4.3.2. T² Chart for Subgroup Averages -- Phase II
Phase II requires recomputing S_p and x̄, and different control limits
Determining the UCL that is to be subsequently applied to future subgroups entails recomputing, if necessary, S_p and x̄, and using a constant and an F-value that are different from the form given for the Phase I control limits. The form is different because different distribution theory is involved, since future subgroups are assumed to be independent of the "current" set of subgroups that is used in calculating S_p and x̄. (The same thing happens with x̄ charts; the problem is simply ignored through the use of 3-sigma limits, although a different approach should be used when there is a small number of subgroups -- and the necessary theory has been worked out.)
Illustration
To illustrate, assume that a subgroups had been discarded (with possibly a = 0) so that k - a subgroups are used in obtaining x̄ and S_p. We shall let these two values be represented by x̄* and S_p* to distinguish them from the original values, x̄ and S_p, before any subgroups are deleted. Future values to be plotted on the multivariate chart would then be obtained from
with x̄_future denoting an arbitrary vector containing the averages for the p characteristics for a single subgroup obtained in the future. Each of these future values would be plotted on the multivariate chart and compared with
Phase II control limits
with a denoting the number of the original subgroups that are deleted before computing x̄* and S_p*. Notice that the equation for the control limits for Phase II given here does not reduce to the equation for the control limits for Phase I when a = 0, nor should we expect it to, since the Phase I UCL is used when testing for control of the entire set of subgroups that is used in computing x̄ and S_p.
6.5.4.3.3. Chart for Individual Observations -- Phase I
Multivariate
individual
control
charts
Control charts for multivariate individual observations can
be constructed, just as charts can be constructed for
univariate individual observations.
Constructing the control chart
Assume there are m historical multivariate observations to be tested for control, so that Q_j, j = 1, 2, ..., m, are computed, with
   Q_j = (x_j - x̄)' S⁻¹ (x_j - x̄)
Control limits
Each value of Q_j is compared against control limits of
with B(·) denoting the beta distribution with parameters p/2 and (m - p - 1)/2. These limits are due to Tracy, Young and Mason (1992). Note that an LCL is stated, unlike the other multivariate control chart procedures given in this section. Although interest will generally be centered at the UCL, a value of Q below the LCL should also be investigated, as this could signal problems in data recording.
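A hedged R sketch of this procedure is shown below; it assumes the m historical observations are the rows of a matrix X, and it uses the ((m-1)²/m) times Beta(p/2, (m-p-1)/2) form of the limits usually attributed to Tracy, Young and Mason, which should be treated as an assumption about the exact expressions.

   ## Sketch: Phase I chart for individual multivariate observations.
   m <- nrow(X); p <- ncol(X)
   xbar <- colMeans(X)
   Sinv <- solve(cov(X))
   Q <- apply(X, 1, function(x) t(x - xbar) %*% Sinv %*% (x - xbar))

   ## Assumed beta-based control limits.
   alpha <- 0.05
   LCL <- ((m - 1)^2 / m) * qbeta(alpha / 2,     p / 2, (m - p - 1) / 2)
   UCL <- ((m - 1)^2 / m) * qbeta(1 - alpha / 2, p / 2, (m - p - 1) / 2)
   which(Q > UCL | Q < LCL)   # points to investigate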
Delete
points if
special
cause(s) are
identified
and
corrected
As in the case when subgroups are used, if any points plot
outside these control limits and special cause(s) that were
subsequently removed can be identified, the point(s) would
be deleted and the control limits recomputed, making the
appropriate adjustments on the degrees of freedom, and re-
testing the remaining points against the new limits.
6.5.4.3.4. Chart for Individual Observations -- Phase II
Control limits
In Phase II, each value of Q_j would be plotted against the UCL of
with, as before, p denoting the number of characteristics.
Further
Information
The control limit expressions given in this section and the
immediately preceding sections are given in Ryan (2000,
Chapter 9).
6.5.4.3.5. Charts for Controlling Multivariate Variability
No
satisfactory
charts for
multivariate
variability
Unfortunately, there are no charts for controlling multivariate
variability, with either subgroups or individual observations,
that are simple, easy-to-understand and implement, and
statistically defensible. Methods based on the generalized
variance have been proposed for subgroup data, but such
methods have been criticized by Ryan (2000, Section 9.4)
and some references cited therein. For individual
observations, the multivariate analogue of a univariate
moving range chart might be considered as an estimator of
the variance-covariance matrix for Phase I, although the
distribution of the estimator is unknown.
6.5.4.3.6. Constructing Multivariate Charts
Multivariate
control
charts not
commonly
available in
statistical
software
Although control charts were originally constructed and
maintained by hand, it would be extremely impractical to try
to do that with the chart procedures that were presented in
Sections 6.5.4.3.1-6.5.4.3.4. Unfortunately, the well-known
statistical software packages do not have capability for the
four procedures just outlined. However, Dataplot, which is
used for case studies and tutorials throughout this e-
Handbook, does have that capability.
6.5.5. Principal Components
Dimension
reduction tool
A Multivariate Analysis problem could start out with a
substantial number of correlated variables. Principal
Component Analysis is a dimension-reduction tool that
can be used advantageously in such situations. Principal
component analysis aims at reducing a large set of
variables to a small set that still contains most of the
information in the large set.
Principal
factors
The technique of principal component analysis enables us
to create and use a reduced set of variables, which are
called principal factors. A reduced set is much easier to
analyze and interpret. To study a data set that results in the
estimation of roughly 500 parameters may be difficult, but
if we could reduce these to 5 it would certainly make our
day. We will show in what follows how to achieve
substantial dimension reduction.
Inverse transformation not possible
While these principal factors represent or replace one or
more of the original variables, it should be noted that they
are not just a one-to-one transformation, so inverse
transformations are not possible.
Original data
matrix
To shed light on the structure of principal components analysis, let us consider a multivariate data matrix X, with n rows and p columns. The p elements of each row are scores or measurements on a subject such as height, weight and age.
Linear
function that
maximizes
variance
Next, standardize the X matrix so that each column mean is 0 and each column variance is 1. Call this matrix Z. Each column is a vector variable, z_i, i = 1, . . . , p. The main idea behind principal component analysis is to derive a linear function y for each of the vector variables z_i. This linear function possesses an extremely important property; namely, its variance is maximized.
Linear function is component of z
This linear function is referred to as a component of z. To illustrate the computation of a single element for the jth y vector, consider the product y = zv' where v' is a column vector of V and V is a p x p coefficient matrix that carries the p-element variable z into the derived n-element variable y. V is known as the eigenvector matrix. The dimension of z is 1 x p, the dimension of v' is p x 1. The scalar algebra for the component score for the ith individual of y_j, j = 1, ..., p is:
   y_ji = v'_1 z_1i + v'_2 z_2i + ... + v'_p z_pi
This becomes in matrix notation for all of the y:
   Y = ZV
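The R sketch below carries out these steps for a generic n x p data matrix X (a hypothetical name); prcomp(X, scale. = TRUE) gives the same component scores up to the signs of the columns.

   ## Sketch: principal components from the correlation matrix of X.
   Z <- scale(X)            # standardize: column means 0, variances 1
   R <- cor(X)              # correlation matrix (same as the dispersion of Z)
   V <- eigen(R)$vectors    # eigenvector matrix V
   Y <- Z %*% V             # component scores, Y = ZV

   apply(Y, 2, var)         # variances of the components = the eigenvalues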
Mean and dispersion matrix of y
The mean of y is m_y = V'm_z = 0, because m_z = 0.
The dispersion matrix of y is
   D_y = V'D_z V = V'RV
R is correlation matrix
Now, it can be shown that the dispersion matrix D_z of a standardized variable is a correlation matrix. Thus R is the correlation matrix for z.
Number of parameters to estimate increases rapidly as p increases
At this juncture you may be tempted to say: "so what?". To answer this let us look at the intercorrelations among the elements of a vector variable. The number of parameters to be estimated for a p-element variable is
   p means
   p variances
   (p² - p)/2 covariances
for a total of 2p + (p² - p)/2 parameters. So
   if p = 2, there are 5 parameters
   if p = 10, there are 65 parameters
   if p = 30, there are 495 parameters
Uncorrelated
variables
require no
covariance
estimation
All these parameters must be estimated and interpreted.
That is a herculean task, to say the least. Now, if we could
transform the data so that we obtain a vector of
uncorrelated variables, life becomes much more bearable,
since there are no covariances.
6.5.5.1. Properties of Principal Components
Orthogonalizing Transformations
Transformation
from z to y
The equation y = V'z represents a transformation, where y
is the transformed variable, z is the original standardized
variable and V is the premultiplier to go from z to y.
Orthogonal transformations simplify things
To produce a transformation vector for y for which the elements are uncorrelated is the same as saying that we want V such that D_y is a diagonal matrix. That is, all the off-diagonal elements of D_y must be zero. This is called an orthogonalizing transformation.
Infinite number of values for V
There are an infinite number of values for V that will produce a diagonal D_y for any correlation matrix R. Thus the mathematical problem "find a unique V such that D_y is diagonal" cannot be solved as it stands. A number of famous statisticians such as Karl Pearson and Harold Hotelling pondered this problem and suggested a "variance maximizing" solution.
Principal components maximize variance of the transformed elements, one by one
Hotelling (1933) derived the "principal components" solution. It proceeds as follows: for the first principal component, which will be the first element of y and be defined by the coefficients in the first column of V (denoted by v_1), we want a solution such that the variance of y_1 will be maximized.
Constrain v to generate a unique solution
The constraint on the numbers in v_1 is that the sum of the squares of the coefficients equals 1. Expressed mathematically, we wish to maximize
   (1/n) Σ_{i=1}^{n} y²_1i
where
   y_1i = v_1' z_i
and v_1'v_1 = 1 (this is called "normalizing" v_1).
Computation of first principal component from R and v_1
Substituting the middle equation in the first yields
   Var(y_1) = v_1' R v_1
where R is the correlation matrix of Z, which, in turn, is the standardized matrix of X, the original data matrix. Therefore, we want to maximize v_1'Rv_1 subject to v_1'v_1 = 1.
The eigenstructure
Lagrange multiplier approach
Let
   v_1' R v_1 - λ (v_1'v_1 - 1),
introducing the restriction on v_1 via the Lagrange multiplier approach. It can be shown (T.W. Anderson, 1958, page 347, theorem 8) that the vector of partial derivatives is
   2Rv_1 - 2λv_1
and setting this equal to zero, dividing out 2 and factoring gives
   (R - λI) v_1 = 0
This is known as "the problem of the eigenstructure of R".
Set of p homogeneous equations
The partial differentiation resulted in a set of p homogeneous equations, which may be written in matrix form as follows
   (R - λI) v = 0
The characteristic equation
Characteristic equation of R is a polynomial of degree p
The characteristic equation of R is a polynomial of degree p, which is obtained by expanding the determinant of
   |R - λI| = 0
and solving for the roots λ_j, j = 1, 2, ..., p.
Largest eigenvalue
Specifically, the largest eigenvalue, λ_1, and its associated vector, v_1, are required. Solving for this eigenvalue and vector is another mammoth numerical task that can realistically only be performed by a computer. In general, software is involved and the algorithms are complex.
Remaining p eigenvalues
After obtaining the first eigenvalue, the process is repeated until all p eigenvalues are computed.
Full eigenstructure of R
To succinctly define the full eigenstructure of R, we introduce another matrix L, which is a diagonal matrix with λ_j in the jth position on the diagonal. Then the full eigenstructure of R is given as
   RV = VL
where
   V'V = VV' = I
and
   V'RV = L = D_y
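Continuing the sketch above, eigen() returns V and the eigenvalues, and the relationships defining the full eigenstructure can be checked numerically.

   ## Sketch: numerical check of the eigenstructure of a correlation matrix R.
   e <- eigen(R)
   V <- e$vectors
   L <- diag(e$values)

   max(abs(R %*% V - V %*% L))             # RV = VL, so essentially zero
   max(abs(t(V) %*% V - diag(ncol(R))))    # V'V = I
   max(abs(t(V) %*% R %*% V - L))          # V'RV = L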
Principal Factors
Scale to zero
means and unit
variances
It was mentioned before that it is helpful to scale any
transformation y of a vector variable z so that its elements
have zero means and unit variances. Such a standardized
transformation is called a factoring of z, or of R, and
each linear component of the transformation is called a
factor.
Deriving unit
variances for
principal
components
Now, the principal components already have zero means,
but their variances are not 1; in fact, they are the
eigenvalues, comprising the diagonal elements of L. It is
possible to derive the principal factor with unit variance
from the principal component as follows
or for all factors:
substituting V'z for y we have
where
   B = VL^(-1/2)
B matrix   The matrix B is then the matrix of factor score coefficients for principal factors.
How many Eigenvalues?
Dimensionality
of the set of
factor scores
The number of eigenvalues, N, used in the final set
determines the dimensionality of the set of factor scores.
For example, if the original test consisted of 8
measurements on 100 subjects, and we extract 2
eigenvalues, the set of factor scores is a matrix of 100
rows by 2 columns.
Eigenvalues
greater than
unity
Each column or principal factor should represent a number of original variables. Kaiser (1966) suggested a rule of thumb: take as the value for N the number of eigenvalues larger than unity.
Factor Structure
Factor
structure
matrix S
The primary interpretative device in principal components
is the factor structure, computed as
S = VL^{1/2}
S is a matrix whose elements are the correlations between
the principal components and the variables. If we retain,
for example, two eigenvalues, meaning that there are two
principal components, then the S matrix consists of two
columns and p (number of variables) rows.
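Continuing the earlier R sketch (same placeholder objects V and e), the factor structure S and the factor score coefficients B follow directly:

    # Continuing the sketch above: factor structure and factor score coefficients
    S <- V %*% diag(sqrt(e$values))      # S = V L^(1/2): correlations between
                                         # principal components and the variables
    B <- V %*% diag(1 / sqrt(e$values))  # B = V L^(-1/2): factor score coefficients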
Table showing relation between variables and principal components

                 Principal Component
    Variable        1        2
       1          r_11     r_12
       2          r_21     r_22
       3          r_31     r_32
       4          r_41     r_42

The r_ij are the correlation coefficients between variable i and principal component j, where i ranges from 1 to 4 and j from 1 to 2.
The
communality
SS' is the source of the "explained" correlations among
the variables. Its diagonal is called "the communality".
Rotation
Factor analysis
If this correlation matrix, i.e., the factor structure matrix, does not help much in the interpretation, it is possible to rotate the axes of the principal components. This may result in the polarization of the correlation coefficients. Some practitioners refer to rotation after generating the factor structure as factor analysis.
Varimax
rotation
A popular scheme for rotation was suggested by Henry
Kaiser in 1958. He produced a method for orthogonal
rotation of factors, called the varimax rotation, which
cleans up the factors as follows:
for each factor, high loadings (correlations) will
result for a few variables; the rest will be near
zero.
Example The following computer output from a principal
component analysis on a 4-variable data set, followed by
varimax rotation of the factor structure, will illustrate his
point.
                 Before Rotation          After Rotation
    Variable    Factor 1   Factor 2     Factor 1   Factor 2
       1          .853      -.989         .997       .058
       2          .634       .762         .089       .987
       3          .858      -.498         .989       .076
       4          .633       .736         .103       .965
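A rotation of this kind can be sketched in R with the base varimax() function, applied here to the unrotated loadings from the table above; because of sign conventions and Kaiser normalization, the output may differ in detail from the handbook's rotated values.

    # Sketch: varimax rotation of the unrotated loading matrix from the table
    A <- matrix(c( .853,  .634,  .858,  .633,    # factor 1 loadings
                  -.989,  .762, -.498,  .736),   # factor 2 loadings
                ncol = 2)
    rot <- varimax(A)    # orthogonal rotation that polarizes the loadings
    rot$loadings         # high loadings concentrate on a few variables per factor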
Communality
Formula for
communality
statistic
A measure of how well the selected factors (principal components) "explain" the variance of each of the variables is given by a statistic called the communality. This is defined by

    h^2_k = Σ_{i=1}^{n} r^2_{ki}
Explanation of communality statistic
That is: the square of the correlation of variable k with factor i gives the part of the variance accounted for by that factor. The sum of these squares for n factors is the communality, or explained variance, for that variable (row).
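In R, the communalities are just the row sums of the squared factor structure; a one-line sketch, continuing with the S computed above and keeping the first two factors:

    # Sketch: communality from the first two columns of the factor structure S
    communality <- rowSums(S[, 1:2]^2)   # explained variance for each variable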
Roadmap to solve the V matrix
Main steps to obtaining eigenstructure for a correlation matrix
In summary, here are the main steps to obtain the eigenstructure for a correlation matrix.

1. Compute R, the correlation matrix of the original data. R is also the correlation matrix of the standardized data.
2. Obtain the characteristic equation of R, which is a polynomial of degree p (the number of variables), obtained from expanding the determinant of |R - λI| = 0 and solving for the roots λ_i, that is: λ_1, λ_2, ..., λ_p.
3. Then solve for the columns of the V matrix, (v_1, v_2, ..., v_p). The roots, λ_i, are called the eigenvalues (or latent values). The columns of V are called the eigenvectors.
6.5.5.2. Numerical Example
Calculation
of principal
components
example
A numerical example may clarify the mechanics of principal
component analysis.
Sample data
set
Let us analyze the following 3-variate dataset with 10 observations. Each
observation consists of 3 measurements on a wafer: thickness, horizontal
displacement and vertical displacement.
Compute the
correlation
matrix
First compute the correlation matrix
Solve for the
roots of R
Next solve for the roots of R, using software
        eigenvalue    cumulative proportion
   1       1.769              .590
   2        .927              .899
   3        .304             1.000
Notice that

    Each eigenvalue satisfies |R - λI| = 0.
    The sum of the eigenvalues = 3 = p, which is equal to the trace of R (i.e., the sum of the main diagonal elements).
    The determinant of R is the product of the eigenvalues. The product is λ_1 × λ_2 × λ_3 = .499.
Compute the
first column
of the V
matrix
Substituting the first eigenvalue of 1.769 and R in the appropriate
equation we obtain
This is the matrix expression for 3 homogeneous equations with 3
unknowns and yields the first column of V: .64 .69 -.34 (again, a
computerized solution is indispensable).
Compute the
remaining
columns of
the V matrix
Repeating this procedure for the other 2 eigenvalues yields the matrix V
Notice that if you multiply V by its transpose, the result is an identity
matrix, V'V=I.
Compute the L^{1/2} matrix
Now form the matrix L^{1/2}, which is a diagonal matrix whose elements are the square roots of the eigenvalues of R. Then obtain S, the factor structure, using S = V L^{1/2}. So, for example, .91 is the correlation between variable 2 and the first principal component.
Compute the
communality
Next compute the communality, using the first two eigenvalues only
Diagonal
elements
report how
much of the
variability is
explained
Communality consists of the diagonal elements.
   var   communality
    1       .8662
    2       .8420
    3       .9876
This means that the first two principal components "explain" 86.62% of
the first variable, 84.20 % of the second variable, and 98.76% of the
third.
Compute the
coefficient
matrix
The coefficient matrix, B, is formed using the reciprocals of the diagonals of L^{1/2}:

    B = VL^{-1/2}
Compute the
principal
factors
Finally, we can compute the factor scores from ZB, where Z is X
converted to standard score form. These columns are the principal
factors.
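The whole numerical example can be reproduced in a few lines of base R. In the sketch below, "wafer_pca.dat" is a placeholder file name for wherever the 10 x 3 data table (thickness, horizontal and vertical displacement) has been saved; everything else follows the steps above.

    # Minimal sketch (base R) of the whole example; file name is a placeholder
    X <- as.matrix(read.table("wafer_pca.dat"))
    R <- cor(X)                            # correlation matrix
    e <- eigen(R)                          # eigenvalues 1.769, 0.927, 0.304
    V <- e$vectors                         # first column is +/-(.64, .69, -.34)
    S <- V %*% diag(sqrt(e$values))        # factor structure, S = V L^(1/2)
    h2 <- rowSums(S[, 1:2]^2)              # communalities ~ .8662, .8420, .9876
    B <- V %*% diag(1 / sqrt(e$values))    # factor score coefficients, B = V L^(-1/2)
    Z <- scale(X)                          # standardized data
    scores <- Z %*% B                      # principal factor scores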
Principal
factors
control
chart
These factors can be plotted against the indices, which could be times. If
time is used, the resulting plot is an example of a principal factors
control chart.
6.6. Case Studies in Process Monitoring
Detailed
Examples
The general points of the first five sections are illustrated in
this section using data from physical science and engineering
applications. Each example is presented step-by-step in the
text, and is often cross-linked with the relevant sections of the
chapter describing the analysis in general.
Contents:
Section 6
1. Lithography Process Example
2. Aerosol Particle Size Example
6.6.1. Lithography Process
Lithography
Process
This case study illustrates the use of control charts in
analyzing a lithography process.
1. Background and Data
2. Graphical Representation of the Data
3. Subgroup Analysis
4. Shewhart Control Chart
5. Work This Example Yourself
6.6.1.1. Background and Data
Case Study for SPC in Batch Processing Environment
Semiconductor
processing
creates
multiple
sources of
variability to
monitor
One of the assumptions in using classical Shewhart SPC charts
is that the only source of variation is from part to part (or
within subgroup variation). This is the case for most continuous
processing situations. However, many of today's processing
situations have different sources of variation. The
semiconductor industry is one of the areas where the
processing creates multiple sources of variation.
In semiconductor processing, the basic experimental unit is a
silicon wafer. Operations are performed on the wafer, but
individual wafers can be grouped multiple ways. In the
diffusion area, up to 150 wafers are processed at one time in a
diffusion tube. In the etch area, single wafers are processed
individually. In the lithography area, the light exposure is done
on sub-areas of the wafer. There are many times during the
production of a computer chip where the experimental unit
varies and thus there are different sources of variation in this
batch processing environment.
The following is a case study of a lithography process. Five
sites are measured on each wafer, three wafers are measured in
a cassette (typically a grouping of 24 - 25 wafers) and thirty
cassettes of wafers are used in the study. The width of a line is
the measurement under study. There are two line width
variables. The first is the original data and the second has been
cleaned up somewhat. This case study uses the raw data. The
entire data table is 450 rows long with six columns.
Software The analyses used in this case study can be generated using
both Dataplot code and R code.
Case study data: wafer line width measurements

                                 Raw Line                Cleaned
   Cassette   Wafer   Site       Width       Sequence    Line Width
=====================================================
1 1 Top 3. 199275 1
3. 197275
1 1 Lef 2. 253081 2
2. 249081
1 1 Cen 2. 074308 3
2. 068308
1 1 Rgt 2. 418206 4
2. 410206
1 1 Bot 2. 393732 5
2. 383732
1 2 Top 2. 654947 6
2. 642947
1 2 Lef 2. 003234 7
1. 989234
1 2 Cen 1. 861268 8
1. 845268
1 2 Rgt 2. 136102 9
2. 118102
1 2 Bot 1. 976495 10
1. 956495
1 3 Top 2. 887053 11
2. 865053
1 3 Lef 2. 061239 12
2. 037239
1 3 Cen 1. 625191 13
1. 599191
1 3 Rgt 2. 304313 14
2. 276313
1 3 Bot 2. 233187 15
2. 203187
2 1 Top 3. 160233 16
3. 128233
2 1 Lef 2. 518913 17
2. 484913
2 1 Cen 2. 072211 18
2. 036211
2 1 Rgt 2. 287210 19
2. 249210
2 1 Bot 2. 120452 20
2. 080452
2 2 Top 2. 063058 21
2. 021058
2 2 Lef 2. 217220 22
2. 173220
2 2 Cen 1. 472945 23
1. 426945
2 2 Rgt 1. 684581 24
1. 636581
2 2 Bot 1. 900688 25
1. 850688
2 3 Top 2. 346254 26
2. 294254
2 3 Lef 2. 172825 27
2. 118825
2 3 Cen 1. 536538 28
1. 480538
2 3 Rgt 1. 966630 29
1. 908630
2 3 Bot 2. 251576 30
2. 191576
3 1 Top 2. 198141 31
2. 136141
3 1 Lef 1. 728784 32
1. 664784
3 1 Cen 1. 357348 33
1. 291348
3 1 Rgt 1. 673159 34
1. 605159
3 1 Bot 1. 429586 35
1. 359586
3 2 Top 2. 231291 36
2. 159291
3 2 Lef 1. 561993 37
1. 487993
3 2 Cen 1. 520104 38
1. 444104
3 2 Rgt 2. 066068 39
1. 988068
3 2 Bot 1. 777603 40
1. 697603
3 3 Top 2. 244736 41
2. 162736
3 3 Lef 1. 745877 42
1. 661877
3 3 Cen 1. 366895 43
1. 280895
3 3 Rgt 1. 615229 44
1. 527229
3 3 Bot 1. 540863 45
1. 450863
4 1 Top 2. 929037 46
2. 837037
4 1 Lef 2. 035900 47
1. 941900
4 1 Cen 1. 786147 48
1. 690147
4 1 Rgt 1. 980323 49
1. 882323
4 1 Bot 2. 162919 50
2. 062919
4 2 Top 2. 855798 51
2. 753798
4 2 Lef 2. 104193 52
2. 000193
4 2 Cen 1. 919507 53
1. 813507
4 2 Rgt 2. 019415 54
1. 911415
4 2 Bot 2. 228705 55
2. 118705
4 3 Top 3. 219292 56
3. 107292
4 3 Lef 2. 900430 57
2. 786430
4 3 Cen 2. 171262 58
2. 055262
4 3 Rgt 3. 041250 59
2. 923250
4 3 Bot 3. 188804 60
3. 068804
5 1 Top 3. 051234 61
2. 929234
5 1 Lef 2. 506230 62
2. 382230
5 1 Cen 1. 950486 63
1. 824486
5 1 Rgt 2. 467719 64
2. 339719
5 1 Bot 2. 581881 65
2. 451881
5 2 Top 3. 857221 66
3. 725221
5 2 Lef 3. 347343 67
3. 213343
5 2 Cen 2. 533870 68
2. 397870
5 2 Rgt 3. 190375 69
3. 052375
5 2 Bot 3. 362746 70
3. 222746
5 3 Top 3. 690306 71
3. 548306
5 3 Lef 3. 401584 72
3. 257584
5 3 Cen 2. 963117 73
2. 817117
5 3 Rgt 2. 945828 74
2. 797828
5 3 Bot 3. 466115 75
3. 316115
6 1 Top 2. 938241 76
2. 786241
6 1 Lef 2. 526568 77
2. 372568
6 1 Cen 1. 941370 78
1. 785370
6 1 Rgt 2. 765849 79
2. 607849
6 1 Bot 2. 382781 80
2. 222781
6 2 Top 3. 219665 81
3. 057665
6 2 Lef 2. 296011 82
2. 132011
6 2 Cen 2. 256196 83
2. 090196
6 2 Rgt 2. 645933 84
2. 477933
6 2 Bot 2. 422187 85
2. 252187
6 3 Top 3. 180348 86
3. 008348
6 3 Lef 2. 849264 87
2. 675264
6 3 Cen 1. 601288 88
1. 425288
6 3 Rgt 2. 810051 89
2. 632051
6 3 Bot 2. 902980 90
2. 722980
7 1 Top 2. 169679 91
1. 987679
7 1 Lef 2. 026506 92
1. 842506
7 1 Cen 1. 671804 93
1. 485804
7 1 Rgt 1. 660760 94
1. 472760
7 1 Bot 2. 314734 95
2. 124734
7 2 Top 2. 912838 96
2. 720838
7 2 Lef 2. 323665 97
2. 129665
7 2 Cen 1. 854223 98
1. 658223
7 2 Rgt 2. 391240 99 2. 19324
7 2 Bot 2. 196071 100
1. 996071
7 3 Top 3. 318517 101
3. 116517
7 3 Lef 2. 702735 102
2. 498735
7 3 Cen 1. 959008 103
1. 753008
7 3 Rgt 2. 512517 104
2. 304517
7 3 Bot 2. 827469 105
2. 617469
8 1 Top 1. 958022 106
1. 746022
8 1 Lef 1. 360106 107
1. 146106
8 1 Cen 0. 971193 108
0. 755193
8 1 Rgt 1. 947857 109
1. 729857
8 1 Bot 1. 643580 110 1. 42358
8 2 Top 2. 357633 111
2. 135633
8 2 Lef 1. 757725 112
1. 533725
8 2 Cen 1. 165886 113
0. 939886
8 2 Rgt 2. 231143 114
2. 003143
8 2 Bot 1. 311626 115
1. 081626
8 3 Top 2. 421686 116
2. 189686
8 3 Lef 1. 993855 117
1. 759855
8 3 Cen 1. 402543 118
1. 166543
8 3 Rgt 2. 008543 119
1. 770543
8 3 Bot 2. 139370 120
1. 899370
9 1 Top 2. 190676 121
1. 948676
9 1 Lef 2. 287483 122
2. 043483
9 1 Cen 1. 698943 123
1. 452943
9 1 Rgt 1. 925731 124
1. 677731
9 1 Bot 2. 057440 125
1. 807440
9 2 Top 2. 353597 126
2. 101597
9 2 Lef 1. 796236 127
1. 542236
9 2 Cen 1. 241040 128
0. 985040
9 2 Rgt 1. 677429 129
1. 419429
9 2 Bot 1. 845041 130
1. 585041
9 3 Top 2. 012669 131
1. 750669
9 3 Lef 1. 523769 132
1. 259769
9 3 Cen 0. 790789 133
0. 524789
9 3 Rgt 2. 001942 134
1. 733942
9 3 Bot 1. 350051 135
1. 080051
10 1 Top 2. 825749 136
2. 553749
10 1 Lef 2. 502445 137
2. 228445
10 1 Cen 1. 938239 138
1. 662239
10 1 Rgt 2. 349497 139
2. 071497
10 1 Bot 2. 310817 140
2. 030817
10 2 Top 3. 074576 141
2. 792576
10 2 Lef 2. 057821 142
1. 773821
10 2 Cen 1. 793617 143
1. 507617
10 2 Rgt 1. 862251 144
1. 574251
10 2 Bot 1. 956753 145
1. 666753
10 3 Top 3. 072840 146
2. 780840
10 3 Lef 2. 291035 147
1. 997035
10 3 Cen 1. 873878 148
1. 577878
10 3 Rgt 2. 475640 149
2. 177640
10 3 Bot 2. 021472 150
1. 721472
11 1 Top 3. 228835 151
2. 926835
11 1 Lef 2. 719495 152
2. 415495
11 1 Cen 2. 207198 153
1. 901198
11 1 Rgt 2. 391608 154
2. 083608
11 1 Bot 2. 525587 155
2. 215587
11 2 Top 2. 891103 156
2. 579103
11 2 Lef 2. 738007 157
2. 424007
11 2 Cen 1. 668337 158
1. 352337
11 2 Rgt 2. 496426 159
2. 178426
11 2 Bot 2. 417926 160
2. 097926
11 3 Top 3. 541799 161
3. 219799
11 3 Lef 3. 058768 162
2. 734768
11 3 Cen 2. 187061 163
1. 861061
11 3 Rgt 2. 790261 164
2. 462261
11 3 Bot 3. 279238 165
2. 949238
12 1 Top 2. 347662 166
2. 015662
12 1 Lef 1. 383336 167
1. 049336
12 1 Cen 1. 187168 168
0. 851168
12 1 Rgt 1. 693292 169
1. 355292
12 1 Bot 1. 664072 170
1. 324072
12 2 Top 2. 385320 171
2. 043320
12 2 Lef 1. 607784 172
1. 263784
12 2 Cen 1. 230307 173
0. 884307
12 2 Rgt 1. 945423 174
1. 597423
12 2 Bot 1. 907580 175
1. 557580
12 3 Top 2. 691576 176
2. 339576
12 3 Lef 1. 938755 177
1. 584755
12 3 Cen 1. 275409 178
0. 919409
12 3 Rgt 1. 777315 179
1. 419315
12 3 Bot 2. 146161 180
1. 786161
13 1 Top 3. 218655 181
2. 856655
13 1 Lef 2. 912180 182
2. 548180
13 1 Cen 2. 336436 183
1. 970436
13 1 Rgt 2. 956036 184
2. 588036
13 1 Bot 2. 423235 185
2. 053235
13 2 Top 3. 302224 186
2. 930224
13 2 Lef 2. 808816 187
2. 434816
13 2 Cen 2. 340386 188
1. 964386
13 2 Rgt 2. 795120 189
2. 417120
13 2 Bot 2. 865800 190
2. 485800
13 3 Top 2. 992217 191
2. 610217
13 3 Lef 2. 952106 192
2. 568106
13 3 Cen 2. 149299 193
1. 763299
13 3 Rgt 2. 448046 194
2. 060046
13 3 Bot 2. 507733 195
2. 117733
14 1 Top 3. 530112 196
3. 138112
14 1 Lef 2. 940489 197
2. 546489
14 1 Cen 2. 598357 198
2. 202357
14 1 Rgt 2. 905165 199
2. 507165
14 1 Bot 2. 692078 200
2. 292078
14 2 Top 3. 764270 201
3. 362270
14 2 Lef 3. 465960 202
3. 061960
14 2 Cen 2. 458628 203
2. 052628
14 2 Rgt 3. 141132 204
2. 733132
14 2 Bot 2. 816526 205
2. 406526
14 3 Top 3. 217614 206
2. 805614
14 3 Lef 2. 758171 207
2. 344171
14 3 Cen 2. 345921 208
1. 929921
14 3 Rgt 2. 773653 209
2. 355653
14 3 Bot 3. 109704 210
2. 689704
15 1 Top 2. 177593 211
1. 755593
15 1 Lef 1. 511781 212
1. 087781
15 1 Cen 0. 746546 213
0. 320546
15 1 Rgt 1. 491730 214
1. 063730
15 1 Bot 1. 268580 215
0. 838580
15 2 Top 2. 433994 216
2. 001994
15 2 Lef 2. 045667 217
1. 611667
15 2 Cen 1. 612699 218
1. 176699
15 2 Rgt 2. 082860 219
1. 644860
15 2 Bot 1. 887341 220
1. 447341
15 3 Top 1. 923003 221
1. 481003
15 3 Lef 2. 124461 222
1. 680461
15 3 Cen 1. 945048 223
1. 499048
15 3 Rgt 2. 210698 224
1. 762698
15 3 Bot 1. 985225 225
1. 535225
16 1 Top 3. 131536 226
2. 679536
16 1 Lef 2. 405975 227
1. 951975
16 1 Cen 2. 206320 228
1. 750320
16 1 Rgt 3. 012211 229
2. 554211
16 1 Bot 2. 628723 230
2. 168723
16 2 Top 2. 802486 231
2. 340486
16 2 Lef 2. 185010 232
1. 721010
16 2 Cen 2. 161802 233
1. 695802
16 2 Rgt 2. 102560 234
1. 634560
16 2 Bot 1. 961968 235
1. 491968
16 3 Top 3. 330183 236
2. 858183
16 3 Lef 2. 464046 237
1. 990046
16 3 Cen 1. 687408 238
1. 211408
16 3 Rgt 2. 043322 239
1. 565322
16 3 Bot 2. 570657 240
2. 090657
17 1 Top 3. 352633 241
2. 870633
17 1 Lef 2. 691645 242
2. 207645
17 1 Cen 1. 942410 243
1. 456410
17 1 Rgt 2. 366055 244
1. 878055
17 1 Bot 2. 500987 245
2. 010987
17 2 Top 2. 886284 246
2. 394284
17 2 Lef 2. 292503 247
1. 798503
17 2 Cen 1. 627562 248
1. 131562
17 2 Rgt 2. 415076 249
1. 917076
17 2 Bot 2. 086134 250
1. 586134
17 3 Top 2. 554848 251
2. 052848
17 3 Lef 1. 755843 252
1. 251843
17 3 Cen 1. 510124 253
1. 004124
17 3 Rgt 2. 257347 254
1. 749347
17 3 Bot 1. 958592 255
1. 448592
18 1 Top 2. 622733 256
2. 110733
18 1 Lef 2. 321079 257
1. 807079
18 1 Cen 1. 169269 258
0. 653269
18 1 Rgt 1. 921457 259
1. 403457
18 1 Bot 2. 176377 260
1. 656377
18 2 Top 3. 313367 261
2. 791367
18 2 Lef 2. 559725 262
2. 035725
18 2 Cen 2. 404662 263
1. 878662
18 2 Rgt 2. 405249 264
1. 877249
18 2 Bot 2. 535618 265
2. 005618
18 3 Top 3. 067851 266
2. 535851
18 3 Lef 2. 490359 267
1. 956359
18 3 Cen 2. 079477 268
1. 543477
18 3 Rgt 2. 669512 269
2. 131512
18 3 Bot 2. 105103 270
1. 565103
19 1 Top 4. 293889 271
3. 751889
19 1 Lef 3. 888826 272
3. 344826
19 1 Cen 2. 960655 273
2. 414655
19 1 Rgt 3. 618864 274
3. 070864
19 1 Bot 3. 562480 275
3. 012480
19 2 Top 3. 451872 276
2. 899872
19 2 Lef 3. 285934 277
2. 731934
19 2 Cen 2. 638294 278
2. 082294
19 2 Rgt 2. 918810 279
2. 360810
19 2 Bot 3. 076231 280
2. 516231
19 3 Top 3. 879683 281
3. 317683
19 3 Lef 3. 342026 282
2. 778026
19 3 Cen 3. 382833 283
2. 816833
19 3 Rgt 3. 491666 284
2. 923666
19 3 Bot 3. 617621 285
3. 047621
20 1 Top 2. 329987 286
1. 757987
20 1 Lef 2. 400277 287
1. 826277
20 1 Cen 2. 033941 288
1. 457941
20 1 Rgt 2. 544367 289
1. 966367
20 1 Bot 2. 493079 290
1. 913079
20 2 Top 2. 862084 291
2. 280084
20 2 Lef 2. 404703 292
1. 820703
20 2 Cen 1. 648662 293
1. 062662
20 2 Rgt 2. 115465 294
1. 527465
20 2 Bot 2. 633930 295
2. 043930
20 3 Top 3. 305211 296
2. 713211
20 3 Lef 2. 194991 297
1. 600991
20 3 Cen 1. 620963 298
1. 024963
20 3 Rgt 2. 322678 299
1. 724678
20 3 Bot 2. 818449 300
2. 218449
21 1 Top 2. 712915 301
2. 110915
21 1 Lef 2. 389121 302
1. 785121
21 1 Cen 1. 575833 303
0. 969833
21 1 Rgt 1. 870484 304
1. 262484
21 1 Bot 2. 203262 305
1. 593262
21 2 Top 2. 607972 306
1. 995972
21 2 Lef 2. 177747 307
1. 563747
21 2 Cen 1. 246016 308
0. 630016
21 2 Rgt 1. 663096 309
1. 045096
21 2 Bot 1. 843187 310
1. 223187
21 3 Top 2. 277813 311
1. 655813
21 3 Lef 1. 764940 312
1. 140940
21 3 Cen 1. 358137 313
0. 732137
21 3 Rgt 2. 065713 314
1. 437713
21 3 Bot 1. 885897 315
1. 255897
22 1 Top 3. 126184 316
2. 494184
22 1 Lef 2. 843505 317
2. 209505
22 1 Cen 2. 041466 318
1. 405466
22 1 Rgt 2. 816967 319
2. 178967
22 1 Bot 2. 635127 320
1. 995127
22 2 Top 3. 049442 321
2. 407442
22 2 Lef 2. 446904 322
1. 802904
22 2 Cen 1. 793442 323
1. 147442
22 2 Rgt 2. 676519 324
2. 028519
22 2 Bot 2. 187865 325
1. 537865
22 3 Top 2. 758416 326
2. 106416
22 3 Lef 2. 405744 327
1. 751744
22 3 Cen 1. 580387 328
0. 924387
22 3 Rgt 2. 508542 329
1. 850542
22 3 Bot 2. 574564 330
1. 914564
23 1 Top 3. 294288 331
2. 632288
23 1 Lef 2. 641762 332
1. 977762
23 1 Cen 2. 105774 333
1. 439774
23 1 Rgt 2. 655097 334
1. 987097
23 1 Bot 2. 622482 335
1. 952482
23 2 Top 4. 066631 336
3. 394631
23 2 Lef 3. 389733 337
2. 715733
23 2 Cen 2. 993666 338
2. 317666
23 2 Rgt 3. 613128 339
2. 935128
23 2 Bot 3. 213809 340
2. 533809
23 3 Top 3. 369665 341
2. 687665
23 3 Lef 2. 566891 342
1. 882891
23 3 Cen 2. 289899 343
1. 603899
23 3 Rgt 2. 517418 344
1. 829418
23 3 Bot 2. 862723 345
2. 172723
24 1 Top 4. 212664 346
3. 520664
24 1 Lef 3. 068342 347
2. 374342
24 1 Cen 2. 872188 348
2. 176188
24 1 Rgt 3. 040890 349
2. 342890
24 1 Bot 3. 376318 350
2. 676318
24 2 Top 3. 223384 351
2. 521384
24 2 Lef 2. 552726 352
1. 848726
24 2 Cen 2. 447344 353
1. 741344
24 2 Rgt 3. 011574 354
2. 303574
24 2 Bot 2. 711774 355
2. 001774
24 3 Top 3. 359505 356
2. 647505
24 3 Lef 2. 800742 357
2. 086742
24 3 Cen 2. 043396 358
1. 327396
24 3 Rgt 2. 929792 359
2. 211792
24 3 Bot 2. 935356 360
2. 215356
25 1 Top 2. 724871 361
2. 002871
25 1 Lef 2. 239013 362
1. 515013
25 1 Cen 2. 341512 363
1. 615512
25 1 Rgt 2. 263617 364
1. 535617
25 1 Bot 2. 062748 365
1. 332748
25 2 Top 3. 658082 366
2. 926082
25 2 Lef 3. 093268 367
2. 359268
25 2 Cen 2. 429341 368
1. 693341
25 2 Rgt 2. 538365 369
1. 800365
25 2 Bot 3. 161795 370
2. 421795
25 3 Top 3. 178246 371
2. 436246
25 3 Lef 2. 498102 372
1. 754102
25 3 Cen 2. 445810 373
1. 699810
25 3 Rgt 2. 231248 374
1. 483248
25 3 Bot 2. 302298 375
1. 552298
26 1 Top 3. 320688 376
2. 568688
26 1 Lef 2. 861800 377
2. 107800
26 1 Cen 2. 238258 378
1. 482258
26 1 Rgt 3. 122050 379
2. 364050
26 1 Bot 3. 160876 380
2. 400876
26 2 Top 3. 873888 381
3. 111888
26 2 Lef 3. 166345 382
2. 402345
26 2 Cen 2. 645267 383
1. 879267
26 2 Rgt 3. 309867 384
2. 541867
26 2 Bot 3. 542882 385
2. 772882
26 3 Top 2. 586453 386
1. 814453
26 3 Lef 2. 120604 387
1. 346604
26 3 Cen 2. 180847 388
1. 404847
26 3 Rgt 2. 480888 389
1. 702888
26 3 Bot 1. 938037 390
1. 158037
27 1 Top 4. 710718 391
3. 928718
27 1 Lef 4. 082083 392
3. 298083
27 1 Cen 3. 533026 393
2. 747026
27 1 Rgt 4. 269929 394
3. 481929
27 1 Bot 4. 038166 395
3. 248166
27 2 Top 4. 237233 396
3. 445233
27 2 Lef 4. 171702 397
3. 377702
27 2 Cen 3. 04394 398
2. 247940
27 2 Rgt 3. 91296 399
3. 114960
27 2 Bot 3. 714229 400
2. 914229
27 3 Top 5. 168668 401
4. 366668
27 3 Lef 4. 823275 402
4. 019275
27 3 Cen 3. 764272 403
2. 958272
27 3 Rgt 4. 396897 404
3. 588897
27 3 Bot 4. 442094 405
3. 632094
28 1 Top 3. 972279 406
3. 160279
28 1 Lef 3. 883295 407
3. 069295
28 1 Cen 3. 045145 408
2. 229145
28 1 Rgt 3. 51459 409
2. 696590
28 1 Bot 3. 575446 410
2. 755446
28 2 Top 3. 024903 411
2. 202903
28 2 Lef 3. 099192 412
2. 275192
28 2 Cen 2. 048139 413
1. 222139
28 2 Rgt 2. 927978 414
2. 099978
28 2 Bot 3. 15257 415
2. 322570
28 3 Top 3. 55806 416
2. 726060
28 3 Lef 3. 176292 417
2. 342292
28 3 Cen 2. 852873 418
2. 016873
28 3 Rgt 3. 026064 419
2. 188064
28 3 Bot 3. 071975 420
2. 231975
29 1 Top 3. 496634 421
2. 654634
29 1 Lef 3. 087091 422
2. 243091
29 1 Cen 2. 517673 423
1. 671673
29 1 Rgt 2. 547344 424
1. 699344
29 1 Bot 2. 971948 425
2. 121948
29 2 Top 3. 371306 426
2. 519306
29 2 Lef 2. 175046 427
1. 321046
29 2 Cen 1. 940111 428
1. 084111
29 2 Rgt 2. 932408 429
2. 074408
29 2 Bot 2. 428069 430
1. 568069
29 3 Top 2. 941041 431
2. 079041
29 3 Lef 2. 294009 432
1. 430009
29 3 Cen 2. 025674 433
1. 159674
29 3 Rgt 2. 21154 434
1. 343540
29 3 Bot 2. 459684 435
1. 589684
30 1 Top 2. 86467 436
1. 992670
30 1 Lef 2. 695163 437
1. 821163
30 1 Cen 2. 229518 438
1. 353518
30 1 Rgt 1. 940917 439
1. 062917
30 1 Bot 2. 547318 440
1. 667318
30 2 Top 3. 537562 441
2. 655562
30 2 Lef 3. 311361 442
2. 427361
30 2 Cen 2. 767771 443
1. 881771
30 2 Rgt 3. 388622 444
2. 500622
30 2 Bot 3. 542701 445
2. 652701
30 3 Top 3. 184652 446
2. 292652
30 3 Lef 2. 620947 447
1. 726947
30 3 Cen 2. 697619 448
1. 801619
30 3 Rgt 2. 860684 449
1. 962684
30 3 Bot 2. 758571 450
1. 858571
6.6.1.2. Graphical Representation of the Data
The first step in analyzing the data is to generate some
simple plots of the response and then of the response versus
the various factors.
4-Plot of
Data
Interpretation
This 4-plot shows the following.
1. The run sequence plot (upper left) indicates that the
location and scale are not constant over time. This
indicates that the three factors do in fact have an
effect of some kind.
2. The lag plot (upper right) indicates that there is some
mild autocorrelation in the data. This is not
unexpected as the data are grouped in a logical order
of the three factors (i.e., not randomly) and the run
sequence plot indicates that there are factor effects.
3. The histogram (lower left) shows that most of the
data fall between 1 and 5, with the center of the data
at about 2.2.
4. Due to the non-constant location and scale and
autocorrelation in the data, distributional inferences
from the normal probability plot (lower right) are not
meaningful.
The run sequence plot is shown at full size to show greater
detail. In addition, a numerical summary of the data is
generated.
Run
Sequence
Plot of Data
Numerical
Summary
Sample size     =  450
Mean            =  2.53228
Median          =  2.45334
Minimum         =  0.74655
Maximum         =  5.16867
Range           =  4.42212
Stan. Dev.      =  0.69376
Autocorrelation =  0.60726
We are primarily interested in the mean and standard
deviation. From the summary, we see that the mean is 2.53
and the standard deviation is 0.69.
Plot response
against
individual
factors
The next step is to plot the response against each individual
factor. For comparison, we generate both a scatter plot and
a box plot of the data. The scatter plot shows more detail.
However, comparisons are usually easier to see with the box plot, particularly as the number of data points and groups becomes larger.
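Assuming the data have been read into a data frame named lwidth with columns Cassette, Wafer, Site, and Width (names chosen here for illustration, not the handbook's), the scatter and box plots can be sketched with base R graphics:

    # Sketch (base R graphics); "lwidth" is an assumed data frame name
    plot(lwidth$Cassette, lwidth$Width, xlab = "Cassette", ylab = "Line Width")
    boxplot(Width ~ Cassette, data = lwidth, xlab = "Cassette", ylab = "Line Width")
    boxplot(Width ~ Wafer,    data = lwidth, xlab = "Wafer",    ylab = "Line Width")
    boxplot(Width ~ Site,     data = lwidth, xlab = "Site",     ylab = "Line Width")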
Scatter plot
of width
versus
cassette
Box plot of
width versus
cassette
Interpretation
We can make the following conclusions based on the above
scatter and box plots.
1. There is considerable variation in the location for the
various cassettes. The medians vary from about 1.7 to
4.
2. There is also some variation in the scale.
3. There are a number of outliers.
Scatter plot
of width
versus wafer
Box plot of
width versus
wafer
Interpretation
We can make the following conclusions based on the above
scatter and box plots.
1. The locations for the three wafers are relatively
constant.
2. The scales for the three wafers are relatively constant.
3. There are a few outliers on the high side.
4. It is reasonable to treat the wafer factor as
homogeneous.
Scatter plot
of width
versus site
Box plot of
width versus
site
Interpretation
We can make the following conclusions based on the above
scatter and box plots.
1. There is some variation in location based on site. The
center site in particular has a lower median.
2. The scales are relatively constant across sites.
3. There are a few outliers.
DOE mean
and sd plots
We can use the DOE mean plot and the DOE standard
deviation plot to show the factor means and standard
deviations together for better comparison.
DOE mean
plot
DOE sd plot
Summary
The above graphs show that there are differences between
the lots and the sites.
There are various ways we can create subgroups of this
dataset: each lot could be a subgroup, each wafer could be
a subgroup, or each site measured could be a subgroup
(with only one data value in each subgroup).
Recall that for a classical Shewhart means chart, the
average within subgroup standard deviation is used to
calculate the control limits for the means chart. However,
with a means chart you are monitoring the subgroup mean-
to-mean variation. There is no problem if you are in a
continuous processing situation - this becomes an issue if
you are operating in a batch processing environment.
We will look at various control charts based on different
subgroupings in 6.6.1.3.
6.6.1.3. Subgroup Analysis
Control
charts for
subgroups
The resulting classical Shewhart control charts for each
possible subgroup are shown below.
Site as
subgroup
The first pair of control charts use the site as the subgroup.
However, since site has a subgroup size of one we use the
control charts for individual measurements. A moving
average and a moving range chart are shown.
Moving
average
control chart
Moving
range control
chart
Wafer as
subgroup
The next pair of control charts use the wafer as the
subgroup. In this case, the subgroup size is five. A mean
and a standard deviation control chart are shown.
Mean control
chart
SD control
chart
There is no LCL for the standard deviation chart because of
the small subgroup size.
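If the contributed qcc package is available (an assumption; it is not part of base R), the site- and wafer-level charts above can be sketched as follows, with lwidth again the assumed data frame from the earlier plotting sketch; the cassette-as-subgroup charts in the next paragraph are produced analogously by grouping on Cassette only.

    # Sketch using the contributed qcc package (install.packages("qcc") if needed)
    library(qcc)
    # Site as subgroup (subgroup size 1): chart for individual measurements
    qcc(lwidth$Width, type = "xbar.one")
    # Wafer as subgroup (subgroup size 5): mean and standard deviation charts
    wafer.groups <- qcc.groups(lwidth$Width, paste(lwidth$Cassette, lwidth$Wafer))
    qcc(wafer.groups, type = "xbar")
    qcc(wafer.groups, type = "S")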
Cassette as
subgroup
The next pair of control charts use the cassette as the
subgroup. In this case, the subgroup size is 15. A mean and
a standard deviation control chart are shown.
Mean control
chart
SD control
chart
Interpretation
Which of these subgroupings of the data is correct? As you can see, each subgrouping produces a different chart. Part of the answer lies in the manufacturing requirements for this process. Another aspect that can be statistically determined is the magnitude of each of the sources of variation. In order to understand our data structure and how much variation each of our sources contributes, we need to perform a variance component analysis. The variance component analysis for this data set is shown below.
Component     Variance Component Estimate
Cassette                0.2645
Wafer                   0.0500
Site                    0.1755
Variance
Component
Estimation
If your software does not generate the variance components
directly, they can be computed from a standard analysis of
variance output by equating mean squares (MS) to expected
mean squares (EMS).
The sum of squares and mean squares for a nested, random
effects model are shown below.
                            Degrees of      Sum of         Mean
Source                      Freedom         Squares        Squares
-------------------------------------------------------------------
Cassette                        29          127.40293      4.3932
Wafer(Cassette)                 60           25.52089      0.4253
Site(Cassette,Wafer)           360           63.17865      0.1755
The expected mean squares for cassette, wafer within
cassette, and site within cassette and wafer, along with their
associated mean squares, are the following.
4.3932 = (3*5)*Var(cassettes) + 5*Var(wafer) + Var(site)
0.4253 = 5*Var(wafer) + Var(site)
0.1755 = Var(site)
Solving these equations, we obtain the variance component
estimates 0.2645, 0.04997, and 0.1755 for cassettes, wafers,
and sites, respectively.
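The back-substitution can be written out in a couple of lines of R (a sketch of the arithmetic above, not the handbook's own code):

    # Solve the expected-mean-square equations for the variance components
    ms.cassette <- 4.3932
    ms.wafer    <- 0.4253
    ms.site     <- 0.1755
    var.site     <- ms.site                             # 0.1755
    var.wafer    <- (ms.wafer - ms.site) / 5            # ~0.04997
    var.cassette <- (ms.cassette - ms.wafer) / (3 * 5)  # ~0.2645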
All of the analyses in this section can be completed using R
code.
6.6.1.4. Shewhart Control Chart
Choosing
the right
control
charts to
monitor
the
process
The largest source of variation in this data is the lot-to-lot
variation. So, using classical Shewhart methods, if we specify
our subgroup to be anything other than lot, we will be ignoring
the known lot-to-lot variation and could get out-of-control
points that already have a known, assignable cause - the data
comes from different lots. However, in the lithography
processing area the measurements of most interest are the site
level measurements, not the lot means. How can we get
around this seeming contradiction?
Chart
sources of
variation
separately
One solution is to chart the important sources of variation
separately. We would then be able to monitor the variation of
our process and truly understand where the variation is coming
from and if it changes. For this dataset, this approach would
require having two sets of control charts, one for the
individual site measurements and the other for the lot means.
This would double the number of charts necessary for this
process (we would have 4 charts for line width instead of 2).
Chart only
most
important
source of
variation
Another solution would be to have one chart on the largest
source of variation. This would mean we would have one set
of charts that monitor the lot-to-lot variation. From a
manufacturing standpoint, this would be unacceptable.
Use
boxplot
type chart
We could create a non-standard chart that would plot all the
individual data values and group them together in a boxplot
type format by lot. The control limits could be generated to
monitor the individual data values while the lot-to-lot variation
would be monitored by the patterns of the groupings. This
would take special programming and management intervention
to implement non-standard charts in most shop floor control systems.
Alternate
form for
mean
control
chart
A commonly applied solution is the first option; have multiple
charts on this process. When creating the control limits for the
lot means, care must be taken to use the lot-to-lot variation
instead of the within lot variation. The resulting control charts
are: the standard individuals/moving range charts (as seen
previously), and a control chart on the lot means that is
different from the previous lot means chart. This new chart
uses the lot-to-lot variation to calculate control limits instead
of the average within-lot standard deviation. The
accompanying standard deviation chart is the same as seen
previously.
Mean
control
chart
using lot-
to-lot
variation
The control limits labeled with "UCL" and "LCL" are the
standard control limits. The control limits labeled with "UCL:
LL" and "LCL: LL" are based on the lot-to-lot variation.
6.6.1.5. Work This Example Yourself
View
Dataplot
Macro for
this Case
Study
This page allows you to repeat the analysis outlined in the case study description on the previous page using Dataplot. It is required that you have already downloaded and installed Dataplot and configured your browser to run Dataplot. Output from each analysis step below will be displayed in one or more of the Dataplot windows. The four main windows are the Output window, the Graphics window, the Command History window, and the Data Sheet window. Across the top of the main windows there are menus for executing Dataplot commands. Across the bottom is a command entry window where commands can be typed in.
Data Analysis Steps

Click on the links below to start Dataplot and run this case study yourself. Each step may use results from previous steps, so please be patient. Wait until the software verifies that the current step is complete before clicking on the next step.

Results and Conclusions

The links under "Results and Conclusions" will connect you with more detailed information about each analysis step from the case study description.
1. Invoke Dataplot and read data.

   1. Read in the data.

   Results and Conclusions:

   1. You have read 5 columns of numbers into Dataplot, variables
      CASSETTE, WAFER, SITE, WIDTH, and RUNSEQ.

2. Plot of the response variable.

   1. Numerical summary of WIDTH.
   2. 4-plot of WIDTH.
   3. Run sequence plot of WIDTH.

   Results and Conclusions:

   1. The summary shows the mean line width is 2.53 and the standard
      deviation of the line width is 0.69.
   2. The 4-plot shows non-constant location and scale and moderate
      autocorrelation.
   3. The run sequence plot shows non-constant location and scale.

3. Generate scatter and box plots against individual factors.

   1. Scatter plot of WIDTH versus CASSETTE.
   2. Box plot of WIDTH versus CASSETTE.
   3. Scatter plot of WIDTH versus WAFER.
   4. Box plot of WIDTH versus WAFER.
   5. Scatter plot of WIDTH versus SITE.
   6. Box plot of WIDTH versus SITE.
   7. DOE mean plot of WIDTH versus CASSETTE, WAFER, and SITE.
   8. DOE sd plot of WIDTH versus CASSETTE, WAFER, and SITE.

   Results and Conclusions:

   1. The scatter plot shows considerable variation in location.
   2. The box plot shows considerable variation in location and scale
      and the presence of some outliers.
   3. The scatter plot shows minimal variation in location and scale.
   4. The box plot shows minimal variation in location and scale.
      It also shows some outliers.
   5. The scatter plot shows some variation in location.
   6. The box plot shows some variation in location. Scale seems
      relatively constant. Some outliers.
   7. The DOE mean plot shows effects for CASSETTE and SITE, no effect
      for WAFER.
   8. The DOE sd plot shows effects for CASSETTE and SITE, no effect
      for WAFER.

4. Subgroup analysis.

   1. Generate a moving mean control chart.
   2. Generate a moving range control chart.
   3. Generate a mean control chart for WAFER.
   4. Generate a sd control chart for WAFER.
   5. Generate a mean control chart for CASSETTE.
   6. Generate a sd control chart for CASSETTE.
   7. Generate an analysis of variance. This is not currently
      implemented in DATAPLOT for nested datasets.
   8. Generate a mean control chart using lot-to-lot variation.

   Results and Conclusions:

   1. The moving mean plot shows a large number of out-of-control points.
   2. The moving range plot shows a large number of out-of-control points.
   3. The mean control chart shows a large number of out-of-control points.
   4. The sd control chart shows no out-of-control points.
   5. The mean control chart shows a large number of out-of-control points.
   6. The sd control chart shows no out-of-control points.
   7. The analysis of variance and components of variance calculations
      show that cassette to cassette variation is 54% of the total and
      site to site variation is 36% of the total.
   8. The mean control chart shows one point that is on the boundary of
      being out of control.
6.6.2. Aerosol Particle Size
Box-
Jenkins
Modeling
of Aerosol
Particle
Size
This case study illustrates the use of Box-Jenkins modeling
with aerosol particle size data.
1. Background and Data
2. Model Identification
3. Model Estimation
4. Model Validation
5. Work This Example Yourself
6.6.2.1. Background and Data
Data Source
The source of the data for this case study is Antuan Negiz of the Illinois Institute of Technology, who analyzed these data while he was a post-doc in the NIST Statistical Engineering Division.
Data
Collection
These data were collected from an aerosol mini-spray dryer
device. The purpose of this device is to convert a slurry
stream into deposited particles in a drying chamber. The
device injects the slurry at high speed. The slurry is
pulverized as it enters the drying chamber when it comes into
contact with a hot gas stream at low humidity. The liquid
contained in the pulverized slurry particles is vaporized, then
transferred to the hot gas stream leaving behind dried small-
sized particles.
The response variable is particle size, which is collected
equidistant in time. There are a variety of associated
variables that may affect the injection process itself and
hence the size and quality of the deposited particles. For this
case study, we restrict our analysis to the response variable.
Applications
Such deposition process operations have many applications, from powdered laundry detergents at one extreme to ceramic molding at the other. In ceramic molding, the distribution and homogeneity of the particle sizes are particularly important because, after the molds are baked and cured, the properties of the final molded ceramic product are strongly affected by the intermediate uniformity of the base ceramic particles, which in turn is directly reflective of the quality of the initial atomization process in the aerosol injection device.
Aerosol
Particle
Size
Dynamic
Modeling
and Control
The data set consists of particle sizes collected over time.
The basic distributional properties of this process are of
interest in terms of distributional shape, constancy of size,
and variation in size. In addition, this time series may be
examined for autocorrelation structure to determine a
prediction model of particle size as a function of time--such
a model is frequently autoregressive in nature. Such a high-
quality prediction equation would be essential as a first step
in developing a predictor-corrective recursive feedback
6.6.2.1. Background and Data
http://www.itl.nist.gov/div898/handbook/pmc/section6/pmc621.htm[4/17/2013 7:13:08 PM]
mechanism which would serve as the core in developing and
implementing real-time dynamic corrective algorithms. The
net effect of such algorithms is, of course, a particle size
distribution that is much less variable, much more stable in
nature, and of much higher quality. All of this results in final
ceramic mold products that are more uniform and predictable
across a wide range of important performance
characteristics.
For the purposes of this case study, we restrict the analysis to
determining an appropriate Box-Jenkins model of the particle
size.
Software The analyses used in this case study can be generated using
both Dataplot code and R code.
Case study
data
115. 36539
114. 63150
114. 63150
116. 09940
116. 34400
116. 09940
116. 34400
116. 83331
116. 34400
116. 83331
117. 32260
117. 07800
117. 32260
117. 32260
117. 81200
117. 56730
118. 30130
117. 81200
118. 30130
117. 81200
118. 30130
118. 30130
118. 54590
118. 30130
117. 07800
116. 09940
118. 30130
118. 79060
118. 05661
118. 30130
118. 54590
118. 30130
118. 54590
118. 05661
118. 30130
118. 54590
118. 30130
118. 30130
118. 30130
118. 30130
118. 05661
118. 30130
117. 81200
118. 30130
117. 32260
117. 32260
117. 56730
117. 81200
117. 56730
117. 81200
117. 81200
117. 32260
116. 34400
116. 58870
116. 83331
116. 58870
116. 83331
116. 83331
117. 32260
116. 34400
116. 09940
115. 61010
115. 61010
115. 61010
115. 36539
115. 12080
115. 61010
115. 85471
115. 36539
115. 36539
115. 36539
115. 12080
114. 87611
114. 87611
115. 12080
114. 87611
114. 87611
114. 63150
114. 63150
114. 14220
114. 38680
114. 14220
114. 63150
114. 87611
114. 38680
114. 87611
114. 63150
114. 14220
114. 14220
113. 89750
114. 14220
113. 89750
113. 65289
113. 65289
113. 40820
113. 40820
112. 91890
113. 40820
112. 91890
113. 40820
113. 89750
113. 40820
113. 65289
113. 89750
113. 65289
113. 65289
113. 89750
113. 65289
113. 16360
114. 14220
114. 38680
113. 65289
113. 89750
113. 89750
113. 40820
113. 65289
113. 89750
113. 65289
113. 65289
114. 14220
114. 38680
114. 63150
115. 61010
115. 12080
114. 63150
114. 38680
113. 65289
113. 40820
113. 40820
113. 16360
113. 16360
113. 16360
113. 16360
113. 16360
112. 42960
113. 40820
113. 40820
113. 16360
113. 16360
113. 16360
113. 16360
111. 20631
112. 67420
112. 91890
112. 67420
112. 91890
113. 16360
112. 91890
112. 67420
112. 91890
112. 67420
112. 91890
113. 16360
112. 67420
112. 67420
112. 91890
113. 16360
112. 67420
112. 91890
111. 20631
113. 40820
112. 91890
112. 67420
113. 16360
113. 65289
113. 40820
114. 14220
114. 87611
114. 87611
116. 09940
116. 34400
116. 58870
116. 09940
116. 34400
116. 83331
117. 07800
117. 07800
116. 58870
116. 83331
116. 58870
116. 34400
116. 83331
116. 83331
117. 07800
116. 58870
116. 58870
117. 32260
116. 83331
118. 79060
116. 83331
117. 07800
116. 58870
116. 83331
116. 34400
116. 58870
116. 34400
116. 34400
116. 34400
116. 09940
116. 09940
116. 34400
115. 85471
115. 85471
115. 85471
115. 61010
115. 61010
115. 61010
115. 36539
115. 12080
115. 61010
115. 85471
115. 12080
115. 12080
114. 87611
114. 87611
114. 38680
114. 14220
114. 14220
114. 38680
114. 14220
114. 38680
114. 38680
114. 38680
114. 38680
114. 38680
114. 14220
113. 89750
114. 14220
113. 65289
113. 16360
112. 91890
112. 67420
112. 42960
112. 42960
112. 42960
112. 18491
112. 18491
112. 42960
112. 18491
112. 42960
111. 69560
112. 42960
112. 42960
111. 69560
111. 94030
112. 18491
112. 18491
112. 18491
111. 94030
111. 69560
111. 94030
111. 94030
112. 42960
112. 18491
112. 18491
111. 94030
112. 18491
112. 18491
111. 20631
111. 69560
111. 69560
111. 69560
111. 94030
111. 94030
112. 18491
111. 69560
112. 18491
111. 94030
111. 69560
112. 18491
110. 96170
111. 69560
111. 20631
111. 20631
111. 45100
110. 22771
109. 98310
110. 22771
110. 71700
110. 22771
111. 20631
111. 45100
111. 69560
112. 18491
112. 18491
112. 18491
112. 42960
112. 67420
112. 18491
112. 42960
112. 18491
112. 91890
112. 18491
112. 42960
111. 20631
112. 42960
112. 42960
112. 42960
112. 42960
113. 16360
112. 18491
112. 91890
112. 91890
112. 67420
112. 42960
112. 42960
112. 42960
112. 91890
113. 16360
112. 67420
113. 16360
112. 91890
112. 42960
112. 67420
112. 91890
112. 18491
112. 91890
113. 16360
112. 91890
112. 91890
112. 91890
112. 67420
112. 42960
112. 42960
113. 16360
112. 91890
112. 67420
113. 16360
112. 91890
113. 16360
112. 91890
112. 67420
112. 91890
112. 67420
112. 91890
112. 91890
112. 91890
113. 16360
112. 91890
112. 91890
112. 18491
112. 42960
112. 42960
112. 18491
112. 91890
112. 67420
112. 42960
112. 42960
112. 18491
112. 42960
112. 67420
112. 42960
112. 42960
112. 18491
112. 67420
112. 42960
112. 42960
112. 67420
112. 42960
112. 42960
112. 42960
112. 67420
112. 91890
113. 40820
113. 40820
113. 40820
112. 91890
112. 67420
112. 67420
112. 91890
113. 65289
113. 89750
114. 38680
114. 87611
114. 87611
115. 12080
115. 61010
115. 36539
115. 61010
115. 85471
116. 09940
116. 83331
116. 34400
116. 58870
116. 58870
116. 34400
116. 83331
116. 83331
116. 83331
117. 32260
116. 83331
117. 32260
117. 56730
117. 32260
117. 07800
117. 32260
117. 81200
117. 81200
117. 81200
118. 54590
118. 05661
118. 05661
117. 56730
117. 32260
117. 81200
118. 30130
118. 05661
118. 54590
118. 05661
118. 30130
118. 05661
118. 30130
118. 30130
118. 30130
118. 05661
117. 81200
117. 32260
118. 30130
118. 30130
117. 81200
117. 07800
118. 05661
117. 81200
117. 56730
117. 32260
117. 32260
117. 81200
117. 32260
117. 81200
117. 07800
117. 32260
116. 83331
117. 07800
116. 83331
116. 83331
117. 07800
115. 12080
116. 58870
116. 58870
116. 34400
115. 85471
116. 34400
116. 34400
115. 85471
116. 58870
116. 34400
115. 61010
115. 85471
115. 61010
115. 85471
115. 12080
115. 61010
115. 61010
115. 85471
115. 61010
115. 36539
114. 87611
114. 87611
114. 63150
114. 87611
115. 12080
114. 63150
114. 87611
115. 12080
114. 63150
114. 38680
114. 38680
114. 87611
114. 63150
114. 63150
114. 63150
114. 63150
114. 63150
114. 14220
113. 65289
113. 65289
113. 89750
113. 65289
113. 40820
113. 40820
113. 89750
113. 89750
113. 89750
113. 65289
113. 65289
113. 89750
113. 40820
113. 40820
113. 65289
113. 89750
113. 89750
114. 14220
113. 65289
113. 40820
113. 40820
113. 65289
113. 40820
114. 14220
113. 89750
114. 14220
113. 65289
113. 65289
113. 65289
113. 89750
113. 16360
113. 16360
113. 89750
113. 65289
113. 16360
113. 65289
113. 40820
112. 91890
113. 16360
113. 16360
113. 40820
113. 40820
113. 65289
113. 16360
113. 40820
113. 16360
113. 16360
112. 91890
112. 91890
112. 91890
113. 65289
113. 65289
113. 16360
112. 91890
112. 67420
113. 16360
112. 91890
112. 67420
112. 91890
112. 91890
112. 91890
111. 20631
112. 91890
113. 16360
112. 42960
112. 67420
113. 16360
112. 42960
112. 67420
112. 91890
112. 67420
111. 20631
112. 42960
112. 67420
112. 42960
113. 16360
112. 91890
112. 67420
112. 91890
112. 42960
112. 67420
112. 18491
112. 91890
112. 42960
112. 18491
6.6.2.2. Model Identification
Check for
Stationarity,
Outliers,
Seasonality
The first step in the analysis is to generate a run sequence
plot of the response variable. A run sequence plot can
indicate stationarity (i.e., constant location and scale), the
presence of outliers, and seasonal patterns.
Non-stationarity can often be removed by differencing the
data or fitting some type of trend curve. We would then
attempt to fit a Box-Jenkins model to the differenced data or
to the residuals after fitting a trend curve.
Although Box-Jenkins models can estimate seasonal
components, the analyst needs to specify the seasonal period
(for example, 12 for monthly data). Seasonal components are
common for economic time series. They are less common for
engineering and scientific data.
Run Sequence
Plot
Interpretation
of the Run
Sequence Plot
We can make the following conclusions from the run
sequence plot.
1. The data show strong and positive autocorrelation.
2. There does not seem to be a significant trend or any
obvious seasonal pattern in the data.
The next step is to examine the sample autocorrelations using
the autocorrelation plot.
Autocorrelation
Plot
Interpretation
of the
Autocorrelation
Plot
The autocorrelation plot has a 95% confidence band, which
is constructed based on the assumption that the process is a
moving average process. The autocorrelation plot shows that
the sample autocorrelations are very strong and positive and
decay very slowly.
The autocorrelation plot indicates that the process is non-
stationary and suggests an ARIMA model. The next step is to
difference the data.
Run Sequence
Plot of
Differenced
Data
Interpretation
of the Run
Sequence Plot
The run sequence plot of the differenced data shows that the
mean of the differenced data is around zero, with the
differenced data less autocorrelated than the original data.
The next step is to examine the sample autocorrelations of
the differenced data.
Autocorrelation
Plot of the
Differenced
Data
Interpretation
of the
Autocorrelation
Plot of the
Differenced
Data
The autocorrelation plot of the differenced data with a 95%
confidence band shows that only the autocorrelation at lag 1
is significant. The autocorrelation plot together with run
sequence of the differenced data suggest that the differenced
data are stationary. Based on the autocorrelation plot, an
MA(1) model is suggested for the differenced data.
To examine other possible models, we produce the partial
autocorrelation plot of the differenced data.
Partial
Autocorrelation
Plot of the
Differenced
Data
Interpretation
of the Partial
Autocorrelation
Plot of the
Differenced
Data
The partial autocorrelation plot of the differenced data with
95% confidence bands shows that only the partial
autocorrelations of the first and second lag are significant.
This suggests an AR(2) model for the differenced data.
Akaike
Information
Criterion (AIC
and AICC)
Information-based criteria, such as the AIC or AICC (see
Brockwell and Davis (2002), pp. 171-174), can be used to
automate the choice of an appropriate model. Many software
programs for time series analysis will generate the AIC or
AICC for a broad range of models.
Whatever method is used for model identification, model
diagnostics should be performed on the selected model.
Based on the plots in this section, we will examine the
ARIMA(2,1,0) and ARIMA(0,1,1) models in detail.
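As an illustration of this identification step with general-purpose software, the following sketch assumes the Python statsmodels library and a hypothetical simulated series y (not the aerosol particle size data); it examines the sample ACF and PACF of the differenced data and compares the two candidate models by AIC.

    # A sketch of ARIMA model identification with statsmodels (hypothetical data).
    import numpy as np
    from statsmodels.tsa.stattools import acf, pacf
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(0)
    y = np.cumsum(rng.normal(size=500))   # hypothetical non-stationary series

    # Sample autocorrelations and partial autocorrelations of the differenced data.
    dy = np.diff(y)
    print("ACF :", np.round(acf(dy, nlags=5), 3))
    print("PACF:", np.round(pacf(dy, nlags=5), 3))

    # Compare candidate models with an information criterion.
    for order in [(2, 1, 0), (0, 1, 1)]:
        fit = ARIMA(y, order=order).fit()
        print(order, "AIC =", round(fit.aic, 2))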
6. Process or Product Monitoring and Control
6.6. Case Studies in Process Monitoring
6.6.2. Aerosol Particle Size
6.6.2.3. Model Estimation
AR(2)
Model
Parameter
Estimates
The following parameter estimates were computed for the AR(2) model based on the
differenced data.
              Parameter   Standard     95 % Confidence
Source        Estimate    Error        Interval
-------------------------------------------------------
Intercept     -0.0050     0.0119
AR1           -0.4064     0.0419       (-0.4884, -0.3243)
AR2           -0.1649     0.0419       (-0.2469, -0.0829)

Number of Observations:        558
Degrees of Freedom:            558 - 3 = 555
Residual Standard Deviation:   0.4423
Both AR parameters are significant since the confidence intervals do not contain zero.
The model for the differenced data, Y_t, is an AR(2) model:

    Y_t = -0.0050 - 0.4064 Y_{t-1} - 0.1649 Y_{t-2} + a_t

with σ = 0.4423.
It is often more convenient to express the model in terms of the original data, X_t, rather
than the differenced data. From the definition of the difference, Y_t = X_t - X_{t-1}, we can
make the appropriate substitutions into the above equation:

    X_t - X_{t-1} = -0.0050 - 0.4064 (X_{t-1} - X_{t-2}) - 0.1649 (X_{t-2} - X_{t-3}) + a_t

to arrive at the model in terms of the original series:

    X_t = -0.0050 + 0.5936 X_{t-1} + 0.2415 X_{t-2} + 0.1649 X_{t-3} + a_t
MA(1)
Model
Parameter
Estimates
Alternatively, the parameter estimates for an MA(1) model based on the differenced data
are the following.
              Parameter   Standard     95 % Confidence
Source        Estimate    Error        Interval
-------------------------------------------------------
Intercept     -0.0051     0.0114
MA1           -0.3921     0.0366       (-0.4638, -0.3205)

Number of Observations:        558
Degrees of Freedom:            558 - 2 = 556
Residual Standard Deviation:   0.4434
The model for the differenced data, Y_t, is an ARIMA(0,1,1) model:

    Y_t = -0.0051 + a_t - 0.3921 a_{t-1}

with σ = 0.4434.

It is often more convenient to express the model in terms of the original data, X_t, rather
than the differenced data. Making the appropriate substitution, Y_t = X_t - X_{t-1}, into the
above equation, we arrive at the model in terms of the original series:

    X_t = X_{t-1} - 0.0051 + a_t - 0.3921 a_{t-1}
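A minimal sketch of the same estimation step, assuming the Python statsmodels library and a hypothetical series x standing in for the particle size data; note that sign conventions for AR and MA coefficients can differ between software packages, so estimates should be compared against the parameterization documented for the package used.

    # A sketch of fitting the two candidate models with statsmodels (hypothetical data).
    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(1)
    x = np.cumsum(rng.normal(size=600))   # hypothetical series

    for order in [(2, 1, 0), (0, 1, 1)]:
        res = ARIMA(x, order=order).fit()
        # The summary lists the AR/MA estimates, standard errors, and sigma^2;
        # the residual standard deviation is the square root of sigma^2.
        print(res.summary())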
6. Process or Product Monitoring and Control
6.6. Case Studies in Process Monitoring
6.6.2. Aerosol Particle Size
6.6.2.4. Model Validation
Residuals After fitting the model, we should check whether the model
is appropriate.
As with standard non-linear least squares fitting, the primary
tool for model diagnostic checking is residual analysis.
4-Plot of
Residuals from
ARIMA(2,1,0)
Model
The 4-plot is a convenient graphical technique for model
validation in that it tests the assumptions for the residuals on
a single graph.
Interpretation
of the 4-Plot
We can make the following conclusions based on the above
4-plot.
1. The run sequence plot shows that the residuals do not
violate the assumption of constant location and scale. It
also shows that most of the residuals are in the range (-
1, 1).
2. The lag plot indicates that the residuals are not
autocorrelated at lag 1.
3. The histogram and normal probability plot indicate that
the normal distribution provides an adequate fit for this
model.
Autocorrelation
Plot of
Residuals from
ARIMA(2,1,0)
Model
In addition, the autocorrelation plot of the residuals from the
ARIMA(2,1,0) model was generated.
Interpretation
of the
Autocorrelation
Plot
The autocorrelation plot shows that for the first 25 lags, all
sample autocorrelations except those at lags 7 and 18 fall
inside the 95 % confidence bounds, indicating that the residuals
appear to be random.
Test the
Randomness of
Residuals From
the
ARIMA(2,1,0)
Model Fit
We apply the Box-Ljung test to the residuals from the
ARIMA(2,1,0) model fit to determine whether the residuals are
random. In this example, the Box-Ljung test does not reject the
hypothesis that the first 24 lag autocorrelations among the
residuals are zero (p-value = 0.080), indicating that the residuals
are random and that the model provides an adequate fit to the data.
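A minimal sketch of the Box-Ljung (Ljung-Box) calculation, assuming the Python statsmodels library and simulated stand-in residuals rather than the actual ARIMA(2,1,0) residuals:

    # A sketch of the Ljung-Box test with statsmodels (simulated stand-in residuals).
    import numpy as np
    from statsmodels.stats.diagnostic import acorr_ljungbox

    rng = np.random.default_rng(2)
    resid = rng.normal(size=558)          # stand-in for the ARIMA(2,1,0) residuals

    # Test the first 24 residual autocorrelations jointly.
    print(acorr_ljungbox(resid, lags=[24]))
    # A p-value above the chosen significance level supports randomness of the residuals.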
4-Plot of
Residuals from
ARIMA(0,1,1)
Model
The 4-plot is a convenient graphical technique for model
validation in that it tests the assumptions for the residuals on
a single graph.
Interpretation
of the 4-Plot
from the
ARIMA(0,1,1)
Model
We can make the following conclusions based on the above
4-plot.
1. The run sequence plot shows that the residuals do not
violate the assumption of constant location and scale. It
also shows that most of the residuals are in the range (-
1, 1).
2. The lag plot indicates that the residuals are not
autocorrelated at lag 1.
3. The histogram and normal probability plot indicate that
the normal distribution provides an adequate fit for this
model.
This 4-plot of the residuals indicates that the fitted model is
adequate for the data.
Autocorrelation
Plot of
Residuals from
ARIMA(0,1,1)
Model
The autocorrelation plot of the residuals from ARIMA(0,1,1)
was generated.
Interpretation
of the
Autocorrelation
Plot
Similar to the result for the ARIMA(2,1,0) model, it shows
that for the first 25 lags, all sample autocorrelations except
those at lags 7 and 18 fall inside the 95 % confidence bounds,
indicating that the residuals appear to be random.
Test the
Randomness of
Residuals From
the
ARIMA(0,1,1)
Model Fit
The Box-Ljung test is also applied to the residuals from the
ARIMA(0,1,1) model. The test indicates that there is at least
one non-zero autocorrelation among the first 24 lags. We
conclude that there is not enough evidence to claim that the
residuals are random (p-value = 0.026).
Summary Overall, the ARIMA(0,1,1) is an adequate model. However,
the ARIMA(2,1,0) is a little better than the ARIMA(0,1,1).
6. Process or Product Monitoring and Control
6.6. Case Studies in Process Monitoring
6.6.2. Aerosol Particle Size
6.6.2.5. Work This Example Yourself
View
Dataplot
Macro for
this Case
Study
This page allows you to repeat the analysis outlined in the
case study description on the previous page using Dataplot. It
is required that you have already downloaded and installed
Dataplot and configured your browser to run Dataplot. Output
from each analysis step below will be displayed in one or
more of the Dataplot windows. The four main windows are the
Output Window, the Graphics window, the Command History
window, and the data sheet window. Across the top of the
main windows there are menus for executing Dataplot
commands. Across the bottom is a command entry window
where commands can be typed in.
Data Analysis Steps: Click on the links below to start Dataplot and
run this case study yourself. Each step may use results from previous
steps, so please be patient. Wait until the software verifies that the
current step is complete before clicking on the next step.

Results and Conclusions: The links in this column will connect you
with more detailed information about each analysis step from the
case study description.
1. Invoke Dataplot and read data.
   1. Read in the data.

   Results:
   1. You have read one column of numbers into Dataplot, variable Y.

2. Model identification plots.
   1. Run sequence plot of Y.
   2. Autocorrelation plot of Y.
   3. Run sequence plot of the differenced data of Y.
   4. Autocorrelation plot of the differenced data of Y.
   5. Partial autocorrelation plot of the differenced data of Y.

   Results:
   1. The run sequence plot shows that the data show strong and
      positive autocorrelation.
   2. The autocorrelation plot indicates significant autocorrelation
      and that the data are not stationary.
   3. The run sequence plot shows that the differenced data appear
      to be stationary and do not exhibit seasonality.
   4. The autocorrelation plot of the differenced data suggests an
      ARIMA(0,1,1) model may be appropriate.
   5. The partial autocorrelation plot suggests an ARIMA(2,1,0)
      model may be appropriate.

3. Estimate the model.
   1. ARIMA(2,1,0) fit of Y.
   2. ARIMA(0,1,1) fit of Y.

   Results:
   1. The ARMA fit generates parameter estimates for the ARIMA(2,1,0) model.
   2. The ARMA fit generates parameter estimates for the ARIMA(0,1,1) model.

4. Model validation.
   1. Generate a 4-plot of the residuals from the ARIMA(2,1,0) model.
   2. Generate an autocorrelation plot of the residuals from the ARIMA(2,1,0) model.
   3. Perform a Ljung-Box test of randomness for the residuals from the ARIMA(2,1,0) model.
   4. Generate a 4-plot of the residuals from the ARIMA(0,1,1) model.
   5. Generate an autocorrelation plot of the residuals from the ARIMA(0,1,1) model.
   6. Perform a Ljung-Box test of randomness for the residuals from the ARIMA(0,1,1) model.

   Results:
   1. The 4-plot shows that the assumptions for the residuals are satisfied.
   2. The autocorrelation plot of the residuals indicates that the residuals are random.
   3. The Ljung-Box test indicates that the residuals are random.
   4. The 4-plot shows that the assumptions for the residuals are satisfied.
   5. The autocorrelation plot of the residuals indicates that the residuals are random.
   6. The Ljung-Box test indicates that the residuals are not random at the 95% level,
      but are random at the 99% level.
6. Process or Product Monitoring and Control
6.7. References
Selected References
Time Series Analysis
Abraham, B. and Ledolter, J. (1983). Statistical Methods for Forecasting,
Wiley, New York, NY.
Box, G. E. P., Jenkins, G. M., and Reinsel, G. C. (1994). Time Series
Analysis, Forecasting and Control, 3rd ed. Prentice Hall, Englewood Cliffs,
NJ.
Box, G. E. P. and McGregor, J. F. (1974). "The Analysis of Closed-Loop
Dynamic Stochastic Systems", Technometrics, Vol. 16-3.
Brockwell, Peter J. and Davis, Richard A. (1987). Time Series: Theory and
Methods, Springer-Verlag.
Brockwell, Peter J. and Davis, Richard A. (2002). Introduction to Time
Series and Forecasting, 2nd ed., Springer-Verlag.
Chatfield, C. (1996). The Analysis of Time Series, 5th ed., Chapman & Hall,
New York, NY.
DeLurgio, S. A. (1998). Forecasting Principles and Applications, Irwin
McGraw-Hill, Boston, MA.
Ljung, G. and Box, G. (1978). "On a Measure of Lack of Fit in Time Series
Models", Biometrika, 65, 297-303.
Nelson, C. R. (1973). Applied Time Series Analysis for Managerial
Forecasting, Holden-Day, Boca-Raton, FL.
Makridakis, S., Wheelwright, S. C. and McGee, V. E. (1983). Forecasting:
Methods and Applications, 2nd ed., Wiley, New York, NY.
Statistical Process and Quality Control
Army Chemical Corps (1953). Master Sampling Plans for Single, Duplicate,
Double and Multiple Sampling, Manual No. 2.
Bissell, A. F. (1990). "How Reliable is Your Capability Index?", Applied
Statistics, 39, 331-340.
Champ, C.W., and Woodall, W.H. (1987). "Exact Results for Shewhart
Control Charts with Supplementary Runs Rules", Technometrics, 29, 393-
399.
Duncan, A. J. (1986). Quality Control and Industrial Statistics, 5th ed.,
Irwin, Homewood, IL.
Hotelling, H. (1947). Multivariate Quality Control. In C. Eisenhart, M. W.
Hastay, and W. A. Wallis, eds. Techniques of Statistical Analysis. New
York: McGraw-Hill.
Juran, J. M. (1997). "Early SQC: A Historical Supplement", Quality
Progress, 30(9), 73-81.
Montgomery, D. C. (2000). Introduction to Statistical Quality Control, 4th
ed., Wiley, New York, NY.
Kotz, S. and Johnson, N. L. (1992). Process Capability Indices, Chapman &
Hall, London.
Lowry, C. A., Woodall, W. H., Champ, C. W., and Rigdon, S. E. (1992). "A
Multivariate Exponentially Weighted Moving Average Chart",
Technometrics, 34, 46-53.
Lucas, J. M. and Saccucci, M. S. (1990). "Exponentially weighted moving
average control schemes: Properties and enhancements", Technometrics 32,
1-29.
Ott, E. R. and Schilling, E. G. (1990). Process Quality Control, 2nd ed.,
McGraw-Hill, New York, NY.
Quesenberry, C. P. (1993). "The effect of sample size on estimated limits for
and X control charts", Journal of Quality Technology, 25(4) 237-247.
Ryan, T.P. (2000). Statistical Methods for Quality Improvement, 2nd ed.,
Wiley, New York, NY.
Ryan, T. P. and Schwertman, N. C. (1997). "Optimal limits for attributes
control charts", Journal of Quality Technology, 29 (1), 86-98.
Schilling, E. G. (1982). Acceptance Sampling in Quality Control, Marcel
Dekker, New York, NY.
Tracy, N. D., Young, J. C. and Mason, R. L. (1992). "Multivariate Control
Charts for Individual Observations", Journal of Quality Technology, 24(2),
88-95.
Woodall, W. H. (1997). "Control Charting Based on Attribute Data:
Bibliography and Review", Journal of Quality Technology, 29, 172-183.
Woodall, W. H., and Adams, B. M. (1993). "The Statistical Design of
CUSUM Charts", Quality Engineering, 5(4), 559-570.
Zhang, Stenback, and Wardrop (1990). "Interval Estimation of the Process
Capability Index", Communications in Statistics: Theory and Methods,
19(21), 4455-4470.
Statistical Analysis
Anderson, T. W. (1984). Introduction to Multivariate Statistical Analysis,
2nd ed., Wiley, New York, NY.
Johnson, R. A. and Wichern, D. W. (1998). Applied Multivariate Statistical
Analysis, Fourth Ed., Prentice Hall, Upper Saddle River, NJ.
7. Product and Process
Comparisons
This chapter presents the background and specific analysis techniques
needed to compare the performance of one or more processes against known
standards or one another.
1. Introduction
1. Scope
2. Assumptions
3. Statistical Tests
4. Confidence Intervals
5. Equivalence of Tests and
Intervals
6. Outliers
7. Trends
2. Comparisons: One Process
1. Comparing to a Distribution
2. Comparing to a Nominal
Mean
3. Comparing to Nominal
Variability
4. Fraction Defective
5. Defect Density
6. Location of Population
Values
3. Comparisons: Two Processes
1. Means: Normal Data
2. Variability: Normal Data
3. Fraction Defective
4. Failure Rates
5. Means: General Case
4. Comparisons: Three +
Processes
1. Comparing Populations
2. Comparing Variances
3. Comparing Means
4. Variance Components
5. Comparing Categorical
Datasets
6. Comparing Fraction
Defectives
7. Multiple Comparisons
Detailed table of contents
References for Chapter 7
7. Product and Process Comparisons
7.1. Introduction
Goals of
this
section
The primary goal of this section is to lay a foundation for
understanding statistical tests and confidence intervals that are
useful for making decisions about processes and comparisons
among processes. The materials covered are:
Scope
Assumptions
Introduction to hypothesis testing
Introduction to confidence intervals
Relationship between hypothesis testing and confidence
intervals
Outlier detection
Detection of sequential trends in data or processes
Hypothesis
testing and
confidence
intervals
This chapter explores the types of comparisons which can be
made from data and explains hypothesis testing, confidence
intervals, and the interpretation of each.
7. Product and Process Comparisons
7.1. Introduction
7.1.1. What is the scope?
Data from
one
process
This section deals with introductory material related to
comparisons that can be made on data from one process for
cases where the process standard deviation may be known or
unknown.
7. Product and Process Comparisons
7.1. Introduction
7.1.2. What assumptions are typically made?
Validity of
tests
The validity of the tests described in this chapter depends
on the following assumptions:
1. The data come from a single process that can be
represented by a single statistical distribution.
2. The distribution is a normal distribution.
3. The data are uncorrelated over time.
An easy method for checking the assumption of a single
normal distribution is to construct a histogram of the data.
Clarification The tests described in this chapter depend on the
assumption of normality, and the data should be examined
for departures from normality before the tests are applied.
However, the tests are robust to small departures from
normality; i.e., they work fairly well as long as the data are
bell-shaped and the tails are not heavy. Quantitative
methods for checking the normality assumption are
discussed in the next section.
Another graphical method for testing the normality
assumption is the normal probability plot.
A graphical method for testing for correlation among
measurements is a time-lag plot. Correlation may not be a
problem if measurements are properly structured over time.
Correlation problems often occur when measurements are
made close together in time.
7. Product and Process Comparisons
7.1. Introduction
7.1.3. What are statistical tests?
What is
meant by a
statistical
test?
A statistical test provides a mechanism for making
quantitative decisions about a process or processes. The
intent is to determine whether there is enough evidence to
"reject" a conjecture or hypothesis about the process. The
conjecture is called the null hypothesis. Not rejecting may be
a good result if we want to continue to act as if we "believe"
the null hypothesis is true. Or it may be a disappointing
result, possibly indicating we may not yet have enough data
to "prove" something by rejecting the null hypothesis.
For more discussion about the meaning of a statistical
hypothesis test, see Chapter 1.
Concept of
null
hypothesis
A classic use of a statistical test occurs in process control
studies. For example, suppose that we are interested in
ensuring that photomasks in a production process have mean
linewidths of 500 micrometers. The null hypothesis, in this
case, is that the mean linewidth is 500 micrometers. Implicit
in this statement is the need to flag photomasks which have
mean linewidths that are either much greater or much less
than 500 micrometers. This translates into the alternative
hypothesis that the mean linewidths are not equal to 500
micrometers. This is a two-sided alternative because it guards
against alternatives in opposite directions; namely, that the
linewidths are too small or too large.
The testing procedure works this way. Linewidths at random
positions on the photomask are measured using a scanning
electron microscope. A test statistic is computed from the
data and tested against pre-determined upper and lower
critical values. If the test statistic is greater than the upper
critical value or less than the lower critical value, the null
hypothesis is rejected because there is evidence that the mean
linewidth is not 500 micrometers.
One-sided
tests of
hypothesis
Null and alternative hypotheses can also be one-sided. For
example, to ensure that a lot of light bulbs has a mean
lifetime of at least 500 hours, a testing program is
implemented. The null hypothesis, in this case, is that the
mean lifetime is greater than or equal to 500 hours. The
complement or alternative hypothesis that is being guarded
against is that the mean lifetime is less than 500 hours. The
test statistic is compared with a lower critical value, and if it
is less than this limit, the null hypothesis is rejected.
Thus, a statistical test requires a pair of hypotheses; namely,

    H_0: a null hypothesis
    H_a: an alternative hypothesis.
Significance
levels
The null hypothesis is a statement about a belief. We may
doubt that the null hypothesis is true, which might be why we
are "testing" it. The alternative hypothesis might, in fact, be
what we believe to be true. The test procedure is constructed
so that the risk of rejecting the null hypothesis, when it is in
fact true, is small. This risk, α, is often referred to as the
significance level of the test. By having a test with a small
value of α, we feel that we have actually "proved" something
when we reject the null hypothesis.
Errors of
the second
kind
The risk of failing to reject the null hypothesis when it is in
fact false is not chosen by the user but is determined, as one
might expect, by the magnitude of the real discrepancy. This
risk, β, is usually referred to as the error of the second kind.
Large discrepancies between reality and the null hypothesis
are easier to detect and lead to small errors of the second
kind; while small discrepancies are more difficult to detect
and lead to large errors of the second kind. Also, the β risk
increases as the α risk decreases. The risks of errors of the
second kind are usually summarized by an operating
characteristic curve (OC) for the test. OC curves for several
types of tests are shown in (Natrella, 1962).
Guidance in
this chapter
This chapter gives methods for constructing test statistics and
their corresponding critical values for both one-sided and
two-sided tests for the specific situations outlined under the
scope. It also provides guidance on the sample sizes required
for these tests.
Further guidance on statistical hypothesis testing,
significance levels and critical regions, is given in Chapter 1.
7. Product and Process Comparisons
7.1. Introduction
7.1.3. What are statistical tests?
7.1.3.1. Critical values and p values
Determination
of critical
values
Critical values for a test of hypothesis depend upon a test
statistic, which is specific to the type of test, and the
significance level, α, which defines the sensitivity of the
test. A value of α = 0.05 implies that the null hypothesis is
rejected 5 % of the time when it is in fact true. The choice
of α is somewhat arbitrary, although in practice values of
0.1, 0.05, and 0.01 are common. Critical values are
essentially cut-off values that define regions where the test
statistic is unlikely to lie; for example, a region where the
critical value is exceeded with probability α if the null
hypothesis is true. The null hypothesis is rejected if the test
statistic lies within this region which is often referred to as
the rejection region(s). Critical values for specific tests of
hypothesis are tabled in chapter 1.
Information in
this chapter
This chapter gives formulas for the test statistics and points
to the appropriate tables of critical values for tests of
hypothesis regarding means, standard deviations, and
proportion defectives.
P values Another quantitative measure for reporting the result of a
test of hypothesis is the p-value. The p-value is the
probability of the test statistic being at least as extreme as
the one observed given that the null hypothesis is true. A
small p-value is an indication that the null hypothesis is
false.
Good practice It is good practice to decide in advance of the test how
small a p-value is required to reject the test. This is exactly
analogous to choosing a significance level, α, for the test. For
example, we decide either to reject the null hypothesis if
the test statistic exceeds the critical value (for α = 0.05) or,
analogously, to reject the null hypothesis if the p-value is
smaller than 0.05. It is important to understand the
relationship between the two concepts because some
statistical software packages report p-values rather than
critical values.
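A short sketch of this relationship, assuming Python with scipy and purely illustrative numbers, compares the critical-value rule and the p-value rule for a two-sided z-test:

    # A sketch comparing the critical-value and p-value decision rules (illustrative numbers).
    from scipy import stats

    alpha = 0.05
    z_obs = 1.78                                    # hypothetical observed test statistic
    z_crit = stats.norm.ppf(1 - alpha / 2)          # two-sided critical value, about 1.96
    p_value = 2 * (1 - stats.norm.cdf(abs(z_obs)))  # two-sided p-value

    print("reject by critical value:", abs(z_obs) > z_crit)
    print("reject by p-value:       ", p_value < alpha)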
7. Product and Process Comparisons
7.1. Introduction
7.1.4. What are confidence intervals?
How do we
form a
confidence
interval?
The purpose of taking a random sample from a lot or
population and computing a statistic, such as the mean from
the data, is to approximate the mean of the population. How
well the sample statistic estimates the underlying population
value is always an issue. A confidence interval addresses this
issue because it provides a range of values which is likely to
contain the population parameter of interest.
Confidence
levels
Confidence intervals are constructed at a confidence level,
such as 95%, selected by the user. What does this mean? It
means that if the same population is sampled on numerous
occasions and interval estimates are made on each occasion,
the resulting intervals would bracket the true population
parameter in approximately 95% of the cases. A confidence
stated at a 1 - α level can be thought of as the inverse of a
significance level, α.
One and
two-sided
confidence
intervals
In the same way that statistical tests can be one or two-sided,
confidence intervals can be one or two-sided. A two-sided
confidence interval brackets the population parameter from
above and below. A one-sided confidence interval brackets
the population parameter either from above or below and
furnishes an upper or lower bound to its magnitude.
Example of
a two-
sided
confidence
interval
For example, a 100(1-α) % confidence interval for the mean
of a normal population is

    Ȳ - z_{1-α/2} σ/√N  ≤  μ  ≤  Ȳ + z_{1-α/2} σ/√N

where Ȳ is the sample mean, z_{1-α/2} is the 1-α/2 critical value
of the standard normal distribution (found in the table of the
standard normal distribution), σ is the known population
standard deviation, and N is the sample size.
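A minimal sketch of this computation, assuming Python with scipy and illustrative values for the sample mean, known standard deviation, and sample size:

    # A sketch of the two-sided interval with known sigma (illustrative numbers).
    import math
    from scipy import stats

    ybar, sigma, N, alpha = 12.5, 2.0, 25, 0.05    # hypothetical summary statistics
    z = stats.norm.ppf(1 - alpha / 2)
    half_width = z * sigma / math.sqrt(N)
    print(f"{ybar - half_width:.2f} <= mu <= {ybar + half_width:.2f}")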
Guidance
in this
chapter
This chapter provides methods for estimating the population
parameters and confidence intervals for the situations
described under the scope.
Problem
with
unknown
standard
deviation
In the normal course of events, population standard deviations
are not known, and must be estimated from the data.
Confidence intervals, given the same confidence level, are by
necessity wider if the standard deviation is estimated from
limited data because of the uncertainty in this estimate.
Procedures for creating confidence intervals in this situation
are described fully in this chapter.
More information on confidence intervals can also be found in
Chapter 1.
7. Product and Process Comparisons
7.1. Introduction
7.1.5. What is the relationship between a test
and a confidence interval?
There is a
correspondence
between
hypothesis
testing and
confidence
intervals
In general, for every test of hypothesis there is an
equivalent statement about whether the hypothesized
parameter value is included in a confidence interval. For
example, consider the previous example of linewidths
where photomasks are tested to ensure that their
linewidths have a mean of 500 micrometers. The null and
alternative hypotheses are:
    H_0: mean linewidth = 500 micrometers
    H_a: mean linewidth ≠ 500 micrometers
Hypothesis test
for the mean
For the test, the sample mean, Ȳ, is calculated from N
linewidths chosen at random positions on each photomask. For
the purpose of the test, it is assumed that the standard
deviation, σ, is known from a long history of this process. A
test statistic is calculated from these sample statistics, and the
null hypothesis is rejected if

    (Ȳ - 500) / (σ/√N)  <  z_{α/2}    or    (Ȳ - 500) / (σ/√N)  >  z_{1-α/2}

where z_{α/2} and z_{1-α/2} are tabled values from the normal
distribution.
Equivalent
confidence
interval
With some algebra, it can be seen that the null hypothesis
is rejected if and only if the value 500 micrometers is not
in the confidence interval

    Ȳ ± z_{1-α/2} σ/√N
Equivalent
confidence
interval
In fact, all values bracketed by this interval would be
accepted as null values for a given set of test data.
7. Product and Process Comparisons
7.1. Introduction
7.1.6. What are outliers in the data?
Definition
of outliers
An outlier is an observation that lies an abnormal distance from other values in a
random sample from a population. In a sense, this definition leaves it up to the
analyst (or a consensus process) to decide what will be considered abnormal. Before
abnormal observations can be singled out, it is necessary to characterize normal
observations.
Ways to
describe
data
Two activities are essential for characterizing a set of data:
1. Examination of the overall shape of the graphed data for important features,
including symmetry and departures from assumptions. The chapter on
Exploratory Data Analysis (EDA) discusses assumptions and summarization
of data in detail.
2. Examination of the data for unusual observations that are far removed from
the mass of data. These points are often referred to as outliers. Two graphical
techniques for identifying outliers, scatter plots and box plots, along with an
analytic procedure for detecting outliers when the distribution is normal
(Grubbs' Test), are also discussed in detail in the EDA chapter.
Box plot
construction
The box plot is a useful graphical display for describing the behavior of the data in
the middle as well as at the ends of the distributions. The box plot uses the median
and the lower and upper quartiles (defined as the 25th and 75th percentiles). If the
lower quartile is Q1 and the upper quartile is Q3, then the difference (Q3 - Q1) is
called the interquartile range or IQ.
Box plots
with fences
A box plot is constructed by drawing a box between the upper and lower quartiles
with a solid line drawn across the box to locate the median. The following quantities
(called fences) are needed for identifying extreme values in the tails of the
distribution:
1. lower inner fence: Q1 - 1.5*IQ
2. upper inner fence: Q3 + 1.5*IQ
3. lower outer fence: Q1 - 3*IQ
4. upper outer fence: Q3 + 3*IQ
Outlier
detection
criteria
A point beyond an inner fence on either side is considered a mild outlier. A point
beyond an outer fence is considered an extreme outlier.
Example of
an outlier
The data set of N = 90 ordered observations as shown below is examined for
outliers:
30, 171, 184, 201, 212, 250, 265, 270, 272, 289, 305, 306, 322, 322, 336, 346, 351,
370, 390, 404, 409, 411, 436, 437, 439, 441, 444, 448, 451, 453, 470, 480, 482, 487,
494, 495, 499, 503, 514, 521, 522, 527, 548, 550, 559, 560, 570, 572, 574, 578, 585,
592, 592, 607, 616, 618, 621, 629, 637, 638, 640, 656, 668, 707, 709, 719, 737, 739,
752, 758, 766, 792, 792, 794, 802, 818, 830, 832, 843, 858, 860, 869, 918, 925, 953,
991, 1000, 1005, 1068, 1441
The computations are as follows:
Median = (n+1)/2 largest data point = the average of the 45th and 46th
ordered points = (559 + 560)/2 = 559.5
Lower quartile = .25(N+1)th ordered point = 22.75th ordered point = 411 +
.75(436-411) = 429.75
Upper quartile = .75(N+1)th ordered point = 68.25th ordered point = 739
+.25(752-739) = 742.25
Interquartile range = 742.25 - 429.75 = 312.5
Lower inner fence = 429.75 - 1.5 (312.5) = -39.0
Upper inner fence = 742.25 + 1.5 (312.5) = 1211.0
Lower outer fence = 429.75 - 3.0 (312.5) = -507.75
Upper outer fence = 742.25 + 3.0 (312.5) = 1679.75
From an examination of the fence points and the data, one point (1441) exceeds the
upper inner fence and stands out as a mild outlier; there are no extreme outliers.
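A minimal sketch of the fence calculations, written in Python and applied to a small hypothetical sample (not the 90-point data set above); the quartile rule follows the 0.25(N+1) and 0.75(N+1) ordered-point convention used in the computations above:

    # A sketch of the fence calculations on a small hypothetical sample.
    def quartiles(sorted_x):
        """Q1 and Q3 from the 0.25(N+1) and 0.75(N+1) ordered-point rule."""
        n = len(sorted_x)

        def at(pos):                        # 1-based, possibly fractional, position
            lo = int(pos)
            frac = pos - lo
            if lo >= n:
                return sorted_x[-1]
            return sorted_x[lo - 1] + frac * (sorted_x[lo] - sorted_x[lo - 1])

        return at(0.25 * (n + 1)), at(0.75 * (n + 1))

    data = sorted([30, 171, 184, 201, 212, 250, 265, 270, 272, 289, 305, 1441])  # hypothetical
    q1, q3 = quartiles(data)
    iq = q3 - q1
    inner = (q1 - 1.5 * iq, q3 + 1.5 * iq)
    outer = (q1 - 3.0 * iq, q3 + 3.0 * iq)
    mild = [x for x in data if (x < inner[0] or x > inner[1]) and outer[0] <= x <= outer[1]]
    extreme = [x for x in data if x < outer[0] or x > outer[1]]
    print("IQ =", iq, " mild outliers:", mild, " extreme outliers:", extreme)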
Histogram
with box
plot
A histogram with an overlaid box plot is shown below.
The outlier is identified as the largest value in the data set, 1441, and appears as the
circle to the right of the box plot.
Outliers
may contain
important
information
Outliers should be investigated carefully. Often they contain valuable information
about the process under investigation or the data gathering and recording process.
Before considering the possible elimination of these points from the data, one should
try to understand why they appeared and whether it is likely similar values will
continue to appear. Of course, outliers are often bad data points.
7. Product and Process Comparisons
7.1. Introduction
7.1.7. What are trends in sequential process or
product data?
Detecting
trends by
plotting
the data
points to
see if a
line with
an
obviously
non-zero
slope fits
the points
Detecting trends is equivalent to comparing the process values
to what we would expect a series of numbers to look like if
there were no trends. If we see a significant departure from a
model where the next observation is equally likely to go up or
down, then we would reject the hypothesis of "no trend".
A common way of investigating for trends is to fit a straight
line to the data and observe the line's direction (or slope). If
the line looks horizontal, then there is no evidence of a trend;
otherwise there is. Formally, this is done by testing whether
the slope of the line is significantly different from zero. The
methodology for this is covered in Chapter 4.
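A minimal sketch of this slope test, assuming Python with scipy and a simulated series with a small upward drift:

    # A sketch of testing for a linear trend via the slope (simulated data).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    t = np.arange(100)
    y = 0.05 * t + rng.normal(size=100)    # hypothetical process values with a small drift

    fit = stats.linregress(t, y)
    print(f"slope = {fit.slope:.3f}, p-value for zero slope = {fit.pvalue:.4f}")
    # A small p-value (e.g., below 0.05) is evidence of a trend.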
Other
trend tests
A non-parametric approach for detecting significant trends
known as the Reverse Arrangement Test is described in
Chapter 8.
7. Product and Process Comparisons
7.2. Comparisons based on data from one
process
Questions
answered in
this section
For a single process, the current state of the process can be
compared with a nominal or hypothesized state. This
section outlines techniques for answering the following
questions from data gathered from a single process:
1. Do the observations come from a particular
distribution?
1. Chi-Square Goodness-of-Fit test for a
continuous or discrete distribution
2. Kolmogorov- Smirnov test for a continuous
distribution
3. Anderson-Darling and Shapiro-Wilk tests for
a continuous distribution
2. Are the data consistent with the assumed process
mean?
1. Confidence interval approach
2. Sample sizes required
3. Are the data consistent with a nominal standard
deviation?
1. Confidence interval approach
2. Sample sizes required
4. Does the proportion of defectives meet
requirements?
1. Confidence intervals
2. Sample sizes required
5. Does the defect density meet requirements?
6. What intervals contain a fixed percentage of the
data?
1. Approximate intervals that contain most of the
population values
2. Percentiles
3. Tolerance intervals
4. Tolerance intervals based on the smallest and
largest observations
General forms
of testing
These questions are addressed either by an hypothesis test
or by a confidence interval.
Parametric vs.
non-parametric
testing
All hypothesis-testing procedures can be broadly described
as either parametric or non-parametric/distribution-free.
Parametric test procedures are those that:
1. Involve hypothesis testing of specified parameters
(such as "the population mean=50 grams"...).
2. Require a stringent set of assumptions about the
underlying sampling distributions.
When to use
nonparametric
methods?
When do we require non-parametric or distribution-free
methods? Here are a few circumstances that may be
candidates:
1. The measurements are only categorical; i.e., they are
nominally scaled, or ordinally (in ranks) scaled.
2. The assumptions underlying the use of parametric
methods cannot be met.
3. The situation at hand requires an investigation of
such features as randomness, independence,
symmetry, or goodness of fit rather than the testing
of hypotheses about specific values of particular
population parameters.
Difference
between non-
parametric
and
distribution-
free
Some authors distinguish between non-parametric and
distribution-free procedures.
Distribution-free test procedures are broadly defined as:
1. Those whose test statistic does not depend on the
form of the underlying population distribution from
which the sample data were drawn, or
2. Those for which the data are nominally or ordinally
scaled.
Nonparametric test procedures are defined as those that
are not concerned with the parameters of a distribution.
Advantages of
nonparametric
methods.
Distribution-free or nonparametric methods have several
advantages, or benefits:
1. They may be used on all types of data: categorical
data, which are nominally scaled or are in rank form
(called ordinally scaled), as well as interval- or ratio-
scaled data.
2. For small sample sizes they are easy to apply.
3. They make fewer and less stringent assumptions
than their parametric counterparts.
4. Depending on the particular procedure they may be
almost as powerful as the corresponding parametric
procedure when the assumptions of the latter are
met, and when this is not the case, they are generally
more powerful.
Disadvantages
of
nonparametric
methods
Of course there are also disadvantages:
1. If the assumptions of the parametric methods can be
met, it is generally more efficient to use them.
2. For large sample sizes, data manipulations tend to
become more laborious, unless computer software is
available.
3. Often special tables of critical values are needed for
the test statistic, and these values cannot always be
generated by computer software. On the other hand,
the critical values for the parametric tests are readily
available and generally easy to incorporate in
computer programs.
7. Product and Process Comparisons
7.2. Comparisons based on data from one process
7.2.1. Do the observations come from a
particular distribution?
Data are
often
assumed to
come from
a particular
distribution.
Goodness-of-fit tests indicate whether or not it is reasonable
to assume that a random sample comes from a specific
distribution. Statistical techniques often rely on observations
having come from a population that has a distribution of a
specific form (e.g., normal, lognormal, Poisson, etc.).
Standard control charts for continuous measurements, for
instance, require that the data come from a normal
distribution. Accurate lifetime modeling requires specifying
the correct distributional model. There may be historical or
theoretical reasons to assume that a sample comes from a
particular population, as well. Past data may have
consistently fit a known distribution, for example, or theory
may predict that the underlying population should be of a
specific form.
Hypothesis
Test model
for
Goodness-
of-fit
Goodness-of-fit tests are a form of hypothesis testing where
the null and alternative hypotheses are
    H_0: Sample data come from the stated distribution.
    H_A: Sample data do not come from the stated distribution.
Parameters
may be
assumed or
estimated
from the
data
One needs to consider whether a simple or composite
hypothesis is being tested. For a simple hypothesis, values of
the distribution's parameters are specified prior to drawing
the sample. For a composite hypothesis, one or more of the
parameters is unknown. Often, these parameters are estimated
using the sample observations.
A simple hypothesis would be:

    H_0: Data are from a normal distribution with μ = 0 and σ = 1.

A composite hypothesis would be:

    H_0: Data are from a normal distribution, unknown μ and σ.
Composite hypotheses are more common because they allow
us to decide whether a sample comes from any distribution of
a specific type. In this situation, the form of the distribution
is of interest, regardless of the values of the parameters.
Unfortunately, composite hypotheses are more difficult to
work with because the critical values are often hard to
compute.
Problems
with
censored
data
A second issue that affects a test is whether the data are
censored. When data are censored, sample values are in some
way restricted. Censoring occurs if the range of potential
values is limited such that values from one or both tails of
the distribution are unavailable (e.g., right and/or left
censoring - where high and/or low values are missing).
Censoring frequently occurs in reliability testing, when either
the testing time or the number of failures to be observed is
fixed in advance. A thorough treatment of goodness-of-fit
testing under censoring is beyond the scope of this document.
See D'Agostino & Stephens (1986) for more details.
Three types
of tests will
be covered
Three goodness-of-fit tests are examined in detail:
1. Chi-square test for continuous and discrete
distributions;
2. Kolmogorov-Smirnov test for continuous distributions
based on the empirical distribution function (EDF);
3. Anderson-Darling test for continuous distributions.
A more extensive treatment of goodness-of-fit techniques is
presented in D'Agostino & Stephens (1986). Along with the
tests mentioned above, other general and specific tests are
examined, including tests based on regression and graphical
techniques.
7. Product and Process Comparisons
7.2. Comparisons based on data from one process
7.2.1. Do the observations come from a particular distribution?
7.2.1.1. Chi-square goodness-of-fit test
Choice of
number of
groups for
"Goodness
of Fit" tests
is important
- but only
useful rules
of thumb
can be given
The test requires that the data first be grouped. The actual
number of observations in each group is compared to the
expected number of observations and the test statistic is
calculated as a function of this difference. The number of
groups and how group membership is defined will affect the
power of the test (i.e., how sensitive it is to detecting
departures from the null hypothesis). Power will not only be
affected by the number of groups and how they are defined,
but by the sample size and shape of the null and underlying
(true) distributions. Despite the lack of a clear "best
method", some useful rules of thumb can be given.
Group
Membership
When data are discrete, group membership is unambiguous.
Tabulation or cross tabulation can be used to categorize the
data. Continuous data present a more difficult challenge.
One defines groups by segmenting the range of possible
values into non-overlapping intervals. Group membership
can then be defined by the endpoints of the intervals. In
general, power is maximized by choosing endpoints such
that group membership is equiprobable (i.e., the probabilities
associated with an observation falling into a given group are
divided as evenly as possible across the intervals). Many
commercial software packages follow this procedure.
Rule-of-
thumb for
number of
groups
One rule-of-thumb suggests using the value 2n^(2/5) as a good
starting point for choosing the number of groups. Another
well-known rule-of-thumb requires every group to have at
least 5 data points.
Computation
of the chi-
square
goodness-
of-fit test
The formulas for the computation of the chi-square goodness-
of-fit test are given in the EDA chapter.
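A minimal sketch of a chi-square goodness-of-fit test for normality with approximately equiprobable groups, assuming Python with numpy and scipy and a simulated sample; the degrees of freedom are reduced by two because the mean and standard deviation are estimated from the data:

    # A sketch of a chi-square goodness-of-fit test for normality with
    # approximately equiprobable groups (simulated data).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    x = rng.normal(loc=5.0, scale=2.0, size=200)    # hypothetical sample

    k = int(round(2 * len(x) ** 0.4))               # about 2n^(2/5) groups
    mu, sd = x.mean(), x.std(ddof=1)                # parameters estimated from the data
    edges = stats.norm.ppf(np.linspace(0, 1, k + 1), loc=mu, scale=sd)
    edges[0], edges[-1] = x.min() - 1.0, x.max() + 1.0   # avoid infinite outer edges

    observed, _ = np.histogram(x, bins=edges)
    expected = np.full(k, len(x) / k)
    chi2 = ((observed - expected) ** 2 / expected).sum()
    dof = k - 1 - 2                                  # two parameters were estimated
    print("chi-square =", round(chi2, 2), " p-value =", round(1 - stats.chi2.cdf(chi2, dof), 4))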
7. Product and Process Comparisons
7.2. Comparisons based on data from one process
7.2.1. Do the observations come from a particular distribution?
7.2.1.2. Kolmogorov- Smirnov test
The K-S
test is a
good
alternative
to the chi-
square
test.
The Kolmogorov-Smirnov (K-S) test was originally proposed
in the 1930's in papers by Kolmogorov (1933) and Smirnov
(1936). Unlike the Chi-Square test, which can be used for
testing against both continuous and discrete distributions, the
K-S test is only appropriate for testing data against a
continuous distribution, such as the normal or Weibull
distribution. It is one of a number of tests that are based on the
empirical cumulative distribution function (ECDF).
K-S
procedure
Details on the construction and interpretation of the K-S test
statistic, D, and examples for several distributions are outlined
in Chapter 1.
The
probability
associated
with the
test
statistic is
difficult to
compute.
Critical values associated with the test statistic, D, are difficult
to compute for finite sample sizes, often requiring Monte
Carlo simulation. However, some general purpose statistical
software programs support the Kolmogorov-Smirnov test at
least for some of the more common distributions. Tabled
values can be found in Birnbaum (1952). A correction factor
can be applied if the parameters of the distribution are
estimated with the same data that are being tested. See
D'Agostino and Stephens (1986) for details.
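A minimal sketch of the K-S test against a fully specified (simple-hypothesis) normal distribution, assuming Python with scipy and a simulated sample:

    # A sketch of the K-S test against a fully specified normal distribution (simulated data).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    x = rng.normal(loc=0.0, scale=1.0, size=100)     # hypothetical sample

    D, p = stats.kstest(x, "norm", args=(0.0, 1.0))  # simple hypothesis: N(0, 1)
    print("D =", round(D, 4), " p-value =", round(p, 4))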
7. Product and Process Comparisons
7.2. Comparisons based on data from one process
7.2.1. Do the observations come from a particular distribution?
7.2.1.3. Anderson-Darling and Shapiro-Wilk
tests
Purpose:
Test for
distributional
adequacy
The Anderson-Darling Test
The Anderson-Darling test (Stephens, 1974) is used to test
if a sample of data comes from a specific distribution. It is a
modification of the Kolmogorov-Smirnov (K-S) test and
gives more weight to the tails of the distribution than does
the K-S test. The K-S test is distribution free in the sense
that the critical values do not depend on the specific
distribution being tested.
Requires
critical
values for
each
distribution
The Anderson-Darling test makes use of the specific
distribution in calculating critical values. This has the
advantage of allowing a more sensitive test and the
disadvantage that critical values must be calculated for each
distribution. Tables of critical values are not given in this
handbook (see Stephens 1974, 1976, 1977, and 1979)
because this test is usually applied with a statistical
software program that produces the relevant critical values.
Currently, Dataplot computes critical values for the
Anderson-Darling test for the following distributions:
normal
lognormal
Weibull
extreme value type I.
Anderson-
Darling
procedure
Details on the construction and interpretation of the
Anderson-Darling test statistic, A^2, and examples for
several distributions are outlined in Chapter 1.
Shapiro-Wilk
test for
normality
The Shapiro-Wilk Test For Normality
The Shapiro-Wilk test, proposed in 1965, calculates a W
statistic that tests whether a random sample, x_1, x_2, ..., x_n,
comes from (specifically) a normal distribution. Small
values of W are evidence of departure from normality and
percentage points for the W statistic, obtained via Monte
Carlo simulations, were reproduced by Pearson and Hartley
(1972, Table 16). This test has done very well in
comparison studies with other goodness of fit tests.
The W statistic is calculated as follows:

    W = ( Σ a_i x_(i) )²  /  Σ (x_i - x̄)²

where the x_(i) are the ordered sample values (x_(1) is the
smallest) and the a_i are constants generated from the means,
variances and covariances of the order statistics of a sample
of size n from a normal distribution (see Pearson and
Hartley (1972, Table 15)).
For more information about the Shapiro-Wilk test the reader
is referred to the original Shapiro and Wilk (1965) paper
and the tables in Pearson and Hartley (1972).
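A minimal sketch of both tests, assuming Python with scipy and a simulated sample; scipy supplies the Anderson-Darling critical values and the Shapiro-Wilk p-value directly:

    # A sketch of the Anderson-Darling and Shapiro-Wilk tests (simulated data).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(6)
    x = rng.normal(loc=10.0, scale=3.0, size=50)     # hypothetical sample

    ad = stats.anderson(x, dist="norm")
    print("A^2 =", round(ad.statistic, 3))
    print("critical values:", ad.critical_values, "at significance levels (%):", ad.significance_level)

    W, p = stats.shapiro(x)
    print("W =", round(W, 4), " p-value =", round(p, 4))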
7. Product and Process Comparisons
7.2. Comparisons based on data from one process
7.2.2. Are the data consistent with the assumed
process mean?
The testing
of H_0 for a
single
population
mean
Given a random sample of measurements, Y_1, ..., Y_N, there
are three types of questions regarding the true mean of the
population that can be addressed with the sample data. They
are:
1. Does the true mean agree with a known standard or
assumed mean?
2. Is the true mean of the population less than a given
standard?
3. Is the true mean of the population at least as large as a
given standard?
Typical null
hypotheses
The corresponding null hypotheses that test the true mean, μ,
against the standard or assumed mean, μ_0, are:

    1. H_0: μ = μ_0
    2. H_0: μ ≤ μ_0
    3. H_0: μ ≥ μ_0
Test
statistic
where the
standard
deviation is
not known
The basic statistics for the test are the sample mean and the
standard deviation. The form of the test statistic depends on
whether the population standard deviation, σ, is known or is
estimated from the data at hand. The more typical case is
where the standard deviation must be estimated from the
data, and the test statistic is

    t = (Ȳ - μ_0) / (s/√N)

where the sample mean is

    Ȳ = (1/N) Σ Y_i

and the sample standard deviation is

    s = sqrt( Σ (Y_i - Ȳ)² / (N - 1) )

with N - 1 degrees of freedom.
Comparison
with critical
values
For a test at significance level α, where α is chosen to be
small, typically 0.01, 0.05 or 0.10, the hypothesis associated
with each case enumerated above is rejected if:

    1. |t| ≥ t_{1-α/2, N-1}
    2. t ≥ t_{1-α, N-1}
    3. t ≤ t_{α, N-1}

where t_{1-α/2, N-1} is the 1-α/2 critical value from the t
distribution with N - 1 degrees of freedom, and similarly for
cases (2) and (3). Critical values can be found in the t-table
in Chapter 1.
Test
statistic
where the
standard
deviation is
known
If the standard deviation is known, the form of the test
statistic is

    z = (Ȳ - μ_0) / (σ/√N)

For case (1), the test statistic is compared with z_{1-α/2},
which is the 1-α/2 critical value from the standard normal
distribution, and similarly for cases (2) and (3).
Caution If the standard deviation is assumed known for the purpose
of this test, this assumption should be checked by a test of
hypothesis for the standard deviation.
An
illustrative
example of
the t-test
The following numbers are particle (contamination) counts
for a sample of 10 semiconductor silicon wafers:
50 48 44 56 61 52 53 55 67 51
The mean = 53.7 counts and the standard deviation = 6.567
counts.
The test is
two-sided
Over a long run the process average for wafer particle counts
has been 50 counts per wafer, and on the basis of the sample,
we want to test whether a change has occurred. The null
hypothesis that the process mean is 50 counts is tested against
the alternative hypothesis that the process mean is not equal
to 50 counts. The purpose of the two-sided alternative is to
rule out a possible process change in either direction.
Critical
values
For a significance level of α = 0.05, the chances of
erroneously rejecting the null hypothesis when it is true are 5
% or less. (For a review of hypothesis testing basics, see
Chapter 1).
Even though there is a history on this process, it has not been
stable enough to justify the assumption that the standard
deviation is known. Therefore, the appropriate test statistic is
the t-statistic. Substituting the sample mean, sample standard
deviation, and sample size into the formula for the test
statistic gives a value of
t = 1.782
with degrees of freedom N - 1 = 9. This value is tested
against the critical value
    t_{1-0.025; 9} = 2.262

from the t-table where the critical value is found under the
column labeled 0.975 for the probability of exceeding the
critical value and in the row for 9 degrees of freedom. The
critical value is based on α/2 instead of α because of the
two-sided alternative (two-tailed test), which requires equal
probabilities in each tail of the distribution that add to α.
Conclusion Because the value of the test statistic falls in the interval (-
2.262, 2.262), we cannot reject the null hypothesis and,
therefore, we may continue to assume the process mean is 50
counts.
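The same calculation can be reproduced with general-purpose statistical software; the following sketch assumes Python with scipy and uses the ten particle counts listed above:

    # A sketch reproducing the wafer particle-count t-test with scipy.
    import numpy as np
    from scipy import stats

    counts = np.array([50, 48, 44, 56, 61, 52, 53, 55, 67, 51])
    t_stat, p_value = stats.ttest_1samp(counts, popmean=50)
    t_crit = stats.t.ppf(0.975, df=len(counts) - 1)

    print(f"mean = {counts.mean():.1f}, s = {counts.std(ddof=1):.3f}")
    print(f"t = {t_stat:.3f}, critical value = {t_crit:.3f}, p-value = {p_value:.3f}")
    # |t| = 1.782 is less than 2.262, so the null hypothesis is not rejected.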
7. Product and Process Comparisons
7.2. Comparisons based on data from one process
7.2.2. Are the data consistent with the assumed process mean?
7.2.2.1. Confidence interval approach
Testing using
a confidence
interval
The hypothesis test results in a "yes" or "no" answer. The null
hypothesis is either rejected or not rejected. There is another way of
testing a mean and that is by constructing a confidence interval about
the true but unknown mean.
General form
of confidence
intervals
where the
standard
deviation is
unknown
Tests of hypotheses that can be made from a single sample of data
were discussed on the foregoing page. As with null hypotheses,
confidence intervals can be two-sided or one-sided, depending on the
question at hand. The general form of confidence intervals, for the
three cases discussed earlier, where the standard deviation is unknown,
are:

1. Two-sided confidence interval for μ:

    Ȳ - t_{1-α/2, N-1} s/√N  ≤  μ  ≤  Ȳ + t_{1-α/2, N-1} s/√N

2. Lower one-sided confidence interval for μ:

    μ  ≥  Ȳ - t_{1-α, N-1} s/√N

3. Upper one-sided confidence interval for μ:

    μ  ≤  Ȳ + t_{1-α, N-1} s/√N

where t_{1-α/2, N-1} is the 1-α/2 critical value from the t distribution with N - 1
degrees of freedom, and similarly for cases (2) and (3). Critical values
can be found in the t-table in Chapter 1.
Confidence
level
The confidence intervals are constructed so that the probability of the
interval containing the mean is 1 - α. Such intervals are referred to as
100(1-α) % confidence intervals.
A 95%
confidence
interval for
the example
The corresponding confidence interval for the test of hypothesis
example on the foregoing page is shown below. A 95 % confidence
interval for the population mean of particle counts per wafer is given by
53.7 ± t_{0.975; 9} (6.567)/√10 = 53.7 ± 4.70, that is, (49.0, 58.4).
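A minimal R sketch of this interval:

    ybar <- 53.7; s <- 6.567; n <- 10
    ybar + c(-1, 1) * qt(0.975, df = n - 1) * s / sqrt(n)
    # approximately (49.0, 58.4), which contains 50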
Interpretation The 95 % confidence interval includes the null hypothesis if, and only
if, it would be accepted at the 5 % level. This interval includes the null
hypothesis of 50 counts so we cannot reject the hypothesis that the
process mean for particle counts is 50. The confidence interval includes
all null hypothesis values for the population mean that would be
accepted by an hypothesis test at the 5 % significance level. This
assumes, of course, a two-sided alternative.
7.2.2.2. Sample sizes required
The
computation
of sample
sizes depends
on many
things, some
of which
have to be
assumed in
advance
Perhaps one of the most frequent questions asked of a statistician is,
"How many measurements should be included in the sample?
"
Unfortunately, there is no correct answer without additional
information (or assumptions). The sample size required for an
experiment designed to investigate the behavior of an unknown
population mean will be influenced by the following:
value selected for α, the risk of rejecting a true hypothesis
value of β, the risk of accepting a false null hypothesis when
a particular value of the alternative hypothesis is true.
value of the population standard deviation.
Application -
estimating a
minimum
sample size,
N, for
limiting the
error in the
estimate of
the mean
For example, suppose that we wish to estimate the average daily
yield, μ, of a chemical process by the mean of a sample, Y_1, ..., Y_N,
such that the error of estimation is less than δ with a probability of
95 %. This means that a 95 % confidence interval centered at the
sample mean should have a half-width of at most δ and, if the
standard deviation is known, the half-width is determined by the
normal critical value. The critical value from the normal distribution
for 1 - α/2 = 0.975 is 1.96.
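The resulting sample-size requirement can be sketched as follows, with δ the allowable estimation error and σ the known standard deviation:

    z_{0.975}\,\frac{\sigma}{\sqrt{N}} \;\le\; \delta
    \quad\Longrightarrow\quad
    N \;\ge\; \left(\frac{1.96\,\sigma}{\delta}\right)^{2}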
Limitation
and
interpretation
A restriction is that the standard deviation must be known. Lacking
an exact value for the standard deviation requires some
accommodation, perhaps the best estimate available from a previous
experiment.
Controlling
the risk of
accepting a
false
hypothesis
To control the risk of accepting a false hypothesis, we set not only
α, the probability of rejecting the null hypothesis when it is true,
but also β, the probability of accepting the null hypothesis when in
fact the population mean is μ + δ, where δ is the difference or shift
we want to detect.
Standard
deviation
assumed to
be known
The minimum sample size, N, is shown in the sketch below for two- and one-
sided tests of hypotheses with σ assumed to be known.
The quantities z_{1-α/2} and z_{1-β} are critical values from the normal
distribution.
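A sketch of the standard normal-theory expressions, which are consistent with the worked example below:

    \text{two-sided: } N \;\ge\; \left(z_{1-\alpha/2} + z_{1-\beta}\right)^{2}\,\frac{\sigma^{2}}{\delta^{2}},
    \qquad
    \text{one-sided: } N \;\ge\; \left(z_{1-\alpha} + z_{1-\beta}\right)^{2}\,\frac{\sigma^{2}}{\delta^{2}}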
Note that it is usual to state the shift, δ, in units of the standard
deviation, thereby simplifying the calculation.
Example
where the
shift is stated
in terms of
the standard
deviation
For a one-sided hypothesis test where we wish to detect an increase
in the population mean of one standard deviation, the following
information is required: α, the significance level of the test, and β,
the probability of failing to detect a shift of one standard deviation.
For a test with α = 0.05 and β = 0.10, the minimum sample size
required for the test is
N = (1.645 + 1.282)² = 8.567 ≈ 9.
More often
we must
compute the
sample size
with the
population
standard
deviation
being
unknown
The procedures for computing sample sizes when the standard
deviation is not known are similar to, but more complex than, those
used when the standard deviation is known. The formulation depends
on the t distribution, with the minimum sample size given by the
t-based analogue of the formula above (see the sketch below).
The drawback is that critical values of the t distribution depend on
the degrees of freedom, which in turn depend upon the sample
size that we are trying to estimate.
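A sketch of the t-based formula for the one-sided case, with s the sample standard deviation (the two-sided case uses t_{1-\alpha/2, v}):

    N \;\ge\; \left(t_{1-\alpha,\,v} + t_{1-\beta,\,v}\right)^{2}\,\frac{s^{2}}{\delta^{2}},
    \qquad v = N - 1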
Iterate on the
initial
estimate
using critical
values from
the t table
Therefore, the best procedure is to start with an initial estimate based
on a sample standard deviation and iterate. Take the example
discussed above where the minimum sample size is computed to
be N = 9. This estimate is low. Now use the formula above with
degrees of freedom N - 1 = 8, which gives a second estimate of
N = (1.860 + 1.397)² = 10.6 ≈ 11.
It is possible to apply another iteration using degrees of freedom 10,
but in practice one iteration is usually sufficient. For the purpose of
this example, results have been rounded to the closest integer;
however, computer programs for finding critical values from the t
distribution allow non-integer degrees of freedom.
Table
showing
minimum
sample sizes
for a two-
sided test
The table below gives sample sizes for a two-sided test of
hypothesis that the mean is a given value, with the shift to be
detected a multiple of the standard deviation. For a one-sided test at
significance level α, look under the value of 2α in column 1. Note
that this table is based on the normal approximation (i.e., the
standard deviation is known).
Sample Size Table for Two-Sided Tests

  α      β     δ = 0.5σ   δ = 1.0σ   δ = 1.5σ
 .01    .01       98         25         11
 .01    .05       73         18          8
 .01    .10       61         15          7
 .01    .20       47         12          6
 .01    .50       27          7          3
 .05    .01       75         19          9
 .05    .05       53         13          6
 .05    .10       43         11          5
 .05    .20       33          8          4
 .05    .50       16          4          3
 .10    .01       65         16          8
 .10    .05       45         11          5
 .10    .10       35          9          4
 .10    .20       25          7          3
 .10    .50       11          3          3
 .20    .01       53         14          6
 .20    .05       35          9          4
 .20    .10       27          7          3
 .20    .20       19          5          3
 .20    .50        7          3          3
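A minimal R sketch reproducing one row of the table (α = 0.05, β = 0.10) from the normal-approximation formula:

    alpha <- 0.05; beta <- 0.10
    delta <- c(0.5, 1.0, 1.5)   # shift in units of sigma
    ceiling(((qnorm(1 - alpha/2) + qnorm(1 - beta)) / delta)^2)
    # 43 11 5, matching the table row for alpha = .05, beta = .10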
7.2.3. Are the data consistent with a nominal
standard deviation?
The testing of
H_0
for a single
population
standard
deviation
Given a random sample of measurements, Y_1, ..., Y_N, there
are three types of questions regarding the true standard
deviation of the population that can be addressed with the
sample data. They are:
1. Does the true standard deviation agree with a
nominal value?
2. Is the true standard deviation of the population less
than or equal to a nominal value?
3. Is the true standard deviation of the population at
least as large as a nominal value?
Corresponding
null
hypotheses
The corresponding null hypotheses that test the true
standard deviation, σ, against the nominal value, σ_0, are:
1. H_0: σ = σ_0
2. H_0: σ ≤ σ_0
3. H_0: σ ≥ σ_0
Test statistic The basic test statistic is the chi-square statistic
with N - 1 degrees of freedom, where s is the sample
standard deviation; i.e.,
χ² = (N - 1) s² / σ_0².
Comparison
with critical
values
For a test at significance level α, where α is chosen to be
small, typically 0.01, 0.05 or 0.10, the hypothesis
associated with each case enumerated above is rejected if,
respectively:
1. χ² ≥ χ²_{1-α/2} or χ² ≤ χ²_{α/2}
2. χ² ≥ χ²_{1-α}
3. χ² ≤ χ²_{α}
where χ²_{α/2} is the critical value from the chi-square
distribution with N - 1 degrees of freedom and similarly
for cases (2) and (3). Critical values can be found in the
chi-square table in Chapter 1.
Warning Because the chi-square distribution is a non-negative,
asymmetrical distribution, care must be taken in looking
up critical values from tables. For two-sided tests, critical
values are required for both tails of the distribution.
Example
A supplier of 100 ohm·cm silicon wafers claims that his
fabrication process can produce wafers with sufficient
consistency so that the standard deviation of resistivity for
the lot does not exceed 10 ohm·cm. A sample of N = 10
wafers taken from the lot has a standard deviation of 13.97
ohm·cm. Is the supplier's claim reasonable? This question
falls under null hypothesis (2) above. For a test at
significance level α = 0.05, the test statistic,
χ² = (N - 1) s² / σ_0² = 9 (13.97)² / (10)² = 17.56,
is compared with the critical value, χ²_{0.95, 9} = 16.92.
Since the test statistic (17.56) exceeds the critical value
(16.92) of the chi-square distribution with 9 degrees of
freedom, the manufacturer's claim is rejected.
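A minimal R sketch of this test:

    n <- 10; s <- 13.97; sigma0 <- 10
    chisq.stat <- (n - 1) * s^2 / sigma0^2   # about 17.56
    chisq.crit <- qchisq(0.95, df = n - 1)   # about 16.92
    chisq.stat > chisq.crit                  # TRUE, so reject the claim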
7.2.3.1. Confidence interval approach
Confidence
intervals
for the
standard
deviation
Confidence intervals for the true standard deviation can be
constructed using the chi-square distribution. The 100(1 - α) %
confidence intervals that correspond to the tests of hypothesis
on the previous page are given by
1. Two-sided confidence interval for σ
2. Lower one-sided confidence interval for σ
3. Upper one-sided confidence interval for σ
where for case (1), χ²_{α/2} is the critical value from the
chi-square distribution with N - 1 degrees of freedom and
similarly for cases (2) and (3). Critical values can be found in
the chi-square table in Chapter 1.
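A sketch of the two-sided interval in LaTeX notation, writing \chi^2_{q, N-1} for the q quantile of the chi-square distribution (the one-sided limits keep only the corresponding bound, with \alpha in place of \alpha/2):

    \sqrt{\frac{(N-1)\,s^{2}}{\chi^{2}_{1-\alpha/2,\,N-1}}}
    \;\le\; \sigma \;\le\;
    \sqrt{\frac{(N-1)\,s^{2}}{\chi^{2}_{\alpha/2,\,N-1}}}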
Choice of
risk level
can
change the
conclusion
Confidence interval (1) is equivalent to a two-sided test for the
standard deviation. That is, if the hypothesized or nominal
value, σ_0, is not contained within these limits, then the
hypothesis that the standard deviation is equal to the nominal
value is rejected.
A dilemma
of
hypothesis
testing
A change in α can lead to a change in the conclusion. This
poses a dilemma. What should α be? Unfortunately, there is
no clear-cut answer that will work in all situations. The usual
strategy is to set α small so as to guarantee that the null
hypothesis is wrongly rejected in only a small number of
cases. The risk, β, of failing to reject the null hypothesis when
it is false depends on the size of the discrepancy, and also
depends on α. The discussion on the next page shows how to
choose the sample size so that this risk is kept small for
specific discrepancies.
7.2.3.2. Sample sizes required
Sample sizes
to minimize
risk of false
acceptance
The following procedure for computing sample sizes for
tests involving standard deviations follows W. Diamond
(1989). The idea is to find a sample size that is large
enough to guarantee that the risk, β, of accepting a false
hypothesis is small.
Alternatives
are specific
departures
from the null
hypothesis
This procedure is stated in terms of changes in the variance,
not the standard deviation, which makes it somewhat
difficult to interpret. Tests that are generally of interest are
stated in terms of δ, a discrepancy from the hypothesized
variance. For example:
1. Is the true variance larger than its hypothesized value
by δ?
2. Is the true variance smaller than its hypothesized
value by δ?
That is, the tests of interest are:
1. H_0: σ² = σ_0², versus the alternative σ² = σ_0² + δ
2. H_0: σ² = σ_0², versus the alternative σ² = σ_0² - δ
Interpretation The experimenter wants to assure that the probability of
erroneously accepting the null hypothesis of unchanged
variance is at most β. The sample size, N, required for this
type of detection depends on the factor, δ; the significance
level, α; and the risk, β.
First choose
the level of
significance
and beta risk
The sample size is determined by first choosing appropriate
values of α and β and then following the directions below
to find the degrees of freedom, v, from the chi-square
distribution.
The
calculations
should be
done by
creating a
table or
spreadsheet
First compute R = 1 + δ/σ_0².
Then generate a table of degrees of freedom, v, say
between 1 and 200. For case (1) or (2) above, calculate
B_v and the corresponding value of C_v for each value of
degrees of freedom in the table, where B_v and C_v are as
defined in the calculation steps below. The value of v
where C_v is closest to β is the correct degrees of freedom and
N = v + 1
Hints on
using
software
packages to
do the
calculations
The quantity χ²_{1-α, v} is the critical value from the chi-
square distribution with v degrees of freedom which is
exceeded with probability α. It is sometimes referred to as
the percent point function (PPF) or the inverse chi-square
function. The probability that is evaluated to get C_v is
called the cumulative distribution function (CDF).
Example Consider the case where the variance for resistivity
measurements on a lot of silicon wafers is claimed to be
100 (ohm·cm)². A buyer is unwilling to accept a shipment
if δ is greater than 55 (ohm·cm)² for a particular lot. This
problem falls under case (1) above. How many samples are
needed to assure risks of α = 0.05 and β = 0.01?
Calculations If software is available to compute the roots (or zero
values) of a univariate function, then we can determine the
sample size by finding the roots of a function that calculates
C_v for a given value of v. The procedure is:
1. Define constants.
     α    = 0.05
     β    = 0.01
     δ    = 55
     σ_0² = 100
     R    = 1 + δ/σ_0²
2. Create a function, Cnu.
     Cnu = F( F^(-1)(1-α, v)/R, v ) - β
   F(x, v) returns the probability of a chi-square random
   variable with v degrees of freedom that is less than
   or equal to x, and
   F^(-1)(1-α, v) returns x such that F(x, v) = 1-α.
3. Find the value of v for which the function, Cnu, is zero.
Using this procedure, Cnu is zero when v is 169.3.
Therefore, the minimum sample size needed to guarantee
the risk level is N = 170.
Alternatively, we can determine the sample size by simply
printing computed values of Cnu for various values of v.
1. Define constants.
     α    = 0.05
     δ    = 55
     σ_0² = 100
     R    = 1 + δ/σ_0²
2. Generate Cnu for values of v from 1 to 200.
     Bnu = F^(-1)(1-α, v)/R
     Cnu = F(Bnu, v)
The values of Cnu generated for v between 165 and 175
degrees of freedom are shown below.
  v      Bnu       Cnu
 165   126.4344   0.0114
 166   127.1380   0.0110
 167   127.8414   0.0107
 168   128.5446   0.0104
 169   129.2477   0.0101
 170   129.9506   0.0098
 171   130.6533   0.0095
 172   131.3558   0.0092
 173   132.0582   0.0090
 174   132.7604   0.0087
 175   133.4625   0.0085
The value of Cnu closest to 0.01 is 0.0101, which is
associated with v = 169 degrees of freedom. Therefore, the
minimum sample size needed to guarantee the risk level is
N = 170.
The calculations used in this section can be performed
using both Dataplot code and R code.
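A minimal R sketch of the root-finding approach, mirroring the calculation steps above:

    alpha <- 0.05; beta <- 0.01; delta <- 55; sigma0.sq <- 100
    R.ratio <- 1 + delta / sigma0.sq
    Cnu <- function(v) pchisq(qchisq(1 - alpha, v) / R.ratio, v) - beta
    uniroot(Cnu, c(1, 200))$root   # about 169.3
    # taking v = 169 gives N = v + 1 = 170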
7.2.4. Does the proportion of defectives meet
requirements?
Testing
proportion
defective is
based on the
binomial
distribution
The proportion of defective items in a manufacturing
process can be monitored using statistics based on the
observed number of defectives in a random sample of size
N from a continuous manufacturing process, or from a
large population or lot. The proportion defective in a
sample follows the binomial distribution where p is the
probability of an individual item being found defective.
Questions of interest for quality control are:
1. Is the proportion of defective items within prescribed
limits?
2. Is the proportion of defective items less than a
prescribed limit?
3. Is the proportion of defective items greater than a
prescribed limit?
Hypotheses
regarding
proportion
defective
The corresponding hypotheses that can be tested are:
1. H_0: p = p_0
2. H_0: p ≥ p_0
3. H_0: p ≤ p_0
where p_0 is the prescribed proportion defective.
Test statistic
based on a
normal
approximation
Given a random sample of measurements Y_1, ..., Y_N from a
population, the proportion of items that are judged
defective from these N measurements is denoted p̂. The
test statistic
z = (p̂ - p_0) / sqrt( p_0 (1 - p_0) / N )
depends on a normal approximation to the binomial
distribution that is valid for large N (N > 30). This
approximation simplifies the calculations using critical
values from the table of the normal distribution as shown
below.
Restriction on
sample size
Because the test is approximate, N needs to be large for the
test to be valid. One criterion is that N should be chosen so
that
min{Np_0, N(1 - p_0)} ≥ 5.
For example, if p_0 = 0.1, then N should be at least 50 and
if p_0 = 0.01, then N should be at least 500. Criteria for
choosing a sample size in order to guarantee detecting a
change of size δ are discussed on another page.
One and two-
sided tests for
proportion
defective
Tests at the 1 - α confidence level corresponding to
hypotheses (1), (2), and (3) are shown below. For
hypothesis (1), the test statistic, z, is compared with z_{1-α/2},
the critical value from the normal distribution that is
exceeded with probability α/2, and similarly for (2) and
(3). If
1. |z| ≥ z_{1-α/2}
2. z ≤ z_{α}
3. z ≥ z_{1-α}
the null hypothesis is rejected.
Example of a
one-sided test
for proportion
defective
After a new method of processing wafers was introduced
into a fabrication process, two hundred wafers were tested,
and twenty-six showed some type of defect. Thus, for N =
200, the proportion defective is estimated to be p̂ = 26/200
= 0.13. In the past, the fabrication process was capable of
producing wafers with a proportion defective of at most
0.10. The issue is whether the new process has degraded
the quality of the wafers. The relevant test is the one-sided
test (3), which guards against an increase in proportion
defective from its historical level.
Calculations
for a one-
sided test of
proportion
defective
For a test at significance level α = 0.05, the hypothesis of
no degradation is validated if the test statistic z is less than
the critical value, z_{0.95} = 1.645. The test statistic is
computed to be
z = (0.13 - 0.10) / sqrt( 0.10 (1 - 0.10) / 200 ) = 1.414.
Interpretation Because the test statistic is less than the critical value
(1.645), we cannot reject hypothesis (3) and, therefore, we
cannot conclude that the new fabrication method is
degrading the quality of the wafers. The new process may,
indeed, be worse, but more evidence would be needed to
reach that conclusion at the 95% confidence level.
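A minimal R sketch of this one-sided test:

    phat <- 26/200; p0 <- 0.10; n <- 200
    z <- (phat - p0) / sqrt(p0 * (1 - p0) / n)   # about 1.414
    z >= qnorm(0.95)                             # FALSE: cannot reject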
7.2.4.1. Confidence intervals
Confidence
intervals
using the
method of
Agresti and
Coull
The method recommended by Agresti and Coull (1998) and also by Brown, Cai
and DasGupta (2001) (the methodology was originally developed by Wilson in
1927) is to use the form of the confidence interval that corresponds to the
hypothesis test given in Section 7.2.4. That is, solve for the two values of p_0
(say, p_upper and p_lower) that result from setting z = z_{1-α/2} and solving for
p_0 = p_upper, and then setting z = z_{α/2} and solving for p_0 = p_lower. (Here, as
in Section 7.2.4, z_{α/2} denotes the variate value from the standard normal
distribution such that the area to the left of the value is α/2.) Although solving
for the two values of p_0 might sound complicated, the appropriate expressions
can be obtained by straightforward but slightly tedious algebra. Such algebraic
manipulation isn't necessary, however, as the appropriate expressions are given
in various sources. Specifically, we have
Formulas
for the
confidence
intervals
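A sketch of the resulting limits (the Wilson score form), with z = z_{1-\alpha/2}:

    p_{\mathrm{upper}},\,p_{\mathrm{lower}}
    \;=\;
    \frac{\hat{p} + \frac{z^{2}}{2N} \;\pm\; z\sqrt{\frac{\hat{p}(1-\hat{p})}{N} + \frac{z^{2}}{4N^{2}}}}
         {1 + \frac{z^{2}}{N}}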
Procedure
does not
strongly
depend on
values of p
and n
This approach can be substantiated on the grounds that it is the exact algebraic
counterpart to the (large-sample) hypothesis test given in section 7.2.4 and is also
supported by the research of Agresti and Coull. One advantage of this procedure
is that its worth does not strongly depend upon the value of n and/or p, and
indeed was recommended by Agresti and Coull for virtually all combinations of
n and p.
Another
advantage
is that the
lower limit
cannot be
negative
Another advantage is that the lower limit cannot be negative. That is not true for
the confidence expression most frequently used,
p̂ ± z_{1-α/2} sqrt( p̂ (1 - p̂) / N ):
A confidence limit approach that produces a lower limit which is an impossible
value for the parameter for which the interval is constructed is an inferior
approach. This also applies to limits for the control charts that are discussed in
Chapter 6.
One-sided
confidence
intervals
A one-sided confidence interval can also be constructed simply by replacing each
by in the expression for the lower or upper limit, whichever is desired.
The 95% one-sided interval for p for the example in the preceding section is:
Example
Conclusion
from the
example
Since the lower bound does not exceed 0.10 (if it did, it would exceed the
hypothesized value), the null hypothesis that the proportion defective is at most
0.10, which was given in the preceding section, would not be rejected if we used
the confidence interval to test the hypothesis. Of course, a confidence interval has
value in its own right and does not have to be used for hypothesis testing.
Exact Intervals for Small Numbers of Failures and/or Small Sample Sizes
Construction
of exact
two-sided
confidence
intervals
based on
the
binomial
distribution
If the number of failures is very small or if the sample size N is very small,
symmetrical confidence limits that are approximated using the normal distribution
may not be accurate enough for some applications. An exact method based on the
binomial distribution is shown next. To construct a two-sided confidence interval
at the 100(1-α) % confidence level for the true proportion defective p where N_d
defects are found in a sample of size N, follow the steps below.
1. Solve the equation
P(X ≤ N_d; N, p_U) = α/2
for p_U to obtain the upper 100(1-α) % limit for p, where X is a
binomial(N, p_U) random variable.
2. Next solve the equation
P(X ≤ N_d - 1; N, p_L) = 1 - α/2
for p_L to obtain the lower 100(1-α) % limit for p.
Note The interval (p_L, p_U) is an exact 100(1-α) % confidence interval for p. However,
it is not symmetric about the observed proportion defective, p̂.
Binomial
confidence
interval
example
The equations above that determine p_L and p_U can be solved using readily
available functions. Take as an example the situation where twenty units are
sampled from a continuous production line and four items are found to be
defective. The proportion defective is estimated to be p̂ = 4/20 = 0.20. The steps
for calculating a 90 % confidence interval for the true proportion defective, p,
follow.
1. Initialize constants.
     alpha = 0.10
     Nd    = 4
     N     = 20
2. Define a function for the upper limit (fu) and a function
   for the lower limit (fl).
     fu = F(Nd, pu, 20) - alpha/2
     fl = F(Nd-1, pl, 20) - (1 - alpha/2)
   F is the cumulative distribution function for the
   binomial distribution.
3. Find the value of pu that corresponds to fu = 0 and
   the value of pl that corresponds to fl = 0 using software
   to find the roots of a function.
The values of pu and pl for our example are:
pu = 0.401029
pl = 0.071354
Thus, a 90 % confidence interval for the proportion defective, p, is (0.071,
0.400). Whether or not the interval is truly "exact" depends on the software.
The calculations used in this example can be performed using both Dataplot code
and R code.
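A minimal R sketch using the beta-quantile form of these equations (equivalent to root-finding on the binomial CDF):

    alpha <- 0.10; Nd <- 4; N <- 20
    pu <- qbeta(1 - alpha/2, Nd + 1, N - Nd)   # about 0.401
    pl <- qbeta(alpha/2, Nd, N - Nd + 1)       # about 0.071
    c(pl, pu)                                  # 90 % exact interval for p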
7.2.4.2. Sample sizes required
Derivation
of formula
for required
sample size
when testing
proportions
The method of determining sample sizes for testing proportions is
similar to the method for determining sample sizes for testing the
mean. Although the sampling distribution for proportions actually
follows a binomial distribution, the normal approximation is used for
this derivation.
Problem
formulation
We want to test the hypothesis
H_0: p = p_0
H_a: p ≠ p_0
with p denoting the proportion of defectives.
Define δ as the change in the proportion defective that we are
interested in detecting,
δ = |p_1 - p_0|.
Specify the level of statistical significance and statistical power,
respectively, by
P(reject H_0 | H_0 is true with any p = p_0) ≤ α
P(reject H_0 | H_0 is false with any |p - p_0| ≥ δ) ≥ 1 - β.
Definition
of allowable
deviation
If we are interested in detecting a change in the proportion defective
of size δ in either direction, the corresponding confidence interval for
p can be written
Relationship
to
confidence
interval
For a 100(1 - α) % confidence interval based on the normal distribution,
where z_{1-α/2} is the critical value of the normal distribution which is
exceeded with probability α/2,
Minimum
sample size
Thus, the minimum sample size is
1. For a two-sided interval
2. For a one-sided interval
The mathematical details of this derivation are given on pages
30-34 of Fleiss, Levin, and Paik.
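A sketch of these expressions in a common Fleiss-type form, with p_1 = p_0 ± \delta the alternative proportion; the two-sided case uses z_{1-\alpha/2} and the one-sided case z_{1-\alpha}:

    N \;=\; \frac{\left( z_{1-\alpha/2}\sqrt{p_{0}(1-p_{0})} + z_{1-\beta}\sqrt{p_{1}(1-p_{1})} \right)^{2}}{\delta^{2}}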
Continuity
correction
Fleiss, Levin and Paik also recommend the following continuity
correction
with N' denoting the sample size computed using the above formula.
Example of
calculating
sample size
for testing
proportion
defective
Suppose that a department manager needs to be able to detect any
change above 0.10 in the current proportion defective of his product
line, which is running at approximately 10 % defective. He is
interested in a one-sided test and does not want to stop the line
except when the process has clearly degraded and, therefore, he
chooses a significance level for the test of 5 %. Suppose, also, that he
is willing to take a risk of 10 % of failing to detect a change of this
magnitude. With these criteria:
1. z_{0.95} = 1.645; z_{0.90} = 1.282
2. δ = 0.10 (p_1 = 0.20)
3. p_0 = 0.10
and the minimum sample size for a one-sided test procedure is
With the continuity correction, the minimum sample size becomes
112.
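A minimal R sketch of this calculation, assuming the one-sided Fleiss-type formula above and the simple continuity correction N'' = N' + 1/δ:

    p0 <- 0.10; p1 <- 0.20; delta <- abs(p1 - p0)
    z.a <- qnorm(0.95); z.b <- qnorm(0.90)
    N1 <- (z.a * sqrt(p0 * (1 - p0)) + z.b * sqrt(p1 * (1 - p1)))^2 / delta^2
    ceiling(N1)             # about 102
    ceiling(N1 + 1/delta)   # about 112, the continuity-corrected size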
7.2.5. Does the defect density meet requirements?
Testing defect
densities is
based on the
Poisson
distribution
The number of defects observed in an area of size A units is often
assumed to have a Poisson distribution with parameter A x D,
where D is the actual process defect density (D is defects per
unit area). In other words:
The questions of primary interest for quality control are:
1. Is the defect density within prescribed limits?
2. Is the defect density less than a prescribed limit?
3. Is the defect density greater than a prescribed limit?
Normal
approximation
to the Poisson
We assume that AD is large enough so that the normal
approximation to the Poisson applies (in other words, AD > 10
for a reasonable approximation and AD > 20 for a good one).
That translates to
P(C ≤ c) ≈ Φ( (c - AD) / √(AD) ),
where Φ is the standard normal distribution function.
Test statistic
based on a
normal
approximation
If, for a sample of area A with a defect density target of D_0, a
defect count of C is observed, then the test statistic
z = (C/A - D_0) / √(D_0 / A)
can be used exactly as shown in the discussion of the test
statistic for fraction defectives in the preceding section.
Testing the
hypothesis
that the
process defect
density is less
than or equal
to D_0
For example, after choosing a sample size of area A (see below
for the sample size calculation), we can reject that the process defect
density is less than or equal to the target D_0 if the number of
defects C in the sample is greater than C_A, where
C_A = A·D_0 + z_{1-α} √(A·D_0),
and z_{1-α} is the 100(1-α) percentile of the standard normal
distribution. The test significance level is 100(1-α). For a 90 %
significance level use z_{0.90} = 1.282 and for a 95 % test use z_{0.95}
= 1.645. α is the maximum risk that an acceptable process with a
defect density at least as low as D_0 "fails" the test.
Choice of
sample size
(or area) to
examine for
defects
In order to determine a suitable area A to examine for defects,
you first need to choose an unacceptable defect density level.
Call this unacceptable defect density D_1 = kD_0, where k > 1.
We want to have a probability of less than or equal to β of
"passing" the test (and not rejecting the hypothesis that the true
level is D_0 or better) when, in fact, the true defect level is D_1 or
worse. Typically β will be 0.2, 0.1 or 0.05. Then we need to
count defects in a sample of area A, where A is equal to
A = ( (z_{1-α} √D_0 + z_{1-β} √D_1) / (D_1 - D_0) )².
Example Suppose the target is D_0 = 4 defects per wafer and we want to
verify a new process meets that target. We choose α = 0.1 to be
the chance of failing the test if the new process is as good as D_0
(α = the Type I error probability or the "producer's risk") and we
choose β = 0.1 for the chance of passing the test if the new
process is as bad as 6 defects per wafer (β = the Type II error
probability or the "consumer's risk"). That means z_{1-α} = 1.282
and z_{β} = -1.282.
The sample size needed is A wafers, where
A = ( (1.282 √4 + 1.282 √6) / (6 - 4) )² = 8.1,
which we round up to 9.
The test criteria is to "accept" that the new process meets target
unless the number of defects in the sample of 9 wafers exceeds
In other words, the reject criteria for the test of the new process
is 44 or more defects in the sample of 9 wafers.
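A minimal R sketch of the sample-size and cutoff calculations:

    D0 <- 4; D1 <- 6; z.a <- qnorm(0.90); z.b <- qnorm(0.90)
    A  <- ((z.a * sqrt(D0) + z.b * sqrt(D1)) / (D1 - D0))^2   # about 8.1
    A  <- ceiling(A)                                          # 9 wafers
    CA <- A * D0 + z.a * sqrt(A * D0)                         # about 43.7
    # reject the new process if 44 or more defects are observed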
Note: Technically, all we can say if we run this test and end up
not rejecting is that we do not have statistically significant
evidence that the new process exceeds target. However, the way
we chose the sample size for this test assures us we most likely
would have had statistically significant evidence for rejection if
the process had been as bad as 1.5 times the target.
7.2.6. What intervals contain a fixed percentage
of the population values?
Observations
tend to
cluster
around the
median or
mean
Empirical studies have demonstrated that it is typical for a
large number of the observations in any study to cluster near
the median. In right-skewed data this clustering takes place
to the left of (i.e., below) the median and in left-skewed
data the observations tend to cluster to the right (i.e., above)
the median. In symmetrical data, where the median and the
mean are the same, the observations tend to distribute
equally around these measures of central tendency.
Various
methods
Several types of intervals about the mean that contain a
large percentage of the population values are discussed in
this section.
Approximate intervals that contain most of the
population values
Percentiles
Tolerance intervals for a normal distribution
Tolerance intervals based on the smallest and largest
observations
7.2.6.1. Approximate intervals that contain
most of the population values
Empirical
intervals
A rule of thumb is that where there is no evidence of
significant skewness or clustering, two out of every three
observations (67%) should be contained within a distance of
one standard deviation of the mean; 90% to 95% of the
observations should be contained within a distance of two
standard deviations of the mean; 99-100% should be
contained within a distance of three standard deviations. This
rule can help identify outliers in the data.
Intervals
that apply
to any
distribution
The Bienayme-Chebyshev rule states that regardless of how
the data are distributed, the percentage of observations that are
contained within a distance of k standard deviations of the
mean is at least (1 - 1/k²)100 %.
Exact
intervals
for the
normal
distribution
The Bienayme-Chebyshev rule is conservative because it
applies to any distribution. For a normal distribution, a higher
percentage of the observations are contained within k standard
deviations of the mean as shown in the following table.
Percentage of observations contained within k standard
deviations of the mean

k, No. of Standard   Empirical    Bienayme-          Normal
Deviations           Rule         Chebyshev          Distribution
1                    67%          N/A                68.26%
2                    90-95%       at least 75%       95.44%
3                    99-100%      at least 88.89%    99.73%
4                    N/A          at least 93.75%    99.99%
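A minimal R sketch comparing the Chebyshev bound with the exact normal coverage:

    k <- 1:4
    chebyshev <- pmax(0, 1 - 1/k^2)   # guaranteed minimum coverage
    normal    <- 2 * pnorm(k) - 1     # coverage under a normal distribution
    round(cbind(k, chebyshev, normal), 4)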
7.2.6.2. Percentiles
Definitions of
order
statistics and
ranks
For a series of measurements Y_1, ..., Y_N, denote the data
ordered in increasing order of magnitude by Y_[1], ..., Y_[N].
These ordered data are called order statistics. If Y_[j] is the
order statistic that corresponds to the measurement Y_i, then
the rank for Y_i is j; i.e., r_i = j.
Definition of
percentiles
Order statistics provide a way of estimating proportions of
the data that should fall above and below a given value,
called a percentile. The pth percentile is a value, Y
(p)
, such
that at most (100p) % of the measurements are less than
this value and at most 100(1- p) % are greater. The 50th
percentile is called the median.
Percentiles split a set of ordered data into hundredths.
(Deciles split ordered data into tenths). For example, 70 %
of the data should fall below the 70th percentile.
Estimation of
percentiles
Percentiles can be estimated from N measurements as
follows: for the pth percentile, set p(N+1) equal to k + d for
k an integer, and d, a fraction greater than or equal to 0 and
less than 1.
1. For 0 < k < N,
2. For k = 0, Y(p) = Y
[1]
3. For k = N, Y(p) = Y
[N]
Example and
interpretation
For the purpose of illustration, twelve measurements from a
gage study are shown below. The measurements are
resistivities of silicon wafers measured in ohm·cm.
 i   Measurements   Order stats   Ranks
 1     95.1772        95.0610        9
 2     95.1567        95.0925        6
 3     95.1937        95.1065       10
 4     95.1959        95.1195       11
 5     95.1442        95.1442        5
 6     95.0610        95.1567        1
 7     95.1591        95.1591        7
 8     95.1195        95.1682        4
 9     95.1065        95.1772        3
10     95.0925        95.1937        2
11     95.1990        95.1959       12
12     95.1682        95.1990        8
To find the 90th percentile, p(N+1) = 0.9(13) = 11.7; k = 11,
and d = 0.7. From condition (1) above, Y(0.90) is estimated
to be 95.1981 ohm·cm. This percentile, although it is an
estimate from a small sample of resistivity measurements,
gives an indication of the percentile for a population of
resistivity measurements.
Note that
there are
other ways of
calculating
percentiles in
common use
Some software packages set 1+p(N-1) equal to k + d, then
proceed as above. The two methods give fairly similar
results.
A third way of calculating percentiles (given in some
elementary textbooks) starts by calculating pN. If that is not
an integer, round up to the next highest integer k and use
Y_[k] as the percentile estimate. If pN is an integer k, use
0.5(Y_[k] + Y_[k+1]).
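A minimal R sketch of the first two methods using R's quantile() types (type 6 uses p(N+1); type 7, the default, uses 1 + p(N-1)):

    x <- c(95.1772, 95.1567, 95.1937, 95.1959, 95.1442, 95.0610,
           95.1591, 95.1195, 95.1065, 95.0925, 95.1990, 95.1682)
    quantile(x, 0.90, type = 6)   # about 95.1981, as in the example above
    quantile(x, 0.90, type = 7)   # about 95.1957, the 1 + p(N-1) method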
Definition of
Tolerance
Interval
An interval covering population percentiles can be
interpreted as "covering a proportion p of the population
with a level of confidence, say, 90 %." This is known as a
tolerance interval.
7.2.6.3. Tolerance intervals for a normal distribution
Definition of
a tolerance
interval
A confidence interval covers a population parameter with a stated
confidence, that is, a certain proportion of the time. There is also a way to
cover a fixed proportion of the population with a stated confidence. Such an
interval is called a tolerance interval. The endpoints of a tolerance interval
are called tolerance limits. An application of tolerance intervals to
manufacturing involves comparing specification limits prescribed by the
client with tolerance limits that cover a specified proportion of the
population.
Difference
between
confidence
and tolerance
intervals
Confidence limits are limits within which we expect a given population
parameter, such as the mean, to lie. Statistical tolerance limits are limits
within which we expect a stated proportion of the population to lie.
Not related to
engineering
tolerances
Statistical tolerance intervals have a probabilistic interpretation.
Engineering tolerances are specified outer limits of acceptability which are
usually prescribed by a design engineer and do not necessarily reflect a
characteristic of the actual measurements.
Three types of
tolerance
intervals
Three types of questions can be addressed by tolerance intervals. Question
(1) leads to a two-sided interval; questions (2) and (3) lead to one-sided
intervals.
1. What interval will contain p percent of the population measurements?
2. What interval guarantees that p percent of population measurements
will not fall below a lower limit?
3. What interval guarantees that p percent of population measurements
will not exceed an upper limit?
Tolerance
intervals for
measurements
from a
normal
distribution
For the questions above, the corresponding tolerance intervals are defined
by lower (L) and upper (U) tolerance limits which are computed from a
series of measurements Y_1, ..., Y_N:
1. (L, U) = (Ȳ - k_2 s, Ȳ + k_2 s)
2. L = Ȳ - k_1 s
3. U = Ȳ + k_1 s
where the k factors are determined so that the intervals cover at least a
proportion p of the population with confidence γ.
Calculation
of k factor for
a two-sided
tolerance
limit for a
normal
distribution
If the data are from a normally distributed population, an approximate value
for the k_2 factor as a function of p and γ for a two-sided tolerance interval
(Howe, 1969) is given in the sketch below, where χ²_{1-γ, v} is the critical value
of the chi-square distribution with degrees of freedom v that is exceeded with
probability γ, and z_{(1-p)/2} is the critical value of the normal distribution
associated with cumulative probability (1-p)/2.
The quantity v represents the degrees of freedom used to estimate the
standard deviation. Most of the time the same sample will be used to
estimate both the mean and standard deviation so that v = N - 1, but the
formula allows for other possible values of v.
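A sketch of the Howe approximation with this notation:

    k_{2} \;\approx\; \sqrt{\frac{v\left(1 + \frac{1}{N}\right) z^{2}_{(1-p)/2}}{\chi^{2}_{1-\gamma,\,v}}}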
Example of
calculation
For example, suppose that we take a sample of N = 43 silicon wafers from a
lot and measure their thicknesses in order to find tolerance limits within
which a proportion p = 0.90 of the wafers in the lot fall with probability γ =
0.99. Since the standard deviation, s, is computed from the sample of 43
wafers, the degrees of freedom are v = N - 1.
Use of tables
in calculating
two-sided
tolerance
intervals
Values of the k_2 factor as a function of p and γ are tabulated in some
textbooks, such as Dixon and Massey (1969). To use the normal and chi-
square tables in this handbook to approximate the k_2 factor, follow the steps
outlined below.
1. Calculate: (1 - p)/2 = (1 - 0.9)/2 = 0.05 and v = N - 1 = 43 - 1 = 42.
2. Go to the page describing critical values of the normal distribution. In
the summary table under the column labeled 0.05, find
z_{(1-p)/2} = z_{0.05} = -1.645.
3. Go to the table of lower critical values of the chi-square distribution.
Under the column labeled 0.01 in the row labeled degrees of freedom
= 42, find
χ²_{1-γ, v} = χ²_{0.01, 42} = 23.650.
4. Calculate the k_2 factor from these values using the formula above; the
result is k_2 ≈ 2.217.
The tolerance limits are then computed from the sample mean, Ȳ, and
standard deviation, s, according to case (1).
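A minimal R sketch of steps 1-4:

    N <- 43; p <- 0.90; gamma <- 0.99
    v  <- N - 1
    k2 <- sqrt(v * (1 + 1/N) * qnorm((1 - p)/2)^2 / qchisq(1 - gamma, v))
    k2   # about 2.217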
Important
notes
The notation for the critical value of the chi-square distribution can be
confusing. Values as tabulated are, in a sense, already squared; whereas the
critical value for the normal distribution must be squared in the formula
above.
Some software is capable of computing tolerance intervals for a given set
of data so that the user does not need to perform all the calculations. All the
tolerance intervals shown in this section can be computed using both
Dataplot code and R code. In addition, R software is capable of computing
an exact value of the k_2 factor, thus replacing the approximation given
above. R and Dataplot examples include the case where a tolerance interval
is computed automatically from a data set.
Calculation
of a one-
sided
tolerance
interval for a
normal
distribution
The calculation of an approximate k factor for one-sided tolerance intervals
comes directly from the following set of formulas (Natrella, 1963):
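A sketch of these formulas, writing z_p and z_\gamma for the p and \gamma quantiles of the standard normal distribution:

    a = 1 - \frac{z_{\gamma}^{2}}{2(N-1)}, \qquad
    b = z_{p}^{2} - \frac{z_{\gamma}^{2}}{N}, \qquad
    k_{1} = \frac{z_{p} + \sqrt{z_{p}^{2} - a\,b}}{a}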
A one-sided
tolerance
interval
example
For the example above, it may also be of interest to guarantee with 0.99
probability (or 99 % confidence) that 90 % of the wafers have thicknesses
less than an upper tolerance limit. This problem falls under case (3). The
calculations for the k_1 factor for a one-sided tolerance interval, carried out
with the formulas above, give k_1 = 1.8752.
Tolerance
factor based
on the non-
central t
distribution
The value of k_1 can also be computed using the inverse cumulative
distribution function for the non-central t distribution. This method may
give more accurate results for small values of N. For the same example as
above, the non-central t method gives k_1 = 1.8740, where the
non-centrality parameter is δ = z_p √N.
In this case, the difference between the two computations is negligible
(1.8752 versus 1.8740). However, the difference becomes more pronounced
as the value of N gets smaller (in particular, for N ≤ 10). For example, if N
= 43 is replaced with N = 6, the non-central t method returns a value of
4.4111 for k_1 while the method based on the Natrella formulas returns a
value of 5.2808.
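A minimal R sketch of the non-central t computation for the N = 43 case (k_1 is the γ quantile of the non-central t distribution, scaled by 1/√N):

    N <- 43; p <- 0.90; gamma <- 0.99
    ncp <- qnorm(p) * sqrt(N)                      # non-centrality parameter
    k1  <- qt(gamma, df = N - 1, ncp = ncp) / sqrt(N)
    k1   # about 1.874, matching the value quoted above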
The disadvantage of the non-central t method is that it depends on the
inverse cumulative distribution function for the non-central t distribution.
This function is not available in many statistical and spreadsheet software
programs, but it is available in Dataplot and R (see Dataplot code and R
code). The Natrella formulas only depend on the inverse cumulative
distribution function for the normal distribution (which is available in just
about all statistical and spreadsheet software programs). Unless you have
small samples (say N ≤ 10), the difference in the methods should not have
much practical effect.
7.2.6.4. Tolerance intervals based on the largest and smallest
observations
Tolerance
intervals can
be constructed
for a
distribution of
any form
The methods on the previous pages for computing tolerance limits are based on the
assumption that the measurements come from a normal distribution. If the distribution is
not normal, tolerance intervals based on this assumption will not provide coverage for the
intended proportion p of the population. However, there are methods for achieving the
intended coverage if the form of the distribution is not known, but these methods may
produce substantially wider tolerance intervals.
Risks
associated
with making
assumptions
about the
distribution
There are situations where it would be particularly dangerous to make unwarranted
assumptions about the exact shape of the distribution, for example, when testing the
strength of glass for airplane windshields where it is imperative that a very large
proportion of the population fall within acceptable limits.
Tolerance
intervals
based on
largest and
smallest
observations
One obvious choice for a two-sided tolerance interval for an unknown distribution is the
interval between the smallest and largest observations from a sample of Y_1, ..., Y_N
measurements. Given the sample size N and coverage p, an equation from Hahn and
Meeker (p. 91),
γ = 1 - N p^(N-1) + (N - 1) p^N,
allows us to calculate the confidence γ of the tolerance interval. For example, the
confidence levels for selected coverages between 0.5 and 0.9999 are shown below for N
= 25.
Confidence   Coverage
  1.000        0.5000
  0.993        0.7500
  0.729        0.9000
  0.358        0.9500
  0.129        0.9750
  0.026        0.9900
  0.007        0.9950
  0.000        0.9990
  0.000        0.9995
  0.000        0.9999
Note that if 99 % confidence is required, the interval that covers the entire sample data
set is guaranteed to achieve a coverage of only 75 % of the population values.
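A minimal R sketch reproducing this table:

    N <- 25
    p <- c(0.50, 0.75, 0.90, 0.95, 0.975, 0.99, 0.995, 0.999, 0.9995, 0.9999)
    conf <- 1 - N * p^(N - 1) + (N - 1) * p^N
    round(cbind(coverage = p, confidence = conf), 3)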
What is the
optimal
sample size?
Another question of interest is, "How large should a sample be so that one can be
assured with probability y that the tolerance interval will contain at least a proportion p of
the population?"
Approximation
for N
A rather good approximation for the required sample size is given by
N ≈ (1/4) ((1 + p)/(1 - p)) χ²_{γ, 4} + 1/2,
where χ²_{γ, 4} is the critical value of the chi-square distribution with 4 degrees of freedom
that is exceeded with probability 1 - γ.
Example of
the effect of p
on the sample
size
Suppose we want to know how many measurements to make in order to guarantee that
the interval between the smallest and largest observations covers a proportion p of the
population with probability γ = 0.95. From the table for the upper critical value of the
chi-square distribution, look under the column labeled 0.95 in the row for 4 degrees of
freedom. The value is found to be χ²_{0.95, 4} = 9.488, and the calculations for p equal to
0.90 and 0.99 (each with γ = 0.95) are carried out in the sketch below.
These calculations demonstrate that requiring the tolerance interval to cover a very large
proportion of the population may lead to an unacceptably large sample size.
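A minimal R sketch of the two cases, using the approximation above:

    approx.N <- function(p, gamma) (1 + p) / (4 * (1 - p)) * qchisq(gamma, 4) + 0.5
    ceiling(approx.N(0.90, 0.95))   # about 46
    ceiling(approx.N(0.99, 0.95))   # about 473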
7.3. Comparisons based on data from two
processes
Outline for
this section
In many manufacturing environments it is common to have
two or more processes performing the same task or generating
similar products. The following pages describe tests covering
several of the most common and useful cases for two
processes.
1. Do two processes have the same mean?
1. Tests when the standard deviations are equal
2. Tests when the standard deviations are unequal
3. Tests for paired data
2. Do two processes have the same standard deviation?
3. Do two processes produce the same proportion of
defectives?
4. If the observations are failure times, are the failure rates
(or mean times to failure) the same?
5. Do two arbitrary processes have the same central
tendency?
Example of
a dual
track
process
For example, in an automobile manufacturing plant, there may
exist several assembly lines producing the same part. If one
line goes down for some reason, parts can still be produced
and production will not be stopped. For example, if the parts
are piston rings for a particular model car, the rings produced
by either line should conform to a given set of specifications.
How does one confirm that the two processes are in fact
producing rings that are similar? That is, how does one
determine if the two processes are similar?
The goal is
to
determine
if the two
processes
are similar
In order to answer this question, data on piston rings are
collected for each process. For example, on a particular day,
data on the diameters of ten piston rings from each process
are measured over a one-hour time frame.
To determine if the two processes are similar, we are
interested in answering the following questions:
1. Do the two processes produce piston rings with the
same diameter?
2. Do the two processes have similar variability in the
diameters of the rings produced?
Unknown
standard
deviation
The second question assumes that one does not know the
standard deviation of either process and therefore it must be
estimated from the data. This is usually the case, and the tests
in this section assume that the population standard deviations
are unknown.
Assumption
of a
normal
distribution
The statistical methodology used (i.e., the specific test to be
used) to answer these two questions depends on the
underlying distribution of the measurements. The tests in this
section assume that the data are normally distributed.
7.3.1. Do two processes have the same mean?
Testing
hypotheses
related to
the means of
two
processes
Given two random samples of measurements,
Y_1, ..., Y_N and Z_1, ..., Z_N,
from two independent processes (the Y's are sampled from process 1
and the Z's are sampled from process 2), there are three types of
questions regarding the true means of the processes that are often
asked. They are:
1. Are the means from the two processes the same?
2. Is the mean of process 1 less than or equal to the mean of
process 2?
3. Is the mean of process 1 greater than or equal to the mean of
process 2?
Typical null
hypotheses
The corresponding null hypotheses that test the true mean of the first
process, μ_1, against the true mean of the second process, μ_2, are:
1. H_0: μ_1 = μ_2
2. H_0: μ_1 ≤ μ_2
3. H_0: μ_1 ≥ μ_2
Note that as previously discussed, our choice of which null hypothesis
to use is typically made based on one of the following considerations:
1. When we are hoping to prove something new with the sample
data, we make that the alternative hypothesis, whenever
possible.
2. When we want to continue to assume a reasonable or traditional
hypothesis still applies, unless very strong contradictory
evidence is present, we make that the null hypothesis, whenever
possible.
Basic
statistics
from the two
processes
The basic statistics for the test are the sample means, Ȳ and Z̄,
and the sample standard deviations, s_1 and s_2,
with degrees of freedom v_1 = N_1 - 1 and v_2 = N_2 - 1, respectively.
Form of the
test statistic
where the
two
processes
have
equivalent
standard
deviations
If the standard deviations from the two processes are equivalent, and
this should be tested before this assumption is made, the test statistic
is the pooled-variance t statistic shown in the sketch below, where the
pooled standard deviation, s_p, is estimated from the two sample standard
deviations with degrees of freedom v = v_1 + v_2.
Form of the
test statistic
where the
two
processes do
NOT have
equivalent
standard
deviations
If it cannot be assumed that the standard deviations from the two
processes are equivalent, the test statistic uses the separate sample
variances, and the degrees of freedom are not known exactly but can be
estimated using the Welch-Satterthwaite approximation (also shown in
the sketch below).
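A sketch of both forms in LaTeX notation:

    \text{equal variances: } \quad
    t = \frac{\bar{Y} - \bar{Z}}{s_{p}\sqrt{\frac{1}{N_{1}} + \frac{1}{N_{2}}}},
    \qquad
    s_{p}^{2} = \frac{(N_{1}-1)s_{1}^{2} + (N_{2}-1)s_{2}^{2}}{N_{1}+N_{2}-2}

    \text{unequal variances: } \quad
    t = \frac{\bar{Y} - \bar{Z}}{\sqrt{\frac{s_{1}^{2}}{N_{1}} + \frac{s_{2}^{2}}{N_{2}}}},
    \qquad
    v \approx \frac{\left(\frac{s_{1}^{2}}{N_{1}} + \frac{s_{2}^{2}}{N_{2}}\right)^{2}}
                   {\frac{(s_{1}^{2}/N_{1})^{2}}{N_{1}-1} + \frac{(s_{2}^{2}/N_{2})^{2}}{N_{2}-1}}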
Test
strategies
The strategy for testing the hypotheses under (1), (2) or (3) above is to
calculate the appropriate t statistic from one of the formulas above,
and then perform a test at significance level α, where α is chosen to
be small, typically 0.01, 0.05 or 0.10. The hypothesis associated with each
case enumerated above is rejected if:
1. |t| ≥ t_{1-α/2, v}
2. t ≥ t_{1-α, v}
3. t ≤ t_{α, v}
Explanation
of critical
values
The critical values from the t table depend on the significance level
and the degrees of freedom in the standard deviation. For hypothesis
(1), t_{1-α/2, v} is the 1 - α/2 critical value from the t table with v degrees
of freedom and similarly for hypotheses (2) and (3).
Example of
unequal
number of
data points
A new procedure (process 2) to assemble a device is introduced and
tested for possible improvement in time of assembly. The question
being addressed is whether the mean, μ_2, of the new assembly process
is smaller than the mean, μ_1, for the old assembly process (process 1).
We choose to test hypothesis (2) in the hope that we will reject this
null hypothesis and thereby feel we have a strong degree of
confidence that the new process is an improvement worth
implementing. Data (in minutes required to assemble a device) for
both the new and old processes are listed below along with their
relevant statistics.
Device Process 1 (Old) Process 2 (New)
1 32 36
2 37 31
3 35 30
4 28 31
5 41 34
6 44 36
7 35 29
8 31 32
9 34 31
10 38
11 42
Mean                  36.0909    32.2222
Standard deviation     4.9082     2.5386
No. measurements           11          9
Degrees of freedom         10          8
Computation
of the test
statistic
From this table we generate the test statistic
t = (36.0909 - 32.2222) / sqrt( 4.9082²/11 + 2.5386²/9 ) = 2.269,
with the degrees of freedom approximated, using the Welch-Satterthwaite
formula above, by v ≈ 16.
Decision
process
For a one-sided test at the 5 % significance level, go to the t table
at the 0.95 probability level and look up the critical value for degrees of
freedom v = 16. The critical value is 1.746. Thus, hypothesis (2) is
rejected because the test statistic (t = 2.269) is greater than 1.746 and,
therefore, we conclude that process 2 has improved assembly time
(smaller mean) over process 1.
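A minimal R sketch of this comparison using Welch's t test:

    old <- c(32, 37, 35, 28, 41, 44, 35, 31, 34, 38, 42)   # process 1
    new <- c(36, 31, 30, 31, 34, 36, 29, 32, 31)           # process 2
    t.test(old, new, alternative = "greater")
    # t is about 2.27 on roughly 15.5 (about 16) degrees of freedom;
    # the small p-value leads to rejecting hypothesis (2)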
7.3.1.1. Analysis of paired observations
Definition of
paired
comparisons
Given two random samples,
Y_1, ..., Y_N and Z_1, ..., Z_N,
from two populations, the data are said to be paired if the ith
from two populations, the data are said to be paired if the ith
measurement on the first sample is naturally paired with the
ith measurement on the second sample. For example, if N
supposedly identical products are chosen from a production
line, and each one, in turn, is tested with first one measuring
device and then with a second measuring device, it is
possible to decide whether the measuring devices are
compatible; i.e., whether there is a difference between the
two measurement systems. Similarly, if "before" and "after"
measurements are made with the same device on N objects, it
is possible to decide if there is a difference between "before"
and "after"; for example, whether a cleaning process changes
an important characteristic of an object. Each "before"
measurement is paired with the corresponding "after"
measurement, and the differences
are calculated.
Basic statistics for the test

The mean and standard deviation for the differences are
calculated as

    dbar = (1/N) Σ di

and

    s_d = sqrt[ Σ (di - dbar)^2 / (N - 1) ]

with ν = N - 1 degrees of freedom.
Test statistic based on the t distribution

The paired-sample t test is used to test for the difference of
two means before and after a treatment. The test statistic is:

    t = dbar / ( s_d / sqrt(N) )

The hypotheses described on the foregoing page are rejected
if:
1. | t | >= t(1-α/2, ν)
2. t >= t(1-α, ν)
3. t <= t(α, ν)
where for hypothesis (1) t(1-α/2, ν) is the 1-α/2 critical value
from the t distribution with ν degrees of freedom, and
similarly for cases (2) and (3). Critical values can be found
in the t table in Chapter 1.
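As an illustration only, a paired test can be run with SciPy's ttest_rel;
the before/after values below are made-up numbers, not Handbook data.

    from scipy import stats

    # Hypothetical before/after measurements on the same five objects
    before = [10.2, 9.8, 10.5, 10.1, 9.9]
    after  = [ 9.9, 9.7, 10.1, 10.0, 9.6]

    t, p = stats.ttest_rel(before, after)   # two-sided paired-sample t test
    print(f"t = {t:.3f}, p = {p:.4f}")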
7.3.1.2. Confidence intervals for differences between means
Definition of confidence interval for difference between population means

Given two random samples,

    Y1, ..., YN and Z1, ..., ZN

from two populations, two-sided confidence intervals with 100(1-α) % coverage for the
difference between the unknown population means, μ1 and μ2, are shown in the table
below. Relevant statistics for paired observations and for unpaired observations are
shown elsewhere.

Two-sided confidence intervals with 100(1-α) % coverage for μ1 - μ2:

    Paired observations:    dbar ± t(1-α/2, N-1) s_d / sqrt(N)

    Unpaired observations:  Ybar - Zbar ± t(1-α/2, N1+N2-2) s_d sqrt(1/N1 + 1/N2),
                            with s_d the pooled standard deviation defined earlier.
Interpretation of confidence interval

One interpretation of the confidence interval for means is that if zero is contained within
the confidence interval, the two population means are equivalent.
7.3.2. Do two processes have the same standard
deviation?
Testing hypotheses related to standard deviations from two processes

Given two random samples of measurements,

    Y1, ..., YN and Z1, ..., ZN

from two independent processes, there are three types of
questions regarding the true standard deviations of the
processes that can be addressed with the sample data. They
are:
1. Are the standard deviations from the two processes the
same?
2. Is the standard deviation of one process less than the
standard deviation of the other process?
3. Is the standard deviation of one process greater than
the standard deviation of the other process?
Typical null hypotheses

The corresponding null hypotheses that test the true standard
deviation of the first process, σ1, against the true standard
deviation of the second process, σ2, are:

1. H0: σ1 = σ2
2. H0: σ1 <= σ2
3. H0: σ1 >= σ2
Basic statistics from the two processes

The basic statistics for the test are the sample variances

    s1^2 = (1/(N1-1)) Σ (Yi - Ybar)^2   and   s2^2 = (1/(N2-1)) Σ (Zi - Zbar)^2

and degrees of freedom ν1 = N1 - 1 and ν2 = N2 - 1, respectively.

Form of the test statistic

The test statistic is

    F = s1^2 / s2^2
Test strategies

The strategy for testing the hypotheses under (1), (2) or (3)
above is to calculate the F statistic from the formula above,
and then perform a test at significance level α, where α is
chosen to be small, typically 0.01, 0.05 or 0.10. The
hypothesis associated with each case enumerated above is
rejected if:

1. F >= F(1-α/2; ν1, ν2) or F <= 1/F(1-α/2; ν2, ν1)
2. F >= F(1-α; ν1, ν2)
3. F <= F(α; ν1, ν2)
Explanation of critical values

The critical values from the F table depend on the
significance level and the degrees of freedom in the standard
deviations from the two processes. For hypothesis (1):

    F(1-α/2; ν1, ν2) is the upper critical value from the F table
    with ν1 degrees of freedom for the numerator and ν2
    degrees of freedom for the denominator

and

    F(1-α/2; ν2, ν1) is the upper critical value from the F table
    with ν2 degrees of freedom for the numerator and ν1
    degrees of freedom for the denominator.
Caution on looking up critical values

The F distribution has the property that

    F(α/2; ν1, ν2) = 1 / F(1-α/2; ν2, ν1)

which means that only upper critical values are required for
two-sided tests. However, note that the degrees of freedom
are interchanged in the ratio. For example, for a two-sided
test at significance level 0.05, go to the F table labeled "2.5 %
significance level".

    For F(α/2; ν1, ν2), reverse the order of the degrees of
    freedom; i.e., look across the top of the table for ν2
    and down the table for ν1.

    For F(1-α/2; ν1, ν2), look across the top of the table for ν1
    and down the table for ν2.

Critical values for cases (2) and (3) are defined similarly,
except that the critical values for the one-sided tests are
based on α rather than on α/2.
Two-sided confidence interval

The two-sided confidence interval for the ratio of the two
unknown variances (squares of the standard deviations) is
shown below.

Two-sided confidence interval with 100(1-α) % coverage for σ1^2 / σ2^2:

    ( (s1^2/s2^2) / F(1-α/2; ν1, ν2) ,  (s1^2/s2^2) F(1-α/2; ν2, ν1) )

One interpretation of the confidence interval is that if the
quantity "one" is contained within the interval, the standard
deviations are equivalent.
Example of unequal number of data points

A new procedure to assemble a device is introduced and
tested for possible improvement in time of assembly. The
question being addressed is whether the standard deviation,
σ2, of the new assembly process is better (i.e., smaller) than
the standard deviation, σ1, for the old assembly process.
Therefore, we test the null hypothesis that σ1 <= σ2. We form
the hypothesis in this way because we hope to reject it, and
therefore accept the alternative that σ2 is less than σ1. This is
hypothesis (2). Data (in minutes required to assemble a
device) for both the old and new processes are listed on an
earlier page. Relevant statistics are shown below:
                        Process 1   Process 2
    Mean                 36.0909     32.2222
    Standard deviation    4.9082      2.5874
    No. measurements     11           9
    Degrees of freedom   10           8
Computation of the test statistic

From this table we generate the test statistic

    F = s1^2 / s2^2 = (4.9082)^2 / (2.5874)^2 = 3.60
Decision process

For a test at the 5 % significance level, go to the F table for the
5 % significance level, and look up the critical value for
numerator degrees of freedom ν1 = 10 and
denominator degrees of freedom ν2 = 8. The critical
value is 3.35. Thus, hypothesis (2) can be rejected because
the test statistic (F = 3.60) is greater than 3.35. Therefore, we
accept the alternative hypothesis that process 2 has better
precision (smaller standard deviation) than process 1.
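A minimal sketch of the same F comparison, using SciPy only to look up
the critical value (the variable names are ours, not the Handbook's):

    from scipy import stats

    s1, nu1 = 4.9082, 10   # old process: standard deviation, degrees of freedom
    s2, nu2 = 2.5874, 8    # new process

    F = s1**2 / s2**2                       # test statistic, about 3.60
    F_crit = stats.f.ppf(0.95, nu1, nu2)    # upper 5 % critical value, about 3.35
    print(f"F = {F:.2f}, critical value = {F_crit:.2f}, reject H0: {F > F_crit}")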
7.3.3. How can we determine whether two
processes produce the same proportion of
defectives?
Case 1: Large Samples (Normal Approximation to
Binomial)
The hypothesis of equal proportions can be tested using a z statistic
If the samples are reasonably large we can use the normal
approximation to the binomial to develop a test similar to
testing whether two normal means are equal.
Let sample 1 have x1 defects out of n1 and sample 2 have
x2 defects out of n2. Calculate the proportion of defects for
each sample and the z statistic below:

    z = (p1hat - p2hat) / sqrt( phat (1 - phat) (1/n1 + 1/n2) )

where

    p1hat = x1/n1,   p2hat = x2/n2,   phat = (x1 + x2) / (n1 + n2).

Compare |z| to the normal z(1-α/2) table value for a two-
sided test. For a one-sided test, assuming the alternative
hypothesis is p1 > p2, compare z to the normal z(1-α) table
value. If the alternative hypothesis is p1 < p2, compare z to
z(α).
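A small sketch of this large-sample test; the defect counts below are
made-up illustrative values, not Handbook data.

    from math import sqrt
    from scipy import stats

    x1, n1 = 15, 200   # hypothetical defect count and sample size, process 1
    x2, n2 = 25, 200   # hypothetical defect count and sample size, process 2

    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)          # pooled proportion
    z = (p1 - p2) / sqrt(p_pool * (1 - p_pool) * (1/n1 + 1/n2))
    p_value = 2 * stats.norm.sf(abs(z))     # two-sided p-value
    print(f"z = {z:.3f}, two-sided p = {p_value:.4f}")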
Case 2: An Exact Test for Small Samples
The Fisher Exact Probability test is an excellent choice for small samples
The Fisher Exact Probability Test is an excellent
nonparametric technique for analyzing discrete data (either
nominal or ordinal), when the two independent samples are
small in size. It is used when the results from two
independent random samples fall into one or the other of
two mutually exclusive classes (i.e., defect versus good, or
successes vs failures).
Example of a 2x2 contingency table

In other words, every subject in each group has one of two
possible scores. These scores are represented by frequencies
in a 2x2 contingency table. The following discussion, using
a 2x2 contingency table, illustrates how the test operates.
We are working with two independent groups, such as
experiments and controls, males and females, the Chicago
Bulls and the New York Knicks, etc.
                -      +     Total
    Group I     A      B      A+B
    Group II    C      D      C+D
    Total      A+C    B+D      N
The column headings, here arbitrarily indicated as plus and
minus, may be of any two classifications, such as: above
and below the median, passed and failed, Democrat and
Republican, agree and disagree, etc.
Determine whether two groups differ in the proportion with which they
fall into two classifications

Fisher's test determines whether the two groups differ in
the proportion with which they fall into the two
classifications. For the table above, the test would
determine whether Group I and Group II differ significantly
in the proportion of plusses and minuses attributed to them.

The method proceeds as follows:

The exact probability of observing a particular set of
frequencies in a 2x2 table, when the marginal totals are
regarded as fixed, is given by the hypergeometric
distribution

    p = [ (A+B)! (C+D)! (A+C)! (B+D)! ] / [ N! A! B! C! D! ]
But the test does not just look at the observed case. If
needed, it also computes the probability of more extreme
outcomes, with the same marginal totals. By "more
extreme", we mean relative to the null hypothesis of equal
proportions.
Example of Fisher's test

This will become clear in the next illustrative example.
Consider the following set of 2x2 contingency tables:

    Observed Data        More extreme outcomes with same marginals
        (a)                    (b)              (c)
      2  5 |  7              1  6 |  7        0  7 |  7
      3  2 |  5              4  1 |  5        5  0 |  5
      5  7 | 12              5  7 | 12        5  7 | 12

Table (a) shows the observed frequencies and tables (b)
and (c) show the two more extreme distributions of
frequencies that could occur with the same marginal totals
7, 5. Given the observed data in table (a), we wish to test
the null hypothesis at, say, α = 0.05.

Applying the previous formula to tables (a), (b), and (c),
we obtain

    p(a) = 0.26515,   p(b) = 0.04419,   p(c) = 0.00126.
The probability associated with the occurrence of values as
extreme as the observed results under H0 is given by
adding these three p's:

    0.26515 + 0.04419 + 0.00126 = 0.31060

So p = 0.31060 is the probability that we get from Fisher's
test. Since 0.31060 is larger than α, we cannot reject the
null hypothesis.
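The same one-sided probability can be reproduced with SciPy's Fisher
exact routine; here alternative='less' corresponds to the direction of
the more extreme tables (b) and (c) shown above.

    from scipy import stats

    table = [[2, 5],
             [3, 2]]    # observed 2x2 table (a)

    # One-sided Fisher exact test in the direction of tables (b) and (c)
    odds_ratio, p = stats.fisher_exact(table, alternative='less')
    print(f"one-sided p = {p:.5f}")   # about 0.31060, as in the text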
Tocher's Modification
Tocher's modification makes Fisher's test less conservative

Tocher (1950) showed that a slight modification of the
Fisher test makes it a more useful test. Tocher starts by
isolating the probability of all cases more extreme than the
observed one. In this example that is

    p(b) + p(c) = 0.04419 + 0.00126 = 0.04545

Now, if this probability is larger than α, we cannot reject
H0. But if this probability is less than α, while the
probability that we got from Fisher's test is greater than α
(as is the case in our example), then Tocher advises to
compute the following ratio:

    [ α - (p(b) + p(c)) ] / p(a)

For the data in the example, that would be

    (0.05 - 0.04545) / 0.26515 = 0.0172
Now we go to a table of random numbers and at random
draw a number between 0 and 1. If this random number is
smaller than the ratio above of 0.0172, we reject H0. If it is
larger we cannot reject H0. This added small probability of
rejecting H0 brings the test procedure Type I error (i.e., the α
value) to exactly 0.05 and makes the Fisher test less
conservative.
The test is a one-tailed test. For a two-tailed test, the value
of p obtained from the formula must be doubled.
A difficulty with the Tocher procedure is that someone else
analyzing the same data would draw a different random
number and possibly make a different decision about the
validity of H0.
7.3.4. Assuming the observations are failure
times, are the failure rates (or Mean
Times To Failure) for two distributions
the same?
Comparing two exponential distributions is to compare the means or
hazard rates
The comparison of two (or more) life distributions is a
common objective when performing statistical analyses of
lifetime data. Here we look at the one-parameter exponential
distribution case.
In this case, comparing two exponential distributions is
equivalent to comparing their means (or the reciprocal of
their means, known as their hazard rates).
Type II Censored data
Definition of Type II censored data

Definition: Type II censored data occur when a life test is
terminated exactly when a pre-specified number of failures
have occurred. The remaining units have not yet failed. If n
units were on test, and the pre-specified number of failures is
r (where r is less than or equal to n), then the test ends at
t(r) = the time of the r-th failure.
Two exponential samples ordered by time

Suppose we have Type II censored data from two
exponential distributions with means θ1 and θ2. We have two
samples from these distributions, of sizes n1 on test with r1
failures and n2 on test with r2 failures, respectively. The
observations are time to failure and are therefore ordered by
time.
Test of equality of θ1 and θ2 and confidence interval for θ1/θ2

Letting

    T1 = Σ(j=1 to r1) t(1,j) + (n1 - r1) t(1,r1)
    T2 = Σ(j=1 to r2) t(2,j) + (n2 - r2) t(2,r2)

Then

    2 T1 / θ1 is distributed as chi-square with 2 r1 degrees of freedom

and

    2 T2 / θ2 is distributed as chi-square with 2 r2 degrees of freedom,

with T1 and T2 independent. Thus

    U (θ2 / θ1)

where

    U = θ1hat / θ2hat

and

    θ1hat = T1 / r1,   θ2hat = T2 / r2,

has an F distribution with (2r1, 2r2) degrees of freedom.
Tests of equality of θ1 and θ2 can be performed using tables
of the F distribution or computer programs. Confidence
intervals for θ1/θ2, which is the ratio of the means or the
hazard rates for the two distributions, are also readily
obtained.
Numerical example

A numerical application will illustrate the concepts outlined
above.

For this example,

    H0: θ1/θ2 = 1
    Ha: θ1/θ2 ≠ 1

Two samples of size 10 from exponential distributions were
put on life test. The first sample was censored after 7 failures
and the second sample was censored after 5 failures. The
times to failure were:

    Sample 1: 125 189 210 356 468 550 610
    Sample 2: 170 234 280 350 467

So r1 = 7, r2 = 5 and t(1,r1) = 610, t(2,r2) = 467.

Then T1 = 4338 and T2 = 3836.

The estimator for θ1 is 4338 / 7 = 619.71 and the estimator
for θ2 is 3836 / 5 = 767.20.

The ratio of the estimators = U = 619.71 / 767.20 = 0.808.
If the means are the same, the ratio of the estimators, U,
follows an F distribution with (2r1, 2r2) degrees of freedom.
The P(F < 0.808) = 0.348. The associated p-value is 2(0.348) =
0.696. Based on this p-value, we find no evidence to reject the
null hypothesis (that the true but unknown ratio = 1). Note
that this is a two-sided test, and we would reject the null
hypothesis if the p-value is either too small (i.e., less than or
equal to 0.025) or too large (i.e., greater than or equal to 0.975)
for a test at the 0.05 significance level.
We can also put a 95 % confidence interval around the ratio
of the two means. Since the 0.025 and 0.975 quantiles of
F(14,10) are 0.3178 and 3.5504, respectively, we have

    Pr( U/3.5504 < θ1/θ2 < U/0.3178 ) = 0.95

and (0.228, 2.542) is a 95 % confidence interval for the ratio
of the unknown means. The value of 1 is within this range,
which is another way of showing that we cannot reject the
null hypothesis at the 0.05 significance level.
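A short sketch reproducing these calculations; the total time on test
follows the definition of T1 and T2 given above.

    from scipy import stats

    sample1 = [125, 189, 210, 356, 468, 550, 610]   # r1 = 7 failures, n1 = 10 on test
    sample2 = [170, 234, 280, 350, 467]             # r2 = 5 failures, n2 = 10 on test
    n1 = n2 = 10
    r1, r2 = len(sample1), len(sample2)

    # Total time on test: observed failure times plus censored running time
    T1 = sum(sample1) + (n1 - r1) * sample1[-1]     # 4338
    T2 = sum(sample2) + (n2 - r2) * sample2[-1]     # 3836

    U = (T1 / r1) / (T2 / r2)                       # about 0.808
    p = 2 * stats.f.cdf(U, 2*r1, 2*r2)              # two-sided p-value (U < 1 here)
    lower = U / stats.f.ppf(0.975, 2*r1, 2*r2)      # 95 % limits for theta1/theta2
    upper = U / stats.f.ppf(0.025, 2*r1, 2*r2)
    print(f"U = {U:.3f}, p = {p:.3f}, CI = ({lower:.3f}, {upper:.3f})")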
7.3.5. Do two arbitrary processes have the same central
tendency?
The nonparametric equivalent of the t test is due to Mann and Whitney,
called the U test
By "arbitrary" we mean that we make no underlying assumptions about
normality or any other distribution. The test is called the Mann-Whitney U
Test, which is the nonparametric equivalent of the t test for means.
The U-test (as the majority of nonparametric tests) uses the rank sums of the
two samples.
Procedure

The test is implemented as follows.

1. Rank all (n1 + n2) observations in ascending order. Ties receive the
   average of the ranks they would otherwise have received.
2. Calculate the sum of the ranks for each sample; call these Ta and Tb.
3. Calculate the U statistic,

       Ua = n1 n2 + 0.5 n1 (n1 + 1) - Ta

   or

       Ub = n1 n2 + 0.5 n2 (n2 + 1) - Tb

   where Ua + Ub = n1 n2.
Null
Hypothesis
The null hypothesis is: the two populations have the same central tendency.
The alternative hypothesis is: The central tendencies are NOT the same.
Test statistic

The test statistic, U, is the smaller of Ua and Ub. For sample sizes larger than
20, we can use the normal z as follows:

    z = [ U - E(U) ] / σ(U)

where

    E(U) = 0.5 n1 n2   and   σ(U) = sqrt( n1 n2 (n1 + n2 + 1) / 12 ).

The critical value is the normal tabled z for α/2 for a two-tailed test or z at the α
level for a one-tail test.
For small samples, tables are readily available in most textbooks on
nonparametric statistics.
Example

An illustrative example of the U test

Two processing systems were used to clean wafers. The following data
represent the (coded) particle counts. The null hypothesis is that there is no
difference between the central tendencies of the particle counts; the alternative
hypothesis is that there is a difference. The solution shows the typical kind of
output software for this procedure would generate, based on the large sample
approximation.
Group A Rank Group B Rank
.55 8 .49 5
.67 15.5 .68 17
.43 1 .59 9.5
.51 6 .72 19
.48 3.5 .67 15.5
.60 11 .75 20.5
.71 18 .65 13.5
.53 7 .77 22
.44 2 .62 12
.65 13.5 .48 3.5
.75 20.5 .59 9.5
N Sum of Ranks U Std. Dev of U Median
A 11 106.000 81.000 15.229 0.540
B 11 147.000 40.000 15.229 0.635
For U = 40.0 and E[U] = 0.5 (n1)(n2) = 60.5, the test statistic is

    z = (40.0 - 60.5) / 15.229 = -1.346

where

    σ(U) = sqrt( (11)(11)(11 + 11 + 1) / 12 ) = 15.229

For a two-sided test with significance level α = 0.05, the critical value is
z(1-α/2) = 1.96. Since |z| is less than the critical value, we do not reject the null
hypothesis and conclude that there is not enough evidence to claim that two
groups have different central tendencies.
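For reference, SciPy's Mann-Whitney routine reproduces this result
(recent SciPy releases accept the method argument); note that different
texts define U for the first or the second sample, so the reported
statistic may be the complementary value, while the p-value is unaffected.

    from scipy import stats

    group_a = [.55, .67, .43, .51, .48, .60, .71, .53, .44, .65, .75]
    group_b = [.49, .68, .59, .72, .67, .75, .65, .77, .62, .48, .59]

    res = stats.mannwhitneyu(group_a, group_b, alternative='two-sided',
                             method='asymptotic')
    print(f"U = {res.statistic}, two-sided p = {res.pvalue:.3f}")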
7.4. Comparisons based on data from more
than two processes
Introduction This section begins with a nonparametric procedure for
comparing several populations with unknown distributions.
Then the following topics are discussed:
Comparing variances
Comparing means (ANOVA technique)
Estimating variance components
Comparing categorical data
Comparing population proportion defectives
Making multiple comparisons
7.4.1. How can we compare several populations
with unknown distributions (the Kruskal-
Wallis test)?
The Kruskal-Wallis (KW) Test for Comparing
Populations with Unknown Distributions
A nonparametric test for comparing population medians by Kruskal and Wallis

The KW procedure tests the null hypothesis that k samples
from possibly different populations actually originate from
similar populations, at least as far as their central
tendencies, or medians, are concerned. The test assumes
that the variables under consideration have underlying
continuous distributions.

In what follows assume we have k samples, and the
sample size of the i-th sample is ni, i = 1, 2, ..., k.
Test based on ranks of combined data

In the computation of the KW statistic, each observation is
replaced by its rank in an ordered combination of all the k
samples. By this we mean that the data from the k samples
combined are ranked in a single series. The minimum
observation is replaced by a rank of 1, the next-to-the-
smallest by a rank of 2, and the largest or maximum
observation is replaced by the rank of N, where N is the
total number of observations in all the samples (N is the
sum of the ni).
Compute the
sum of the
ranks for each
sample
The next step is to compute the sum of the ranks for each
of the original samples. The KW test determines whether
these sums of ranks are so different by sample that they are
not likely to have all come from the same population.
Test statistic follows a χ2 distribution

It can be shown that if the k samples come from the same
population, that is, if the null hypothesis is true, then the
test statistic, H, used in the KW procedure is distributed
approximately as a chi-square statistic with df = k - 1,
provided that the sample sizes of the k samples are not too
small (say, ni > 4, for all i). H is defined as follows:

    H = [ 12 / (N(N+1)) ] Σ(i=1 to k) Ri^2 / ni  -  3(N+1)

where

    k = number of samples (groups)
    ni = number of observations for the i-th sample or group
    N = total number of observations (sum of all the ni)
    Ri = sum of ranks for group i
Example

An illustrative example

The following data are from a comparison of four
investment firms. The observations represent percentage of
growth during a three-month period for recommended
funds.

        A     B     C     D
       4.2   3.3   1.9   3.5
       4.6   2.4   2.4   3.1
       3.9   2.6   2.1   3.7
       4.0   3.8   2.7   4.1
             2.8   1.8   4.4

Step 1: Express the data in terms of their ranks

        A     B     C     D
       17    10     2    11
       19    4.5   4.5    9
       14     6     3    12
       15    13     7    16
              8     1    18
    SUM 65   41.5  17.5  66
Compute the test statistic

The corresponding H test statistic is

    H = [ 12 / (19·20) ] (65^2/4 + 41.5^2/5 + 17.5^2/5 + 66^2/5) - 3(20) = 13.678

From the chi-square table in Chapter 1, the critical value
for 1-α = 0.95 with df = k-1 = 3 is 7.812. Since 13.678 >
7.812, we reject the null hypothesis.

Note that the rejection region for the KW procedure is one-
sided, since we only reject the null hypothesis when the H
statistic is too large.
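A minimal sketch of the same test with SciPy (which applies a small tie
correction, so H differs slightly from the hand-computed 13.678):

    from scipy import stats

    firm_a = [4.2, 4.6, 3.9, 4.0]
    firm_b = [3.3, 2.4, 2.6, 3.8, 2.8]
    firm_c = [1.9, 2.4, 2.1, 2.7, 1.8]
    firm_d = [3.5, 3.1, 3.7, 4.1, 4.4]

    H, p = stats.kruskal(firm_a, firm_b, firm_c, firm_d)
    print(f"H = {H:.3f}, p = {p:.4f}")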
7.4.2. Assuming the observations are normal, do
the processes have the same variance?
Before comparing means, test whether the variances are equal

Techniques for comparing means of normal populations
generally assume the populations have the same variance.
Before using these ANOVA techniques, it is advisable to test
whether this assumption of homogeneity of variance is
reasonable. The following procedure is widely used for this
purpose.
Bartlett's Test for Homogeneity of Variances
Null hypothesis

Bartlett's test is a commonly used test for equal variances.
Let's examine the null and alternative hypotheses.

    H0: σ1^2 = σ2^2 = ... = σk^2

against

    Ha: the σi^2 are not all equal.

Test statistic

Assume we have samples of size ni from the i-th population,
i = 1, 2, ..., k, and the usual variance estimates from each
sample:

    s1^2, s2^2, ..., sk^2

where

    si^2 = Σ(j=1 to ni) (xij - xbar_i)^2 / (ni - 1).

Now introduce the following notation: νj = nj - 1 (the νj are
the degrees of freedom) and

    s_p^2 = Σ νj sj^2 / Σ νj

The Bartlett's test statistic M is defined by

    M = (Σ νj) ln(s_p^2) - Σ νj ln(sj^2)
Distribution of the test statistic

When none of the degrees of freedom is small, Bartlett
showed that M is distributed approximately as χ2 with k - 1
degrees of freedom. The chi-square approximation is generally
acceptable if all the ni are at least 5.

Bias correction

This is a slightly biased test, according to Bartlett. It can be
improved by dividing M by the factor

    C = 1 + [ 1 / (3(k-1)) ] [ Σ (1/νj) - 1/(Σ νj) ]

Instead of M, it is suggested to use M/C for the test statistic.
Bartlett's test is not robust

This test is not robust; it is very sensitive to departures from
normality.

An alternative description of Bartlett's test appears in Chapter
1.

Gear Data Example (from Chapter 1):

An illustrative example of Bartlett's test

Gear diameter measurements were made on 10 batches of
product. The complete set of measurements appears in
Chapter 1. Bartlett's test was applied to this dataset, leading to
a rejection of the assumption of equal batch variances at the
0.05 critical value level.
The Levene Test for Homogeneity of Variances
The Levene test for equality of variances

Levene's test offers a more robust alternative to Bartlett's
procedure. That means it will be less likely to reject a true
hypothesis of equality of variances just because the
distributions of the sampled populations are not normal.
When non-normality is suspected, Levene's procedure is a
better choice than Bartlett's.
Levene's test is described in Chapter 1. This description also
includes an example where the test is applied to the gear
data. Levene's test does not reject the assumption of equality
of batch variances for these data. This differs from the
conclusion drawn from Bartlett's test and is a better answer
if, indeed, the batch population distributions are non-normal.
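A minimal sketch of both tests using SciPy; for concreteness the three
samples are the resistor groups used in the ANOVA example of Section
7.4.3.3, not the gear data.

    from scipy import stats

    g1 = [6.9, 5.4, 5.8, 4.6, 4.0]
    g2 = [8.3, 6.8, 7.8, 9.2, 6.5]
    g3 = [8.0, 10.5, 8.1, 6.9, 9.3]

    stat_b, p_b = stats.bartlett(g1, g2, g3)   # sensitive to non-normality
    stat_l, p_l = stats.levene(g1, g2, g3)     # more robust alternative
    print(f"Bartlett: {stat_b:.3f} (p = {p_b:.3f}), Levene: {stat_l:.3f} (p = {p_l:.3f})")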
7.4.3. Are the means equal?
Test
equality of
means
The procedure known as the Analysis of Variance or ANOVA
is used to test hypotheses concerning means when we have
several populations.
The Analysis of Variance (ANOVA)
The ANOVA procedure is one of the most powerful statistical techniques

ANOVA is a general technique that can be used to test the
hypothesis that the means among two or more groups are
equal, under the assumption that the sampled populations are
normally distributed.

A couple of questions come immediately to mind: which
means are being compared? and why analyze variances in
order to derive conclusions about the means?
Both questions will be answered as we delve further into the
subject.
Introduction
to ANOVA
To begin, let us study the effect of temperature on a passive
component such as a resistor. We select three different
temperatures and observe their effect on the resistors. This
experiment can be conducted by measuring all the
participating resistors before placing n resistors each in three
different ovens.
Each oven is heated to a selected temperature. Then we
measure the resistors again after, say, 24 hours and analyze
the responses, which are the differences between before and
after being subjected to the temperatures. The temperature is
called a factor. The different temperature settings are called
levels. In this example there are three levels or settings of the
factor Temperature.
What is a
factor?
A factor is an independent treatment variable whose
settings (values) are controlled and varied by the
experimenter. The intensity setting of a factor is the level.
Levels may be quantitative numbers or, in many
cases, simply "present" or "not present" ("0" or
"1").
The 1-way ANOVA

In the experiment above, there is only one factor,
temperature, and the analysis of variance that we will be
using to analyze the effect of temperature is called a one-way
or one-factor ANOVA.
The 2-way or 3-way ANOVA

We could have opted to also study the effect of positions in
the oven. In this case there would be two factors, temperature
and oven position. Here we speak of a two-way or two-
factor ANOVA. Furthermore, we may be interested in a third
factor, the effect of time. Now we deal with a three-way or
three-factor ANOVA. In each of these ANOVAs we test a
variety of hypotheses of equality of means (or average
responses when the factors are varied).
Hypotheses
that can be
tested in an
ANOVA
First consider the one-way ANOVA. The null hypothesis is:
there is no difference in the population means of the different
levels of factor A (the only factor).
The alternative hypothesis is: the means are not the same.
For the 2-way ANOVA, the possible null hypotheses are:
1. There is no difference in the means of factor A
2. There is no difference in means of factor B
3. There is no interaction between factors A and B
The alternative hypothesis for cases 1 and 2 is: the means are
not equal.
The alternative hypothesis for case 3 is: there is an
interaction between A and B.
For the 3-way ANOVA: The main effects are factors A, B
and C. The 2-factor interactions are: AB, AC, and BC. There
is also a three-factor interaction: ABC.
For each of the seven cases the null hypothesis is the same:
there is no difference in means, and the alternative hypothesis
is the means are not equal.
The n-way ANOVA

In general, the number of main effects and interactions can
be found by the following expression:

    2^n = C(n,0) + C(n,1) + C(n,2) + ... + C(n,n)

The first term is for the overall mean, and is always 1. The
second term is for the number of main effects. The third term
is for the number of 2-factor interactions, and so on. The last
term is for the n-factor interaction and is always 1.
In what follows, we will discuss only the 1-way and 2-way
ANOVA.
7.4.3.1. 1-Way ANOVA overview
Overview and
principles
This section gives an overview of the one-way ANOVA.
First we explain the principles involved in the 1-way
ANOVA.
Partition response into components

In an analysis of variance the variation in the response
measurements is partitioned into components that
correspond to different sources of variation.

The goal in this procedure is to split the total variation in
the data into a portion due to random error and portions
due to changes in the values of the independent
variable(s).
Variance of n measurements

The variance of n measurements is given by

    s^2 = Σ(i=1 to n) (yi - ybar)^2 / (n - 1)

where ybar is the mean of the n measurements.

Sums of squares and degrees of freedom

The numerator part is called the sum of squares of
deviations from the mean, and the denominator is called
the degrees of freedom.

The variance, after some algebra, can be rewritten as:

    s^2 = [ Σ yi^2 - (Σ yi)^2 / n ] / (n - 1)

The first term in the numerator is called the "raw sum of
squares" and the second term is called the "correction term
for the mean". Another name for the numerator is the
"corrected sum of squares", and this is usually abbreviated
by Total SS or SS(Total).
The SS in a 1-way ANOVA can be split into two
components, called the "sum of squares of treatments" and
"sum of squares of error", abbreviated as SST and SSE,
respectively.
The guiding principle behind ANOVA is the decomposition of the sums
of squares, or Total SS

Algebraically, this is expressed by

    SS(Total) = Σ(i=1 to k) Σ(j=1 to ni) (yij - ybar..)^2
              = Σ(i=1 to k) ni (ybar_i. - ybar..)^2
                + Σ(i=1 to k) Σ(j=1 to ni) (yij - ybar_i.)^2
              = SST + SSE

where k is the number of treatments and the bar over the
y.. denotes the "grand" or "overall" mean. Each ni is the
number of observations for treatment i. The total number of
observations is N (the sum of the ni).
Note on subscripting

Don't be alarmed by the double subscripting. The total SS
can be written single or double subscripted. The double
subscript stems from the way the data are arranged in the
data table. The table is usually a rectangular array with k
columns and each column consists of ni rows (however, the
lengths of the rows, or the ni, may be unequal).
Definition of
"Treatment"
We introduced the concept of treatment. The definition is:
A treatment is a specific combination of factor levels
whose effect is to be compared with other treatments.
7.4.3.2. The 1-way ANOVA model and
assumptions
A model that describes the relationship between the response and the
treatment (between the dependent and independent variables)

The mathematical model that describes the relationship
between the response and treatment for the one-way ANOVA
is given by

    Yij = μ + τi + εij

where Yij represents the j-th observation (j = 1, 2, ..., ni) on the
i-th treatment (i = 1, 2, ..., k levels). So, Y23 represents the
third observation using level 2 of the factor. μ is the common
effect for the whole experiment, τi represents the i-th
treatment effect and εij represents the random error present in
the j-th observation on the i-th treatment.
Fixed effects model

The errors εij are assumed to be normally and independently
(NID) distributed, with mean zero and variance σε^2. μ is
always a fixed parameter, and τ1, τ2, ..., τk are considered to
be fixed parameters if the levels of the treatment are fixed,
and not a random sample from a population of possible
levels. It is also assumed that μ is chosen so that

    Σ τi = 0,   i = 1, ..., k

holds. This is the fixed effects model.
Random effects model

If the k levels of treatment are chosen at random, the model
equation remains the same. However, now the τi's are
random variables assumed to be NID(0, στ^2). This is the
random effects model.
Whether the levels are fixed or random depends on how these
levels are chosen in a given experiment.
7.4.3.3. The ANOVA table and tests of
hypotheses about means
Sums of Squares help us compute the variance estimates displayed in
ANOVA Tables

The sums of squares SST and SSE previously computed for
the one-way ANOVA are used to form two mean squares,
one for treatments and the second for error. These mean
squares are denoted by MST and MSE, respectively. These
are typically displayed in a tabular form, known as an
ANOVA Table. The ANOVA table also shows the statistics
used to test hypotheses about the population means.
Ratio of MST
and MSE
When the null hypothesis of equal means is true, the two
mean squares estimate the same quantity (error variance),
and should be of approximately equal magnitude. In other
words, their ratio should be close to 1. If the null hypothesis
is false, MST should be larger than MSE.
Divide sum of squares by degrees of freedom to obtain mean squares

The mean squares are formed by dividing the sum of
squares by the associated degrees of freedom.

Let N = Σ ni. Then, the degrees of freedom for treatment,
DFT = k - 1, and the degrees of freedom for error, DFE =
N - k.

The corresponding mean squares are:

    MST = SST / DFT
    MSE = SSE / DFE
The F-test

The test statistic, used in testing the equality of treatment
means, is: F = MST / MSE.

The critical value is the tabular value of the F distribution,
based on the chosen α level and the degrees of freedom
DFT and DFE.

The calculations are displayed in an ANOVA table, as
follows:
ANOVA table

    Source              SS         DF    MS            F
    Treatments          SST        k-1   SST / (k-1)   MST / MSE
    Error               SSE        N-k   SSE / (N-k)
    Total (corrected)   SS(Total)  N-1

The word "source" stands for source of variation. Some
authors prefer to use "between" and "within" instead of
"treatments" and "error", respectively.
ANOVA Table Example
A numerical
example
The data below resulted from measuring the difference in
resistance resulting from subjecting identical resistors to
three different temperatures for a period of 24 hours. The
sample size of each group was 5. In the language of Design
of Experiments, we have an experiment in which each of
three treatments was replicated 5 times.
Level 1 Level 2 Level 3
6.9 8.3 8.0
5.4 6.8 10.5
5.8 7.8 8.1
4.6 9.2 6.9
4.0 6.5 9.3
means 5.34 7.72 8.56
The resulting ANOVA table is
Example
ANOVA table
Source SS DF MS F
Treatments 27.897 2 13.949 9.59
Error 17.452 12 1.454
Total (corrected) 45.349 14
Correction Factor 779.041 1
Interpretation of the ANOVA table

The test statistic is the F value of 9.59. Using an α of 0.05,
we have that F(0.05; 2, 12) = 3.89 (see the F distribution table in
Chapter 1). Since the test statistic is much larger than the
critical value, we reject the null hypothesis of equal
population means and conclude that there is a (statistically)
significant difference among the population means. The p-
value for 9.59 is 0.00325, so the test statistic is significant at
that level.
Techniques
for further
analysis
The populations here are resistor readings while operating
under the three different temperatures. What we do not
know at this point is whether the three means are all
different or which of the three means is different from the
other two, and by how much.
There are several techniques we might use to further
analyze the differences. These are:
constructing confidence intervals around the
difference of two means,
estimating combinations of factor levels with
confidence bounds
multiple comparisons of combinations of factor levels
tested simultaneously.
7.4.3.4. 1-Way ANOVA calculations
Formulas
for 1-way
ANOVA
hand
calculations
Although computer programs that do ANOVA calculations
now are common, for reference purposes this page describes
how to calculate the various entries in an ANOVA table.
Remember, the goal is to produce two variances (of
treatments and error) and their ratio. The various
computational formulas will be shown and applied to the data
from the previous example.
Step 1: compute CM

STEP 1 Compute CM, the correction for the mean.

    CM = (total of all observations)^2 / N = (108.1)^2 / 15 = 779.041

Step 2: compute total SS

STEP 2 Compute the total SS.

The total SS = sum of squares of all observations - CM

    = 824.390 - 779.041 = 45.349

The 824.390 SS is called the "raw" or "uncorrected" sum of
squares.
Step 3: compute SST

STEP 3 Compute SST, the treatment sum of squares.

First we compute the total (sum) for each treatment.

    T1 = (6.9) + (5.4) + ... + (4.0) = 26.7
    T2 = (8.3) + (6.8) + ... + (6.5) = 38.6
    T3 = (8.0) + (10.5) + ... + (9.3) = 42.8

Then

    SST = Σ Ti^2 / ni - CM = (26.7^2 + 38.6^2 + 42.8^2)/5 - 779.041 = 27.897

Step 4: compute SSE

STEP 4 Compute SSE, the error sum of squares.

Here we utilize the property that the treatment sum of squares
plus the error sum of squares equals the total sum of squares.

Hence, SSE = SS Total - SST = 45.349 - 27.897 = 17.45.
Step 5: Compute MST, MSE, and F

STEP 5 Compute MST, MSE and their ratio, F.

MST is the mean square of treatments, MSE is the mean
square of error (MSE is also frequently denoted by s^2, the
estimate of the error variance).

    MST = SST / (k-1) = 27.897 / 2 = 13.949
    MSE = SSE / (N-k) = 17.452 / 12 = 1.454

where N is the total number of observations and k is the
number of treatments. Finally, compute F as

    F = MST / MSE = 9.59

That is it. These numbers are the quantities that are
assembled in the ANOVA table that was shown previously.
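The hand calculation can be checked with SciPy's one-way ANOVA routine:

    from scipy import stats

    level1 = [6.9, 5.4, 5.8, 4.6, 4.0]
    level2 = [8.3, 6.8, 7.8, 9.2, 6.5]
    level3 = [8.0, 10.5, 8.1, 6.9, 9.3]

    F, p = stats.f_oneway(level1, level2, level3)   # one-way ANOVA
    print(f"F = {F:.2f}, p = {p:.5f}")   # F is about 9.59, p about 0.0033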
7.4.3.5. Confidence intervals for the difference
of treatment means
Confidence intervals for the difference between two means

This page shows how to construct a confidence interval
around (μi - μj) for the one-way ANOVA by continuing the
example shown on a previous page.

Formula for the confidence interval

The formula for a (1-α) 100 % confidence interval for the
difference between two treatment means is:

    (ybar_i. - ybar_j.) ± t(1-α/2, N-k) sqrt( s^2 (1/ni + 1/nj) )

where s^2 = MSE.
Computation of the confidence interval for μ3 - μ1

For the example, we have the following quantities for the
formula:

    ybar_3. = 8.56
    ybar_1. = 5.34
    t(0.975, 12) = 2.179

Substituting these values yields (8.56 - 5.34) ± 2.179(0.763)
or 3.22 ± 1.616.

That is, the confidence interval is from 1.604 to 4.836.
Additional 95 % confidence intervals

A 95 % confidence interval for μ3 - μ2 is: from -1.787 to 3.467.

A 95 % confidence interval for μ2 - μ1 is: from -0.247 to 5.007.
Contrasts discussed later

Later on, the topic of estimating more general linear
combinations of means (primarily contrasts) will be
discussed, including how to put confidence bounds around
contrasts.
7.4.3.6. Assessing the response from any factor
combination
Contrasts This page treats how to estimate and put confidence bounds
around the response to different combinations of factors.
Primary focus is on the combinations that are known as
contrasts. We begin, however, with the simple case of a
single factor-level mean.
Estimation of a Factor Level Mean With Confidence
Bounds
Estimating factor level means

An unbiased estimator of the factor level mean μi in the 1-
way ANOVA model is given by:

    μi-hat = ybar_i. = Yi. / ni

where Yi. is the sum of the observations at level i.

Variance of the factor level means

The variance of this sample mean estimator is

    Var(ybar_i.) = σ^2 / ni, estimated by MSE / ni.

Confidence intervals for the factor level means

It can be shown that:

    t = (ybar_i. - μi) / sqrt( MSE / ni )

has a t distribution with (N - k) degrees of freedom for the
ANOVA model under consideration, where N is the total
number of observations and k is the number of factor levels
or groups. The degrees of freedom are the same as were
used to calculate the MSE in the ANOVA table. That is: dfe
(degrees of freedom for error) = N - k. From this we can
calculate (1-α)100 % confidence limits for each μi. These
are given by:

    ybar_i. ± t(1-α/2, N-k) sqrt( MSE / ni )
Example 1
Example for
a 4-level
treatment (or
4 different
treatments)
The data in the accompanying table resulted from an
experiment run in a completely randomized design in which
each of four treatments was replicated five times.
Total Mean
Group 1 6.9 5.4 5.8 4.6 4.0 26.70 5.34
Group 2 8.3 6.8 7.8 9.2 6.5 38.60 7.72
Group 3 8.0 10.5 8.1 6.9 9.3 42.80 8.56
Group 4 5.8 3.8 6.1 5.6 6.2 27.50 5.50
All Groups 135.60 6.78
1-Way ANOVA table layout

This experiment can be illustrated by the table layout for
this 1-way ANOVA experiment shown below:

                    Sample j
    Level i    1     2    ...   5      Sum    Mean      N
      1       Y11   Y12   ...  Y15     Y1.    ybar_1.   n1
      2       Y21   Y22   ...  Y25     Y2.    ybar_2.   n2
      3       Y31   Y32   ...  Y35     Y3.    ybar_3.   n3
      4       Y41   Y42   ...  Y45     Y4.    ybar_4.   n4
     All                               Y..    ybar_..   nt
ANOVA table

The resulting ANOVA table is

    Source              SS        DF   MS       F
    Treatments           38.820    3   12.940   9.724
    Error                21.292   16    1.331
    Total (Corrected)    60.112   19
    Mean                919.368    1
    Total (Raw)         979.480   20

The estimate for the mean of group 1 is 5.34, and the
sample size is n1 = 5.
Computing the confidence interval

Since the confidence interval is two-sided, the entry (1 -
α/2) value for the t table is (1 - 0.05/2) = 0.975, and the
associated degrees of freedom is N - 4, or 20 - 4 = 16.

From the t table in Chapter 1, we obtain t(0.975;16) = 2.120.

Next we need the standard error of the mean for group 1:

    sqrt( MSE / n1 ) = sqrt( 1.331 / 5 ) = 0.5159

Hence, we obtain confidence limits 5.34 ± 2.120 (0.5159)
and the confidence interval is

    (4.246, 6.434)
Definition and Estimation of Contrasts

Definition of contrasts and orthogonal contrasts

Definitions

A contrast is a linear combination of 2 or more factor level
means with coefficients that sum to zero.

Two contrasts are orthogonal if the sum of the products of
corresponding coefficients (i.e., coefficients for the same
means) adds to zero.

Formally, the definition of a contrast is expressed below,
using the notation μi for the i-th treatment mean:

    C = c1 μ1 + c2 μ2 + ... + cj μj + ... + ck μk

where

    c1 + c2 + ... + cj + ... + ck = Σ cj = 0

Simple contrasts include the case of the difference between
two factor means, such as μ1 - μ2. If one wishes to compare
treatments 1 and 2 with treatment 3, one way of expressing
this is by: μ1 + μ2 - 2μ3. Note that

    μ1 - μ2 has coefficients +1, -1
    μ1 + μ2 - 2μ3 has coefficients +1, +1, -2.

These coefficients sum to zero.
An example of orthogonal contrasts

As an example of orthogonal contrasts, note the three
contrasts defined by the table below, where the rows denote
coefficients for the column treatment means.

          μ1   μ2   μ3   μ4
    c1    +1    0    0   -1
    c2     0   +1   -1    0
    c3    +1   -1   -1   +1
Some
properties of
orthogonal
contrasts
The following is true:
1. The sum of the coefficients for each contrast is zero.
2. The sum of the products of coefficients of each pair
of contrasts is also 0 (orthogonality property).
3. The first two contrasts are simply pairwise
comparisons, the third one involves all the treatments.
Estimation of contrasts

As might be expected, contrasts are estimated by taking the
same linear combination of treatment mean estimators. In
other words:

    C-hat = Σ(i=1 to k) ci ybar_i.

and

    Var(C-hat) = σ^2 Σ(i=1 to k) ci^2 / ni

Note: These formulas hold for any linear combination of
treatment means, not just for contrasts.
Confidence Interval for a Contrast

Confidence intervals for contrasts

An unbiased estimator for a contrast C is given by

    C-hat = Σ(i=1 to k) ci ybar_i.

The estimator of Var(C-hat) is

    s^2_C-hat = MSE Σ(i=1 to k) ci^2 / ni

The estimator C-hat is normally distributed because it is a
linear combination of independent normal random variables.
It can be shown that:

    (C-hat - C) / s_C-hat

is distributed as t(N-r) for the one-way ANOVA model under
discussion.

Therefore, the 1-α confidence limits for C are:

    C-hat ± t(1-α/2, N-r) s_C-hat
Example 2 (estimating contrast)

Contrast to estimate

We wish to estimate, in our previous example, the
following contrast:

    C = (μ1 + μ2)/2 - (μ3 + μ4)/2

and construct a 95 % confidence interval for C.

Computing the point estimate and standard error

The point estimate is:

    C-hat = (5.34 + 7.72)/2 - (8.56 + 5.50)/2 = -0.5

Applying the formulas above we obtain

    Σ ci^2 / ni = 4(1/4)/5 = 0.2

and

    s^2_C-hat = MSE Σ ci^2 / ni = 1.331(0.2) = 0.2662

and the standard error is sqrt(0.2662) = 0.5159.

Confidence interval

For a confidence coefficient of 95 % and df = 20 - 4 = 16,
t(0.975,16) = 2.12. Therefore, the desired 95 % confidence
interval is -0.5 ± 2.12(0.5159) or

    (-1.594, 0.594).
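A small sketch of the contrast computation, using the group means, MSE
and error degrees of freedom from the ANOVA table above:

    import numpy as np
    from scipy import stats

    means = np.array([5.34, 7.72, 8.56, 5.50])   # ybar_i. for the four groups
    n_i = np.array([5, 5, 5, 5])
    mse, dfe = 1.331, 16

    c = np.array([0.5, 0.5, -0.5, -0.5])         # contrast coefficients, sum to zero
    c_hat = c @ means                            # point estimate, -0.5
    se = np.sqrt(mse * np.sum(c**2 / n_i))       # standard error, about 0.516
    t = stats.t.ppf(0.975, dfe)                  # about 2.12
    print(f"C = {c_hat:.2f}, 95 % CI = ({c_hat - t*se:.3f}, {c_hat + t*se:.3f})")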
Estimation of Linear Combinations

Estimating linear combinations

Sometimes we are interested in a linear combination of the
factor-level means that is not a contrast. Assume that in our
sample experiment certain costs are associated with each
group. For example, there might be costs associated with
each factor as follows:

    Factor   Cost in $
      1         3
      2         5
      3         2
      4         1

The following linear combination might then be of interest:

    C = 3 μ1 + 5 μ2 + 2 μ3 + 1 μ4
Coefficients do not have to sum to zero for linear combinations

This resembles a contrast, but the coefficients ci do not
sum to zero. A linear combination is given by the
definition:

    C = Σ(i=1 to k) ci μi

with no restrictions on the coefficients ci.
Confidence
interval
identical to
contrast
Confidence limits for a linear combination C are obtained in
precisely the same way as those for a contrast, using the
same calculation for the point estimator and estimated
variance.
7.4.3.7. The two-way ANOVA
Definition
of a
factorial
experiment
The 2-way ANOVA is probably the most popular layout in
the Design of Experiments. To begin with, let us define a
factorial experiment:
An experiment that utilizes every combination of factor
levels as treatments is called a factorial experiment.
Model for the two-way factorial experiment

In a factorial experiment with factor A at a levels and factor
B at b levels, the model for the general layout can be written
as

    Yijk = μ + αi + βj + (αβ)ij + εijk

where μ is the overall mean response, αi is the effect due to
the i-th level of factor A, βj is the effect due to the j-th level
of factor B and (αβ)ij is the effect due to any interaction
between the i-th level of A and the j-th level of B.
Fixed factors and fixed effects models

At this point, consider the levels of factor A and of factor B
chosen for the experiment to be the only levels of interest to
the experimenter, such as predetermined levels for
temperature settings or the length of time for a process step.
The factors A and B are said to be fixed factors and the
model is a fixed-effects model. Random factors will be
discussed later.

When an a x b factorial experiment is conducted with an
equal number of observations per treatment combination, the
total (corrected) sum of squares is partitioned as:

    SS(total) = SS(A) + SS(B) + SS(AB) + SSE

where AB represents the interaction between A and B.

For reference, the formulas for the sums of squares are
(with r the number of replicates per treatment combination):

    SS(A)  = rb Σi (ybar_i.. - ybar_...)^2
    SS(B)  = ra Σj (ybar_.j. - ybar_...)^2
    SS(AB) = r Σi Σj (ybar_ij. - ybar_i.. - ybar_.j. + ybar_...)^2
    SSE    = Σi Σj Σk (yijk - ybar_ij.)^2
The breakdown of the total (corrected for the mean) sums of squares

The resulting ANOVA table for an a x b factorial experiment is

    Source           SS          df            MS
    Factor A         SS(A)       (a - 1)       MS(A) = SS(A)/(a-1)
    Factor B         SS(B)       (b - 1)       MS(B) = SS(B)/(b-1)
    Interaction AB   SS(AB)      (a-1)(b-1)    MS(AB) = SS(AB)/[(a-1)(b-1)]
    Error            SSE         (N - ab)      SSE/(N - ab)
    Total
    (Corrected)      SS(Total)   (N - 1)
The ANOVA table can be used to test hypotheses about the effects and
interactions

The various hypotheses that can be tested using this ANOVA
table concern whether the different levels of Factor A, or
Factor B, really make a difference in the response, and
whether the AB interaction is significant (see previous
discussion of ANOVA hypotheses).
7.4.3.8. Models and calculations for the two-way
ANOVA
Basic Layout
The balanced 2-way factorial layout

Factor A has 1, 2, ..., a levels. Factor B has 1, 2, ..., b levels. There are
ab treatment combinations (or cells) in a complete factorial layout.
Assume that each treatment cell has r independent observations (known
as replications). When each cell has the same number of replications,
the design is a balanced factorial. In this case, the abr data points
{yijk} can be shown pictorially as follows:

                                   Factor B
                    1                      2              ...          b
         1   y111, y112, ..., y11r   y121, y122, ..., y12r   ...   y1b1, y1b2, ..., y1br
 Factor  2   y211, y212, ..., y21r   y221, y222, ..., y22r   ...   y2b1, y2b2, ..., y2br
   A     .            ...                     ...            ...            ...
         a   ya11, ya12, ..., ya1r   ya21, ya22, ..., ya2r   ...   yab1, yab2, ..., yabr
How to obtain sums of squares for the balanced factorial layout

Next, we will calculate the sums of squares needed for the ANOVA
table.

    Let Ai be the sum of all observations of level i of factor A, i = 1,
    ..., a. The Ai are the row sums.

    Let Bj be the sum of all observations of level j of factor B, j = 1,
    ..., b. The Bj are the column sums.

    Let (AB)ij be the sum of all observations of level i of A and
    level j of B. These are cell sums.

    Let r be the number of replicates in the experiment; that is: the
    number of times each factorial treatment combination appears in
    the experiment.

Then the total number of observations for each level of factor A is rb,
the total number of observations for each level of factor B is ra, and
the total number of observations for each interaction is r.

Finally, the total number of observations n in the experiment is abr.

With the help of these expressions we arrive (omitting derivations) at

    CM        = (grand total of all observations)^2 / (abr)
    SS(Total) = Σi Σj Σk yijk^2 - CM
    SS(A)     = Σi Ai^2 / (rb) - CM
    SS(B)     = Σj Bj^2 / (ra) - CM
    SS(AB)    = Σi Σj (AB)ij^2 / r - CM - SS(A) - SS(B)
    SSE       = SS(Total) - SS(A) - SS(B) - SS(AB)

These expressions are used to calculate the ANOVA table entries for
the (fixed effects) 2-way ANOVA.
Two-Way ANOVA Example:

Data

An evaluation of a new coating applied to 3 different materials was
conducted at 2 different laboratories. Each laboratory tested 3 samples
from each of the treated materials. The results are given in the next
table:

                   Materials (B)
    LABS (A)      1     2     3
                 4.1   3.1   3.5
        1        3.9   2.8   3.2
                 4.3   3.3   3.6
                 2.7   1.9   2.7
        2        3.1   2.2   2.3
                 2.6   2.3   2.5
Row and column sums

The preliminary part of the analysis yields a table of row and column
sums.

                   Material (B)
    Lab (A)       1      2      3     Total (Ai)
       1         12.3    9.2   10.3      31.8
       2          8.4    6.4    7.5      22.3
    Total (Bj)   20.7   15.6   17.8      54.1
ANOVA
table
From this table we generate the ANOVA table.
Source SS df MS F p-value
A 5.0139 1 5.0139 100.28 0
B 2.1811 2 1.0906 21.81 .0001
AB 0.1344 2 0.0672 1.34 .298
Error 0.6000 12 0.0500
Total (Corr) 7.9294 17
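The sums of squares in this table can be reproduced directly from the
cell data with NumPy, following the computational formulas given
earlier (no ANOVA library is required):

    import numpy as np

    # data[i, j, k]: lab i, material j, replicate k
    data = np.array([[[4.1, 3.9, 4.3], [3.1, 2.8, 3.3], [3.5, 3.2, 3.6]],
                     [[2.7, 3.1, 2.6], [1.9, 2.2, 2.3], [2.7, 2.3, 2.5]]])
    a, b, r = data.shape

    cm = data.sum()**2 / (a * b * r)                       # correction for the mean
    ss_total = (data**2).sum() - cm
    ss_a = (data.sum(axis=(1, 2))**2).sum() / (r*b) - cm   # labs:      about 5.0139
    ss_b = (data.sum(axis=(0, 2))**2).sum() / (r*a) - cm   # materials: about 2.1811
    ss_ab = (data.sum(axis=2)**2).sum() / r - cm - ss_a - ss_b   # interaction: 0.1344
    sse = ss_total - ss_a - ss_b - ss_ab                   # error:     about 0.6000
    print(ss_a, ss_b, ss_ab, sse)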
7.4.4. What are variance components?
Fixed and Random Factors and Components of Variance

A fixed level of a factor or variable means that the levels in the experiment are the only ones we are interested in
In the previous example, the levels of the factor
temperature were considered as fixed; that is, the three
temperatures were the only ones that we were interested in
(this may sound somewhat unlikely, but let us accept it
without opposition). The model employed for fixed levels is
called a fixed model. When the levels of a factor are
random, such as operators, days, lots or batches, where the
levels in the experiment might have been chosen at random
from a large number of possible levels, the model is called
a random model, and inferences are to be extended to all
levels of the population.
Random levels are chosen at random from a large or infinite set of levels
In a random model the experimenter is often interested in
estimating components of variance. Let us run an example
that analyzes and interprets a component of variance or
random model.
Components of Variance Example for Random Factors

Data for the example
A company supplies a customer with a large number of batches of raw materials. The customer makes three sample determinations from each of 5 randomly selected batches to control the quality of the incoming material. The model is
    y_ij = μ + τ_i + ε_ij,
and the k levels (e.g., the batches) are chosen at random from a population with variance σ²_τ. The data are shown below.
              Batch
     1     2     3     4     5
    74    68    75    72    79
    76    71    77    74    81
    75    72    77    73    79
ANOVA table for example
A 1-way ANOVA is performed on the data with the following results:

                        ANOVA
Source                   SS     df      MS        EMS
Treatment (batches)   147.74     4   36.935   σ² + 3 σ²_τ
Error                  17.99    10    1.799   σ²
Total (corrected)     165.73    14
Interpretation of the ANOVA table
The computations that produce the SS are the same for both the fixed and the random effects model. For the random model, however, the treatment mean square is an estimate of {σ² + 3 σ²_τ}. This is shown in the EMS (Expected Mean Squares) column of the ANOVA table.

The test statistic from the ANOVA table is F = 36.94 / 1.80 = 20.5. If we had chosen an α value of .01, then the F value from the table in Chapter 1 for df of 4 in the numerator and 10 in the denominator is 5.99.
Method of moments
Since the test statistic is larger than the critical value, we reject the hypothesis of equal means. Since these batches were chosen via a random selection process, it may be of interest to find out how much of the variance in the experiment might be attributed to batch differences and how much to random error. In order to answer these questions, we can use the EMS column. The estimate of σ² is 1.80 and the computed treatment mean square of 36.94 is an estimate of σ² + 3 σ²_τ. Setting the MS values equal to the EMS values (this is called the Method of Moments), we obtain
    s² = 1.80    and    s² + 3 s²_τ = 36.94,
where we use s² since these are estimators of the corresponding σ²'s.
Computation of the components of variance
Solving these expressions gives the batch component estimate
    s²_τ = (36.94 - 1.80) / 3 = 11.71.
The total variance can be estimated as
    s² + s²_τ = 1.80 + 11.71 = 13.51.
Interpretation In terms of percentages, we see that 11.71/13.51 = 86.7
percent of the total variance is attributable to batch
differences and 13.3 percent to error variability within the
batches.
7.4.5. How can we compare the results of classifying according to several categories?
Contingency Table approach
When items are classified according to two or more criteria, it is often of
interest to decide whether these criteria act independently of one another.
For example, suppose we wish to classify defects found in wafers produced
in a manufacturing plant, first according to the type of defect and, second,
according to the production shift during which the wafers were produced. If
the proportions of the various types of defects are constant from shift to
shift, then classification by defects is independent of the classification by
production shift. On the other hand, if the proportions of the various defects
vary from shift to shift, then the classification by defects depends upon or is
contingent upon the shift classification and the classifications are dependent.
In the process of investigating whether one method of classification is
contingent upon another, it is customary to display the data by using a cross
classification in an array consisting of r rows and c columns called a
contingency table. A contingency table consists of r x c cells representing
the r x c possible outcomes in the classification process. Let us construct an
industrial case:
Industrial example
A total of 309 wafer defects were recorded and the defects were classified as
being one of four types, A, B, C, or D. At the same time each wafer was
identified according to the production shift in which it was manufactured, 1,
2, or 3.
Contingency table classifying defects in wafers according to type and production shift
These counts are presented in the following table.

                           Type of Defects
Shift         A            B            C            D        Total
  1       15 (22.51)   21 (20.99)   45 (38.94)   13 (11.56)     94
  2       26 (22.99)   31 (21.44)   34 (39.77)    5 (11.81)     96
  3       33 (28.50)   17 (26.57)   49 (49.29)   20 (14.63)    119
Total         74           69          128           38        309
(Note: the numbers in parentheses are the expected cell frequencies).
Column probabilities
Let p_A be the probability that a defect will be of type A. Likewise, define p_B, p_C, and p_D as the probabilities of observing the other three types of defects. These probabilities, which are called the column probabilities, will satisfy the requirement
    p_A + p_B + p_C + p_D = 1
Row probabilities
By the same token, let p_i (i = 1, 2, or 3) be the row probability that a defect will have occurred during shift i, where
    p_1 + p_2 + p_3 = 1
Multiplicative Law of Probability
Then if the two classifications are independent of each other, a cell probability will equal the product of its respective row and column probabilities in accordance with the Multiplicative Law of Probability.
Example of obtaining column and row probabilities
For example, the probability that a particular defect will occur in shift 1 and is of type A is (p_1)(p_A). While the numerical values of the cell probabilities are unspecified, the null hypothesis states that each cell probability will equal the product of its respective row and column probabilities. This condition implies independence of the two classifications. The alternative hypothesis is that this equality does not hold for at least one cell.
In other words, we state the null hypothesis as H_0: the two classifications are independent, while the alternative hypothesis is H_a: the classifications are dependent.

To obtain the observed column probability, divide the column total by the grand total, n. Denoting the total of column j as c_j, we get the estimates p̂_j = c_j / n for each defect type (for example, p̂_A = 74/309). Similarly, the row probabilities p_1, p_2, and p_3 are estimated by dividing the row totals r_1, r_2, and r_3 by the grand total n, respectively.
Expected cell frequencies
Denote the observed frequency of the cell in row i and column j of the contingency table by n_ij. Then, if the two classifications are independent, the expected frequency of that cell is n times the product of its respective row and column probabilities.
Estimated expected cell frequency when H_0 is true
In other words, when the row and column classifications are independent, the estimated expected value of the observed cell frequency n_ij in an r x c contingency table is equal to the product of its respective row and column totals divided by the total frequency.
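Written out, the estimated expected frequency of the cell in row i and column j is

    \hat{E}_{ij} = \frac{r_i \, c_j}{n}

For example, for shift 1 and defect type A, \hat{E}_{11} = (94)(74)/309 = 22.51, the first parenthesized entry in the table.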
The estimated cell frequencies are shown in parentheses in the contingency
table above.
Test statistic From here we use the expected and observed frequencies shown in the table
to calculate the value of the test statistic
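The statistic is the usual Pearson chi-square for an r x c table; with the observed and expected counts above,

    \chi^2 \;=\; \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(n_{ij} - \hat{E}_{ij})^2}{\hat{E}_{ij}} \;=\; 19.18

for the wafer-defect data (the value used in the conclusion below).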
df = (r-1)(c-1)
The next step is to find the appropriate number of degrees of freedom associated with the test statistic. Leaving out the details of the derivation, we state the result:
    The number of degrees of freedom associated with a contingency table consisting of r rows and c columns is (r-1)(c-1).
So for our example we have (3-1)(4-1) = 6 degrees of freedom.
Testing the null hypothesis
In order to test the null hypothesis, we compare the test statistic with the critical value of χ²_{1-α} at a selected value of α. Let us use α = 0.05. Then the critical value is χ²_{0.95,6} = 12.5916 (see the chi-square table in Chapter 1). Since the test statistic of 19.18 exceeds the critical value, we reject the null hypothesis and conclude that there is significant evidence that the proportions of the different defect types vary from shift to shift. In this case, the p-value of the test statistic is 0.00387.
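A minimal R check of the same test, using the observed counts from the table above:

    defects <- matrix(c(15, 21, 45, 13,
                        26, 31, 34,  5,
                        33, 17, 49, 20),
                      nrow = 3, byrow = TRUE,
                      dimnames = list(Shift = 1:3, Type = c("A", "B", "C", "D")))
    chisq.test(defects)   # X-squared = 19.18, df = 6, p-value = 0.0039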
7.4.6. Do all the processes have the same proportion of defects?
The contingency table approach
Testing for homogeneity of proportions using the chi-square distribution via contingency tables
When we have samples from n populations (i.e., lots, vendors, production runs, etc.), we can test whether there are significant differences in the proportion of defectives for these populations using a contingency table approach. The contingency table we construct has two rows and n columns.

To test the null hypothesis of no difference in the proportions among the n populations
    H_0: p_1 = p_2 = ... = p_n
against the alternative that not all n population proportions are equal
    H_1: Not all p_i are equal (i = 1, 2, ..., n),
The chi-square test statistic
we use the following test statistic:
    χ² = Σ (f_o - f_c)² / f_c ,
where f_o is the observed frequency in a given cell of a 2 x n contingency table, f_c is the theoretical count or expected frequency in a given cell if the null hypothesis were true, and the sum is taken over all 2n cells.
The critical value
The critical value is obtained from the χ² distribution table with degrees of freedom (2-1)(n-1) = n-1, at a given level of significance.
An illustrative example

Data for the example
Diodes used on a printed circuit board are produced in lots of size 4000. To study the homogeneity of lots with respect to a demanding specification, we take random samples of size 300 from 5 consecutive lots and test the diodes. The results are:
                          Lot
Results            1     2     3     4     5    Totals
Nonconforming     36    46    42    63    38      225
Conforming       264   254   258   237   262     1275
Totals           300   300   300   300   300     1500
Computation of the overall proportion of nonconforming units
Assuming the null hypothesis is true, we can estimate the single overall proportion of nonconforming diodes by pooling the results of all the samples as
    p̄ = 225/1500 = 0.15.
Computation of the overall proportion of conforming units
We estimate the proportion of conforming ("good") diodes
by the complement 1 - 0.15 = 0.85. Multiplying these two
proportions by the sample sizes used for each lot results in
the expected frequencies of nonconforming and
conforming diodes. These are presented below:
Table of expected frequencies
                          Lot
Results            1     2     3     4     5    Totals
Nonconforming     45    45    45    45    45      225
Conforming       255   255   255   255   255     1275
Totals           300   300   300   300   300     1500
Null and alternate hypotheses
To test the null hypothesis of homogeneity or equality of proportions
    H_0: p_1 = p_2 = ... = p_5
against the alternative that not all 5 population proportions are equal
    H_1: Not all p_i are equal (i = 1, 2, ..., 5),
Table for computing the test statistic
we use the observed and expected values from the tables above to compute the χ² test statistic. The calculations are presented below:

    f_o    f_c    (f_o - f_c)   (f_o - f_c)²   (f_o - f_c)²/f_c
     36     45        -9             81             1.800
     46     45         1              1             0.022
     42     45        -3              9             0.200
     63     45        18            324             7.200
     38     45        -7             49             1.089
    264    255         9             81             0.318
    254    255        -1              1             0.004
    258    255         3              9             0.035
    237    255       -18            324             1.271
    262    255         7             49             0.192
                                                   12.131
Conclusions If we choose a .05 level of significance, the critical value of χ² with 4 degrees of freedom is 9.488 (see the chi-square distribution table in Chapter 1). Since the test statistic (12.131) exceeds this critical value, we reject the null hypothesis.
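A minimal R check of the same calculation:

    counts <- rbind(Nonconforming = c(36, 46, 42, 63, 38),
                    Conforming    = c(264, 254, 258, 237, 262))
    chisq.test(counts)   # X-squared = 12.131, df = 4, p-value = 0.016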
7.4.7. How can we make multiple comparisons?
What to do after equality of means is rejected
When processes are compared and the null hypothesis of equality (or homogeneity) is rejected, all we know at that point is that they are not all equal. But we do not know the form of the inequality.
Typical questions
Questions concerning the reason for the rejection of the null hypothesis arise in the form of:
    "Which mean(s) or proportion(s) differ from a standard or from each other?"
    "Does the mean of treatment 1 differ from that of treatment 2?"
    "Does the average of treatments 1 and 2 differ from the average of treatments 3 and 4?"
Multiple Comparison test procedures are needed
One popular way to investigate the cause of rejection of the null hypothesis is a Multiple Comparison Procedure. These are methods which examine or compare more than one pair of means or proportions at the same time.
Note: Doing pairwise comparison procedures over and over
again for all possible pairs will not, in general, work. This is
because the overall significance level is not as specified for
a single pair comparison.
ANOVA F test is a preliminary test
The ANOVA uses the F test to determine whether there
exists a significant difference among treatment means or
interactions. In this sense it is a preliminary test that informs
us if we should continue the investigation of the data at
hand.
If the null hypothesis (no difference among treatments or
interactions) is accepted, there is an implication that no
relation exists between the factor levels and the response.
There is not much we can learn, and we are finished with the
analysis.
When the F test rejects the null hypothesis, we usually want
to undertake a thorough analysis of the nature of the factor-
level effects.
Procedures for examining factor-level effects
Previously, we discussed several procedures for examining
particular factor-level effects. These were
Estimation of the Difference Between Two Factor
Means
Estimation of Factor Level Effects
Confidence Intervals For A Contrast
Determine contrasts in advance of observing the experimental results
These types of investigations should be done on
combinations of factors that were determined in advance of
observing the experimental results, or else the confidence
levels are not as specified by the procedure. Also, doing
several comparisons might change the overall confidence
level (see note above). This can be avoided by carefully
selecting contrasts to investigate in advance and making sure
that:
the number of such contrasts does not exceed the
number of degrees of freedom between the treatments
only orthogonal contrasts are chosen.
However, there are also several powerful multiple
comparison procedures we can use after observing the
experimental results.
Tests on Means after Experimentation
Procedures for performing multiple comparisons
If the decision on what comparisons to make is withheld
until after the data are examined, the following procedures
can be used:
Tukey's Method to test all possible pairwise differences of means to determine if at least one difference is significantly different from 0.
Scheffé's Method to test all possible contrasts at the same time, to see if at least one is significantly different from 0.
Bonferroni Method to test, or put simultaneous confidence intervals around, a pre-selected group of contrasts.
Multiple Comparisons Between Proportions
Procedure for proportion defective data
When we are dealing with population proportion defective
data, the Marascuilo procedure can be used to
simultaneously examine comparisons between all groups
after the data have been collected.
7.4.7.1. Tukey's method
Tukey's method considers all possible pairwise differences of means at the same time
The Tukey method applies simultaneously to the set of all pairwise comparisons
    { μ_i - μ_j }.
The confidence coefficient for the set, when all sample sizes are equal, is exactly 1-α. For unequal sample sizes, the confidence coefficient is greater than 1-α. In other words, the Tukey method is conservative when there are unequal sample sizes.
Studentized Range Distribution
The studentized range q
The Tukey method uses the studentized range distribution. Suppose we have r independent observations y_1, ..., y_r from a normal distribution with mean μ and variance σ². Let w be the range of this set, i.e., the maximum minus the minimum. Now suppose that we have an estimate s² of the variance σ² which is based on ν degrees of freedom and is independent of the y_i. The studentized range is defined as
    q = w / s.
The distribution of q is tabulated in many textbooks and can be calculated using Dataplot
The distribution of q has been tabulated and appears in many textbooks on statistics. In addition, Dataplot has a CDF function (SRACDF) and a percentile function (SRAPPF) for q.

As an example, let r = 5 and ν = 10. The 95th percentile is q_{.05; 5, 10} = 4.65. This means:
    P( w/s ≤ 4.65 ) = 0.95.
So, if we have five observations from a normal distribution, the probability is .95 that their range is not more than 4.65 times as great as an independent sample standard deviation estimate for which the estimator has 10 degrees of freedom.
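R users can obtain the same value from the built-in studentized range functions (a small sketch alongside the Dataplot functions mentioned above):

    qtukey(0.95, nmeans = 5, df = 10)   # 4.65, the 95th percentile quoted above
    ptukey(4.65, nmeans = 5, df = 10)   # approximately 0.95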
Tukey's Method
Confidence limits for Tukey's method
The Tukey confidence limits for all pairwise comparisons with confidence coefficient of at least 1-α are given by the expression below.
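Written out for equal group sample sizes n (the case assumed here), with r means and s² the error mean square on ν degrees of freedom, the simultaneous limits take the standard studentized-range form

    \bar{y}_{i\cdot} - \bar{y}_{j\cdot} \;\pm\; q_{\alpha;\,r,\,\nu}\, \frac{s}{\sqrt{n}}

For the example below, q_{.05;4,16} ≈ 4.05, s = \sqrt{1.331} and n = 5, giving a half-width of about 2.09, consistent with the interval 1.13 < μ_3 - μ_1 < 5.31 quoted later.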
Notice that the point estimator and the estimated variance are
the same as those for a single pairwise comparison that was
illustrated previously. The only difference between the
confidence limits for simultaneous comparisons and those for
a single comparison is the multiple of the estimated standard
deviation.
Also note that the sample sizes must be equal when using the
studentized range approach.
Example
Data We use the data from a previous example.
Set of all pairwise comparisons
The set of all pairwise comparisons consists of:
    μ_2 - μ_1,  μ_3 - μ_1,  μ_1 - μ_4,  μ_2 - μ_3,  μ_2 - μ_4,  μ_3 - μ_4
Confidence intervals for each pair
Assume we want a confidence coefficient of 95 percent, or .95. Since r = 4 and n_t = 20, the required percentile of the studentized range distribution is q_{.05; 4, 16}. Using the Tukey method for each of the six comparisons yields a simultaneous confidence interval for each difference.
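If the raw data from that earlier example are available as a data frame (the names dat, y and level below are purely illustrative), the same six intervals can be obtained in R with:

    fit <- aov(y ~ level, data = dat)   # one-way ANOVA fit; 'dat' is a placeholder name
    TukeyHSD(fit, conf.level = 0.95)    # simultaneous 95 % intervals for all pairs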
Conclusions The simultaneous pairwise comparisons indicate that the differences μ_1 - μ_4 and μ_2 - μ_3 are not significantly different from 0 (their confidence intervals include 0), and all the other pairs are significantly different.
Unequal sample sizes
It is possible to work with unequal sample sizes. In this case,
one has to calculate the estimated standard deviation for each
pairwise comparison. The Tukey procedure for unequal
sample sizes is sometimes referred to as the Tukey-Kramer
Method.
7.4.7.2. Scheffe's method
Scheffe's method tests all possible contrasts at the same time
Scheffé's method applies to the set of estimates of all possible contrasts among the factor level means, not just the pairwise differences considered by Tukey's method.

Definition of contrast
An arbitrary contrast is defined by
    C = c_1 μ_1 + c_2 μ_2 + ... + c_r μ_r,
where
    c_1 + c_2 + ... + c_r = 0.
Infinite number of contrasts
Technically there is an infinite number of contrasts. The simultaneous confidence coefficient is exactly 1-α, whether the factor level sample sizes are equal or unequal.
Estimate and variance for C
As was described earlier, we estimate C by
    Ĉ = c_1 ȳ_1 + c_2 ȳ_2 + ... + c_r ȳ_r,
for which the estimated variance is
    s²_Ĉ = s² (c_1²/n_1 + c_2²/n_2 + ... + c_r²/n_r),
where ȳ_i and n_i are the mean and sample size of the i-th factor level and s² is the error mean square.
Simultaneous confidence interval
It can be shown that the probability is 1-α that all confidence limits of the type given below are correct simultaneously.
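The limits take the standard Scheffé form

    \hat{C} \;\pm\; \sqrt{(r-1)\,F_{\alpha;\,r-1,\,N-r}}\;\, s_{\hat{C}}

where N is the total number of observations; for the example below the multiplier is \sqrt{3\,F_{.05;3,16}} = \sqrt{3(3.24)} \approx 3.12, the value used in the calculations.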
Scheffe method example
Contrasts to estimate
We wish to estimate two contrasts, C_1 and C_2, from our previous experiment and construct 95 percent confidence intervals for them.

Compute the point estimates of the individual contrasts
The point estimates are Ĉ_1 = -0.5 and Ĉ_2 = 0.34.

Compute the point estimate and variance of C
Applying the formulas above, both contrasts have estimated variance
    s²_Ĉ = 0.2661,
where s² = 1.331 (the error mean square) was computed in our previous example. The standard error is .5158 (the square root of .2661).
Scheffe confidence interval
For a confidence coefficient of 95 percent and degrees of freedom in the numerator of r - 1 = 4 - 1 = 3, and in the denominator of 20 - 4 = 16, we have F_{.05;3,16} = 3.24 and a multiplier of √(3 × 3.24) = 3.12.

The confidence limits for C_1 are -.5 ± 3.12(.5158) = -.5 ± 1.608, and for C_2 they are .34 ± 1.608.

The desired simultaneous 95 percent confidence intervals are
    -2.108 ≤ C_1 ≤ 1.108
    -1.268 ≤ C_2 ≤ 1.948
Comparison to confidence interval for a single contrast
Recall that when we constructed a confidence interval for a single contrast, we found the 95 percent confidence interval:
    -1.594 ≤ C ≤ 0.594
As expected, the Scheffé confidence interval procedure that generates simultaneous intervals for all contrasts is considerably wider.
Comparison of Scheffé's Method with Tukey's Method

Tukey preferred when only pairwise comparisons are of interest
If only pairwise comparisons are to be made, the Tukey method will result in a narrower confidence limit, which is preferable.

Consider, for example, the comparison between μ_3 and μ_1.
    Tukey:   1.13 < μ_3 - μ_1 < 5.31
    Scheffé: 0.95 < μ_3 - μ_1 < 5.49
which gives Tukey's method the edge.

The normalized contrast, using sums, for the Scheffé method is 4.413, which is close to the maximum contrast.
Scheffé preferred when many contrasts are of interest
In the general case when many or all contrasts might be of interest, the Scheffé method tends to give narrower confidence limits and is therefore the preferred method.
7.4.7.3. Bonferroni's method
Simple method
The Bonferroni method is a simple method that allows many comparison statements to be made (or confidence intervals to be constructed) while still assuring an overall confidence coefficient is maintained.
Applies for a finite number of contrasts
This method applies to an ANOVA situation when the analyst has picked out a particular set of pairwise comparisons or contrasts or linear combinations in advance. This set is not infinite, as in the Scheffé case, but may exceed the set of pairwise comparisons specified in the Tukey procedure.
Valid for both equal and unequal sample sizes
The Bonferroni method is valid for equal and unequal sample sizes. We restrict ourselves to only linear combinations or comparisons of treatment level means (pairwise comparisons and contrasts are special cases of linear combinations). We denote the number of statements or comparisons in the finite set by g.
Bonferroni general inequality
Formally, the Bonferroni general inequality is presented by:
    P(A_1 ∩ A_2 ∩ ... ∩ A_g) ≥ 1 - [ P(Ā_1) + P(Ā_2) + ... + P(Ā_g) ],
where A_i and its complement Ā_i may be any events.
Interpretation of Bonferroni inequality
In particular, if each A_i is the event that a calculated confidence interval for a particular linear combination of treatments includes the true value of that combination, then the left-hand side of the inequality is the probability that all the confidence intervals simultaneously cover their respective true values. The right-hand side is one minus the sum of the probabilities of each of the intervals missing their true values. Therefore, if simultaneous multiple interval estimates are desired with an overall confidence coefficient 1-α, one can construct each interval with confidence coefficient (1-α/g), and the Bonferroni inequality insures that the overall confidence coefficient is at least 1-α.
Formula for Bonferroni confidence interval
In summary, the Bonferroni method states that the confidence coefficient is at least 1-α that simultaneously all the following confidence limits for the g linear combinations C_i are "correct" (or capture their respective true values):
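In the usual notation these limits are

    \hat{C}_i \;\pm\; t_{1-\alpha/(2g);\,\nu}\;\, s_{\hat{C}_i}, \qquad i = 1, \ldots, g,

where ν is the error degrees of freedom; the example below uses exactly this multiplier, t_{1-.05/(2 \cdot 2),\,16} = 2.473.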
Example using Bonferroni method

Contrasts to estimate
We wish to estimate, as we did using the Scheffé method, the same two linear combinations (contrasts) C_1 and C_2, and construct 95 % confidence intervals around the estimates.

Compute the point estimates of the individual contrasts
The point estimates are Ĉ_1 = -0.5 and Ĉ_2 = 0.34, as before.

Compute the point estimate and variance of C
As before, for both contrasts we have
    s²_Ĉ = 0.2661,
where s² = 1.331 was computed in our previous example. The standard error is .5158 (the square root of .2661).
Compute the Bonferroni simultaneous confidence interval
For a 95 % overall confidence coefficient using the Bonferroni method, the t value is t_{1-0.05/(2*2),16} = t_{0.9875,16} = 2.473 (from the t table in Chapter 1). Now we can calculate the confidence intervals for the two contrasts. For C_1 we have confidence limits -0.5 ± 2.473(.5158) and for C_2 we have confidence limits 0.34 ± 2.473(0.5158).

Thus, the confidence intervals are:
    -1.776 ≤ C_1 ≤ 0.776
    -0.936 ≤ C_2 ≤ 1.616
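A one-line R check of the multiplier used above:

    qt(1 - 0.05 / (2 * 2), df = 16)   # 2.473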
Comparison to Scheffe interval
Notice that the Scheffé interval for C_1 is:
    -2.108 ≤ C_1 ≤ 1.108,
which is wider and therefore less attractive.
Comparison of Bonferroni Method with Scheffé and Tukey Methods

No one comparison method is uniformly best - each has its uses
1. If all pairwise comparisons are of interest, Tukey has the edge. If only a subset of pairwise comparisons are required, Bonferroni may sometimes be better.
2. When the number of contrasts to be estimated is small (about as many as there are factors), Bonferroni is better than Scheffé. Actually, unless the number of desired contrasts is at least twice the number of factors, Scheffé will always show wider confidence bands than Bonferroni.
3. Many computer packages include all three methods. So, study the output and select the method with the smallest confidence band.
4. No single method of multiple comparisons is uniformly best among all the methods.
7.4.7.4. Comparing multiple proportions: The Marascuilo procedure
Testing for equal proportions of defects
Earlier, we discussed how to test whether several populations have the same proportion of defects. The example given there led to rejection of the null hypothesis of equality.
Marascuilo procedure allows comparison of all possible pairs of proportions
Rejecting the null hypothesis only allows us to conclude that, in this case, the lots are not all equal with respect to the proportion of defectives. However, it does not tell us which lot or lots caused the rejection.

The Marascuilo procedure enables us to simultaneously test the differences of all pairs of proportions when there are several populations under investigation.
The Marascuilo Procedure

Step 1: compute differences p_i - p_j
Assume we have samples of size n_i (i = 1, 2, ..., k) from k populations. The first step of this procedure is to compute the differences p_i - p_j (where i is not equal to j) among all k(k-1)/2 pairs of proportions.

The absolute values of these differences are the test statistics.
Step 2: compute test statistics
Step 2 is to pick a significance level and compute the corresponding critical values for the Marascuilo procedure from the expression below.

Step 3: compare test statistics against corresponding critical values
The third and last step is to compare each of the k(k-1)/2 test statistics against its corresponding critical r_ij value. Those pairs that have a test statistic that exceeds the critical value are significant at the α level.
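The critical values have the standard Marascuilo form

    r_{ij} \;=\; \sqrt{\chi^2_{1-\alpha;\,k-1}}\;\sqrt{\frac{p_i(1-p_i)}{n_i} + \frac{p_j(1-p_j)}{n_j}}

which is what the example below computes: the factor \sqrt{\chi^2_{.95;4}} = \sqrt{9.488} = 3.08 times the standard error of each difference.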
Example

Sample proportions
To illustrate the Marascuilo procedure, we use the data from the previous example. Since there were 5 lots, there are (5 x 4)/2 = 10 possible pairwise comparisons to be made and ten critical ranges to compute. The five sample proportions are:
    p_1 = 36/300 = .120
    p_2 = 46/300 = .153
    p_3 = 42/300 = .140
    p_4 = 63/300 = .210
    p_5 = 38/300 = .127
Table of critical values
For an overall level of significance of 0.05, the critical value of the chi-square distribution having four degrees of freedom is χ²_{0.95,4} = 9.488 and the square root of 9.488 is 3.080. Calculating the 10 absolute differences and the 10 critical values leads to the following summary table.
contrast        value   critical range   significant
|p_1 - p_2|     .033        0.086            no
|p_1 - p_3|     .020        0.085            no
|p_1 - p_4|     .090        0.093            no
|p_1 - p_5|     .007        0.083            no
|p_2 - p_3|     .013        0.089            no
|p_2 - p_4|     .057        0.097            no
|p_2 - p_5|     .026        0.087            no
|p_3 - p_4|     .070        0.095            no
|p_3 - p_5|     .013        0.086            no
|p_4 - p_5|     .083        0.094            no
The table of critical values can be generated using both
Dataplot code and R code.
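A minimal R sketch of that computation (it simply follows the critical range formula given earlier, so it is offered as an illustration rather than a copy of the linked code):

    p <- c(36, 46, 42, 63, 38) / 300      # sample proportions for the 5 lots
    n <- rep(300, 5)                      # sample sizes
    crit <- sqrt(qchisq(0.95, df = 4))    # sqrt(9.488) = 3.080
    pairs <- combn(5, 2)                  # the 10 pairwise comparisons
    value <- abs(p[pairs[1, ]] - p[pairs[2, ]])
    crit_range <- crit * sqrt(p[pairs[1, ]] * (1 - p[pairs[1, ]]) / n[pairs[1, ]] +
                              p[pairs[2, ]] * (1 - p[pairs[2, ]]) / n[pairs[2, ]])
    data.frame(i = pairs[1, ], j = pairs[2, ],
               value = round(value, 3),
               critical.range = round(crit_range, 3),
               significant = value > crit_range)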
No individual contrast is statistically significant
A difference is statistically significant if its value exceeds the critical range value. In this example, even though the null hypothesis of equality was rejected earlier, there is not enough data to conclude any particular difference is significant. Note, however, that all the comparisons involving population 4 come the closest to significance - leading us to suspect that more data might actually show that population 4 does have a significantly higher proportion of defects.
7.5. References
Primary References
Agresti, A. and Coull, B. A. (1998). "Approximate is better than 'exact' for interval estimation of binomial proportions", The American Statistician, 52(2), 119-126.
Berenson M.L. and Levine D.M. (1996) Basic Business
Statistics, Prentice-Hall, Englewood Cliffs, New Jersey.
Bhattacharyya, G. K., and R. A. Johnson, (1997). Statistical
Concepts and Methods, John Wiley and Sons, New York.
Birnbaum, Z. W. (1952). "Numerical tabulation of the
distribution of Kolmogorov's statistic for finite sample size",
Journal of the American Statistical Association, 47, page 425.
Brown, L. D., Cai, T. T. and DasGupta, A. (2001). "Interval estimation for a binomial proportion", Statistical Science, 16(2), 101-133.
Diamond, W. J. (1989). Practical Experiment Designs, Van-
Nostrand Reinhold, New York.
Dixon, W. J. and Massey, F.J. (1969). Introduction to
Statistical Analysis, McGraw-Hill, New York.
Draper, N. and Smith, H., (1981). Applied Regression
Analysis, John Wiley & Sons, New York.
Fleiss, J. L., Levin, B. and Paik, M. C. (2003). Statistical Methods for Rates and Proportions, Third Edition, John Wiley & Sons, New York.
Hahn, G. J. and Meeker, W. Q. (1991). Statistical Intervals: A
Guide for Practitioners, John Wiley & Sons, New York.
Hicks, C. R. (1973). Fundamental Concepts in the Design of
Experiments, Holt, Rinehart and Winston, New York.
Hollander, M. and Wolfe, D. A. (1973). Nonparametric
Statistical Methods, John Wiley & Sons, New York.
Howe, W. G. (1969). "Two-sided Tolerance Limits for Normal Populations - Some Improvements", Journal of the American Statistical Association, 64, pages 610-620.
Kendall, M. and Stuart, A. (1979). The Advanced Theory of Statistics, Volume 2: Inference and Relationship, Charles Griffin & Co. Limited, London.
Mendenhall, W., Reinmuth, J. E. and Beaver, R. J. Statistics
for Management and Economics, Duxbury Press, Belmont,
CA.
Montgomery, D. C. (1991). Design and Analysis of
Experiments, John Wiley & Sons, New York.
Moore, D. S. (1986). "Tests of Chi-Square Type". From
Goodness-of-Fit Techniques (D'Agostino & Stephens eds.).
Myers, R. H., (1990). Classical and Modern Regression with
Applications, PWS-Kent, Boston, MA.
Neter, J., Wasserman, W. and Kutner, M. H. (1990). Applied
Linear Statistical Models, 3rd Edition, Irwin, Boston, MA.
Lawless, J. F., (1982). Statistical Models and Methods for
Lifetime Data, John Wiley & Sons, New York.
Pearson, E. S. and Hartley, H. O. (1972). Biometrika Tables for Statisticians, Vol 2, Cambridge University Press, Cambridge, England.
Sarhan, A. E. and Greenberg, B. G. (1956). "Estimation of
location and scale parameters by order statistics from singly
and double censored samples," Part I, Annals of Mathematical
Statistics, 27, 427-451.
Searle, S. S., Casella, G. and McCulloch, C. E. (1992).
Variance Components, John Wiley & Sons, New York.
Siegel, S. (1956). Nonparametric Statistics, McGraw-Hill,
New York.
Shapiro, S. S. and Wilk, M. B. (1965). "An analysis of
variance test for normality (complete samples)", Biometrika,
52, 3 and 4, pages 591-611.
Some Additional References and Bibliography
Books
D'Agostino, R. B. and Stephens, M. A. (1986). Goodness-of-Fit Techniques, Marcel Dekker, Inc., New York.
Hicks, C. R. (1973). Fundamental Concepts in the Design of Experiments, Holt, Rinehart and Winston, New York.
Miller, R. G., Jr. (1981). Simultaneous Statistical Inference,
Springer-Verlag, New York.
Neter, Wasserman, and Whitmore (1993). Applied Statistics,
4th Edition, Allyn and Bacon, Boston, MA.
Neter, J., Wasserman, W. and Kutner, M. H. (1990). Applied
Linear Statistical Models, 3rd Edition, Irwin, Boston, MA.
Scheffe, H. (1959). The Analysis of Variance, John Wiley,
New-York.
Articles Begun, J. M. and Gabriel, K. R. (1981). "Closure of the
Newman-Keuls Multiple Comparisons Procedure", Journal of
the American Statistical Association, 76, page 374.
Carmer, S. G. and Swanson, M. R. (1973). "Evaluation of Ten Pairwise Multiple Comparison Procedures by Monte-Carlo Methods", Journal of the American Statistical Association, 68, pages 66-74.
Duncan, D. B. (1975). "t-Tests and Intervals for Comparisons
suggested by the Data" Biometrics, 31, pages 339-359.
Dunnett, C. W. (1980). "Pairwise Multiple Comparisons in the
Homogeneous Variance for Unequal Sample Size Case",
Journal of the American Statistical Association, 75, page 789.
Einot, I. and Gabriel, K. R. (1975). "A Study of the Powers of
Several Methods of Multiple Comparison", Journal of the
American Statistical Association, 70, page 351.
Gabriel, K. R. (1978). "A Simple Method of Multiple
Comparisons of Means", Journal of the American Statistical
Association, 73, page 364.
Hochberg, Y. (1974). "Some Conservative Generalizations of the T-Method in Simultaneous Inference", Journal of Multivariate Analysis, 4, pages 224-234.
Kramer, C. Y. (1956). "Extension of Multiple Range Tests to
Group Means with Unequal Sample Sizes", Biometrics, 12,
pages 307-310.
Marcus, R., Peritz, E. and Gabriel, K. R. (1976). "On Closed Testing Procedures with Special Reference to Ordered Analysis of Variance", Biometrika, 63, pages 655-660.
Ryan, T. A. (1959). "Multiple Comparisons in Psychological
Research", Psychological Bulletin, 56, pages 26-47.
Ryan, T. A. (1960). "Significance Tests for Multiple
Comparisons of Proportions, Variances, and Other Statistics",
Psychological Bulletin, 57, pages 318-328.
Scheffe, H. (1953). "A Method for Judging All Contrasts in the Analysis of Variance", Biometrika, 40, pages 87-104.
Sidak, Z., (1967). "Rectangular Confidence Regions for the
Means of Multivariate Normal Distributions", Journal of the
American Statistical Association, 62, pages 626-633.
Tukey, J. W. (1953). The Problem of Multiple Comparisons,
Unpublished Manuscript.
Waller, R. A. and Duncan, D. B. (1969). "A Bayes Rule for the
Symmetric Multiple Comparison Problem", Journal of the
American Statistical Association 64, pages 1484-1504.
Waller, R. A. and Kemp, K. E. (1976). "Computations of
Bayesian t-Values for Multiple Comparisons", Journal of
Statistical Computation and Simulation, 75, pages 169-172.
Welsch, R. E. (1977). "Stepwise Multiple Comparison
Procedure", Journal of the American Statistical Association,
72, page 359.
8. Assessing Product Reliability

This chapter describes the terms, models and techniques used to evaluate and predict product reliability.

1. Introduction
   1. Why important?
   2. Basic terms and models
   3. Common difficulties
   4. Modeling "physical acceleration"
   5. Common acceleration models
   6. Basic non-repairable lifetime distributions
   7. Basic models for repairable systems
   8. Evaluate reliability "bottom-up"
   9. Modeling reliability growth
   10. Bayesian methodology

2. Assumptions/Prerequisites
   1. Choosing appropriate life distribution
   2. Plotting reliability data
   3. Testing assumptions
   4. Choosing a physical acceleration model
   5. Models and assumptions for Bayesian methods

3. Reliability Data Collection
   1. Planning reliability assessment tests

4. Reliability Data Analysis
   1. Estimating parameters from censored data
   2. Fitting an acceleration model
   3. Projecting reliability at use conditions
   4. Comparing reliability between two or more populations
   5. Fitting system repair rate models
   6. Estimating reliability using a Bayesian gamma prior

8.1. Introduction
This section introduces the terminology and models that will
be used to describe and quantify product reliability. The
terminology, probability distributions and models used for
reliability analysis differ in many cases from those used in
other statistical applications.
Detailed contents of Section 1
1. Introduction
   1. Why is the assessment and control of product reliability important?
      1. Quality versus reliability
      2. Competitive driving factors
      3. Safety and health considerations
   2. What are the basic terms and models used for reliability evaluation?
      1. Repairable systems, non-repairable populations and lifetime distribution models
      2. Reliability or survival function
      3. Failure (or hazard) rate
      4. "Bathtub" curve
      5. Repair rate or ROCOF
   3. What are some common difficulties with reliability data and how are they overcome?
      1. Censoring
      2. Lack of failures
   4. What is "physical acceleration" and how do we model it?
   5. What are some common acceleration models?
      1. Arrhenius
      2. Eyring
      3. Other models
   6. What are the basic lifetime distribution models used for non-repairable populations?
      1. Exponential
      2. Weibull
      3. Extreme value distributions
      4. Lognormal
      5. Gamma
      6. Fatigue life (Birnbaum-Saunders)
      7. Proportional hazards model
   7. What are some basic repair rate models used for repairable systems?
      1. Homogeneous Poisson Process (HPP)
      2. Non-Homogeneous Poisson Process (NHPP) with power law
      3. Exponential law
   8. How can you evaluate reliability from the "bottom-up" (component failure mode to system failure rates)?
      1. Competing risk model
      2. Series model
      3. Parallel or redundant model
      4. R out of N model
      5. Standby model
      6. Complex systems
   9. How can you model reliability growth?
      1. NHPP power law
      2. Duane plots
      3. NHPP exponential law
   10. How can Bayesian methodology be used for reliability evaluation?
8.1.1. Why is the assessment and control of product reliability important?
We depend on, demand, and expect reliable products
In today's technological world nearly everyone depends upon
the continued functioning of a wide array of complex
machinery and equipment for their everyday health, safety,
mobility and economic welfare. We expect our cars,
computers, electrical appliances, lights, televisions, etc. to
function whenever we need them - day after day, year after
year. When they fail the results can be catastrophic: injury,
loss of life and/or costly lawsuits can occur. More often,
repeated failure leads to annoyance, inconvenience and a
lasting customer dissatisfaction that can play havoc with the
responsible company's marketplace position.
Shipping unreliable products can destroy a company's reputation
It takes a long time for a company to build up a reputation for
reliability, and only a short time to be branded as "unreliable"
after shipping a flawed product. Continual assessment of new
product reliability and ongoing control of the reliability of
everything shipped are critical necessities in today's
competitive business arena.
8.1.1.1. Quality versus reliability
Reliability is "quality changing over time"
The everyday usage term "quality of a product" is loosely
taken to mean its inherent degree of excellence. In industry,
this is made more precise by defining quality to be
"conformance to requirements at the start of use". Assuming
the product specifications adequately capture customer
requirements, the quality level can now be precisely
measured by the fraction of units shipped that meet
specifications.
A motion picture instead of a snapshot
But how many of these units still meet specifications after a
week of operation? Or after a month, or at the end of a one
year warranty period? That is where "reliability" comes in.
Quality is a snapshot at the start of life and reliability is a
motion picture of the day-by-day operation. Time zero
defects are manufacturing mistakes that escaped final test.
The additional defects that appear over time are "reliability
defects" or reliability fallout.
Life distributions model fraction fallout over time
The quality level might be described by a single fraction
defective. To describe reliability fallout a probability model
that describes the fraction fallout over time is needed. This is
known as the life distribution model.
8.1.1.2. Competitive driving factors
Reliability is a major economic factor in determining a product's success
Accurate prediction and control of reliability plays an
important role in the profitability of a product. Service costs
for products within the warranty period or under a service
contract are a major expense and a significant pricing factor.
Proper spare part stocking and support personnel hiring and
training also depend upon good reliability fallout predictions.
On the other hand, missing reliability targets may invoke
contractual penalties and cost future business.
Companies that can economically design and market products
that meet their customers' reliability expectations have a
strong competitive advantage in today's marketplace.
8.1.1.3. Safety and health considerations
Some failures have serious social consequences and this should be taken into account when planning reliability studies
Sometimes equipment failure can have a major impact on
human safety and/or health. Automobiles, planes, life
support equipment, and power generating plants are a few
examples.
From the point of view of "assessing product reliability", we
treat these kinds of catastrophic failures no differently from
the failure that occurs when a key parameter measured on a
manufacturing tool drifts slightly out of specification,
calling for an unscheduled maintenance action.
It is up to the reliability engineer (and the relevant
customer) to define what constitutes a failure in any
reliability study. More resources (test time and test units)
should be planned for when an incorrect reliability
assessment could negatively impact safety and/or health.
8.1.2. What are the basic terms and models used for reliability evaluation?
Reliability methods and terminology began with 19th century insurance companies
Reliability theory developed apart from the mainstream of
probability and statistics, and was used primarily as a tool to
help nineteenth century maritime and life insurance
companies compute profitable rates to charge their customers.
Even today, the terms "failure rate" and "hazard rate" are
often used interchangeably.
The following sections will define some of the concepts,
terms, and models we need to describe, estimate and predict
reliability.
8.1.2.1. Repairable systems, non-repairable populations and lifetime distribution models
Life distribution models describe how non-repairable populations fail over time
A repairable system is one which can be restored to satisfactory operation by any action, including parts replacements or changes to adjustable settings. When discussing the rate at which failures occur during system operation time (and are then repaired) we will define a Rate Of Occurrence Of Failure (ROCOF) or "repair rate". It would be incorrect to talk about failure rates or hazard rates for repairable systems, as these terms apply only to the first failure times for a population of non-repairable components.
A non-repairable population is one for which individual items that fail are
removed permanently from the population. While the system may be
repaired by replacing failed units from either a similar or a different
population, the members of the original population dwindle over time until
all have eventually failed.
We begin with models and definitions for non-repairable populations. Repair
rates for repairable populations will be defined in a later section.
The theoretical population models used to describe unit lifetimes are known
as Lifetime Distribution Models. The population is generally considered to
be all of the possible unit lifetimes for all of the units that could be
manufactured based on a particular design and choice of materials and
manufacturing process. A random sample of size n from this population is
the collection of failure times observed for a randomly selected group of n
units.
Any continuous PDF defined only for non-negative values can be a lifetime distribution model
A lifetime distribution model can be any probability density function (or
PDF) f(t) defined over the range of time from t = 0 to t = infinity. The
corresponding cumulative distribution function (or CDF) F(t) is a very
useful function, as it gives the probability that a randomly selected unit will
fail by time t. The figure below shows the relationship between f(t) and F(t)
and gives three descriptions of F(t).
1. F(t) = the area under the PDF f(t) to the left of t.
2. F(t) = the probability that a single randomly chosen new unit will fail by time t.
3. F(t) = the proportion of the entire population that fails by time t.
The figure above also shows a shaded area under f(t) between the two times t_1 and t_2. This area is [F(t_2) - F(t_1)] and represents the proportion of the population that fails between times t_1 and t_2 (or the probability that a brand new randomly chosen unit will survive to time t_1 but fail before time t_2).
Note that the PDF f(t) has only non-negative values and eventually either
becomes 0 as t increases, or decreases towards 0. The CDF F(t) is
monotonically increasing and goes from 0 to 1 as t approaches infinity. In
other words, the total area under the curve is always 1.
The Weibull model is a good example of a life distribution
The 2-parameter Weibull distribution is an example of a popular F(t). It has the CDF and PDF equations given by:
    F(t) = 1 - exp[-(t/α)^γ],    f(t) = (γ/α)(t/α)^{γ-1} exp[-(t/α)^γ],   t > 0,
where γ is the "shape" parameter and α is a scale parameter called the characteristic life.

Example: A company produces automotive fuel pumps that fail according to a Weibull life distribution model with shape parameter γ = 1.5 and scale parameter α = 8,000 (time measured in use hours). If a typical pump is used 800 hours a year, what proportion are likely to fail within 5 years?

Solution: The probability associated with the 800*5 quantile of a Weibull distribution with γ = 1.5 and α = 8000 is 0.298. Thus about 30% of the pumps will fail in the first 5 years.
Functions for computing PDF values and CDF values are available in both Dataplot code and R code.
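For instance, the fuel pump calculation above can be reproduced in R with one line:

    pweibull(800 * 5, shape = 1.5, scale = 8000)   # 0.298, the fraction failing by 4000 hours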
8.1.2.2. Reliability or survival function
Survival is the complementary event to failure
The Reliability Function R(t), also known as the Survival Function S(t), is defined by:
    R(t) = S(t) = the probability a unit survives beyond time t.
Since a unit either fails or survives, and one of these two mutually exclusive alternatives must occur, we have
    R(t) = 1 - F(t),    F(t) = 1 - R(t)
Calculations using R(t) often occur when building up from single components to subsystems with many components. For example, if one microprocessor comes from a population with reliability function R_m(t) and two of them are used for the CPU in a system, then the system CPU has a reliability function given by
    R_cpu(t) = [R_m(t)]²,
The reliability of the system is the product of the reliability functions of the components
since both must survive in order for the system to survive.
This building up to the system from the individual
components will be discussed in detail when we look at
the "Bottom-Up" method. The general rule is: to calculate
the reliability of a system of independent components,
multiply the reliability functions of all the components
together.
8.1.2.3. Failure (or hazard) rate
The failure rate is the rate at which the population survivors at any given instant are "falling over the cliff"
The failure rate is defined for non-repairable populations as the (instantaneous) rate of failure for the survivors to time t during the next instant of time. It is a rate per unit of time similar in meaning to reading a car speedometer at a particular instant and seeing 45 mph. The next instant the failure rate may change and the units that have already failed play no further role since only the survivors count.

The failure rate (or hazard rate) is denoted by h(t) and calculated from
    h(t) = f(t) / [1 - F(t)] = f(t) / R(t).
The failure rate is sometimes called a "conditional failure rate" since the denominator 1 - F(t) (i.e., the population survivors) converts the expression into a conditional rate, given survival past time t.

Since h(t) is also equal to the negative of the derivative of ln{R(t)}, we have the useful identity
    R(t) = exp( - ∫₀ᵗ h(u) du ).
If we let
    H(t) = ∫₀ᵗ h(u) du
be the Cumulative Hazard Function, we then have F(t) = 1 - e^{-H(t)}. Two other useful identities that follow from these formulas are:
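Written out, the standard identities that follow from the relations above are

    H(t) = -\ln R(t) \qquad\text{and}\qquad f(t) = h(t)\, e^{-H(t)} = h(t)\,R(t).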
It is also sometimes useful to define an average failure rate over any interval (T_1, T_2) that "averages" the failure rate over that interval. This rate, denoted by AFR(T_1, T_2), is a single number that can be used as a specification or target for the population failure rate over that interval. If T_1 is 0, it is dropped from the expression. Thus, for example, AFR(40,000) would be the average failure rate for the population over the first 40,000 hours of operation.

The formulas for calculating AFR's are:
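In the notation introduced above, these are

    AFR(T_1, T_2) = \frac{\int_{T_1}^{T_2} h(t)\,dt}{T_2 - T_1}
                  = \frac{H(T_2) - H(T_1)}{T_2 - T_1}
                  = \frac{\ln R(T_1) - \ln R(T_2)}{T_2 - T_1},

with AFR(0, T) written simply as AFR(T).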
8.1.2.4. "Bathtub" curve
http://www.itl.nist.gov/div898/handbook/apr/section1/apr124.htm[4/17/2013 7:14:09 PM]
8. Assessing Product Reliability
8.1. Introduction
8.1.2. What are the basic terms and models used for reliability evaluation?
8.1.2.4. "Bathtub" curve
A plot of the failure rate over time for most products yields a curve that looks like a drawing of a bathtub
If enough units from a given population are observed operating and failing over time, it is relatively easy to compute week-by-week (or month-by-month) estimates of the failure rate h(t). For example, if N_12 units survive to start the 13th month of life and r_13 of them fail during the next month (or 720 hours) of life, then a simple empirical estimate of h(t), averaged across the 13th month of life (or between 8640 hours and 9360 hours of age), is given by r_13 / (N_12 × 720). Similar estimates are discussed in detail in the section on Empirical Model Fitting.
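A minimal R sketch of this empirical estimate (the counts below are invented for illustration):

# Empirical (average) failure rate for the 13th month of life.
# Illustrative counts: 842 units entered month 13 and 3 of them failed during it.
N12   <- 842        # survivors at the start of the 13th month
r13   <- 3          # failures observed during the 13th month
hours <- 720        # length of one month of operation, in hours

h13 <- r13 / (N12 * hours)   # failures per unit-hour, averaged over the month
h13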
Over many years, and across a wide variety of mechanical and electronic
components and systems, people have calculated empirical population failure
rates as units age over time and repeatedly obtained a graph such as shown
below. Because of the shape of this failure rate curve, it has become widely
known as the "Bathtub" curve.
The initial region that begins at time zero when a customer first begins to use the
product is characterized by a high but rapidly decreasing failure rate. This region
is known as the Early Failure Period (also referred to as Infant Mortality
Period, from the actuarial origins of the first bathtub curve plots). This
decreasing failure rate typically lasts several weeks to a few months.
Next, the failure rate levels off and remains roughly constant for (hopefully) the
majority of the useful life of the product. This long period of a level failure rate
is known as the Intrinsic Failure Period (also called the Stable Failure
Period) and the constant failure rate level is called the Intrinsic Failure Rate.
Note that most systems spend most of their lifetimes operating in this flat
portion of the bathtub curve.
Finally, if units from the population remain in use long enough, the failure rate
begins to increase as materials wear out and degradation failures occur at an ever
increasing rate. This is the Wearout Failure Period.
8.1.2.4. "Bathtub" curve
http://www.itl.nist.gov/div898/handbook/apr/section1/apr124.htm[4/17/2013 7:14:09 PM]
NOTE: The Bathtub Curve also applies (based on much empirical evidence) to
Repairable Systems. In this case, the vertical axis is the Repair Rate or the Rate
of Occurrence of Failures (ROCOF).
8.1.2.5. Repair rate or ROCOF
Repair Rate models are based on counting the cumulative number of failures over time
A different approach is used for modeling the rate of
occurrence of failure incidences for a repairable system. In
this chapter, these rates are called repair rates (not to be
confused with the length of time for a repair, which is not
discussed in this chapter). Time is measured by system power-
on-hours from initial turn-on at time zero, to the end of
system life. Failures occur as a given system ages and the
system is repaired to a state that may be the same as new, or
better, or worse. The frequency of repairs may be increasing,
decreasing, or staying at a roughly constant rate.
Let N(t) be a counting function that keeps track of the
cumulative number of failures a given system has had from
time zero to time t. N(t) is a step function that jumps up one
every time a failure occurs and stays at the new level until the
next failure.
Every system will have its own observed N(t) function over
time. If we observed the N(t) curves for a large number of
similar systems and "averaged" these curves, we would have
an estimate of M(t) = the expected number (average number)
of cumulative failures by time t for these systems.
The Repair Rate (or ROCOF) is the mean rate of failures per unit time
The derivative of M(t), denoted m(t), is defined to be the
Repair Rate or the Rate Of Occurrence Of Failures at Time
t or ROCOF.
Models for N(t), M(t) and m(t) will be described in the section
on Repair Rate Models.
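The counting-function idea can be sketched numerically in R. The simulated failure process below (a homogeneous Poisson process with an invented rate) is purely illustrative; the actual Repair Rate Models are described in the section referenced above.

# Estimate M(t), the expected cumulative number of failures by time t,
# by averaging the observed N(t) step functions of many similar systems.
set.seed(1)
n.systems <- 200
rate      <- 0.002          # assumed failures per power-on hour
horizon   <- 5000
t.grid    <- seq(0, horizon, by = 500)

# simulate each system's failure times once (homogeneous Poisson process)
systems <- replicate(n.systems, {
  n.fail <- rpois(1, rate * horizon)
  sort(runif(n.fail, 0, horizon))
}, simplify = FALSE)

# average the N(t) step functions across systems to estimate M(t)
M.hat <- sapply(t.grid, function(t) mean(sapply(systems, function(ft) sum(ft <= t))))

cbind(t.grid, M.hat)   # for this illustrative process the true M(t) = rate * t,
                       # so m(t) = M'(t) is roughly constant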
8.1.3. What are some common difficulties with reliability data and how are they overcome?
The Paradox of Reliability Analysis: The more reliable a product is, the harder it is to get the failure data needed to "prove" it is reliable!
There are two closely related problems that are typical with reliability data and not common with most other forms of statistical data. These are:
Censoring (when the observation period ends, not all units have failed - some are survivors)
Lack of Failures (if there is too much censoring, even though a large number of units may be under observation, the information in the data is limited due to the lack of actual failures)
These problems cause considerable practical difficulty when
planning reliability assessment tests and analyzing failure data.
Some solutions are discussed in the next two sections.
Typically, the solutions involve making additional assumptions
and using complicated models.
8.1.3.1. Censoring
When not all units on test fail we have censored data
Consider a situation in which we are reliability testing n (non-repairable) units
taken randomly from a population. We are investigating the population to
determine if its failure rate is acceptable. In the typical test scenario, we have a
fixed time T to run the units to see if they survive or fail. The data obtained are
called Censored Type I data.
Censored Type I Data
During the T hours of test we observe r failures (where r can be any number from 0 to n). The (exact) failure times are t_1, t_2, ..., t_r, and there are (n - r) units that survived the entire T-hour test without failing. Note that T is fixed in advance and r is random, since we don't know how many failures will occur until the test is run. Note also that we assume the exact times of failure are recorded when there are failures.
This type of censoring is also called "right censored" data since the times of
failure to the right (i.e., larger than T) are missing.
Another (much less common) way to test is to decide in advance that you want
to see exactly r failure times and then test until they occur. For example, you
might put 100 units on test and decide you want to see at least half of them fail.
Then r = 50, but T is unknown until the 50th failure occurs. This is called Censored
Type II data.
Censored Type II Data
We observe t_1, t_2, ..., t_r, where r is specified in advance. The test ends at time T = t_r, and (n - r) units have survived. Again we assume it is possible to observe the exact time of failure for failed units.
Type II censoring has the significant advantage that you know in advance how
many failure times your test will yield - this helps enormously when planning
adequate tests. However, an open-ended random test time is generally
impractical from a management point of view and this type of testing is rarely
seen.
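A small R sketch may make the two censoring schemes concrete (the sample size, failure rate, test time, and r below are invented, and exponential lifetimes are assumed purely for illustration):

# Simulated censored data under the two schemes described above.
set.seed(42)
n      <- 100
lambda <- 0.001                     # assumed failure rate per hour
life   <- rexp(n, rate = lambda)    # true (possibly unobserved) failure times

# Censored Type I: stop the test at a fixed time T = 500 hours.
T.fixed   <- 500
type1.obs <- pmin(life, T.fixed)    # observed time for each unit
failed    <- life <= T.fixed        # TRUE if the failure time was actually seen
sum(failed)                         # r, the (random) number of failures observed

# Censored Type II: run until r = 50 failures have occurred.
r      <- 50
T.rand <- sort(life)[r]             # the test ends at the 50th failure time
T.rand                              # here T is random, known only after the test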
Sometimes we don't even know the exact time of failure
Readout or Interval Data
Sometimes exact times of failure are not known; only an interval of time in
which the failure occurred is recorded. This kind of data is called Readout or
Interval data and the situation is shown in the figure below:
Multicensored Data
In the most general case, every unit observed yields exactly one of the following
three types of information:
a run-time if the unit did not fail while under observation
an exact failure time
an interval of time during which the unit failed.
The units may all have different run-times and/or readout intervals.
Many special methods have been developed to handle censored data
How do we handle censored data?
Many statistical methods can be used to fit models and estimate failure rates,
even with censored data. In later sections we will discuss the Kaplan-Meier
approach, Probability Plotting, Hazard Plotting, Graphical Estimation, and
Maximum Likelihood Estimation.
Separating out Failure Modes
Note that when a data set consists of failure times that can be sorted into several
different failure modes, it is possible (and often necessary) to analyze and model
each mode separately. Consider all failures due to modes other than the one
being analyzed as censoring times, with the censored run-time equal to the time
it failed due to the different (independent) failure mode. This is discussed further
in the competing risk section and later analysis sections.
8.1.3.2. Lack of failures
Failure data is needed to accurately assess and improve reliability - this poses problems when testing highly reliable parts
When fitting models and estimating failure rates from
reliability data, the precision of the estimates (as measured by
the width of the confidence intervals) tends to vary inversely
with the square root of the number of failures observed - not
the number of units on test or the length of the test. In other
words, a test where 5 fail out of a total of 10 on test gives
more information than a test with 1000 units but only 2
failures.
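The point can be illustrated with a small simulation in R (exponential lifetimes and the test conditions below are invented; they are chosen only to roughly mimic the two scenarios just described, and the failure rate is estimated by r divided by the total unit-hours on test):

# Relative spread of the estimated failure rate for two test plans:
# few units with many failures vs. many units with very few failures.
set.seed(7)
lambda <- 0.0001
sim.rel.sd <- function(n, T.test, n.sim = 2000) {
  est <- replicate(n.sim, {
    life <- rexp(n, lambda)
    r    <- sum(life <= T.test)          # number of failures observed
    tot  <- sum(pmin(life, T.test))      # total unit-hours accumulated on test
    r / tot
  })
  sd(est) / lambda                       # relative spread of the estimate
}

sim.rel.sd(n = 10,   T.test = 10000)     # about 6 failures expected: tighter estimate
sim.rel.sd(n = 1000, T.test = 30)        # about 3 failures expected: wider, despite 1000 units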
Since the number of failures r is critical, and not the sample
size n on test, it becomes increasingly difficult to assess the
failure rates of highly reliable components. Parts like memory
chips, that in typical use have failure rates measured in parts
per million per thousand hours, will have few or no failures
when tested for reasonable time periods with affordable
sample sizes. This gives little or no information for
accomplishing the two primary purposes of reliability testing,
namely:
accurately assessing population failure rates
obtaining failure mode information to feed back for
product improvement.
Testing at much higher than typical stresses can yield failures but models are then needed to relate these back to use stress
How can tests be designed to overcome an expected lack of
failures?
The answer is to make failures occur by testing at much higher
stresses than the units would normally see in their intended
application. This creates a new problem: how can these
failures at higher-than-normal stresses be related to what
would be expected to happen over the course of many years at
normal use stresses? The models that relate high stress
reliability to normal use reliability are called acceleration
models.
8.1.4. What is "physical acceleration" and how do we model it?
When changing stress is equivalent to multiplying time to fail by a constant, we have true (physical) acceleration
Physical Acceleration (sometimes called True
Acceleration or just Acceleration) means that operating a
unit at high stress (i.e., higher temperature or voltage or
humidity or duty cycle, etc.) produces the same failures that
would occur at typical-use stresses, except that they happen
much quicker.
Failure may be due to mechanical fatigue, corrosion,
chemical reaction, diffusion, migration, etc. These are the
same causes of failure under normal stress; the time scale is
simply different.
An Acceleration Factor is the constant multiplier between the two stress levels
When there is true acceleration, changing stress is equivalent
to transforming the time scale used to record when failures
occur. The transformations commonly used are linear,
which means that time-to-fail at high stress just has to be
multiplied by a constant (the acceleration factor) to obtain
the equivalent time-to-fail at use stress.
We use the following notation:

t_s = time-to-fail at stress              t_u = corresponding time-to-fail at use
F_s(t) = CDF at stress                    F_u(t) = CDF at use
f_s(t) = PDF at stress                    f_u(t) = PDF at use
h_s(t) = failure rate at stress           h_u(t) = failure rate at use
Then, an acceleration factor AF between stress and use
means the following relationships hold:
Linear Acceleration Relationships

Time-to-Fail:               t_u = AF × t_s
Failure Probability:        F_u(t) = F_s(t/AF)
Reliability:                R_u(t) = R_s(t/AF)
PDF or Density Function:    f_u(t) = (1/AF) f_s(t/AF)
Failure Rate:               h_u(t) = (1/AF) h_s(t/AF)
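As an illustration, the following R sketch translates a life distribution fitted at a high stress level to use conditions with these relationships (the acceleration factor and the Weibull fit at stress are invented values, not from the Handbook):

# Translating a stress-level life distribution to use conditions with an
# acceleration factor, using the linear acceleration relationships above.
AF           <- 40          # assumed acceleration factor between stress and use
w.shape      <- 1.5
scale.stress <- 2000        # characteristic life (hours) fitted at high stress

F.stress <- function(t) pweibull(t, shape = w.shape, scale = scale.stress)
f.stress <- function(t) dweibull(t, shape = w.shape, scale = scale.stress)

F.use <- function(t) F.stress(t / AF)                 # F_u(t) = F_s(t/AF)
f.use <- function(t) (1 / AF) * f.stress(t / AF)      # f_u(t) = (1/AF) f_s(t/AF)
h.use <- function(t) f.use(t) / (1 - F.use(t))        # failure rate at use

F.use(40000)   # probability of failure by 40,000 hours at use conditions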
Each failure mode has its own acceleration factor
Failure data should be separated by failure mode when analyzed, if acceleration is relevant
Probability plots of data from different stress cells have the same slope (if there is acceleration)
Note: Acceleration requires that there be a stress-dependent physical process causing change or degradation that leads to failure. In general, different failure modes will be affected differently by stress and have different acceleration factors. Therefore, it is unlikely that a single acceleration factor will apply to more than one failure mechanism. Separate out different types of failure when analyzing failure data.
Also, a consequence of the linear acceleration relationships
shown above (which follows directly from "true
acceleration") is the following:
The Shape Parameter for the key life
distribution models (Weibull, Lognormal) does
not change for units operating under different
stresses. Probability plots of data from different
stress cells will line up roughly parallel.
These distributions and probability plotting will be
discussed in later sections.
8.1.5. What are some common acceleration models?
Acceleration models predict time to fail as a function of stress
Acceleration factors show how time-to-fail at a particular
operating stress level (for one failure mode or mechanism)
can be used to predict the equivalent time to fail at a
different operating stress level.
A model that predicts time-to-fail as a function of stress would be even better than a collection of acceleration factors. If we write t_f = G(S), with G(S) denoting the model equation for an arbitrary stress level S, then the acceleration factor between two stress levels S_1 and S_2 can be evaluated simply by AF = G(S_1)/G(S_2). Now we can test at the higher stress S_2, obtain a sufficient number of failures to fit life distribution models and evaluate failure rates, and use the Linear Acceleration Relationships Table to predict what will occur at the lower use stress S_1.
A model that predicts time-to-fail as a function of operating
stresses is known as an acceleration model.
Acceleration models are often derived from physics or kinetics models related to the failure mechanism
Acceleration models are usually based on the physics or
chemistry underlying a particular failure mechanism.
Successful empirical models often turn out to be
approximations of complicated physics or kinetics models,
when the theory of the failure mechanism is better
understood. The following sections will consider a variety of
powerful and useful models:
Arrhenius
Eyring
Other Models
8.1.5.1. Arrhenius
The Arrhenius model predicts failure acceleration due to temperature increase
One of the earliest and most successful acceleration models predicts how time-to-fail varies with temperature. This empirically based model is known as the Arrhenius equation. It takes the form

t_f = A · exp(ΔH / (kT)),

with T denoting temperature measured in degrees Kelvin (273.16 + degrees Celsius) at the point when the failure process takes place and k is Boltzmann's constant (8.617 × 10^-5 in eV/K). The constant A is a scaling factor that drops out when calculating acceleration factors, with ΔH (pronounced "Delta H") denoting the activation energy, which is the critical parameter in the model.
The Arrhenius activation energy, ΔH, is all you need to know to calculate temperature acceleration
The value of ΔH depends on the failure mechanism and the materials involved, and typically ranges from .3 or .4 up to 1.5, or even higher. Acceleration factors between two temperatures increase exponentially as ΔH increases.
The acceleration factor between a higher temperature T_2 and a lower temperature T_1 is given by

AF = exp[ (ΔH/k) × (1/T_1 - 1/T_2) ].

Using the value of k given above, this can be written in terms of T in degrees Celsius as

AF = exp[ ΔH × 11605 × (1/(T_1 + 273.16) - 1/(T_2 + 273.16)) ].

Note that the only unknown parameter in this formula is ΔH.
Example: The acceleration factor between 25C and 125C is 133 if ΔH = .5 and 17,597 if ΔH = 1.0.
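These two example values can be checked directly in R:

# Arrhenius acceleration factor between 25 C (use) and 125 C (stress)
k <- 8.617e-5                       # Boltzmann's constant in eV/K
arrhenius.AF <- function(dH, T1.celsius, T2.celsius) {
  T1 <- T1.celsius + 273.16
  T2 <- T2.celsius + 273.16
  exp((dH / k) * (1 / T1 - 1 / T2))
}

arrhenius.AF(0.5, 25, 125)   # approximately 133
arrhenius.AF(1.0, 25, 125)   # approximately 17,600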
The Arrhenius model has been used successfully for failure
mechanisms that depend on chemical reactions, diffusion
processes or migration processes. This covers many of the
non mechanical (or non material fatigue) failure modes that
cause electronic equipment failure.
8.1.5.2. Eyring
The Eyring model has a theoretical basis in chemistry and quantum mechanics and can be used to model acceleration when many stresses are involved
Henry Eyring's contributions to chemical reaction rate theory
have led to a very general and powerful model for
acceleration known as the Eyring Model. This model has
several key features:
It has a theoretical basis from chemistry and quantum
mechanics.
If a chemical process (chemical reaction, diffusion,
corrosion, migration, etc.) is causing degradation
leading to failure, the Eyring model describes how the
rate of degradation varies with stress or, equivalently,
how time to failure varies with stress.
The model includes temperature and can be expanded
to include other relevant stresses.
The temperature term by itself is very similar to the
Arrhenius empirical model, explaining why that model
has been so successful in establishing the connection
between the ΔH parameter and the quantum theory
concept of "activation energy needed to cross an
energy barrier and initiate a reaction".
The model for temperature and one additional stress takes the general form:

t_f = A · T^α · exp[ ΔH/(kT) + (B + C/T) · S_1 ]

for which S_1 could be some function of voltage or current or any other relevant stress and the parameters α, ΔH, B, and C determine acceleration between stress combinations. As with the Arrhenius Model, k is Boltzmann's constant and temperature is in degrees Kelvin.
If we want to add an additional non-thermal stress term, the model becomes

t_f = A · T^α · exp[ ΔH/(kT) + (B + C/T) · S_1 + (D + E/T) · S_2 ]

and as many stresses as are relevant can be included by adding similar terms.
Models with multiple stresses generally have no interaction terms - which means you can multiply acceleration factors due to different stresses
Note that the general Eyring model includes terms that have
stress and temperature interactions (in other words, the
effect of changing temperature varies, depending on the
levels of other stresses). Most models in actual use do not
include any interaction terms, so that the relative change in
acceleration factors when only one stress changes does not
depend on the level of the other stresses.
In models with no interaction, you can compute acceleration
factors for each stress and multiply them together. This
would not be true if the physical mechanism required
interaction terms - but, at least to first approximations, it
seems to work for most examples in the literature.
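A short R sketch of this multiplication (the temperature term is Arrhenius as above; the voltage term assumes an inverse power law with exponent 3, and all numbers are invented for illustration):

# With no interaction terms, the overall acceleration factor is the product
# of the factors contributed by each stress separately.
k <- 8.617e-5

AF.temperature <- function(dH, T.use.C, T.stress.C) {
  exp((dH / k) * (1 / (T.use.C + 273.16) - 1 / (T.stress.C + 273.16)))
}
AF.voltage <- function(beta, V.use, V.stress) (V.stress / V.use)^beta   # inverse power law

AF.total <- AF.temperature(0.7, 55, 125) * AF.voltage(3, 3.3, 5.0)
AF.total   # combined acceleration factor for the two stresses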
The Eyring model can also be used to model rate of degradation leading to failure as a function of stress
Advantages of the Eyring Model
Can handle many stresses.
Can be used to model degradation data as well as
failure data.
The ΔH parameter has a physical meaning and has
been studied and estimated for many well known
failure mechanisms and materials.
In practice, the Eyring Model is usually too complicated to use in its most general form and must be "customized" or simplified for any particular failure mechanism
Disadvantages of the Eyring Model
Even with just two stresses, there are 5 parameters to
estimate. Each additional stress adds 2 more unknown
parameters.
Many of the parameters may have only a second-order effect. For example, setting α = 0 works quite well since the temperature term then becomes the same as in the Arrhenius model. Also, the constants C and E are only needed if there is a significant temperature interaction effect with respect to the other stresses.
The form in which the other stresses appear is not specified by the general model and may vary according to the particular failure mechanism. In other words, S_1 may be voltage or ln (voltage) or some other function of voltage.
Many well-known models are simplified versions of the Eyring model with appropriate functions of relevant stresses chosen for S_1 and S_2. Some of these will be shown in the Other Models section. The trick is to find the right simplification to use for a particular failure mechanism.
8.1.5.3. Other models
Many useful 1, 2 and 3 stress models are simple Eyring models. Six are described
This section will discuss several acceleration models whose
successful use has been described in the literature.
The (Inverse) Power Rule for Voltage
The Exponential Voltage Model
Two Temperature/Voltage Models
The Electromigration Model
Three Stress Models (Temperature, Voltage and
Humidity)
The Coffin-Manson Mechanical Crack Growth Model
The (Inverse) Power Rule for Voltage
This model, used for capacitors, has only voltage dependency and takes the form:

t_f = A · V^(-β)

This is a very simplified Eyring model with α, ΔH, and C all 0, S = ln V, and β = -B.
The Exponential Voltage Model
In some cases, voltage dependence is modeled better with an exponential model:

t_f = A · exp(-B · V)
Two Temperature/Voltage Models
Temperature/Voltage models are common in the literature and typically take one of two forms: an Arrhenius temperature term combined with either the (inverse) power rule for voltage or the exponential voltage model given above.
Again, these are just simplified two stress Eyring models with
the appropriate choice of constants and functions of voltage.
The Electromigration Model
Electromigration is a semiconductor failure mechanism where open failures occur in metal thin film conductors due to the movement of ions toward the anode. This ionic movement is accelerated by high temperatures and high current density. The (modified Eyring) model takes the form

t_50 = A · J^(-n) · exp(ΔH/(kT))

with J denoting the current density. ΔH is typically between .5 and 1.2 electron volts, while an n around 2 is common.
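An illustrative R sketch of the resulting acceleration factor (it is simply the ratio of the expression above evaluated at use and at stress conditions; the activation energy, exponent, current densities, and temperatures below are all invented values):

# Electromigration acceleration factor between stress and use conditions
k  <- 8.617e-5
dH <- 0.7          # assumed activation energy in eV
n  <- 2            # assumed current density exponent

em.AF <- function(J.use, T.use.C, J.stress, T.stress.C) {
  (J.stress / J.use)^n *
    exp((dH / k) * (1 / (T.use.C + 273.16) - 1 / (T.stress.C + 273.16)))
}

em.AF(J.use = 5e5, T.use.C = 85, J.stress = 2e6, T.stress.C = 175)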
Three-Stress Models (Temperature, Voltage and
Humidity)
Humidity plays an important role in many failure mechanisms
that depend on corrosion or ionic movement. A common 3-
stress model takes the form

t_f = A · V^(-β) · (RH)^(-γ) · exp(ΔH/(kT))
Here RH is percent relative humidity. Other obvious variations
on this model would be to use an exponential voltage term
and/or an exponential RH term.
Even this simplified Eyring 3-stress model has 4 unknown
parameters and an extensive experimental setup would be
required to fit the model and calculate acceleration factors.
The Coffin-Manson Model is a useful non-Eyring model for crack growth or material fatigue
The Coffin-Manson Mechanical Crack Growth Model
Models for mechanical failure, material fatigue or material
deformation are not forms of the Eyring model. These models
typically have terms relating to cycles of stress or frequency of
use or change in temperatures. A model of this type known as
the (modified) Coffin-Manson model has been used
successfully to model crack growth in solder and other metals
due to repeated temperature cycling as equipment is turned on
and off. This model expresses the number of cycles to fail as the product of a power of the cycling frequency, a power of the temperature range, and an Arrhenius term, with
N_f = the number of cycles to fail
f = the cycling frequency
ΔT = the temperature range during a cycle
and G(T_max) is an Arrhenius term evaluated at the maximum temperature reached in each cycle.
Typical values for the cycling frequency exponent and the
temperature range exponent are around -1/3 and 2,
respectively (note that reducing the cycling frequency reduces
the number of cycles to failure). The ΔH activation energy term in G(T_max) is around 1.25.
8.1.6. What are the basic lifetime distribution models used for non-repairable populations?
A handful of lifetime distribution models have enjoyed great practical success
There are a handful of parametric models that have
successfully served as population models for failure times
arising from a wide range of products and failure
mechanisms. Sometimes there are probabilistic arguments
based on the physics of the failure mode that tend to justify
the choice of model. Other times the model is used solely
because of its empirical success in fitting actual failure data.
Seven models will be described in this section:
1. Exponential
2. Weibull
3. Extreme Value
4. Lognormal
5. Gamma
6. Birnbaum-Saunders
7. Proportional hazards
8.1.6.1. Exponential
All the key formulas for using the exponential model
Formulas and Plots
The exponential model, with only one unknown parameter, is the simplest of all life distribution models. The key equations for the exponential are shown below:

PDF:            f(t) = λ e^(-λt)
CDF:            F(t) = 1 - e^(-λt)
Reliability:    R(t) = e^(-λt)
Failure Rate:   h(t) = λ
Mean:           1/λ
Median:         ln 2 / λ

Note that the failure rate reduces to the constant λ for any time. The exponential distribution is the only distribution to have a constant failure rate. Also, another name for the exponential mean is the Mean Time To Fail or MTTF and we have MTTF = 1/λ.
The cumulative hazard function for the exponential is just the integral of the failure rate or H(t) = λt.
The PDF for the exponential has the familiar shape shown below.
The Exponential distribution 'shape'
The Exponential CDF
Below is an example of typical exponential lifetime data displayed in Histogram
form with corresponding exponential PDF drawn through the histogram.
Histogram of Exponential Data
The Exponential models the flat portion of the "bathtub" curve - where most systems spend most of their 'lives'
Uses of the Exponential Distribution Model
1. Because of its constant failure rate property, the exponential distribution is
an excellent model for the long flat "intrinsic failure" portion of the
Bathtub Curve. Since most components and systems spend most of their
lifetimes in this portion of the Bathtub Curve, this justifies frequent use of
the exponential distribution (when early failures or wear out is not a
concern).
2. Just as it is often useful to approximate a curve by piecewise straight line
segments, we can approximate any failure rate curve by week-by-week or
month-by-month constant rates that are the average of the actual changing
rate during the respective time durations. That way we can approximate
any model by piecewise exponential distribution segments patched
together.
3. Some natural phenomena have a constant failure rate (or occurrence rate)
property; for example, the arrival rate of cosmic ray alpha particles or
Geiger counter ticks. The exponential model works well for inter-arrival
times (while the Poisson distribution describes the total number of events
in a given period). When these events trigger failures, the exponential life
distribution model will naturally apply.
Exponential probability plot
We can generate a probability plot of normalized exponential data, so that a
perfect exponential fit is a diagonal line with slope 1. The probability plot for
100 normalized random exponential observations (λ = 0.01) is shown below.
We can calculate the exponential PDF and CDF at 100 hours for the case where
λ = 0.01. The PDF value is 0.0037 and the CDF value is 0.6321.
Functions for computing exponential PDF values, CDF values, and for producing
probability plots, are found in both Dataplot code and R code.
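For example, base R provides these quantities directly (this is one possible counterpart to the linked code, not a copy of it):

# Exponential PDF and CDF at t = 100 hours with failure rate lambda = 0.01
lambda <- 0.01
dexp(100, rate = lambda)   # PDF: about 0.0037
pexp(100, rate = lambda)   # CDF: about 0.6321

# Probability plot idea: ordered observations against exponential quantiles
set.seed(123)
x <- rexp(100, rate = lambda)
plot(qexp(ppoints(100), rate = lambda), sort(x),
     xlab = "Theoretical quantiles", ylab = "Ordered data")
abline(0, 1)   # a good exponential fit follows this diagonal line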
8.1.6.2. Weibull
Weibull Formulas
Formulas and Plots
The Weibull is a very flexible life distribution model with two parameters. It has CDF and PDF and other key formulas given by:

CDF:            F(t) = 1 - e^(-(t/α)^γ)
PDF:            f(t) = (γ/α) (t/α)^(γ-1) e^(-(t/α)^γ)
Reliability:    R(t) = e^(-(t/α)^γ)
Failure Rate:   h(t) = (γ/α) (t/α)^(γ-1)
Mean:           α Γ(1 + 1/γ)

with α the scale parameter (the Characteristic Life), γ (gamma) the Shape Parameter, and Γ the Gamma function with Γ(N) = (N-1)! for integer N.
The cumulative hazard function for the Weibull is the integral of the failure rate, or H(t) = (t/α)^γ.
A more general three-parameter form of the Weibull includes an additional waiting time parameter μ (sometimes called a shift or location parameter). The formulas for the 3-parameter Weibull are easily obtained from the above formulas by replacing t by (t - μ) wherever t appears. No failure can occur before μ hours, so the time scale starts at μ, and not 0. If a shift parameter μ is known (based, perhaps, on the physics of the failure mode), then all you have to do is subtract μ from all the observed failure times and/or readout times and
analyze the resulting shifted data with a two-parameter Weibull.
NOTE: Various texts and articles in the literature use a variety of different symbols for the same Weibull parameters. For example, the characteristic life is sometimes called c, ν (nu) or η (eta), and the shape parameter is also called m or β (beta). To add to the confusion, some software uses β as the characteristic life parameter and α as the shape parameter. Some authors even parameterize the density function differently, using a scale parameter θ = α^γ.
Special Case: When γ = 1, the Weibull reduces to the Exponential Model, with α = 1/λ = the mean time to fail (MTTF).
Depending on the value of the shape parameter γ, the Weibull model can empirically fit a
wide range of data histogram shapes. This is shown by the PDF example curves below.
Weibull data 'shapes'
From a failure rate model viewpoint, the Weibull is a natural extension of the constant failure
rate exponential model since the Weibull has a polynomial failure rate with exponent {γ - 1}.
This makes all the failure rate curves shown in the following plot possible.
Weibull failure rate 'shapes'
The Weibull is very flexible and also has theoretical justification in many applications
Uses of the Weibull Distribution Model
1. Because of its flexible shape and ability to model a wide range of failure rates, the
Weibull has been used successfully in many applications as a purely empirical model.
2. The Weibull model can be derived theoretically as a form of Extreme Value
Distribution, governing the time to occurrence of the "weakest link" of many competing
failure processes. This may explain why it has been so successful in applications such
as capacitor, ball bearing, relay and material strength failures.
3. Another special case of the Weibull occurs when the shape parameter is 2. The
distribution is called the Rayleigh Distribution and it turns out to be the theoretical
probability model for the magnitude of radial error when the x and y coordinate errors
are independent normals with 0 mean and the same standard deviation.
Weibull probability plot
We generated 100 Weibull random variables using T = 1000, γ = 1.5 and α = 5000. To see
how well these random Weibull data points are actually fit by a Weibull distribution, we
generated the probability plot shown below. Note the log scale used is base 10.
If the data follow a Weibull distribution, the points should follow a straight line.
We can compute the PDF and CDF values for failure time T = 1000, using the example Weibull distribution with γ = 1.5 and α = 5000. The PDF value is 0.000123 and the CDF
value is 0.08556.
Functions for computing Weibull PDF values, CDF values, and for producing probability
plots, are found in both Dataplot code and R code.
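For example, using base R (again one possible counterpart to the linked code, not a copy of it):

# Weibull PDF and CDF at T = 1000 hours, shape gamma = 1.5, scale alpha = 5000
w.shape <- 1.5
w.scale <- 5000
dweibull(1000, shape = w.shape, scale = w.scale)   # PDF: about 0.000123
pweibull(1000, shape = w.shape, scale = w.scale)   # CDF: about 0.0856

# Weibull probability plot on base-10 log scales
set.seed(123)
x <- sort(rweibull(100, shape = w.shape, scale = w.scale))
p <- ppoints(100)
plot(log10(x), log10(-log(1 - p)),
     xlab = "log10(failure time)", ylab = "log10(-log(1 - F))")
# data from a Weibull distribution should fall near a straight line with slope gamma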
8.1.6.3. Extreme value distributions
The Extreme Value Distribution usually refers to the distribution of the minimum of a large number of unbounded random observations
Description, Formulas and Plots
We have already referred to Extreme Value Distributions when describing the uses of the
Weibull distribution. Extreme value distributions are the limiting distributions for the
minimum or the maximum of a very large collection of random observations from the same
arbitrary distribution. Gumbel (1958) showed that for any well-behaved initial distribution
(i.e., F(x) is continuous and has an inverse), only a few models are needed, depending on
whether you are interested in the maximum or the minimum, and also if the observations are
bounded above or below.
In the context of reliability modeling, extreme value distributions for the minimum are
frequently encountered. For example, if a system consists of n identical components in series,
and the system fails when the first of these components fails, then system failure times are the
minimum of n random component failure times. Extreme value theory says that, independent
of the choice of component model, the system model will approach a Weibull as n becomes
large. The same reasoning can also be applied at a component level, if the component failure
occurs when the first of many similar competing failure processes reaches a critical level.
The distribution often referred to as the Extreme Value Distribution (Type I) is the limiting
distribution of the minimum of a large number of unbounded identically distributed random
variables. The PDF and CDF are given by:

f(x) = (1/β) · e^((x-μ)/β) · exp(-e^((x-μ)/β))
F(x) = 1 - exp(-e^((x-μ)/β))
with -∞ < x < ∞ and β > 0.
Extreme Value Distribution formulas and PDF shapes
If the x values are bounded below (as is the case with times of failure) then the limiting
distribution is the Weibull. Formulas and uses of the Weibull have already been discussed.
PDF Shapes for the (minimum) Extreme Value Distribution (Type I) are shown in the
following figure.
The natural log of Weibull data is extreme value data
Uses of the Extreme Value Distribution Model
1. In any modeling application for which the variable of interest is the minimum of many
random factors, all of which can take positive or negative values, try the extreme value
distribution as a likely candidate model. For lifetime distribution modeling, since failure
times are bounded below by zero, the Weibull distribution is a better choice.
2. The Weibull distribution and the extreme value distribution have a useful mathematical relationship. If t_1, t_2, ..., t_n are a sample of random times of fail from a Weibull distribution, then ln t_1, ln t_2, ..., ln t_n are random observations from the extreme value distribution. In other words, the natural log of a Weibull random time is an extreme value random observation.
Because of this relationship, computer programs designed for the extreme value
distribution can be used to analyze Weibull data. The situation exactly parallels using
normal distribution programs to analyze lognormal data, after first taking natural
logarithms of the data points.
Probability plot for the extreme value distribution
Assume μ = ln 200,000 = 12.206 and β = 1/2 = 0.5. The extreme value distribution associated with these parameters could be obtained by taking natural logarithms of data from a Weibull population with characteristic life α = 200,000 and shape γ = 2.
We generate 100 random numbers from this extreme value distribution and construct the
following probability plot.
Data from an extreme value distribution will line up approximately along a straight line when this kind of plot is constructed. The slope of the line is an estimate of β, and the "y-axis" value on the line corresponding to the "x-axis" 0 point is an estimate of μ. For the graph above, these turn out to be very close to the actual values of β and μ.
For the example extreme value distribution with μ = ln 200,000 = 12.206 and β = 1/2 = 0.5, the PDF values corresponding to the points 5, 8, 10, 12, 12.8 are 0.110E-5, 0.444E-3, 0.024, 0.683 and 0.247, and the CDF values corresponding to the same points are 0.551E-6, 0.222E-3, 0.012, 0.484 and 0.962.
Functions for computing extreme value distribution PDF values, CDF values, and for
producing probability plots, are found in both Dataplot code and R code.
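Since base R does not ship a minimum extreme value distribution function, the sketch below simply codes the PDF and CDF written earlier on this page (with μ = ln 200,000 and β = 0.5):

# Minimum extreme value (Type I) PDF and CDF
mu   <- log(200000)    # 12.206
beta <- 0.5

ev.pdf <- function(x) {
  z <- (x - mu) / beta
  (1 / beta) * exp(z) * exp(-exp(z))
}
ev.cdf <- function(x) 1 - exp(-exp((x - mu) / beta))

x <- c(5, 8, 10, 12, 12.8)
ev.pdf(x)   # about 1.1e-06, 4.4e-04, 0.024, 0.683, 0.247
ev.cdf(x)   # about 5.5e-07, 2.2e-04, 0.012, 0.484, 0.962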
8.1.6.4. Lognormal
Lognormal Formulas and relationship to the normal distribution
Formulas and Plots
The lognormal life distribution, like the Weibull, is a very flexible model that can empirically fit many types of failure data. The two-parameter form has parameters σ, the shape parameter, and T_50, the median (a scale parameter).
Note: If time to failure, t_f, has a lognormal distribution, then the (natural) logarithm of time to failure has a normal distribution with mean μ = ln T_50 and standard deviation σ. This makes lognormal data convenient to work with; just take natural logarithms of all the failure times and censoring times and analyze the resulting normal data. Later on, convert back to real time and lognormal parameters using σ as the lognormal shape and T_50 = e^μ