Chapter 6 ANOVA
ANOVA
PHOK Ponna
2022–2023
Statistics ITC 1 / 49
Contents
1 Introduction
Introduction
In this chapter, we test several (more than two) populations for the
equality of their means. For example, we compare the mean yields of
different hybrid varieties of tomato. We study the effect of four
different drugs on patients of a certain disease.
The observed differences between various treatment groups are caused
by one or more factors. In the above two examples, the hybrid variety
and drug are the factors that can vary the mean between different
treatment groups.
The methodology used for this comparison is called the Analysis Of
Variance (ANOVA).
One Factor ANOVA
Definition 1
The mean of data set $i$ is computed as
$$\bar{X}_{i.} = \frac{1}{n_i} \sum_{j=1}^{n_i} X_{ij}$$
where the dot in the symbol $\bar{X}_{i.}$ indicates that, for a given data set $i$, the summation over the elements $j$ has been carried out. Hence we replace $j$ by the dot symbol.
The mean of the whole data set (grand mean) is computed by summing all the data sets and dividing by the total number of data points $n = n_1 + \dots + n_m$ in all of them together:
$$\bar{X}_{..} = \frac{1}{n} \sum_{i=1}^{m} \sum_{j=1}^{n_i} X_{ij}$$
The sample variances are denoted by $S_1^2, S_2^2, \dots, S_m^2$, respectively:
$$S_i^2 = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} \left( X_{ij} - \bar{X}_{i.} \right)^2, \quad i = 1, 2, \dots, m$$
Definition 2
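1. Total Sum of Squares (SST):
$$SST = \sum_{i=1}^{m} \sum_{j=1}^{n_i} \left( X_{ij} - \bar{X}_{..} \right)^2$$
2. Treatment Sum of Squares (SSTr):
$$SSTr = \sum_{i=1}^{m} \sum_{j=1}^{n_i} \left( \bar{X}_{i.} - \bar{X}_{..} \right)^2$$
3. Error Sum of Squares (SSE):
$$SSE = \sum_{i=1}^{m} \sum_{j=1}^{n_i} \left( X_{ij} - \bar{X}_{i.} \right)^2$$
Theorem 1
$$SST = SSTr + SSE$$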
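A short sketch of why Theorem 1 holds, using only the definitions above. This derivation is added here for clarity and is not taken from the slides.

```latex
\begin{align*}
SST &= \sum_{i=1}^{m}\sum_{j=1}^{n_i}
       \bigl[(X_{ij}-\bar{X}_{i.}) + (\bar{X}_{i.}-\bar{X}_{..})\bigr]^2 \\
    &= \underbrace{\sum_{i=1}^{m}\sum_{j=1}^{n_i}(X_{ij}-\bar{X}_{i.})^2}_{SSE}
     + \underbrace{\sum_{i=1}^{m}\sum_{j=1}^{n_i}(\bar{X}_{i.}-\bar{X}_{..})^2}_{SSTr}
     + 2\sum_{i=1}^{m}(\bar{X}_{i.}-\bar{X}_{..})\sum_{j=1}^{n_i}(X_{ij}-\bar{X}_{i.}) \\
    &= SSE + SSTr,
       \qquad\text{because } \sum_{j=1}^{n_i}(X_{ij}-\bar{X}_{i.}) = 0 \text{ for every } i.
\end{align*}
```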
Remark 1
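Let $CF = \dfrac{X_{..}^2}{n} = n \bar{X}_{..}^2$, where $X_{..}$ is the sum of all observations. In practice, we use the following formulas:
1. $SST = \sum_{i=1}^{m} \sum_{j=1}^{n_i} X_{ij}^2 - CF$
2. $SSTr = \sum_{i=1}^{m} n_i \bar{X}_{i.}^2 - CF$
3. $SSE = \sum_{i=1}^{m} (n_i - 1) S_i^2 = SST - SSTr$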
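As a quick numerical check of Remark 1, the sketch below computes the three sums of squares from the definitions and from the shortcut formulas and prints them side by side. The data values and variable names are made up for illustration and do not come from the chapter.

```python
# Illustrative data only (not from the chapter): m = 3 groups of unequal size.
groups = [[4.0, 6.0, 5.0], [7.0, 9.0], [1.0, 2.0, 3.0, 2.0]]

n = sum(len(g) for g in groups)                 # total number of observations
grand_total = sum(sum(g) for g in groups)       # X..
grand_mean = grand_total / n                    # X-bar..
means = [sum(g) / len(g) for g in groups]       # X-bar_i.
variances = [
    sum((x - mi) ** 2 for x in g) / (len(g) - 1)    # S_i^2
    for g, mi in zip(groups, means)
]

# Sums of squares straight from Definition 2.
sst_def = sum((x - grand_mean) ** 2 for g in groups for x in g)
sstr_def = sum(len(g) * (mi - grand_mean) ** 2 for g, mi in zip(groups, means))
sse_def = sum((x - mi) ** 2 for g, mi in zip(groups, means) for x in g)

# Shortcut formulas from Remark 1.
cf = grand_total ** 2 / n
sst_short = sum(x ** 2 for g in groups for x in g) - cf
sstr_short = sum(len(g) * mi ** 2 for g, mi in zip(groups, means)) - cf
sse_short = sum((len(g) - 1) * s2 for g, s2 in zip(groups, variances))

for name, a, b in [("SST", sst_def, sst_short),
                   ("SSTr", sstr_def, sstr_short),
                   ("SSE", sse_def, sse_short)]:
    print(f"{name:5s} definition = {a:.6f}   shortcut = {b:.6f}")
```

The shortcut forms avoid subtracting a mean from every observation, which is why they are convenient for hand calculation.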
Definition 3
Mean square for treatment (MSTr) is defined by
$$MSTr = \frac{SSTr}{m - 1}$$
Mean square for error (MSE) is defined by
$$MSE = \frac{SSE}{n - m}$$
Theorem 2
For all $i = 1, 2, \dots, m$, if $X_{i1}, X_{i2}, \dots, X_{in_i} \overset{iid}{\sim} N(\mu_i, \sigma^2)$, then
$\bar{X}_{i.}$ and $S_i^2$ are independent,
$\bar{X}_{i.} \sim N(\mu_i, \sigma^2 / n_i)$,
$(n_i - 1) S_i^2 / \sigma^2 \sim \chi^2(n_i - 1)$.
ANOVA assumptions
For all $i = 1, 2, \dots, m$, $X_{i1}, X_{i2}, \dots, X_{in_i} \overset{iid}{\sim} N(\mu_i, \sigma^2)$.
Theorem 3
Assume that the ANOVA assumptions are satisfied. Then
1. $\dfrac{SSE}{\sigma^2} \sim \chi^2(n - m)$.
2. When $H_0$ is true, $\dfrac{SSTr}{\sigma^2} \sim \chi^2(m - 1)$.
3. $SSTr$ and $SSE$ are independent random variables.
Theorem 4
Assume that the ANOVA assumptions are satisfied. Then
1. $SSE$ and $SSTr$ are independent and $SST = SSE + SSTr$.
2. $SSTr / \sigma^2 \sim \chi^2(m - 1)$ under the null hypothesis $H_0$.
3. $F = \dfrac{MSTr}{MSE} \sim F(m - 1, n - m)$ under $H_0$.
Theorem 5
Assume that the ANOVA assumptions are satisfied. Then the test statistic in single-factor ANOVA is $F = MSTr / MSE$. When $H_0$ is true, $F \sim F(m - 1, n - m)$, and the rejection region at significance level $\alpha$, using the test statistic value $f$, is given by
$$RR = \{ f : f \geq F_{\alpha, m-1, n-m} \}.$$
Example 1
The cholesterol levels of four groups of adults with distinct food habits were compared in an experiment. The results are presented below:
Group 1: 220 214 203 184 186 200 165
Group 2: 262 193 225 200 164 266 179
Group 3: 272 192 190 208 231 235 141
Group 4: 190 255 247 278 230 269 289
Assuming that these four data sets follow normal distributions $N(\mu_i, \sigma^2)$, test the null hypothesis $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$ against $H_a$: at least two of the $\mu_i$'s are different, at significance level 0.05.
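The slides do not show a worked solution for Example 1, so the following is a minimal sketch that applies Definitions 2 and 3 and Theorem 5 to the data. The function name `one_way_anova` is a hypothetical helper, and the critical value is an assumption read from a standard F table rather than computed.

```python
# Cholesterol data from Example 1, one list per group.
groups = {
    "Group 1": [220, 214, 203, 184, 186, 200, 165],
    "Group 2": [262, 193, 225, 200, 164, 266, 179],
    "Group 3": [272, 192, 190, 208, 231, 235, 141],
    "Group 4": [190, 255, 247, 278, 230, 269, 289],
}


def one_way_anova(samples):
    """Return (SSTr, SSE, df_treatment, df_error, F) for a list of samples."""
    m = len(samples)
    n = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n
    means = [sum(s) / len(s) for s in samples]
    sstr = sum(len(s) * (mi - grand_mean) ** 2 for s, mi in zip(samples, means))
    sse = sum((x - mi) ** 2 for s, mi in zip(samples, means) for x in s)
    df_tr, df_e = m - 1, n - m
    f = (sstr / df_tr) / (sse / df_e)
    return sstr, sse, df_tr, df_e, f


sstr, sse, df_tr, df_e, f_stat = one_way_anova(list(groups.values()))

# Assumption: F_{0.05, 3, 24} is about 3.01 in a standard F table.
F_CRIT = 3.01
reject = f_stat >= F_CRIT

print(f"SSTr = {sstr:.3f}, SSE = {sse:.3f}")
print(f"df = ({df_tr}, {df_e}), f = {f_stat:.4f}")
print("Reject H0 at 0.05" if reject else "Do not reject H0 at 0.05")
```

With these data the statistic comes out near 3.26, which lies in the rejection region, so the sketch suggests rejecting $H_0$ at the 0.05 level.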
Example 2
It is common practice in many countries to destroy (shred) refrigerators at the end of their useful lives. In this process, material from insulating foam may be released into the atmosphere. The article "Release of Fluorocarbons from Insulation Foam in Home Appliances during Shredding" (J. of the Air and Waste Mgmt. Assoc., 2007: 1452–1460) gave the following data on foam density (g/L) for each of two refrigerators produced by four different manufacturers:
Manufacturer 1: 30.4, 29.2
Manufacturer 2: 27.7, 27.1
Manufacturer 3: 27.1, 24.8
Manufacturer 4: 25.5, 28.8
Does it appear that the true average foam density is not the same for all these manufacturers? Carry out an appropriate test of hypotheses by obtaining as much P-value information as possible, and summarize your analysis in an ANOVA table.
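One possible way to assemble the requested ANOVA table is sketched below, using the shortcut formulas of Remark 1. The layout and the helper `cell` are illustrative choices, not the author's solution.

```python
# Foam density data (g/L) from Example 2, one list per manufacturer.
densities = [
    [30.4, 29.2],  # manufacturer 1
    [27.7, 27.1],  # manufacturer 2
    [27.1, 24.8],  # manufacturer 3
    [25.5, 28.8],  # manufacturer 4
]

m = len(densities)
n = sum(len(d) for d in densities)
grand_total = sum(sum(d) for d in densities)

# Shortcut formulas: CF = X..^2 / n, and n_i * mean_i^2 = (group total)^2 / n_i.
cf = grand_total ** 2 / n
sst = sum(x ** 2 for d in densities for x in d) - cf
sstr = sum(sum(d) ** 2 / len(d) for d in densities) - cf
sse = sst - sstr

mstr = sstr / (m - 1)
mse = sse / (n - m)
f_stat = mstr / mse


def cell(value):
    """Format a number for the table, leaving missing entries blank."""
    return f"{value:>12.4f}" if value is not None else " " * 12


rows = [
    ("Treatments", m - 1, sstr, mstr, f_stat),
    ("Error", n - m, sse, mse, None),
    ("Total", n - 1, sst, None, None),
]
print(f"{'Source':<12}{'df':>4}{'Sum of Sq.':>12}{'Mean Sq.':>12}{'f':>12}")
for source, df, ss, ms, f in rows:
    print(f"{source:<12}{df:>4}{cell(ss)}{cell(ms)}{cell(f)}")
```

The computed $f$ is about 2.31 on (3, 4) degrees of freedom. Assuming the tabulated value $F_{0.10,3,4} \approx 4.19$, this gives a P-value above 0.10, so the data do not suggest a difference in true average foam density.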
Tukey’s Procedure (the 𝑇 Method)
Tukey’s Procedure (the 𝑇 Method)–Example
Example 3
An experiment was carried out to compare five different brands of automobile oil filters with respect to their ability to capture foreign material. Let $\mu_i$ denote the true average amount of material captured by brand $i$ filters ($i = 1, \dots, 5$) under controlled conditions. A sample of nine filters of each brand was used, resulting in the following sample mean amounts: $\bar{x}_{1.} = 14.5$, $\bar{x}_{2.} = 13.8$, $\bar{x}_{3.} = 13.3$, $\bar{x}_{4.} = 14.3$, and $\bar{x}_{5.} = 13.1$.
We have:
Source of Variation    df    Sum of Squares    Mean Square    f
The ANOVA Model
$$X_{ij} = \mu + \alpha_i + \varepsilon_{ij} \quad (i = 1, \dots, m;\ j = 1, \dots, n_i)$$
The claim that the $\mu_i$'s are identical is equivalent to the equality of the $\alpha_i$'s, and because $\sum_{i=1}^{m} \alpha_i = 0$ the null hypothesis becomes
$$H_0 : \alpha_1 = \alpha_2 = \cdots = \alpha_m = 0$$
$$E(MSTr) = \sigma^2 + \frac{1}{m - 1} \sum_{i=1}^{m} n_i \alpha_i^2 .$$
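The expected mean square explains why the F test is upper-tailed. The short argument below is a sketch added for clarity. It uses $E(MSE) = \sigma^2$, which follows from $SSE/\sigma^2 \sim \chi^2(n-m)$ in Theorem 3.

```latex
\begin{align*}
E(MSE) &= \frac{E(SSE)}{n-m} = \frac{(n-m)\,\sigma^2}{n-m} = \sigma^2, \\
H_0 \text{ true}  &\;\Longrightarrow\; \alpha_1 = \dots = \alpha_m = 0
                   \;\Longrightarrow\; E(MSTr) = \sigma^2 = E(MSE), \\
H_0 \text{ false} &\;\Longrightarrow\; \sum_{i=1}^{m} n_i \alpha_i^2 > 0
                   \;\Longrightarrow\; E(MSTr) > \sigma^2 = E(MSE).
\end{align*}
```

So values of $F = MSTr/MSE$ well above 1 are evidence against $H_0$, which matches the rejection region $f \geq F_{\alpha, m-1, n-m}$.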
Two-Factor ANOVA with $K_{ij} = 1$
The Fixed Effects Model
so that
$$\mu_{ij} = \alpha_i + \beta_j \qquad (3)$$
where
$\mu_{ij}$ is the true average response when factor $A$ is at level $i$ and factor $B$ is at level $j$.
$\epsilon_{ij}$ is the random amount by which the observed value differs from its expectation.
The model specified in Eqs. (2) and (3) is called an additive model because each mean response $\mu_{ij}$ is the sum of an effect due to factor $A$ at level $i$ ($\alpha_i$) and an effect due to factor $B$ at level $j$ ($\beta_j$).
$$X_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij} \qquad (4)$$
where $\sum_{i=1}^{I} \alpha_i = 0$ and $\sum_{j=1}^{J} \beta_j = 0$, and the $\epsilon_{ij}$'s are assumed independent and normally distributed, with mean 0 and common variance $\sigma^2$.
$H_{0A} : \alpha_1 = \alpha_2 = \alpha_3 = \cdots = \alpha_I = 0$
versus $H_{aA}$ : at least one $\alpha_i \neq 0$
$H_{0B} : \beta_1 = \beta_2 = \beta_3 = \cdots = \beta_J = 0$
versus $H_{aB}$ : at least one $\beta_j \neq 0$
Test Procedures
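$$SST = \sum_{i=1}^{I} \sum_{j=1}^{J} (X_{ij} - \bar{X}_{..})^2, \qquad df = IJ - 1$$
$$SSA = \sum_{i=1}^{I} \sum_{j=1}^{J} (\bar{X}_{i.} - \bar{X}_{..})^2 = J \sum_{i=1}^{I} (\bar{X}_{i.} - \bar{X}_{..})^2, \qquad df = I - 1$$
$$SSB = \sum_{i=1}^{I} \sum_{j=1}^{J} (\bar{X}_{.j} - \bar{X}_{..})^2 = I \sum_{j=1}^{J} (\bar{X}_{.j} - \bar{X}_{..})^2, \qquad df = J - 1$$
$$SSE = \sum_{i=1}^{I} \sum_{j=1}^{J} (X_{ij} - \bar{X}_{i.} - \bar{X}_{.j} + \bar{X}_{..})^2, \qquad df = (I - 1)(J - 1)$$
The fundamental identity is $SST = SSA + SSB + SSE$.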
Expected Mean Squares
$$E(MSE) = \sigma^2$$
$$E(MSA) = \sigma^2 + \frac{J}{I - 1} \sum_{i=1}^{I} \alpha_i^2$$
$$E(MSB) = \sigma^2 + \frac{I}{J - 1} \sum_{j=1}^{J} \beta_j^2$$
Multiple Comparisons
After rejecting either 𝐻0𝐴 or 𝐻0𝐵 , Tukey’s procedure can be used to
identify significant differences between the levels of the factor under
investigation.
1. For comparing levels of factor $A$, obtain $Q_{\alpha, I, (I-1)(J-1)}$.
For comparing levels of factor $B$, obtain $Q_{\alpha, J, (I-1)(J-1)}$.
2. Compute
Example 4
The article "Adiabatic Humidification of Air with Water in a Packed Tower" (Chem. Eng. Prog., 1952: 362–370) reports data on gas film heat transfer coefficient (Btu/hr ft$^2$ on $^\circ$F) as a function of gas rate (factor $A$) and liquid rate (factor $B$).
                            B
              1 (190)   2 (250)   3 (300)   4 (400)
    1 (200)     200       226       240       261
A   2 (400)     278       312       330       381
    3 (700)     369       416       462       517
    4 (1100)    500       575       645       733
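The slides do not show the computations for Example 4. As a minimal sketch, the formulas under Test Procedures can be applied to the table as follows. The variable names are illustrative, and the critical value quoted afterwards is assumed from a standard F table.

```python
# Heat transfer data from Example 4.
# Rows are levels of factor A (gas rate), columns are levels of factor B (liquid rate).
x = [
    [200, 226, 240, 261],
    [278, 312, 330, 381],
    [369, 416, 462, 517],
    [500, 575, 645, 733],
]
I, J = len(x), len(x[0])

grand_mean = sum(map(sum, x)) / (I * J)
row_means = [sum(row) / J for row in x]                               # X-bar_i.
col_means = [sum(x[i][j] for i in range(I)) / I for j in range(J)]    # X-bar_.j

sst = sum((x[i][j] - grand_mean) ** 2 for i in range(I) for j in range(J))
ssa = J * sum((rm - grand_mean) ** 2 for rm in row_means)
ssb = I * sum((cm - grand_mean) ** 2 for cm in col_means)
sse = sum(
    (x[i][j] - row_means[i] - col_means[j] + grand_mean) ** 2
    for i in range(I)
    for j in range(J)
)

msa = ssa / (I - 1)
msb = ssb / (J - 1)
mse = sse / ((I - 1) * (J - 1))
f_a, f_b = msa / mse, msb / mse

print(f"SSA = {ssa:.4f}, SSB = {ssb:.4f}, SSE = {sse:.4f}, SST = {sst:.4f}")
print(f"f_A = {f_a:.2f}, f_B = {f_b:.2f}")
```

Both statistics have (3, 9) degrees of freedom. Assuming the tabulated value $F_{0.05,3,9} \approx 3.86$, both $f_A$ and $f_B$ exceed it, so both $H_{0A}$ and $H_{0B}$ would be rejected at level 0.05.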
Fixed Effects Parameters and Hypotheses
Notation
$$\mu = \frac{1}{IJ} \sum_{i} \sum_{j} \mu_{ij}, \qquad \mu_{i.} = \frac{1}{J} \sum_{j} \mu_{ij}, \qquad \mu_{.j} = \frac{1}{I} \sum_{i} \mu_{ij}$$
Definition
from which
$$\mu_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij}$$
The model is additive if and only if all $\gamma_{ij}$'s $= 0$. The $\gamma_{ij}$'s are referred to as the interaction parameters. The $\alpha_i$'s are called the main effects for factor $A$, and the $\beta_j$'s are the main effects for factor $B$.
Although there are $I$ $\alpha_i$'s, $J$ $\beta_j$'s, and $IJ$ $\gamma_{ij}$'s in addition to $\mu$, the conditions $\sum_i \alpha_i = 0$, $\sum_j \beta_j = 0$, $\sum_j \gamma_{ij} = 0$ for any $i$, and $\sum_i \gamma_{ij} = 0$ for any $j$ imply that only $IJ$ of these new parameters are independently determined: $\mu$, $I - 1$ of the $\alpha_i$'s, $J - 1$ of the $\beta_j$'s, and $(I - 1)(J - 1)$ of the $\gamma_{ij}$'s.
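The sum-to-zero conditions determine the parameters in terms of the cell means. The derivation below is a sketch added for clarity, not a reproduction of the slide's definition. It averages $\mu_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij}$ over one index at a time.

```latex
\begin{align*}
\mu_{i.} &= \frac{1}{J}\sum_{j=1}^{J}\mu_{ij}
          = \mu + \alpha_i + \frac{1}{J}\sum_{j}\beta_j + \frac{1}{J}\sum_{j}\gamma_{ij}
          = \mu + \alpha_i
          &&\Longrightarrow\; \alpha_i = \mu_{i.} - \mu, \\
\mu_{.j} &= \frac{1}{I}\sum_{i=1}^{I}\mu_{ij}
          = \mu + \beta_j + \frac{1}{I}\sum_{i}\alpha_i + \frac{1}{I}\sum_{i}\gamma_{ij}
          = \mu + \beta_j
          &&\Longrightarrow\; \beta_j = \mu_{.j} - \mu, \\
\gamma_{ij} &= \mu_{ij} - (\mu + \alpha_i + \beta_j)
             = \mu_{ij} - \mu_{i.} - \mu_{.j} + \mu .
\end{align*}
```

The parameter count also checks out: $1 + (I-1) + (J-1) + (I-1)(J-1) = IJ$, the number of cell means.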
There are three sets of hypotheses to be considered:
Model and Test Procedures
We now use triple subscripts for both random variables and observed values, with $X_{ijk}$ and $x_{ijk}$ referring to the $k$th observation (replication) when factor $A$ is at level $i$ and factor $B$ is at level $j$.
$$X_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk}$$
$$i = 1, \dots, I, \quad j = 1, \dots, J, \quad k = 1, \dots, K$$
where the $\epsilon_{ijk}$'s are independent and normally distributed, each with mean 0 and variance $\sigma^2$.
Definition
The fundamental identity is
The expected mean squares suggest that each set of hypotheses should be tested using the appropriate ratio of mean squares with MSE in the denominator:
$$E(MSE) = \sigma^2$$
$$E(MSA) = \sigma^2 + \frac{JK}{I - 1} \sum_{i=1}^{I} \alpha_i^2, \qquad E(MSB) = \sigma^2 + \frac{IK}{J - 1} \sum_{j=1}^{J} \beta_j^2$$
$$E(MSAB) = \sigma^2 + \frac{K}{(I - 1)(J - 1)} \sum_{i=1}^{I} \sum_{j=1}^{J} \gamma_{ij}^2$$
Multiple Comparisons
When the no-interaction hypothesis $H_{0AB}$ is not rejected and at least one of the two main effect null hypotheses is rejected, Tukey's method can be used to identify significant differences in levels. For identifying differences among the $\alpha_i$'s when $H_{0A}$ is rejected,
1. Obtain $Q_{\alpha, I, IJ(K-1)}$, where the second subscript $I$ identifies the number of levels being compared and the third subscript refers to the number of degrees of freedom for error.
2. Compute $w = Q \sqrt{MSE / (JK)}$, where $JK$ is the number of observations averaged to obtain each of the $\bar{x}_{i..}$'s compared in Step 3.
3. Order the $\bar{x}_{i..}$'s from smallest to largest and, as before, underscore all pairs that differ by less than $w$. Pairs not underscored correspond to significantly different levels of factor $A$.
To identify different levels of factor $B$ when $H_{0B}$ is rejected, replace the second subscript in $Q$ by $J$, replace $JK$ by $IK$ in $w$, and replace $\bar{x}_{i..}$ by $\bar{x}_{.j.}$.
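Steps 2 and 3 can be sketched in a few lines. Everything numeric here is hypothetical: the level means, the MSE, and in particular `q_crit`, which is a placeholder for the tabulated studentized range value $Q_{\alpha, I, IJ(K-1)}$ and must be looked up for a real analysis. The function name `tukey_pairs` is also an illustrative choice.

```python
from itertools import combinations
from math import sqrt


def tukey_pairs(level_means, q_crit, mse, n_per_mean):
    """Return w and the pairs of levels whose means differ by at least w.

    level_means : dict mapping a level label to its sample mean
    q_crit      : studentized range critical value (from a table)
    mse         : mean square for error
    n_per_mean  : number of observations averaged in each mean (JK for factor A)
    """
    w = q_crit * sqrt(mse / n_per_mean)
    # Step 3: order the means from smallest to largest.
    ordered = sorted(level_means.items(), key=lambda kv: kv[1])
    # Pairs that would NOT be underscored, i.e. significantly different.
    different = [
        (a, b)
        for (a, mean_a), (b, mean_b) in combinations(ordered, 2)
        if mean_b - mean_a >= w
    ]
    return w, different


# Hypothetical inputs for illustration only (not from the chapter).
means_a = {"A1": 10.2, "A2": 12.9, "A3": 11.0}
w, pairs = tukey_pairs(means_a, q_crit=3.5, mse=2.0, n_per_mean=8)

print(f"w = {w:.3f}")
print("Significantly different pairs:", pairs)
```

For factor $B$ the same function applies with the $\bar{x}_{.j.}$'s as `level_means`, `n_per_mean` set to $IK$, and the critical value taken with second subscript $J$.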
Example 5
In an experiment to assess the effects of curing time (factor $A$) and type of mix (factor $B$) on the compressive strength of hardened cement cubes, three different curing times were used in combination with four different mixes, with three observations obtained for each of the 12 curing time–mix combinations. The resulting sums of squares were computed to be $SSA = 30{,}763.0$, $SSB = 34{,}185.6$, $SSE = 97{,}436.8$ and $SST = 205{,}966.6$.
a. Construct an ANOVA table.
b. Test at level .05 the null hypothesis $H_{0AB}$ : all $\gamma_{ij}$'s $= 0$ (no interaction of factors) against $H_{aAB}$ : at least one $\gamma_{ij} \neq 0$.
c. Test at level .05 the null hypothesis $H_{0A} : \alpha_1 = \alpha_2 = \alpha_3 = 0$ (factor $A$ main effects are absent) against $H_{aA}$ : at least one $\alpha_i \neq 0$.
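A possible route to parts a to c is sketched below. It assumes the standard decomposition $SST = SSA + SSB + SSAB + SSE$ to recover the interaction sum of squares, uses $IJ(K-1)$ error degrees of freedom, and takes the 0.05 critical values from a standard F table. These are assumptions of the sketch, not values given in the slides.

```python
# Design from Example 5: I curing times, J mixes, K observations per cell.
I, J, K = 3, 4, 3
ssa, ssb, sse, sst = 30763.0, 34185.6, 97436.8, 205966.6

# Assumption: SST = SSA + SSB + SSAB + SSE, so SSAB is obtained by subtraction.
ssab = sst - ssa - ssb - sse

df = {"A": I - 1, "B": J - 1, "AB": (I - 1) * (J - 1), "Error": I * J * (K - 1)}
ss = {"A": ssa, "B": ssb, "AB": ssab, "Error": sse}
ms = {source: ss[source] / df[source] for source in df}
f = {source: ms[source] / ms["Error"] for source in ("A", "B", "AB")}

# Assumed table values: F_{.05,2,24}, F_{.05,3,24}, F_{.05,6,24}.
f_crit = {"A": 3.40, "B": 3.01, "AB": 2.51}

print(f"{'Source':<8}{'df':>4}{'SS':>12}{'MS':>12}{'f':>8}")
for source in ("A", "B", "AB", "Error"):
    f_text = f"{f[source]:8.2f}" if source in f else ""
    print(f"{source:<8}{df[source]:>4}{ss[source]:>12.1f}{ms[source]:>12.1f}{f_text}")
print(f"{'Total':<8}{I * J * K - 1:>4}{sst:>12.1f}")

for source in ("AB", "A"):
    decision = "reject" if f[source] >= f_crit[source] else "do not reject"
    print(f"H0{source}: f = {f[source]:.2f} vs {f_crit[source]:.2f} -> {decision}")
```

Under these assumptions the interaction ratio is about 1.79, below 2.51, so $H_{0AB}$ is not rejected, while the factor $A$ ratio is about 3.79, above 3.40, so $H_{0A}$ is rejected at level .05.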