
CHAPTER VI

ANOVA

PHOK Ponna

Institute of Technology of Cambodia


Department of Applied Mathematics and Statistics (AMS)

2022–2023

Statistics ITC 1 / 49
Contents

1 Introduction

2 One Factor ANOVA

3 Multiple Comparisons in ANOVA

4 More on Single-Factor ANOVA

5 Two-Factor ANOVA with 𝐾 𝑖𝑗 = 1

6 Two-Factor ANOVA with 𝐾 𝑖𝑗 > 1

Introduction

In this chapter, we test several (more than two) populations for the
equality of their means. For example, we may compare the mean yields of
different hybrid varieties of tomato, or study the effect of four
different drugs on patients with a certain disease.
The observed differences between the various treatment groups are caused
by one or more factors. In the two examples above, the hybrid variety
and the drug are the factors that can vary the mean between treatment
groups.
The methodology used for this comparison is called the Analysis of
Variance (ANOVA).

Introduction

In the Analysis of Variance, the total variance in the data is resolved
into components, each contributed by a factor in the study. From this we
can estimate the fractional contribution of a factor to the total
variance. Under the null hypothesis that the population means compared
are equal, a test statistic involving the ratio of these variances is
known to follow an 𝐹 distribution whose two degrees-of-freedom
parameters are determined by the sample sizes.
We consider the following three categories of the Analysis of Variance:
1 One-factor ANOVA (one-way ANOVA)
2 Two-factor ANOVA with one observation per cell (𝐾 𝑖𝑗 = 1)
3 Two-factor ANOVA with multiple observations per cell (𝐾 𝑖𝑗 > 1)

One Factor ANOVA

In one-factor Analysis of Variance, we compare three or more treatment
data sets. Suppose there are 𝑚 data sets to compare. These 𝑚 data sets
are assumed to have been randomly drawn from 𝑚 normal distributions with
unknown means 𝜇1 , 𝜇2 , . . . , 𝜇𝑚 and unknown but common variance 𝜎2 .
We wish to test the equality of the means of the 𝑚 normal distributions
against all possible alternatives:

𝐻0 : 𝜇1 = 𝜇2 = . . . = 𝜇𝑚 vs 𝐻𝑎 : at least two of the 𝜇𝑖 ’s are different

One Factor ANOVA

We use the index 𝑖, with 𝑖 = 1, 2, . . . , 𝑚, to represent a data set 𝑋𝑖 of
size 𝑛 𝑖 randomly drawn from a normal distribution 𝑁(𝜇𝑖 , 𝜎2 ).
The individual elements of the data set 𝑋𝑖 are written with a double
index 𝑋𝑖𝑗 as follows:

𝑋1 = {𝑋11 , 𝑋12 , . . . , 𝑋1𝑛1 } first data set with 𝑛1 observations
𝑋2 = {𝑋21 , 𝑋22 , . . . , 𝑋2𝑛2 } second data set with 𝑛2 observations
...
𝑋𝑖 = {𝑋𝑖1 , 𝑋𝑖2 , . . . , 𝑋𝑖𝑛 𝑖 } data set 𝑖 with 𝑛 𝑖 observations
...
𝑋𝑚 = {𝑋𝑚1 , 𝑋𝑚2 , . . . , 𝑋𝑚𝑛𝑚 } last data set 𝑚 with 𝑛 𝑚 observations

Here, 𝑖 = 1, 2, 3, . . . , 𝑚 indexes the individual data sets and
𝑗 = 1, 2, 3, . . . , 𝑛 𝑖 indexes the elements of data set 𝑖.
Let 𝑛 = 𝑛1 + 𝑛2 + . . . + 𝑛 𝑚 be the total number of observations in all
data sets.
One Factor ANOVA

Definition 1
The mean of data set 𝑖 is computed as
𝑋¯ 𝑖. = (1/𝑛 𝑖 ) Σ_{𝑗=1}^{𝑛 𝑖 } 𝑋𝑖𝑗
where the dot in the symbol 𝑋¯ 𝑖. indicates that, for the given data set
𝑖, the summation over the element index 𝑗 has been carried out; hence 𝑗
is replaced by the dot symbol.
The mean of the whole data set (grand mean) is computed by summing all
the data sets and dividing by the total number of data points in all of
them together:
𝑋¯ .. = (1/𝑛) Σ_{𝑖=1}^{𝑚} Σ_{𝑗=1}^{𝑛 𝑖 } 𝑋𝑖𝑗
The sample variances are denoted by 𝑆 21 , 𝑆 22 , . . . , 𝑆 2𝑚 , respectively:
𝑆 2𝑖 = (1/(𝑛 𝑖 − 1)) Σ_{𝑗=1}^{𝑛 𝑖 } (𝑋𝑖𝑗 − 𝑋¯ 𝑖. )2 , 𝑖 = 1, 2, . . . , 𝑚
One Factor ANOVA

Definition 2
1 Total Sum of Squares (SST):
𝑆𝑆𝑇 = Σ_{𝑖=1}^{𝑚} Σ_{𝑗=1}^{𝑛 𝑖 } (𝑋𝑖𝑗 − 𝑋¯ .. )2
2 Treatment Sum of Squares (SSTr):
𝑆𝑆𝑇𝑟 = Σ_{𝑖=1}^{𝑚} Σ_{𝑗=1}^{𝑛 𝑖 } (𝑋¯ 𝑖. − 𝑋¯ .. )2
3 Error Sum of Squares (SSE):
𝑆𝑆𝐸 = Σ_{𝑖=1}^{𝑚} Σ_{𝑗=1}^{𝑛 𝑖 } (𝑋𝑖𝑗 − 𝑋¯ 𝑖. )2

Theorem 1
𝑆𝑆𝑇 = 𝑆𝑆𝑇𝑟 + 𝑆𝑆𝐸
One Factor ANOVA

Remark 1
Let 𝐶𝐹 = 𝑋..2 /𝑛 = 𝑛 𝑋¯ ..2 , where 𝑋.. is the grand total of all
observations. In practice, we use the following formulas:
1 𝑆𝑆𝑇 = Σ_{𝑖=1}^{𝑚} Σ_{𝑗=1}^{𝑛 𝑖 } 𝑋𝑖𝑗2 − 𝐶𝐹
2 𝑆𝑆𝑇𝑟 = Σ_{𝑖=1}^{𝑚} 𝑛 𝑖 𝑋¯ 𝑖.2 − 𝐶𝐹
3 𝑆𝑆𝐸 = Σ_{𝑖=1}^{𝑚} (𝑛 𝑖 − 1) 𝑆 2𝑖 = 𝑆𝑆𝑇 − 𝑆𝑆𝑇𝑟

Definition 3
Mean square for treatments (MSTr) is defined by
𝑀𝑆𝑇𝑟 = 𝑆𝑆𝑇𝑟/(𝑚 − 1)
Mean square for error (MSE) is defined by
𝑀𝑆𝐸 = 𝑆𝑆𝐸/(𝑛 − 𝑚)

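The shortcut formulas of Remark 1 can be checked numerically against the defining sums of Definition 2; a minimal sketch in Python (the toy data set is invented for illustration, not taken from the text):

```python
# Verify that the computational shortcuts (Remark 1) agree with the
# defining sums of squares (Definition 2) on a small toy data set.
groups = [[3.0, 5.0, 4.0], [6.0, 8.0], [2.0, 3.0, 4.0, 3.0]]

n = sum(len(g) for g in groups)                  # total number of observations
grand_mean = sum(sum(g) for g in groups) / n     # grand mean X-bar..

# Definitional forms
sst  = sum((x - grand_mean) ** 2 for g in groups for x in g)
sstr = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
sse  = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

# Shortcut forms using the correction factor CF = n * (X-bar..)^2
cf = n * grand_mean ** 2
sst_short  = sum(x ** 2 for g in groups for x in g) - cf
sstr_short = sum(len(g) * (sum(g) / len(g)) ** 2 for g in groups) - cf

assert abs(sst - sst_short) < 1e-9
assert abs(sstr - sstr_short) < 1e-9
assert abs(sst - (sstr + sse)) < 1e-9   # Theorem 1: SST = SSTr + SSE
print(sst, sstr, sse)
```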
One Factor ANOVA

Theorem 2
𝑖𝑖𝑑
For each 𝑖 = 1, 2, . . . , 𝑚, if 𝑋𝑖1 , 𝑋𝑖2 , . . . , 𝑋𝑖𝑛 𝑖 ∼ 𝑁(𝜇𝑖 , 𝜎2 ), then
𝑋¯ 𝑖. and 𝑆 2𝑖 are independent,
𝑋¯ 𝑖. ∼ 𝑁(𝜇𝑖 , 𝜎2 /𝑛 𝑖 ),
(𝑛 𝑖 − 1) 𝑆 2𝑖 /𝜎2 ∼ 𝜒2 (𝑛 𝑖 − 1).

ANOVA assumptions
𝑖𝑖𝑑
For all 𝑖 = 1, 2, . . . , 𝑚, 𝑋𝑖1 , 𝑋𝑖2 , . . . , 𝑋𝑖𝑛 𝑖 ∼ 𝑁(𝜇𝑖 , 𝜎2 ).

One Factor ANOVA

Theorem 3
Assume that the ANOVA assumptions are satisfied. Then
1 𝑆𝑆𝐸/𝜎2 ∼ 𝜒2 (𝑛 − 𝑚)
2 When 𝐻0 is true, 𝑆𝑆𝑇𝑟/𝜎2 ∼ 𝜒2 (𝑚 − 1)
3 𝑆𝑆𝑇𝑟 and 𝑆𝑆𝐸 are independent random variables
4 When 𝐻0 is true, 𝐸(𝑆𝑆𝑇𝑟) = (𝑚 − 1)𝜎2 .

Theorem 4
Assume that the ANOVA assumptions are satisfied. Then
1 𝑆𝑆𝐸 and 𝑆𝑆𝑇𝑟 are independent and 𝑆𝑆𝑇 = 𝑆𝑆𝐸 + 𝑆𝑆𝑇𝑟.
2 𝑆𝑆𝑇𝑟/𝜎2 ∼ 𝜒2 (𝑚 − 1) under the null hypothesis 𝐻0 .
3 𝐹 = 𝑀𝑆𝑇𝑟/𝑀𝑆𝐸 ∼ 𝐹(𝑚 − 1, 𝑛 − 𝑚) under 𝐻0 .
One Factor ANOVA

Theorem 5
Assume that the ANOVA assumptions are satisfied. The test statistic in
single-factor ANOVA is 𝐹 = 𝑀𝑆𝑇𝑟/𝑀𝑆𝐸. When 𝐻0 is true,
𝐹 ∼ 𝐹(𝑚 − 1, 𝑛 − 𝑚), and the rejection region at significance level 𝛼,
in terms of the test statistic value 𝑓 , is

𝑅𝑅 = { 𝑓 : 𝑓 ≥ 𝐹𝛼,𝑚−1,𝑛−𝑚 }.

One Factor ANOVA

The computations are often summarized in a tabular format, called an
ANOVA table. Tables produced by statistical software customarily
include a 𝑃-value column to the right of 𝑓 .

ANOVA Table

Source of Variation   𝑑𝑓      Sum of Squares   Mean Square            𝑓
Treatments            𝑚 − 1   SSTr             MSTr = SSTr/(𝑚 − 1)    𝑓 = MSTr/MSE
Error                 𝑛 − 𝑚   SSE              MSE = SSE/(𝑛 − 𝑚)
Total                 𝑛 − 1   SST

One Factor ANOVA

Example 1
The cholesterol levels of four groups of adults with distinct food
habits were compared in an experiment. The results are presented below:
Group 1 : 220 214 203 184 186 200 165
Group 2 : 262 193 225 200 164 266 179
Group 3 : 272 192 190 208 231 235 141
Group 4 : 190 255 247 278 230 269 289
Assuming that these four data sets follow normal distributions
𝑁(𝜇𝑖 , 𝜎2 ), test the null hypothesis 𝐻0 : 𝜇1 = 𝜇2 = 𝜇3 = 𝜇4 against
𝐻𝑎 : at least two of the 𝜇𝑖 ’s are different, at significance level 0.05.
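A sketch of the full test for Example 1 in Python, using only the formulas of this section; the critical value 𝐹0.05,3,24 ≈ 3.01 is an assumed table lookup:

```python
# One-factor ANOVA for Example 1: cholesterol levels of four groups.
groups = [
    [220, 214, 203, 184, 186, 200, 165],   # Group 1
    [262, 193, 225, 200, 164, 266, 179],   # Group 2
    [272, 192, 190, 208, 231, 235, 141],   # Group 3
    [190, 255, 247, 278, 230, 269, 289],   # Group 4
]

m = len(groups)                                 # number of treatments
n = sum(len(g) for g in groups)                 # total observations
grand_mean = sum(sum(g) for g in groups) / n

sstr = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
sse  = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

mstr = sstr / (m - 1)          # treatment df = m - 1 = 3
mse  = sse / (n - m)           # error df = n - m = 24
f = mstr / mse

# F_{0.05, 3, 24} ~= 3.01 (assumed table value)
print(f"f = {f:.3f}; reject H0 at alpha = 0.05: {f >= 3.01}")
```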

One Factor ANOVA

Example 2
It is common practice in many countries to destroy (shred)
refrigerators at the end of their useful lives. In this process,
material from insulating foam may be released into the atmosphere. The
article “Release of Fluorocarbons from Insulation Foam in Home
Appliances during Shredding” (J. of the Air and Waste Mgmt. Assoc.,
2007: 1452–1460) gave the following data on foam density (g/L) for each
of two refrigerators produced by four different manufacturers:
1. 30.4, 29.2   2. 27.7, 27.1
3. 27.1, 24.8   4. 25.5, 28.8
Does it appear that the true average foam density is not the same for
all these manufacturers? Carry out an appropriate test of hypotheses by
obtaining as much 𝑃-value information as possible, and summarize your
analysis in an ANOVA table.

Tukey’s Procedure (the 𝑇 Method)

Tukey’s procedure involves the use of another probability distribution
called the Studentized range distribution. The distribution
depends on two parameters: a numerator df 𝑚 and a denominator df 𝜈.

Definition 1
Let 𝑍1 , . . . , 𝑍 𝑚 be iid 𝑁(0, 1) and 𝑊 ∼ 𝜒2 (𝜈), with 𝑊 independent of
the 𝑍 𝑖 ’s. Then the distribution of

𝑄 = max 𝑖,𝑗 |𝑍 𝑖 − 𝑍 𝑗 | / √(𝑊/𝜈) = (max{𝑍1 , . . . , 𝑍 𝑚 } − min{𝑍1 , . . . , 𝑍 𝑚 }) / √(𝑊/𝜈)

is called the Studentized range distribution with 𝑚 numerator df and 𝜈
denominator df.

Tukey’s Procedure (the 𝑇 Method)

Let 𝑄 𝛼,𝑚,𝜈 denote the upper-tail 𝛼 critical value of the Studentized
range distribution with 𝑚 numerator df and 𝜈 denominator df
(analogous to 𝐹𝛼,𝜈1 ,𝜈2 ).

Proposition 2
Let
𝑤 𝑖𝑗 = 𝑄 𝛼,𝑚,𝑛−𝑚 · √( (MSE/2) (1/𝑛 𝑖 + 1/𝑛 𝑗 ) )
Then with (approximately) probability 1 − 𝛼,

𝑋¯ 𝑖. − 𝑋¯ 𝑗. − 𝑤 𝑖𝑗 ≤ 𝜇𝑖 − 𝜇 𝑗 ≤ 𝑋¯ 𝑖. − 𝑋¯ 𝑗. + 𝑤 𝑖𝑗 (1)

for every 𝑖 and 𝑗 (𝑖 = 1, 2, . . . , 𝑚 and 𝑗 = 1, 2, . . . , 𝑚) with 𝑖 ≠ 𝑗.

Tukey’s Procedure (the 𝑇 Method)

Suppose, for example, that 𝑚 = 5 and that

𝑥¯ 2. < 𝑥¯ 5. < 𝑥¯ 4. < 𝑥¯ 1. < 𝑥¯ 3.

1. Consider first the smallest mean 𝑥¯ 2. . If 𝑥¯ 5. − 𝑥¯ 2. ≥ 𝑤52 , proceed to
Step 2. However, if 𝑥¯ 5. − 𝑥¯ 2. < 𝑤52 , connect these first two means
with a line segment. Then, if possible, extend this line segment even
further to the right, to the largest 𝑥¯ 𝑖. that differs from 𝑥¯ 2. by less
than 𝑤 𝑖2 (so the line may connect two, three, or even more means).
2. Now move to 𝑥¯ 5. and again extend a line segment to the largest 𝑥¯ 𝑖.
to its right that differs from 𝑥¯ 5. by less than 𝑤 𝑖5 (it may not be
possible to draw this line, or alternatively it may underscore just
two means, or three, or even all four remaining means).

Tukey’s Procedure (the 𝑇 Method)

3. Continue by moving to 𝑥¯ 4. and repeating, and then finally move to
𝑥¯ 1. .

To summarize, starting from each mean in the ordered list, a line
segment is extended as far to the right as possible as long as the
difference between the means is smaller than 𝑤 𝑖𝑗 . It is easily verified
that a particular interval of the form Eq. (1) will contain 0 if and
only if the corresponding pair of sample means is underscored by the
same line segment.

Tukey’s Procedure (the 𝑇 Method)–Example

Example 3
An experiment was carried out to compare five different brands of
automobile oil filters with respect to their ability to capture foreign
material. Let 𝜇𝑖 denote the true average amount of material captured
by brand 𝑖 filters (𝑖 = 1, . . . , 5) under controlled conditions. A sample
of nine filters of each brand was used, resulting in the following
sample mean amounts: 𝑥¯ 1. = 14.5, 𝑥¯ 2. = 13.8, 𝑥¯ 3. = 13.3, 𝑥¯ 4. = 14.3,
and 𝑥¯ 5. = 13.1.

We have:
Source of Variation   𝑑𝑓   Sum of Squares   Mean Square   𝑓
Treatments            4     13.32            3.33          37.84
Error                 40    3.53             0.088
Total                 44    16.85

Tukey’s Procedure (the 𝑇 Method)–Example

Since 𝐹0.05,4,40 = 2.61, 𝐻0 is rejected at level 𝛼 = 0.05.
Then, by Tukey’s procedure: 𝑄0.05,5,40 = 4.04, so
𝑤 = 𝑤 𝑖𝑗 = 4.04 √(0.088/9) = 0.4 for all 𝑖, 𝑗 = 1, . . . , 5, 𝑖 ≠ 𝑗.
Arrange the 𝑥¯ 𝑖· ’s in increasing order: 𝑥¯ 5· < 𝑥¯ 3· < 𝑥¯ 2· < 𝑥¯ 4· < 𝑥¯ 1· .
Compare 𝑥¯ 𝑖· − 𝑥¯ 𝑗· with 𝑤 𝑖𝑗 using the three steps above and
underline.
Then we obtain:

Table: 𝑇 Method Example

𝑥¯ 5·   𝑥¯ 3·   𝑥¯ 2·   𝑥¯ 4·   𝑥¯ 1·
13.1   13.3   13.8   14.3   14.5

(13.1 and 13.3 are underscored by one line segment, and 14.3 and 14.5
by another; 13.8 is not underscored with any other mean.)

What if 𝑥¯ 2· = 14.15 instead of 13.8?
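The three-step underscoring for Example 3 can be automated; a sketch in Python (𝑄0.05,5,40 = 4.04 and MSE = 0.088 are taken from the example; the dict keys x1..x5 stand for the sample means 𝑥¯ 1· , . . . , 𝑥¯ 5· ):

```python
import math

# Tukey underscoring for Example 3 (equal sample sizes, n_i = 9).
means = {"x1": 14.5, "x2": 13.8, "x3": 13.3, "x4": 14.3, "x5": 13.1}
q, mse, n_i = 4.04, 0.088, 9            # Q_{.05,5,40} and MSE from the example
w = q * math.sqrt(mse / n_i)            # common width w_ij

ordered = sorted(means.items(), key=lambda kv: kv[1])

# From each mean, extend a line as far right as the gap stays below w;
# keep only line segments not contained in a longer one.
spans = []
for i in range(len(ordered)):
    j = i
    while j + 1 < len(ordered) and ordered[j + 1][1] - ordered[i][1] < w:
        j += 1
    if j > i:
        spans.append((i, j))
maximal = [s for s in spans
           if not any(o != s and o[0] <= s[0] and s[1] <= o[1] for o in spans)]
segments = [[ordered[k][0] for k in range(a, b + 1)] for a, b in maximal]
print(round(w, 3), segments)
```

Means appearing in the same segment are not significantly different; means in no common segment are.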

The ANOVA Model

The assumptions of single-factor ANOVA can be described succinctly
by means of the model equation
𝑋𝑖𝑗 = 𝜇𝑖 + 𝜀𝑖𝑗
where 𝜀𝑖𝑗 represents a random deviation from the population or true
treatment mean 𝜇𝑖 . The 𝜀𝑖𝑗 ’s are assumed to be independent, normally
distributed rv’s (implying that the 𝑋𝑖𝑗 ’s are also) with 𝐸(𝜀𝑖𝑗 ) = 0 [so
that 𝐸(𝑋𝑖𝑗 ) = 𝜇𝑖 ] and 𝑉(𝜀𝑖𝑗 ) = 𝜎2 [from which 𝑉(𝑋𝑖𝑗 ) = 𝜎2 for every 𝑖
and 𝑗]. An alternative description of single-factor ANOVA gives added
insight and suggests appropriate generalizations to models involving
more than one factor. Define a parameter 𝜇 by
𝜇 = (1/𝑚) Σ_{𝑖=1}^{𝑚} 𝜇𝑖
and the parameters 𝛼1 , . . . , 𝛼 𝑚 by
𝛼 𝑖 = 𝜇𝑖 − 𝜇 (𝑖 = 1, . . . , 𝑚)
The ANOVA Model

In terms of 𝜇 and the 𝛼 𝑖 ’s, the model becomes

𝑋𝑖𝑗 = 𝜇 + 𝛼 𝑖 + 𝜀𝑖𝑗 (𝑖 = 1, . . . , 𝑚; 𝑗 = 1, . . . , 𝑛 𝑖 )

The claim that the 𝜇𝑖 ’s are identical is equivalent to the equality of
the 𝛼 𝑖 ’s, and because Σ_{𝑖=1}^{𝑚} 𝛼 𝑖 = 0, the null hypothesis becomes

𝐻0 : 𝛼1 = 𝛼2 = · · · = 𝛼 𝑚 = 0

Recall that MSTr is an unbiased estimator of 𝜎2 when 𝐻0 is true but
otherwise tends to overestimate 𝜎2 . Here is a more precise result (with
the restriction Σ_{𝑖=1}^{𝑚} 𝑛 𝑖 𝛼 𝑖 = 0):

𝐸(MSTr) = 𝜎2 + (1/(𝑚 − 1)) Σ_{𝑖=1}^{𝑚} 𝑛 𝑖 𝛼2𝑖 .

Two-Factor ANOVA with 𝐾 𝑖𝑗 = 1

When factor 𝐴 consists of 𝐼 levels and factor 𝐵 consists of 𝐽 levels,
there are 𝐼𝐽 different combinations (pairs) of levels of the two
factors, each called a treatment. With 𝐾 𝑖𝑗 = the number of observations
on the treatment consisting of factor 𝐴 at level 𝑖 and factor 𝐵 at
level 𝑗, we restrict attention in this section to the case 𝐾 𝑖𝑗 = 1, so
that the data consist of 𝐼𝐽 observations. Our focus is on the fixed
effects model, in which the only levels of interest for the two factors
are those actually represented in the experiment.
Let

𝑋𝑖𝑗 = the rv denoting the measurement when factor 𝐴 is held at level 𝑖
and factor 𝐵 is held at level 𝑗
𝑥 𝑖𝑗 = the observed value of 𝑋𝑖𝑗

Two-Factor ANOVA with 𝐾 𝑖𝑗 = 1

𝑋¯ 𝑖. = the average of the measurements obtained when factor 𝐴 is held
at level 𝑖 = (1/𝐽) Σ_{𝑗=1}^{𝐽} 𝑋𝑖𝑗
𝑋¯ .𝑗 = the average of the measurements obtained when factor 𝐵 is held
at level 𝑗 = (1/𝐼) Σ_{𝑖=1}^{𝐼} 𝑋𝑖𝑗
𝑋¯ .. = the grand mean = (1/(𝐼𝐽)) Σ_{𝑖=1}^{𝐼} Σ_{𝑗=1}^{𝐽} 𝑋𝑖𝑗

The Fixed Effects Model

Assume the existence of 𝐼 parameters 𝛼1 , 𝛼2 , . . . , 𝛼 𝐼 and 𝐽 parameters
𝛽1 , 𝛽2 , . . . , 𝛽 𝐽 such that

𝑋𝑖𝑗 = 𝛼 𝑖 + 𝛽 𝑗 + 𝜖 𝑖𝑗 (𝑖 = 1, 2, . . . , 𝐼; 𝑗 = 1, 2, . . . , 𝐽) (2)

so that
𝜇𝑖𝑗 = 𝛼 𝑖 + 𝛽 𝑗 (3)
where
𝜇𝑖𝑗 is the true average response when factor 𝐴 is at level 𝑖 and
factor 𝐵 is at level 𝑗, and
𝜖 𝑖𝑗 is the random amount by which the observed value differs from
its expectation.

The model specified in Eqs. (2) and (3) is called an additive model
because each mean response 𝜇𝑖𝑗 is the sum of an effect due to factor 𝐴
at level 𝑖 (𝛼 𝑖 ) and an effect due to factor 𝐵 at level 𝑗 (𝛽 𝑗 ).
The Fixed Effects Model

The parameters in Eqs. (2) and (3) are not uniquely determined: adding
a constant to every 𝛼 𝑖 and subtracting it from every 𝛽 𝑗 leaves each
𝜇𝑖𝑗 unchanged. This non-uniqueness is eliminated by use of the
following model:

𝑋𝑖𝑗 = 𝜇 + 𝛼 𝑖 + 𝛽 𝑗 + 𝜖 𝑖𝑗 (4)

where Σ_{𝑖=1}^{𝐼} 𝛼 𝑖 = 0 and Σ_{𝑗=1}^{𝐽} 𝛽 𝑗 = 0, and the 𝜖 𝑖𝑗 ’s are
assumed independent, normally distributed, with mean 0 and common
variance 𝜎2 .

The Fixed Effects Model

There are two different null hypotheses of interest in a two-factor
experiment with 𝐾 𝑖𝑗 = 1. The first, denoted by 𝐻0𝐴 , states that the
different levels of factor 𝐴 have no effect on the true average
response. The second, denoted by 𝐻0𝐵 , asserts that there is no factor 𝐵
effect.

𝐻0𝐴 : 𝛼1 = 𝛼2 = · · · = 𝛼 𝐼 = 0 versus 𝐻𝑎𝐴 : at least one 𝛼 𝑖 ≠ 0
𝐻0𝐵 : 𝛽1 = 𝛽2 = · · · = 𝛽 𝐽 = 0 versus 𝐻𝑎𝐵 : at least one 𝛽 𝑗 ≠ 0

Test Procedures

There are four sums of squares, each with an associated number of 𝑑𝑓 :

Definition

𝑆𝑆𝑇 = Σ_{𝑖=1}^{𝐼} Σ_{𝑗=1}^{𝐽} (𝑋𝑖𝑗 − 𝑋¯ .. )2   𝑑𝑓 = 𝐼𝐽 − 1
𝑆𝑆𝐴 = Σ_{𝑖=1}^{𝐼} Σ_{𝑗=1}^{𝐽} (𝑋¯ 𝑖. − 𝑋¯ .. )2 = 𝐽 Σ_{𝑖=1}^{𝐼} (𝑋¯ 𝑖. − 𝑋¯ .. )2   𝑑𝑓 = 𝐼 − 1
𝑆𝑆𝐵 = Σ_{𝑖=1}^{𝐼} Σ_{𝑗=1}^{𝐽} (𝑋¯ .𝑗 − 𝑋¯ .. )2 = 𝐼 Σ_{𝑗=1}^{𝐽} (𝑋¯ .𝑗 − 𝑋¯ .. )2   𝑑𝑓 = 𝐽 − 1
𝑆𝑆𝐸 = Σ_{𝑖=1}^{𝐼} Σ_{𝑗=1}^{𝐽} (𝑋𝑖𝑗 − 𝑋¯ 𝑖. − 𝑋¯ .𝑗 + 𝑋¯ .. )2   𝑑𝑓 = (𝐼 − 1)(𝐽 − 1)

The fundamental identity is 𝑆𝑆𝑇 = 𝑆𝑆𝐴 + 𝑆𝑆𝐵 + 𝑆𝑆𝐸.
Test Procedures

Statistical theory now says that if we form 𝐹 ratios as in
single-factor ANOVA, then when 𝐻0𝐴 (respectively 𝐻0𝐵 ) is true, the
corresponding 𝐹 ratio has an 𝐹 distribution with numerator
𝑑𝑓 = 𝐼 − 1 (respectively 𝐽 − 1) and denominator 𝑑𝑓 = (𝐼 − 1)(𝐽 − 1).

Hypothesis             Test Statistic Value   Rejection Region
𝐻0𝐴 versus 𝐻𝑎𝐴         𝑓𝐴 = 𝑀𝑆𝐴/𝑀𝑆𝐸          𝑓𝐴 ≥ 𝐹𝛼,𝐼−1,(𝐼−1)(𝐽−1)
𝐻0𝐵 versus 𝐻𝑎𝐵         𝑓𝐵 = 𝑀𝑆𝐵/𝑀𝑆𝐸          𝑓𝐵 ≥ 𝐹𝛼,𝐽−1,(𝐼−1)(𝐽−1)
Expected Mean Squares

The plausibility of using the 𝐹 tests just described is demonstrated by
computing the expected mean squares. For the additive model,

𝐸(𝑀𝑆𝐸) = 𝜎2
𝐸(𝑀𝑆𝐴) = 𝜎2 + (𝐽/(𝐼 − 1)) Σ_{𝑖=1}^{𝐼} 𝛼2𝑖
𝐸(𝑀𝑆𝐵) = 𝜎2 + (𝐼/(𝐽 − 1)) Σ_{𝑗=1}^{𝐽} 𝛽2𝑗

If 𝐻0𝐴 is true, MSA is an unbiased estimator of 𝜎2 , so 𝑓𝐴 is a ratio of
two unbiased estimators of 𝜎2 . When 𝐻0𝐴 is false, MSA tends to
overestimate 𝜎2 . Thus 𝐻0𝐴 should be rejected when the ratio 𝑓𝐴 is too
large. Similar comments apply to MSB and 𝐻0𝐵 .

Multiple Comparisons
After rejecting either 𝐻0𝐴 or 𝐻0𝐵 , Tukey’s procedure can be used to
identify significant differences between the levels of the factor under
investigation.
1. For comparing levels of factor 𝐴, obtain 𝑄 𝛼,𝐼,(𝐼−1)(𝐽−1) .
For comparing levels of factor 𝐵, obtain 𝑄 𝛼,𝐽,(𝐼−1)(𝐽−1) .
2. Compute

𝑤 = 𝑄 · (estimated s.d. of the sample means being compared)
  = 𝑄 𝛼,𝐼,(𝐼−1)(𝐽−1) · √(𝑀𝑆𝐸/𝐽) for factor 𝐴 comparisons
  = 𝑄 𝛼,𝐽,(𝐼−1)(𝐽−1) · √(𝑀𝑆𝐸/𝐼) for factor 𝐵 comparisons

(because, e.g., the standard deviation of 𝑋¯ 𝑖. is 𝜎/√𝐽).
3. Arrange the sample means in increasing order, underscore those
pairs differing by less than 𝑤, and identify pairs not underscored
by the same line as corresponding to significantly different levels of
the given factor.
Two-Factor ANOVA with 𝐾 𝑖𝑗 = 1

Example 4
The article “Adiabatic Humidification of Air with Water in a Packed
Tower” (Chem. Eng. Prog., 1952: 362–370) reports data on the gas film
heat transfer coefficient (Btu/hr·ft²·°F) as a function of gas rate
(factor 𝐴) and liquid rate (factor 𝐵).

                        B
            1(190)  2(250)  3(300)  4(400)
    1(200)    200     226     240     261
A   2(400)    278     312     330     381
    3(700)    369     416     462     517
    4(1100)   500     575     645     733

Two-Factor ANOVA with 𝐾 𝑖𝑗 = 1

a. After constructing an ANOVA table, test at level .01 both the


hypothesis of no gas-rate effect against the appropriate alternative
and the hypothesis of no liquid-rate effect against the appropriate
alternative.
b. Use Tukey’s procedure to investigate differences in expected heat
transfer coefficient due to different gas rates.
c. Repeat part (b) for liquid rates.
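Part (a) can be sketched in Python by computing the sums of squares directly from the definitions; the critical value 𝐹.01,3,9 ≈ 6.99 used for both tests is an assumed table lookup:

```python
# Two-factor ANOVA (K_ij = 1) for Example 4.
# Rows: gas rate (factor A); columns: liquid rate (factor B).
x = [
    [200, 226, 240, 261],
    [278, 312, 330, 381],
    [369, 416, 462, 517],
    [500, 575, 645, 733],
]
I, J = len(x), len(x[0])

row_mean = [sum(r) / J for r in x]                                 # X-bar_i.
col_mean = [sum(x[i][j] for i in range(I)) / I for j in range(J)]  # X-bar_.j
grand = sum(sum(r) for r in x) / (I * J)                           # X-bar_..

ssa = J * sum((rm - grand) ** 2 for rm in row_mean)
ssb = I * sum((cm - grand) ** 2 for cm in col_mean)
sse = sum((x[i][j] - row_mean[i] - col_mean[j] + grand) ** 2
          for i in range(I) for j in range(J))
sst = sum((x[i][j] - grand) ** 2 for i in range(I) for j in range(J))
assert abs(sst - (ssa + ssb + sse)) < 1e-6     # fundamental identity

msa, msb = ssa / (I - 1), ssb / (J - 1)
mse = sse / ((I - 1) * (J - 1))
f_a, f_b = msa / mse, msb / mse
# F_{.01,3,9} ~= 6.99 (assumed table value)
print(f"f_A = {f_a:.1f}, f_B = {f_b:.1f}")
```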

Fixed Effects Parameters and Hypotheses

Notation
𝜇 = (1/(𝐼𝐽)) Σ𝑖 Σ𝑗 𝜇𝑖𝑗 ,  𝜇𝑖. = (1/𝐽) Σ𝑗 𝜇𝑖𝑗 ,  𝜇.𝑗 = (1/𝐼) Σ𝑖 𝜇𝑖𝑗

Definition

𝛼 𝑖 = 𝜇𝑖. − 𝜇 = the effect of factor 𝐴 at level 𝑖
𝛽 𝑗 = 𝜇.𝑗 − 𝜇 = the effect of factor 𝐵 at level 𝑗
𝛾𝑖𝑗 = 𝜇𝑖𝑗 − (𝜇 + 𝛼 𝑖 + 𝛽 𝑗 )
= the interaction between factor 𝐴 at level 𝑖 and factor 𝐵 at level 𝑗

from which
𝜇𝑖𝑗 = 𝜇 + 𝛼 𝑖 + 𝛽 𝑗 + 𝛾𝑖𝑗
Fixed Effects Parameters and Hypotheses

The model is additive if and only if all 𝛾𝑖𝑗 ’s = 0. The 𝛾𝑖𝑗 ’s are
referred to as the interaction parameters. The 𝛼 𝑖 ’s are called the
main effects for factor 𝐴, and the 𝛽 𝑗 ’s are the main effects for
factor 𝐵.
Although there are 𝐼 𝛼 𝑖 ’s, 𝐽 𝛽 𝑗 ’s, and 𝐼𝐽 𝛾𝑖𝑗 ’s in addition to 𝜇,
the conditions Σ𝑖 𝛼 𝑖 = 0, Σ𝑗 𝛽 𝑗 = 0, Σ𝑗 𝛾𝑖𝑗 = 0 for any 𝑖, and
Σ𝑖 𝛾𝑖𝑗 = 0 for any 𝑗 imply that only 𝐼𝐽 of these new parameters are
independently determined: 𝜇, 𝐼 − 1 of the 𝛼 𝑖 ’s, 𝐽 − 1 of the 𝛽 𝑗 ’s, and
(𝐼 − 1)(𝐽 − 1) of the 𝛾𝑖𝑗 ’s.
There are three sets of hypotheses to be considered:

𝐻0𝐴𝐵 : 𝛾𝑖𝑗 = 0 for all 𝑖, 𝑗 versus 𝐻𝑎𝐴𝐵 : at least one 𝛾𝑖𝑗 ≠ 0
𝐻0𝐴 : 𝛼1 = · · · = 𝛼 𝐼 = 0 versus 𝐻𝑎𝐴 : at least one 𝛼 𝑖 ≠ 0
𝐻0𝐵 : 𝛽1 = · · · = 𝛽 𝐽 = 0 versus 𝐻𝑎𝐵 : at least one 𝛽 𝑗 ≠ 0
Model and Test Procedures

We now use triple subscripts for both random variables and observed
values, with 𝑋𝑖𝑗 𝑘 and 𝑥 𝑖𝑗 𝑘 referring to the 𝑘th observation
(replication) when factor 𝐴 is at level 𝑖 and factor 𝐵 is at level 𝑗.

The fixed effects model is

𝑋𝑖𝑗 𝑘 = 𝜇 + 𝛼 𝑖 + 𝛽 𝑗 + 𝛾𝑖𝑗 + 𝜖 𝑖𝑗 𝑘 (𝑖 = 1, . . . , 𝐼; 𝑗 = 1, . . . , 𝐽; 𝑘 = 1, . . . , 𝐾)

where the 𝜖 𝑖𝑗 𝑘 ’s are independent and normally distributed, each with
mean 0 and variance 𝜎2 .

Model and Test Procedures

Test procedures are based on the following sums of squares:

Definition

𝑆𝑆𝑇 = Σ𝑖 Σ𝑗 Σ𝑘 (𝑋𝑖𝑗 𝑘 − 𝑋¯ ... )2   𝑑𝑓 = 𝐼𝐽𝐾 − 1
𝑆𝑆𝐸 = Σ𝑖 Σ𝑗 Σ𝑘 (𝑋𝑖𝑗 𝑘 − 𝑋¯ 𝑖𝑗. )2   𝑑𝑓 = 𝐼𝐽(𝐾 − 1)
𝑆𝑆𝐴 = Σ𝑖 Σ𝑗 Σ𝑘 (𝑋¯ 𝑖.. − 𝑋¯ ... )2   𝑑𝑓 = 𝐼 − 1
𝑆𝑆𝐵 = Σ𝑖 Σ𝑗 Σ𝑘 (𝑋¯ .𝑗. − 𝑋¯ ... )2   𝑑𝑓 = 𝐽 − 1
𝑆𝑆𝐴𝐵 = Σ𝑖 Σ𝑗 Σ𝑘 (𝑋¯ 𝑖𝑗. − 𝑋¯ 𝑖.. − 𝑋¯ .𝑗. + 𝑋¯ ... )2   𝑑𝑓 = (𝐼 − 1)(𝐽 − 1)
Model and Test Procedures

Definition
The fundamental identity is

𝑆𝑆𝑇 = 𝑆𝑆𝐴 + 𝑆𝑆𝐵 + 𝑆𝑆𝐴𝐵 + 𝑆𝑆𝐸

𝑆𝑆𝐴𝐵 is referred to as the interaction sum of squares.

Model and Test Procedures

The expected mean squares suggest that each set of hypotheses should
be tested using the appropriate ratio of mean squares with MSE in the
denominator:

𝐸(𝑀𝑆𝐸) = 𝜎2
𝐸(𝑀𝑆𝐴) = 𝜎2 + (𝐽𝐾/(𝐼 − 1)) Σ_{𝑖=1}^{𝐼} 𝛼2𝑖 ,  𝐸(𝑀𝑆𝐵) = 𝜎2 + (𝐼𝐾/(𝐽 − 1)) Σ_{𝑗=1}^{𝐽} 𝛽2𝑗
𝐸(𝑀𝑆𝐴𝐵) = 𝜎2 + (𝐾/((𝐼 − 1)(𝐽 − 1))) Σ_{𝑖=1}^{𝐼} Σ_{𝑗=1}^{𝐽} 𝛾𝑖𝑗2
Model and Test Procedures

Each of the three mean square ratios can be shown to have an 𝐹
distribution when the associated 𝐻0 is true, which yields the following
level 𝛼 test procedures.

Hypothesis               Test Statistic Value    Rejection Region
𝐻0𝐴 versus 𝐻𝑎𝐴           𝑓𝐴 = 𝑀𝑆𝐴/𝑀𝑆𝐸           𝑓𝐴 ≥ 𝐹𝛼,𝐼−1,𝐼𝐽(𝐾−1)
𝐻0𝐵 versus 𝐻𝑎𝐵           𝑓𝐵 = 𝑀𝑆𝐵/𝑀𝑆𝐸           𝑓𝐵 ≥ 𝐹𝛼,𝐽−1,𝐼𝐽(𝐾−1)
𝐻0𝐴𝐵 versus 𝐻𝑎𝐴𝐵         𝑓𝐴𝐵 = 𝑀𝑆𝐴𝐵/𝑀𝑆𝐸        𝑓𝐴𝐵 ≥ 𝐹𝛼,(𝐼−1)(𝐽−1),𝐼𝐽(𝐾−1)

Multiple Comparisons
When the no-interaction hypothesis 𝐻0𝐴𝐵 is not rejected and at least
one of the two main effect null hypotheses is rejected, Tukey’s method
can be used to identify significant differences in levels. For
identifying differences among the 𝛼 𝑖 ’s when 𝐻0𝐴 is rejected,
1. Obtain 𝑄 𝛼,𝐼,𝐼𝐽(𝐾−1) , where the second subscript 𝐼 identifies the
number of levels being compared and the third subscript refers to
the number of degrees of freedom for error.
2. Compute 𝑤 = 𝑄 · √(𝑀𝑆𝐸/(𝐽𝐾)), where 𝐽𝐾 is the number of
observations averaged to obtain each of the 𝑥¯ 𝑖.. ’s compared in Step 3.
3. Order the 𝑥¯ 𝑖.. ’s from smallest to largest and, as before, underscore
all pairs that differ by less than 𝑤. Pairs not underscored
correspond to significantly different levels of factor 𝐴.
To identify different levels of factor 𝐵 when 𝐻0𝐵 is rejected, replace
the second subscript in 𝑄 by 𝐽, replace 𝐽𝐾 by 𝐼𝐾 in 𝑤, and replace
𝑥¯ 𝑖.. by 𝑥¯ .𝑗. .
Multiple Comparisons

Example 5
In an experiment to assess the effects of curing time (factor 𝐴) and
type of mix (factor 𝐵) on the compressive strength of hardened cement
cubes, three different curing times were used in combination with four
different mixes, with three observations obtained for each of the 12
curing time–mix combinations. The resulting sums of squares were
computed to be 𝑆𝑆𝐴 = 30,763.0, 𝑆𝑆𝐵 = 34,185.6, 𝑆𝑆𝐸 = 97,436.8, and
𝑆𝑆𝑇 = 205,966.6.
a. Construct an ANOVA table.
b. Test at level .05 the null hypothesis 𝐻0𝐴𝐵 : all 𝛾𝑖𝑗 ’s = 0 (no
interaction of factors) against 𝐻𝑎𝐴𝐵 : at least one 𝛾𝑖𝑗 ≠ 0.
c. Test at level .05 the null hypothesis 𝐻0𝐴 : 𝛼1 = 𝛼2 = 𝛼3 = 0 (factor
𝐴 main effects are absent) against 𝐻𝑎𝐴 : at least one 𝛼 𝑖 ≠ 0.

Multiple Comparisons

d. Test 𝐻0𝐵 : 𝛽1 = 𝛽2 = 𝛽3 = 𝛽4 = 0 versus 𝐻𝑎𝐵 : at least one 𝛽 𝑗 ≠ 0
using a level .05 test.
e. The values of the 𝑥¯ 𝑖.. ’s were 𝑥¯ 1.. = 4010.88, 𝑥¯ 2.. = 4029.10, and
𝑥¯ 3.. = 3960.02. Use Tukey’s procedure to investigate significant
differences among the three curing times.
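Example 5 reduces to arithmetic on the given sums of squares; a sketch in Python (the 𝐹 and 𝑄 critical values marked below are assumed table lookups, not given in the text):

```python
import math

# Example 5: I = 3 curing times, J = 4 mixes, K = 3 replicates per cell.
I, J, K = 3, 4, 3
ssa, ssb, sse, sst = 30763.0, 34185.6, 97436.8, 205966.6
ssab = sst - ssa - ssb - sse           # interaction SS via the fundamental identity

df_e = I * J * (K - 1)                 # 24 error df
msa, msb = ssa / (I - 1), ssb / (J - 1)
msab, mse = ssab / ((I - 1) * (J - 1)), sse / df_e
f_a, f_b, f_ab = msa / mse, msb / mse, msab / mse

# Assumed table values: F_{.05,2,24} ~= 3.40, F_{.05,3,24} ~= 3.01, F_{.05,6,24} ~= 2.51
print(f"f_AB = {f_ab:.2f} (reject H0AB: {f_ab >= 2.51})")
print(f"f_A  = {f_a:.2f} (reject H0A:  {f_a >= 3.40})")
print(f"f_B  = {f_b:.2f} (reject H0B:  {f_b >= 3.01})")

# Part (e): Tukey for curing times; Q_{.05,3,24} ~= 3.53 (assumed table value).
w = 3.53 * math.sqrt(mse / (J * K))    # each x-bar_i.. averages JK = 12 observations
x_bars = {1: 4010.88, 2: 4029.10, 3: 3960.02}
diffs = {(i, j): abs(x_bars[i] - x_bars[j]) for i in x_bars for j in x_bars if i < j}
print(round(w, 1), {pair: d > w for pair, d in diffs.items()})
```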
