ANOVA
Assumptions
For the F-test to be applicable in ANOVA, the following assumptions are made
ONE-WAY ANOVA
Let there be $n$ observations classified into $k$ distinct classes of respective sizes $n_1, n_2, \dots, n_k$, where $\sum_{i=1}^{k} n_i = n$. Let the observations be denoted by $x_{ij}$ ($i = 1, 2, \dots, k$; $j = 1, 2, \dots, n_i$). The total variation found in the $x_{ij}$ has two components: the variation between classes and the variation within classes.
Assumptions
(i) The observations $x_{ij}$ are independent and normally distributed with common variance $\sigma^2$
(ii) The effects of the different classes are additive in nature
Hypothesis
𝐻0 : 𝜇1 = 𝜇2 = ⋯ = 𝜇𝑘
Notations
$x_{i*} = \sum_j x_{ij}$ (total of the $i$-th class)
$x_{**} = \sum_i \sum_j x_{ij}$ (grand total)
Test statistic
$F = \dfrac{\text{Variation between samples}}{\text{Variation within samples}}$
ANOVA Table
$CF$ (Correction Factor) $= \dfrac{x_{**}^2}{N}$, where $N$ is the total number of observations and $r$ is the number of varieties/independent elements.
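The entries of the one-way table are not spelled out above. As a sketch, the quantities computed in the worked problems of these notes can be collected as follows. The symbol $c$ for the number of classes is taken from those solutions (the setup above calls it $k$), and the arrangement into one display is an assumption, not the original table layout.

```latex
\begin{aligned}
CF  &= \frac{x_{**}^2}{N} \\
BSS &= \sum_i \frac{x_{i*}^2}{n_i} - CF, & \text{d.o.f.} &= c - 1 \\
TSS &= \sum_i \sum_j x_{ij}^2 - CF,      & \text{d.o.f.} &= N - 1 \\
WSS &= TSS - BSS,                        & \text{d.o.f.} &= N - c \\
F_{calc} &= \frac{BSS/(c-1)}{WSS/(N-c)}
\end{aligned}
```

$F_{calc}$ is compared with the tabulated value $F_{c-1,\,N-c}$ at the chosen level of significance.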
TWO-WAY ANOVA
A two-way ANOVA has two independent variables, i.e., two factors of variation – along
rows and along columns.
Let there be $r$ rows and $c$ columns, and let the observations be denoted by $x_{ij}$, $i = 1, 2, \dots, r$; $j = 1, 2, \dots, c$.
Assumptions
(i) The observations are independent, and the errors are normally distributed with mean 0 and variance $\sigma^2$
(ii) The effects of the causes are additive in nature
Hypothesis
Notations
$x_{i*} = \sum_j x_{ij}$ (total of row $i$)
$x_{*j} = \sum_i x_{ij}$ (total of column $j$)
$x_{**} = \sum_i \sum_j x_{ij}$ (grand total)
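ANOVA Table
(RSS, CSS, ESS and TSS are the row, column, error and total sums of squares.)
Source   Sum of squares                  d.o.f.          Mean square           F ratio                                      Tabulated F
RSS      $\sum_i x_{i*}^2/c - CF$        $r-1$           $RSS/(r-1)$           $F_R = \frac{RSS/(r-1)}{ESS/((c-1)(r-1))}$   $F_{r-1,\,(c-1)(r-1)}$
CSS      $\sum_j x_{*j}^2/r - CF$        $c-1$           $CSS/(c-1)$           $F_C = \frac{CSS/(c-1)}{ESS/((c-1)(r-1))}$   $F_{c-1,\,(c-1)(r-1)}$
ESS      $TSS - RSS - CSS$               $(c-1)(r-1)$    $ESS/((c-1)(r-1))$
TSS      $\sum_i \sum_j x_{ij}^2 - CF$   $N-1$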
$CF$ (Correction Factor) $= \dfrac{x_{**}^2}{N}$, where $N$ is the total number of observations, $r$ is the number of rows and $c$ is the number of columns.
Problems
School    A    B    C    D
Marks     8   12   18   13
         10   11   12    9
         12    9   16   12
          8   14    6   16
          7    4    8   15
Solution:
𝐻0 : There is no significant difference in the performance of students of the four
schools, i.e., 𝜇1 = 𝜇2 = 𝜇3 = 𝜇4
𝐻1 : There is a significant difference in the performance of students of the four
schools.
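$BSS = \sum_i \dfrac{x_{i*}^2}{n_i} - \dfrac{x_{**}^2}{N} = \left(\dfrac{45^2}{5} + \dfrac{50^2}{5} + \dfrac{60^2}{5} + \dfrac{65^2}{5}\right) - \dfrac{220^2}{20} = 50$ with d.o.f. $= c - 1 = 4 - 1 = 3$
$TSS = \sum_i \sum_j x_{ij}^2 - \dfrac{x_{**}^2}{N} = 8^2 + 10^2 + \dots + 8^2 + 15^2 - \dfrac{220^2}{20} = 258$ with d.o.f. $= N - 1 = 20 - 1 = 19$
$WSS = TSS - BSS = 258 - 50 = 208$ with d.o.f. $= N - c = 20 - 4 = 16$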
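Source   Sum of squares   d.o.f.   Mean square      F ratio                        Tabulated F (5%)            Conclusion
BSS      50               3        50/3 = 16.67     $F_{calc}$ = 16.67/13 = 1.28   $F_{tab} = F_{3,16}$ = 3.24   $F_{calc} < F_{tab}$, accept $H_0$
WSS      208              16       208/16 = 13
TSS      258              19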
Since 𝐹𝑐𝑎𝑙𝑐 < 𝐹𝑡𝑎𝑏 at 5% level of significance, we accept 𝐻0 , i.e., there is no significant
difference in the performance of students in 4 schools
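The arithmetic of this solution can be checked with a few lines of plain Python. This is a minimal sketch, not part of the original notes: the function name `one_way_anova` is hypothetical, and the data are the marks from the table above, one list per school.

```python
# Marks from the problem above, one list per school (A, B, C, D).
groups = {
    "A": [8, 10, 12, 8, 7],
    "B": [12, 11, 9, 14, 4],
    "C": [18, 12, 16, 6, 8],
    "D": [13, 9, 12, 16, 15],
}


def one_way_anova(samples):
    """Return (BSS, WSS, TSS, F) using the correction-factor formulas."""
    values = [x for s in samples for x in s]
    n_total = len(values)          # N
    n_classes = len(samples)       # c
    cf = sum(values) ** 2 / n_total
    bss = sum(sum(s) ** 2 / len(s) for s in samples) - cf
    tss = sum(x * x for x in values) - cf
    wss = tss - bss
    f_calc = (bss / (n_classes - 1)) / (wss / (n_total - n_classes))
    return bss, wss, tss, f_calc


bss, wss, tss, f_calc = one_way_anova(list(groups.values()))
print(bss, wss, tss, round(f_calc, 2))  # 50.0 208.0 258.0 1.28
```

The tabulated value $F_{3,16} = 3.24$ still has to be read from an F table; the sketch only reproduces the sums of squares and the ratio.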
2. The three samples below have been obtained from normal populations with equal variances. Test the hypothesis that the sample means are equal.
A B C
8 8 17
10 6 10
7 11 12
14 8 12
11 8 15
16 13 12
Solution:
𝐻0 : Sample means are equal, i.e., 𝜇1 = 𝜇2 = 𝜇3
𝐻1 : Sample means are not equal
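$BSS = \sum_i \dfrac{x_{i*}^2}{n_i} - \dfrac{x_{**}^2}{N} = \left(\dfrac{66^2}{6} + \dfrac{54^2}{6} + \dfrac{78^2}{6}\right) - \dfrac{198^2}{18} = 48$ with d.o.f. $= c - 1 = 2$
$TSS = \sum_i \sum_j x_{ij}^2 - \dfrac{x_{**}^2}{N} = 8^2 + 8^2 + \dots + 13^2 + 12^2 - \dfrac{198^2}{18} = 172$ with d.o.f. $= N - 1 = 17$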
𝑊𝑆𝑆 = 𝑇𝑆𝑆 − 𝐵𝑆𝑆 = 124 with d.o.f = 𝑁 − 𝑐 = 18 − 3 = 15
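Source   Sum of squares   d.o.f.   Mean square       F ratio                       Tabulated F (5%)            Conclusion
BSS      48               2        48/2 = 24         $F_{calc}$ = 24/8.27 = 2.90   $F_{tab} = F_{2,15}$ = 3.68   $F_{calc} < F_{tab}$, accept $H_0$
WSS      124              15       124/15 = 8.27
TSS      172              17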
Since 𝐹𝑐𝑎𝑙𝑐 < 𝐹𝑡𝑎𝑏 at 5% level of significance, we accept 𝐻0 , i.e., there is no significant
difference between the sample means.
3. The following are the lifetimes of four batches of electric lamps. Perform an analysis of variance on these data and test the homogeneity of the four batches.
Solution:
𝐻0 : The four batches are homogeneous, i.e., 𝜇1 = 𝜇2 = 𝜇3 = 𝜇4
$H_1$: The four batches are not homogeneous, i.e., at least two of them differ: $\mu_i \neq \mu_j$ for at least one pair $i$, $j$
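Since the $x_{ij}$ are large, we first code them as $u_{ij} = \dfrac{x_{ij} - a}{c} = \dfrac{x_{ij} - 1600}{10}$, i.e., with origin $a = 1600$ and scale $c = 10$ (this scale $c$ is not to be confused with the number of batches).
Batch   $u_{ij} = (x_{ij} - 1600)/10$
1         0    1    5   10   12   20    8
2        -2    6    4   10   15
3       -14   -5    0    4    2    6   14   22
4        -9   -8   -7   -3    0    8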
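$BSS = c^2\left(\sum_i \dfrac{u_{i*}^2}{n_i} - \dfrac{u_{**}^2}{N}\right) = 100\left[\left(\dfrac{56^2}{7} + \dfrac{33^2}{5} + \dfrac{29^2}{8} + \dfrac{(-19)^2}{6}\right) - \dfrac{99^2}{26}\right] = 45413.0128$ with d.o.f. $= 4 - 1 = 3$ (number of batches minus one; the factor $c^2 = 10^2$ is the coding scale)
$TSS = c^2\left(\sum_i \sum_j u_{ij}^2 - \dfrac{u_{**}^2}{N}\right) = 100\left(2339 - \dfrac{99^2}{26}\right) = 196203.8462$ with d.o.f. $= N - 1 = 25$
$WSS = TSS - BSS = 150790.8333$ with d.o.f. $= N - 4 = 22$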
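Here $F_{calc} = \dfrac{BSS/3}{WSS/22} \approx \dfrac{15137.67}{6854.13} \approx 2.21$ and $F_{tab} = F_{3,22} \approx 3.05$.
Since $F_{calc} < F_{tab}$ at 5% level of significance, we accept $H_0$, i.e., the four batches are homogeneous.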
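The coding step can be illustrated with a short sketch: the sums of squares are computed on the coded values $u_{ij}$ and multiplied by the square of the scale, while the F ratio is unaffected by the change of origin and scale. The helper name `sums_of_squares` is hypothetical, and the data are the coded values tabulated above.

```python
# Coded lifetimes u_ij = (x_ij - 1600) / 10, one list per batch.
coded = [
    [0, 1, 5, 10, 12, 20, 8],
    [-2, 6, 4, 10, 15],
    [-14, -5, 0, 4, 2, 6, 14, 22],
    [-9, -8, -7, -3, 0, 8],
]
scale = 10  # the coding scale; the origin 1600 drops out of every sum of squares


def sums_of_squares(samples):
    """Return (BSS, TSS) of the samples via the correction factor."""
    values = [x for s in samples for x in s]
    cf = sum(values) ** 2 / len(values)
    bss = sum(sum(s) ** 2 / len(s) for s in samples) - cf
    tss = sum(x * x for x in values) - cf
    return bss, tss


bss_u, tss_u = sums_of_squares(coded)

# Undo the scaling to express the sums of squares in the original units.
bss = scale ** 2 * bss_u
tss = scale ** 2 * tss_u
wss = tss - bss

n_total = sum(len(s) for s in coded)   # N = 26
n_batches = len(coded)                 # 4 batches
f_calc = (bss / (n_batches - 1)) / (wss / (n_total - n_batches))
f_coded = (bss_u / (n_batches - 1)) / ((tss_u - bss_u) / (n_total - n_batches))

print(round(bss, 2), round(tss, 2), round(wss, 2), round(f_calc, 2))
```

Because the scale factor multiplies numerator and denominator alike, `f_calc` and `f_coded` agree, so the test could equally be carried out on the coded values alone.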
4. 35 plots of approximately equal quality were sown with different varieties of rice, 5 plots for each variety, the distribution of varieties among the groups being random. The following table gives the yield of rice of the 7 groups corresponding to the 7 different varieties. Do the data indicate a significant difference in the yield of the varieties?
Variety   Plot 1   Plot 2   Plot 3   Plot 4   Plot 5
A         13.1     11.1     10.1     26.1     12.1
B         15.1     11.1     13.1     18.1     12.1
C         14.1     10.1     12.1     13.1     11.1
D         14.1     10.1     15.1     17.1     10.1
E         17.1     15.1     14.1     19.1     12.1
F         15.1      9.1     13.1     14.1     12.1
G         16.1     12.1     13.1     15.1     11.1
Solution:
Since 𝐹𝑐𝑎𝑙𝑐 < 𝐹𝑡𝑎𝑏 at 5% level of significance, we accept 𝐻0 , i.e., there is no significant
difference in the yield of varieties
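The intermediate steps of this solution are not shown. One way to recover them, assuming the problem is treated as a one-way classification with the 7 varieties as classes and the 5 plots as replicates, is the following sketch. The variable names are illustrative only.

```python
# Yields from the table above: one list per variety, five plots each.
yields = {
    "A": [13.1, 11.1, 10.1, 26.1, 12.1],
    "B": [15.1, 11.1, 13.1, 18.1, 12.1],
    "C": [14.1, 10.1, 12.1, 13.1, 11.1],
    "D": [14.1, 10.1, 15.1, 17.1, 10.1],
    "E": [17.1, 15.1, 14.1, 19.1, 12.1],
    "F": [15.1, 9.1, 13.1, 14.1, 12.1],
    "G": [16.1, 12.1, 13.1, 15.1, 11.1],
}

samples = list(yields.values())
values = [x for s in samples for x in s]
n_total = len(values)        # N = 35
n_classes = len(samples)     # c = 7

cf = sum(values) ** 2 / n_total                         # correction factor
bss = sum(sum(s) ** 2 / len(s) for s in samples) - cf   # between varieties
tss = sum(x * x for x in values) - cf                   # total
wss = tss - bss                                         # within varieties

df_between = n_classes - 1          # 6
df_within = n_total - n_classes     # 28
f_calc = (bss / df_between) / (wss / df_within)

print(round(bss, 2), round(wss, 2), round(tss, 2), round(f_calc, 2))
```

Under this assumption the ratio comes out at roughly 0.56, well below the tabulated $F_{6,28}$ of about 2.45 at the 5% level, which agrees with the conclusion stated above.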
5. The annual packages of students from different IIMs offering different specializations
are given by the following data
(i) Does the annual package significantly differ among the IIMs?
(ii) Does the annual package significantly differ across the specializations?
Specialization   IIM A   IIM B   IIM C   IIM D
Marketing         9.4     8.4     8.5    10.9
Finance          10.6     8.8    11.3     9.8
System            8.6    10.5     9.9    10
Operations       11.2    10.6    11       9.3
Solution:
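$H_{01}$: There is no significant difference in the annual packages among the IIMs
$H_{11}$: There are at least two IIMs with different annual packages
$H_{02}$: There is no significant difference in the annual packages across the specializations
$H_{12}$: There are at least two specializations with different annual packages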
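$RSS = \sum_i \dfrac{x_{i*}^2}{n_i} - \dfrac{x_{**}^2}{N} = \left(\dfrac{37.2^2}{4} + \dfrac{40.5^2}{4} + \dfrac{39^2}{4} + \dfrac{42.1^2}{4}\right) - \dfrac{158.8^2}{16} = 3.285$ with d.o.f. $= r - 1 = 3$
$CSS = \sum_j \dfrac{x_{*j}^2}{n_j} - \dfrac{x_{**}^2}{N} = \left(\dfrac{39.8^2}{4} + \dfrac{38.3^2}{4} + \dfrac{40.7^2}{4} + \dfrac{40^2}{4}\right) - \dfrac{158.8^2}{16} = 0.765$ with d.o.f. $= c - 1 = 3$
$TSS = \sum_i \sum_j x_{ij}^2 - \dfrac{x_{**}^2}{N} = 9.4^2 + 8.4^2 + \dots + 9.3^2 - \dfrac{158.8^2}{16} = 14.93$ with d.o.f. $= N - 1 = 15$
Here $x_{i*}$ are the row (specialization) totals and $x_{*j}$ are the column (IIM) totals.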
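Source   Sum of squares   d.o.f.   Mean square         F ratio                       Tabulated F (5%)   Conclusion
RSS      3.285            3        3.285/3 = 1.095     $F_R$ = 1.095/1.209 = 0.906   $F_{3,9}$ = 3.86     Accept $H_{02}$ (specializations)
CSS      0.765            3        0.765/3 = 0.255     $F_C$ = 0.255/1.209 = 0.211   $F_{3,9}$ = 3.86     Accept $H_{01}$ (IIMs)
ESS      10.88            9        10.88/9 = 1.209
TSS      14.93            15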
Since $F_{calc} < F_{tab}$ at 5% level of significance, we accept $H_{01}$ and $H_{02}$, i.e., the annual packages do not differ significantly among the IIMs or across the specializations.
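The two-way computation can be reproduced with a short sketch in plain Python. The function name `two_way_anova` is hypothetical. Rows are the specializations, columns are the IIMs, and there is one observation per cell, as in the table above.

```python
# Annual packages: rows = specializations, columns = IIMs A, B, C, D.
packages = [
    [9.4, 8.4, 8.5, 10.9],    # Marketing
    [10.6, 8.8, 11.3, 9.8],   # Finance
    [8.6, 10.5, 9.9, 10.0],   # System
    [11.2, 10.6, 11.0, 9.3],  # Operations
]


def two_way_anova(rows):
    """Return (RSS, CSS, ESS, TSS, F_R, F_C) for one observation per cell."""
    r, c = len(rows), len(rows[0])
    n_total = r * c
    cf = sum(sum(row) for row in rows) ** 2 / n_total
    rss = sum(sum(row) ** 2 for row in rows) / c - cf          # between rows
    css = sum(sum(col) ** 2 for col in zip(*rows)) / r - cf    # between columns
    tss = sum(x * x for row in rows for x in row) - cf
    ess = tss - rss - css                                      # residual
    mse = ess / ((r - 1) * (c - 1))
    f_r = (rss / (r - 1)) / mse
    f_c = (css / (c - 1)) / mse
    return rss, css, ess, tss, f_r, f_c


rss, css, ess, tss, f_r, f_c = two_way_anova(packages)
print(round(rss, 3), round(css, 3), round(ess, 2), round(tss, 2))
print(round(f_r, 3), round(f_c, 3))
```

Both ratios fall below the tabulated $F_{3,9} = 3.86$, in line with the conclusion above.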
6. A tea company appoints four salesmen 𝐴, 𝐵, 𝐶, 𝐷 and observes their sales in three
seasons – Summer, Winter and Monsoons. The details are given below
Season     Salesman A   Salesman B   Salesman C   Salesman D
Summer     36           36           21           35
Winter     28           29           31           32
Monsoon    26           28           29           29
(i) Do the salesmen significantly differ in performance?
(ii) Is there significant difference between the seasons?
Solution:
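$RSS = \sum_i \dfrac{x_{i*}^2}{n_i} - \dfrac{x_{**}^2}{N} = \left(\dfrac{128^2}{4} + \dfrac{120^2}{4} + \dfrac{112^2}{4}\right) - \dfrac{360^2}{12} = 32$ with d.o.f. $= r - 1 = 2$
$CSS = \sum_j \dfrac{x_{*j}^2}{n_j} - \dfrac{x_{**}^2}{N} = \left(\dfrac{90^2}{3} + \dfrac{93^2}{3} + \dfrac{81^2}{3} + \dfrac{96^2}{3}\right) - \dfrac{360^2}{12} = 42$ with d.o.f. $= c - 1 = 3$
$TSS = \sum_i \sum_j x_{ij}^2 - \dfrac{x_{**}^2}{N} = 36^2 + 36^2 + \dots + 29^2 - \dfrac{360^2}{12} = 210$ with d.o.f. $= N - 1 = 11$
Here $x_{i*}$ are the row (season) totals and $x_{*j}$ are the column (salesman) totals.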
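Source   Sum of squares   d.o.f.   Mean square       F ratio                    Tabulated F (5%)   Conclusion
RSS      32               2        32/2 = 16         $F_R$ = 16/22.67 = 0.706   $F_{2,6}$ = 5.14     Accept $H_{01}$
CSS      42               3        42/3 = 14         $F_C$ = 14/22.67 = 0.618   $F_{3,6}$ = 4.76     Accept $H_{02}$
ESS      136              6        136/6 = 22.67
TSS      210              11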
Since 𝐹𝑐𝑎𝑙𝑐 < 𝐹𝑡𝑎𝑏 at 5% level of significance, we accept 𝐻01 and 𝐻02 , i.e., there is no
significant difference in the performance of salesmen or seasons
7. The following table gives the yields of four different crops treated with three different fertilizers. Test at 1% level of significance whether
(i) there is significant difference in yield due to fertilizers
(ii) there is significant difference in yield due to crop
Fertilizer   Crop 1   Crop 2   Crop 3   Crop 4
1            4.5      6.4      7.2      6.7
2            8.6      7.8      9.6      7.0
3            5.9      6.8      5.7      6.7
Solution:
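$RSS = \sum_i \dfrac{x_{i*}^2}{n_i} - \dfrac{x_{**}^2}{N} = \left(\dfrac{24.8^2}{4} + \dfrac{33^2}{4} + \dfrac{25.1^2}{4}\right) - \dfrac{82.9^2}{12} = 10.8117$ with d.o.f. $= r - 1 = 2$
$CSS = \sum_j \dfrac{x_{*j}^2}{n_j} - \dfrac{x_{**}^2}{N} = \left(\dfrac{19^2}{3} + \dfrac{21^2}{3} + \dfrac{22.5^2}{3} + \dfrac{20.4^2}{3}\right) - \dfrac{82.9^2}{12} = 2.1025$ with d.o.f. $= c - 1 = 3$
$TSS = \sum_i \sum_j x_{ij}^2 - \dfrac{x_{**}^2}{N} = 4.5^2 + 6.4^2 + \dots + 6.7^2 - \dfrac{82.9^2}{12} = 19.6292$ with d.o.f. $= N - 1 = 11$
Here $x_{i*}$ are the row (fertilizer) totals and $x_{*j}$ are the column (crop) totals.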
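Source   Sum of squares   d.o.f.   Mean square           F ratio                         Tabulated F (1%)    Conclusion
RSS      10.8117          2        10.8117/2 = 5.4058    $F_R$ = 5.4058/1.1192 = 4.83    $F_{2,6}$ = 10.92     Accept $H_{01}$
CSS      2.1025           3        2.1025/3 = 0.7008     $F_C$ = 0.7008/1.1192 = 0.6262  $F_{3,6}$ = 9.78      Accept $H_{02}$
ESS      6.715            6        6.715/6 = 1.1192
TSS      19.6292          11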
Since 𝐹𝑐𝑎𝑙𝑐 < 𝐹𝑡𝑎𝑏 at 1% level of significance, we accept 𝐻01 and 𝐻02 , i.e., there is no
significant difference in the yield due to fertilizers as well as crops