ANOVA

The document discusses the ANOVA (Analysis of Variance) technique, which separates variation due to assignable causes from chance causes using the F-test. It outlines the assumptions, hypotheses, and calculations involved in both one-way and two-way ANOVA, including the formulation of test statistics and ANOVA tables. Additionally, it provides examples of problems and solutions related to the significance of differences in means across multiple groups.

Uploaded by

khushpatel1222

TEST FOR EQUALITY OF SEVERAL MEANS

The total variation present in a data set can be classified into two types:

(i) Variation due to assignable causes


(ii) Variation due to chance causes

ANOVA (Analysis of Variance) is a technique for separating the variation due to assignable causes from the variation due to chance causes. ANOVA makes use of the F-test.

Assumptions

For the F-test to be applicable in ANOVA, the following assumptions are made

(i) The observations are independent


(ii) The populations from which the observations are drawn are normally distributed with a common variance $\sigma^2$ (equivalently, the chance errors are normal with mean 0 and variance $\sigma^2$)
(iii) The variations due to assignable causes and chance causes are additive in nature

ONE-WAY ANOVA

A one-way ANOVA has one independent variable.

Let there be $n$ observations classified into $k$ distinct classes of respective sizes $n_1, n_2, \ldots, n_k$ ($\sum_{i=1}^{k} n_i = n$). Let the observations be denoted by $x_{ij}$ ($i = 1, 2, \ldots, k$; $j = 1, 2, \ldots, n_i$). The total variation found in the $x_{ij}$ has two components:

(i) Variation between classes


(ii) Variation within classes

Assumptions

(i) The observations $x_{ij}$ are independent and normally distributed with common variance $\sigma^2$
(ii) The effects of the different classes are additive in nature

Hypothesis

$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$

$H_1$: At least one mean is different

Notations

TSS – Total Sum of Squares

WSS – Within Sum of Squares


BSS – Between Sum of Squares

$TSS = WSS + BSS$

$x_{i*} = \sum_j x_{ij}$ (total of class $i$)

$x_{**} = \sum_i \sum_j x_{ij}$ (grand total)
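As a worked identity, the three sums of squares can be written both in deviation form and in the computational form that uses the grand total. This is a standard result, stated here under the assumption that $\bar{x}_i = x_{i*}/n_i$ denotes the mean of class $i$, $\bar{x} = x_{**}/N$ the grand mean, and $N = \sum_i n_i$ the total number of observations; the bar symbols are introduced only for this illustration.

```latex
\begin{aligned}
TSS &= \sum_i \sum_j (x_{ij} - \bar{x})^2
     = \sum_i \sum_j x_{ij}^2 - \frac{x_{**}^2}{N},\\
BSS &= \sum_i n_i (\bar{x}_i - \bar{x})^2
     = \sum_i \frac{x_{i*}^2}{n_i} - \frac{x_{**}^2}{N},\\
WSS &= \sum_i \sum_j (x_{ij} - \bar{x}_i)^2
     = TSS - BSS.
\end{aligned}
```

The right-hand forms avoid computing any means, which is why they are convenient for hand calculation.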

Test statistic

$F = \dfrac{\text{Variation between samples}}{\text{Variation within samples}}$

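ANOVA Table

| Source of variation | Sum of squares | d.o.f. | Mean sum of squares | F ratio | Table value | Conclusion |
|---|---|---|---|---|---|---|
| BSS | $\sum_i \frac{x_{i*}^2}{n_i} - CF$ | $r - 1$ | $\frac{BSS}{r-1}$ | $F = \frac{BSS/(r-1)}{WSS/(N-r)}$ | $F_{r-1,\,N-r}$ ($F_{tab}$) | $F_{calc} < F_{tab}$: accept $H_0$; $F_{calc} > F_{tab}$: reject $H_0$ |
| TSS | $\sum_i \sum_j x_{ij}^2 - CF$ | $N - 1$ | | | | |
| WSS | $TSS - BSS$ | $N - r$ | $\frac{WSS}{N-r}$ | | | |

$CF$ (Correction Factor) $= \frac{x_{**}^2}{N}$, where $N$ is the total number of observations and $r$ is the number of varieties/independent elements (denoted $n$ and $k$, respectively, in the description above).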
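The table can be turned into a few lines of code. The following is a minimal sketch in plain Python, not a reference implementation: the function name `one_way_anova` and the dictionary keys are hypothetical, and the data are the school marks from Problem 1 of this document. It computes the sums of squares with the correction factor and then the F ratio; the critical value $F_{tab}$ still has to be read from an F table.

```python
def one_way_anova(groups):
    """One-way ANOVA using the correction-factor formulas.

    groups: list of lists of numbers, one inner list per class.
    Returns the sums of squares, degrees of freedom and the F ratio.
    """
    r = len(groups)                            # number of classes
    N = sum(len(g) for g in groups)            # total number of observations
    grand_total = sum(sum(g) for g in groups)  # x_**
    cf = grand_total ** 2 / N                  # correction factor x_**^2 / N

    # Between, total and within sums of squares
    bss = sum(sum(g) ** 2 / len(g) for g in groups) - cf
    tss = sum(x ** 2 for g in groups for x in g) - cf
    wss = tss - bss

    # Mean sums of squares and the F ratio
    msb = bss / (r - 1)
    msw = wss / (N - r)
    return {
        "BSS": bss, "WSS": wss, "TSS": tss,
        "df_between": r - 1, "df_within": N - r,
        "F": msb / msw,
    }


# Marks of the four schools A, B, C, D (Problem 1)
schools = [
    [8, 10, 12, 8, 7],
    [12, 11, 9, 14, 4],
    [18, 12, 16, 6, 8],
    [13, 9, 12, 16, 15],
]
result = one_way_anova(schools)
print(result)
```

Unequal class sizes need no special handling, because each class total is divided by its own $n_i$.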

TWO-WAY ANOVA

A two-way ANOVA has two independent variables, i.e., two factors of variation – along
rows and along columns.

Let there be $r$ rows and $c$ columns, and let the observations be denoted by $x_{ij}$, $i = 1, 2, \ldots, r$; $j = 1, 2, \ldots, c$.

Assumptions

(i) Observations are independently and normally distributed with common variance $\sigma^2$ (the chance errors have mean 0)
(ii) Effects of causes are additive in nature

Hypothesis

$H_{01}: \mu_1 = \mu_2 = \cdots = \mu_r$, i.e., effects of all rows are equal

$H_{11}$: At least one row effect is different

$H_{02}: \beta_1 = \beta_2 = \cdots = \beta_c$, i.e., effects of all columns are equal

$H_{12}$: At least one column effect is different

Notations

TSS – Total Sum of Squares

RSS – Row Sum of Squares

CSS – Column Sum of Squares

ESS – Error Sum of Squares/Within Sum of Squares

$TSS = RSS + CSS + ESS$

$x_{i*} = \sum_j x_{ij}$ (total of row $i$)

$x_{*j} = \sum_i x_{ij}$ (total of column $j$)

$x_{**} = \sum_i \sum_j x_{ij}$ (grand total)

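ANOVA Table

| Source of variation | Sum of squares | d.o.f. | Mean sum of squares | F ratio | Table value |
|---|---|---|---|---|---|
| RSS | $\sum_i \frac{x_{i*}^2}{c} - CF$ | $r - 1$ | $\frac{RSS}{r-1}$ | $F_R = \frac{RSS/(r-1)}{ESS/((c-1)(r-1))}$ | $F_{r-1,\,(c-1)(r-1)}$ |
| CSS | $\sum_j \frac{x_{*j}^2}{r} - CF$ | $c - 1$ | $\frac{CSS}{c-1}$ | $F_C = \frac{CSS/(c-1)}{ESS/((c-1)(r-1))}$ | $F_{c-1,\,(c-1)(r-1)}$ |
| TSS | $\sum_i \sum_j x_{ij}^2 - CF$ | $N - 1$ | | | |
| ESS | $TSS - RSS - CSS$ | $(c-1)(r-1)$ | $\frac{ESS}{(c-1)(r-1)}$ | | |

Each row total $x_{i*}$ is a sum of $c$ observations and each column total $x_{*j}$ is a sum of $r$ observations, which gives the divisors in RSS and CSS.

$CF$ (Correction Factor) $= \frac{x_{**}^2}{N}$, where $N = rc$ is the total number of observations, $r$ is the number of rows and $c$ is the number of columns.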
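The two-way table can be computed the same way. Below is a minimal sketch in plain Python for an $r \times c$ layout with one observation per cell; the function name `two_way_anova` and the dictionary keys are hypothetical. The data are the sales figures from Problem 6 of this document, with seasons as rows and salesmen as columns.

```python
def two_way_anova(table):
    """Two-way ANOVA (one observation per cell).

    table: list of r rows, each a list of c numbers.
    Returns the sums of squares and the F ratios for rows and columns.
    """
    r = len(table)
    c = len(table[0])
    N = r * c

    row_totals = [sum(row) for row in table]        # x_i*
    col_totals = [sum(col) for col in zip(*table)]  # x_*j
    cf = sum(row_totals) ** 2 / N                   # correction factor

    rss = sum(t ** 2 for t in row_totals) / c - cf  # each row has c values
    css = sum(t ** 2 for t in col_totals) / r - cf  # each column has r values
    tss = sum(x ** 2 for row in table for x in row) - cf
    ess = tss - rss - css

    df_error = (r - 1) * (c - 1)
    mse = ess / df_error
    return {
        "RSS": rss, "CSS": css, "TSS": tss, "ESS": ess,
        "df_rows": r - 1, "df_cols": c - 1, "df_error": df_error,
        "F_rows": (rss / (r - 1)) / mse,
        "F_cols": (css / (c - 1)) / mse,
    }


# Rows: Summer, Winter, Monsoon; columns: salesmen A, B, C, D (Problem 6)
sales = [
    [36, 36, 21, 35],
    [28, 29, 31, 32],
    [26, 28, 29, 29],
]
out = two_way_anova(sales)
print(out)
```

`F_rows` is compared with the table value for $(r-1, (c-1)(r-1))$ degrees of freedom and `F_cols` with the one for $(c-1, (c-1)(r-1))$, so it matters which factor is laid out along the rows.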

Problems

1. To assess the significance of possible variation in performance in a certain test
between the convent schools in a city, a common test was given to a number of
students taken at random from the senior fifth class of each of the four schools
concerned. The results are given below. Make an analysis of variance for the data.

School→ A B C D
Marks
8 12 18 13
10 11 12 9
12 9 16 12
8 14 6 16
7 4 8 15
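Solution:
$H_0$: There is no significant difference in the performance of students of the four schools, i.e., $\mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_1$: There is a significant difference in the performance of students of the four schools.

Here $r = 4$ is the number of schools and $N = 20$ is the total number of students.

$BSS = \sum_i \frac{x_{i*}^2}{n_i} - \frac{x_{**}^2}{N} = \left(\frac{45^2}{5} + \frac{50^2}{5} + \frac{60^2}{5} + \frac{65^2}{5}\right) - \frac{220^2}{20} = 2470 - 2420 = 50$ with d.o.f. $= r - 1 = 4 - 1 = 3$

$TSS = \sum_i \sum_j x_{ij}^2 - \frac{x_{**}^2}{N} = (8^2 + 10^2 + \cdots + 15^2) - \frac{220^2}{20} = 2678 - 2420 = 258$ with d.o.f. $= N - 1 = 20 - 1 = 19$

$WSS = TSS - BSS = 258 - 50 = 208$ with d.o.f. $= N - r = 20 - 4 = 16$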

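| Source of variation | Sum of squares | d.o.f. | Mean sum of squares | F ratio | Table value | Conclusion |
|---|---|---|---|---|---|---|
| BSS | 50 | 3 | 50/3 = 16.67 | $F_{calc}$ = 16.67/13 = 1.28 | $F_{3,16}$ = 3.24 ($F_{tab}$) | $F_{calc} < F_{tab}$, accept $H_0$ |
| TSS | 258 | 19 | | | | |
| WSS | 208 | 16 | 208/16 = 13 | | | |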

Since 𝐹𝑐𝑎𝑙𝑐 < 𝐹𝑡𝑎𝑏 at 5% level of significance, we accept 𝐻0 , i.e., there is no significant
difference in the performance of students in 4 schools

2. Three samples below have been obtained from normal population with equal
variances. Test the hypothesis that the sample means are equal.

A B C
8 8 17
10 6 10
7 11 12
14 8 12
11 8 15
16 13 12

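Solution:
$H_0$: The population means are equal, i.e., $\mu_1 = \mu_2 = \mu_3$
$H_1$: The population means are not all equal

Here $r = 3$ is the number of samples and $N = 18$ is the total number of observations.

$BSS = \sum_i \frac{x_{i*}^2}{n_i} - \frac{x_{**}^2}{N} = \left(\frac{66^2}{6} + \frac{54^2}{6} + \frac{78^2}{6}\right) - \frac{198^2}{18} = 2226 - 2178 = 48$ with d.o.f. $= r - 1 = 2$

$TSS = \sum_i \sum_j x_{ij}^2 - \frac{x_{**}^2}{N} = (8^2 + 8^2 + \cdots + 13^2 + 12^2) - \frac{198^2}{18} = 2350 - 2178 = 172$ with d.o.f. $= N - 1 = 17$

$WSS = TSS - BSS = 172 - 48 = 124$ with d.o.f. $= N - r = 18 - 3 = 15$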

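| Source of variation | Sum of squares | d.o.f. | Mean sum of squares | F ratio | Table value | Conclusion |
|---|---|---|---|---|---|---|
| BSS | 48 | 2 | 48/2 = 24 | $F_{calc}$ = 24/8.27 = 2.90 | $F_{2,15}$ = 3.68 ($F_{tab}$) | $F_{calc} < F_{tab}$, accept $H_0$ |
| TSS | 172 | 17 | | | | |
| WSS | 124 | 15 | 124/15 = 8.27 | | | |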

Since 𝐹𝑐𝑎𝑙𝑐 < 𝐹𝑡𝑎𝑏 at 5% level of significance, we accept 𝐻0 , i.e., there is no significant
difference between the sample means.
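The sums of squares in this solution can be cross-checked without the correction factor, using the deviation definitions instead. The sketch below assumes Python's standard `statistics` module and hypothetical variable names. It rebuilds WSS from the sample variances, as $\sum_i (n_i - 1) s_i^2$, and BSS from the sample means, as $\sum_i n_i (\bar{x}_i - \bar{x})^2$.

```python
from statistics import mean, variance

# The three samples A, B, C from Problem 2
samples = {
    "A": [8, 10, 7, 14, 11, 16],
    "B": [8, 6, 11, 8, 8, 13],
    "C": [17, 10, 12, 12, 15, 12],
}

groups = list(samples.values())
N = sum(len(g) for g in groups)   # total number of observations
r = len(groups)                   # number of samples
grand_mean = mean([x for g in groups for x in g])

# Within sum of squares: (n_i - 1) times each sample variance
wss = sum((len(g) - 1) * variance(g) for g in groups)

# Between sum of squares: weighted squared deviations of the sample means
bss = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

f_calc = (bss / (r - 1)) / (wss / (N - r))
print(bss, wss, f_calc)
```

Both routes should agree with the values in the ANOVA table above up to rounding; a mismatch usually points to an arithmetic slip in one of the squared totals.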

3. Following are the lifetime of four batches of electric lamps. Perform an analysis of
variance of this data and test the homogeneity of four batches.

1 1600 1610 1650 1700 1720 1800 1680


2 1580 1660 1640 1700 1750
3 1460 1550 1600 1640 1620 1660 1740 1820
4 1510 1520 1530 1570 1600 1680

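Solution:
$H_0$: The four batches are homogeneous, i.e., $\mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_1$: The four batches are not homogeneous, i.e., $\mu_i \neq \mu_j$ for at least one pair $i \neq j$

Since the $x_{ij}$ are large, we first code the data as $u_{ij} = \frac{x_{ij} - a}{c} = \frac{x_{ij} - 1600}{10}$, with origin $a = 1600$ and scale $c = 10$. Sums of squares of the $u_{ij}$ must be multiplied by $c^2 = 100$ to return to the original units; the F ratio is unaffected by the coding.

Batch   $u_{ij} = \frac{x_{ij} - 1600}{10}$
1   0 1 5 10 12 20 8
2   -2 6 4 10 15
3   -14 -5 0 4 2 6 14 22
4   -9 -8 -7 -3 0 8

Here $r = 4$ is the number of batches and $N = 26$ is the total number of lamps.

$BSS = c^2 \left(\sum_i \frac{u_{i*}^2}{n_i} - \frac{u_{**}^2}{N}\right) = 100 \left[\left(\frac{56^2}{7} + \frac{33^2}{5} + \frac{29^2}{8} + \frac{(-19)^2}{6}\right) - \frac{99^2}{26}\right] = 45413.01$ with d.o.f. $= r - 1 = 3$

$TSS = c^2 \left(\sum_i \sum_j u_{ij}^2 - \frac{u_{**}^2}{N}\right) = 100 \left(2339 - \frac{99^2}{26}\right) = 196203.85$ with d.o.f. $= N - 1 = 25$

$WSS = TSS - BSS = 150790.83$ with d.o.f. $= N - r = 22$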

| Source of variation | Sum of squares | d.o.f. | Mean sum of squares | F ratio | Table value | Conclusion |
|---|---|---|---|---|---|---|
| BSS | 45413.01 | 3 | 45413.01/3 = 15137.67 | $F_{calc}$ = 15137.67/6854.13 = 2.208 | $F_{3,22}$ = 3.05 ($F_{tab}$) | $F_{calc} < F_{tab}$, accept $H_0$ |
| TSS | 196203.85 | 25 | | | | |
| WSS | 150790.83 | 22 | 150790.83/22 = 6854.13 | | | |

Since 𝐹𝑐𝑎𝑙𝑐 < 𝐹𝑡𝑎𝑏 at 5% level of significance, we accept 𝐻0 , i.e., there is homogeneity
between four batches
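The coding step used in this solution can be checked numerically. The sketch below uses a hypothetical helper, `sums_of_squares`, on both the raw lifetimes and the coded values $u_{ij} = (x_{ij} - 1600)/10$. Under the usual algebra one expects the sums of squares to differ by the factor $10^2 = 100$ and the F ratio to be identical.

```python
def sums_of_squares(groups):
    """Return (BSS, WSS, F) for a one-way layout via the correction factor."""
    r = len(groups)
    N = sum(len(g) for g in groups)
    cf = sum(sum(g) for g in groups) ** 2 / N
    bss = sum(sum(g) ** 2 / len(g) for g in groups) - cf
    tss = sum(x ** 2 for g in groups for x in g) - cf
    wss = tss - bss
    return bss, wss, (bss / (r - 1)) / (wss / (N - r))


# Lifetimes of the four batches of lamps (Problem 3)
batches = [
    [1600, 1610, 1650, 1700, 1720, 1800, 1680],
    [1580, 1660, 1640, 1700, 1750],
    [1460, 1550, 1600, 1640, 1620, 1660, 1740, 1820],
    [1510, 1520, 1530, 1570, 1600, 1680],
]

# Coded data: subtract the origin 1600 and divide by the scale 10
coded = [[(x - 1600) / 10 for x in g] for g in batches]

bss_x, wss_x, f_x = sums_of_squares(batches)
bss_u, wss_u, f_u = sums_of_squares(coded)

print(bss_x, 100 * bss_u)   # same value up to floating-point rounding
print(f_x, f_u)             # same F ratio
```

Subtracting the origin changes neither the deviations from the means nor the sums of squares; only the scale factor has to be undone, and it cancels in the F ratio.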

4. 35 plots of approximately equal quality were sown with different varieties of rice, 5
plots for each variety, the distribution of varieties among the plots being random. The
following table gives the yield of rice in 7 rows corresponding to the 7 different varieties.
Does the data indicate a significant difference in the yield of the varieties?
Plot→ 1 2 3 4 5
Variety
A 13.1 11.1 10.1 26.1 12.1
B 15.1 11.1 13.1 18.1 12.1
C 14.1 10.1 12.1 13.1 11.1
D 14.1 10.1 15.1 17.1 10.1
E 17.1 15.1 14.1 19.1 12.1
F 15.1 9.1 13.1 14.1 12.1
G 16.1 12.1 13.1 15.1 11.1

Solution:

$H_0$: There is no significant difference in the yield of varieties, i.e., $\mu_1 = \mu_2 = \cdots = \mu_7$
$H_1$: There is a significant difference in the yield of varieties
(Note: Since we are testing for a significant difference between varieties, the classes are the varieties A, B, C, …, i.e., the rows of the table.)

$BSS = \sum_i \frac{x_{i*}^2}{n_i} - \frac{x_{**}^2}{N} = \left(\frac{72.5^2}{5} + \frac{69.5^2}{5} + \frac{60.5^2}{5} + \frac{66.5^2}{5} + \frac{77.5^2}{5} + \frac{63.5^2}{5} + \frac{67.5^2}{5}\right) - \frac{477.5^2}{35} = 38.29$ with d.o.f. $= 7 - 1 = 6$

$TSS = \sum_i \sum_j x_{ij}^2 - \frac{x_{**}^2}{N} = 6873.15 - 6514.46 = 358.69$ with d.o.f. $= N - 1 = 34$

$WSS = TSS - BSS = 320.4$ with d.o.f. $= 35 - 7 = 28$

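| Source of variation | Sum of squares | d.o.f. | Mean sum of squares | F ratio | Table value | Conclusion |
|---|---|---|---|---|---|---|
| BSS | 38.29 | 6 | 38.29/6 = 6.38 | $F_{calc}$ = 6.38/11.44 = 0.56 | $F_{6,28}$ = 2.45 ($F_{tab}$) | $F_{calc} < F_{tab}$, accept $H_0$ |
| TSS | 358.69 | 34 | | | | |
| WSS | 320.4 | 28 | 320.4/28 = 11.44 | | | |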

Since 𝐹𝑐𝑎𝑙𝑐 < 𝐹𝑡𝑎𝑏 at 5% level of significance, we accept 𝐻0 , i.e., there is no significant
difference in the yield of varieties
5. The annual packages of students from different IIMs offering different specializations
are given by the following data
(i) Does the annual package significantly differ among the IIMs?
(ii) Does the annual package significantly differ across the specializations?

IIMs→ A B C D
Specializations
Marketing 9.4 8.4 8.5 10.9
Finance 10.6 8.8 11.3 9.8
System 8.6 10.5 9.9 10
Operations 11.2 10.6 11 9.3

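Solution:

$H_{01}$: Annual packages in different IIMs are equal, i.e., $\mu_1 = \mu_2 = \mu_3 = \mu_4$

$H_{11}$: There are at least two IIMs with different annual packages

$H_{02}$: Annual packages for different specializations are equal, i.e., $\beta_1 = \beta_2 = \beta_3 = \beta_4$

$H_{12}$: There are at least two specializations with different annual packages

Rows are the specializations ($r = 4$, row totals $x_{i*}$) and columns are the IIMs ($c = 4$, column totals $x_{*j}$); $N = 16$.

$RSS = \sum_i \frac{x_{i*}^2}{c} - \frac{x_{**}^2}{N} = \left(\frac{37.2^2}{4} + \frac{40.5^2}{4} + \frac{39^2}{4} + \frac{42.1^2}{4}\right) - \frac{158.8^2}{16} = 3.285$ with d.o.f. $= r - 1 = 3$

$CSS = \sum_j \frac{x_{*j}^2}{r} - \frac{x_{**}^2}{N} = \left(\frac{39.8^2}{4} + \frac{38.3^2}{4} + \frac{40.7^2}{4} + \frac{40^2}{4}\right) - \frac{158.8^2}{16} = 0.765$ with d.o.f. $= c - 1 = 3$

$TSS = \sum_i \sum_j x_{ij}^2 - \frac{x_{**}^2}{N} = (9.4^2 + 8.4^2 + \cdots + 9.3^2) - \frac{158.8^2}{16} = 14.93$ with d.o.f. $= N - 1 = 15$

$ESS = TSS - RSS - CSS = 10.88$ with d.o.f. $= (c - 1)(r - 1) = 9$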

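| Source of variation | Sum of squares | d.o.f. | Mean sum of squares | F ratio | Table value | Conclusion |
|---|---|---|---|---|---|---|
| RSS (specializations) | 3.285 | 3 | 3.285/3 = 1.095 | $F_R$ = 1.095/1.209 = 0.906 | $F_{3,9}$ = 3.86 | Accept $H_{02}$ |
| CSS (IIMs) | 0.765 | 3 | 0.765/3 = 0.255 | $F_C$ = 0.255/1.209 = 0.211 | $F_{3,9}$ = 3.86 | Accept $H_{01}$ |
| TSS | 14.93 | 15 | | | | |
| ESS | 10.88 | 9 | 10.88/9 = 1.209 | | | |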

Since 𝐹𝑐𝑎𝑙𝑐 < 𝐹𝑡𝑎𝑏 at 5% level of significance, we accept 𝐻01 and 𝐻02 , i.e., annual package
in different IIMs and for different specializations are equal

6. A tea company appoints four salesmen 𝐴, 𝐵, 𝐶, 𝐷 and observes their sales in three
seasons – Summer, Winter and Monsoons. The details are given below

Salesman → A B C D
Seasons
Summer 36 36 21 35
Winter 28 29 31 32
Monsoon 26 28 29 29
(i) Do the salesmen significantly differ in performance?
(ii) Is there significant difference between the seasons?

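Solution:

$H_{01}$: There is no significant difference in the performance of salesmen, i.e., $\mu_1 = \mu_2 = \mu_3 = \mu_4$

$H_{11}$: There is a significant difference in the performance of at least two salesmen

$H_{02}$: There is no significant difference between the seasons, i.e., $\beta_1 = \beta_2 = \beta_3$

$H_{12}$: There is a significant difference between the seasons

Rows are the seasons ($r = 3$, row totals $x_{i*}$) and columns are the salesmen ($c = 4$, column totals $x_{*j}$); $N = 12$.

$RSS = \sum_i \frac{x_{i*}^2}{c} - \frac{x_{**}^2}{N} = \left(\frac{128^2}{4} + \frac{120^2}{4} + \frac{112^2}{4}\right) - \frac{360^2}{12} = 32$ with d.o.f. $= r - 1 = 2$

$CSS = \sum_j \frac{x_{*j}^2}{r} - \frac{x_{**}^2}{N} = \left(\frac{90^2}{3} + \frac{93^2}{3} + \frac{81^2}{3} + \frac{96^2}{3}\right) - \frac{360^2}{12} = 42$ with d.o.f. $= c - 1 = 3$

$TSS = \sum_i \sum_j x_{ij}^2 - \frac{x_{**}^2}{N} = (36^2 + 36^2 + \cdots + 29^2) - \frac{360^2}{12} = 210$ with d.o.f. $= N - 1 = 11$

$ESS = TSS - RSS - CSS = 136$ with d.o.f. $= (c - 1)(r - 1) = 6$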


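| Source of variation | Sum of squares | d.o.f. | Mean sum of squares | F ratio | Table value | Conclusion |
|---|---|---|---|---|---|---|
| RSS (seasons) | 32 | 2 | 32/2 = 16 | $F_R$ = 16/22.67 = 0.706 | $F_{2,6}$ = 5.14 | Accept $H_{02}$ |
| CSS (salesmen) | 42 | 3 | 42/3 = 14 | $F_C$ = 14/22.67 = 0.618 | $F_{3,6}$ = 4.76 | Accept $H_{01}$ |
| TSS | 210 | 11 | | | | |
| ESS | 136 | 6 | 136/6 = 22.67 | | | |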

Since 𝐹𝑐𝑎𝑙𝑐 < 𝐹𝑡𝑎𝑏 at 5% level of significance, we accept 𝐻01 and 𝐻02 , i.e., there is no
significant difference in the performance of salesmen or seasons

7. Following table gives four different crops treated with three different fertilizers. Test at
1% level of significance whether
(i) there is significant difference in yield due to fertilizers
(ii) there is significant difference in yield due to crop

Crop→ 1 2 3 4
Fertilizer
1 4.5 6.4 7.2 6.7
2 8.6 7.8 9.6 7.0
3 5.9 6.8 5.7 6.7

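Solution:

$H_{01}$: There is no significant difference in yield due to fertilizer, i.e., $\mu_1 = \mu_2 = \mu_3$

$H_{11}$: There is a significant difference in yield due to fertilizer

$H_{02}$: There is no significant difference in yield due to crop, i.e., $\beta_1 = \beta_2 = \beta_3 = \beta_4$

$H_{12}$: There is a significant difference in yield due to crop

Rows are the fertilizers ($r = 3$, row totals $x_{i*}$) and columns are the crops ($c = 4$, column totals $x_{*j}$); $N = 12$.

$RSS = \sum_i \frac{x_{i*}^2}{c} - \frac{x_{**}^2}{N} = \left(\frac{24.8^2}{4} + \frac{33^2}{4} + \frac{25.1^2}{4}\right) - \frac{82.9^2}{12} = 10.8117$ with d.o.f. $= r - 1 = 2$

$CSS = \sum_j \frac{x_{*j}^2}{r} - \frac{x_{**}^2}{N} = \left(\frac{19^2}{3} + \frac{21^2}{3} + \frac{22.5^2}{3} + \frac{20.4^2}{3}\right) - \frac{82.9^2}{12} = 2.1025$ with d.o.f. $= c - 1 = 3$

$TSS = \sum_i \sum_j x_{ij}^2 - \frac{x_{**}^2}{N} = (4.5^2 + 6.4^2 + \cdots + 6.7^2) - \frac{82.9^2}{12} = 19.6292$ with d.o.f. $= N - 1 = 11$

$ESS = TSS - RSS - CSS = 6.715$ with d.o.f. $= (c - 1)(r - 1) = 6$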

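| Source of variation | Sum of squares | d.o.f. | Mean sum of squares | F ratio | Table value | Conclusion |
|---|---|---|---|---|---|---|
| RSS (fertilizers) | 10.8117 | 2 | 10.8117/2 = 5.4059 | $F_R$ = 5.4059/1.1192 = 4.8301 | $F_{2,6}$ = 10.93 | Accept $H_{01}$ |
| CSS (crops) | 2.1025 | 3 | 2.1025/3 = 0.7008 | $F_C$ = 0.7008/1.1192 = 0.6262 | $F_{3,6}$ = 9.78 | Accept $H_{02}$ |
| TSS | 19.6292 | 11 | | | | |
| ESS | 6.715 | 6 | 6.715/6 = 1.1192 | | | |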

Since 𝐹𝑐𝑎𝑙𝑐 < 𝐹𝑡𝑎𝑏 at 1% level of significance, we accept 𝐻01 and 𝐻02 , i.e., there is no
significant difference in the yield due to fertilizers as well as crops
