ANOVA
In Analysis of Variance our object is “to test, on the basis of sample observations, whether the means of three or more populations are equal or not”.
The technique of Analysis of Variance splits the total variance into two components:
1. Variance between the samples
2. Variance within the samples
One-way Classification
In this method the observations are classified according to one factor or criterion. We formulate the null hypothesis $H_0$ that the means of the populations from which the samples are drawn are equal, i.e. $H_0: \mu_1 = \mu_2 = \dots = \mu_k$.
Variance Between Samples or Groups
It is the sum of the squares of the deviations of the means of the various samples (or groups) from the grand mean, each weighted by the size of its sample. To calculate the variance between the samples, follow these steps:
(i) Calculate the sample means $\bar{X}_1, \bar{X}_2, \bar{X}_3, \dots, \bar{X}_k$ of all $k$ samples.
(ii) Calculate the grand mean $\bar{\bar{X}}$ by the formula $\bar{\bar{X}} = \frac{\bar{X}_1 + \bar{X}_2 + \dots + \bar{X}_k}{k}$ or $\bar{\bar{X}} = \frac{T}{N}$, where $T$ is the grand total of all observations and $N$ is the total number of observations in all $k$ samples. (The first form agrees with $T/N$ only when all samples have the same size.)
(iii) Find the squares of the deviations of the sample means from the grand mean $\bar{\bar{X}}$, multiply each by the number of items $n_i$ in the corresponding sample, and add. This gives SSB:
$$SSB = \sum_{i=1}^{k} n_i (\bar{X}_i - \bar{\bar{X}})^2.$$
Mean Square Between Samples: $MSB = \frac{SSB}{\nu_1}$, where $\nu_1 = k - 1$ denotes the degrees of freedom.
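As a minimal sketch, assuming the samples are given as Python lists (the function name `between_samples` is illustrative, not from the text), steps (i) to (iii) and the division by $\nu_1$ can be coded directly. The grand mean is taken as $T/N$, which is valid whether or not the samples are of equal size. The data are the fertilizer yields from the worked example later in this section.

```python
def between_samples(samples):
    """Return (SSB, MSB) for a list of samples, each a list of numbers."""
    k = len(samples)
    # Step (i): mean of each sample
    means = [sum(s) / len(s) for s in samples]
    # Step (ii): grand mean T / N (valid for equal or unequal sample sizes)
    grand_total = sum(sum(s) for s in samples)
    n_total = sum(len(s) for s in samples)
    grand_mean = grand_total / n_total
    # Step (iii): squared deviation of each sample mean, weighted by n_i
    ssb = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    # Mean square between samples with nu1 = k - 1 degrees of freedom
    msb = ssb / (k - 1)
    return ssb, msb


# Yields under fertilizers A, B and C (from the worked example)
ssb, msb = between_samples([[25, 22, 24, 21], [20, 17, 16, 19], [24, 26, 30, 20]])
print(ssb, msb)  # 104.0 52.0
```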
Variance within Samples
Variance within samples is the total of the sums of the squares of the deviations of the items in each sample from the mean of that sample, given by the formula:
$$SSW = \sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2 + \sum (X_3 - \bar{X}_3)^2 + \dots + \sum (X_k - \bar{X}_k)^2.$$
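Mean Square Within Samples: $MSW = \frac{SSW}{\nu_2}$, where $\nu_2 = N - k$ denotes the degrees of freedom. The test statistic
$$F = \frac{MSB}{MSW}$$
follows the F distribution with $\nu_1 = k - 1$ and $\nu_2 = N - k$ degrees of freedom. If the calculated value of F exceeds the table value (e.g. $F_{.05}$) for these degrees of freedom, $H_0$ is rejected at that level of significance.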
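Continuing in the same spirit, a hypothetical helper `one_way_anova` (an illustrative name, not a library routine) combines SSB, SSW and their mean squares into the F ratio. It is a sketch of the definitions above and assumes every sample is non-empty and $N > k$.

```python
def one_way_anova(samples):
    """Return SSB, SSW, MSB, MSW and F for a one-way classification."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n_total
    means = [sum(s) / len(s) for s in samples]
    # Between samples: weighted squared deviations of the sample means
    ssb = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    # Within samples: squared deviations of each item from its own sample mean
    ssw = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    msb = ssb / (k - 1)        # nu1 = k - 1
    msw = ssw / (n_total - k)  # nu2 = N - k
    return {"SSB": ssb, "SSW": ssw, "MSB": msb, "MSW": msw, "F": msb / msw}


result = one_way_anova([[25, 22, 24, 21], [20, 17, 16, 19], [24, 26, 30, 20]])
print(result["SSW"], result["MSW"], result["F"])  # 72.0 8.0 6.5
```

The decision itself still needs a table value of F; the sketch deliberately leaves that comparison to the reader rather than assuming a statistics library.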
ANOVA Table
Source of Variation   Sum of Squares (SS)   Degrees of Freedom (df)   Mean Square (MS)      Test Statistic
Between Samples       SSB                   k - 1                     MSB = SSB/(k - 1)     F = MSB/MSW
Within Samples        SSW                   N - k                     MSW = SSW/(N - k)
Total                 SST                   N - 1
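Once SSB and SSW are known, the table can be filled in mechanically. The helpers `anova_table` and `print_table` below are hypothetical names used only for illustration; the example call uses the sums of squares of the tyre example later in this section (SSB = 11.4, SSW = 38.6, k = 3, N = 14).

```python
def anova_table(ssb, ssw, k, n_total):
    """Build one-way ANOVA rows: (source, SS, df, MS, F)."""
    df_b, df_w = k - 1, n_total - k
    msb, msw = ssb / df_b, ssw / df_w
    return [
        ("Between Samples", ssb, df_b, msb, msb / msw),
        ("Within Samples", ssw, df_w, msw, None),
        ("Total", ssb + ssw, n_total - 1, None, None),  # SST = SSB + SSW
    ]


def print_table(rows):
    """Print the rows in aligned columns; blank cells for missing entries."""
    print(f"{'Source':<16}{'SS':>8}{'df':>5}{'MS':>8}{'F':>8}")
    for name, ss, df, ms, f in rows:
        ms_s = f"{ms:.2f}" if ms is not None else ""
        f_s = f"{f:.3f}" if f is not None else ""
        print(f"{name:<16}{ss:>8.2f}{df:>5}{ms_s:>8}{f_s:>8}")


rows = anova_table(11.4, 38.6, 3, 14)  # tyre example figures
print_table(rows)
```

Calling `anova_table(10, 30, 3, 14)` in the same way reproduces the salesmen example (mean squares 5 and about 2.73, F about 1.83).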
Example. Is there any significant difference in the average yields on 12 plots of land in three samples, each of 4 plots, under 3 fertilizers A, B and C? (Given $F_{.05} = 4.26$ for $\nu_1 = 2, \nu_2 = 9$.)
Solution. Let us take the null hypothesis that there is no significant difference in the average yields under the three fertilizers.
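The calculation of the sample means and of the variance between and within samples is given below:
Fertilizer A ($X_1$)   Fertilizer B ($X_2$)   Fertilizer C ($X_3$)
25                     20                     24
22                     17                     26
24                     16                     30
21                     19                     20
$\sum X_1 = 92$        $\sum X_2 = 72$        $\sum X_3 = 100$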
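$\bar{X}_1 = \frac{\sum X_1}{n_1} = \frac{92}{4} = 23, \quad \bar{X}_2 = \frac{\sum X_2}{n_2} = \frac{72}{4} = 18, \quad \bar{X}_3 = \frac{\sum X_3}{n_3} = \frac{100}{4} = 25$, where $n_1 = n_2 = n_3 = 4$.
Grand Mean $\bar{\bar{X}} = \frac{\bar{X}_1 + \bar{X}_2 + \bar{X}_3}{3} = \frac{23 + 18 + 25}{3} = \frac{66}{3} = 22$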
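Sum of the squares between samples:
$$SSB = \sum n_i (\bar{X}_i - \bar{\bar{X}})^2 = n_1(\bar{X}_1 - \bar{\bar{X}})^2 + n_2(\bar{X}_2 - \bar{\bar{X}})^2 + n_3(\bar{X}_3 - \bar{\bar{X}})^2 = 4(23 - 22)^2 + 4(18 - 22)^2 + 4(25 - 22)^2 = 104$$
Degrees of freedom $\nu_1 = k - 1 = 3 - 1 = 2$.
Mean square between samples: $MSB = \frac{SSB}{\nu_1} = \frac{104}{2} = 52$.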
Calculation of SSW
Sample I                          Sample II                         Sample III
$X_1$   $(X_1 - \bar{X}_1)^2$     $X_2$   $(X_2 - \bar{X}_2)^2$     $X_3$   $(X_3 - \bar{X}_3)^2$
25      4                         20      4                         24      1
22      1                         17      1                         26      1
24      1                         16      4                         30      25
21      4                         19      1                         20      25
Total   $\sum (X_1 - \bar{X}_1)^2 = 10$   $\sum (X_2 - \bar{X}_2)^2 = 10$   $\sum (X_3 - \bar{X}_3)^2 = 52$
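Sum of the squares within samples:
$$SSW = \sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2 + \sum (X_3 - \bar{X}_3)^2 = 10 + 10 + 52 = 72$$
Degrees of freedom $\nu_2 = N - k = 12 - 3 = 9$.
Mean square within samples: $MSW = \frac{SSW}{\nu_2} = \frac{72}{9} = 8$, so that
$$F = \frac{MSB}{MSW} = \frac{52}{8} = 6.5.$$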
The calculated value of F is greater than the table value. Hence, the hypothesis is rejected at the 5% level of significance, and we conclude that the difference in the average yields under the three fertilizers is significant.
Correction factor $= \frac{T^2}{N} = \frac{(264)^2}{12} = 5808$
Total Sum of Squares: $SST = \sum X_1^2 + \sum X_2^2 + \sum X_3^2 - \frac{T^2}{N} = 2126 + 1306 + 2552 - 5808 = 176$
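The correction-factor route needs only the raw observations, the sample totals and the grand total. The sketch below (the function name is hypothetical) computes SST this way and, using the analogous shortcut $SSB = \sum T_i^2/n_i - T^2/N$ with $T_i$ the total of sample $i$, obtains SSW by subtraction. For the fertilizer data it agrees with the direct calculations (SSB = 104, SSW = 72, SST = 176).

```python
def sums_of_squares_shortcut(samples):
    """Return (CF, SST, SSB, SSW) using the correction factor T^2 / N."""
    grand_total = sum(sum(s) for s in samples)
    n_total = sum(len(s) for s in samples)
    cf = grand_total ** 2 / n_total                    # correction factor
    sst = sum(x * x for s in samples for x in s) - cf  # total sum of squares
    ssb = sum(sum(s) ** 2 / len(s) for s in samples) - cf
    ssw = sst - ssb                                    # since SST = SSB + SSW
    return cf, sst, ssb, ssw


fertilizer = [[25, 22, 24, 21], [20, 17, 16, 19], [24, 26, 30, 20]]
print(sums_of_squares_shortcut(fertilizer))  # (5808.0, 176.0, 104.0, 72.0)
```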
The calculated value of F is less than the table value. Hence, the hypothesis is accepted at the 5% level of significance, and we conclude that the difference in the average values of the samples is not significant, i.e. the samples have come from the same population.
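Remark. If the given observations are large (say, more than 30), we can reduce them by the basic operations of arithmetic, for example by subtracting the same constant from every observation. Such data are known as coded data, as explained in the following examples. Subtracting a constant leaves every sum of squares unchanged, and multiplying or dividing by a constant scales all sums of squares by the same factor, so the value of F is not affected by coding.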
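Coding works because shifting every observation by a constant does not change any deviation from a mean, and rescaling multiplies every sum of squares by the same factor. A quick numerical check, as a sketch (the helper name `ssb_ssw` is illustrative), uses the fertilizer data as given, with 20 subtracted from every value, and halved.

```python
def ssb_ssw(samples):
    """Return (SSB, SSW) computed from deviations about the means."""
    n_total = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n_total
    means = [sum(s) / len(s) for s in samples]
    ssb = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    ssw = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    return ssb, ssw


raw = [[25, 22, 24, 21], [20, 17, 16, 19], [24, 26, 30, 20]]
coded = [[x - 20 for x in s] for s in raw]   # subtract a constant
halved = [[x / 2 for x in s] for s in raw]   # divide by a constant

print(ssb_ssw(raw))     # (104.0, 72.0)
print(ssb_ssw(coded))   # (104.0, 72.0)
print(ssb_ssw(halved))  # (26.0, 18.0): both divided by 4, ratio unchanged
```

Because the degrees of freedom do not change either, $F = MSB/MSW$ is identical in all three cases.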
Example. Three samples of five, four and five motor-car tyres are drawn respectively from three brands P, Q and R, manufactured by three machines. The life-times of these tyres (in thousands of miles) are given below. Test whether the average life-times of the three brands are equal or not. (Given $F_{.05} = 3.98$ for $\nu_1 = 2, \nu_2 = 11$.)
P     Q     R
45    41    41
42    40    42
43    42    38
44    43    43
42          39
Solution. Let us take the null hypothesis that the average life-times of the three brands of tyres are equal. Subtracting 40 from each of the given values, the coded data are given below:
$MSB = \frac{SSB}{\nu_1} = \frac{11.4}{2} = 5.7$ and $MSW = \frac{SSW}{\nu_2} = \frac{38.6}{11} = 3.51$
ANOVA Table
Source of Variation   Sum of Squares (SS)   Degrees of Freedom (df)   Mean Square (MS)   Test Statistic
Between Samples       11.4                  2                         5.7                F = 5.7/3.51 = 1.624
Within Samples        38.6                  11                        3.51
Total                 SST = 50              N - 1 = 13
The calculated value of F is less than the table value. Hence, the hypothesis is accepted at the 5% level of significance, and we conclude that the average life-times of the three brands of tyres are equal.
Example. The Amrit Merchandising Co. wishes to test whether its three salesmen A, B and C tend to make sales of the same size or whether they differ in their selling ability, as measured by the average size of their sales. During the last week there have been 14 sales calls: A made 5 calls, B made 4 calls and C made 5 calls. Following are the weekly sales records (in Rs.) of the three salesmen:
$MSB = \frac{SSB}{\nu_1} = \frac{10}{2} = 5$ and $MSW = \frac{SSW}{\nu_2} = \frac{30}{11} = 2.73$
ANOVA Table
Source of Variation   Sum of Squares (SS)   Degrees of Freedom (df)   Mean Square (MS)   Test Statistic
Between Samples       10                    2                         5                  F = 5/2.73 = 1.83
Within Samples        30                    11                        2.73
Total                 SST = 40              N - 1 = 13
The calculated value of F is less than the table value ($F_{.05} = 3.98$ for $\nu_1 = 2, \nu_2 = 11$). Hence, the hypothesis is accepted at the 5% level of significance, and we conclude that the three salesmen tend to make sales of the same size.
Example. An experimenter wishes to study the effect of four fertilizers on the yield of a crop. He divided the field into 24 plots and assigned each fertilizer at random to 6 plots. Part of his calculations is shown below:
Source           d.f.   SS     MS    F    F.05
Between groups   …      2940   …     …    3.10
Within groups    …      …      …
Total            …      6212
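The missing entries follow from the one-way definitions with $k = 4$ fertilizers and $N = 24$ plots, using $SSW = SST - SSB$:

$$
\begin{aligned}
\nu_1 &= k - 1 = 3, \qquad \nu_2 = N - k = 20, \qquad N - 1 = 23\\
SSW &= SST - SSB = 6212 - 2940 = 3272\\
MSB &= \frac{2940}{3} = 980, \qquad MSW = \frac{3272}{20} = 163.6\\
F &= \frac{MSB}{MSW} = \frac{980}{163.6} \approx 5.99
\end{aligned}
$$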
(ii) The calculated value of F is 5.99, which is greater than the table value 3.10 of F at the 5% level of significance with $\nu_1 = 3$ and $\nu_2 = 20$. Hence, we conclude that the fertilizers differ significantly.
Exercise
1. The three samples below have been obtained from normal populations with equal variances. Test the hypothesis that the population means are equal.
                Groups
Brands    A     B     C     D
I         0     4     8     10
II        5     8     13    6
III       18    19    11    13
Is there any significant difference in the brands' performance? (The table value is $F_{.05} = 4.26$ for $\nu_1 = 2, \nu_2 = 9$.)
Two-Way Classification
In a two-way classification, statistical data are classified according to two different factors or criteria. For example:
1. The yield of a crop in several plots of land may be classified according to different varieties of seed and different fertilizers.
2. The petrol mileage of a car may be affected by the type of car driven, the way it is driven and the road conditions.
3. The sales of cosmetics may be affected by the point-of-sale display and, in addition, by the price charged, the size or location of the store and the number of competitive products sold by the store.
ANOVA table
Source of variation   Sum of Squares   d.f.             Mean Sum of Squares           F
Between columns       SSC              c - 1            MSC = SSC/(c - 1)             MSC/MSE
Between rows          SSR              r - 1            MSR = SSR/(r - 1)             MSR/MSE
Residual or error     SSE              (c - 1)(r - 1)   MSE = SSE/[(c - 1)(r - 1)]
Total                 SST              N - 1
Where SSC = Sum of the squares between columns
SSR = Sum of the squares between rows
SSE = Sum of the squares due to error
SST = Total sum of squares
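In the examples that follow, these sums of squares are computed with the correction factor. For $r$ rows and $c$ columns with one observation per cell ($N = rc$), writing $T_{i\cdot}$ for the total of row $i$ and $T_{\cdot j}$ for the total of column $j$ (notation introduced here for convenience), the standard computing formulas are:

$$
\begin{aligned}
CF &= \frac{T^2}{N}\\
SSC &= \sum_{j=1}^{c} \frac{T_{\cdot j}^2}{r} - CF, \qquad SSR = \sum_{i=1}^{r} \frac{T_{i\cdot}^2}{c} - CF\\
SST &= \sum_{i=1}^{r}\sum_{j=1}^{c} X_{ij}^2 - CF, \qquad SSE = SST - SSC - SSR
\end{aligned}
$$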
Example. A company appoints four salesmen A, B, C and D and observes their sales in three seasons: summer, winter and monsoon. The figures (in lakhs) are given in the following table:
Correction factor $= \frac{T^2}{N} = \frac{(0)^2}{12} = 0$
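Sum of the Squares Between Salesmen:
$$SSC = \frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \frac{(\sum X_3)^2}{n_3} + \frac{(\sum X_4)^2}{n_4} - \frac{T^2}{N} = \frac{(0)^2}{3} + \frac{(3)^2}{3} + \frac{(9)^2}{3} + \frac{(6)^2}{3} - 0 = 42$$
d.f. $= c - 1 = 4 - 1 = 3$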
Since the calculated value of F is less than the table value $F_{.05} = 8.94$ for $\nu_1 = 6, \nu_2 = 3$, we conclude that the sales of the different salesmen do not differ significantly.
Since the calculated value of F is less than the table value $F_{.05} = 19.33$ for $\nu_1 = 6, \nu_2 = 2$, we conclude that there is no significant difference between the seasons as far as sales are concerned.
Thus, the test shows that the salesmen and the seasons are alike so far as sales are concerned.
Example. The following data represent the number of units of production per day turned out by 5 different workers using 4 different types of machines:
Workers   Machines
          A     B     C     D
I         44    38    47    36
II        46    40    52    43
III       34    36    44    32
IV        43    38    46    33
V         38    42    49    39
(a) Test whether the mean productivity is the same for the different machines.
(b) Test whether the 5 men differ with respect to mean productivity. (Given $F_{.05} = 3.49$ for $\nu_1 = 3, \nu_2 = 12$ and $F_{.05} = 3.26$ for $\nu_1 = 4, \nu_2 = 12$.)
Solution. Let us take the null hypotheses that (a) the mean productivity is the same for the four different machines and (b) the 5 men do not differ with respect to mean productivity.
To simplify the calculations, let us subtract 40 from each value. The coded data are given below:
Correction factor $= \frac{T^2}{N} = \frac{(20)^2}{20} = 20$
Sum of the Squares Between Machines $= \frac{(5)^2}{5} + \frac{(6)^2}{5} + \frac{(38)^2}{5} + \frac{(17)^2}{5} - 20 = 338.8$
d.f. $= 3$
Sum of the Squares Between Workers $= \frac{(5)^2}{4} + \frac{(21)^2}{4} + \frac{(14)^2}{4} + \frac{(0)^2}{4} + \frac{(8)^2}{4} - 20 = 161.5$
d.f. $= 4$
Total Sum of Squares (SST) = sum of the squares of all items in the coded table minus the correction factor = 574
d.f. $= N - 1 = 20 - 1 = 19$
$SSE = SST - (SSC + SSR) = 574 - (338.8 + 161.5) = 73.7$
d.f. $= (c - 1)(r - 1) = 3 \times 4 = 12$
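As a cross-check, the sketch below (the function name `two_way_anova` is hypothetical) performs a two-way analysis with one observation per cell using the correction-factor formulas. It takes the coded data with rows as workers I to V and columns as machines A to D; the entry for worker IV on machine C is taken as 46 (coded 6), the value that reproduces the column total 38, the row total 0 and $T = 20$ used in the solution. It gives SSC = 338.8, SSR = 161.5 and SSE = 73.7, with F ratios of about 18.39 for machines and 6.57 for workers, which can be compared with the table values quoted in the problem.

```python
def two_way_anova(table):
    """Two-way ANOVA with one observation per cell (rows x columns)."""
    r, c = len(table), len(table[0])
    n = r * c
    t = sum(sum(row) for row in table)
    cf = t * t / n                                    # correction factor T^2/N
    col_totals = [sum(row[j] for row in table) for j in range(c)]
    row_totals = [sum(row) for row in table]
    ssc = sum(tc * tc for tc in col_totals) / r - cf  # between columns
    ssr = sum(tr * tr for tr in row_totals) / c - cf  # between rows
    sst = sum(x * x for row in table for x in row) - cf
    sse = sst - ssc - ssr                             # residual (error)
    df_c, df_r, df_e = c - 1, r - 1, (c - 1) * (r - 1)
    mse = sse / df_e
    return {
        "SSC": ssc, "SSR": ssr, "SSE": sse, "SST": sst,
        "F_columns": (ssc / df_c) / mse,  # MSC / MSE
        "F_rows": (ssr / df_r) / mse,     # MSR / MSE
    }


# Coded data (each value minus 40): rows = workers I-V, columns = machines A-D
coded = [
    [4, -2, 7, -4],
    [6, 0, 12, 3],
    [-6, -4, 4, -8],
    [3, -2, 6, -7],
    [-2, 2, 9, -1],
]
res = two_way_anova(coded)
print({k: round(v, 2) for k, v in res.items()})
```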
The above information can be presented in the following ANOVA table: