Analysis of Variance (ANOVA)

In Analysis of Variance our object is "to test, on the basis of sample observations, whether the
means of three or more populations are equal or not".
The technique of Analysis of Variance splits the total variance into two components:
1. Variance between the samples
2. Variance within the samples

Assumptions in Analysis of Variance


ANOVA is based on the following assumptions:
1. The samples are drawn independently (randomly) from the populations.
2. All the populations from which the samples have been drawn are normally distributed.
3. The variances of all the populations are equal.

Technique of Analysis of Variance


1. One-way classification
2. Two-way classification

One-way Classification
In this method the observations are classified according to one factor or criterion. Let us
formulate the required null hypothesis $H_0$ regarding the means of the populations from which
the samples are drawn, namely that all the population means are equal.
Variance Between Samples or Groups
It is the sum of squares of the deviations of the means of the various samples or groups from
the grand mean, each weighted by the sample size. To calculate the variance between the samples, follow these steps:
(i) Calculate the sample means $\bar{X}_1, \bar{X}_2, \bar{X}_3, \bar{X}_4, \ldots, \bar{X}_k$ of all $k$ samples.
(ii) Calculate the grand mean $\bar{\bar{X}}$ by the formula $\bar{\bar{X}} = \dfrac{\bar{X}_1 + \bar{X}_2 + \bar{X}_3 + \cdots + \bar{X}_k}{k}$ (valid when all samples are of equal size) or $\bar{\bar{X}} = \dfrac{T}{N}$, where $T$ is the
grand total of all observations and $N$ is the total number of observations in all $k$ samples.
(iii) Square the deviation of each sample mean from the grand mean $\bar{\bar{X}}$ and multiply it by the
number of items in the corresponding sample. The sum of these products gives
SSB, which is expressed by the formula $SSB = \sum n_i (\bar{X}_i - \bar{\bar{X}})^2$.
Mean Square Between Samples $(MSB) = \dfrac{SSB}{\nu_1}$, where $\nu_1$ denotes the degrees of freedom, given by $\nu_1 = k - 1$.
Variance within Samples
The variance within samples is the total of the sums of squares of the deviations of the various
items from the mean of their respective sample, and is given by the formula:
$$SSW = \sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2 + \sum (X_3 - \bar{X}_3)^2 + \cdots + \sum (X_k - \bar{X}_k)^2 .$$
Mean Square Within Samples $(MSW) = \dfrac{SSW}{\nu_2}$, where $\nu_2$ denotes the degrees of freedom, given by $\nu_2 = N - k$.
Calculation of Test Statistic: $F = \dfrac{MSB}{MSW}$, which follows the F distribution with
degrees of freedom $\nu_1 = k - 1$ and $\nu_2 = N - k$.
ANOVA Table

| Source of Variation | Sum of Squares (SS) | Degrees of Freedom (df) | Mean Square (MS) | Test Statistic |
|---|---|---|---|---|
| Between Samples | SSB | $k-1$ | $MSB = \dfrac{SSB}{k-1}$ | $F = \dfrac{MSB}{MSW}$ |
| Within Samples | SSW | $N-k$ | $MSW = \dfrac{SSW}{N-k}$ | |
| Total | SST | $N-1$ | | |

Example. Fill in the blanks in the following ANOVA table:

| Source of Variation | Sum of Squares (SS) | Degrees of Freedom (df) | Mean Square (MS) | Test Statistic |
|---|---|---|---|---|
| Between Samples | ? | 2 | 5 | F = ? |
| Within Samples | 14 | ? | ? | |
| Total | ? | 9 | | |

Solution. d.f. (within samples) $= 9 - 2 = 7$, i.e. $k - 1 = 2$ and $N - k = 7$.
Now $MSB = \dfrac{SSB}{k-1}$, or $5 = \dfrac{SSB}{2} \Rightarrow SSB = 10$, and $MSW = \dfrac{SSW}{N-k} = \dfrac{14}{7} = 2$.
Total $SS = SSB + SSW = 10 + 14 = 24$ and $F = \dfrac{MSB}{MSW} = \dfrac{5}{2} = 2.5$.
The required ANOVA table is given below:

| Source of Variation | Sum of Squares (SS) | Degrees of Freedom (df) | Mean Square (MS) | Test Statistic |
|---|---|---|---|---|
| Between Samples | 10 | 2 | 5 | F = 2.5 |
| Within Samples | 14 | 7 | 2 | |
| Total | 24 | 9 | | |
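
The fill-in-the-blanks reasoning above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `complete_anova_table` and its parameter names are assumptions made here, not part of the original notes.

```python
def complete_anova_table(df_between, ms_between, ss_within, df_total):
    """Fill in a one-way ANOVA table from the four entries that are given."""
    df_within = df_total - df_between        # (N - 1) - (k - 1) = N - k
    ss_between = ms_between * df_between     # SSB = MSB * (k - 1)
    ms_within = ss_within / df_within        # MSW = SSW / (N - k)
    return {
        "SSB": ss_between,
        "SSW": ss_within,
        "SST": ss_between + ss_within,       # SST = SSB + SSW
        "df_between": df_between,
        "df_within": df_within,
        "MSB": ms_between,
        "MSW": ms_within,
        "F": ms_between / ms_within,         # F = MSB / MSW
    }


# Entries given in the example: df (between) = 2, MSB = 5, SSW = 14, total df = 9.
table = complete_anova_table(df_between=2, ms_between=5, ss_within=14, df_total=9)
for name, value in table.items():
    print(name, value)
```

The same function applies to any one-way table in which these four entries are known.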
Example. The following data give the yields on 12 plots of land in 3 samples, each of 4
plots, under 3 varieties of fertilizers A, B and C:

| A | B | C |
|---|---|---|
| 25 | 20 | 24 |
| 22 | 17 | 26 |
| 24 | 16 | 30 |
| 21 | 19 | 20 |

Is there any significant difference in the average yields under the three fertilizers A, B and C?
(Given $F_{.05} = 4.26$ for $\nu_1 = 2$, $\nu_2 = 9$.)
Solution. Let us take the null hypothesis that there is no significant difference in the average
yields under the three varieties of fertilizers.
The calculation of the sample means and of the variance between and within samples is given below:

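| Sample I ($X_1$) | Sample II ($X_2$) | Sample III ($X_3$) |
|---|---|---|
| 25 | 20 | 24 |
| 22 | 17 | 26 |
| 24 | 16 | 30 |
| 21 | 19 | 20 |
| $\sum X_1 = 92$ | $\sum X_2 = 72$ | $\sum X_3 = 100$ |

$\bar{X}_1 = \dfrac{\sum X_1}{n_1} = \dfrac{92}{4} = 23$, $\bar{X}_2 = \dfrac{\sum X_2}{n_2} = \dfrac{72}{4} = 18$, $\bar{X}_3 = \dfrac{\sum X_3}{n_3} = \dfrac{100}{4} = 25$
Grand Mean $\bar{\bar{X}} = \dfrac{\bar{X}_1 + \bar{X}_2 + \bar{X}_3}{3} = \dfrac{66}{3} = 22$
Sum of the squares between samples $(SSB) = \sum n_i (\bar{X}_i - \bar{\bar{X}})^2$
$= n_1 (\bar{X}_1 - \bar{\bar{X}})^2 + n_2 (\bar{X}_2 - \bar{\bar{X}})^2 + n_3 (\bar{X}_3 - \bar{\bar{X}})^2$
$= 4(23 - 22)^2 + 4(18 - 22)^2 + 4(25 - 22)^2 = 4 + 64 + 36 = 104$
$\nu_1 =$ Degrees of freedom $= k - 1 = 3 - 1 = 2$
Mean square between samples $(MSB) = \dfrac{SSB}{\nu_1} = \dfrac{104}{2} = 52$.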
Calculation of SSW

| Sample I: $X_1$ | $(X_1 - \bar{X}_1)^2$ | Sample II: $X_2$ | $(X_2 - \bar{X}_2)^2$ | Sample III: $X_3$ | $(X_3 - \bar{X}_3)^2$ |
|---|---|---|---|---|---|
| 25 | 4 | 20 | 4 | 24 | 1 |
| 22 | 1 | 17 | 1 | 26 | 1 |
| 24 | 1 | 16 | 4 | 30 | 25 |
| 21 | 4 | 19 | 1 | 20 | 25 |
| Total | $\sum (X_1 - \bar{X}_1)^2 = 10$ | | $\sum (X_2 - \bar{X}_2)^2 = 10$ | | $\sum (X_3 - \bar{X}_3)^2 = 52$ |

Sum of the squares within samples $(SSW) = \sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2 + \sum (X_3 - \bar{X}_3)^2$
$= 10 + 10 + 52 = 72$
$\nu_2 =$ Degrees of freedom $= N - k = 12 - 3 = 9$
Mean square within samples $(MSW) = \dfrac{SSW}{\nu_2} = \dfrac{72}{9} = 8$
(ANOVA) Table

| Source of Variation | Sum of Squares (SS) | Degrees of Freedom (df) | Mean Square (MS) | Test Statistic |
|---|---|---|---|---|
| Between Samples | 104 | 2 | 52 | $F = \dfrac{52}{8} = 6.5$ |
| Within Samples | 72 | 9 | 8 | |
| Total | SST = 176 | N - 1 = 11 | | |

The calculated value of F (6.5) is greater than the table value (4.26). Hence, the null hypothesis is rejected at the
5% level of significance, and we conclude that the difference in the average yields under the three
varieties of fertilizers is significant.
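
The direct method used in this example can be checked with a short Python sketch. The function name `one_way_anova` is an assumption introduced here for illustration. The grand mean is computed as $T/N$, which agrees with the mean of the sample means because the three samples are of equal size.

```python
def one_way_anova(samples):
    """Direct method: SSB and SSW from deviations about the means."""
    k = len(samples)                                   # number of samples
    n_total = sum(len(s) for s in samples)             # N
    grand_mean = sum(sum(s) for s in samples) / n_total  # T / N
    means = [sum(s) / len(s) for s in samples]
    # SSB: squared deviation of each sample mean from the grand mean,
    # weighted by the sample size.
    ssb = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    # SSW: squared deviations of the items from their own sample mean.
    ssw = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    msb = ssb / (k - 1)
    msw = ssw / (n_total - k)
    return {"SSB": ssb, "SSW": ssw, "MSB": msb, "MSW": msw, "F": msb / msw}


# Yields under fertilizers A, B and C from the example above.
yields = [[25, 22, 24, 21], [20, 17, 16, 19], [24, 26, 30, 20]]
result = one_way_anova(yields)
print(result)

# Compare with the table value quoted in the example (4.26 for 2 and 9 d.f.).
print("reject H0:", result["F"] > 4.26)
```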

Short cut Method:

| Sample I: $X_1$ | $X_1^2$ | Sample II: $X_2$ | $X_2^2$ | Sample III: $X_3$ | $X_3^2$ |
|---|---|---|---|---|---|
| 25 | 625 | 20 | 400 | 24 | 576 |
| 22 | 484 | 17 | 289 | 26 | 676 |
| 24 | 576 | 16 | 256 | 30 | 900 |
| 21 | 441 | 19 | 361 | 20 | 400 |
| $\sum X_1 = 92$ | $\sum X_1^2 = 2126$ | $\sum X_2 = 72$ | $\sum X_2^2 = 1306$ | $\sum X_3 = 100$ | $\sum X_3^2 = 2552$ |

$T =$ Sum of all the values of the three samples $= \sum X_1 + \sum X_2 + \sum X_3 = 92 + 72 + 100 = 264$.
Correction factor $= \dfrac{T^2}{N} = \dfrac{(264)^2}{12} = 5808$
Total Sum of Squares $(SST) = \sum X_1^2 + \sum X_2^2 + \sum X_3^2 - \dfrac{T^2}{N} = 2126 + 1306 + 2552 - 5808 = 176$
Sum of the Squares Between Samples $(SSB) = \dfrac{(\sum X_1)^2}{n_1} + \dfrac{(\sum X_2)^2}{n_2} + \dfrac{(\sum X_3)^2}{n_3} - \dfrac{T^2}{N}$
$= \dfrac{(92)^2}{4} + \dfrac{(72)^2}{4} + \dfrac{(100)^2}{4} - 5808 = 2116 + 1296 + 2500 - 5808 = 104$
$\nu_1 =$ Degrees of freedom $= k - 1 = 3 - 1 = 2$
Sum of the Squares Within Samples $(SSW) = SST - SSB = 176 - 104 = 72$
$\nu_2 =$ Degrees of freedom $= N - k = 12 - 3 = 9$
Example. To assess the significance of possible variation in performance in a certain test
between the grammar schools of a city, a common test was given to a number of students taken
at random from the senior fifth class of each of the four schools concerned. The results are
given below:

| A | B | C | D |
|---|---|---|---|
| 8 | 12 | 18 | 13 |
| 10 | 11 | 12 | 9 |
| 12 | 9 | 16 | 12 |
| 8 | 14 | 6 | 16 |
| 7 | 4 | 8 | 15 |

Make an analysis of variance. (Given $F_{.05} = 3.24$ for $\nu_1 = 3$, $\nu_2 = 16$.)

Solution. Let us take the null hypothesis that there is no significant difference in the
performance of the students from the four schools.

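| Sample I: $X_1$ | $X_1^2$ | Sample II: $X_2$ | $X_2^2$ | Sample III: $X_3$ | $X_3^2$ | Sample IV: $X_4$ | $X_4^2$ |
|---|---|---|---|---|---|---|---|
| 8 | 64 | 12 | 144 | 18 | 324 | 13 | 169 |
| 10 | 100 | 11 | 121 | 12 | 144 | 9 | 81 |
| 12 | 144 | 9 | 81 | 16 | 256 | 12 | 144 |
| 8 | 64 | 14 | 196 | 6 | 36 | 16 | 256 |
| 7 | 49 | 4 | 16 | 8 | 64 | 15 | 225 |
| 45 | 421 | 50 | 558 | 60 | 824 | 65 | 875 |

$T = \sum X_1 + \sum X_2 + \sum X_3 + \sum X_4 = 45 + 50 + 60 + 65 = 220$.
Correction factor $= \dfrac{T^2}{N} = \dfrac{(220)^2}{20} = 2420$
Total Sum of Squares $(SST) = \sum X_1^2 + \sum X_2^2 + \sum X_3^2 + \sum X_4^2 - \dfrac{T^2}{N} = 421 + 558 + 824 + 875 - 2420 = 258$
Sum of the Squares Between Samples $(SSB) = \dfrac{(\sum X_1)^2}{n_1} + \dfrac{(\sum X_2)^2}{n_2} + \dfrac{(\sum X_3)^2}{n_3} + \dfrac{(\sum X_4)^2}{n_4} - \dfrac{T^2}{N}$
$= \dfrac{(45)^2}{5} + \dfrac{(50)^2}{5} + \dfrac{(60)^2}{5} + \dfrac{(65)^2}{5} - 2420 = 2470 - 2420 = 50$
$\nu_1 =$ Degrees of freedom $= k - 1 = 4 - 1 = 3$
Sum of the Squares Within Samples $(SSW) = SST - SSB = 258 - 50 = 208$
$\nu_2 =$ Degrees of freedom $= N - k = 20 - 4 = 16$
(ANOVA) Table

| Source of Variation | Sum of Squares (SS) | Degrees of Freedom (df) | Mean Square (MS) | Test Statistic |
|---|---|---|---|---|
| Between Samples | 50 | 3 | 16.67 | $F = \dfrac{16.67}{13} = 1.28$ |
| Within Samples | 208 | 16 | 13 | |
| Total | SST = 258 | N - 1 = 19 | | |

The calculated value of F (1.28) is less than the table value (3.24). Hence, the null hypothesis is accepted at the 5%
level of significance, and we conclude that the difference in the average values of the samples is not
significant, i.e. the samples have come from the same population.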
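
The shortcut (correction factor) computation used in this example can be sketched as follows. The function name `one_way_anova_shortcut` is an assumption made for illustration. The sketch works from totals and sums of squares only, so no deviations from the means are needed.

```python
def one_way_anova_shortcut(samples):
    """Shortcut method: sums of squares via the correction factor T^2 / N."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)               # N
    total = sum(sum(s) for s in samples)                 # T
    correction = total ** 2 / n_total                    # T^2 / N
    sst = sum(x * x for s in samples for x in s) - correction
    ssb = sum(sum(s) ** 2 / len(s) for s in samples) - correction
    ssw = sst - ssb                                      # SSW = SST - SSB
    msb = ssb / (k - 1)
    msw = ssw / (n_total - k)
    return {"CF": correction, "SST": sst, "SSB": ssb, "SSW": ssw,
            "MSB": msb, "MSW": msw, "F": msb / msw}


# Test scores for schools A, B, C and D from the example above.
schools = [
    [8, 10, 12, 8, 7],
    [12, 11, 9, 14, 4],
    [18, 12, 16, 6, 8],
    [13, 9, 12, 16, 15],
]
result = one_way_anova_shortcut(schools)
print(result)

# Table value quoted in the example: 3.24 for 3 and 16 d.f.
print("reject H0:", result["F"] > 3.24)
```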
Remark. If the values of the given observations are large (more than 30), we may reduce them
by subtracting a common constant from every observation or dividing every observation by a common factor.
Such data are known as coded data; the value of F is unaffected by this coding, as
explained in the following examples.
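
A minimal Python sketch of why coding is harmless: subtracting a constant leaves every sum of squares unchanged, and dividing by a constant scales SSB and SSW by the same factor, so the ratio F stays the same. The helper name `f_ratio` and the particular constants (20 and 2) are assumptions chosen for illustration; the data are the fertilizer yields from the earlier example.

```python
def f_ratio(samples):
    """F = MSB / MSW for a one-way layout, via the correction-factor formulas."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    correction = sum(sum(s) for s in samples) ** 2 / n_total
    sst = sum(x * x for s in samples for x in s) - correction
    ssb = sum(sum(s) ** 2 / len(s) for s in samples) - correction
    ssw = sst - ssb
    return (ssb / (k - 1)) / (ssw / (n_total - k))


original = [[25, 22, 24, 21], [20, 17, 16, 19], [24, 26, 30, 20]]
shifted = [[x - 20 for x in s] for s in original]   # change of origin
scaled = [[x / 2 for x in s] for s in original]     # change of scale

f_original = f_ratio(original)
f_shifted = f_ratio(shifted)
f_scaled = f_ratio(scaled)
print(f_original, f_shifted, f_scaled)
```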
Example. Three samples of five, four and five motor car tyres are drawn respectively from three
brands P, Q and R manufactured by three machines. The life-times of these tyres (in '000 miles)
are given below. Test whether the average life-times of the three brands are equal or not. (Given
$F_{.05} = 3.98$ for $\nu_1 = 2$, $\nu_2 = 11$.)

| P | Q | R |
|---|---|---|
| 45 | 41 | 44 |
| 42 | 40 | 42 |
| 43 | 42 | 38 |
| 44 | 43 | 43 |
| 42 | | 39 |

Solution. Let us take the null hypothesis that the average life-times of the three brands of tyres are
equal. Subtracting 40 from each of the given values, the coded data are given below:

| Sample I: $X_1$ | $X_1^2$ | Sample II: $X_2$ | $X_2^2$ | Sample III: $X_3$ | $X_3^2$ |
|---|---|---|---|---|---|
| 5 | 25 | 1 | 1 | 4 | 16 |
| 2 | 4 | 0 | 0 | 2 | 4 |
| 3 | 9 | 2 | 4 | -2 | 4 |
| 4 | 16 | 3 | 9 | 3 | 9 |
| 2 | 4 | | | -1 | 1 |
| $\sum X_1 = 16$ | $\sum X_1^2 = 58$ | $\sum X_2 = 6$ | $\sum X_2^2 = 14$ | $\sum X_3 = 6$ | $\sum X_3^2 = 34$ |

$T =$ Sum of all the values of the three samples $= \sum X_1 + \sum X_2 + \sum X_3 = 16 + 6 + 6 = 28$.
Correction factor $= \dfrac{T^2}{N} = \dfrac{(28)^2}{14} = 56$
Total Sum of Squares $(SST) = \sum X_1^2 + \sum X_2^2 + \sum X_3^2 - \dfrac{T^2}{N} = 58 + 14 + 34 - 56 = 50$
Sum of the Squares Between Samples $(SSB) = \dfrac{(\sum X_1)^2}{n_1} + \dfrac{(\sum X_2)^2}{n_2} + \dfrac{(\sum X_3)^2}{n_3} - \dfrac{T^2}{N}$
$= \dfrac{(16)^2}{5} + \dfrac{(6)^2}{4} + \dfrac{(6)^2}{5} - 56 = 51.2 + 9 + 7.2 - 56 = 11.4$
$\nu_1 =$ Degrees of freedom $= k - 1 = 3 - 1 = 2$
Sum of the Squares Within Samples $(SSW) = SST - SSB = 50 - 11.4 = 38.6$
$\nu_2 =$ Degrees of freedom $= N - k = 14 - 3 = 11$
$MSB = \dfrac{SSB}{\nu_1} = \dfrac{11.4}{2} = 5.7$ and $MSW = \dfrac{SSW}{\nu_2} = \dfrac{38.6}{11} = 3.51$
(ANOVA) Table

| Source of Variation | Sum of Squares (SS) | Degrees of Freedom (df) | Mean Square (MS) | Test Statistic |
|---|---|---|---|---|
| Between Samples | 11.4 | 2 | 5.7 | $F = \dfrac{5.7}{3.51} = 1.624$ |
| Within Samples | 38.6 | 11 | 3.51 | |
| Total | SST = 50 | N - 1 = 13 | | |

The calculated value of F (1.624) is less than the table value (3.98). Hence, the null hypothesis is accepted at the 5%
level of significance, and we conclude that the average life-times of the three brands of tyres are
equal.
Example. The Amrit Merchandising Co. wishes to test whether its three salesmen A, B and C
tend to make sales of the same size or whether they differ in their selling ability as measured
by the average size of their sales. During the last week there were 14 sales calls: A made
5 calls, B made 4 calls and C made 5 calls. The weekly sales records (in Rs.) of
the three salesmen are as follows:

| Salesman | | | | | |
|---|---|---|---|---|---|
| A | 300 | 400 | 300 | 500 | 0 |
| B | 600 | 300 | 300 | 400 | … |
| C | 700 | 300 | 400 | 600 | 500 |

Perform an analysis of variance and draw your conclusion. (Given $F_{.05} = 3.98$ for $\nu_1 = 2$, $\nu_2 = 11$.)
Solution. Let us take the null hypothesis that the three salesmen tend to make average sales of
the same size. Dividing each observation by the common factor 100, the coded data are given
below:

| Sample I: $X_1$ | $X_1^2$ | Sample II: $X_2$ | $X_2^2$ | Sample III: $X_3$ | $X_3^2$ |
|---|---|---|---|---|---|
| 3 | 9 | 6 | 36 | 7 | 49 |
| 4 | 16 | 3 | 9 | 3 | 9 |
| 3 | 9 | 3 | 9 | 4 | 16 |
| 5 | 25 | 4 | 16 | 6 | 36 |
| 0 | 0 | … | … | 5 | 25 |
| $\sum X_1 = 15$ | $\sum X_1^2 = 59$ | $\sum X_2 = 16$ | $\sum X_2^2 = 70$ | $\sum X_3 = 25$ | $\sum X_3^2 = 135$ |

$T =$ Sum of all the values of the three samples $= \sum X_1 + \sum X_2 + \sum X_3 = 15 + 16 + 25 = 56$.
Correction factor $= \dfrac{T^2}{N} = \dfrac{(56)^2}{14} = 224$
Total Sum of Squares $(SST) = \sum X_1^2 + \sum X_2^2 + \sum X_3^2 - \dfrac{T^2}{N} = 59 + 70 + 135 - 224 = 40$
Sum of the Squares Between Samples $(SSB) = \dfrac{(\sum X_1)^2}{n_1} + \dfrac{(\sum X_2)^2}{n_2} + \dfrac{(\sum X_3)^2}{n_3} - \dfrac{T^2}{N}$
$= \dfrac{(15)^2}{5} + \dfrac{(16)^2}{4} + \dfrac{(25)^2}{5} - 224 = 45 + 64 + 125 - 224 = 10$
$\nu_1 =$ Degrees of freedom $= k - 1 = 3 - 1 = 2$
Sum of the Squares Within Samples $(SSW) = SST - SSB = 40 - 10 = 30$
$\nu_2 =$ Degrees of freedom $= N - k = 14 - 3 = 11$
$MSB = \dfrac{SSB}{\nu_1} = \dfrac{10}{2} = 5$ and $MSW = \dfrac{SSW}{\nu_2} = \dfrac{30}{11} = 2.73$
(ANOVA) Table

| Source of Variation | Sum of Squares (SS) | Degrees of Freedom (df) | Mean Square (MS) | Test Statistic |
|---|---|---|---|---|
| Between Samples | 10 | 2 | 5 | $F = \dfrac{5}{2.73} = 1.83$ |
| Within Samples | 30 | 11 | 2.73 | |
| Total | SST = 40 | N - 1 = 13 | | |

The calculated value of F (1.83) is less than the table value (3.98). Hence, the null hypothesis is accepted at the 5%
level of significance, and we conclude that the three salesmen tend to make sales of the same size.
Example. An experimenter wishes to study the effect of four fertilizers on the yield of a crop.
He divided the field into 24 plots and assigned each fertilizer at random to 6 plots. Part of
his calculations is shown below:

| Source | d.f. | SS | MS | F | $F_{.05}$ |
|---|---|---|---|---|---|
| Between groups | … | 2940 | … | … | 3.10 |
| Within groups | … | … | … | | |
| Total | … | 6212 | | | |

(i) Complete the above table by filling in the values marked …
(ii) Test at the 5% level of significance whether the fertilizers differ significantly.

Solution. (i) Here $N =$ total number of observations $= 24$ and $k =$ number of samples $= 4$.
Therefore, total d.f. $= N - 1 = 24 - 1 = 23$, d.f. (between groups) $= k - 1 = 4 - 1 = 3$ and
d.f. (within groups) $= N - k = 24 - 4 = 20$.
SSB = Sum of the squares between groups = 2940
SSW = Sum of the squares within groups = SST - SSB = 6212 - 2940 = 3272
$MSB = \dfrac{2940}{3} = 980$, $MSW = \dfrac{3272}{20} = 163.6$ and $F = \dfrac{980}{163.6} = 5.99$
Complete ANOVA table

| Source | d.f. | SS | MS | F | $F_{.05}$ |
|---|---|---|---|---|---|
| Between groups | 3 | 2940 | 980 | 5.99 | 3.10 |
| Within groups | 20 | 3272 | 163.6 | | |
| Total | 23 | 6212 | | | |

(ii) The calculated value of F is 5.99, which is greater than the table value
3.10 of F at the 5% level of significance with $\nu_1 = 3$ and $\nu_2 = 20$. Hence, we conclude that the
fertilizers differ significantly.
Exercise
1. The three samples below have been obtained from normal populations with equal variances.
Test the hypothesis that the population means are equal.

| Sample I | Sample II | Sample III |
|---|---|---|
| 8 | 7 | 12 |
| 10 | 5 | 9 |
| 7 | 10 | 13 |
| 14 | 9 | 12 |
| 11 | 9 | 14 |

The table value of $F_{.05} = 3.88$ for $\nu_1 = 2$, $\nu_2 = 12$.

2. The following table gives the yields on 15 sample plots under three varieties of seeds. Find
out whether the average yields of land under the different varieties show a significant difference:

| Sample A | Sample B | Sample C |
|---|---|---|
| 20 | 18 | 25 |
| 21 | 20 | 28 |
| 23 | 17 | 22 |
| 16 | 15 | 28 |
| 20 | 22 | 32 |

The table value of $F_{.05} = 3.88$ for $\nu_1 = 2$, $\nu_2 = 12$.

3. There are three main brands of a certain powder. A set of its 120 sales is examined and
found to be allocated among four groups A, B, C and D and brands I, II and III as shown
below:

| Brands | Group A | Group B | Group C | Group D |
|---|---|---|---|---|
| I | 0 | 4 | 8 | 10 |
| II | 5 | 8 | 13 | 6 |
| III | 18 | 19 | 11 | 13 |

Is there any significant difference in the performance of the brands? The table value of $F_{.05} = 4.26$ for
$\nu_1 = 2$, $\nu_2 = 9$.

Two-Way Classification
In two-way classification the statistical data are classified according to two different factors or
criteria. For example:
1. The yield of a crop in several plots of land may be classified according to different
varieties of seeds and fertilizers.
2. The petrol mileage may be affected by the type of car driven, the way it is driven and the
road conditions.
3. The sales of cosmetics may be affected by the point-of-sale display, the price
charged, the size or location of the store and the number of competing products sold by the
store.

ANOVA table

| Source of variation | Sum of Squares | d.f. | Mean Sum of Squares | F |
|---|---|---|---|---|
| Between columns | SSC | $(c-1)$ | $MSC = \dfrac{SSC}{(c-1)}$ | $\dfrac{MSC}{MSE}$ |
| Between rows | SSR | $(r-1)$ | $MSR = \dfrac{SSR}{(r-1)}$ | $\dfrac{MSR}{MSE}$ |
| Residual or error | SSE | $(c-1)(r-1)$ | $MSE = \dfrac{SSE}{(c-1)(r-1)}$ | |
| Total | SST | $(N-1)$ | | |

where $c$ is the number of columns, $r$ is the number of rows and
SSC = Sum of the squares between columns
SSR = Sum of the squares between rows
SSE = Sum of the squares due to error
SST = Total sum of squares
Example. A company appoints four salesmen A, B, C and D and observes their sales in three
seasons: summer, winter and monsoon. The figures (in lakhs) are given in the following
table:

| Seasons | A | B | C | D | Total |
|---|---|---|---|---|---|
| Summer | 36 | 36 | 21 | 35 | 128 |
| Winter | 28 | 29 | 31 | 32 | 120 |
| Monsoon | 26 | 28 | 29 | 29 | 112 |
| Total | 90 | 93 | 81 | 96 | 360 |

Carry out an analysis of variance.

Solution. Let us take the null hypotheses that there is no significant difference between the
salesmen and no significant difference between the seasons. To simplify the calculations let us subtract 30 from each value. The coded
data are given below:

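| Seasons | A | B | C | D | Total |
|---|---|---|---|---|---|
| Summer | 6 | 6 | -9 | 5 | 8 |
| Winter | -2 | -1 | 1 | 2 | 0 |
| Monsoon | -4 | -2 | -1 | -1 | -8 |
| Total | 0 | 3 | -9 | 6 | 0 |

Correction factor $= \dfrac{T^2}{N} = \dfrac{(0)^2}{12} = 0$
Sum of the Squares Between Salesmen $(SSC) = \dfrac{(\sum X_1)^2}{n_1} + \dfrac{(\sum X_2)^2}{n_2} + \dfrac{(\sum X_3)^2}{n_3} + \dfrac{(\sum X_4)^2}{n_4} - \dfrac{T^2}{N}$
$= \dfrac{(0)^2}{3} + \dfrac{(3)^2}{3} + \dfrac{(-9)^2}{3} + \dfrac{(6)^2}{3} - 0 = 42$
d.f. $= c - 1 = 4 - 1 = 3$
Sum of the Squares Between Seasons $(SSR) = \dfrac{(\sum X_1)^2}{n_1} + \dfrac{(\sum X_2)^2}{n_2} + \dfrac{(\sum X_3)^2}{n_3} - \dfrac{T^2}{N}$, where the sums are now the row totals,
$= \dfrac{(8)^2}{4} + \dfrac{(0)^2}{4} + \dfrac{(-8)^2}{4} - 0 = 32$
d.f. $= r - 1 = 3 - 1 = 2$
Total Sum of Squares (SST) = sum of the squares of all items in the table minus the
correction factor = 36 + 4 + 16 + 36 + 1 + 4 + 81 + 1 + 1 + 25 + 4 + 1 - 0 = 210
d.f. $= N - 1 = 12 - 1 = 11$
$SSE = SST - (SSC + SSR) = 210 - 74 = 136$
d.f. $= (c - 1)(r - 1) = 6$.
The above information can be presented in the following ANOVA table:

| Source of variation | Sum of Squares | d.f. | Mean Sum of Squares | F |
|---|---|---|---|---|
| Between columns | 42 | 3 | 14 | 1.62 |
| Between rows | 32 | 2 | 16 | 1.42 |
| Residual or error | 136 | 6 | 22.67 | |
| Total | 210 | 11 | | |

Since the residual mean square exceeds both MSC and MSR here, each F ratio in the table is formed with the
larger mean square (MSE) in the numerator: $F = \dfrac{22.67}{14} = 1.62$ for columns and $F = \dfrac{22.67}{16} = 1.42$ for rows.
The table value is $F_{.05} = 8.94$ for $\nu_1 = 6$, $\nu_2 = 3$. The calculated value of F (1.62) is less than the table value, so we conclude
that the sales of the different salesmen do not differ significantly.
The table value is $F_{.05} = 19.33$ for $\nu_1 = 6$, $\nu_2 = 2$. The calculated value of F (1.42) is less than the table value, so we conclude
that there is no significant difference between the seasons as far as the sales are concerned.
Thus, the test shows that the salesmen and the seasons are alike so far as the sales are concerned.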
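
The two-way sums of squares in this example can be verified with a short Python sketch. The function name `two_way_anova` is an assumption introduced here. The sketch is applied to the uncoded sales figures to show that they lead to the same sums of squares as the coded data. Because every mean square below MSE gives a ratio under 1, the last lines also print the inverted ratios MSE/MSC and MSE/MSR that the example reports.

```python
def two_way_anova(table):
    """Two-way classification with one observation per cell.

    `table` is a list of rows; each row lists the values for the columns.
    """
    r = len(table)                 # number of rows
    c = len(table[0])              # number of columns
    n = r * c                      # N
    total = sum(sum(row) for row in table)               # T
    correction = total ** 2 / n                          # T^2 / N
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(c)]
    sst = sum(x * x for row in table for x in row) - correction
    ssr = sum(t * t for t in row_totals) / c - correction   # each row has c items
    ssc = sum(t * t for t in col_totals) / r - correction   # each column has r items
    sse = sst - ssc - ssr
    return {
        "SSC": ssc, "SSR": ssr, "SSE": sse, "SST": sst,
        "MSC": ssc / (c - 1),
        "MSR": ssr / (r - 1),
        "MSE": sse / ((c - 1) * (r - 1)),
    }


# Rows: summer, winter, monsoon. Columns: salesmen A, B, C, D (uncoded figures).
sales = [[36, 36, 21, 35], [28, 29, 31, 32], [26, 28, 29, 29]]
res = two_way_anova(sales)
print(res)

# Ratios as reported in the example (larger mean square in the numerator).
f_columns = res["MSE"] / res["MSC"]
f_rows = res["MSE"] / res["MSR"]
print(round(f_columns, 2), round(f_rows, 2))
```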
Example. The following data represent the number of units of production per day turned out
by 5 different workers using 4 different types of machines:

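| Workers | Machine A | Machine B | Machine C | Machine D |
|---|---|---|---|---|
| I | 44 | 38 | 47 | 36 |
| II | 46 | 40 | 52 | 43 |
| III | 34 | 36 | 44 | 32 |
| IV | 43 | 38 | 46 | 33 |
| V | 38 | 42 | 49 | 39 |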

(a) Test whether the mean productivity is the same for the different machines.
(b) Test whether the 5 men differ with respect to mean productivity. (Given $F_{.05} = 3.49$ for
$\nu_1 = 3$, $\nu_2 = 12$ and $F_{.05} = 3.26$ for $\nu_1 = 4$, $\nu_2 = 12$.)

Solution. Let us take the null hypotheses that (a) the mean productivity is the same for the four
different machines and (b) the 5 men do not differ with respect to mean productivity.
To simplify the calculations let us subtract 40 from each value. The coded data are given below:

| Workers | Machine A | Machine B | Machine C | Machine D | Total |
|---|---|---|---|---|---|
| I | 4 | -2 | 7 | -4 | 5 |
| II | 6 | 0 | 12 | 3 | 21 |
| III | -6 | -4 | 4 | -8 | -14 |
| IV | 3 | -2 | 6 | -7 | 0 |
| V | -2 | 2 | 9 | -1 | 8 |
| Total | 5 | -6 | 38 | -17 | 20 |

Correction factor $= \dfrac{T^2}{N} = \dfrac{(20)^2}{20} = 20$
Sum of the Squares Between Machines $= \dfrac{(5)^2}{5} + \dfrac{(-6)^2}{5} + \dfrac{(38)^2}{5} + \dfrac{(-17)^2}{5} - 20 = 338.8$
d.f. $= 4 - 1 = 3$
Sum of the Squares Between Workers $= \dfrac{(5)^2}{4} + \dfrac{(21)^2}{4} + \dfrac{(-14)^2}{4} + \dfrac{(0)^2}{4} + \dfrac{(8)^2}{4} - 20 = 161.5$
d.f. $= 5 - 1 = 4$
Total Sum of Squares (SST) = sum of the squares of all items in the table minus the
correction factor = 594 - 20 = 574
d.f. $= N - 1 = 20 - 1 = 19$
$SSE = SST - (SSC + SSR) = 574 - (338.8 + 161.5) = 73.7$
d.f. $= 3 \times 4 = 12$.
The above information can be presented in the following ANOVA table:

| Source of variation | Sum of Squares | d.f. | Mean Sum of Squares | F |
|---|---|---|---|---|
| Between Machines | 338.8 | 3 | 112.933 | 18.39 |
| Between Workers | 161.5 | 4 | 40.375 | 6.57 |
| Residual or error | 73.7 | 12 | 6.142 | |
| Total | 574 | 19 | | |

The calculated value of F (18.39) is greater than the table value $F_{.05} = 3.49$ for $\nu_1 = 3$, $\nu_2 = 12$, so we
conclude that the mean productivity is not the same for the four different types of machines.
The calculated value of F (6.57) is greater than the table value $F_{.05} = 3.26$ for $\nu_1 = 4$, $\nu_2 = 12$, so we
conclude that the workers differ with respect to their mean productivity.
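
As a check on this example, the following Python sketch recomputes the sums of squares and both F ratios from the coded values used in the solution (each observation minus 40). The variable names are assumptions made for illustration, and the critical values 3.49 and 3.26 are simply the table values quoted in the example, not computed here.

```python
# Coded data (value - 40): rows are workers I..V, columns are machines A..D.
coded = [
    [4, -2, 7, -4],
    [6, 0, 12, 3],
    [-6, -4, 4, -8],
    [3, -2, 6, -7],
    [-2, 2, 9, -1],
]

r = len(coded)                       # 5 workers
c = len(coded[0])                    # 4 machines
n = r * c
total = sum(sum(row) for row in coded)
correction = total ** 2 / n          # T^2 / N

machine_totals = [sum(row[j] for row in coded) for j in range(c)]
worker_totals = [sum(row) for row in coded]

# Each machine total comes from r = 5 items, each worker total from c = 4 items.
ss_machines = sum(t * t for t in machine_totals) / r - correction
ss_workers = sum(t * t for t in worker_totals) / c - correction
sst = sum(x * x for row in coded for x in row) - correction
sse = sst - ss_machines - ss_workers

mse = sse / ((c - 1) * (r - 1))
f_machines = (ss_machines / (c - 1)) / mse
f_workers = (ss_workers / (r - 1)) / mse

print(ss_machines, ss_workers, sst, sse)
print(round(f_machines, 2), round(f_workers, 2))
print("machines differ:", f_machines > 3.49)
print("workers differ:", f_workers > 3.26)
```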
