Unit-3 ANOVA STATS
Structure
3.0 Introduction
3.1 Objectives
3.2 Analysis of Variance
3.2.1 Meaning of the Variance
3.2.2 Characteristics of Variance
3.2.3 The Procedure of Analysis of Variance (ANOVA)
3.2.4 Steps of One Way Analysis of Variance
3.2.5 Assumptions Underlying Analysis of Variance
3.2.6 Relationship between F test and t test
3.2.7 Merits or Advantages of Analysis of Variance
3.2.8 Demerits or Limitations of Analysis of Variance
3.0 INTRODUCTION
In the foregoing unit you learned how to test the significance of a mean obtained
from observations taken on a group of persons, and how to test the significance of
the difference between two means. The test of significance of the difference between
two means is no doubt a very important technique of inferential statistics: it is used
to test the null hypothesis scientifically and helps in drawing concrete conclusions.
But its scope is very limited. It is applicable only to two sets of scores, that is, to
scores obtained from two samples taken from a single population or from two
different populations.
Now imagine that we have to compare the means of more than two populations or
groups. Can we successfully apply the Critical Ratio (C.R.) test or the t test? The
answer is yes, but it is not convenient to do so. The reason can be shown with an
example. Suppose we have three groups A, B and C and we want to test the
significance of the differences among the means of the three groups. We first have to
form pairs of groups, i.e. A and B, B and C, and A and C, and then apply the C.R.
test or the t test to each pair as the conditions require. In such a case we have to
calculate three C.R. or t values instead of one.
Now suppose we have eight groups and want to compare the differences among
their means. In that case we have to calculate 28 C.R. or t values, as the conditions
may require.
It means that when there are more than two groups, say 3, 4, 5, ..., k, the ‘C.R.’ or
‘t’ test of significance cannot be applied conveniently.
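The counts of 3 and 28 comparisons follow from the number of distinct pairs that can be formed from k groups, k(k − 1)/2. A minimal sketch of that count is given below; the function name `pairwise_tests` is only illustrative and is not part of the unit.

```python
from math import comb


def pairwise_tests(k: int) -> int:
    """Number of separate two-group comparisons among k groups."""
    # comb(k, 2) is the same as k * (k - 1) // 2
    return comb(k, 2)


for k in (3, 4, 8):
    print(k, "groups need", pairwise_tests(k), "C.R. or t values")
```

The number of separate tests grows roughly with the square of the number of groups, which is why a single overall test becomes attractive.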
Further, the ‘C.R.’ or ‘t’ test of significance considers only the means of the two
groups and tests the significance of the difference between those two means. It is
not concerned with the variance that exists in the scores of the two groups, that is,
the variation of the scores about the mean of each group.
For example, suppose a reaction time test was given to 5 boys and 5 girls of the
age group 15+ years. The scores obtained, in milliseconds, are given in the table
below.
Group    Scores (M. Sec.)       Sum    Mean
Girls    15   20    5   10   35     85     17 M. Sec.
Boys     20   15   20   20   10     85     17 M. Sec.
From the mean values shown in the table we can say that the two groups are equal
in their reaction time and that the average reaction time is 17 M. Sec. In this example,
if we apply the ‘t’ test of significance, we will find the difference between the two
means insignificant, and our null hypothesis is retained.
But if we look carefully at the individual reaction time scores of the boys and girls,
we will find that there is a difference between the two groups. The group of girls is
very heterogeneous in reaction time in comparison to the boys.
The girls' scores range from 5 to 35 M. Sec., and their deviations from the mean
vary from −12 M. Sec. to +18 M. Sec.
The group of boys is more homogeneous in reaction time: their scores range only
from 10 to 20 M. Sec., and their deviations from the mean vary from −7 M. Sec. to
+3 M. Sec. The boys are therefore much more consistent in their reaction time than
the girls.
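The contrast between equal means and unequal spread can be checked directly. The sketch below uses the ten scores from the table; the helper name `describe` is hypothetical, and the variance is computed with N in the denominator.

```python
from statistics import mean, pvariance

girls = [15, 20, 5, 10, 35]   # reaction times in M. Sec.
boys = [20, 15, 20, 20, 10]


def describe(scores):
    """Mean, range, deviations from the mean and variance (divisor N)."""
    m = mean(scores)
    return {
        "mean": m,
        "range": max(scores) - min(scores),
        "deviations": [x - m for x in scores],
        "variance": pvariance(scores),
    }


for name, scores in (("Girls", girls), ("Boys", boys)):
    print(name, describe(scores))
```

Both groups have a mean of 17, but the variance of the girls' scores is several times that of the boys' scores. A comparison of means alone cannot show this.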
From this example you can see that the test of significance of the difference between
two means may sometimes lead us to a wrong conclusion: we may wrongly retain
the null hypothesis although the groups do differ in reality.
Therefore, when we have more than two groups, say three, four or more, the ‘C.R.’
or ‘t’ test of significance is not very useful. In such a condition the ‘F’ test is more
suitable. It is known as one way analysis of variance, because we test the significance
of the difference in the average variance that exists between two or more groups,
instead of testing the significance of the difference between the means of the groups
directly.
In this unit we will be dealing with F test or the analysis of variance.
3.1 OBJECTIVES
After going through this unit, you will be able to:
• Define variance;
• Differentiate between variance and standard deviation;
• Define analysis of variance;
• Explain when to use the analysis of variance;
• Describe the process of analysis of variance;
• Apply analysis of variance to obtain the ‘F’ ratio and to solve related problems;
• Analyse inferences after obtaining the value of the ‘F’ ratio;
• Elucidate the assumptions of analysis of variance;
• List out the precautions to be taken while using analysis of variance; and
• Consult the ‘F’ table correctly and interpret the results.
The technique of analysis of variance was first devised by Sir Ronald Fisher, an
English statistician who is also known as the father of modern statistics as applied
to social and behavioural sciences. It was first reported in 1923 and its early
applications were in the field of agriculture. Since then it has found wide application
in many areas of experimentation.
In the study of sampling theory, some of the results may be somewhat more simply
interpreted if the variance of a sample is defined as the sum of the squares of the
deviations divided by its degrees of freedom (N − 1), rather than as the mean of the
squared deviations.
The variance is the most important measure of variability of a group. It is simply the
square of the S.D. of the group, but its nature is quite different from that of the
standard deviation, though the formula for computing the variance is the same as that
for the standard deviation (S.D.), without the final square root:
$$\therefore \text{Variance} = (\text{S.D.})^2 \quad \text{or} \quad \sigma^2 = \frac{\sum (X - M)^2}{N}$$
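The two definitions of variance mentioned above (divisor N and divisor N − 1) can be compared on the same data. The sketch below is only an illustration: it reuses the girls' reaction times from the introduction and relies on Python's standard `statistics` module.

```python
from statistics import pstdev, pvariance, variance

scores = [15, 20, 5, 10, 35]   # girls' reaction times (M. Sec.) from the introduction
n = len(scores)

var_n = pvariance(scores)      # sum of squared deviations divided by N
var_n1 = variance(scores)      # sum of squared deviations divided by N - 1
sd = pstdev(scores)            # standard deviation with divisor N

print("variance (N):    ", var_n)
print("variance (N - 1):", var_n1)
print("S.D. squared:    ", sd ** 2)
```

The variance with divisor N equals the square of the S.D., and the N − 1 version is larger by the factor N/(N − 1).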
4) What do you mean by Analysis of Variance? Why is it preferred to the ‘t’ test
for determining the significance of the difference among means?
To test the difference among the means, i.e. $M_A$, $M_B$ and $M_C$, one way analysis
of variance is used. To apply one way analysis of variance, the following steps are to
be followed:
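Step 1 Correction Term $C_x = \frac{(\sum x)^2}{N} = \frac{(\sum x_a + \sum x_b + \sum x_c)^2}{n_1 + n_2 + n_3}$
Step 2 Sum of Squares of Total $SS_T = \sum x^2 - \frac{(\sum x)^2}{N} = (\sum x_a^2 + \sum x_b^2 + \sum x_c^2) - \frac{(\sum x)^2}{N}$
Step 3 Sum of Squares Among the Groups $SS_A = \frac{(\sum x_a)^2}{n_1} + \frac{(\sum x_b)^2}{n_2} + \frac{(\sum x_c)^2}{n_3} - \frac{(\sum x)^2}{N}$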
Step 4 Sum of Squares Within the Groups $SS_W = SS_T - SS_A$
Step 5 Mean Sum of Squares Among the Groups $MSS_A = \frac{SS_A}{k - 1}$
Where k = number of groups.
Step 6 Mean Sum of Squares Within the Groups $MSS_W = \frac{SS_W}{N - k}$
Where N = Total number of units.
Step 7 F Ratio i.e. $F = \frac{MSS_A}{MSS_W}$
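The seven steps translate almost line for line into code. The sketch below is a minimal illustration, not a replacement for the hand calculation: the function name `one_way_anova` and the dictionary keys are assumptions made here, and the scores are those of Table 3.2.3 in Example 1 of this unit.

```python
def one_way_anova(groups):
    """Return the quantities of Steps 1 to 7 for a list of score lists."""
    k = len(groups)                                        # number of groups
    N = sum(len(g) for g in groups)                        # total number of units
    total = sum(sum(g) for g in groups)                    # grand sum of scores

    cx = total ** 2 / N                                    # Step 1: correction term
    sst = sum(x * x for g in groups for x in g) - cx       # Step 2: total SS
    ssa = sum(sum(g) ** 2 / len(g) for g in groups) - cx   # Step 3: SS among groups
    ssw = sst - ssa                                        # Step 4: SS within groups
    mssa = ssa / (k - 1)                                   # Step 5
    mssw = ssw / (N - k)                                   # Step 6
    return {
        "Cx": cx, "SST": sst, "SSA": ssa, "SSW": ssw,
        "df1": k - 1, "df2": N - k,
        "MSSA": mssa, "MSSW": mssw,
        "F": mssa / mssw,                                  # Step 7
    }


arts = [15, 14, 11, 12, 10]
commerce = [12, 14, 10, 13, 11]
science = [12, 15, 14, 10, 10]

result = one_way_anova([arts, commerce, science])
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

The function only produces the F ratio and its two degrees of freedom. Whether that F is significant still has to be decided by consulting the F table.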
The obtained F ratio in the summary table furnishes a comprehensive, overall test
of the significance of the differences among the means of the groups. A significant F
does not tell us which means differ significantly from the others.
If the F ratio is not significant, the differences among the means are insignificant. The
observed differences in the means are then due to chance factors or sampling
fluctuations.
To decide whether the obtained F ratio is significant or not, we take the help of the
F table given in a statistics book.
The obtained F ratio is compared with the F value given in the table, keeping in mind
two degrees of freedom: k − 1, which is also known as the greater degree of freedom
or df1, and N − k, which is known as the smaller degree of freedom or df2. Thus,
while testing the significance of the F ratio, two situations may arise.
The obtained F Ratio is Insignificant:
When the obtained F ratio is less than the value of F given in the F table for the
corresponding degrees of freedom, df1 = k − 1 and df2 = N − k (see the F table in
a statistics book), at the .05 level of significance, it is said to be not significant. Thus
the null hypothesis is retained. There is no reason for further testing, as none of the
mean differences will be significant.
The obtained F Ratio is Significant:
When the obtained F ratio is higher than the value of F given in the F table for its
corresponding df1 and df2 at the .05 or .01 level, it is said to be significant. In such
a condition the null hypothesis is rejected, and we have to proceed further to test the
separate differences between pairs of means by applying the ‘t’ test of significance.
This further procedure of testing the significance of the difference between two means
is known as the post-hoc test or post-ANOVA test of difference.
To have a clear understanding, go through the following worked examples very carefully.
Example 1
In a study of intelligence, 5 students each from the Arts, Commerce and Science
streams of class IX were selected by using a random method of sample selection.
An intelligence test was administered to them and the scores obtained are as under.
Determine whether the three groups differ in their level of intelligence.
Table 3.2.3
S.No.    Arts Group             Comm. Group            Science Group
         Intelligence scores    Intelligence scores    Intelligence scores
1        15                     12                     12
2        14                     14                     15
3        11                     10                     14
4        12                     13                     10
5        10                     11                     10
Null hypothesis H0 : μ1 = μ2 = μ3
i.e. the students of class IX studying in the Arts, Commerce and Science streams do
not differ in their level of intelligence.
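Thus
Table 3.2.4
         Arts Group    Comm. Group    Science Group
n        5             5              5
Mean     12.40         12.00          12.20
Step 1 : Correction term
$$C_x = \frac{(\sum x)^2}{N} = \frac{(\sum x_1 + \sum x_2 + \sum x_3 \ldots \sum x_k)^2}{n_1 + n_2 + n_3 \ldots n_k} = \frac{(62 + 60 + 61)^2}{5 + 5 + 5} = \frac{(183)^2}{15}$$
Or Cx = 2232.60
Step 2 : SST (Sum of squares of total) = $\sum x^2 - C_x$
$$\text{Or } SS_T = (\sum x_1^2 + \sum x_2^2 + \sum x_3^2 \ldots \sum x_k^2) - \frac{(\sum x)^2}{N}$$
= (786 + 730 + 765) – 2232.60
= 2281.00 – 2232.60
SST = 48.40
Step 3 : SSA (Sum of squares among the groups)
$$SS_A = \frac{(\sum x_1)^2}{n_1} + \frac{(\sum x_2)^2}{n_2} + \frac{(\sum x_3)^2}{n_3} + \ldots + \frac{(\sum x_k)^2}{n_k} - C_x$$
$$= \frac{(62)^2}{5} + \frac{(60)^2}{5} + \frac{(61)^2}{5} - 2232.60$$
= 2233.00 – 2232.60
Or SSA = 0.40
Step 4 : SSW (Sum of squares within the groups) = SST – SSA
Or = 48.40 – 0.40
SSW = 48.00
Step 5 : MSSA (Mean sum of squares among the groups)
$$MSS_A = \frac{SS_A}{k - 1} = \frac{0.40}{3 - 1} = \frac{0.40}{2}$$
Or MSSA = 0.20
Step 6 : MSSW (Mean sum of squares within the groups)
$$MSS_W = \frac{SS_W}{N - k} = \frac{48}{15 - 3} = \frac{48}{12}$$
MSSW = 4.00
Step 7 : F Ratio $= \frac{MSS_A}{MSS_W} = \frac{0.20}{4.00} = 0.05$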
From the F table (refer to a statistics book) for 2 and 12 df at the .05 level, the F
value is 3.89. Our calculated F value is 0.05, which is much lower than the F value
given in the table. Therefore the obtained F ratio is not significant at the .05 level of
significance for 2 and 12 df. Thus the null hypothesis (H0) is retained.
Interpretation of Results
Because the obtained F ratio is not significant at the .05 level (and hence not at the
.01 level either), the null hypothesis is retained. It can therefore be said that the
students studying in the Arts, Commerce and Science streams do not differ
significantly in their level of intelligence.
Example 2
An experimenter wanted to study the relative effects of four drugs on the physical
growth of rats. The experimenter took 20 rats of the same age and species and
randomly divided them into four groups of five rats each. The experimenter then gave
4 drops of the corresponding drug, as one dose, to each rat of the group concerned.
Physical growth was measured in terms of weight. The gain in weight after one month
of treatment is given below. Determine whether the drugs are effective for physical
growth. Find out whether the drugs are equally effective, and determine which drug
is more effective in comparison to the others.
Table 3.2.6 : Observations (Gain in weight in ounce)
Group A Group B Group C Group D
(Drug P) (Drug Q) (Drug R) (Drug S)
4 9 2 7
5 10 6 7
1 9 6 4
0 6 5 2
2 6 2 7
Null hypothesis H0 : μ1 = μ2 = μ3 = μ4
i.e. all the four drugs are equally effective for the physical growth of the rats.
Therefore:
Table 3.2.7
$$C_x = \frac{(\sum x)^2}{N} = \frac{(12 + 40 + 21 + 27)^2}{20} = \frac{(100)^2}{20}$$
Source of variance    df           SS       MSS    F Ratio
Among Groups          4 − 1 = 3    82.80
Here $S.E._{DM} = SD_W \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
i.e. $SD_W$ is the within-groups S.D. and n1 and n2 are the sizes of the samples or
groups being compared.
In the given example the means of the four groups A, B, C and D range from 2.40
ounces to 8.00 ounces, and the mean differences from 5.60 to 1.20. To determine the
significance of the difference between any two selected means we must compute the
‘t’ ratio by dividing the given mean difference by its S.E.DM. The resulting t is then
compared with the ‘t’ value given in the ‘t’ table (Table no. 2.5.1 of Unit 2), keeping
in view the df within the groups, i.e. dfW. In this way, for four groups we have to
calculate 6 ‘t’ values, as given below:
Step 6 : Standard deviation within the groups
$SD_W = \sqrt{MSS_W}$ = 2.08
Step 7 : Standard Error of Difference of Mean (S.E.DM)
$S.E._{DM} = SD_W \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$ = 1.31
(All the groups have the same size, therefore the value of S.E.DM will remain the
same for every pair of groups.)
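Steps 6 and 7, and the six ‘t’ ratios they lead to, can be sketched in code as a check on the hand calculation. This is only an illustration under stated assumptions: the scores are those of Table 3.2.6, and the names `groups`, `sd_w`, `t_ratio` and `t_values` are introduced here.

```python
from itertools import combinations
from math import sqrt

groups = {
    "A": [4, 5, 1, 0, 2],    # Drug P
    "B": [9, 10, 9, 6, 6],   # Drug Q
    "C": [2, 6, 6, 5, 2],    # Drug R
    "D": [7, 7, 4, 2, 7],    # Drug S
}

N = sum(len(g) for g in groups.values())   # total number of rats
k = len(groups)                            # number of groups

# Within-groups sum of squares, pooled over all four groups
ssw = sum(sum(x * x for x in g) - sum(g) ** 2 / len(g) for g in groups.values())
sd_w = sqrt(ssw / (N - k))                 # Step 6: SD within groups = sqrt(MSSW)


def t_ratio(a, b):
    """Absolute mean difference of two groups divided by S.E.DM (Step 7)."""
    ga, gb = groups[a], groups[b]
    se_dm = sd_w * sqrt(1 / len(ga) + 1 / len(gb))
    return abs(sum(ga) / len(ga) - sum(gb) / len(gb)) / se_dm


t_values = {f"{a} vs {b}": t_ratio(a, b) for a, b in combinations(groups, 2)}
for pair, t in t_values.items():
    print(f"Group {pair}: t = {t:.2f}  (df = {N - k})")
```

Because the code does not round intermediate values, a t ratio may differ from the hand-computed figure in the second decimal place. Each value is still compared with the tabled ‘t’ for 16 df.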
Step 8 : Comparison of the means of the various pairs of groups.
Group A vs B
$t = \frac{M_B - M_A}{S.E._{DM}} = \frac{8.00 - 2.40}{1.31} = \frac{5.60}{1.31} = 4.27$ (Significant at .01 level for 16 df).
Group A vs C
$t = \frac{4.20 - 2.40}{1.31} = \frac{1.80}{1.31} = 1.37$ (Insignificant at .05 level for 16 df).
Group A vs D
$$C_x = \frac{(\sum x)^2}{N} \quad \text{Or} \quad C_x = \frac{(\sum x_1 + \sum x_2 + \sum x_3 + \ldots + \sum x_k)^2}{n_1 + n_2 + n_3 + \ldots + n_k}$$
Step 7 : Calculate the total sum of squares, i.e. SST, by using the formula
$SS_T = \sum x^2 - C_x$
Step 8 : Calculate the sum of squares among the groups, i.e. SSA, by using the formula
$SS_A = \sum \frac{(\sum x)^2}{n} - C_x$
Or $SS_A = \frac{(\sum x_1)^2}{n_1} + \frac{(\sum x_2)^2}{n_2} + \frac{(\sum x_3)^2}{n_3} + \ldots + \frac{(\sum x_k)^2}{n_k} - C_x$
Step 9 : Calculate the sum of squares within the groups, i.e. SSW, by using the formula
$SS_W = SS_T - SS_A$
Step 10 : Calculate the degrees of freedom as
Greater degree of freedom df1 = k − 1 (where k is the number of groups)
Smaller degree of freedom df2 = N − k (where N is the total number of units in all the groups)
Step 11 : Find the values of the mean sums of squares of the two variances as
Mean sum of squares among the groups $MSS_A = \frac{SS_A}{k - 1}$
Mean sum of squares within the groups $MSS_W = \frac{SS_W}{N - k}$
Step 12 : Prepare summary table of analysis of variance as shown in 3.2.5 or 3.2.8.
Step 13 : Evaluate obtained F Ratio with the F ratio value given in F table (Table
no. 3.3.1) keeping in mind df1 and df2.
Step 14 : Retain or Reject the Null Hypothesis framed as in step no-I.
Step 15 : If the F ratio is found insignificant and the null hypothesis is retained, stop
further calculation and interpret the results accordingly. If the F ratio is found
significant and the null hypothesis is rejected, go for further calculations: use post-hoc
comparison, find the t values and interpret the results accordingly.
3.2.5 Assumptions Underlying the Analysis of Variance
The method of analysis of variance rests on a number of assumptions. Failure of the
observations or data to satisfy these assumptions leads to invalid inferences. The
following are the main assumptions of analysis of variance.
The distribution of the dependent variable in the population under study is normal.
There exists homogeneity of variance, i.e. the variances of the different sets of scores
do not differ beyond chance; in other words $\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \ldots = \sigma_k^2$.
The samples of the different groups are selected from the population by using a
random method of sample selection.
There is no significant difference in the means of various samples or groups taken
from a population.
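The homogeneity assumption can be inspected informally before the analysis is run. The sketch below is an illustration added here, not a procedure described in this unit. It computes the variance of each group in Table 3.2.3 (divisor n − 1) and the ratio of the largest to the smallest variance, the statistic used in Hartley's F-max test. The names `group_variances` and `f_max` are assumptions.

```python
from statistics import variance

groups = {
    "Arts": [15, 14, 11, 12, 10],
    "Commerce": [12, 14, 10, 13, 11],
    "Science": [12, 15, 14, 10, 10],
}

# Sample variance (divisor n - 1) of each group
group_variances = {name: variance(scores) for name, scores in groups.items()}

# Ratio of the largest to the smallest group variance
f_max = max(group_variances.values()) / min(group_variances.values())

for name, v in group_variances.items():
    print(f"{name}: variance = {v:.2f}")
print(f"largest / smallest variance = {f_max:.2f}")
```

A ratio close to 1 is consistent with homogeneity of variance. Judging whether a larger ratio is beyond chance would need a table of critical F-max values, which is not reproduced here.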
$F = t^2$ or $t = \sqrt{F}$
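The relationship can be verified numerically when only two groups are compared. The sketch below assumes the ‘t’ ratio for two independent groups with a pooled variance, and uses the scores of Drug P and Drug Q from Table 3.2.6. The helper `ss` and the variable names are illustrative only.

```python
from math import sqrt

drug_p = [4, 5, 1, 0, 2]
drug_q = [9, 10, 9, 6, 6]


def ss(scores):
    """Sum of squared deviations of the scores from their own mean."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores)


n1, n2 = len(drug_p), len(drug_q)
m1, m2 = sum(drug_p) / n1, sum(drug_q) / n2

# Pooled within-groups variance; this is MSSW when k = 2
pooled_var = (ss(drug_p) + ss(drug_q)) / (n1 + n2 - 2)

# 't' ratio for two independent groups, df = n1 + n2 - 2
t = (m2 - m1) / sqrt(pooled_var * (1 / n1 + 1 / n2))

# 'F' ratio from one way ANOVA on the same two groups, df = 1 and n1 + n2 - 2
grand_mean = (sum(drug_p) + sum(drug_q)) / (n1 + n2)
ssa = n1 * (m1 - grand_mean) ** 2 + n2 * (m2 - grand_mean) ** 2
F = (ssa / (2 - 1)) / pooled_var

print(f"t = {t:.4f}, t squared = {t ** 2:.4f}, F = {F:.4f}")
```

With two groups the among-groups df is 1, so the F ratio and the squared t ratio coincide and lead to the same decision about the null hypothesis.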
Analysis of variance is used to test the significance of the differences between the
means of a number of different populations, say two or more.
Analysis of variance deals with variances rather than with the means and the standard
error of the difference between the means.
The variance is the most important measure of variability of a group. It is simply the
square of the S.D. of the group, i.e. $v = \sigma^2$.
The problem of testing the significance of the differences among a number of means
arises from experiments designed to study the variation in a dependent variable with
variation in an independent variable.
Analysis of variance is used to determine whether the differences among the means of
two or more groups are significant or insignificant.
There is a fixed relationship between the ‘t’ ratio and the ‘F’ ratio. The relationship can
be expressed as $F = t^2$ or $t = \sqrt{F}$.
While determining the significance of the calculated or obtained F ratio, we consider
two types of degrees of freedom: one greater, i.e. the degrees of freedom between the
groups, and the second smaller, i.e. the degrees of freedom within the groups.
2) A Test Anxiety test was given to three groups of students of class X, classified
as high achievers, average achievers and low achievers. The scores obtained on
the test are shown below. Do the three groups differ in their test anxiety?
3) Apply ANOVA on the following sets of scores. Interpret your results.
Set-I Set-II Set-III
10 3 10
7 3 11
6 3 10
10 3 5
4 3 6
3 3 8
2 3 9
1 3 12
8 3 9
9 3 10
Calculate:
‘t’ ratio for the two groups.
‘F’ ratio for the two groups.
What should be the degree of freedom for ‘t’ ratio.
What should be the degrees of freedom for ‘F’ ratio.
Interpret the results obtained on ‘t’ ratio and ‘F’ ratio.
6) Why is it necessary to fulfil the assumptions of the ‘F’ test before applying analysis
of variance?
7) Why are the ‘F’ ratio test and the ‘t’ ratio test complementary to each other?
8) What are the various problems of psychology and education in which ANOVA
can be used successfully?