Analysis of Variance (ANOVA)

256 Research Methodology

11
Analysis of Variance
and Co-variance

ANALYSIS OF VARIANCE (ANOVA)


Analysis of variance (abbreviated as ANOVA) is an extremely useful technique for research
in economics, biology, education, psychology, sociology, business/industry and several other
disciplines. This technique is used when multiple sample cases are involved. As
stated earlier, the significance of the difference between the means of two samples can be judged
through either the z-test or the t-test, but difficulty arises when we want to examine the significance
of the difference amongst more than two sample means at the same time. The ANOVA technique
enables us to perform this simultaneous test and as such is considered an important tool of
analysis in the hands of a researcher. Using this technique, one can draw inferences about whether
the samples have been drawn from populations having the same mean.
The ANOVA technique is important in all those situations where we want to
compare more than two populations, such as comparing the yield of crop from several varieties of
seeds, the gasoline mileage of four automobiles, the smoking habits of five groups of university
students and so on. In such circumstances one generally does not want to consider all possible
combinations of two populations at a time, for that would require a great number of tests before we
could arrive at a decision. This would also consume a lot of time and money, and even then
certain relationships may be left unidentified (particularly the interaction effects). Therefore, one
quite often uses the ANOVA technique and through it investigates the differences among the
means of all the populations simultaneously.

WHAT IS ANOVA?
Professor R.A. Fisher was the first man to use the term ‘Variance’* and, in fact, it was he who
developed a very elaborate theory concerning ANOVA, explaining its usefulness in practical fields.
Later on Professor Snedecor and many others contributed to the development of this technique.
ANOVA is essentially a procedure for testing the difference among different groups of data for
homogeneity. “The essence of ANOVA is that the total amount of variation in a set of data is broken
down into two types, that amount which can be attributed to chance and that amount which can be
attributed to specified causes.”1 There may be variation between samples and also within sample
items. ANOVA consists in splitting the variance for analytical purposes. Hence, it is a method of
analysing the variance to which a response is subject into its various components corresponding to
various sources of variation. Through this technique one can explain whether various varieties of
seeds or fertilizers or soils differ significantly, so that a policy decision can be taken accordingly
concerning a particular variety in the context of agricultural research. Similarly, the differences in
various types of feed prepared for a particular class of animal or various types of drugs manufactured
for curing a specific disease may be studied and judged to be significant or not through the application
of the ANOVA technique. Likewise, a manager of a big concern can analyse the performance of
various salesmen of his concern in order to know whether their performances differ significantly.
Thus, through the ANOVA technique one can, in general, investigate any number of factors which
are hypothesized or said to influence the dependent variable. One may as well investigate the
differences amongst various categories within each of these factors which may have a large number
of possible values. If we take only one factor and investigate the differences amongst its various
categories having numerous possible values, we are said to use one-way ANOVA, and in case we
investigate two factors at the same time, then we use two-way ANOVA. In a two or more way
ANOVA, the interaction (i.e., the inter-relation between two independent variables/factors affecting
a dependent variable), if any, can as well be studied for better decisions.

* Variance is an important statistical measure and is described as the mean of the squares of deviations taken from the
mean of the given series of data. It is a frequently used measure of variation. Its square root is known as standard deviation,
i.e., Standard deviation = √Variance.

THE BASIC PRINCIPLE OF ANOVA


The basic principle of ANOVA is to test for differences among the means of the populations by
examining the amount of variation within each of the samples, relative to the amount of variation
between the samples. In terms of variation within a given population, it is assumed that the values
of $X_{ij}$ differ from the mean of this population only because of random effects, i.e., there are influences
on $X_{ij}$ which are unexplainable, whereas in examining differences between populations we assume
that the difference between the mean of the jth population and the grand mean is attributable to what
is called a ‘specific factor’ or what is technically described as treatment effect. Thus while using
ANOVA, we assume that each of the samples is drawn from a normal population and that each of
these populations has the same variance. We also assume that all factors other than the one or more
being tested are effectively controlled. This, in other words, means that we assume the absence of
many factors that might affect our conclusions concerning the factor(s) to be studied.
In short, we have to make two estimates of population variance, viz., one based on between
samples variance and the other based on within samples variance. The two estimates of
population variance are then compared with the F-test, wherein we work out:

$$F = \frac{\text{Estimate of population variance based on between samples variance}}{\text{Estimate of population variance based on within samples variance}}$$

This value of F is to be compared to the F-limit for given degrees of freedom. If the F value we
work out equals or exceeds* the F-limit value (to be seen from F tables No. 4(a) and 4(b) given in
appendix), we may say that there are significant differences between the sample means.

1 Donald L. Harnett and James L. Murphy, Introductory Statistical Analysis, p. 376.

ANOVA TECHNIQUE
One-way (or single factor) ANOVA: Under the one-way ANOVA, we consider only one factor;
the factor matters because several possible types of samples can occur within it. We then
determine whether there are differences within that factor.
The technique involves the following steps:
(i) Obtain the mean of each sample i.e., obtain
$$\bar{X}_1, \bar{X}_2, \bar{X}_3, \ldots, \bar{X}_k$$
when there are k samples.
(ii) Work out the mean of the sample means as follows:

$$\bar{\bar{X}} = \frac{\bar{X}_1 + \bar{X}_2 + \bar{X}_3 + \cdots + \bar{X}_k}{\text{No. of samples } (k)}$$
(iii) Take the deviations of the sample means from the mean of the sample means and calculate
the square of such deviations which may be multiplied by the number of items in the
corresponding sample, and then obtain their total. This is known as the sum of squares for
variance between the samples (or SS between). Symbolically, this can be written:

$$\text{SS between} = n_1 (\bar{X}_1 - \bar{\bar{X}})^2 + n_2 (\bar{X}_2 - \bar{\bar{X}})^2 + \cdots + n_k (\bar{X}_k - \bar{\bar{X}})^2$$

(iv) Divide the result of the (iii) step by the degrees of freedom between the samples to obtain
variance or mean square (MS) between samples. Symbolically, this can be written:
$$\text{MS between} = \frac{\text{SS between}}{(k - 1)}$$
where (k – 1) represents degrees of freedom (d.f.) between samples.
(v) Obtain the deviations of the values of the sample items for all the samples from corresponding
means of the samples and calculate the squares of such deviations and then obtain their
total. This total is known as the sum of squares for variance within samples (or SS within).
Symbolically this can be written:
$$\text{SS within} = \sum (X_{1i} - \bar{X}_1)^2 + \sum (X_{2i} - \bar{X}_2)^2 + \cdots + \sum (X_{ki} - \bar{X}_k)^2, \quad i = 1, 2, 3, \ldots$$
(vi) Divide the result of (v) step by the degrees of freedom within samples to obtain the variance
or mean square (MS) within samples. Symbolically, this can be written:

$$\text{MS within} = \frac{\text{SS within}}{(n - k)}$$
where (n – k) represents degrees of freedom within samples,
n = total number of items in all the samples i.e., n1 + n2 + … + nk
k = number of samples.

* It should be remembered that ANOVA test is always a one-tailed test, since a low calculated value of F from the sample
data would mean that the fit of the sample means to the null hypothesis (viz., $\bar{X}_1 = \bar{X}_2 = \ldots = \bar{X}_k$) is a very good fit.

(vii) For a check, the sum of squares of deviations for total variance can also be worked out by
adding the squares of deviations when the deviations for the individual items in all the
samples have been taken from the mean of the sample means. Symbolically, this can be
written:

$$\text{SS for total variance} = \sum_{i} \sum_{j} (X_{ij} - \bar{\bar{X}})^2, \quad i = 1, 2, 3, \ldots; \; j = 1, 2, 3, \ldots$$
This total should be equal to the total of the result of the (iii) and (v) steps explained above
i.e.,
SS for total variance = SS between + SS within.
The degrees of freedom for total variance will be equal to the number of items in all
samples minus one i.e., (n – 1). The degrees of freedom for between and within must add
up to the degrees of freedom for total variance i.e.,
(n – 1) = (k – 1) + (n – k)
This fact explains the additive property of the ANOVA technique.
(viii) Finally, F-ratio may be worked out as under:
$$F\text{-ratio} = \frac{\text{MS between}}{\text{MS within}}$$
This ratio is used to judge whether the difference among several sample means is significant
or is just a matter of sampling fluctuations. For this purpose we look into the table*, giving
the values of F for given degrees of freedom at different levels of significance. If the
worked out value of F, as stated above, is less than the table value of F, the difference is
taken as insignificant i.e., due to chance and the null-hypothesis of no difference between
sample means stands. If the calculated value of F is equal to or more
than its table value, the difference is considered significant (which means the samples
could not have come from the same universe) and the conclusion may be drawn
accordingly. The higher the calculated value of F is above the table value, the more definite and
sure one can be about one's conclusions.
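
The steps above can be sketched in Python. This is a minimal illustration rather than part of the original text: the function name `one_way_anova` and the dictionary keys are hypothetical, and the sample values are assumed to be the three seed-variety columns of the wheat production data used in this chapter's worked examples (their totals of 24, 20 and 16 match the short-cut computation). The grand mean follows step (ii), which coincides with the mean of all items only when the samples are of equal size.

```python
def one_way_anova(samples):
    """Direct-method one-way ANOVA following steps (i) to (viii).

    `samples` is a list of lists, one inner list per sample.
    Assumes equal sample sizes, so that the mean of the sample
    means equals the mean of all items.
    """
    k = len(samples)                                   # number of samples
    n = sum(len(s) for s in samples)                   # total number of items

    # Step (i): mean of each sample.
    sample_means = [sum(s) / len(s) for s in samples]
    # Step (ii): mean of the sample means.
    grand_mean = sum(sample_means) / k
    # Step (iii): SS between samples.
    ss_between = sum(
        len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, sample_means)
    )
    # Step (iv): MS between, with (k - 1) degrees of freedom.
    ms_between = ss_between / (k - 1)
    # Step (v): SS within samples.
    ss_within = sum(
        (x - m) ** 2 for s, m in zip(samples, sample_means) for x in s
    )
    # Step (vi): MS within, with (n - k) degrees of freedom.
    ms_within = ss_within / (n - k)
    # Step (vii): SS for total variance, used as a check.
    ss_total = sum((x - grand_mean) ** 2 for s in samples for x in s)
    # Step (viii): F-ratio.
    f_ratio = ms_between / ms_within

    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ss_total": ss_total,
        "df_between": k - 1,
        "df_within": n - k,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f_ratio": f_ratio,
    }


# Assumed data: seed varieties A, B and C, four plots each.
samples = [[6, 7, 3, 8], [5, 5, 3, 7], [5, 4, 3, 4]]
result = one_way_anova(samples)
for name, value in result.items():
    print(name, value)
```

For these values the sketch gives SS between = 8, SS within = 24 and SS for total variance = 32, with an F-ratio of 1.5 on (2, 9) degrees of freedom. That F-ratio would then be compared with the table value of F at the chosen level of significance.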

SETTING UP ANALYSIS OF VARIANCE TABLE


For the sake of convenience the information obtained through various steps stated above can be put
as under:

* An extract of the table giving F-values has been given in the Appendix at the end of the book in Tables 4 (a) and 4 (b).
$$= (6)^2 + (7)^2 + (3)^2 + (8)^2 + (5)^2 + (5)^2 + (3)^2 + (7)^2 + (5)^2 + (4)^2 + (3)^2 + (4)^2 - \left(\frac{60 \times 60}{12}\right)$$
$$= 332 - 300 = 32$$

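$$\begin{aligned}
\text{SS between} &= \sum \frac{(T_j)^2}{n_j} - \frac{(T)^2}{n} \\
&= \left(\frac{24 \times 24}{4}\right) + \left(\frac{20 \times 20}{4}\right) + \left(\frac{16 \times 16}{4}\right) - \left(\frac{60 \times 60}{12}\right) \\
&= 144 + 100 + 64 - 300 \\
&= 8
\end{aligned}$$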

$$\begin{aligned}
\text{SS within} &= \sum X_{ij}^2 - \sum \frac{(T_j)^2}{n_j} \\
&= 332 - 308 \\
&= 24
\end{aligned}$$
It may be noted that we get exactly the same result as we had obtained in the case of direct
method. From now onwards we can set up ANOVA table and interpret F-ratio in the same manner
as we have already done under the direct method.
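
The short-cut computation above can be reproduced with a few lines of Python. This is a sketch for checking the arithmetic only; the function name `shortcut_sums_of_squares` is hypothetical, and the grouping of the twelve values into three samples of four is inferred from the sample totals of 24, 20 and 16 used above.

```python
def shortcut_sums_of_squares(samples):
    """Short-cut method: sums of squares via the correction factor (T)^2 / n."""
    n = sum(len(s) for s in samples)            # total number of items
    grand_total = sum(sum(s) for s in samples)  # T
    correction_factor = grand_total ** 2 / n    # (T)^2 / n

    # Sum of the squares of all individual items.
    sum_of_squares = sum(x * x for s in samples for x in s)
    # Sum over samples of (sample total)^2 / (sample size).
    totals_term = sum(sum(s) ** 2 / len(s) for s in samples)

    total_ss = sum_of_squares - correction_factor
    ss_between = totals_term - correction_factor
    ss_within = sum_of_squares - totals_term
    return total_ss, ss_between, ss_within


# Values as listed in the computation above, grouped by sample.
samples = [[6, 7, 3, 8], [5, 5, 3, 7], [5, 4, 3, 4]]
total_ss, ss_between, ss_within = shortcut_sums_of_squares(samples)
print(total_ss, ss_between, ss_within)
```

The three values agree with the figures worked out above (32, 8 and 24), and SS between plus SS within adds up to the total SS.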

TWO-WAY ANOVA
Two-way ANOVA technique is used when the data are classified on the basis of two factors. For
example, the agricultural output may be classified on the basis of different varieties of seeds and also
on the basis of different varieties of fertilizers used. A business firm may have its sales data classified
on the basis of different salesmen and also on the basis of sales in different regions. In a factory, the
various units of a product produced during a certain period may be classified on the basis of different
varieties of machines used and also on the basis of different grades of labour. Such a two-way design
may have repeated measurements of each factor or may not have repeated values. The ANOVA
technique is a little different in the case of repeated measurements, where we also compute the
interaction variation. We shall now explain the two-way ANOVA technique in the context of both the said
designs with the help of examples.
(a) ANOVA technique in context of two-way design when repeated values are not there: As we
do not have repeated values, we cannot directly compute the sum of squares within samples as we
had done in the case of one-way ANOVA. Therefore, we have to calculate this residual or error
variation by subtraction, once we have calculated (just on the same lines as we did in the case of one-
way ANOVA) the sum of squares for total variance and for variance between varieties of one
treatment as also for variance between varieties of the other treatment.
The various steps involved are as follows:


(i) Use the coding device, if the same simplifies the task.
(ii) Take the total of the values of individual items (or their coded values as the case may be)
in all the samples and call it T.
(iii) Work out the correction factor as under:

$$\text{Correction factor} = \frac{(T)^2}{n}$$
(iv) Find out the square of all the item values (or their coded values as the case may be) one by
one and then take its total. Subtract the correction factor from this total to obtain the sum of
squares of deviations for total variance. Symbolically, we can write it as:
Sum of squares of deviations for total variance or total SS

$$= \sum X_{ij}^2 - \frac{(T)^2}{n}$$
(v) Take the total of different columns and then obtain the square of each column total and
divide such squared values of each column by the number of items in the concerning
column and take the total of the result thus obtained. Finally, subtract the correction factor
from this total to obtain the sum of squares of deviations for variance between columns or
(SS between columns).
(vi) Take the total of different rows and then obtain the square of each row total and divide
such squared values of each row by the number of items in the corresponding row and take
the total of the result thus obtained. Finally, subtract the correction factor from this total to
obtain the sum of squares of deviations for variance between rows (or SS between rows).
(vii) Sum of squares of deviations for residual or error variance can be worked out by subtracting
the result of the sum of (v)th and (vi)th steps from the result of (iv)th step stated above. In
other words,
Total SS – (SS between columns + SS between rows)
= SS for residual or error variance.
(viii) Degrees of freedom (d.f.) can be worked out as under:
d.f. for total variance = (c . r – 1)
d.f. for variance between columns = (c – 1)
d.f. for variance between rows = (r – 1)
d.f. for residual variance = (c – 1) (r – 1)
where c = number of columns
r = number of rows
(ix) ANOVA table can be set up in the usual fashion as shown below:
Table 11.3: Analysis of Variance Table for Two-way ANOVA

| Source of variation | Sum of squares (SS) | Degrees of freedom (d.f.) | Mean square (MS) | F-ratio |
|---|---|---|---|---|
| Between columns treatment | $\sum \frac{(T_j)^2}{n_j} - \frac{(T)^2}{n}$ | (c – 1) | SS between columns / (c – 1) | MS between columns / MS residual |
| Between rows treatment | $\sum \frac{(T_i)^2}{n_i} - \frac{(T)^2}{n}$ | (r – 1) | SS between rows / (r – 1) | MS between rows / MS residual |
| Residual or error | Total SS – (SS between columns + SS between rows) | (c – 1) (r – 1) | SS residual / [(c – 1) (r – 1)] | |
| Total | $\sum X_{ij}^2 - \frac{(T)^2}{n}$ | (c.r – 1) | | |


In the table, c = number of columns
r = number of rows
SS residual = Total SS – (SS between columns + SS between rows).
Thus, MS residual or the residual variance provides the basis for the F-ratios concerning
variation between columns treatment and between rows treatment. MS residual is always
due to the fluctuations of sampling, and hence serves as the basis for the significance test.
Both the F-ratios are compared with their corresponding table values, for given degrees of
freedom at a specified level of significance, as usual. If the calculated
F-ratio concerning variation between columns is equal to or greater than its table value,
then the difference among column means is considered significant. The F-ratio
concerning variation between rows is interpreted similarly.
Illustration 2
Set up an analysis of variance table for the following two-way design results:
Per Acre Production Data of Wheat (in metric tonnes)

| Varieties of fertilizers | Seed variety A | Seed variety B | Seed variety C |
|---|---|---|---|
| W | 6 | 5 | 5 |
| X | 7 | 5 | 4 |
| Y | 3 | 3 | 3 |
| Z | 8 | 7 | 4 |

Also state whether variety differences are significant at 5% level.
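
The two-way steps (ii) to (viii) can be applied to this illustration with a short Python sketch. It is offered as one possible way of checking the arithmetic, not as the book's own solution: the function name `two_way_anova` and the dictionary keys are hypothetical, the coding device of step (i) is skipped, and the layout assumes one value per cell with seed varieties as columns and fertilizer varieties as rows.

```python
def two_way_anova(table):
    """Two-way ANOVA without repeated values.

    `table` is a list of rows; each row lists one value per column.
    """
    r = len(table)        # number of rows
    c = len(table[0])     # number of columns
    n = r * c             # total number of items

    grand_total = sum(sum(row) for row in table)       # T
    correction_factor = grand_total ** 2 / n           # (T)^2 / n

    # Total SS: sum of squared items minus the correction factor.
    total_ss = sum(x * x for row in table for x in row) - correction_factor

    # SS between columns: each column total squared, divided by items per column.
    column_totals = [sum(row[j] for row in table) for j in range(c)]
    ss_columns = sum(t * t / r for t in column_totals) - correction_factor

    # SS between rows: each row total squared, divided by items per row.
    row_totals = [sum(row) for row in table]
    ss_rows = sum(t * t / c for t in row_totals) - correction_factor

    # Residual SS is obtained by subtraction.
    ss_residual = total_ss - (ss_columns + ss_rows)

    ms_columns = ss_columns / (c - 1)
    ms_rows = ss_rows / (r - 1)
    ms_residual = ss_residual / ((c - 1) * (r - 1))

    return {
        "total_ss": total_ss,
        "ss_columns": ss_columns,
        "ss_rows": ss_rows,
        "ss_residual": ss_residual,
        "df_columns": c - 1,
        "df_rows": r - 1,
        "df_residual": (c - 1) * (r - 1),
        "f_columns": ms_columns / ms_residual,
        "f_rows": ms_rows / ms_residual,
    }


# Rows: fertilizers W, X, Y, Z. Columns: seed varieties A, B, C.
wheat = [
    [6, 5, 5],
    [7, 5, 4],
    [3, 3, 3],
    [8, 7, 4],
]
result = two_way_anova(wheat)
for name, value in result.items():
    print(name, round(value, 4))
```

Up to floating-point rounding, this gives total SS = 32, SS between columns (seeds) = 8, SS between rows (fertilizers) = 18 and SS residual = 6, so that F is 4 for seed varieties on (2, 6) d.f. and 6 for fertilizer varieties on (3, 6) d.f. Standard F tables give 5% values of about 5.14 and 4.76 respectively, which would suggest that the differences among seed varieties are not significant at the 5% level while those among fertilizer varieties are.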
