Topic 3
3.1 Introduction
Analysis of Variance (or ANOVA for short) allows us to compare means across more than
two groups. ANOVA is a hypothesis test appropriate for comparing the means of a
continuous variable across two or more independent comparison groups. For example, some
clinical trials have more than two comparison groups: in a clinical trial to evaluate a new
medication for asthma, investigators might compare an experimental medication to a placebo and
to a standard treatment.
It is normally used to break down the total variation into components attributable to the different
variables. It enables us to test the significance of the differences among more than two sample
means, that is, to test whether the different samples have been drawn from the same population with
the same characteristics. If x̄1 = x̄2 = x̄3, then the three samples may have been drawn from the
same population.
If one is examining the means observed among, say, three groups, it might be tempting to
perform three separate pairwise comparisons, but this approach is incorrect because each
of these comparisons fails to take the total data into account, and it increases the likelihood of
incorrectly concluding that there are statistically significant differences, since each comparison
adds to the probability of a type I error. ANOVA avoids these problems by asking a more global
question, that is, whether there are significant differences among the groups, without addressing
differences between any two groups in particular. The fundamental strategy of ANOVA is to
systematically examine the variability within the groups being compared and also examine the
variability among the groups being compared.
For k samples:
Ho: µ1 = µ2 = µ3 = … = µk (all population means are equal)
H1: at least one mean differs (not all equal)
Based on the comparison of two different estimates of the variance across all the samples,
determine:
1. Variance among the sample means: s_b² = ∑ nᵢ(x̄ᵢ − x̿)² / (k − 1)
2. Variance within the samples: s_w² = ∑ nᵢsᵢ² / (N − k)
Where
x̄ᵢ = mean of sample i, sᵢ² = (population) variance of sample i, nᵢ = size of sample i,
x̿ = grand mean, and N = total number of observations.
If s_b² = s_w², then the samples must have been drawn from the same population.
The ratio of the variance among to the variance within, F = s_b² / s_w², is called Fisher's test
statistic.
The theoretical F-statistic is found by getting the degrees of freedom for the numerator (k − 1)
and for the denominator (N − k) at the chosen level of significance.
If F calculated > F theoretical, reject Ho.
Otherwise, accept Ho.
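The decision procedure above can be sketched in a few lines of Python. This is a minimal illustration under the definitions just given, not library code; the function name f_statistic is our own:

```python
def f_statistic(groups):
    """One-way ANOVA F-ratio for a list of samples (lists of numbers).

    Returns (F, df_numerator, df_denominator), where the numerator is the
    variance among the sample means (df = k - 1) and the denominator is the
    variance within the samples (df = N - k).
    """
    k = len(groups)                              # number of samples
    N = sum(len(g) for g in groups)              # total observations
    grand_mean = sum(sum(g) for g in groups) / N
    means = [sum(g) / len(g) for g in groups]
    # Variance among the sample means: sum of n_i * (mean_i - grand mean)^2
    ss_among = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variance within the samples: pooled squared deviations from each group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_among / (k - 1)) / (ss_within / (N - k)), k - 1, N - k
```

Compare the returned F with the tabulated F at the chosen significance level: if it is larger, reject Ho.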
Example
Agriculture extension officers would like to test the effectiveness of 4 different fertilizers on the
yields of tomatoes. They prepared 5 sample plots for each fertilizer and applied the fertilizers
accordingly. The yields per plot for each of the fertilizers are given below. At the 5% level of
significance, test whether the fertilizers are equally effective.
Fertilizer      A     B     C     D
                2     3     5     6
                3     4     5     8
                1     3     5     7
                3     5     3     4
                1     0     2    10
Total          10    15    20    35
Mean            2     3     4     7
Variance      0.8   2.8   1.6     4
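As a quick check of the summary rows (note that the variances in the table divide by n, i.e. they are population variances), the means and variances can be recomputed with Python's standard library:

```python
from statistics import mean, pvariance

# Yields per plot for each fertilizer (from the table above)
plots = {"A": [2, 3, 1, 3, 1],
         "B": [3, 4, 3, 5, 0],
         "C": [5, 5, 5, 3, 2],
         "D": [6, 8, 7, 4, 10]}

means = {k: mean(v) for k, v in plots.items()}           # {'A': 2, 'B': 3, 'C': 4, 'D': 7}
variances = {k: pvariance(v) for k, v in plots.items()}  # {'A': 0.8, 'B': 2.8, 'C': 1.6, 'D': 4}
```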
Variance among the sample means, with grand mean x̿ = (2 + 3 + 4 + 7)/4 = 4:
s_b² = ∑ nᵢ(x̄ᵢ − x̿)² / (k − 1)

x̄ᵢ    (x̄ᵢ − x̿)   (x̄ᵢ − x̿)²   nᵢ(x̄ᵢ − x̿)²
2        −2           4             20
3        −1           1              5
4         0           0              0
7         3           9             45
Total                               70

s_b² = 70 / (4 − 1) = 23.33

Variance within the samples, where sᵢ² = variance of sample i (population variances, as in the table):
s_w² = ∑ nᵢsᵢ² / (N − k) = (5 × 0.8 + 5 × 2.8 + 5 × 1.6 + 5 × 4) / (20 − 4) = 46/16 = 2.875

F = s_b² / s_w² = 23.33 / 2.875 = 8.12

The theoretical F-statistic is found by getting the degrees of freedom for the numerator (k − 1 = 3)
and of the denominator (N − k = 16) at the 5% level of significance:
F₀.₀₅(3, 16) = 3.24
Since 8.12 > 3.24, we reject the Ho.
That is, the means are different. Hence the samples are drawn from populations with different
means; the fertilizers are not equally effective.
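The same F-ratio can be verified with SciPy (assuming the scipy package is installed); f_oneway performs exactly this one-way ANOVA on the raw samples:

```python
from scipy.stats import f_oneway

A = [2, 3, 1, 3, 1]
B = [3, 4, 3, 5, 0]
C = [5, 5, 5, 3, 2]
D = [6, 8, 7, 4, 10]

F, p = f_oneway(A, B, C, D)   # F is about 8.12, consistent with the hand computation
```

Since the p-value is below 0.05, SciPy reaches the same conclusion: reject Ho.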
The F Distribution Table
The F distribution is a right-skewed distribution used most commonly in Analysis of Variance.
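Rather than reading the F table, the critical value can be obtained from the distribution itself (again assuming SciPy is available); for the fertilizer example, the upper 5% point with 3 and 16 degrees of freedom is:

```python
from scipy.stats import f

# Upper 5% point of the F distribution with (3, 16) degrees of freedom
f_critical = f.ppf(0.95, 3, 16)   # about 3.24
```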
3.4.2 Two-Way Analysis of Variance
Used when there are two independent variables (factors). Each of these factors can have multiple
levels. A two-way ANOVA not only measures the effect of each independent variable on the
dependent variable, but also whether the two factors affect each other (interact). A two-way
ANOVA assumes:
Continuous dependent variable: As in a one-way ANOVA, the dependent variable should be
continuous.
Independence: Each sample is independent of the other samples, with no crossover.
Variance: The variance of the data across the different groups is the same.
Normality: The samples are drawn from normally distributed populations.
Categories: The independent variables should consist of separate categories or groups.
Example
The marks of students are presented in the table below; determine the effect of noise level and
gender on the marks:
Students Low Noise Medium Noise Loud Noise
Male students 10 7 4
12 9 5
11 8 6
9 12 5
Female students 12 13 6
13 15 6
10 12 4
13 12 4
Required:
Does noise have an effect on a student's marks?
Does gender have an effect on a student's marks?
Does gender affect how a student reacts to noise (the combined effect of gender and noise on
the marks)?
Solution
State the null hypotheses:
Ho: Noise has no significant effect on a student's marks
Ho: Gender has no significant effect on a student's marks
Ho: There is no significant combined (interaction) effect of gender and noise on a student's marks
Students          Low Noise   Medium Noise   Loud Noise   Row Total
Male students        10            7             4        R1 = 98
                     12            9             5
                     11            8             6
                      9           12             5
Female students      12           13             6        R2 = 120
                     13           15             6
                     10           12             4
                     13           12             4
Column Total      C1 = 90      C2 = 88       C3 = 40      Grand total T = 218

Correction factor:
CF = T²/N = (10+12+11+9+7+9+8+12+4+5+6+5+12+13+10+13+13+15+12+12+6+6+4+4)²/24
   = 218²/24 = 1980.17

Total sum of squares:
SS(total) = ∑x² − CF = 2254 − 1980.17 = 273.83, with df = N − 1 = 23

For variation in noise (columns, 8 observations each):
SS(noise) = ∑Cⱼ²/8 − CF = (90² + 88² + 40²)/8 − 1980.17 = 2180.5 − 1980.17 = 200.33,
with df = c − 1 = 2

For variation in gender (rows, 12 observations each):
SS(gender) = ∑Rᵢ²/12 − CF = (98² + 120²)/12 − 1980.17 = 2000.33 − 1980.17 = 20.17,
with df = r − 1 = 1

For the interaction, first get the sum of squares among the six gender-noise cells (cell totals
42, 36, 20, 48, 52, 20, with 4 observations each):
SS(cells) = (42² + 36² + 20² + 48² + 52² + 20²)/4 − CF = 2217 − 1980.17 = 236.83
SS(interaction) = SS(cells) − SS(noise) − SS(gender) = 236.83 − 200.33 − 20.17 = 16.33,
with df = (r − 1)(c − 1) = 2

SS(error) = SS(total) − SS(cells) = 273.83 − 236.83 = 37.00, with df = N − rc = 24 − 6 = 18
MS(error) = 37/18 = 2.0556

For noise: F = (200.33/2)/2.0556 = 48.73; F₀.₀₅(2, 18) = 3.55.
Since 48.73 > 3.55, we reject the Ho. Hence noise does have an effect on marks obtained.
For gender: F = (20.17/1)/2.0556 = 9.81; F₀.₀₅(1, 18) = 4.41.
Since 9.81 > 4.41, we reject the Ho. Hence gender does have an effect on marks obtained.
For the interaction: F = (16.33/2)/2.0556 = 3.97.
Since 3.97 > 3.55, we reject the Ho. Hence gender does affect how a student reacts to noise.
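The sums of squares in this two-way example can be reproduced with a short pure-Python sketch (the variable names are ours):

```python
# Marks by gender (rows) and noise level (columns), from the table above
male   = {"low": [10, 12, 11, 9], "medium": [7, 9, 8, 12], "loud": [4, 5, 6, 5]}
female = {"low": [12, 13, 10, 13], "medium": [13, 15, 12, 12], "loud": [6, 6, 4, 4]}

levels = ["low", "medium", "loud"]
cells = [male[c] for c in levels] + [female[c] for c in levels]
all_marks = [x for cell in cells for x in cell]

N = len(all_marks)                  # 24 observations
CF = sum(all_marks) ** 2 / N        # correction factor: 218^2 / 24
ss_total = sum(x * x for x in all_marks) - CF

# Column (noise) sum of squares: 8 observations per noise level
ss_noise = sum(sum(male[c] + female[c]) ** 2 / 8 for c in levels) - CF
# Row (gender) sum of squares: 12 observations per gender
ss_gender = sum(sum(sum(g[c]) for c in levels) ** 2 / 12 for g in (male, female)) - CF

# Interaction (from the 6 cells of 4 observations each) and error
ss_cells = sum(sum(cell) ** 2 / 4 for cell in cells) - CF
ss_interaction = ss_cells - ss_noise - ss_gender
ss_error = ss_total - ss_cells
```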
Example
Three factor levels (groups 1–3) are measured on two dependent variables, DV1 and DV2.
The group means x̄ are 2.75, 5 and 7 for DV1, and 2.25, 4.75 and 7.25 for DV2. First compute
the error (within-group) sum of squares for each dependent variable:
SS(error) = ∑ (x − x̄_group)²

Factor   DV1   DV2   x̄ for DV1   (DV1 − x̄)²   x̄ for DV2   (DV2 − x̄)²
1         4     1      2.75        1.5625        2.25        1.5625
1         2     4      2.75        0.5625        2.25        3.0625
1         1     3      2.75        3.0625        2.25        0.5625
1         4     1      2.75        1.5625        2.25        1.5625
2         5     4      5           0             4.75        0.5625
2         6     5      5           1             4.75        0.0625
2         5     4      5           0             4.75        0.5625
2         4     6      5           1             4.75        1.5625
3         6     8      7           1             7.25        0.5625
3         8     7      7           1             7.25        0.0625
3         8     8      7           1             7.25        0.5625
3         6     6      7           1             7.25        1.5625
Total                             12.75                     12.25

SS(error) for DV1 = 12.75
SS(error) for DV2 = 12.25
Calculate the cross products to determine how the two dependent variables are related, both for
the model and for the error. Writing x for DV1 and y for DV2, with grand means x̄_grand = 4.917
and ȳ_grand = 4.75:
Cross product for the model (summed over every observation):
CP(model) = ∑ (x̄_group − x̄_grand)(ȳ_group − ȳ_grand)
Cross product for the error:
CP(error) = ∑ (x − x̄_group)(y − ȳ_group)
Factor   DV1   DV2   x̄_group   x̄_grand   ȳ_group   ȳ_grand   CP(model)   CP(error)
1         4     1     2.75      4.917      2.25      4.75      5.4167     −1.5625
1         2     4     2.75      4.917      2.25      4.75      5.4167     −1.3125
1         1     3     2.75      4.917      2.25      4.75      5.4167     −1.3125
1         4     1     2.75      4.917      2.25      4.75      5.4167     −1.5625
2         5     4     5         4.917      4.75      4.75      0           0
2         6     5     5         4.917      4.75      4.75      0           0.25
2         5     4     5         4.917      4.75      4.75      0           0
2         4     6     5         4.917      4.75      4.75      0          −1.25
3         6     8     7         4.917      7.25      4.75      5.2083     −0.75
3         8     7     7         4.917      7.25      4.75      5.2083     −0.25
3         8     8     7         4.917      7.25      4.75      5.2083      0.75
3         6     6     7         4.917      7.25      4.75      5.2083      1.25
Total                                                         42.5        −5.75
Make the cross-product matrices for the model (H) and for the error (E). The diagonal entries are
the model and error sums of squares for each dependent variable (the model sums of squares,
computed like CP(model) but with each deviation squared, are 36.17 for DV1 and 50 for DV2),
and the off-diagonal entries are the cross products:

H = | 36.17   42.50 |
    | 42.50   50.00 |

E = | 12.75   −5.75 |
    | −5.75   12.25 |

Calculate the F-value: the ratio between the model and the error (the ratio between matrix H and
matrix E).
Since matrices cannot be divided, what we do is multiply H by the inverse of E.
Inverse of E, with det(E) = 12.75 × 12.25 − (−5.75)² = 123.13:

E⁻¹ = (1/123.13) | 12.25    5.75 |  =  | 0.0995   0.0467 |
                 |  5.75   12.75 |     | 0.0467   0.1036 |

H E⁻¹ = | 36.17   42.50 |  ×  | 0.0995   0.0467 |  =  | 5.58   6.09 |
        | 42.50   50.00 |     | 0.0467   0.1036 |     | 6.56   7.16 |
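The matrix computation can be checked with NumPy, building the model and error cross-product matrices directly from the raw data (variable names are ours):

```python
import numpy as np

# Raw data: factor level -> (DV1 values, DV2 values), from the tables above
groups = {1: ([4, 2, 1, 4], [1, 4, 3, 1]),
          2: ([5, 6, 5, 4], [4, 5, 4, 6]),
          3: ([6, 8, 8, 6], [8, 7, 8, 6])}

Y = np.array([[x, y] for dv1, dv2 in groups.values() for x, y in zip(dv1, dv2)], float)
labels = np.repeat(list(groups), 4)   # 4 observations per group
grand = Y.mean(axis=0)                # grand means: about [4.917, 4.75]

H = np.zeros((2, 2))                  # model (hypothesis) cross-product matrix
E = np.zeros((2, 2))                  # error cross-product matrix
for g in groups:
    Yg = Y[labels == g]
    d = Yg.mean(axis=0) - grand       # group-mean deviations from the grand means
    H += len(Yg) * np.outer(d, d)
    R = Yg - Yg.mean(axis=0)          # within-group residuals
    E += R.T @ R

HE_inv = H @ np.linalg.inv(E)         # multivariate analogue of the F-ratio
```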