Analysis of Variance: Completely Randomised Design
For experiments whose outcomes are continuous measures that can be treated as normally
distributed, it makes sense to compare group means.
In the case where there are only two means to compare, it is usual to use a t-test for
independent groups to determine whether there is a statistically significant difference.
When an experiment consists of more than two groups, the analysis becomes a bit more
complicated. Suppose you are comparing the means of three groups and you’re interested
in knowing which means are significantly different from the other means.
It may seem logical to perform t-tests for all pairs of means – (Brand A vs. Brand B), (Brand
A vs. Brand C) and (Brand B vs. Brand C). In other words, perform t-tests on all possible
comparisons. However, there is a fundamental problem with this technique. The p-value
associated with each t-test is determined as if only one t-test is performed per experiment. If
three t-tests are performed in a single experiment, then the p-values for these tests are no
longer accurate. The more comparisons you make the more likely it is that you will say there
is a significant difference between two means when there actually isn’t.
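To see how quickly the risk grows, the sketch below computes the chance of at least one false positive across several tests. It rests on a simplifying assumption that is not part of the text: that the tests are independent. Pairwise t-tests on shared groups are not exactly independent, so the figures are indicative only. The function name is illustrative.

```python
from math import comb

def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive in n_tests independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

for k in (3, 4, 5):
    n_pairs = comb(k, 2)  # number of pairwise comparisons among k groups
    print(k, n_pairs, round(familywise_error_rate(n_pairs), 3))
```

Under this assumption, three groups (three t-tests) already give roughly a 14% chance of at least one spurious "significant" result, instead of the nominal 5%.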
The solution to this multiple comparisons problem is NOT to perform all possible
t-tests. Instead, a two-step procedure is used: an Analysis of Variance (ANOVA)
followed by a Multiple Comparison Test.
The first part answers the question, “Is there at least one mean that is significantly
different from one other mean?” If the p-value for the ANOVA results is less than your
chosen significance level (usually 0.05), you have evidence that at least one mean is
different. If the p-value is not significant, your analysis is over and you conclude that
there is no evidence of a difference between any pair of means.
If the ANOVA’s p-value is significant, proceed to the second stage of the analysis to
answer the question, “Which means are significantly different from which other
means?” This is the multiple comparison stage.
One-way Analysis of Variance of CRD
Assumptions that are made:
Each sample consists of units that are randomly selected from each group with
the sample from one group being unrelated to (i.e. independent of) the sample
from the other groups.
The units sampled from each group are independent of each other and normally
distributed.
The variances of the populations are assumed to be equal.
Sample sizes do not have to be the same, although large differences in sample
size may affect the outcome.
State hypotheses
$H_0: \mu_1 = \mu_2 = \dots = \mu_k$, i.e. all means are equal; NO differences
between the groups.
$H_A$: not all of $\mu_1, \mu_2, \dots, \mu_k$ are equal, i.e. the means of at least two groups are not equal.
Note that the table is similar to the Linear Regression table, with a row for ‘Treatment’
instead of ‘Regression’.
* Treatment can also be referred to as ‘Group’. This refers to the various treatments
that are to be compared. k is generally used to represent the number of treatments
involved.
Correction Factor: $CF = \frac{(\text{sum of all entries})^2}{\text{number of entries}} = \frac{(\sum y)^2}{n}$
That is: add all entries, square the result, then divide by n.
Think of the p-value as the probability of getting the calculated F test value (or an even
larger value) when the null hypothesis is actually true.
When the p-value is smaller than the significance level we reject H0.
$CF = \frac{(\text{sum of all entries})^2}{\text{number of entries}} = \frac{(\sum y)^2}{n} = \frac{132.5^2}{15} = 1170.42$
(iii) Calculate the Total Sum of Squares
$SS_{Total} = \sum y^2 - \frac{(\sum y)^2}{n} = 1184.11 - \frac{132.5^2}{15} = 13.69$
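The correction factor and total sum of squares can be checked in a few lines. This is a minimal sketch that uses only the summary figures quoted in the worked example (sum of entries 132.5, sum of squared entries 1184.11, n = 15); the variable names are illustrative.

```python
sum_y = 132.5        # sum of all observations (from the worked example)
sum_y_sq = 1184.11   # sum of the squared observations (from the worked example)
n = 15               # number of observations

cf = sum_y ** 2 / n          # correction factor: (sum of y)^2 / n
ss_total = sum_y_sq - cf     # total sum of squares

print(round(cf, 2), round(ss_total, 2))
```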
(iv) Calculate the Group (Treatment) Sum of Squares
You now have to use the treatment totals.
(viii) Find the p-value and compare it with α, the significance level
Numerator df = 2; denominator df = 12
                          df numerator
df denominator    α        1        2        …
1                 …
…                 …        …
12                0.100             2.81
                  0.050             3.89
                  0.025             5.10
                  0.010             6.93
                  0.001             12.97
Since the calculated value for F (13.28) is greater than 12.97, the level of
significance, p, is less than 0.001. This can be written simply as <0.001
or p < 0.001.
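The table lookup can be mirrored in code. The sketch below brackets p using the tabulated critical values for 2 and 12 degrees of freedom, then cross-checks with a closed-form tail probability. That closed form, (1 + 2F/df)^(-df/2), holds only when the numerator df is 2, as it is here; it is not a general F-distribution routine. Function names are illustrative.

```python
# Upper-tail critical values of F for 2 and 12 df, taken from the table above.
F_CRIT_2_12 = {0.100: 2.81, 0.050: 3.89, 0.025: 5.10, 0.010: 6.93, 0.001: 12.97}

def p_upper_bound(f_value, table):
    """Smallest tabulated alpha whose critical value f_value exceeds, else None."""
    exceeded = [alpha for alpha, crit in table.items() if f_value > crit]
    return min(exceeded) if exceeded else None

def f_tail_numerator_df_2(f_value, df_den):
    """P(F > f_value) for an F distribution with numerator df = 2 only."""
    return (1.0 + 2.0 * f_value / df_den) ** (-df_den / 2.0)

print(p_upper_bound(13.28, F_CRIT_2_12))   # table-based bound on p
print(f_tail_numerator_df_2(13.28, 12))    # closed-form check
```

The table-based bound reproduces the statement p < 0.001, and the closed-form value falls just below 0.001, which agrees with it.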
Note that because computers are able to store information to a greater degree of
accuracy, the p-value often appears as zero (through rounding), or as a specific
value. In answering questions, you are expected to use tables to look up critical
values to compare with your test result. You can further demonstrate your
understanding of how significant your conclusion is by stating that your result is:
significant (p < 0.05), very significant (p < 0.01) or highly significant (p < 0.001).
(P-values do not simply provide you with a “Yes” or “No” answer. They provide a sense of the
strength of the evidence against the null hypothesis. The lower the p-value, the weaker the
support for H0, i.e. the stronger the evidence against H0. Once you know how to read
p-values, you can more critically interpret journal articles, and decide for yourself if you agree
with the conclusions of the author.)
(vii) Complete the ANOVA table with group df = (k-1) and error df = (n-k)
respectively:
SOURCE       df    SS       MS       F        p
Treatment    2     9.43     4.715    13.28    <0.001
Error        12    4.26     0.355
Total        14    13.69
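The remaining entries of the table follow mechanically from the treatment and total sums of squares. A minimal sketch, using the figures from the worked example (k = 3 groups, n = 15 observations); the function name and dictionary keys are illustrative.

```python
def complete_anova(ss_treatment, ss_total, k, n):
    """Fill in a one-way ANOVA table from SS_treatment and SS_total."""
    df_treatment = k - 1          # group df
    df_error = n - k              # error df
    ss_error = ss_total - ss_treatment
    ms_treatment = ss_treatment / df_treatment
    ms_error = ss_error / df_error
    return {
        "df_treatment": df_treatment,
        "df_error": df_error,
        "df_total": n - 1,
        "ss_error": ss_error,
        "ms_treatment": ms_treatment,
        "ms_error": ms_error,
        "F": ms_treatment / ms_error,
    }

table = complete_anova(ss_treatment=9.43, ss_total=13.69, k=3, n=15)
for name, value in table.items():
    print(name, round(value, 3))
```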
Since this result is highly significant, we can reject H0 in favour of HA. That is, there is
strong evidence that the mean km/l is different in at least one of the three makes of
car.
The next step would be to do a multiple comparison test. This will be dealt with later.
Test questions (multi-choice). Answers are given at the end.
1. The dry shear strength of birch plywood bonded with different resin glues was
studied in an experiment with a completely randomised design.
Analysis of Variance
SOURCE       df    SS      MS      F        p
Treatment    *     ****    ****    37.99    0.000
Error        **    8168    628
Total        **    ****
i. For the shear strength of birch plywood data, what are the error degrees of freedom?
A. 14 B. 12 C. 13 D. 2 E. 8
ii. For the shear strength of birch plywood data, what is the standard error of the
difference between the mean of Glue C and the mean of Glue F?
2. Pott (1992) looked at the effect of feeding dietary molybdenum (Mo) on the Mo
concentration in sheep kidneys. 20 sheep were randomly allocated to one of the four
treatment groups.
Treatment A I 3
Treatment S38 3
Treatment G_9 5
Treatment A2 6
Treatment GH 4
Control 8
A. 6 B. 28 C. 23 D. 5 E. 29
Calculate an ANOVA table, and use the results to test the hypothesis that there is no
difference in the means against the alternative hypothesis that at least one population
mean is different from the other two.
3. Nine pigs were used in a feeding trial where different levels of vitamin B12 were added
to the diet (11, 22 and 44 kg$^{-1}$ of feed). The average daily weight gains (g day$^{-1}$) were
as follows.
Source     DF     SS         MS        F        P
Strains    2      359.79     179.90    31.10    0.000
Error      221    1278.42    5.78
Total      223    1638.21
What is the standard error of the difference between the means for strain 9D and
11C?
5. The standard error of the difference between the mean for 11C and DSC1 is 0.374.
What is the value of the t statistic to test if the means are significantly different?
6. Twenty people, patients with high systolic blood pressure, were randomly allocated to
one of five treatment protocols: a control using a standard drug, and four others with
the standard and various other components. The ANOVA is as follows:
Source    DF    SS        MS       F        P
Drug      4     388.70    97.18    15.10    0.000
Error     15    96.50     6.43
Total     19    485.20

Drug     N    Mean
S        4    25.00
S+AL     4    15.75
S+AH     4    11.50
S+BL     4    19.00
S+BH     4    17.75
The t statistic to test for the difference between the mean of ‘S’(25.00) and the mean
of ‘S+BL’ (19.00) is 3.35. Determine if the means are significantly different, and at
what level. Justify your answer.
Answers Completely Randomised Design:
Multi choice
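1. i) C 13
$df_{treatment} = 3 - 1 = 2$; $df_{total} = 16 - 1 = 15$; $df_{error} = 16 - 3 = 13$
ii) B 16.81
$sed = \sqrt{MS_{error}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{628\left(\frac{1}{4} + \frac{1}{5}\right)} = 16.81$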
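The standard error of the difference between two group means can be checked directly. A minimal sketch, assuming the group sizes 4 and 5 that appear in the worked answer; the function name is illustrative.

```python
from math import sqrt

def se_difference(ms_error, n1, n2):
    """Standard error of the difference between two group means,
    using the pooled error mean square from the ANOVA table."""
    return sqrt(ms_error * (1.0 / n1 + 1.0 / n2))

sed = se_difference(ms_error=628, n1=4, n2=5)
print(round(sed, 2))
```

The same function applies to any pair of groups; only the two group sizes change, while the error mean square stays the pooled value from the table.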
3. $n = 29$, $k = 6$, $df_{error} = n - k = 23$
Long answers
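$CF = \frac{(\sum y)^2}{\text{number of entries}} = \frac{20.8^2}{10} = 43.264$
$SS_{Total} = \sum y^2 - CF = (2.1^2 + 3.3^2 + 0.2^2 + \dots + 0.2^2 + 2.0^2) - 43.264 = 59.32 - 43.264 = 16.056$
$SS_{BG}$ (SS Between Groups) $= SS_{treatment} = \sum \frac{(\text{group total})^2}{\text{number in group}} - CF = \frac{5.6^2}{3} + \frac{11.9^2}{4} + \frac{3.3^2}{3} - \frac{20.8^2}{10} = 49.486 - 43.264 = 6.222$
$SS_{Error} = SS_{Total} - SS_{BG} = 16.056 - 6.222 = 9.834$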
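With unequal group sizes, each squared group total is divided by its own group size. The sketch below repeats the decomposition from the group totals (5.6, 11.9, 3.3), group sizes (3, 4, 3) and sum of squared observations (59.32) that appear in the working above; the variable names are illustrative.

```python
group_totals = [5.6, 11.9, 3.3]   # total of the observations in each group
group_sizes = [3, 4, 3]           # number of observations in each group
sum_y_sq = 59.32                  # sum of all squared observations

n = sum(group_sizes)
k = len(group_sizes)
cf = sum(group_totals) ** 2 / n   # correction factor

ss_total = sum_y_sq - cf
ss_between = sum(t ** 2 / m for t, m in zip(group_totals, group_sizes)) - cf
ss_error = ss_total - ss_between

# Mean squares and F ratio on (k - 1) and (n - k) degrees of freedom.
f_ratio = (ss_between / (k - 1)) / (ss_error / (n - k))

print(round(ss_total, 3), round(ss_between, 3), round(ss_error, 3))
print(k - 1, n - k, round(f_ratio, 2))
```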
That is, assuming no difference between the treatments, (i.e., H0 is True), there is a
greater than 10% probability of getting these results by chance. This is not small enough
to be able to reject H0. (Using a significance level of 0.05 we would reject H0 only if the p-
value was smaller than 0.05).
2. a)
SOURCE        df    SS      MS       F       p
Treatments    4     24.7    6.175    4.91    0.001<p<0.01
Error         30    37.7    1.257
Total         34    62.4
c) Calculated F = 4.91; Ftable = 4.02 for α = 0.01; Ftable = 6.12 for α = 0.001.
Hence we can reject H0, that there is no difference among the population means, at
α = 0.01 but not at α = 0.001. Hence 0.001 < p < 0.01.
Calculate the sum of squares between the groups – that is between treatments.
$SS_{treatment} = \frac{2096^2}{3} + \frac{2150^2}{3} + \frac{2081^2}{3} - \frac{6327^2}{9} = 878$
Since the number of pigs in each group is the same, this could also be done:
$SS_{treatment} = \frac{2096^2 + 2150^2 + 2081^2}{3} - \frac{6327^2}{9} = 878$
4. $sed = \sqrt{EMS\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{5.78\left(\frac{1}{31} + \frac{1}{60}\right)} = 0.532$
6. The t table for 15 DF lists 2.947 under the 0.005 column. The calculated t of 3.35 is larger
than this but not as large as the 3.733 under the 0.001 column. This is a two-sided test, so
double the tail value and conclude that the means are significantly different at the
0.01 level (i.e., p < 0.01).
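The quoted t statistic and the two-sided conclusion can be reproduced from the ANOVA table in question 6. A minimal sketch; the function name is illustrative, and the critical values are the two one-tail entries for 15 DF quoted in the answer above.

```python
from math import sqrt

def t_for_difference(mean1, mean2, ms_error, n1, n2):
    """t statistic for the difference between two group means,
    using the error mean square from the ANOVA table."""
    sed = sqrt(ms_error * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / sed

# One-tail critical values of t for 15 DF, as quoted in the answer.
T_CRIT_15 = {0.005: 2.947, 0.001: 3.733}

t = t_for_difference(25.00, 19.00, ms_error=6.43, n1=4, n2=4)

# Two-sided test: double each one-tail probability that |t| exceeds.
two_sided = [2 * tail for tail, crit in T_CRIT_15.items() if abs(t) > crit]
print(round(t, 2), min(two_sided) if two_sided else None)
```

Here |t| exceeds only the 0.005 one-tail value, so the two-sided bound is 2 × 0.005 = 0.01, matching the conclusion p < 0.01.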