Analysis of Variance Final
(ANOVA)
Why ANOVA?
In many experiments there are
more than two conditions or
treatments to compare, in which
case the two-sample t test will not
suffice. Analysis of variance
(ANOVA) provides a classical
test of equality of more than two
means.
ANOVA applies when you have two or more
populations that are independent (with a
categorical independent variable).
With exactly two populations, ANOVA becomes
equivalent to a two-tailed t test (the
two-sample test where $\sigma^2$ is unknown
but assumed equal).
• Using t tests would require a series of
several tests to evaluate all of the mean
differences. (Remember, a t test can
compare only 2 means at a time.)
• Although each t test can be done with a
specific α-level (risk of Type I error), the α-
levels accumulate over a series of tests so
that the final experimentwise α-level can
be quite large.
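The accumulation of α-levels can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that every pair of means gets its own t test and that the tests are independent; the function name `familywise_alpha` is hypothetical and not part of the original slides.

```python
from math import comb


def familywise_alpha(num_groups: int, alpha: float = 0.05) -> float:
    """Approximate experimentwise Type I error rate when every pair of
    group means is compared with its own t test at level `alpha`.

    Assumption (illustration only): the individual tests are independent.
    """
    num_tests = comb(num_groups, 2)  # number of pairwise comparisons
    return 1 - (1 - alpha) ** num_tests


# Show how the overall risk grows with the number of groups.
for k in (3, 4, 5):
    print(k, comb(k, 2), round(familywise_alpha(k), 4))
```

With three groups there are three pairwise tests, so under this assumption the overall risk is already about 0.14 rather than 0.05.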
• The difference between ANOVA and the t
tests is that ANOVA can be used in
situations where there are two or more
means being compared, whereas the t
tests are limited to situations where only
two means are involved.
• Analysis of variance is necessary to
protect researchers from excessive risk of
a Type I error in situations where a study
is comparing more than two population
means.
• ANOVA allows the researcher to evaluate all
of the mean differences in a single
hypothesis test using a single α-level and,
thereby, keeps the risk of a Type I error
under control no matter how many
different means are being compared.
• Although ANOVA can be used in a variety
of different research situations, this
chapter presents only independent-
measures designs involving only one
independent variable.
Introduction and Definitions
• Analysis of variance (ANOVA) describes
the partition of the response variable sum
of squares in a linear model into
‘explained’ and ‘unexplained’ components.
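For the one-way case with t treatments and r replicates each, the partition can be written out as below. This is a sketch of the standard identity; the bar notation for the treatment means and the grand mean is an assumption, not notation taken from the slides.

```latex
\[
\underbrace{\sum_{i=1}^{t}\sum_{j=1}^{r}\left(y_{ij}-\bar{y}_{..}\right)^2}_{\text{total}}
=
\underbrace{r\sum_{i=1}^{t}\left(\bar{y}_{i.}-\bar{y}_{..}\right)^2}_{\text{explained (between treatments)}}
+
\underbrace{\sum_{i=1}^{t}\sum_{j=1}^{r}\left(y_{ij}-\bar{y}_{i.}\right)^2}_{\text{unexplained (within treatments)}}
\]
```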
A single categorical explanatory variable
(factor or classification) corresponds to
one-way analysis of variance;
two factors to two-way analysis of
variance;
three factors to three-way analysis of
variance;
and so on.
Objectives and Goals
• Objectives
Appreciate the need for analysing data
from more than two samples
Understand the assumptions of the ANOVA
model
Understand when to use, and be able to carry out,
one-way ANOVA
Understand when to use, and be able to carry out,
two-way and higher-way ANOVA
• Goals
To introduce statistical models for
one-way and two-way analysis of
variance.
To show how the models can be fitted
to data by placing restrictions on their
parameters and appropriately coding
regressors.
Assumptions of the test
• 1) observations were randomly and
independently chosen from the
populations;
• 2) population distributions are normal
for each group; and
• 3) population variances are equal for
all groups.
The Logic and the Process of
Analysis of Variance
• The test statistic for ANOVA is an F-ratio,
which is a ratio of two sample variances. In
the context of ANOVA, the sample variances
are called mean squares, or MS values.
• The top of the F-ratio MSbetween measures the
size of mean differences between samples.
The bottom of the ratio MSwithin measures the
magnitude of differences that would be
expected without any treatment effects.
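• Thus, the F-ratio has the same basic structure
as the t statistic: F = MSbetween / MSwithin, that is,
the size of the mean differences between samples
divided by the differences expected without any
treatment effects.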
• The two components of the F-ratio can be
described as follows:
• Between-Treatments Variability: MSbetween
measures the size of the differences between
the sample means. For example, suppose that
three treatments, each with a sample of n = 5
subjects, have means of M1 = 1, M2 = 2, and
M3 = 3.
Notice that the three means are different; that is,
they are variable.
Logically, the differences (or variance) between means
can be caused by two sources:
1. Treatment Effects: If the treatments have different
effects, this could cause the mean for one treatment to
be higher (or lower) than the mean for another
treatment.
2. Chance or Sampling Error: If there is no treatment
effect at all, you would still expect some differences
between samples. Mean differences from one sample
to another are an example of random, unsystematic
sampling error.
• Within-Treatments Variability: MSwithin
measures the size of the differences that
exist inside each of the samples.
• Because all the individuals in a sample
receive exactly the same treatment, any
differences (or variance) within a sample
cannot be caused by different treatments.
Thus, these differences are caused by
only one source:
1. Chance or Error: The unpredictable
differences that exist between individual
scores are not caused by any systematic
factors and are simply considered to be
random chance or error.
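The two components of the F-ratio can be sketched directly from their definitions. The scores below are hypothetical, chosen only so that three samples of n = 5 have means 1, 2 and 3; the function name `f_ratio` is likewise an illustration, not part of the slides.

```python
from statistics import mean


def f_ratio(samples):
    """Return (MS_between, MS_within, F) for a list of samples."""
    all_scores = [x for s in samples for x in s]
    grand_mean = mean(all_scores)
    k = len(samples)            # number of treatments
    n_total = len(all_scores)   # total number of scores

    # Between-treatments: how far each sample mean lies from the grand mean.
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # Within-treatments: spread of the scores around their own sample mean.
    ss_within = sum((x - mean(s)) ** 2 for s in samples for x in s)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between, ms_within, ms_between / ms_within


# Hypothetical scores with sample means M1 = 1, M2 = 2, M3 = 3 (n = 5 each).
samples = [[0, 1, 1, 1, 2], [1, 2, 2, 2, 3], [2, 3, 3, 3, 4]]
ms_between, ms_within, f = f_ratio(samples)
print(ms_between, ms_within, f)
```

For these made-up scores MSbetween is 5, MSwithin is 0.5 and F is 10: the differences between the sample means are ten times larger than chance alone would be expected to produce.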
Some basic notations
• Experimental Design: the sequence of
steps taken initially to ensure that the data will
be obtained in such a way that their analysis
will lead directly to valid statistical
inference.
The following questions must be answered
i. How is the effect to be measured
ii. What factors influence the effect
iii. How many of the factors will be
considered at a time
iv. How many replications (repetitions) of the
experiment will be required
v. What level of difference in effects is
considered significant
• Replication: the complete
repetition of the basic experiment. It
makes tests of significance of effects
possible
• Randomization: the means used to
eliminate any bias in the
experimental units and/or treatment
combinations. It allows the
analysis to be carried out as though
the assumption of independence
were true
NOTATIONS & TERMINOLOGIES
Yield: any response to a treatment
Treatment: any event that can generate a
response from the recipient, e.g. methods,
fertilizers, etc.
Factor: a basic treatment that can
assume several forms, e.g. factors of growth
are age, diet and environment
Level: a form of the factor
Block: a collection of homogeneous
plots or experimental material, i.e. any collection
of similar plots
• Experimental plot or unit: the small
division on which the experiment on the factor
of interest is carried out
• Experimental error: the difference in yields
that exists between units receiving the same
treatment, also called residual error. Any remaining
error is experimental error: the variation among units
treated alike after all other sources of error have been
controlled.
• Replication: the repetition of the experiment on
a factor a number of times in order to estimate the experimental
error (the more replication, the greater the
precision)
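Model for one-way ANOVA:
$$ y_{ij} = \mu + \tau_i + \varepsilon_{ij} $$
where $y_{ij}$ is the jth yield of treatment i, $\mu$ is the overall mean,
$\tau_i$ is the effect of treatment i and $\varepsilon_{ij}$ is the random error.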
Types Of ANOVA Design
12 Experimental Units
 1  2  3
 4  5  6
 7  8  9
10 11 12
Treatments A, B and C allocated to the units:
B A B
C B A
B C A
C A C
We measure the yields of the treatments to
carry out the analysis.
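An allocation of three treatments to 12 units, four replicates each, could be produced along the following lines. This is a minimal sketch: the function name `randomize_crd` and the seed are hypothetical, and the seed is there only to make the illustration reproducible.

```python
import random


def randomize_crd(treatments, replicates, seed=None):
    """Randomly allocate each treatment to `replicates` experimental units."""
    # One label per experimental unit, e.g. A A A A B B B B C C C C.
    labels = [t for t in treatments for _ in range(replicates)]
    rng = random.Random(seed)  # the seed only makes the sketch reproducible
    rng.shuffle(labels)
    return labels  # labels[u] is the treatment placed on unit u + 1


layout = randomize_crd(["A", "B", "C"], replicates=4, seed=1)
# Print the 12 units as a 4 x 3 grid, matching the numbering of the units.
for row in range(4):
    print(" ".join(layout[row * 3:(row + 1) * 3]))
```

Shuffling the full list of labels keeps the number of replicates per treatment fixed while leaving the position of each treatment to chance.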
Analysis
                        Treatments                 Row
Replication     1     2     3     ...   t          Total
1               x11   x21   x31   ...   xt1        X1
2               x12   x22   x32   ...   xt2        X2
3               x13   x23   x33   ...   xt3        X3
.               .     .     .     ...   .          .
.               .     .     .     ...   .          .
r               x1r   x2r   x3r   ...   xtr        Xr
Treatment
Total           t1    t2    t3    ...   tt         G
xij = jth yield of treatment i
ti = Treatment i Total
G = Grand Total
Calculation Of Sum Of Squares
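Sum of Squares Treatment:
$$ SS_t = \frac{\sum_{i=1}^{t} t_i^2}{r} - \frac{G^2}{N} $$
where $t_i$ = Treatment i Total
Total Sum of Squares:
$$ SS_T = \sum_{i=1}^{t} \sum_{j=1}^{r} x_{ij}^2 - \frac{G^2}{N} $$
where $x_{ij}$ = jth yield of treatment i
Error Sum of Squares (by subtraction):
$$ SS_E = SS_T - SS_t $$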
Hypothesis Statements
There are 2 hypotheses, namely the Null
Hypothesis and the Alternative Hypothesis
H0 : (Null hypothesis) can be stated as
i. A = B = C = … = t
ii. No significant difference
iii. $\tau_i = 0$ for all i [i.e. no treatment effect]
iv. There is no significant difference in the
treatment effects.
H1 : (Alternative hypothesis) can be stated
as ~H0, i.e. the negation of H0, as follows
i. Not all of A, B, C, …, t are equal (at least one differs)
ii. There is a significant difference
iii. $\tau_i \neq 0$ for some i [i.e. there are treatment
effects]
iv. There is a significant difference in the
treatment effects
ANOVA TABLE
Source of      Degrees of    Sum of          Mean Square (MS)          fcalculated
Variation      freedom       Squares (SS)
Treatment      t – 1         SSt             MSt = SSt/(t – 1)         MSt/MSE
Error          t(r – 1)      SSE             MSE = SSE/[t(r – 1)]
Total          rt – 1        SST
fcalculated = fc = fcal = MSt/MSE
ftabulated = ft = ftab = f(t – 1), t(r – 1), α
where α is the level of significance
Decision
If fcalculated > ftabulated, reject H0,
i.e. there is a significant difference
OR
If p ≤ α, reject H0,
where p is the probability value (p-value)
Model: $y_{ij} = \mu + \tau_i + \varepsilon_{ij}$
Examples
CASE I: ONE WAY ANOVA [With Equal Replicates]
Given 3 types of treatments A, B & C with 4 replicates and
yields recorded as shown
• H0 : A = B = C [i.e. there is no significant difference]
• H1 : not all of A, B and C are equal [i.e. there is a significant difference]
                Treatments
Replication     A     B     C
1               2     2     3
2               3     5     2
3               2     4     3
4               2     6     2
TOTAL           9     17    10       G = 36
r = 4, t = 3, N = t * r = 3*4 = 12
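$$ SS_t = \frac{\sum_{i=1}^{t} t_i^2}{r} - \frac{G^2}{N} = \frac{9^2 + 17^2 + 10^2}{4} - \frac{36^2}{12} = \frac{470}{4} - 108 = 117.5 - 108 = 9.5 $$
$$ SS_T = \sum_{i}^{t} \sum_{j}^{r} x_{ij}^2 - \frac{G^2}{N} = (2^2 + 3^2 + 2^2 + 2^2 + \dots + 3^2 + 2^2) - \frac{36^2}{12} = 128 - 108 = 20 $$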
SSE = SST – SSt ( i.e. by subtraction)
= 20 – 9.5 = 10.5
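The Case I arithmetic can be checked, and carried through to the F-ratio, with a short script. This is a sketch built on the totals formulas used in this example; the function name `one_way_anova` and the dictionary layout are illustrative choices, not part of the slides.

```python
def one_way_anova(yields):
    """`yields` maps each treatment label to its list of replicate yields."""
    t = len(yields)                                   # number of treatments
    n_total = sum(len(v) for v in yields.values())    # N
    grand_total = sum(sum(v) for v in yields.values())  # G
    correction = grand_total ** 2 / n_total           # G^2 / N

    # Treatment SS from treatment totals; len(v) also covers unequal replicates.
    ss_treat = sum(sum(v) ** 2 / len(v) for v in yields.values()) - correction
    ss_total = sum(x ** 2 for v in yields.values() for x in v) - correction
    ss_error = ss_total - ss_treat                    # by subtraction

    ms_treat = ss_treat / (t - 1)
    ms_error = ss_error / (n_total - t)               # N - t = t(r - 1) for equal r
    return {
        "SSt": ss_treat, "SST": ss_total, "SSE": ss_error,
        "MSt": ms_treat, "MSE": ms_error, "F": ms_treat / ms_error,
    }


case_1 = {"A": [2, 3, 2, 2], "B": [2, 5, 4, 6], "C": [3, 2, 3, 2]}
result = one_way_anova(case_1)
for name, value in result.items():
    print(name, round(value, 4))
```

For these data the script reproduces SSt = 9.5, SST = 20 and SSE = 10.5, and gives fcalculated of about 4.07 on 2 and 9 degrees of freedom. Standard F tables (not reproduced in these slides) give f(2, 9, 0.05) of about 4.26, so at α = 0.05 H0 would not be rejected.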
CASE II: ONE WAY ANOVA [Unequal Replicates]
Suppose we have treatments A, B & C, with A having 3 replicates, B
having 4 and C having 5, and yields recorded as shown
Treatment     Yields in Kg       Total
A             2  3  1            6
B             3  4  5  3         15
C             5  7  2  3  2      19
Total                            40
Carry out an ANOVA test to determine whether there is any significant
difference among the treatments. Take α = 0.05
G = 40, N = 12 (by counting)
H0 : A = B = C [i.e. there is no significant difference]
H1 : not all of A, B and C are equal [i.e. there is a significant difference]
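$$ SS_t = \sum_{i=1}^{t} \frac{t_i^2}{r_i} - \frac{G^2}{N} \quad \text{(for unequal replication)} $$
$$ SS_T = \sum_{i}^{t} \sum_{j}^{r_i} x_{ij}^2 - \frac{G^2}{N} = (2^2 + 3^2 + 1^2 + 3^2 + \dots + 3^2 + 2^2) - \frac{40^2}{12} = 164 - 133.33 = 30.67 $$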
SSE = SST – SSt ( i.e. by subtraction)
= 30.67 – 7.12 = 23.55
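The value SSt = 7.12 used in the subtraction follows from the unequal-replication formula applied to the treatment totals 6, 15 and 19. The working below is a sketch; taking N – t = 9 as the error degrees of freedom is the usual generalization of t(r – 1) and is an assumption here, as is the tabulated value, which comes from standard F tables rather than from the slides.

```latex
\[
SS_t = \frac{6^2}{3} + \frac{15^2}{4} + \frac{19^2}{5} - \frac{40^2}{12}
     = 12 + 56.25 + 72.2 - 133.33 = 7.12
\]
\[
MS_t = \frac{7.12}{3 - 1} = 3.56, \qquad
MS_E = \frac{23.55}{12 - 3} \approx 2.62, \qquad
f_{cal} = \frac{3.56}{2.62} \approx 1.36
\]
\[
f_{cal} \approx 1.36 < f_{tab} = f_{2,\,9,\,0.05} \approx 4.26
\;\Rightarrow\; \text{do not reject } H_0 \text{ at } \alpha = 0.05
\]
```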
TWO- WAY ANOVA
Here the plots can be classified by a factor into
homogeneous groups (blocks). Each treatment
must appear in every group or block and each
group or block must receive every treatment. It
is also called Complete Randomized Block
Design (CRBD). Its main objective is to isolate
or reduce the residual variation. In CRBD
i. Intra group (within) variation is minimal
ii. Inter group (between) variation is maximal
iii. The number of plots in each group equals an
exact multiple of the number of treatments
ILLUSTRATION AND LAYOUT
6 Treatments (A, B, C, D, E and F)
3 Blocks (B1, B2 and B3)
Note: In a block, each of the treatments must
appear an equal number of times.
Also, Number of Replicates = Number of Blocks
B1        B2        B3
B A       C E       F D
E F       B A       C E
C D       D F       A B
Analysis
                   Treatments                   Row       Block
Blocks       1     2     3     ...   k          Total     mean
1            x11   x12   x13   ...   x1k        B1.       x̄1.
2            x21   x22   x23   ...   x2k        B2.       x̄2.
3            x31   x32   x33   ...   x3k        B3.       x̄3.
.            .     .     .     ...   .          .         .
.            .     .     .     ...   .          .         .
n            xn1   xn2   xn3   ...   xnk        Bn.       x̄n.
Total        T.1   T.2   T.3   ...   T.k        T..
Treatment
mean         x̄.1   x̄.2   x̄.3   ...   x̄.k                  x̄..
xij = yield of jth treatment in ith block
G = T.. = Grand Total
CALCULATION OF SUM OF SQUARES
$\bar{x}_{i.}$ = Block i mean
Sum of Squares Treatment = SSt
$$ SS_t = \frac{\sum_{j=1}^{k} T_{.j}^2}{n} - \frac{G^2}{nk} $$
Sum of Squares Block = SSB = BSS
$$ SS_B = \frac{\sum_{i=1}^{n} B_{i.}^2}{k} - \frac{G^2}{nk} $$
HYPOTHESIS STATEMENTS
In two-way ANOVA, hypotheses are stated for
each of the two factors, namely treatments and
blocks.
For Treatments
H0 : i. A = B = C = … = t
ii. No significant difference
iii. $\tau_j = 0$ for all j (i.e. no treatment effects)
iv. There is no significant difference in the
treatment effects
H1 : ~H0 (i.e. H0 is false)
For Blocks
H0 : i. B1 = B2 = B3 = … = Bn
ii. No significant difference
iii. $\beta_i = 0$ for all i (i.e. no block effects)
iv. There is no significant difference in the
block effects
H1 : ~H0 (i.e. H0 is false)
ANOVA TABLE (Two-way, CRBD)
Source of      Degrees of         Sum of          Mean Square (MS)                  fcalculated
Variation      freedom            Squares (SS)
Treatment      k – 1              SSt             MSt = SSt/(k – 1)                 MSt/MSE
Block          n – 1              SSB             MSB = SSB/(n – 1)                 MSB/MSE
Error          (k – 1)(n – 1)     SSE             MSE = SSE/[(k – 1)(n – 1)]
Total          nk – 1             SST
fcalculated = fc = fcal = MSt/MSE : for Treatments
fcalculated = fc = fcal = MSB/MSE : for Blocks
ftabulated = ft = ftab = f(k – 1), (k – 1)(n – 1), α : for Treatments
ftabulated = ft = ftab = f(n – 1), (k – 1)(n – 1), α : for Blocks
where α is the level of significance
DECISION
fcal is distributed as Fisher’s F with
i. k – 1 as numerator degrees of freedom and
(k – 1)(n – 1) as denominator degrees of
freedom for treatments
ii. n – 1 as numerator degrees of freedom and
(k – 1)(n – 1) as denominator degrees of
freedom for blocks
Each of these is compared with the critical
value of the F distribution at the
specified significance level α.
If fcalculated is greater than the respective
critical value of the F distribution, the null
hypothesis is rejected,
i.e. if fcalculated > ftabulated, reject H0; otherwise do not reject H0.
Model: $y_{ij} = \mu + \tau_j + \beta_i + \varepsilon_{ij}$
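The two-way calculations can be sketched in a few lines. The data below are hypothetical (3 blocks by 3 treatments) and the function name `rcbd_anova` is illustrative; obtaining SSE by subtracting both SSt and SSB from SST is the usual two-way analogue of the one-way subtraction and is an assumption here.

```python
def rcbd_anova(table):
    """`table[i][j]` is the yield of treatment j in block i
    (n blocks as rows, k treatments as columns)."""
    n, k = len(table), len(table[0])
    block_totals = [sum(row) for row in table]         # B_i.
    treat_totals = [sum(col) for col in zip(*table)]   # T_.j
    grand_total = sum(block_totals)                    # G = T..
    correction = grand_total ** 2 / (n * k)            # G^2 / nk

    ss_treat = sum(t ** 2 for t in treat_totals) / n - correction
    ss_block = sum(b ** 2 for b in block_totals) / k - correction
    ss_total = sum(x ** 2 for row in table for x in row) - correction
    ss_error = ss_total - ss_treat - ss_block          # by subtraction

    ms_error = ss_error / ((k - 1) * (n - 1))
    return {
        "SSt": ss_treat, "SSB": ss_block, "SST": ss_total, "SSE": ss_error,
        "F_treatments": (ss_treat / (k - 1)) / ms_error,
        "F_blocks": (ss_block / (n - 1)) / ms_error,
    }


# Hypothetical yields: rows are blocks, columns are treatments.
table = [[4, 6, 8],
         [5, 7, 9],
         [6, 8, 13]]
result = rcbd_anova(table)
for name, value in result.items():
    print(name, round(value, 4))
```

For these made-up yields SSt = 38, SSB = 14, SST = 56 and SSE = 4, so fcalculated is 19 for treatments and 7 for blocks, each on 2 and 4 degrees of freedom.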
Solution to example
For teaching methods
H0 : Teaching methods are not different
H1 : Teaching methods are different
$$ \cdots - \frac{G^2}{15} = 1{,}567 - 1{,}520.07 = 46.93 $$
$$ SS_t = \frac{\sum_{j=1}^{k} T_{.j}^2}{n} - \frac{G^2}{nk} $$