
Analysis of Variance Final

The document provides an overview of analysis of variance (ANOVA). It explains that ANOVA allows researchers to evaluate differences between multiple population means in a single test, controlling the risk of Type I errors. ANOVA partitions total variation in a dataset into components associated with treatments and error. The assumptions of ANOVA include independent and normally distributed samples with equal variances.

Uploaded by

kester TV

ANALYSIS OF VARIANCE

(ANOVA)

Why ANOVA?
Many experiments compare more than
two conditions or treatments, in which
case the two-sample t test will not
suffice. Analysis of variance (ANOVA)
provides a classical test of the
equality of more than two means.
Why ANOVA?
When you have exactly two independent
populations (with a categorical
independent variable), ANOVA is
equivalent to a two-tailed, two-sample
t test in which σ² is unknown but
assumed equal.
Why ANOVA?
• Using t test would require a series of
several t tests to evaluate all of the mean
differences. (Remember, a t test can
compare only 2 means at a time.)
• Although each t test can be done with a
specific α-level (risk of Type I error), the α-
levels accumulate over a series of tests so
that the final experimentwise α-level can
be quite large.
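To make the accumulation of α concrete, here is a minimal Python sketch. It assumes the pairwise t tests are independent, which is only an approximation when the tests share samples, so the numbers are a rough guide rather than exact error rates. The function name `experimentwise_alpha` is illustrative.

```python
from math import comb


def experimentwise_alpha(num_means: int, alpha: float = 0.05) -> float:
    """Approximate chance of at least one Type I error when every pair of
    means is compared with its own t test at level alpha.

    Assumption: the tests are treated as independent (a simplification).
    """
    num_tests = comb(num_means, 2)  # number of pairwise comparisons
    return 1 - (1 - alpha) ** num_tests


for k in (2, 3, 4, 5):
    print(k, "means:", comb(k, 2), "tests, experimentwise alpha ~",
          round(experimentwise_alpha(k), 3))
```

Under this assumption, three means (3 tests) already give a risk of about 0.14, and five means (10 tests) about 0.40, even though each individual test uses α = 0.05.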
Why ANOVA?
• The difference between ANOVA and the t
tests is that ANOVA can be used in
situations where there are two or more
means being compared, whereas the t
tests are limited to situations where only
two means are involved.
• Analysis of variance is necessary to
protect researchers from excessive risk of
a Type I error in situations where a study
is comparing more than two population
means.
Why ANOVA?
• ANOVA allows a researcher to evaluate all
of the mean differences in a single
hypothesis test using a single α-level and,
thereby, keeps the risk of a Type I error
under control no matter how many
different means are being compared.
• Although ANOVA can be used in a variety
of different research situations, this
chapter presents only independent-measures
designs involving only one
independent variable.
Introduction and Definitions
• Analysis of variance (ANOVA) describes
the partition of the response variable sum
of squares in a linear model into
‘explained’ and ‘unexplained’ components.

• The term also refers to procedures for
fitting and testing linear models in which
the explanatory variables are categorical.
Introduction and Definitions
ANOVA is used for splitting, or partitioning, the total variation
in an experiment into different components.
Introduction and Definitions
• A single categorical explanatory variable
(factor or classification) corresponds to
one-way analysis of variance;
• two factors to two-way analysis of
variance;
• three factors to three-way analysis of
variance;
and so on.
Objectives and Goals
• Objectives
Appreciate the need for analysing data
from more than two samples
Understand the assumptions of the ANOVA
model
Understand when to use, and be able to carry
out, one-way ANOVA
Understand when to use, and be able to carry
out, two-way and higher-way ANOVA
Objectives and Goals
• Goals
To introduce statistical models for
one-way and two-way analysis of
variance.
To show how the models can be fitted
to data by placing restrictions on their
parameters and appropriately coding
regressors.

Assumptions of the test
• 1) observations were randomly and
independently chosen from the
populations;
• 2) population distributions are normal
for each group; and
• 3) population variances are equal for
all groups.

The Logic and the Process of
Analysis of Variance
• The test statistic for ANOVA is an F-ratio,
which is a ratio of two sample variances. In
the context of ANOVA, the sample variances
are called mean squares, or MS values.
• The top of the F-ratio MSbetween measures the
size of mean differences between samples.
The bottom of the ratio MSwithin measures the
magnitude of differences that would be
expected without any treatment effects.

The Logic and the Process of
Analysis of Variance (cont.)
• Thus, the F-ratio has the following basic structure:

$$F = \frac{\text{obtained mean differences (including treatment effects)}}{\text{differences expected by chance (without treatment effects)}} = \frac{MS_{\text{between}}}{MS_{\text{within}}}$$
The Logic and the Process of
Analysis of Variance (cont.)
• The two components of the F-ratio can be
described as follows:
• Between-Treatments Variability: MSbetween
measures the size of the differences between
the sample means. For example, suppose that
three treatments, each with a sample of n = 5
subjects, have means of M1 = 1, M2 = 2, and
M3 = 3.
Notice that the three means are different; that is,
they are variable.
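As a rough numerical illustration of this example, the sketch below computes MSbetween directly from the three sample means. It assumes equal sample sizes (n = 5 in each treatment), so the grand mean is simply the mean of the sample means. The function name `ms_between` is illustrative.

```python
def ms_between(sample_means, n):
    """Between-treatments mean square for equal-sized samples.

    SS_between = n * sum((M_i - grand_mean)^2), with df = k - 1.
    """
    k = len(sample_means)
    grand_mean = sum(sample_means) / k  # valid because every sample has size n
    ss_between = n * sum((m - grand_mean) ** 2 for m in sample_means)
    return ss_between / (k - 1)


print(ms_between([1, 2, 3], n=5))  # 5.0
```

Here SSbetween = 5 × (1 + 0 + 1) = 10 on 2 degrees of freedom, giving MSbetween = 5. Whether that is large depends on MSwithin, which this example does not specify.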
The Logic and the Process of
Analysis of Variance (cont.)
Logically, the differences (or variance) between means
can be caused by two sources:
1. Treatment Effects: If the treatments have different
effects, this could cause the mean for one treatment to
be higher (or lower) than the mean for another
treatment.
2. Chance or Sampling Error: If there is no treatment
effect at all, you would still expect some differences
between samples. Mean differences from one sample
to another are an example of random, unsystematic
sampling error.

The Logic and the Process of
Analysis of Variance (cont.)
• Within-Treatments Variability: MSwithin
measures the size of the differences that
exist inside each of the samples.
• Because all the individuals in a sample
receive exactly the same treatment, any
differences (or variance) within a sample
cannot be caused by different treatments.

The Logic and the Process of
Analysis of Variance (cont.)
Thus, these differences are caused by
only one source:
1. Chance or Error: The unpredictable
differences that exist between individual
scores are not caused by any systematic
factors and are simply considered to be
random chance or error.

Some basic notations
• Experimental Design: the sequence of
steps taken at the outset to ensure that the data
are obtained in such a way that their analysis
leads directly to valid statistical
inference.
The following questions must be answered:
i. How is the effect to be measured?
ii. What factors influence the effect?
iii. How many of the factors will be
considered at a time?
Some basic notations cont’d
iv. How many replications (repetitions) of the
experiment will be required?
v. What level of difference in effects is
considered significant?
• Replication: simply a complete
repetition of the basic experiment. It
makes tests of significance of effects
possible.
Some basic notations cont’d
• Randomization: a means used to
eliminate any bias in the
experimental units and/or treatment
combinations. It allows the
analysis to be carried out as though
the assumption of independence
were true.
NOTATIONS & TERMINOLOGIES
Yield: any response to a treatment
Treatment: any event that can generate a
response from the recipient, e.g. methods,
fertilizers, etc.
Factor: a basic treatment that can
assume several forms, e.g. the factors of growth
are age, diet and environment
Level: a form of the factor
Block: a collection of homogeneous
plots or experimental material, i.e. any collection
of similar plots
NOTATIONS & TERMINOLOGIES cont’d
• Experimental plot or unit: the small
division on which the experiment on the factor of
interest is carried out
• Experimental error: the difference in yields
between units that receive the same treatment,
also called residual error. It is the variation
that remains after all other sources of error
have been controlled.
• Replication: repeating the experiment on
a factor a number of times in order to estimate the
experimental error (the more replications, the
greater the precision)
NOTATIONS & TERMINOLOGIES cont’d

• Randomization: a means by which we
allocate treatments to (different) plots by
chance
• Blocking: a means of
controlling errors
• Model: a mathematical relationship
between yields and the determinants of yield
• Yield = Mean + Treatment + Error
$y_{ij} = \mu + \tau_i + \varepsilon_{ij}$
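NOTATIONS & TERMINOLOGIES cont’d

$$y_{ij} = \mu + \tau_i + \varepsilon_{ij}$$

where $y_{ij}$ is the yield, $\mu$ the grand mean, $\tau_i$ the effect of treatment $i$, and $\varepsilon_{ij}$ the error.

TYPICAL ANOVA DESIGN MODEL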
NOTATIONS & TERMINOLOGIES cont’d

ANOVA (Analysis of Variance): the
resolution of total variation into different
components, e.g.
TSS = Total sum of squares
tss = Treatment sum of squares
BSS = Block sum of squares
SSb = Sum of squares between
SSw = Sum of squares within
Assumptions Underlying ANOVA
1. The model is additive
- The effects that make up the yield add up
- No interaction between blocks and treatments
2. The errors
- are independent [i.e. what happens in
one plot has no effect on another plot]
- have equal variance, i.e. they are
homogeneous (homoscedastic)
- are normal, i.e. normally and independently distributed:

$\varepsilon_{ij} \sim NID(0, \sigma^2)$
Assumptions Underlying ANOVA
• If the assumptions are true, then

$$F = \frac{MSB}{MSW} = \frac{\text{Mean Square Between}}{\text{Mean Square Within}} = f_{cal}$$

• $f_{cal} > f_{tab}$ → significant
Types Of ANOVA Design

• 1. One-Way ANOVA: here the treatment
(object of interest) is the only thing that
varies, while everything else is held constant, in a
single-factor experiment. It is also referred to as the
Completely Randomized Design (CRD).
• Illustration
• Take 3 treatments A, B and C of a particular
object of interest. If each is replicated four
times, the design layout will feature
t = 3, r = 4, n = t × r = 3 × 4 = 12 units
Types Of ANOVA Design cont’d

12 experimental units:

 1  2  3
 4  5  6
 7  8  9
10 11 12

Each treatment occupies exactly 4 plots, allocated by randomization:

B A B
C B A
B C A
C A C

We measure the yields of the treatments to carry out the analysis.
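A randomized layout of this kind can be generated with a short script. The sketch below is one possible approach, not the procedure used to produce the layout on the slide. The function name `crd_layout` and the `seed` parameter are illustrative assumptions.

```python
import random


def crd_layout(treatments, replicates, seed=None):
    """Allocate each treatment to `replicates` plots completely at random."""
    rng = random.Random(seed)  # a seed only makes the layout reproducible
    plots = [t for t in treatments for _ in range(replicates)]
    rng.shuffle(plots)         # every ordering of the plots is equally likely
    return plots


layout = crd_layout(["A", "B", "C"], replicates=4, seed=1)

# Print the 12 plots as a 4 x 3 field plan.
for start in range(0, len(layout), 3):
    print(" ".join(layout[start:start + 3]))
```

Because the whole list of 12 plots is shuffled at once, each treatment still occupies exactly 4 plots, but its positions are left to chance.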

Analysis

              Treatments                        Row
Replication   1     2     3     ...   t         Total
1             x11   x21   x31   ...   xt1       X1
2             x12   x22   x32   ...   xt2       X2
3             x13   x23   x33   ...   xt3       X3
.             .     .     .     ...   .         .
r             x1r   x2r   x3r   ...   xtr       Xr
Treatment
Total         t1    t2    t3    ...   tt        G

xij = jth yield of treatment i
ti = Treatment i total
G = Grand total
Calculation Of Sum Of Squares

• Sum of squares treatment = sum of squares
between = SSt = SStr = tss = SSB = BSS

$$SS_t = \frac{\sum_{i=1}^{t} t_i^2}{r} - \frac{G^2}{N}, \qquad t_i = \text{Treatment } i \text{ total}$$

• Sum of Squares Total = SST = TSS

$$SST = \sum_{i=1}^{t} \sum_{j=1}^{r} x_{ij}^2 - \frac{G^2}{N}, \qquad x_{ij} = j\text{th yield of treatment } i$$
Calculation Of Sum Of Squares cont’d

• Sum of squares Error = Sum of squares


within = SSw = WSS = SSE
By subtraction
SSE = SST - SSt
G²/N = Correction factor = C.F.
G = Grand total [sum of all observations]
N = t × r [total number of all observations]
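The shortcut of obtaining SSE by subtraction can be checked against the direct definition of within-treatment variation (squared deviations of each yield from its own treatment mean). The sketch below does this on a small made-up data set. The data and the function names are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def sums_of_squares(groups):
    """Return (SST, SSt, SSE) using the correction factor C.F. = G^2 / N."""
    all_obs = [x for g in groups for x in g]
    cf = sum(all_obs) ** 2 / len(all_obs)                   # G^2 / N
    ss_total = sum(x * x for x in all_obs) - cf             # SST
    ss_treat = sum(sum(g) ** 2 / len(g) for g in groups) - cf  # SSt
    return ss_total, ss_treat, ss_total - ss_treat          # SSE by subtraction


def ss_within_direct(groups):
    """SSE from its definition: deviations from each treatment's own mean."""
    total = 0.0
    for g in groups:
        mean = sum(g) / len(g)
        total += sum((x - mean) ** 2 for x in g)
    return total


toy = [[4, 6, 5], [7, 9, 8], [1, 3, 2]]  # hypothetical yields, 3 treatments x 3 reps
sst, sstr, sse = sums_of_squares(toy)
print(sst, sstr, sse, ss_within_direct(toy))
```

For this toy data both routes give SSE = 6 (with SST = 60 and SSt = 54), which is why the subtraction shortcut is safe to use.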

Hypothesis Statements
There are 2 hypotheses, namely the Null
Hypothesis and the Alternative Hypothesis.
H0 (Null hypothesis) can be stated as
i. A = B = C = … = t (all treatment means are equal)
ii. No significant difference
iii. τi = 0 for all i [i.e. no treatment effect]
iv. There is no significant difference in the
treatment effects.
Hypothesis Statements cont’d
H1 (Alternative hypothesis) can be stated
as ~H0, i.e. the negation of H0, as follows
i. At least two of the treatment means A, B, C, …, t differ
ii. There is a significant difference
iii. τi ≠ 0 for some i [i.e. there are treatment
effects]
iv. There is a significant difference in the
treatment effects
ANOVA TABLE
Sources of   Degrees of   Sum of         Mean Square (MS)         fcalculated
Variation    freedom      Squares (SS)
Treatment    t – 1        SSt            SSt/(t – 1) = (A)        A/B
Error        t(r – 1)     SSE            SSE/[t(r – 1)] = (B)
Total        rt – 1       SST

MSt = SSt/(t – 1)
MSE = SSE/[t(r – 1)]
fcalculated = fc = fcal = MSt/MSE
ftabulated = ft = ftab = f(t – 1), t(r – 1), α
where α is the level of significance
Decision
If fcalculated > ftabulated, reject H0,
i.e. there is a significant difference
OR
If p ≤ α, reject H0,
where p is the probability value (p-value)
Model: $y_{ij} = \mu + \tau_i + \varepsilon_{ij}$
Examples
CASE I: ONE WAY ANOVA [With Equal Replicates]
Given 3 treatments A, B and C with 4 replicates each and
yields (x) recorded as shown:

A(2) B(2) C(3)
C(2) C(3) B(5)
B(4) A(2) A(2)
A(3) C(2) B(6)

Carry out an ANOVA test to determine whether there is any significant
difference among the treatments. Take α = 0.05.
Examples cont’d
• H0 : A = B = C [i.e. there is no significant difference]
• H1 : not all of A, B and C are equal [i.e. there is a significant difference]

               Treatments
Replication    A     B     C
1              2     2     3
2              3     5     2
3              2     4     3
4              2     6     2
TOTAL          9     17    10     G = 36

r = 4, t = 3, N = t × r = 3 × 4 = 12, G = 36
Examples cont’d
$$SS_t = \frac{\sum_{i=1}^{t} t_i^2}{r} - \frac{G^2}{N} = \frac{9^2 + 17^2 + 10^2}{4} - \frac{36^2}{12} = \frac{470}{4} - 108 = 117.5 - 108 = 9.5$$

$$SST = \sum_{i}^{t} \sum_{j}^{r} x_{ij}^2 - \frac{G^2}{N} = (2^2 + 3^2 + 2^2 + 2^2 + \dots + 3^2 + 2^2) - \frac{36^2}{12} = 128 - 108 = 20$$
Examples cont’d
SSE = SST – SSt ( i.e. by subtraction)
= 20 – 9.5 = 10.5

ANOVA TABLE (Equal Replicates)

Sources of   Degrees of   Sum of         Mean Square (MS)   fcalculated
Variation    freedom      Squares (SS)
Treatment    2            9.5            4.75               4.06
Error        9            10.5           1.17
Total        11           20

Mean Square (Treatment) = 9.5/2 = 4.75
Mean Square (Error) = 10.5/9 = 1.17
Examples cont’d
$F_{calculated} = \frac{MSt}{MSE} = \frac{4.75}{1.17} = 4.06$
$F_{tabulated} = f_{2,9,0.05} = 4.26$
Decision: if fcalculated > ftabulated, reject H0;
otherwise do not reject H0.
Conclusion: since 4.06 < 4.26, i.e. fcal < ftab,
we do not reject H0 and conclude that there is no
significant difference among the treatments.
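The Case I arithmetic can be reproduced with a short script. This is a minimal sketch using only the standard library. The critical value 4.26 is taken from the F table quoted in the example rather than computed, and the function name `one_way_anova` is illustrative.

```python
def one_way_anova(groups):
    """One-way ANOVA table entries via the correction-factor formulas."""
    all_obs = [x for g in groups.values() for x in g]
    n_total, t = len(all_obs), len(groups)
    cf = sum(all_obs) ** 2 / n_total                        # G^2 / N
    ss_total = sum(x * x for x in all_obs) - cf
    ss_treat = sum(sum(g) ** 2 / len(g) for g in groups.values()) - cf
    ss_error = ss_total - ss_treat                          # by subtraction
    ms_treat = ss_treat / (t - 1)
    ms_error = ss_error / (n_total - t)
    return {"SSt": ss_treat, "SSE": ss_error, "SST": ss_total,
            "MSt": ms_treat, "MSE": ms_error, "F": ms_treat / ms_error}


case1 = {"A": [2, 3, 2, 2], "B": [2, 5, 4, 6], "C": [3, 2, 3, 2]}
table = one_way_anova(case1)
F_TAB = 4.26  # f(2, 9, 0.05), read from the F table as in the example
reject_h0 = table["F"] > F_TAB

for name, value in table.items():
    print(f"{name}: {value:.4f}")
print("Reject H0:", reject_h0)
```

Carrying full precision gives an F-ratio of about 4.07; dividing the rounded mean squares (4.75/1.17) gives 4.06. Either way the value is below 4.26, so H0 is not rejected.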

Example cont’d
CASE II: ONE WAY ANOVA [Unequal Replicates]
Suppose we have treatments A, B and C, with A having 3, B having
4 and C having 5 replicates, all other things being equal,
and the yields recorded as shown:

Treatment   Yields in kg       Total
A           2  3  1            6
B           3  4  5  3         15
C           5  7  2  3  2      19
Total                          40

Carry out an ANOVA test to determine whether there is any significant
difference among the treatments. Take α = 0.05.
G = 40, N = 12 (by counting)
H0 : A = B = C [i.e. there is no significant difference]
H1 : not all of A, B and C are equal [i.e. there is a significant difference]
Examples cont’d
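$$SS_t = \sum_{i=1}^{t} \frac{t_i^2}{r_i} - \frac{G^2}{N} \quad \text{(for unequal replication)}$$

$$= \frac{6^2}{3} + \frac{15^2}{4} + \frac{19^2}{5} - \frac{40^2}{12} = 140.45 - 133.33 = 7.12$$

$$SST = \sum_{i}^{t} \sum_{j}^{r_i} x_{ij}^2 - \frac{G^2}{N} = (2^2 + 3^2 + 1^2 + 3^2 + \dots + 3^2 + 2^2) - \frac{40^2}{12} = 164 - 133.33 = 30.67$$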
Examples cont’d
SSE = SST – SSt (i.e. by subtraction)
= 30.67 – 7.12 = 23.55

ANOVA TABLE (Unequal Replicates)

Sources of   Degrees of   Sum of         Mean Square (MS)   fcalculated
Variation    freedom      Squares (SS)
Treatment    2            7.12           3.56               1.36
Error        9            23.55          2.62
Total        11           30.67

Mean Square (Treatment) = 7.12/2 = 3.56
Mean Square (Error) = 23.55/9 = 2.62
Examples cont’d
$F_{calculated} = \frac{MSt}{MSE} = \frac{3.56}{2.62} = 1.36$
$F_{tabulated} = f_{2,9,0.05} = 4.26$
Decision: if fcalculated > ftabulated, reject H0;
otherwise do not reject H0.
Conclusion: since 1.36 < 4.26, i.e. fcal < ftab,
we do not reject H0 and conclude that there is no
significant difference among the treatments.
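With unequal replication the only change is that each squared treatment total is divided by its own number of replicates r_i. The step-by-step sketch below recomputes Case II on that basis. Variable names are illustrative, and the critical value is again taken from the F table rather than computed.

```python
yields = {"A": [2, 3, 1], "B": [3, 4, 5, 3], "C": [5, 7, 2, 3, 2]}

all_obs = [x for g in yields.values() for x in g]
N = len(all_obs)                      # 12, by counting
G = sum(all_obs)                      # 40
cf = G ** 2 / N                       # correction factor, 133.33...

# Each treatment total is squared and divided by its OWN replicate count r_i.
ss_treat = sum(sum(g) ** 2 / len(g) for g in yields.values()) - cf
ss_total = sum(x * x for x in all_obs) - cf
ss_error = ss_total - ss_treat

df_treat = len(yields) - 1            # t - 1 = 2
df_error = N - len(yields)            # N - t = 9
f_cal = (ss_treat / df_treat) / (ss_error / df_error)

F_TAB = 4.26                          # f(2, 9, 0.05) from the F table
print(round(ss_treat, 2), round(ss_total, 2), round(ss_error, 2), round(f_cal, 2))
print("Reject H0:", f_cal > F_TAB)
```

The sums of squares come out as 7.12, 30.67 and 23.55, and the F-ratio is about 1.36, well below 4.26.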

TWO- WAY ANOVA
Here the plots can be classified by a factor into
homogeneous groups (blocks). Each treatment
must appear in every group or block, and each
group or block must receive every treatment. The design
is also called the Complete Randomized Block
Design (CRBD). Its main objective is to isolate,
and so reduce, the residual variation. In CRBD
i. Intra-group (within-block) variation is minimal
ii. Inter-group (between-block) variation is maximal
iii. The number of plots in each group equals an
exact multiple of the number of treatments
ILLUSTRATION AND LAYOUT
6 Treatments ( A, B, C, D, E and F)
3 Blocks (B1, B2 and B3)
Note: within a block, each of the treatments must
appear an equal number of times.
Also, Number of Replicates = Number of Blocks

B1      B2      B3
B A     F D     C E
E F     C E     B A
C D     A B     D F
Analysis
          Treatments                        Row      Block
Blocks    1     2     3     ...   k         Total    mean
1         x11   x12   x13   ...   x1k       B1.      x̄1.
2         x21   x22   x23   ...   x2k       B2.      x̄2.
3         x31   x32   x33   ...   x3k       B3.      x̄3.
.         .     .     .     ...   .         .        .
n         xn1   xn2   xn3   ...   xnk       Bn.      x̄n.
Total     T.1   T.2   T.3   ...   T.k       T..
Treatment
mean      x̄.1   x̄.2   x̄.3   ...   x̄.k                x̄..

xij = yield of jth treatment in ith block
G = T.. = Grand Total
CALCULATION OF SUM OF SQUARES

T.j = Treatment j total
Bi. = Block i total
x̄.j = Treatment j mean
x̄i. = Block i mean

Sum of Squares Total = SST = TSS

$$SST = \sum_{i=1}^{n} \sum_{j=1}^{k} x_{ij}^2 - \frac{G^2}{nk}$$

Sum of Squares Treatment = SSt = tSS

$$SS_t = \frac{\sum_{j=1}^{k} T_{.j}^2}{n} - \frac{G^2}{nk}$$
CALCULATION OF SUM OF SQUARES CONT’D
Sum of Squares Block = SSB = BSS

$$SSB = \frac{\sum_{i=1}^{n} B_{i.}^2}{k} - \frac{G^2}{nk}$$

Error Sum of Squares = SSE = ESS

SSE = SST – SSt – SSB (by subtraction)

Corresponding Degrees of Freedom

(kn – 1) = (k – 1) + (n – 1) + (k – 1)(n – 1)
Total = Treatment + Block + Error
HYPOTHESIS STATEMENTS
In two-way ANOVA, hypotheses are stated for
each of the two factors, namely treatments and
blocks.
For Treatments
H0 : i. A = B = C = ……. = t
ii. No significant difference
iii. τj = 0 for all j (i.e. no treatment effects)
iv. There is no significant difference in the
treatment effects
H1 : ~H0 (i.e. H0 is false)
HYPOTHESIS STATEMENTS Cont’d
For Blocks
H0 : i. B1 = B2 = B3 = ……. = Bn
ii. No significant difference
iii. βi = 0 for all i (i.e. no block effects)
iv. There is no significant difference in the
block effects
H1 : ~H0 (i.e. H0 is false)
ANOVA TABLE (Two-way, CRBD)

Sources of   Degrees of        Sum of         Mean Square (MS)               fcalculated
Variation    freedom           Squares (SS)
Treatment    k – 1             SSt            SSt/(k – 1) = (A)              A/C
Block        n – 1             SSB            SSB/(n – 1) = (B)              B/C
Error        (k – 1)(n – 1)    SSE            SSE/[(k – 1)(n – 1)] = (C)
Total        nk – 1            SST

MSt = SSt/(k – 1)
MSB = SSB/(n – 1)
MSE = SSE/[(k – 1)(n – 1)]
fcalculated = fc = fcal = MSt/MSE : for Treatments
fcalculated = fc = fcal = MSB/MSE : for Blocks
ftabulated = ft = ftab = f(k – 1), (k – 1)(n – 1), α : for Treatments
ftabulated = ft = ftab = f(n – 1), (k – 1)(n – 1), α : for Blocks
where α is the level of significance
DECISION
fcal is distributed as Fisher’s F with
i. k – 1 as numerator degrees of freedom and
(k – 1)(n – 1) as denominator degrees of
freedom for treatments
ii. n – 1 as numerator degrees of freedom and
(k – 1)(n – 1) as denominator degrees of
freedom for blocks
Each of these is compared with the critical
value of the F distribution at the
specified significance level α.
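If fcalculated is greater than the respective
critical value of the F distribution, the null
hypothesis is rejected,
i.e. if fcalculated > ftabulated, reject H0; otherwise do not reject H0.

$$y_{ij} = \mu + \beta_i + \tau_j + \varepsilon_{ij}$$

where $y_{ij}$ is the yield, $\mu$ the grand mean, $\beta_i$ the effect of block $i$, $\tau_j$ the effect of treatment $j$, and $\varepsilon_{ij}$ the error.

TYPICAL ANOVA DESIGN MODEL 2-WAY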
EXAMPLE
Given the learning results for 3 teaching
methods applied to 5 different age groups,
carry out a CRBD (two-way ANOVA) analysis of
the data at significance level α = 0.05.

Age        Teaching Methods
group      A     B     C     Total
< 20       7     9     10    26
20 – 29    8     9     10    27
30 – 39    9     9     12    30
40 – 49    10    9     12    31
> 50       11    12    14    37
Total      45    48    58    151
Solution to example
For teaching methods:
H0 : Teaching methods are not different
H1 : Teaching methods are different

For age group:
H0 : Age group has no effect
H1 : Age group has a significant effect

n = 5, k = 3

$$SST = \sum_{i}^{n} \sum_{j}^{k} x_{ij}^2 - \frac{G^2}{N} = (7^2 + 8^2 + 9^2 + 10^2 + \dots + 12^2 + 14^2) - \frac{151^2}{15} = 1{,}567 - 1{,}520.07 = 46.93$$
Solution to example cont’d
$$SS_t = \frac{\sum_{j=1}^{k} T_{.j}^2}{n} - \frac{G^2}{nk} = \frac{45^2 + 48^2 + 58^2}{5} - \frac{151^2}{5 \times 3} = 1{,}538.6 - 1{,}520.07 = 18.53$$

$$SSB = \frac{\sum_{i=1}^{n} B_{i.}^2}{k} - \frac{G^2}{nk} = \frac{26^2 + 27^2 + 30^2 + 31^2 + 37^2}{3} - \frac{151^2}{15} = 1{,}545 - 1{,}520.07 = 24.93$$

SSE = SST – SSB – SSt (by subtraction)
= 46.93 – 24.93 – 18.53 = 3.47
ANOVA TABLE (CRBD)
Sources of   Degrees of   Sum of         Mean Square (MS)   fcalculated
Variation    freedom      Squares (SS)
Treatment    2            18.53          9.265              21.35
Age group    4            24.93          6.23               14.35
Error        8            3.47           0.434
Total        14           46.93

$F_{tabulated} = f_{2,8,0.05} = 4.46$ : for Teaching methods
$F_{tabulated} = f_{4,8,0.05} = 3.84$ : for Age groups
Decision: if fcalculated > ftabulated, reject H0.
Conclusion: the fcal computed for each factor is
greater than its ftab. We therefore reject both null hypotheses and
conclude that the teaching methods are different and that age
group has a significant effect on learning results.
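The two-way calculation can be checked with a short script. This is a minimal sketch using only the standard library. Rows are age groups (blocks) and columns are teaching methods (treatments). The critical values 4.46 and 3.84 are taken from the F table as quoted, and all variable names are illustrative.

```python
scores = {                      # age group -> scores for methods A, B, C
    "<20":   [7, 9, 10],
    "20-29": [8, 9, 10],
    "30-39": [9, 9, 12],
    "40-49": [10, 9, 12],
    ">50":   [11, 12, 14],
}
rows = list(scores.values())
n, k = len(rows), len(rows[0])                  # n = 5 blocks, k = 3 treatments

G = sum(sum(r) for r in rows)                   # grand total, 151
cf = G ** 2 / (n * k)                           # correction factor G^2 / nk

ss_total = sum(x * x for r in rows for x in r) - cf
ss_treat = sum(sum(col) ** 2 for col in zip(*rows)) / n - cf   # column totals
ss_block = sum(sum(r) ** 2 for r in rows) / k - cf             # row totals
ss_error = ss_total - ss_treat - ss_block                      # by subtraction

ms_error = ss_error / ((k - 1) * (n - 1))
f_treat = (ss_treat / (k - 1)) / ms_error
f_block = (ss_block / (n - 1)) / ms_error

print(round(ss_treat, 2), round(ss_block, 2), round(ss_error, 2))
print(round(f_treat, 2), round(f_block, 2))
print("Methods differ:", f_treat > 4.46, "| Age group effect:", f_block > 3.84)
```

At full precision the F-ratios are about 21.38 and 14.38, slightly above the values obtained from rounded mean squares. Both exceed their critical values, so the conclusion is unchanged.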
