ANOVA Models

Chapter 7 introduces Analysis of Variance (ANOVA) as a method for hypothesis testing concerning two or more population means. It explains key terms such as factors, factor levels, and treatments, and outlines the one-way ANOVA model, assumptions, and the process for estimating parameters. The chapter also covers the analysis of variance table, degrees of freedom, mean squares, and post-hoc analysis methods like the Least Significant Difference (LSD) method for further comparisons among treatment means.


HASTS112/HSTS112

CHAPTER 7: INTRODUCTION TO ANALYSIS OF VARIANCE (ANOVA) MODELS

Introduction
We have dealt with hypothesis testing under a number of different conditions,
for example, the χ² test, tests for the significance of regression parameters,
tests concerning population means or proportions, etc. We saw that the
relationship between paired variables may be analysed by using regression
analysis.

This chapter introduces yet another method for testing hypotheses about
the population by using a set of sample data. Analysis of variance deals
specifically with tests concerning two or more population means. The
advantage of using analysis of variance is that it enables us to test
hypotheses concerning several means. It is therefore a general test, since
we are able to test for the equality of a number of means at one time.

Analysis of Variance Models


The terminology used in ANOVA differs slightly from that encountered in
regression analysis.

Main Terms Used in ANOVA


(1) Factor - A factor is an independent variable (usually categorical) whose
effects on the response are to be studied in an investigation. Thus an
independent variable is now referred to as a factor.

(2) Factor Level - This is a particular value or level of that factor. Let the
independent variable be X (where X represents, say, seed type). Suppose
that we have three different categories of seed, such as: Pannar, Seedco
and Pioneer. Each of these seed types is a level of the factor. Investigations
differ as to the number of factors studied; some are single-factor studies,
where only one factor is of concern, and some are multifactor studies,
where more than one factor is of concern.

(3) Treatment - In a single-factor study, a treatment corresponds to a
factor level; in multifactor studies, a treatment corresponds to a
combination of factor levels.

One-Way ANOVA
ANOVA was developed to analyse randomised experiments, where the treatments
are assigned at random (as opposed to an observational study, where
a researcher has to take the treatments as they come). Suppose we have a
treatments, or different levels of a single factor, that we wish to compare. The
observed response from each of the a treatments is a random variable.

The data would appear as in the table below:

                 Treatments
              1      2      3     . . .   a
              y11    y21    y31   . . .   ya1
              y12    y22    y32   . . .   ya2
Observations  y13    y23    y33   . . .   ya3
              .      .      .             .
              .      .      .             .
              .      .      .             .
              y1n1   y2n2   y3n3  . . .   yana

If n1 = n2 = n3 = ... = na, we call the model a balanced design model;
otherwise it is an unbalanced design. The observations are described by
the linear statistical model

yij = µ + τi + εij

where,
yij is the (ij)th observation of the response variable, in the jth trial for the ith
factor level or treatment,
µ is a constant term,
τi is the effect of the ith factor level or treatment, and
εij is a random error component.

Assumptions
The model errors εij are assumed to be normally and independently
distributed random variables with mean zero and variance σ². The variance σ²
is assumed constant for all levels of the factor. Also, the constraint Σi τi = 0
is imposed on the τi to enable estimation.
The model above is called one-way analysis of variance because only one
factor is being investigated.

The model can also be stated as yij = µi + εij, where the µi's are parameters,
(µi = µ + τi).

NOTE

1. The observed value yij in the jth trial for the ith factor level or treatment
is the sum of the following components:

(a) a constant term µi and

(b) a random error term εij.

2. E(yij) = µi since E(εij) = 0.

3. Thus, all observations for the ith factor level have the same expectation.

4. Since µi is a constant, it follows that Var(yij) = Var(εij) = σ², that
is, homogeneous variance.

5. yij is normally distributed since εij ∼ N(0, σ²).
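The model and its assumptions can be illustrated with a small simulation; all the numbers below (µ, the τi, σ, n) are made-up values chosen for illustration, with the τi summing to zero as the constraint requires:

```python
# Simulating the one-way ANOVA model y_ij = mu + tau_i + e_ij (made-up values).
import random

random.seed(1)
mu = 20.0                                  # overall constant term
tau = {"A": -2.0, "B": 3.0, "C": -1.0}     # treatment effects; they sum to zero
sigma = 1.5                                # error standard deviation
n = 5                                      # replicates per treatment (balanced)

# e_ij ~ N(0, sigma^2), so each y_ij is normal with mean mu + tau_i = mu_i
y = {i: [mu + tau[i] + random.gauss(0.0, sigma) for _ in range(n)]
     for i in tau}
```

Each list in `y` is one treatment's sample; by note 2 above, the observations in group "B" scatter around µ + τB = 23.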

The task is now to estimate the parameters in the ANOVA model. From the
data we calculate the means of each level as follows. First, redraw the table
and add level sums and means:
yi. is the sum over all j for each i,
ȳi. is the mean over all j for each i,
y.. is the grand total of all observations, and ȳ.. represents the grand average
of all observations. This is expressed symbolically as

                 Treatments
              1      2      3     . . .   a      Total
              y11    y21    y31   . . .   ya1
              y12    y22    y32   . . .   ya2
Observations  y13    y23    y33   . . .   ya3
              .      .      .             .
              .      .      .             .
              .      .      .             .
              y1n1   y2n2   y3n3  . . .   yana
Total         y1.    y2.    y3.   . . .   ya.    y..
Mean          ȳ1.    ȳ2.    ȳ3.   . . .   ȳa.    ȳ..

yi. = Σ_{j=1}^{ni} yij , which implies ȳi. = yi. / ni

y.. = Σ_{i=1}^{a} Σ_{j=1}^{ni} yij , which implies ȳ.. = y.. / N

where N = n1 + n2 + ... + na , the total number of observations; for a
balanced design model we have N = an.

We then estimate the parameters µ and the τi's, for i = 1, 2, 3, ..., a, as follows:

µ̂ = ȳ..

τ̂i = ȳi. − ȳ..

For instance, τ̂3 = ȳ3. − ȳ..
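These estimates can be sketched in Python; the balanced data set below (a = 3 treatments, n = 5 replicates each) is made up for illustration:

```python
# Estimating mu_hat = ybar.. and tau_hat_i = ybar_i. - ybar.. (made-up data).
data = {
    "A": [18, 20, 19, 22, 21],
    "B": [24, 26, 25, 23, 27],
    "C": [20, 19, 21, 18, 22],
}

N = sum(len(obs) for obs in data.values())            # total observations
mu_hat = sum(sum(obs) for obs in data.values()) / N   # grand average ybar..
level_means = {i: sum(obs) / len(obs) for i, obs in data.items()}  # ybar_i.
tau_hat = {i: ybar_i - mu_hat for i, ybar_i in level_means.items()}
```

For a balanced design the τ̂i automatically satisfy the constraint Σ τ̂i = 0.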

Once we have estimated the parameters, we should be able to use our model
to test hypotheses relating to the model and parameters. The most common,
and perhaps most important, hypothesis we test in ANOVA is
H0 : τ1 = τ2 = ... = τa = 0
H1 : τi ≠ 0, for at least one i

An equivalent way to write the above hypothesis is in terms of the treatment
means µi, that is
H0 : µ1 = µ2 = ... = µa
H1 : µi ≠ µj , for at least one pair i ≠ j

What this null hypothesis really says is that there is no treatment effect; that is,
the different levels of the independent variable do not differ in their effect on the
dependent variable (response). The appropriate procedure for testing that
the treatment effects (the τi's) are zero is the analysis of variance.

The name analysis of variance is derived from a partitioning of the total
variability into its component parts.

Sum of Squares
The total sum of squares is given by

SST = Σ_{i=1}^{a} Σ_{j=1}^{ni} (yij − ȳ..)² = Σ_{i=1}^{a} Σ_{j=1}^{ni} yij² − N ȳ..²

SST can be partitioned into a sum of squares of differences between treatment
averages and the grand average, plus a sum of squares of differences of
observations within treatments from the treatment average, that is

SST = SStreatments + SSE

where,

SStreatments = Σ_{i=1}^{a} ni (ȳi. − ȳ..)² = Σ_{i=1}^{a} yi.² / ni − N ȳ..²

and

SSE = Σ_{i=1}^{a} Σ_{j=1}^{ni} (yij − ȳi.)²

Note: SSE = SST − SStreatments
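The partition SST = SStreatments + SSE can be checked numerically; the small balanced data set below is made up for illustration:

```python
# Computing SST, SS_treatments and SSE for a small made-up data set.
data = {
    "A": [18, 20, 19, 22, 21],
    "B": [24, 26, 25, 23, 27],
    "C": [20, 19, 21, 18, 22],
}

N = sum(len(obs) for obs in data.values())
grand_mean = sum(sum(obs) for obs in data.values()) / N   # ybar..
level_means = {i: sum(obs) / len(obs) for i, obs in data.items()}  # ybar_i.

# Total, treatment, and error sums of squares
sst = sum((x - grand_mean) ** 2 for obs in data.values() for x in obs)
ss_treat = sum(len(obs) * (level_means[i] - grand_mean) ** 2
               for i, obs in data.items())
sse = sum((x - level_means[i]) ** 2 for i, obs in data.items() for x in obs)
```

Up to floating-point rounding, `sst` equals `ss_treat + sse`, as the partition requires.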

Degrees of Freedom
There are an = N total observations (in the balanced design), so SST has
N − 1 d.f. There are a levels of the factor (and a treatment means), so
SStreatments has a − 1 d.f. Finally, within any treatment there are n replicates,
providing n − 1 d.f. with which to estimate the experimental error. Since
there are a treatments, we have a(n − 1) = an − a = N − a d.f. for the error.

Mean Squares
We divide each sum of squares by its corresponding degrees of freedom to
obtain a mean square. The two important mean squares are the treatment
and error mean squares:

MStreatments = SStreatments / (a − 1) ,    MSE = SSE / (N − a)

If the null hypothesis is true, then the ratio

F = MStreatments / MSE ∼ F(a − 1, N − a)

Testing at the α level of significance, we would reject H0 if F > Fα(a − 1, N − a).
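Continuing the made-up example (a = 3, n = 5, so N = 15), the mean squares and the F ratio work out as follows:

```python
# Mean squares and the F statistic for the made-up data set
# (a = 3 treatments, n = 5 replicates each, N = 15).
data = {
    "A": [18, 20, 19, 22, 21],
    "B": [24, 26, 25, 23, 27],
    "C": [20, 19, 21, 18, 22],
}

a = len(data)
N = sum(len(obs) for obs in data.values())
grand_mean = sum(sum(obs) for obs in data.values()) / N
level_means = {i: sum(obs) / len(obs) for i, obs in data.items()}

ss_treat = sum(len(obs) * (level_means[i] - grand_mean) ** 2
               for i, obs in data.items())
sse = sum((x - level_means[i]) ** 2 for i, obs in data.items() for x in obs)

ms_treat = ss_treat / (a - 1)   # MS_treatments, on a - 1 = 2 d.f.
mse = sse / (N - a)             # MSE, on N - a = 12 d.f.
f_stat = ms_treat / mse

# From F tables, F_0.05(2, 12) is about 3.89 (a table look-up, not computed
# here), so an f_stat well above that would lead us to reject H0.
```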

Analysis of Variance Table for the One-Way Fixed Effects Model

Source of Variation        SS             d.f.    MS             F
Between treatments         SStreatments   a − 1   MStreatments   F = MStreatments / MSE
Error (within treatments)  SSE            N − a   MSE
Total                      SST            N − 1

Multiple Comparisons (Post-Hoc) in ANOVA


When the computed value of the F-statistic in single-factor ANOVA is not
significant, the analysis is terminated, because no differences between the
τi's or µi's have been identified. But when H0 is rejected, the investigator
will usually want to know which of the treatments or means are different
from each other. A technique for carrying out this further analysis is called
a multiple comparisons procedure, or Post-Hoc analysis. Post-Hoc analysis
is done if it has been shown that there are indeed differences amongst the
means. Specifically, Post-Hoc tests are done when:

1. You reject H0, and

2. There are three or more treatments (groups).

There are a number of such procedures in Statistics:

1. Tukey's HSD Procedure (i.e. Honestly Significant Difference) - This
test can be used only when the groups are all of the same size.

2. Scheffe's Procedure - This can be used both when the groups are not all
of the same size and when they are of the same size.

3. Least Significant Difference (LSD) - This can be used in both cases, that is,
if the groups are of different sizes and also if they are of the same size.

For this course we shall use the LSD method.

The Least Significant Difference (LSD) Method

Suppose that, following an analysis of variance F test where the null hypothesis
is rejected, we wish to test H0 : µi = µj for all i ≠ j. This could be
done by employing the test statistic

t = (ȳi. − ȳj.) / √( MSE (1/ni + 1/nj) )

Assuming a two-sided alternative, the pair of means µi and µj would be
declared significantly different if

|ȳi. − ȳj.| > t_{α/2}(N − a) √( MSE (1/ni + 1/nj) )

The quantity

LSD = t_{α/2}(N − a) √( MSE (1/ni + 1/nj) )

is called the least significant difference. If the design is balanced, then
n1 = n2 = ... = na = n, and

LSD = t_{α/2}(N − a) √( 2 MSE / n )

To use the LSD procedure, simply compare the observed difference between
each pair of averages to the corresponding LSD, that is, if |ȳi. − ȳj.| > LSD,
we reject H0 and conclude that the population means µi and µj differ.
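The LSD procedure can be sketched on the same made-up balanced data set; the critical value t_{0.025}(12) ≈ 2.179 below is taken from t tables (in practice it would come from a table or software):

```python
# LSD pairwise comparisons for the made-up balanced data set (n = 5, N - a = 12).
import math
from itertools import combinations

data = {
    "A": [18, 20, 19, 22, 21],
    "B": [24, 26, 25, 23, 27],
    "C": [20, 19, 21, 18, 22],
}

n = 5
level_means = {i: sum(obs) / len(obs) for i, obs in data.items()}
sse = sum((x - level_means[i]) ** 2 for i, obs in data.items() for x in obs)
mse = sse / (15 - 3)            # MSE on N - a = 12 error d.f.

t_crit = 2.179                  # t_{0.025}(12), from tables
lsd = t_crit * math.sqrt(2 * mse / n)   # balanced-design form of the LSD

# Declare a pair significantly different if |ybar_i. - ybar_j.| > LSD
results = {(i, j): abs(level_means[i] - level_means[j]) > lsd
           for i, j in combinations(sorted(data), 2)}
```

Here treatment B's mean differs significantly from those of A and C, while A and C do not differ from each other.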

Example
