AAE-223: Statistics for Economists 2

Lecture Notes 2

Assa Mulagha-Maganga
Dept of Agricultural and Applied Economics, LUANAR
Department of Mathematical Sciences (Statistics), Chancellor College

Summer 2022 (27 Oct 2022)

2 Analysis of variance
This lecture is the first in a series of lectures devoted to linear models. The topic of this chapter, analysis of variance, provides a methodology for partitioning the total variance computed from a data set into components, each of which represents the amount of the total variance that can be attributed to a specific source of variation. The results of this partitioning can then be used to estimate and test hypotheses about population variances and means. In this chapter we focus on testing hypotheses about means. Specifically, we discuss testing for differences among means when more than two populations are of interest, or when the observations are classified by two or more variables. The techniques discussed in this chapter are widely used in business and the health sciences.

2.1 Introduction to analysis of variance


In this unit you will learn what analysis of variance is, and when and how it is applied in real-life scenarios. Analysis of variance is abbreviated as ANOVA. We will look at the two common ways of conducting ANOVA: one-way ANOVA and two-way ANOVA.
The analysis of variance is used to test hypotheses about more than two populations (or groups), for example, to test whether the means of more than two groups or populations (m populations) are equal. Whereas the chi-square test can be used to test the differences among several population proportions, the analysis of variance can be used to test the differences among several population means. The null hypothesis is that the several population means are all equal. The sampling procedure is to collect several independent random samples, one for each of the data categories (treatment levels). The assumption underlying the use of the analysis of variance is that the samples were obtained from normally distributed populations having the same variance.

Statistics for Economists 2

The probability distribution used in this unit is the F distribution. It was named to honor Sir
Ronald Fisher, one of the founders of modern-day statistics. This probability distribution is
used as the distribution of the test statistic for several situations. It is used to test whether
two samples are from populations having equal variances, and it is also applied when we
want to compare several population means simultaneously.
What are the characteristics of the F distribution?

1. The F distribution is continuous. This means that it can assume an infinite number
of values between zero and positive infinity.

2. The F distribution cannot be negative. The smallest value F can assume is 0.

3. It is positively skewed. The long tail of the distribution is on the right-hand side. As the numbers of degrees of freedom in both the numerator and the denominator increase, the distribution becomes more symmetric and approaches a normal distribution.

4. It is asymptotic. As the values of X increase, the F curve approaches the X-axis but
never touches it. This is similar to the behavior of the normal distribution.
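As an illustration of characteristics 1 to 3 (not part of the original notes), the Python sketch below evaluates the F density using the standard closed-form expression, which is brought in from standard references as an assumption. The helper names `f_pdf` and `total_probability` are hypothetical; `d1` and `d2` are the numerator and denominator degrees of freedom.

```python
import math


def f_pdf(x: float, d1: int, d2: int) -> float:
    """Density of the F distribution with d1 numerator and d2 denominator df.

    Standard closed form, evaluated on the log scale for numerical stability:
    f(x) = (d1/d2)^(d1/2) * x^(d1/2 - 1) * (1 + d1*x/d2)^(-(d1+d2)/2) / B(d1/2, d2/2)
    """
    if x <= 0:
        # F cannot be negative; the value at exactly 0 has no effect on probabilities
        return 0.0
    a, b = d1 / 2.0, d2 / 2.0
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    log_density = (
        a * math.log(d1 / d2)
        + (a - 1.0) * math.log(x)
        - (a + b) * math.log1p(d1 * x / d2)
        - log_beta
    )
    return math.exp(log_density)


def total_probability(d1: int, d2: int, upper: float = 200.0, steps: int = 200_000) -> float:
    """Midpoint-rule integral of the density over (0, upper); should be close to 1."""
    h = upper / steps
    return h * sum(f_pdf((i + 0.5) * h, d1, d2) for i in range(steps))
```

For example, `f_pdf(-1.0, 5, 10)` is 0.0 and `total_probability(5, 10)` comes out very close to 1. The density of F(5, 10) peaks near x = 0.5, while its mean is 10/8 = 1.25; a mean lying to the right of the peak is the positive skew described in point 3.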

To use ANOVA, we assume the following:

1. The populations follow the normal distribution.

2. The populations have equal standard deviations (σ).

3. The populations are independent.

When these conditions are met, F is used as the distribution of the test statistic.

2.2 The ANOVA Test


How does the ANOVA test work? Recall that we want to determine whether the various sample means came from a single population or from populations with different means. We actually compare these sample means through their variances. To explain, recall that earlier we listed the assumptions required for ANOVA. One of those assumptions was that the standard deviations of the various normal populations had to be the same. We take advantage of this requirement in the ANOVA test. The underlying strategy is to estimate the population variance (standard deviation squared) in two ways: one estimate based on the variation among the sample means, which is inflated when the population means differ, and one based on the variation within each sample, which is not. We then find the ratio of these two estimates. If this ratio is about 1, the two estimates agree, and we have no evidence that the population means differ. If the ratio is much larger than 1, we conclude that the population means are not all the same. The F distribution serves as a referee by indicating when the ratio of the two variance estimates is too much greater than 1 to have occurred by chance.
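The referee role of the F distribution can be made concrete with a short Python sketch (an illustration, not the notes' method) that computes the upper-tail probability P(F > f) and the critical value for a chosen α. It relies on the standard identity P(F ≤ f) = I_x(d1/2, d2/2) with x = d1·f/(d1·f + d2), where I is the regularized incomplete beta function, evaluated with a standard continued-fraction expansion (modified Lentz method). All function names are hypothetical; in practice the value is read from an F table or taken from a statistics library.

```python
import math


def _beta_continued_fraction(a: float, b: float, x: float,
                             max_iter: int = 300, eps: float = 1e-13) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_incomplete_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Pick the expansion that converges quickly, using I_x(a, b) = 1 - I_(1-x)(b, a)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def f_sf(f: float, d1: int, d2: int) -> float:
    """Upper-tail probability P(F > f) for the F(d1, d2) distribution."""
    if f <= 0.0:
        return 1.0
    x = d1 * f / (d1 * f + d2)
    return 1.0 - reg_incomplete_beta(d1 / 2.0, d2 / 2.0, x)


def f_critical(alpha: float, d1: int, d2: int, tol: float = 1e-10) -> float:
    """Critical value c with P(F > c) = alpha, by bisection (f_sf decreases in f)."""
    lo, hi = 0.0, 1.0
    while f_sf(hi, d1, d2) > alpha:  # widen the bracket until it contains c
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f_sf(mid, d1, d2) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Under these assumptions `f_critical(0.05, 2, 12)` is about 3.885, which F tables round to 3.89. `f_critical(0.05, 4, 8)` and `f_critical(0.05, 2, 8)` give about 3.84 and 4.46.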

By Assa Mulagha-Maganga, Lilongwe University of Agriculture and Natural Resources 2



2.3 One-way analysis of variance


The one-way analysis of variance procedure is concerned with testing the difference among k sample means when the subjects are assigned randomly to each of several treatment groups. The following notation will be used when samples from k different normally distributed populations having equal population variances are selected in order to test for the equality of the means of the k populations.
The sample size, sample mean, and sample variance for the ith population are represented by $n_i$, $\bar{x}_i$, and $s_i^2$, respectively. The total sample size is $n = n_1 + n_2 + \cdots + n_k$. The overall mean for all n sample values is represented by $\bar{x}$. The population mean for the ith population is represented by $\mu_i$, and the standard deviation for the ith population is represented by $\sigma_i$.
The between-samples variation is measured by the between-treatments mean square, represented by MSTR:

$$MSTR = \frac{SSTR}{k-1}$$

The numerator, SSTR, is called the treatment sum of squares and is computed as

$$SSTR = n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2 + \cdots + n_k(\bar{x}_k - \bar{x})^2$$

The within-samples variation is measured by the error mean square, represented by MSE:

$$MSE = \frac{SSE}{n-k}$$

The numerator, SSE, is called the error sum of squares and is computed by the formula below, where $s_1^2, s_2^2, \ldots, s_k^2$ are the sample variances:

$$SSE = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_k - 1)s_k^2$$

The denominator k − 1 is called the degrees of freedom for treatments, and the denominator n − k is called the degrees of freedom for error.
The sum of the treatment sum of squares and the error sum of squares is called the total sum of squares. It is represented by SST and is given by SST = SSTR + SSE. The total sum of squares may also be computed directly as $SST = \sum (x - \bar{x})^2$, where the sum is over all n sample values. The degrees of freedom for total is equal to n − 1.
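The formulas above need only each sample's size, mean, and variance. A minimal Python sketch, with hypothetical function and field names, that turns those summaries into SSTR, SSE, SST, the mean squares, and F:

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class OneWayResult:
    sstr: float   # treatment (between-samples) sum of squares
    sse: float    # error (within-samples) sum of squares
    sst: float    # total sum of squares, SSTR + SSE
    df_tr: int    # k - 1
    df_err: int   # n - k
    mstr: float   # SSTR / (k - 1)
    mse: float    # SSE / (n - k)
    f: float      # MSTR / MSE


def one_way_from_summary(
    sizes: Sequence[int], means: Sequence[float], variances: Sequence[float]
) -> OneWayResult:
    """One-way ANOVA from the n_i, sample means and sample variances s_i^2."""
    k = len(sizes)
    n = sum(sizes)
    # Overall mean of all n values = size-weighted average of the sample means
    grand_mean = sum(ni * xi for ni, xi in zip(sizes, means)) / n
    sstr = sum(ni * (xi - grand_mean) ** 2 for ni, xi in zip(sizes, means))
    sse = sum((ni - 1) * s2 for ni, s2 in zip(sizes, variances))
    mstr = sstr / (k - 1)
    mse = sse / (n - k)
    return OneWayResult(sstr, sse, sstr + sse, k - 1, n - k, mstr, mse, mstr / mse)
```

For instance, the summaries in Example 7.1 (each n_i = 5; means 80, 85, 75; variances 38.5, 35.0, 38.5) give SSTR = 250, SSE = 448, and F = 125/37.33 ≈ 3.35.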
The results of these computations are conveniently displayed in a one-way ANOVA table. The general structure of the one-way ANOVA table is:
Table 2.1: ANOVA Table


Source      df      SS     MS = SS/df   F statistic
Treatment   k − 1   SSTR   MSTR         F = MSTR/MSE
Error       n − k   SSE    MSE
Total       n − 1   SST

What are the steps for testing the equality of means using the one-way ANOVA procedure?
Step 1: State the null and alternative hypotheses as follows:
H0: µ1 = µ2 = · · · = µk    Ha: Not all k means are equal.
Step 2: Use the F distribution table with k − 1 and n − k degrees of freedom and the level of significance, α, to determine the rejection region.
Step 3: Build the ANOVA table, and from the table determine the computed value of the
F ratio.
Step 4: State your conclusion. The null hypothesis is rejected if the computed value of the
test statistic falls in the rejection region. Otherwise, the null hypothesis is not rejected.
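Steps 3 and 4 can be sketched in Python for raw data. This is a hypothetical helper, not part of the notes: it builds the rows of Table 2.1 and compares the computed F with a critical value that the caller supplies from an F table (Step 2), since Python's standard library has no F-distribution quantile function.

```python
from statistics import fmean


def one_way_anova_test(groups, f_critical):
    """Build the one-way ANOVA table from raw samples and apply the decision rule.

    groups: list of samples (one list of observations per treatment).
    f_critical: critical F for (k - 1, n - k) df at the chosen alpha, read from a table.
    Returns (table_rows, reject_h0).
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = fmean(x for g in groups for x in g)
    # Step 3: sums of squares, mean squares and the F ratio
    sstr = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    sse = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    mstr, mse = sstr / (k - 1), sse / (n - k)
    f_stat = mstr / mse
    rows = [
        ("Treatment", k - 1, sstr, mstr, f_stat),
        ("Error", n - k, sse, mse, None),
        ("Total", n - 1, sstr + sse, None, None),
    ]
    # Step 4: reject H0 only if F falls in the rejection region F > F_critical
    return rows, f_stat > f_critical
```

Passing the three samples of Example 7.1 with the table value 3.89 should give F ≈ 3.35 and `reject_h0` equal to `False`, in line with the conclusion reached there.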
Example 7.1
Fifteen students at LUANAR are randomly assigned to three different schools, all of which are concerned with developing a specified level of skill in agricultural economics. The achievement test scores at the conclusion of the instructional unit are reported in Table 7.2, along with the mean performance score associated with each instructional approach. Use the one-way analysis of variance procedure described in this section to test the null hypothesis that the three sample means were obtained from the same population, using the 5 percent level of significance.
Table 7.2:
Schools             Test scores          Total score   Mean score
Bunda Campus (A1)   86  79  81  70  84   400           80
City Campus (A2)    90  76  88  82  89   425           85
ODL (A3)            82  68  73  71  81   375           75

Solution
Step 1
H0 : µ1 = µ2 = µ3
H1: Not all of µ1, µ2, µ3 are equal
Step 2
Critical F (df = k − 1, n − k; α = 0.05) = F (2, 12; α = 0.05) = 3.89
The critical value is obtained from the F-distribution table for α = 0.05 (Figure 1).
Step 3
The overall mean of all 15 test scores is $\bar{x} = \frac{\sum x}{n} = \frac{1200}{15} = 80$


Figure 1:


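The mean of sample 1 is $\bar{x}_1 = \frac{\sum x_{1i}}{n_1} = \frac{86+79+81+70+84}{5} = 80$
The mean of sample 2 is $\bar{x}_2 = \frac{\sum x_{2i}}{n_2} = \frac{90+76+88+82+89}{5} = 85$
The mean of sample 3 is $\bar{x}_3 = \frac{\sum x_{3i}}{n_3} = \frac{82+68+73+71+81}{5} = 75$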
Therefore, using the formula for MSTR, we get

$$MSTR = \frac{SSTR}{k-1} = \frac{n_1(\bar{x}_1-\bar{x})^2 + n_2(\bar{x}_2-\bar{x})^2 + n_3(\bar{x}_3-\bar{x})^2}{k-1} = \frac{5(80-80)^2 + 5(85-80)^2 + 5(75-80)^2}{3-1} = \frac{250}{2} = 125$$

From the above, SSTR is 250 and k − 1 is 2.


The variance for each of the three samples is

$$s_1^2 = \frac{(86-80)^2 + (79-80)^2 + (81-80)^2 + (70-80)^2 + (84-80)^2}{5-1} = 38.5$$

$$s_2^2 = \frac{(90-85)^2 + (76-85)^2 + (88-85)^2 + (82-85)^2 + (89-85)^2}{5-1} = 35.0$$

$$s_3^2 = \frac{(82-75)^2 + (68-75)^2 + (73-75)^2 + (71-75)^2 + (81-75)^2}{5-1} = 38.5$$
Then

$$MSE = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2 + (n_3-1)s_3^2}{n-k} = \frac{(4)(38.5) + (4)(35) + (4)(38.5)}{15-3} = \frac{448}{12} = 37.33$$

From the above, SSE is 448 and n − k is 12.


Step 4
The computed test statistic is F = MSTR/MSE = 125/37.33 = 3.35. Because this is not greater than the critical F value of 3.89, the null hypothesis that the mean test scores for the three schools in the population are all mutually equal cannot be rejected at the 5 percent level of significance.
These results can be summarized in an ANOVA table:


Source      df            SS    MS = SS/df       F statistic
Treatment   3 − 1 = 2     250   250/2 = 125      F = 125/37.33 = 3.35
Error       15 − 3 = 12   448   448/12 = 37.33
Total       15 − 1 = 14   698
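A quick way to double-check the arithmetic of Example 7.1 is to recompute it with Python's standard `statistics` module. The sketch below is only a check of the hand calculation; the dictionary layout and variable names are illustrative choices.

```python
from statistics import mean, variance

# Test scores from Table 7.2, one list per school
scores = {
    "Bunda Campus (A1)": [86, 79, 81, 70, 84],
    "City Campus (A2)": [90, 76, 88, 82, 89],
    "ODL (A3)": [82, 68, 73, 71, 81],
}

k = len(scores)
n = sum(len(xs) for xs in scores.values())
grand_mean = mean([x for xs in scores.values() for x in xs])

# statistics.variance divides by n_i - 1, matching s_i^2 in the notes
sample_means = {name: mean(xs) for name, xs in scores.items()}
sample_vars = {name: variance(xs) for name, xs in scores.items()}

sstr = sum(len(xs) * (sample_means[name] - grand_mean) ** 2 for name, xs in scores.items())
sse = sum((len(xs) - 1) * sample_vars[name] for name, xs in scores.items())
f_stat = (sstr / (k - 1)) / (sse / (n - k))

for name in scores:
    print(f"{name}: mean = {sample_means[name]}, s^2 = {sample_vars[name]}")
print(f"SSTR = {sstr}, SSE = {sse}, F = {f_stat:.2f}")
```

The variances should match the 38.5, 35.0, and 38.5 computed by hand, and F should be about 3.35.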

Practical Activity
1. Define the meaning of the terms response variable, factor, treatments, and experimental
units.
Solution

• factor = Independent variables in a designed experiment


• treatments = Values of a factor (or combination of factors)
• experimental units = The entities to which the treatments are assigned
• response variable = The variable of interest in an experiment; dependent variable

2. Explain the assumptions that must be satisfied in order to validly use the one-way
ANOVA formulas.
Solution

• Constant variance, normality, independence

3. Explain the difference between the between-treatment variability and the within-
treatment variability when performing a one-way ANOVA.
Solution

• SSTR = variability of the sample treatment means (between-treatment variability)

• SSE = variability within each sample (within-treatment variability)

4. Explain why we conduct pairwise comparisons of treatment means


Solution

• If the one-way ANOVA F test leads us to conclude that at least two of the treatment means differ, then we wish to investigate which of the treatment means differ and to estimate how large the differences are.

5. A consumer preference study compares the effects of three different bottle designs (A,
B, and C) on sales of a popular fabric softener. A completely randomized design is
employed. Specifically, 15 supermarkets of equal sales potential are selected, and 5 of
these supermarkets are randomly assigned to each bottle design. The number of bottles
sold in 24 hours at each supermarket is recorded. The data obtained are displayed in
the table below. Let µA, µB, and µC represent mean daily sales using bottle designs A, B, and C, respectively. Test the null hypothesis that µA, µB, and µC are equal by setting α = .05; that is, test for statistically significant differences between these treatment means at the .05 level of significance. Based on this test, can we conclude that bottle designs A, B, and C have different effects on mean daily sales?

Table 1: Bottle designs

A B C
16 33 23
18 31 27
19 37 21
17 29 28
13 34 25

Solution

• F = 43.36, Reject H0 : bottle design does have an impact on sales.
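As a cross-check on the answer F = 43.36, the Python sketch below uses the standard computational (shortcut) forms of the sums of squares, SST = Σx² − (Σx)²/n and SSTR = Σ T_i²/n_i − (Σx)²/n, where T_i is the total of sample i. These forms are algebraically equivalent to the definitions in Section 2.3 but are not given in the notes. Exact fractions avoid rounding.

```python
from fractions import Fraction

# Bottles sold in 24 hours, by bottle design
bottle_sales = {
    "A": [16, 18, 19, 17, 13],
    "B": [33, 31, 37, 29, 34],
    "C": [23, 27, 21, 28, 25],
}

all_sales = [x for xs in bottle_sales.values() for x in xs]
n, k = len(all_sales), len(bottle_sales)
correction = Fraction(sum(all_sales) ** 2, n)  # (sum of x)^2 / n

sst = sum(x * x for x in all_sales) - correction
sstr = sum(Fraction(sum(xs) ** 2, len(xs)) for xs in bottle_sales.values()) - correction
sse = sst - sstr

f_stat = (sstr / (k - 1)) / (sse / (n - k))
print(f"SSTR = {float(sstr):.3f}, SSE = {float(sse):.3f}, F = {float(f_stat):.2f}")
```

Here SSTR = 9842/15 ≈ 656.13 and SSE = 90.8, so F ≈ 43.36, agreeing with the stated answer.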

2.4 Two-way analysis of variance


Many response variables are affected by more than one factor. Because of this we must
often conduct experiments in which we study the effects of several factors on the response.
In this section we consider studying the effects of two factors on a response variable. The
two-way ANOVA compares the mean differences between groups that have been split on
two independent variables (called factors). The primary purpose of a two-way ANOVA is to
understand if there is an interaction between the two independent variables on the dependent
variable.
Suppose that an agricultural experiment consists of examining the yields per acre of 4 differ-
ent varieties of wheat, where each variety is grown on 5 different plots of land. Thus, a total
of 20 plots are needed. It is convenient in such a case to combine the plots into blocks, say 4
plots to a block, with a different variety of wheat grown on each plot within a block. Thus
5 blocks would be required here. In this case, there are two classifications, or factors, since
there may be differences in yield per acre due to (1) the particular type of wheat grown or
(2) the particular block used (which may involve different soil fertility, etc.).
By analogy with the agricultural experiment example, we often refer to the two factors in an
experiment as treatments and blocks, but of course we could simply refer to them as factor
1 and factor 2.
Assuming that we have a treatments and b blocks, we construct the table below, in which it is supposed that there is one experimental value (such as yield per acre) corresponding to each treatment and block. For treatment j and block k, we denote this value by $x_{jk}$. The mean of the entries in the jth row is denoted by $\bar{x}_{j.}$, where j = 1, . . . , a, while the mean of the entries in the kth column is denoted by $\bar{x}_{.k}$, where k = 1, . . . , b. The overall, or grand, mean is denoted by $\bar{\bar{x}}$. In symbols:


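$$\bar{x}_{j.} = \frac{1}{b}\sum_{k=1}^{b} x_{jk}, \qquad \bar{x}_{.k} = \frac{1}{a}\sum_{j=1}^{a} x_{jk}, \qquad \bar{\bar{x}} = \frac{1}{ab}\sum_{j=1}^{a}\sum_{k=1}^{b} x_{jk}$$

                Block 1   Block 2   · · ·   Block b   Row mean
Treatment 1     x11       x12       · · ·   x1b       x̄1.
Treatment 2     x21       x22       · · ·   x2b       x̄2.
⋮               ⋮         ⋮                 ⋮         ⋮
Treatment a     xa1       xa2       · · ·   xab       x̄a.
Column mean     x̄.1       x̄.2       · · ·   x̄.b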
The ANOVA procedure for a two-factor factorial experiment partitions the total sum of squares (SSTO) into four components: the factor 1 sum of squares, SS(1); the factor 2 sum of squares, SS(2); the interaction sum of squares, SS(int); and the error sum of squares, SSE. The partition is

$$SSTO = SS(1) + SS(2) + SS(int) + SSE$$

In this lesson we ignore the interaction; with only one observation for each treatment-block combination, as in the table above, the interaction cannot be separated from the error. Hence,

$$SSTO = SS(1) + SS(2) + SSE$$
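For one observation per cell, the additive partition SSTO = SS(1) + SS(2) + SSE can be sketched in Python as follows. The function name and the dictionary it returns are hypothetical. Rows index factor 1 (a levels) and columns index factor 2 (b levels), matching x_jk above, and the error df (a − 1)(b − 1) is the standard value for this design.

```python
from statistics import fmean


def two_way_no_interaction(table):
    """Two-way ANOVA without interaction, one observation per cell.

    table[j][k] is the observation for row level j (factor 1) and column level k
    (factor 2). Returns sums of squares, degrees of freedom and the two F ratios.
    """
    a, b = len(table), len(table[0])
    grand = fmean(x for row in table for x in row)
    row_means = [fmean(row) for row in table]                              # x̄_j.
    col_means = [fmean(table[j][k] for j in range(a)) for k in range(b)]  # x̄_.k

    ssto = sum((x - grand) ** 2 for row in table for x in row)
    ss1 = b * sum((m - grand) ** 2 for m in row_means)  # between rows
    ss2 = a * sum((m - grand) ** 2 for m in col_means)  # between columns
    sse = ssto - ss1 - ss2

    df1, df2, dfe = a - 1, b - 1, (a - 1) * (b - 1)
    mse = sse / dfe
    return {
        "SSTO": ssto, "SS1": ss1, "SS2": ss2, "SSE": sse,
        "df": (df1, df2, dfe, a * b - 1),
        "F1": (ss1 / df1) / mse,
        "F2": (ss2 / df2) / mse,
    }
```

Applied to the earnings example below, with class rankings as rows and colleges as columns, it should reproduce the sums of squares computed by hand there.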

Example

Table 2.2 gives the daily earnings (in thousands of MK) of fresh graduates with bachelor's degrees from 5 colleges, classified by 3 class rankings at graduation. Test at the 5% level of significance the hypothesis that the means are identical (a) for the college populations and (b) for the class-ranking populations.
Table 2.2
Class rank    Bunda   Chanco   Poly   Medicine   Nursing   Sample mean
Top           20      18       16     14         12        16
Middle        19      16       13     12         10        14
Bottom        18      14       10     10         8         12
Sample mean   19      16       13     12         10        14


Solution

Step 1: The hypotheses to be tested for part (a) are

H0: µ1 = µ2 = µ3 = µ4 = µ5

H1: Not all of µ1, µ2, µ3, µ4, µ5 are equal

where µ refers to the various means for the factor A (school) populations. We now define each sum of squares and show how it is calculated for the earnings data (note that a = 3 class rankings and b = 5 colleges):

Step 2: Calculate SSTO, which measures the total amount of variability:

$$\begin{aligned}
SSTO &= \sum_{j=1}^{a}\sum_{k=1}^{b}(x_{jk} - \bar{\bar{x}})^2 \\
&= (20-14)^2 + (18-14)^2 + (16-14)^2 + (14-14)^2 + (12-14)^2 \\
&\quad + (19-14)^2 + (16-14)^2 + (13-14)^2 + (12-14)^2 + (10-14)^2 \\
&\quad + (18-14)^2 + (14-14)^2 + (10-14)^2 + (10-14)^2 + (8-14)^2 \\
&= 36 + 16 + 4 + 0 + 4 + 25 + 4 + 1 + 4 + 16 + 16 + 0 + 16 + 16 + 36 \\
&= 194
\end{aligned}$$

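Step 3: Calculate SS(a), which measures the amount of variability due to the different levels of factor a (class ranking, the rows):

$$\begin{aligned}
SS(a) &= b\sum_{j=1}^{3}(\bar{x}_{j.} - \bar{\bar{x}})^2 = b\left[(\bar{x}_{1.} - \bar{\bar{x}})^2 + (\bar{x}_{2.} - \bar{\bar{x}})^2 + (\bar{x}_{3.} - \bar{\bar{x}})^2\right] \\
&= 5\left[(16-14)^2 + (14-14)^2 + (12-14)^2\right] \\
&= 5(4 + 0 + 4) \\
&= 40
\end{aligned}$$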

Step 4: Calculate SS(b), which measures the amount of variability due to the different levels of factor b (colleges, the columns):

$$SS(b) = a\sum_{k=1}^{5}(\bar{x}_{.k} - \bar{\bar{x}})^2$$

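$$\begin{aligned}
SS(b) &= a\left[(\bar{x}_{.1} - \bar{\bar{x}})^2 + (\bar{x}_{.2} - \bar{\bar{x}})^2 + (\bar{x}_{.3} - \bar{\bar{x}})^2 + (\bar{x}_{.4} - \bar{\bar{x}})^2 + (\bar{x}_{.5} - \bar{\bar{x}})^2\right] \\
&= 3\left[(19-14)^2 + (16-14)^2 + (13-14)^2 + (12-14)^2 + (10-14)^2\right] \\
&= 3(25 + 4 + 1 + 4 + 16) \\
&= 150
\end{aligned}$$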

Step 5: Calculate SSE, which measures the amount of variability due to error. Writing SSA = SS(b) = 150 for colleges (schools) and SSB = SS(a) = 40 for class rankings, as in Table 2.3:

$$SSE = SSTO - SSA - SSB = 194 - 150 - 40 = 4$$

These results are summarized in Table 2.3. From the F distribution table, the critical value is F = 3.84 for 4 and 8 degrees of freedom at α = 0.05. Since the calculated F = MSA/MSE = 75 exceeds this value, we reject H0 and accept H1: the population means of fresh graduates' earnings for the 5 colleges are different.
Table 2.3 Two-Factor ANOVA Table for Graduates' Earnings
Source of variation                          Sum of squares   Degrees of freedom   Mean square          F
Explained by schools (A) (between columns)   SSA = 150        b − 1 = 4            MSA = 150/4 = 37.5   MSA/MSE = 75
Explained by ranking (B) (between rows)      SSB = 40         a − 1 = 2            MSB = 40/2 = 20      MSB/MSE = 40
Error or unexplained                         SSE = 4          (a − 1)(b − 1) = 8   MSE = 4/8 = 0.5
Total                                        SST = 194        ab − 1 = 14

(b) The hypotheses to be tested for class rankings are

H0: µ1 = µ2 = µ3
H1: Not all of µ1, µ2, µ3 are equal

where µ refers to the various means for the factor B (class-ranking) populations. From Table 2.3, the calculated value is F = MSB/MSE = 40. Since this is larger than the tabular value of F = 4.46 for 2 and 8 degrees of freedom at α = 0.05, we reject H0 and accept H1: the population means of graduates' earnings for the 3 class rankings are different. Thus, the type of school and class ranking are both statistically significant at the 5% level in explaining differences in graduates' earnings. The preceding analysis implicitly assumes that the effects of the two factors are additive (i.e., there is no interaction between them).
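Because each cell holds a single observation, SSE can also be computed directly from the residuals of the additive model, e_jk = x_jk − x̄_j. − x̄_.k + x̿, instead of by subtraction. This is a standard identity for the no-interaction model rather than a step in the notes; the short Python check below uses the Table 2.2 earnings with rows as class rankings and columns as colleges.

```python
# Earnings (thousands of MK): rows = class rank (Top, Middle, Bottom),
# columns = colleges (Bunda, Chanco, Poly, Medicine, Nursing)
earnings = [
    [20, 18, 16, 14, 12],
    [19, 16, 13, 12, 10],
    [18, 14, 10, 10, 8],
]

a, b = len(earnings), len(earnings[0])
grand = sum(map(sum, earnings)) / (a * b)
row_means = [sum(row) / b for row in earnings]
col_means = [sum(row[k] for row in earnings) / a for k in range(b)]

# Residual of the additive (no-interaction) model for each cell
residuals = [
    [earnings[j][k] - row_means[j] - col_means[k] + grand for k in range(b)]
    for j in range(a)
]
sse_direct = sum(e * e for row in residuals for e in row)

for row in residuals:
    print(" ".join(f"{e:+.0f}" for e in row))
print("SSE =", sse_direct)
```

The residuals are −1, 0, +1, 0, 0 in the Top row, all zero in the Middle row, and +1, 0, −1, 0, 0 in the Bottom row, so the direct SSE is 4, agreeing with the subtraction in Step 5.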


Activity:
1. Table 2.4 gives the km per litre of petrol for 4 different filling stations in Lilongwe for
5 days. Assume that the km per litre for each filling station is normally distributed
with equal variance. Should the hypothesis of equal population means be accepted or
rejected at the 5% level of significance?
Table 2.4
Filling station 1 Filling station 2 Filling station 3 Filling station 4
12 12 16 17
11 14 14 15
12 13 15 17
13 15 13 16
11 14 14 18
Answer: Rejected

2. Table 2.5 gives the miles per litre of petrol for each of 4 different filling stations and
3 types of car (heavy, medium, and light) in a completely randomized design. Should
the hypothesis be accepted at the 1% level of significance that the population means
are the same for each (a) filling station? (b) Type of car?
Table 2.5
Type of car   Filling station 1   Filling station 2   Filling station 3   Filling station 4
Heavy 8 9 9 10
Medium 16 15 18 17
Light 24 26 28 30
Answer: (a) Yes (b) No

3. Table 2.6 gives sales data for groundnuts with each of 3 different packagings and 4 different varieties in a completely randomized design. Should the hypothesis be accepted at the 5% level of significance that the population means are the same for each (a) packaging? (b) variety?
Table 2.6 Groundnut sales for 3 package wrappings and 4 varieties
Packaging 1 Packaging 2 Packaging 3
Manipinta 87 78 90
Chalimbana 79 79 84
Kalisere 83 81 91
CG7 85 83 89
Answer: (a) No (b) Yes

4. Table below gives the outputs of an experimental farm that used each of four fertilizers
and three pesticides such that each plot of land had an equal probability of receiving
each fertilizer-pesticide combination (completely randomized design).


Output with 4 fertilizers and 3 pesticides


Fertilizer 1 Fertilizer 2 Fertilizer 3 Fertilizer 4
Pesticide 1 21 12 9 6
Pesticide 2 13 10 8 5
Pesticide 3 8 8 7 1

a. Find the average output for each fertilizer, $\bar{X}_{.j}$, for each pesticide, $\bar{X}_{i.}$, and for the sample as a whole, $\bar{\bar{X}}$.
b. Find the total sum of squares, SST, the sum of squares for fertilizer or factor
A, SSA, for pesticides or factor B, SSB, and for the error or unexplained residual,
SSE.
c. Find the degrees of freedom for SSA, SSB, SSE, and SST.
d. Find MSA, MSB, MSE, MSA/MSE, and MSB/MSE.
