Lecture Two 2019-2020

This lecture covers the principles and applications of one-way analysis of variance (ANOVA), including its assumptions, hypotheses, and the calculation of sum of squares. It explains how to interpret the F statistic and perform tests to determine if there are significant differences among multiple population means. Additionally, the lecture introduces the Kruskal-Wallis test for non-normally distributed populations.


Analysis of Variance

Prof. Kwabena Doku-Amponsah


Lecture Goals

After completing this lecture, you should be able to:
- Recognize situations in which you use analysis of variance
- Understand different analysis of variance designs
- Perform a one-way and two-way analysis of variance and interpret the results
- Conduct and interpret a Kruskal-Wallis test
- Analyze two-factor analysis of variance tests with more than one observation per cell
Comparison of Several Population Means

- In this lecture the procedures of lecture one are extended to tests for the equality of more than two population means
- The null hypothesis is that the population means are all the same
- The critical factor is the variability involved in the data
- If the variability around the sample means is small compared with the variability among the sample means, we reject the null hypothesis
Comparison of Several Population Means (continued)

[Figure: two panels contrasting small variation around the sample means with large variation around the sample means, each relative to the variation among the sample means]
One-Way Analysis of Variance

- Evaluate the difference among the means of three or more groups
  - Examples: average production for 1st, 2nd, and 3rd shifts; expected mileage for five brands of tires
- Assumptions
  - Populations are normally distributed
  - Populations have equal variances
  - Samples are randomly and independently drawn

Hypotheses of One-Way ANOVA

- H0: μ1 = μ2 = μ3 = … = μK
  - All population means are equal
  - i.e., no variation in means between groups
- H1: μi ≠ μj for at least one i, j pair
  - At least one population mean is different
  - i.e., there is variation between groups
  - Does not mean that all population means are different (some pairs may be the same)
One-Way ANOVA

H0: μ1 = μ2 = μ3 = … = μK
H1: Not all μi are the same

All means are the same: the null hypothesis is true (no variation between groups)

[Figure: three identical population distributions centered at the common mean μ1 = μ2 = μ3]
One-Way ANOVA (continued)

H0: μ1 = μ2 = μ3 = … = μK
H1: Not all μi are the same

At least one mean is different: the null hypothesis is NOT true (variation is present between groups)

[Figure: population distributions with shifted centers, e.g. μ1 = μ2 ≠ μ3, or all of μ1, μ2, μ3 different]


Variability

- The variability of the data is a key factor in testing the equality of means
- In each case below, the means may look different, but the large variation within groups in B makes the evidence that the means are different weak

[Figure: panel A shows small variation within groups A, B, C; panel B shows large variation within the same groups]
Sum of Squares Decomposition

Total variation can be split into two parts:

SST = SSW + SSG

SST = Total Sum of Squares
  Total variation = the aggregate dispersion of the individual data values across the various groups
SSW = Sum of Squares Within Groups
  Within-group variation = dispersion that exists among the data values within a particular group
SSG = Sum of Squares Between Groups
  Between-group variation = dispersion between the group sample means
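The decomposition can be checked numerically. A minimal Python sketch (not part of the original slides), using the golf-club distances from the worked example later in this lecture:

```python
# Verify SST = SSW + SSG on the golf-club data used later in the lecture.
groups = [
    [254, 263, 241, 237, 251],   # Club 1
    [234, 218, 235, 227, 216],   # Club 2
    [200, 222, 197, 206, 204],   # Club 3
]

all_values = [x for g in groups for x in g]
grand_mean = sum(all_values) / len(all_values)

# SST: dispersion of every observation around the grand mean
sst = sum((x - grand_mean) ** 2 for x in all_values)

# SSW: dispersion of observations around their own group mean
ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

# SSG: dispersion of the group means around the grand mean
ssg = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

print(round(sst, 1), round(ssw, 1), round(ssg, 1))  # → 5836.0 1119.6 4716.4
```

The three sums match the Excel ANOVA table shown later, and SST equals SSW + SSG up to floating-point rounding.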
Sum of Squares Decomposition (continued)

Total Sum of Squares (SST) = variation due to random sampling (SSW) + variation due to differences between groups (SSG)
Total Sum of Squares

SST = SSW + SSG

SST = \sum_{i=1}^{K} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2

Where:
  SST = total sum of squares
  K = number of groups (levels or treatments)
  n_i = number of observations in group i
  x_{ij} = jth observation from group i
  \bar{x} = overall sample mean

Lecture by: Dr. Lord Mensah


Total Sum of Squares (continued)

SST = (x_{11} - \bar{x})^2 + (x_{12} - \bar{x})^2 + \ldots + (x_{K n_K} - \bar{x})^2

[Figure: response X plotted for Groups 1, 2, 3, with each observation's deviation measured from the overall mean \bar{x}]


Within-Group Variation

SST = SSW + SSG

SSW = \sum_{i=1}^{K} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2

Where:
  SSW = sum of squares within groups
  K = number of groups
  n_i = sample size from group i
  \bar{x}_i = sample mean from group i
  x_{ij} = jth observation in group i
Within-Group Variation (continued)

SSW = \sum_{i=1}^{K} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2

Sum the variation within each group, then add over all groups.

Mean Square Within = SSW / degrees of freedom:

MSW = \frac{SSW}{n - K}
Within-Group Variation (continued)

SSW = (x_{11} - \bar{x}_1)^2 + (x_{12} - \bar{x}_1)^2 + \ldots + (x_{K n_K} - \bar{x}_K)^2

[Figure: response X for Groups 1, 2, 3, with each observation's deviation measured from its own group mean \bar{x}_1, \bar{x}_2, \bar{x}_3]


Between-Group Variation

SST = SSW + SSG

SSG = \sum_{i=1}^{K} n_i (\bar{x}_i - \bar{x})^2

Where:
  SSG = sum of squares between groups
  K = number of groups
  n_i = sample size from group i
  \bar{x}_i = sample mean from group i
  \bar{x} = grand mean (mean of all data values)
Between-Group Variation (continued)

SSG = \sum_{i=1}^{K} n_i (\bar{x}_i - \bar{x})^2

Variation due to differences between groups.

Mean Square Between Groups = SSG / degrees of freedom:

MSG = \frac{SSG}{K - 1}
Between-Group Variation (continued)

SSG = n_1 (\bar{x}_1 - \bar{x})^2 + n_2 (\bar{x}_2 - \bar{x})^2 + \ldots + n_K (\bar{x}_K - \bar{x})^2

[Figure: response X for Groups 1, 2, 3, with each group mean's deviation measured from the grand mean \bar{x}]


Obtaining the Mean Squares

MST = \frac{SST}{n - 1}

MSW = \frac{SSW}{n - K}

MSG = \frac{SSG}{K - 1}

Where n = sum of the sample sizes from all groups
      K = number of populations
One-Way ANOVA Table

Source of Variation | SS              | df    | MS (Variance)     | F ratio
--------------------|-----------------|-------|-------------------|-------------
Between Groups      | SSG             | K - 1 | MSG = SSG/(K - 1) | F = MSG/MSW
Within Groups       | SSW             | n - K | MSW = SSW/(n - K) |
Total               | SST = SSG + SSW | n - 1 |                   |

K = number of groups
n = sum of the sample sizes from all groups
df = degrees of freedom
One-Factor ANOVA F Test Statistic

H0: μ1 = μ2 = … = μK
H1: At least two population means are different

- Test statistic: F = MSG / MSW
  - MSG is the mean square between groups
  - MSW is the mean square within groups
- Degrees of freedom
  - df1 = K - 1 (K = number of groups)
  - df2 = n - K (n = sum of sample sizes from all groups)
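As a sketch of the computation (not from the slides; SciPy's `f.ppf` supplies the critical value, and the MSG and MSW figures come from the golf-club example below):

```python
from scipy.stats import f

# F ratio for a one-way ANOVA with K = 3 groups and n = 15 observations
K, n = 3, 15
ssg, ssw = 4716.4, 1119.6           # sums of squares from the golf-club example
msg = ssg / (K - 1)                 # mean square between groups
msw = ssw / (n - K)                 # mean square within groups
f_stat = msg / msw

# Upper-tail critical value F_{K-1, n-K, 0.05}
f_crit = f.ppf(0.95, K - 1, n - K)
print(round(f_stat, 3), round(f_crit, 2))   # → 25.275 3.89

if f_stat > f_crit:
    print("Reject H0: at least one population mean differs")
```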
Interpreting the F Statistic

- The F statistic is the ratio of the between-groups estimate of variance to the within-groups estimate of variance
- The ratio must always be positive
- df1 = K - 1 will typically be small
- df2 = n - K will typically be large

Decision rule: reject H0 if F > F_{K-1, n-K, α}

[Figure: F distribution with the α = .05 rejection region to the right of the critical value F_{K-1, n-K, α}]
One-Factor ANOVA F Test Example

You want to see whether three different golf clubs yield different distances. You randomly select five measurements from trials on an automated driving machine for each club. At the .05 significance level, is there a difference in mean distance?

Club 1 | Club 2 | Club 3
  254  |  234   |  200
  263  |  218   |  222
  241  |  235   |  197
  237  |  227   |  206
  251  |  216   |  204
One-Factor ANOVA Example: Scatter Diagram

[Figure: distances for the three clubs plotted against club number, with the group means and overall mean marked]

\bar{x}_1 = 249.2, \bar{x}_2 = 226.0, \bar{x}_3 = 205.8, \bar{x} = 227.0
One-Factor ANOVA Example Solution

H0: μ1 = μ2 = μ3
H1: μi not all equal
α = .05
df1 = 2, df2 = 12

Test statistic:
F = MSG / MSW = 2358.2 / 93.3 = 25.275

Critical value:
F_{2,12,.05} = 3.89

Decision: reject H0 at α = 0.05, since F = 25.275 > 3.89

Conclusion: there is evidence that at least one μi differs from the rest
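The same result can be reproduced with SciPy's `f_oneway` (a sketch, not part of the original slides):

```python
from scipy.stats import f_oneway

club1 = [254, 263, 241, 237, 251]
club2 = [234, 218, 235, 227, 216]
club3 = [200, 222, 197, 206, 204]

f_stat, p_value = f_oneway(club1, club2, club3)
print(round(f_stat, 3))    # → 25.275
print(p_value < 0.05)      # → True: reject H0 at the 5% level
```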
ANOVA -- Single Factor: Excel Output

EXCEL: data | data analysis | ANOVA: single factor

SUMMARY
Groups | Count | Sum  | Average | Variance
Club 1 |   5   | 1246 | 249.2   | 108.2
Club 2 |   5   | 1130 | 226     | 77.5
Club 3 |   5   | 1029 | 205.8   | 94.2

ANOVA
Source of Variation | SS     | df | MS     | F      | P-value  | F crit
Between Groups      | 4716.4 |  2 | 2358.2 | 25.275 | 4.99E-05 | 3.89
Within Groups       | 1119.6 | 12 | 93.3   |        |          |
Total               | 5836.0 | 14 |        |        |          |
Multiple Comparisons Between Subgroup Means

- To test which population means are significantly different
  - e.g.: μ1 = μ2 ≠ μ3
- Done after rejection of equal means in a single-factor ANOVA design
- Allows pair-wise comparisons
- Compare absolute mean differences with a critical range
Two Subgroups

When there are only two subgroups, compute the minimum significant difference (MSD):

MSD = t_{\alpha/2} \, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}

where s_p is a pooled estimate of the variance.
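A Python sketch of the MSD calculation. The slide does not specify the degrees of freedom for t_{\alpha/2}; this example assumes s_p^2 is taken as MSW from the one-way ANOVA table with n - K = 12 degrees of freedom, which is one common convention:

```python
from math import sqrt
from scipy.stats import t

# MSD between two subgroups; assumes sp^2 = MSW from the one-way ANOVA
n1, n2 = 5, 5
msw, df = 93.3, 12                  # within-group mean square and its df
sp = sqrt(msw)                      # pooled standard deviation
alpha = 0.05

msd = t.ppf(1 - alpha / 2, df) * sp * sqrt(1 / n1 + 1 / n2)
print(round(msd, 2))                # → 13.31

# Pair-wise comparison, e.g. Club 1 vs Club 3 sample means:
print(abs(249.2 - 205.8) > msd)     # → True: these two means differ
```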



Summary

- Described one-way analysis of variance
  - The logic of analysis of variance
  - Analysis of variance assumptions
  - F test for difference in K means
- Applied the Kruskal-Wallis test when the populations are not known to be normal
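For the non-normal case mentioned above, the Kruskal-Wallis test replaces the observations by their ranks. A sketch using SciPy's `kruskal` on the golf-club data from the example:

```python
from scipy.stats import kruskal

# Kruskal-Wallis: nonparametric test of equal population locations
club1 = [254, 263, 241, 237, 251]
club2 = [234, 218, 235, 227, 216]
club3 = [200, 222, 197, 206, 204]

h_stat, p_value = kruskal(club1, club2, club3)
print(round(h_stat, 2))   # → 11.58
print(p_value < 0.05)     # → True: reject H0 at the 5% level
```

Because it uses ranks rather than raw values, the test does not require the normality assumption of the F test, at the cost of some power when the populations really are normal.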