Session 15 - ANOVA
BUSINESS STATISTICS
One-Way Analysis of Variance
• Assumptions
• Populations are normally distributed
• Populations have equal variances
• Samples are randomly and independently drawn
Hypotheses of One-Way ANOVA
• H0: μ1 = μ2 = μ3 = … = μc
• All population means are equal
• i.e., no factor effect (no variation in means between groups)
[Figure: μ1 = μ2 = μ3]
One-Way ANOVA
H0: μ1 = μ2 = μ3 = … = μc
H1: Not all μj are equal
The null hypothesis is NOT true: at least one of the means is different (factor effect is present)
[Figure: μ1 = μ2 ≠ μ3 or μ1 ≠ μ2 ≠ μ3]
Visualizing ANOVA through Example
Partitioning the Variation
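• Total variation can be split into two parts, between-group variation and within-group variation: SST = SSB + SSW
$$SST = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (X_{ij} - \bar{X})^2$$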
Where:
SST = Total sum of squares
c = number of groups or levels
$n_j$ = number of observations in group j
$X_{ij}$ = ith observation from group j
$\bar{X}$ = grand mean (mean of all data values)
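The following short derivation is added as a sketch of why the total variation splits into exactly two parts. It uses the symbols defined above together with the group mean $\bar{X}_j$ and the names SSB and SSW (between-group and within-group sums of squares) that the later slides define.

```latex
\begin{aligned}
SST &= \sum_{j=1}^{c}\sum_{i=1}^{n_j}
       \bigl[(X_{ij}-\bar{X}_j) + (\bar{X}_j-\bar{X})\bigr]^2 \\
    &= \sum_{j=1}^{c}\sum_{i=1}^{n_j}(X_{ij}-\bar{X}_j)^2
     + \sum_{j=1}^{c} n_j(\bar{X}_j-\bar{X})^2
     + 2\sum_{j=1}^{c}(\bar{X}_j-\bar{X})\sum_{i=1}^{n_j}(X_{ij}-\bar{X}_j) \\
    &= SSW + SSB + 0
\end{aligned}
```

The cross term vanishes because, within each group, the deviations from the group mean sum to zero.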
Total Variation
(continued)
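$$SST = (X_{11} - \bar{X})^2 + (X_{12} - \bar{X})^2 + \cdots + (X_{c n_c} - \bar{X})^2$$
[Figure: Response, X]
$$SSB = \sum_{j=1}^{c} n_j (\bar{X}_j - \bar{X})^2$$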
Where:
SSB = Sum of squares Between groups
c = number of groups
$n_j$ = sample size from group j
$\bar{X}_j$ = sample mean from group j
$\bar{X}$ = grand mean (mean of all data values)
BETWEEN-Group Variation
(continued)
$$SSB = \sum_{j=1}^{c} n_j (\bar{X}_j - \bar{X})^2$$
Variation due to differences between groups
$$MSB = \frac{SSB}{c - 1}$$
Mean Square Between = SSB/degrees of freedom
BETWEEN-Group Variation
(continued)
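$$SSB = n_1(\bar{X}_1 - \bar{X})^2 + n_2(\bar{X}_2 - \bar{X})^2 + \cdots + n_c(\bar{X}_c - \bar{X})^2$$
[Figure: Response, X, showing the group means $\bar{X}_1$, $\bar{X}_2$, $\bar{X}_3$ and the grand mean $\bar{X}$]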
$$SSW = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2$$
Where:
SSW = Sum of squares within groups
c = number of groups
$n_j$ = sample size from group j
$\bar{X}_j$ = sample mean from group j
$X_{ij}$ = ith observation in group j
Within-Group Variation
(continued)
$$SSW = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2$$
Summing the variation within each group and then adding over all groups
$$MSW = \frac{SSW}{n - c}$$
Mean Square Within = SSW/degrees of freedom
Within-Group Variation
(continued)
[Figure: Response, X, showing the group means $\bar{X}_1$, $\bar{X}_2$, $\bar{X}_3$]
$$MST = \frac{SST}{n - 1}$$
Mean Square Total (d.f. = n − 1)
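One-Way ANOVA Table
Source of Variation   df      SS    MS                   F
Between Groups        c − 1   SSB   MSB = SSB/(c − 1)    FSTAT = MSB/MSW
Within Groups         n − c   SSW   MSW = SSW/(n − c)
Total                 n − 1   SST
c = number of groups
n = sum of the sample sizes from all groups
df = degrees of freedom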
One-Way ANOVA
F Test Statistic
H0: μ1= μ2 = … = μc
H1: At least two population means are different
• Test statistic
$$F_{STAT} = \frac{MSB}{MSW}$$
MSB is the mean square between groups
MSW is the mean square within groups
• Degrees of freedom
• df1 = c – 1 (c = number of groups)
• df2 = n – c (n = sum of sample sizes from all populations)
Interpreting One-Way ANOVA & F-Statistic
• The F statistic is the ratio of the between-groups estimate of variance to the within-groups estimate of variance. The higher the ratio, the larger the between-group variance (numerator) relative to the within-group variance (denominator), which is small when observations are homogeneous within groups
• The ratio can never be negative, since both mean squares are built from squared deviations
• df1 = c -1 will typically be small
• df2 = n - c will typically be large
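To make the interpretation above concrete, here is a minimal sketch that computes the F statistic for two data sets sharing the same group means but differing in within-group spread. The data sets `tight` and `spread` and the helper name `f_statistic` are hypothetical, chosen only for illustration.

```python
from statistics import mean


def f_statistic(groups):
    """Return the one-way ANOVA F statistic MSB / MSW for a list of samples."""
    c = len(groups)                                   # number of groups
    n = sum(len(g) for g in groups)                   # total sample size
    grand_mean = sum(sum(g) for g in groups) / n
    ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ssb / (c - 1)) / (ssw / (n - c))


# Hypothetical data: both sets have group means 10, 20 and 30.
tight = [[9, 10, 11], [19, 20, 21], [29, 30, 31]]     # homogeneous within groups
spread = [[0, 10, 20], [10, 20, 30], [20, 30, 40]]    # heterogeneous within groups

f_tight = f_statistic(tight)
f_spread = f_statistic(spread)
print(f_tight, f_spread)
```

The between-group variation is identical in both sets, so the difference in F comes entirely from the denominator: the tighter the groups, the larger the ratio.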
Decision Rule:
Reject H0 if FSTAT > Fα, otherwise do not reject H0
[Figure: F distribution starting at 0, with the "Do not reject H0" region to the left of Fα and the "Reject H0" region to the right]
One-Way ANOVA
F Test Example
[Figure: plot of the sample data by group (groups 1, 2, 3; axis label: Club)]
One-Way ANOVA Example
Computations
Zone 1   Zone 2   Zone 3
254      234      200
263      218      222
241      235      197
237      227      206
251      216      204

$\bar{X}_1$ = 249.2   $n_1$ = 5
$\bar{X}_2$ = 226.0   $n_2$ = 5
$\bar{X}_3$ = 205.8   $n_3$ = 5
$\bar{X}$ = 227.0   n = 15   c = 3

$$SSB = 5(249.2 - 227)^2 + 5(226 - 227)^2 + 5(205.8 - 227)^2 = 4716.4$$
$$SSW = (254 - 249.2)^2 + (263 - 249.2)^2 + \cdots + (204 - 205.8)^2 = 1119.6$$
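The computations above can be reproduced with a few lines of plain Python. This is a sketch using only the three zone samples from the slide; the variable names are illustrative, and the mean squares and F statistic are derived from the formulas given earlier rather than quoted from the slides.

```python
# Sample data from the slide, one list per zone.
zones = {
    "Zone 1": [254, 263, 241, 237, 251],
    "Zone 2": [234, 218, 235, 227, 216],
    "Zone 3": [200, 222, 197, 206, 204],
}

c = len(zones)                                        # number of groups
n = sum(len(v) for v in zones.values())               # total sample size
grand_mean = sum(sum(v) for v in zones.values()) / n
group_means = {k: sum(v) / len(v) for k, v in zones.items()}

# Between-group, within-group and total sums of squares.
ssb = sum(len(v) * (group_means[k] - grand_mean) ** 2 for k, v in zones.items())
ssw = sum((x - group_means[k]) ** 2 for k, v in zones.items() for x in v)
sst = sum((x - grand_mean) ** 2 for v in zones.values() for x in v)

msb = ssb / (c - 1)        # Mean Square Between, df1 = c - 1
msw = ssw / (n - c)        # Mean Square Within,  df2 = n - c
f_stat = msb / msw

print(f"SSB={ssb:.1f} SSW={ssw:.1f} SST={sst:.1f}")
print(f"MSB={msb:.1f} MSW={msw:.1f} F={f_stat:.3f}")
```

Note that SST equals SSB + SSW, and that the F statistic would then be compared with the critical value Fα for df1 = 2 and df2 = 12.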
Why do we need the analysis of variance? Why not test every pair of means? For example, say k = 6. There are 6C2 = 6(5)/2 = 15 different pairs of means.
1&2 1&3 1&4 1&5 1&6
2&3 2&4 2&5 2&6
3&4 3&5 3&6
4&5 4&6
5&6
• If we test each pair with α = .05 we increase the probability of making a Type I error. If there are no differences, then the probability of making at least one Type I error is 1 − (.95)^15 = 1 − .463 = .537
• Major shortcoming of ANOVA: it tells us that one or more pairs of means differ, but gives no indication of which pair is different
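The inflation of the Type I error rate can be checked numerically. The sketch below counts the pairs of means for k groups and computes the chance of at least one Type I error, assuming (as the calculation above implicitly does) that the pairwise tests are independent.

```python
from math import comb

k = 6            # number of group means
alpha = 0.05     # significance level of each pairwise test

pairs = comb(k, 2)                         # number of distinct pairs of means
familywise = 1 - (1 - alpha) ** pairs      # P(at least one Type I error)

print(pairs, round(familywise, 3))
```

For k = 6 this gives 15 pairs and a probability of about .537.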
Multiple Comparisons
When we conclude from the one-way analysis of variance that at least
two treatment means differ (i.e., we reject the null hypothesis
H0: μ1 = μ2 = … = μk), we often need to know which treatment
means are responsible for these differences.
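We conclude that μ1 and μ2 differ if $|\bar{x}_1 - \bar{x}_2|$ is greater than
$$t_{\alpha/2} \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$$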
Fisher’s Least Significant Difference
However, we have a better estimator of the pooled variance. It is MSE.
We substitute MSE in place of $s_p^2$. Thus we compare the difference
between means to the Least Significant Difference LSD, given by:
$$LSD = t_{\alpha/2} \sqrt{MSE \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}$$
LSD will be the same for all pairs of means if all k sample sizes are equal. If
some sample sizes differ, LSD must be calculated for each combination.
MSE = MSW = SSW/(n − k)
Example
The problem objective is to compare four populations, the data are
interval, and the samples are independent. The correct statistical
method is the one-way analysis of variance.
ANOVA
Source of Variation   SS        df   MS       F      P-value   F crit
Between Groups        150,884    3   50,295   4.06   0.0139    2.8663
Within Groups         446,368   36   12,399
Total                 597,252   39
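As a quick check of how the entries in this table relate to one another, the following sketch recomputes the mean squares and the F statistic from the SS and df columns and applies the decision rule with the reported F crit. The variable names are illustrative; all numbers are taken from the table.

```python
# Values read from the ANOVA table above.
ss_between, df_between = 150_884, 3
ss_within, df_within = 446_368, 36
f_crit = 2.8663                      # critical value as reported in the table

ms_between = ss_between / df_between     # MS = SS / df
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

# Decision rule: reject H0 when the test statistic exceeds the critical value.
reject_h0 = f_stat > f_crit

print(round(ms_between), round(ms_within), round(f_stat, 2), reject_h0)
```

Since the statistic exceeds F crit, H0 is rejected, which is consistent with the reported P-value of 0.0139 being below .05.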
$$LSD = t_{\alpha/2} \sqrt{MSE \left( \frac{1}{n_i} + \frac{1}{n_j} \right)} = 2.030 \sqrt{12{,}399 \left( \frac{1}{10} + \frac{1}{10} \right)} = 101.09$$
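The LSD calculation can be written as a small function. This is a sketch, with the helper name `lsd` chosen for illustration; the critical value 2.030, the MSE of 12,399 and the sample sizes of 10 are the ones used in the example.

```python
from math import sqrt


def lsd(t_crit, mse, n_i, n_j):
    """Least Significant Difference for comparing the means of groups i and j."""
    return t_crit * sqrt(mse * (1 / n_i + 1 / n_j))


# Values from the example: t critical value, MSE, and two samples of size 10.
lsd_05 = lsd(2.030, 12_399, 10, 10)
print(round(lsd_05, 2))
```

Because the function takes both sample sizes as arguments, it also covers the unequal-sample-size case, where LSD has to be recomputed for each pair.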
Example (continued…)
We calculate the absolute value of the differences between
means and compare them to LSD = 101.09.
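With four treatments there are 6 pairwise comparisons, so we set α = .05/6 = .0083 (the Bonferroni adjustment). Thus, tα/2,36 = 2.794 (available from Excel and difficult to approximate manually) and
$$LSD = t_{\alpha/2} \sqrt{MSE \left( \frac{1}{n_i} + \frac{1}{n_j} \right)} = 2.794 \sqrt{12{,}399 \left( \frac{1}{10} + \frac{1}{10} \right)} = 139.13$$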
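Dividing α by the number of pairwise comparisons is commonly known as the Bonferroni adjustment. The sketch below shows where the divisor 6 comes from for four populations and recomputes the adjusted LSD; the critical value 2.794 is taken from the slide rather than computed, and the variable names are illustrative.

```python
from math import comb, sqrt

treatments = 4                 # four populations are compared in the example
alpha_family = 0.05            # overall significance level

comparisons = comb(treatments, 2)            # number of pairwise comparisons
alpha_each = alpha_family / comparisons      # adjusted level per comparison

t_crit = 2.794                 # t value for the adjusted level, 36 df (from the slide)
mse, n_i, n_j = 12_399, 10, 10
lsd_adjusted = t_crit * sqrt(mse * (1 / n_i + 1 / n_j))

print(comparisons, round(alpha_each, 4), round(lsd_adjusted, 2))
```

The adjusted threshold is larger than the unadjusted LSD, so fewer differences are declared significant.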
Example Continued..
Multiple Comparisons

                                      LSD              Omega
Treatment   Treatment   Difference    Alpha = 0.0083   Alpha = 0.05
Bumper 1    Bumper 2    -105.9        139.11           133.45
Bumper 1    Bumper 3    -103.8        139.11           133.45
Bumper 1    Bumper 4      31.8        139.11           133.45
Bumper 2    Bumper 3       2.1        139.11           133.45
Bumper 2    Bumper 4     137.7        139.11           133.45
Bumper 3    Bumper 4     135.6        139.11           133.45
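To read the table, each absolute difference is compared with the chosen threshold. The sketch below does this for the two thresholds in the table and, for comparison, for the unadjusted LSD of 101.09 computed earlier in the example. The dictionary labels are illustrative names, not output from any package.

```python
# Differences between sample means, as listed in the table.
differences = {
    ("Bumper 1", "Bumper 2"): -105.9,
    ("Bumper 1", "Bumper 3"): -103.8,
    ("Bumper 1", "Bumper 4"): 31.8,
    ("Bumper 2", "Bumper 3"): 2.1,
    ("Bumper 2", "Bumper 4"): 137.7,
    ("Bumper 3", "Bumper 4"): 135.6,
}

# Thresholds from the slides: unadjusted LSD, adjusted LSD, and Omega.
thresholds = {
    "LSD, alpha = 0.05": 101.09,
    "LSD, alpha = 0.0083": 139.11,
    "Omega, alpha = 0.05": 133.45,
}

# A pair is flagged when the absolute difference exceeds the threshold.
significant = {
    label: [pair for pair, d in differences.items() if abs(d) > cutoff]
    for label, cutoff in thresholds.items()
}

for label, pairs in significant.items():
    print(label, pairs)
```

The three procedures disagree: the unadjusted LSD flags four pairs, Omega flags two, and the adjusted LSD flags none.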