ANOVA


Analysis of Variance

Overview of ANOVA
• Analysis of variance (ANOVA) is a comparison
of means.
• ANOVA allows you to compare more than
two means simultaneously.
• Proper experimental design efficiently uses
limited data to draw the strongest possible
inferences.

McGraw-Hill/Irwin © 2007 The McGraw-Hill Companies, Inc. All rights reserved.


The Goal: Explaining Variation
• ANOVA seeks to identify sources of variation in a
numerical dependent variable Y (the response
variable).
• Variation in Y about its mean is explained by one
or more categorical independent variables (the
factors) or is unexplained (random error).

• Each possible value of a factor, or combination of factors, is a treatment.
• We test whether each factor has a significant effect on Y using (for example) the hypotheses:
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_1$: Not all the means are equal
• The test uses the F distribution.
• If we cannot reject H0, we conclude that the data are consistent with all treatments sharing a common mean $\mu$.
• For example, a one-factor ANOVA would test the
hypothesis that the length of hospital stay (LOS)
is affected by Type of Fracture:
Length of stay = f(type of fracture)
• A two-factor ANOVA would test the hypothesis
that the length of hospital stay (LOS) is affected
by Type of Fracture and Age Group:
Length of stay = f(type of fracture, age group)
• We can also test for interaction between factors.


One-Factor ANOVA
(Completely Randomized Design)
Data Format
• A one-factor ANOVA compares the means of c groups (treatments or factor levels).
• The data for a one-factor ANOVA with c treatments, denoted A1, A2, …, Ac, are arranged in c columns, one per treatment: column j holds the $n_j$ observations $y_{ij}$ of treatment Aj.

• Sample sizes within each treatment do not need to be equal; that is, the design need not be balanced.
• The total number of observations is
$n = n_1 + n_2 + \cdots + n_c$
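Hypotheses to Be Tested
• $H_0: \mu_1 = \mu_2 = \cdots = \mu_c$
$H_1$: Not all the means are equal
• Because ANOVA tests all the means simultaneously in a single F test, it avoids the inflated type I error rate that would result from running a separate t test on every pair of means.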
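Group Means
• The mean of each group is calculated as
$\bar{y}_j = \frac{1}{n_j}\sum_{i=1}^{n_j} y_{ij}, \quad j = 1, \ldots, c$
• The overall sample mean (grand mean) can be calculated as
$\bar{y} = \frac{1}{n}\sum_{j=1}^{c}\sum_{i=1}^{n_j} y_{ij} = \frac{1}{n}\sum_{j=1}^{c} n_j\,\bar{y}_j$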
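As a minimal sketch, the group means and the grand mean can be computed in Python for a small hypothetical data set (the treatment labels and values are invented for illustration). The sketch also shows why, in an unbalanced design, the grand mean is the sample-size-weighted average of the group means rather than their simple average.

```python
# Hypothetical data: c = 3 treatments with unequal sample sizes (unbalanced).
groups = {
    "A1": [4.0, 5.0, 6.0],
    "A2": [7.0, 8.0, 9.0, 8.0],
    "A3": [1.0, 2.0, 3.0],
}


def group_means(data):
    """Return the column (treatment) mean for each group."""
    return {name: sum(ys) / len(ys) for name, ys in data.items()}


def grand_mean(data):
    """Return the mean of all n observations pooled across groups."""
    total = sum(sum(ys) for ys in data.values())
    n = sum(len(ys) for ys in data.values())
    return total / n


means = group_means(groups)
n = sum(len(ys) for ys in groups.values())

# The grand mean equals the n_j-weighted average of the group means ...
weighted = sum(len(groups[k]) * means[k] for k in groups) / n
# ... which differs from the simple average of group means when n_j varies.
unweighted = sum(means.values()) / len(means)

print(means)               # {'A1': 5.0, 'A2': 8.0, 'A3': 2.0}
print(grand_mean(groups))  # 5.3
print(weighted)            # 5.3
print(unweighted)          # 5.0
```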

Partitioned Sum of Squares
• For any observation $y_{ij}$, the following identity holds:
$(y_{ij} - \bar{y}) = (\bar{y}_j - \bar{y}) + (y_{ij} - \bar{y}_j)$
• where
$(y_{ij} - \bar{y})$ = deviation of an observation from the grand mean
$(\bar{y}_j - \bar{y})$ = deviation of the column mean from the grand mean (between treatments)
$(y_{ij} - \bar{y}_j)$ = deviation of the observation from its own column mean (within treatments)
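• This relationship also holds for the sums of squared deviations, because the cross-product term vanishes (deviations from each column mean sum to zero). The result is the partitioned sum of squares:
$\sum_{j=1}^{c}\sum_{i=1}^{n_j}(y_{ij} - \bar{y})^2 = \sum_{j=1}^{c} n_j(\bar{y}_j - \bar{y})^2 + \sum_{j=1}^{c}\sum_{i=1}^{n_j}(y_{ij} - \bar{y}_j)^2$
• Simply put, SST = SSA + SSE.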
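The partition SST = SSA + SSE can be checked numerically. This is a minimal sketch on a hypothetical three-treatment data set; the function name `sums_of_squares` is illustrative and not from any library.

```python
# Hypothetical one-factor data: one list of observations per treatment.
groups = [
    [4.0, 5.0, 6.0],       # A1
    [7.0, 8.0, 9.0, 8.0],  # A2
    [1.0, 2.0, 3.0],       # A3
]


def sums_of_squares(groups):
    """Return (SST, SSA, SSE) for a one-factor ANOVA."""
    all_y = [y for ys in groups for y in ys]
    grand = sum(all_y) / len(all_y)
    # SST: squared deviations of every observation from the grand mean
    sst = sum((y - grand) ** 2 for y in all_y)
    # SSA: squared deviations of each column mean from the grand mean,
    # weighted by that column's sample size n_j (between treatments)
    ssa = sum(len(ys) * (sum(ys) / len(ys) - grand) ** 2 for ys in groups)
    # SSE: squared deviations of each observation from its own column mean
    sse = sum((y - sum(ys) / len(ys)) ** 2 for ys in groups for y in ys)
    return sst, ssa, sse


sst, ssa, sse = sums_of_squares(groups)
print(round(sst, 2), round(ssa, 2), round(sse, 2))  # 68.1 62.1 6.0
print(abs(sst - (ssa + sse)) < 1e-9)                # True
```

The equality holds up to floating-point rounding, which is why the check uses a small tolerance rather than `==`.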

• SSA and SSE are used to test the hypothesis of equal treatment means. Each sum of squares is divided by its degrees of freedom (c – 1 for SSA, n – c for SSE) to adjust for the number of groups and observations.
• These ratios are called mean squares: MSA = SSA/(c – 1) and MSE = SSE/(n – c).
• The resulting test statistic is F = MSA/MSE.
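A minimal sketch of the mean squares and the F statistic, assuming the usual one-factor degrees of freedom (c – 1 between treatments, n – c within). The helper `one_way_f` and the data are hypothetical.

```python
# Hypothetical one-factor data: one list of observations per treatment.
groups = [
    [4.0, 5.0, 6.0],       # A1
    [7.0, 8.0, 9.0, 8.0],  # A2
    [1.0, 2.0, 3.0],       # A3
]


def one_way_f(groups):
    """Return (MSA, MSE, F) for a one-factor ANOVA (illustrative helper)."""
    c = len(groups)                    # number of treatments
    n = sum(len(ys) for ys in groups)  # total number of observations
    grand = sum(y for ys in groups for y in ys) / n
    col_means = [sum(ys) / len(ys) for ys in groups]
    ssa = sum(len(ys) * (m - grand) ** 2 for ys, m in zip(groups, col_means))
    sse = sum((y - m) ** 2 for ys, m in zip(groups, col_means) for y in ys)
    msa = ssa / (c - 1)  # between-treatment mean square, c - 1 df
    mse = sse / (n - c)  # within-treatment mean square, n - c df
    return msa, mse, msa / mse


msa, mse, f_stat = one_way_f(groups)
print(round(msa, 3), round(mse, 3), round(f_stat, 3))  # 31.05 0.857 36.225
```

If SciPy is installed, `scipy.stats.f_oneway(*groups)` should report the same F statistic together with its p-value, which is a convenient cross-check.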



Test Statistic
• The F distribution describes the ratio of two
variances.
• The F statistic is the ratio of the variance due
to treatments (MSA) to the variance due to
error (MSE).

• When F is near zero (more generally, when it is not much larger than 1), the variation among treatment means is small relative to random error, so we would not expect to reject the hypothesis of equal treatment means.

$F = \dfrac{MSA}{MSE}$

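Decision Rule
• Reject H0 if $F > F_{\alpha}$, where $F_{\alpha}$ is the right-tail critical value of the F distribution with c – 1 numerator and n – c denominator degrees of freedom. Equivalently, reject H0 if the p-value is less than $\alpha$.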



Multiple Comparison Tests

Tukey’s Test
• After rejecting the hypothesis of equal means, we naturally want to know which means differ significantly.
• To keep the overall probability of a type I error at the desired level, we need simultaneous confidence intervals for the differences of means.
• For c groups, there are c(c – 1)/2 distinct pairs of means to be compared.
• Procedures of this kind are called multiple comparison tests.
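A short sketch that enumerates the distinct pairs for a hypothetical c = 4 and checks the count against c(c – 1)/2. It also gives a rough illustration of why unadjusted pairwise testing inflates the type I error, under the loudly simplifying assumption that the pairwise tests are independent (in practice they are not, since pairs share groups).

```python
from itertools import combinations

# Hypothetical treatment labels for c = 4 groups.
treatments = ["A1", "A2", "A3", "A4"]
c = len(treatments)

# Every unordered pair (A_j, A_k) with j < k is one comparison.
pairs = list(combinations(treatments, 2))
print(len(pairs), c * (c - 1) // 2)  # 6 6

# Rough illustration only: if each pair were tested separately at alpha = 0.05
# and the tests were independent (a simplifying assumption), the probability
# of at least one false rejection would be 1 - (1 - alpha)^k.
alpha = 0.05
familywise = 1 - (1 - alpha) ** len(pairs)
print(round(familywise, 3))  # 0.265
```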
• Tukey’s studentized range test (also called the HSD, or “honestly significant difference,” test) is a widely used multiple comparison test with good power.
• It is named for the statistician John Wilder Tukey (1915–2000).
• This test is not available in Excel’s Tools > Data Analysis, but it is available in MegaStat.
• Tukey’s test is a two-tailed test of equality for each pair of means from the c groups, with all pairs compared simultaneously.
• The hypotheses are:
$H_0: \mu_j = \mu_k$
$H_1: \mu_j \neq \mu_k$

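• The decision rule is:
Reject H0 if $\dfrac{|\bar{y}_j - \bar{y}_k|}{\sqrt{MSE\left(\dfrac{1}{n_j} + \dfrac{1}{n_k}\right)}} > T_\alpha$
• where $T_\alpha = 0.707\,q_{c,\,n-c}$ and $q_{c,\,n-c}$ is the critical value of the studentized range for the desired $\alpha$.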
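A minimal sketch of this decision rule applied to every pair of treatments in a small hypothetical data set. The helper name `tukey_pairs` and the data are invented for illustration. The critical value is not computed here: the 4.16 used below is an assumed studentized range value for c = 3, n – c = 7, α = 0.05 and should be checked against a table. With unequal sample sizes, this form of the rule is often called the Tukey-Kramer procedure; the factor 0.707 is approximately 1/√2.

```python
import math
from itertools import combinations

# Hypothetical data: c = 3 treatments, n = 10 observations.
groups = {
    "A1": [4.0, 5.0, 6.0],
    "A2": [7.0, 8.0, 9.0, 8.0],
    "A3": [1.0, 2.0, 3.0],
}


def tukey_pairs(groups, q_crit):
    """Apply the Tukey decision rule to every pair of treatments.

    q_crit is the studentized range critical value q_{c, n-c} for the chosen
    alpha; it must be supplied (e.g. read from a table).
    Returns {(j, k): (test statistic, reject H0?)}.
    """
    c = len(groups)
    n = sum(len(ys) for ys in groups.values())
    means = {k: sum(ys) / len(ys) for k, ys in groups.items()}
    sse = sum((y - means[k]) ** 2 for k, ys in groups.items() for y in ys)
    mse = sse / (n - c)
    t_alpha = 0.707 * q_crit  # T_alpha = 0.707 * q_{c, n-c}
    results = {}
    for j, k in combinations(groups, 2):
        se = math.sqrt(mse * (1 / len(groups[j]) + 1 / len(groups[k])))
        stat = abs(means[j] - means[k]) / se
        results[(j, k)] = (stat, stat > t_alpha)
    return results


# Assumed critical value for c = 3, n - c = 7, alpha = 0.05 (check a table).
for pair, (stat, reject) in tukey_pairs(groups, q_crit=4.16).items():
    print(pair, round(stat, 3), "reject H0" if reject else "do not reject H0")
# ('A1', 'A2') 4.243 reject H0
# ('A1', 'A3') 3.969 reject H0
# ('A2', 'A3') 8.485 reject H0
```

Recent SciPy versions also offer `scipy.stats.tukey_hsd` and a `scipy.stats.studentized_range` distribution whose `ppf` can replace the table lookup; results may differ slightly because they use the exact 1/√2 rather than 0.707.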

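• For example, critical values of q are tabulated for the upper 5% of the studentized range distribution:
[Table: upper 5% critical values of the studentized range]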
