Multisample Inference: Analysis of Variance
EPI809/Spring 2008 1
Learning Objectives
Analysis of Variance
An analysis of variance (ANOVA) is a technique that
partitions the total sum of squares of
deviations of the observations about their
mean into portions associated with the
independent variables in the experiment
and a portion associated with error.
The ANOVA table was previously
discussed in the context of regression
models with quantitative independent
variables. In this chapter the focus is
on nominal independent variables (factors).
Types of Experimental Designs
[Diagram: experimental designs, with branches for One-Way ANOVA and Two-Way ANOVA]
Completely Randomized Design
1. Experimental Units (Subjects) Are Assigned Randomly to Treatments
   Subjects are Assumed Homogeneous
2. Variables
   One Nominal Independent Variable
   One Continuous Dependent Variable
One-Way ANOVA F-Test
Assumptions
1. Randomness & Independence of Errors
2. Normality
   Populations (for each condition) are Normally Distributed
3. Homogeneity of Variance
   Populations (for each condition) have Equal Variances
One-Way ANOVA F-Test
Hypotheses
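H0: $\mu_1 = \mu_2 = \mu_3 = \dots = \mu_p$
   All Population Means are Equal
   No Treatment Effect
Ha: Not All $\mu_j$ Are Equal
   At Least 1 Pop. Mean is Different
   Treatment Effect
   That is, $\mu_1 = \mu_2 = \dots = \mu_p$ fails: $\mu_i \ne \mu_j$ for some $i, j$.
[Figure: densities f(X) under H0 ($\mu_1 = \mu_2 = \mu_3$) and under Ha (e.g., $\mu_1 = \mu_2 \ne \mu_3$)]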
One-Way ANOVA
Basic Idea
1. Compares 2 Types of Variation to Test Equality of Means
2. If Treatment Variation Is Significantly Greater Than Random Variation, then Means Are Not Equal
3. Variation Measures Are Obtained by ‘Partitioning’ Total Variation
One-Way ANOVA
Partitions Total Variation
Total variation = variation due to treatment (SST) + variation due to random error (SSE)
Variation due to treatment:
$$SST = n_1(\bar{Y}_1 - \bar{Y})^2 + n_2(\bar{Y}_2 - \bar{Y})^2 + \dots + n_p(\bar{Y}_p - \bar{Y})^2$$
[Figure: response Y plotted by group, showing the group means $\bar{Y}_1$, $\bar{Y}_2$, $\bar{Y}_3$ and the overall mean $\bar{Y}$]
One-Way ANOVA
Summary Table
Source of     Degrees of    Sum of                    Mean Square          F
Variation     Freedom       Squares                   (Variance)
Treatment     p - 1         SST                       MST = SST/(p - 1)    MST/MSE
Error         n - p         SSE                       MSE = SSE/(n - p)
Total         n - 1         SS(Total) = SST + SSE
One-Way ANOVA F-Test Critical Value
If means are equal, $F = MST/MSE \approx 1$.
Only reject for large F!
Reject $H_0$ if $F > F_{\alpha}(p-1,\, n-p)$; otherwise do not reject $H_0$.
Always One-Tail!
[Figure: F distribution with the rejection region in the upper tail beyond $F_{\alpha}(p-1,\, n-p)$]
One-Way ANOVA F-Test
Example
As a vet epidemiologist you want to see if 3 food
supplements have different mean milk yields. You
assign 15 cows, 5 per food supplement.
Question: At the .05 level, is there a difference in mean yields?

Food1    Food2    Food3
25.40    23.40    20.00
26.31    21.80    22.20
24.10    23.50    19.75
23.74    22.75    20.60
25.10    21.60    20.40
One-Way ANOVA F-Test
Solution
H0: $\mu_1 = \mu_2 = \mu_3$
Ha: Not All Equal
$\alpha = .05$
$\nu_1 = 2$, $\nu_2 = 12$
Critical Value(s): $F_{.05}(2, 12) = 3.89$
Test Statistic: $F = MST/MSE = 23.5820/.9211 = 25.6$
Decision: Reject at $\alpha = .05$
Conclusion: There Is Evidence Pop. Means Are Different
Summary Table
Solution
Source of     Degrees of      Sum of      Mean Square    F
Variation     Freedom         Squares     (Variance)
Food          3 - 1 = 2       47.1640     23.5820        25.60
Error         15 - 3 = 12     11.0532       .9211
Total         15 - 1 = 14     58.2172
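The entries of this summary table can be reproduced directly from the 15 milk yields. What follows is a minimal Python sketch using only the standard library, not the course's SAS workflow; the function name `one_way_anova` and the dictionary layout are illustrative assumptions.

```python
# One-way ANOVA sums of squares for the milk-yield example (pure Python).
milk = {
    "food1": [25.40, 26.31, 24.10, 23.74, 25.10],
    "food2": [23.40, 21.80, 23.50, 22.75, 21.60],
    "food3": [20.00, 22.20, 19.75, 20.60, 20.40],
}


def one_way_anova(groups):
    """Return the ANOVA table entries for a dict of {level: observations}."""
    all_obs = [y for obs in groups.values() for y in obs]
    n, p = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / n
    # Treatment SS: group size times squared deviation of each group mean.
    sst = sum(len(obs) * (sum(obs) / len(obs) - grand_mean) ** 2
              for obs in groups.values())
    # Error SS: squared deviations of observations about their own group mean.
    sse = sum((y - sum(obs) / len(obs)) ** 2
              for obs in groups.values() for y in obs)
    mst, mse = sst / (p - 1), sse / (n - p)
    return {"df_trt": p - 1, "df_err": n - p, "SST": sst, "SSE": sse,
            "MST": mst, "MSE": mse, "F": mst / mse}


table = one_way_anova(milk)
for name, value in table.items():
    print(f"{name}: {value:.4f}")
```

The computed F is then compared with the critical value 3.89 quoted in the solution; the sketch does not compute the critical value or the p-value itself.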
SAS CODES FOR ANOVA
Data Anova;
input group$ milk @@;
cards;
food1 25.40 food2 23.40 food3 20.00
food1 26.31 food2 21.80 food3 22.20
food1 24.10 food2 23.50 food3 19.75
food1 23.74 food2 22.75 food3 20.60
food1 25.10 food2 21.60 food3 20.40
;
run;
SAS OUTPUT - ANOVA
Source    DF    Sum of Squares    Mean Square    F Value    Pr > F
Pair-wise comparisons
Needed when the overall F test is rejected
Fisher’s Least Significant Difference (LSD) Test
To compare level 1 and level 2:
$$t = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{MSE\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
Compare this to $t_{\alpha/2}$ (upper-tailed value) or $-t_{\alpha/2}$
(lower-tailed value) from Student’s t-distribution for $\alpha/2$ and
$(n - p)$ degrees of freedom.
SAS CODES FOR multiple
comparisons
proc anova;
class group;
model milk=group;
means group/ lsd bon;
run;
SAS OUTPUT - LSD
t Tests (LSD) for milk
Alpha 0.05
Error Degrees of Freedom 12
Error Mean Square 0.9211
Critical Value of t 2.17881 = $t_{.975,\,12}$
Least Significant Difference 1.3225
A 24.9300 5 food1
B 22.6100 5 food2
C 20.5900 5 food3
SAS OUTPUT - Bonferroni
Bonferroni (Dunn) t Tests for milk
Alpha 0.05
Error Degrees of Freedom 12
Error Mean Square 0.9211
Critical Value of t 2.77947 = $t_{1-0.05/(3\cdot 2),\,12}$
Minimum Significant Difference 1.6871
A 24.9300 5 food1
B 22.6100 5 food2
C 20.5900 5 food3
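The two minimum significant differences reported by SAS differ only in the critical t value plugged into the LSD formula. A small Python sketch, assuming the group means, MSE, and critical values are taken as printed in the SAS output above; the helper name `min_sig_diff` is hypothetical.

```python
import math
from itertools import combinations

# Values copied from the SAS output for the milk example.
means = {"food1": 24.93, "food2": 22.61, "food3": 20.59}
n_per_group = 5
mse = 0.9211
t_lsd = 2.17881   # t(.975, 12), unadjusted
t_bon = 2.77947   # t(1 - .05/(3*2), 12), Bonferroni for 3 comparisons


def min_sig_diff(t_crit, mse, n1, n2):
    """Smallest |mean difference| declared significant for a given critical t."""
    return t_crit * math.sqrt(mse * (1 / n1 + 1 / n2))


lsd = min_sig_diff(t_lsd, mse, n_per_group, n_per_group)
msd_bon = min_sig_diff(t_bon, mse, n_per_group, n_per_group)
print(f"LSD = {lsd:.4f}, Bonferroni MSD = {msd_bon:.4f}")

# Compare every pair of group means against both thresholds.
for a, b in combinations(means, 2):
    diff = abs(means[a] - means[b])
    print(f"{a} vs {b}: diff = {diff:.2f}, "
          f"LSD significant = {diff > lsd}, Bonferroni significant = {diff > msd_bon}")
```

All three pairwise differences exceed both thresholds, which agrees with each food receiving its own grouping letter in both outputs.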
Randomized Block Design
Factor Levels (Treatments): A, B, C, D
Treatments are randomly assigned to experimental units within blocks

Block 1    A  C  D  B
Block 2    C  D  B  A
Block 3    B  A  D  C
   .       .  .  .  .
   .       .  .  .  .
   .       .  .  .  .
Block b    D  C  A  B
Randomized Block F-Test
2. Variables
   One Nominal Independent Variable
   One Nominal Blocking Variable
   One Continuous Dependent Variable
Randomized Block F-Test
Assumptions
1. Normality
   Probability Distribution of each Block-Treatment combination is Normal
2. Homogeneity of Variance
   Probability Distributions of all Block-Treatment combinations have Equal Variances
Randomized Block F-Test
Hypotheses
H0: $\mu_1 = \mu_2 = \dots = \mu_p$
   All Population Means are Equal
   No Treatment Effect
Ha: Not All $\mu_j$ Are Equal
   At Least 1 Pop. Mean is Different
   Treatment Effect
   $\mu_1 \ne \mu_2 \ne \dots \ne \mu_p$ Is wrong
[Figure: densities f(X) under H0 ($\mu_1 = \mu_2 = \mu_3$) and under Ha (e.g., $\mu_1 = \mu_2 \ne \mu_3$)]
The F Ratio for Randomized
Block Designs
$$SS(\text{Total}) = SSE + SSB + SST$$
$$F = \frac{MST}{MSE} = \frac{SST/(p-1)}{SSE/\left[(n-1)-(p-1)-(b-1)\right]} = \frac{SST/(p-1)}{SSE/(n-p-b+1)}$$
Randomized Block F-Test
Test Statistic
1. Test Statistic
   F = MST / MSE
   • MST Is Mean Square for Treatment
   • MSE Is Mean Square for Error
2. Degrees of Freedom
   $\nu_1 = p - 1$
   $\nu_2 = n - b - p + 1$
   • p = # Treatments, b = # Blocks, n = Total Sample Size
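Randomized Block F-Test
Critical Value
If means are equal, $F = MST/MSE \approx 1$.
Only reject for large F!
Reject $H_0$ if $F > F_{\alpha}(p-1,\, n-b-p+1)$; otherwise do not reject $H_0$.
Always One-Tail!
[Figure: F distribution with the rejection region in the upper tail beyond $F_{\alpha}(p-1,\, n-b-p+1)$]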
Randomized Block F-Test
Example
You wish to determine which of four brands of tires has
the longest tread life. You randomly assign one of each
brand (A, B, C, and D) to a tire location on each of 5
cars. At the .05 level, is there a difference in mean tread
life?

         Tire Location
Block    Left Front    Right Front    Left Rear    Right Rear
Car 1    A: 42,000     C: 58,000      B: 38,000    D: 44,000
Car 2    B: 40,000     D: 48,000      A: 39,000    C: 50,000
Car 3    C: 48,000     D: 39,000      B: 36,000    A: 39,000
Car 4    A: 41,000     B: 38,000      D: 42,000    C: 43,000
Car 5    D: 51,000     A: 44,000      C: 52,000    B: 35,000
Randomized Block F-Test
Solution
H0: $\mu_1 = \mu_2 = \mu_3 = \mu_4$
Ha: Not All Equal
$\alpha = .05$
$\nu_1 = 3$, $\nu_2 = 12$
Critical Value(s): $F_{.05}(3, 12) = 3.49$
Test Statistic: F = 11.9933
Decision: Reject at $\alpha = .05$
Conclusion: There Is Evidence Pop. Means Are Different
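The test statistic for the tire example can be checked by partitioning the total sum of squares into treatment, block, and error parts. This is a minimal Python sketch under the assumption of exactly one observation per block-treatment combination; the name `randomized_block_anova` and the nested-dictionary layout are illustrative choices, not part of the course material.

```python
# Tread life by car (block) and brand (treatment), from the example table.
tread = {
    "Car1": {"A": 42000, "C": 58000, "B": 38000, "D": 44000},
    "Car2": {"B": 40000, "D": 48000, "A": 39000, "C": 50000},
    "Car3": {"C": 48000, "D": 39000, "B": 36000, "A": 39000},
    "Car4": {"A": 41000, "B": 38000, "D": 42000, "C": 43000},
    "Car5": {"D": 51000, "A": 44000, "C": 52000, "B": 35000},
}


def randomized_block_anova(data):
    """F test for treatments with one observation per block-treatment cell."""
    blocks = list(data)
    trts = sorted(next(iter(data.values())))
    b, p = len(blocks), len(trts)
    n = b * p
    grand = sum(data[blk][t] for blk in blocks for t in trts) / n
    trt_means = {t: sum(data[blk][t] for blk in blocks) / b for t in trts}
    blk_means = {blk: sum(data[blk].values()) / p for blk in blocks}
    sst = b * sum((m - grand) ** 2 for m in trt_means.values())  # treatments
    ssb = p * sum((m - grand) ** 2 for m in blk_means.values())  # blocks
    ss_total = sum((data[blk][t] - grand) ** 2 for blk in blocks for t in trts)
    sse = ss_total - sst - ssb                                   # error
    df_trt, df_err = p - 1, n - p - b + 1
    mst, mse = sst / df_trt, sse / df_err
    return {"trt_means": trt_means, "SST": sst, "SSB": ssb, "SSE": sse,
            "df_trt": df_trt, "df_err": df_err, "MSE": mse, "F": mst / mse}


result = randomized_block_anova(tread)
print(result["trt_means"])
print(f"F = {result['F']:.4f} on ({result['df_trt']}, {result['df_err']}) df")
```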
SAS CODES FOR ANOVA
data block;
input Block$ trt$ resp @@;
cards;
Car1 A: 42000 Car1 C: 58000 Car1 B: 38000 Car1 D: 44000
Car2 B: 40000 Car2 D: 48000 Car2 A: 39000 Car2 C: 50000
Car3 C: 48000 Car3 D: 39000 Car3 B: 36000 Car3 A: 39000
Car4 A: 41000 Car4 B: 38000 Car4 D: 42000 Car4 C: 43000
Car5 D: 51000 Car5 A: 44000 Car5 C: 52000 Car5 B: 35000
;
run;
proc anova;
class trt block;
model resp=trt block;
Means trt /lsd bon;
run;
SAS OUTPUT - ANOVA
Dependent Variable: resp
Source    DF    Sum of Squares    Mean Square    F Value    Pr > F
SAS OUTPUT - LSD
A 50200 5 C:
B 44800 5 D:
B
C B 41000 5 A:
C
C 37400 5 B:
SAS OUTPUT - Bonferroni
Means with the same letter are not significantly different.
A 50200 5 C:
A
B A 44800 5 D:
B
B C 41000 5 A:
C
C 37400 5 B:
Factorial Experiments
Factorial Design
1. Experimental Units (Subjects) Are Assigned Randomly to Treatments
   Subjects are Assumed Homogeneous
2. Two or More Factors or Independent Variables
   Each Has 2 or More Treatments (Levels)
3. Analyzed by Two-Way ANOVA
Advantages
of Factorial Designs
1. Saves Time & Effort
   e.g., Could Otherwise Need Separate Completely Randomized Designs for Each Variable
2. Controls Confounding Effects by Putting Other Variables into Model
3. Can Explore Interaction Between Variables
Two-Way ANOVA
Assumptions
1. Normality
   Populations are Normally Distributed
2. Homogeneity of Variance
   Populations have Equal Variances
3. Independence of Errors
   Independent Random Samples are Drawn
Two-Way ANOVA
Data Table
Factor A    Factor B
            1       2       ...    b
1           Y111    Y121    ...    Y1b1
            Y112    Y122    ...    Y1b2
2           Y211    Y221    ...    Y2b1
            Y212    Y222    ...    Y2b2
:           :       :       :      :
a           Ya11    Ya21    ...    Yab1

Yijk: Observation k at Level i of Factor A and Level j of Factor B
Two-Way ANOVA
Total Variation Partitioning
Total Variation, SS(Total), is partitioned into:
   Variation Due to Treatment A: SSA
   Variation Due to Treatment B: SSB
   Variation Due to Interaction: SS(AB)
   Variation Due to Random Sampling: SSE
Two-Way ANOVA
Summary Table
Source of           Degrees of        Sum of       Mean      F
Variation           Freedom           Squares      Square
A (Row)             a - 1             SS(A)        MS(A)     MS(A)/MSE
B (Column)          b - 1             SS(B)        MS(B)     MS(B)/MSE
AB (Interaction)    (a - 1)(b - 1)    SS(AB)       MS(AB)    MS(AB)/MSE
Error               n - ab            SSE          MSE
Total               n - 1             SS(Total)
The Total row is the same as in other designs.
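The rows of the two-way summary table can be illustrated numerically. The Python sketch below uses a small made-up balanced data set (2 levels of A, 2 levels of B, 2 replicates per cell); the numbers, level names, and the function name `two_way_anova` are hypothetical and are not the blood pressure data used later. It assumes equal replication in every cell.

```python
# Hypothetical balanced layout: cells[(level of A, level of B)] -> replicates.
cells = {
    ("a1", "b1"): [1.0, 3.0], ("a1", "b2"): [5.0, 7.0],
    ("a2", "b1"): [4.0, 6.0], ("a2", "b2"): [4.0, 6.0],
}


def mean(values):
    return sum(values) / len(values)


def two_way_anova(cells):
    """Sums of squares and F ratios for a balanced two-factor design."""
    a_levels = sorted({i for i, _ in cells})
    b_levels = sorted({j for _, j in cells})
    a, b = len(a_levels), len(b_levels)
    r = len(next(iter(cells.values())))  # replicates per cell (assumed equal)
    n = a * b * r
    grand = mean([y for obs in cells.values() for y in obs])
    cell_mean = {key: mean(obs) for key, obs in cells.items()}
    a_mean = {i: mean([cell_mean[(i, j)] for j in b_levels]) for i in a_levels}
    b_mean = {j: mean([cell_mean[(i, j)] for i in a_levels]) for j in b_levels}
    ssa = b * r * sum((a_mean[i] - grand) ** 2 for i in a_levels)
    ssb = a * r * sum((b_mean[j] - grand) ** 2 for j in b_levels)
    # Interaction: what is left of each cell mean after both main effects.
    ssab = r * sum((cell_mean[(i, j)] - a_mean[i] - b_mean[j] + grand) ** 2
                   for i in a_levels for j in b_levels)
    sse = sum((y - cell_mean[key]) ** 2 for key, obs in cells.items() for y in obs)
    mse = sse / (n - a * b)
    return {"SSA": ssa, "SSB": ssb, "SSAB": ssab, "SSE": sse,
            "F_A": ssa / (a - 1) / mse,
            "F_B": ssb / (b - 1) / mse,
            "F_AB": ssab / ((a - 1) * (b - 1)) / mse}


anova = two_way_anova(cells)
for name, value in anova.items():
    print(f"{name} = {value:.2f}")
```

In this toy layout the four sums of squares add up to the total sum of squares about the grand mean, which is the partition the table expresses.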
Interaction
1. Occurs When Effects of One Factor Vary According to Levels of Other Factor
2. When Significant, Interpretation of Main Effects (A & B) Is Complicated
3. Can Be Detected
   In Data Table, Pattern of Cell Means in One Row Differs From Another Row
   In Graph of Cell Means, Lines Are Not Parallel (e.g., Lines Cross)
Graphs of Interaction
[Figure: interaction plots of cell means by diet (sv, lv, nor); the surviving line label is female]
Two-Way ANOVA F-Test
Example
Effect of diet (sv: strict vegetarians, lv: lactovegetarians, nor: normal)
and gender (female, male) on systolic blood pressure.
SAS CODES FOR ANOVA
data factorial;
input dietary$ sex$ sbp;
cards;
sv male 109.9
sv male 101.9
sv male 100.9
sv male 119.9
sv male 104.9
sv male 189.9
sv female 102.6
sv female 99
sv female 83.6
sv female 99.6
sv female 102.6
sv female 112.6
lv male 116.5
lv male 118.5
lv male 119.5
lv male 110.5
lv male 115.5
lv male 105.2
nor male 128.3
nor male 129.3
nor male 126.3
nor male 127.3
nor male 126.3
nor male 125.3
nor female 119.1
nor female 119.2
nor female 115.6
nor female 119.9
nor female 119.8
nor female 119.7
;
run;
SAS CODES FOR ANOVA
proc glm;
class dietary sex;
model sbp=dietary sex dietary*sex;
run;
proc glm;
class dietary sex;
model sbp=dietary sex;
run;
SAS OUTPUT - ANOVA
Dependent Variable: sbp
Sum of
Source DF Squares Mean Square F Value Pr > F
Model 5 2627.399667 525.479933 1.96 0.1215
Error 24 6435.215000 268.133958
Corrected Total 29 9062.614667
Linear Contrast
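A linear contrast is a linear combination of the population means:
$$L = \sum_{j=1}^{k} c_j \mu_j \quad \text{with} \quad \sum_{j=1}^{k} c_j = 0$$
To test $H_0: L = \sum_{j=1}^{k} c_j \mu_j = 0$, construct
$$t = \frac{\hat{L}}{\sqrt{s^2 \sum_{j=1}^{k} \frac{c_j^2}{n_j}}}$$
where $\hat{L} = \sum_{j=1}^{k} c_j \bar{y}_j$ is the contrast estimated from the sample means
and $s^2$ is the pooled within-group variance (MSE).
Compare with the critical value $t_{1-\alpha/2,\, n-k}$.
Reject $H_0$ if $|t| \ge t_{1-\alpha/2,\, n-k}$.
SAS uses the contrast statement to perform an F-test with df (1, n-k),
or the estimate statement to perform a t-test with df (n-k).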
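As a worked illustration of the contrast t statistic, the Python sketch below reuses the milk example (group means, MSE, and critical t as printed in the earlier SAS output). The contrast itself, food1 against the average of food2 and food3, is a hypothetical choice made only for this illustration, and `contrast_t` is an assumed helper name.

```python
import math

# Milk example values as printed in the earlier SAS output.
means = [24.93, 22.61, 20.59]   # food1, food2, food3
sizes = [5, 5, 5]
mse = 0.9211                    # s^2, on n - k = 12 degrees of freedom
t_crit = 2.17881                # t(.975, 12)

# Hypothetical contrast: food1 versus the average of food2 and food3.
coef = [1.0, -0.5, -0.5]


def contrast_t(coef, means, sizes, s2):
    """Estimate, standard error, and t statistic for a linear contrast."""
    if abs(sum(coef)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    estimate = sum(c * m for c, m in zip(coef, means))
    se = math.sqrt(s2 * sum(c * c / n for c, n in zip(coef, sizes)))
    return estimate, se, estimate / se


L_hat, se, t = contrast_t(coef, means, sizes, mse)
print(f"L_hat = {L_hat:.2f}, se = {se:.4f}, t = {t:.3f}")
print(f"reject H0: {abs(t) >= t_crit}")
# For a single contrast, the F statistic with 1 numerator df equals t squared.
print(f"equivalent F = {t * t:.2f}")
```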
T-test for Linear Contrast (Scheffe)
Construct multiple contrasts involving k group
means, searching for a significant contrast.
To test $H_0: L = \sum_{j=1}^{k} c_j \mu_j = 0$, construct
$$t = \frac{\hat{L}}{\sqrt{s^2 \sum_{j=1}^{k} \frac{c_j^2}{n_j}}}$$
Compare with the critical value
$$a = \sqrt{(k-1)\, F_{k-1,\, n-k,\, 1-\alpha}}$$
Reject $H_0$ if $|t| \ge a$.
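To give a sense of how much stricter the Scheffe threshold is, the short Python sketch below evaluates the critical value for the milk example with k = 3 groups. It assumes the F critical value 3.89 for (2, 12) degrees of freedom quoted in the one-way solution; `scheffe_critical` is a hypothetical helper name.

```python
import math


def scheffe_critical(k, f_crit):
    """Scheffe critical value a = sqrt((k - 1) * F) for contrasts among k means."""
    return math.sqrt((k - 1) * f_crit)


# Milk example: k = 3 groups, F(.95; 2, 12) = 3.89 from the one-way solution.
a = scheffe_critical(3, 3.89)
t_unadjusted = 2.17881  # t(.975, 12) from the LSD output, for comparison
print(f"Scheffe critical value a = {a:.4f}")
print(f"larger than the unadjusted t critical value: {a > t_unadjusted}")
```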
SAS Code for contrast testing
proc glm;
class trt block;
model resp=trt block;
Means trt /lsd bon scheffe;
contrast 'A - B = 0' trt 1 -1 0 0 ;
contrast 'A - B/2 - C/2 = 0' trt 1 -.5 -.5 0 ;
contrast 'A - B/3 - C/3 -D/3 = 0' trt 3 -1 -1 -1 ;
contrast 'A + B - C - D = 0' trt 1 1 -1 -1 ;
lsmeans trt/stderr pdiff;
lsmeans trt/stderr pdiff adjust=scheffe; /* Scheffe's test */
lsmeans trt/stderr pdiff adjust=bon; /* Bonferroni's test */
estimate 'A - B' trt 1 -1 0 0;
run;
Regression representation of Anova
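One-way anova:
$$y_{ij} = \mu_i + e_{ij} = \mu + \alpha_i + e_{ij}, \qquad \sum_{i=1}^{p} \alpha_i = 0$$
Two-way anova:
$$y_{ijk} = \mu_{ij} + e_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + e_{ijk}$$
$$\sum_{i=1}^{a} \alpha_i = 0, \quad \sum_{j=1}^{b} \beta_j = 0, \quad \sum_{j=1}^{b} \gamma_{ij} = 0 \text{ for all } i, \quad \sum_{i=1}^{a} \gamma_{ij} = 0 \text{ for all } j$$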
One-way anova: Dummy variables of factor
with p levels
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_{p-1} x_{p-1} + e$$
where $x_i = 1$ if the observation is at level $i$ and $x_i = 0$ otherwise.
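The dummy-variable form of the one-way model can be checked on the milk example. In the Python sketch below, food3 is taken as the reference level (an assumption made for illustration), so the intercept should be the food3 mean and each slope the difference between a group mean and the food3 mean. Rather than calling a regression routine, the sketch verifies that these coefficients satisfy the least-squares normal equations (residuals orthogonal to every column of the design matrix).

```python
milk = {
    "food1": [25.40, 26.31, 24.10, 23.74, 25.10],
    "food2": [23.40, 21.80, 23.50, 22.75, 21.60],
    "food3": [20.00, 22.20, 19.75, 20.60, 20.40],
}
levels = ["food1", "food2", "food3"]  # last level is the assumed reference

# Design rows (1, x1, x2): x_i = 1 if the cow received level i, else 0.
rows, y = [], []
for level, obs in milk.items():
    for value in obs:
        rows.append([1.0] + [1.0 if level == other else 0.0 for other in levels[:-1]])
        y.append(value)

group_mean = {level: sum(obs) / len(obs) for level, obs in milk.items()}
beta = [
    group_mean["food3"],                        # b0: reference-level mean
    group_mean["food1"] - group_mean["food3"],  # b1: food1 minus reference
    group_mean["food2"] - group_mean["food3"],  # b2: food2 minus reference
]

fitted = [sum(b * x for b, x in zip(beta, row)) for row in rows]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
# Normal equations: each design column must be orthogonal to the residuals.
normal_eq = [sum(row[j] * r for row, r in zip(rows, residuals)) for j in range(3)]

print("beta =", [round(b, 2) for b in beta])
print("normal equations ~ 0:", all(abs(v) < 1e-9 for v in normal_eq))
```

Because the fitted values equal the group means, the residual sum of squares of this regression is the SSE of the one-way ANOVA.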
Conclusion: should be able to
1. Recognize the applications that use ANOVA
Key terms: Between-Sample Variation; Completely Randomized Design; Experiment-Wide Error Rate; Factor; Levels; One-Way Analysis of Variance; Total Variation; Treatment; Within-Sample Variation