lecture12
groups
Groups are independent
Hypotheses of One-Way
ANOVA
$H_0: \mu_1 = \mu_2 = \mu_3$
http://www.econtools.com/jevons/java/Graphics2D/FDist.html
The F-distribution
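A ratio of variances follows an F-distribution:
$$\frac{\sigma^2_{\text{between}}}{\sigma^2_{\text{within}}} \sim F_{n,m}$$
The F-test tests the hypothesis that two variances are equal.
F will be close to 1 if sample variances are equal.
$$H_0: \sigma^2_{\text{between}} = \sigma^2_{\text{within}}$$
$$H_a: \sigma^2_{\text{between}} \neq \sigma^2_{\text{within}}$$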
How to calculate ANOVAs
by hand…
n=10 obs./group, k=4 groups

Treatment 1    Treatment 2    Treatment 3    Treatment 4
y1,1           y2,1           y3,1           y4,1
y1,2           y2,2           y3,2           y4,2
y1,3           y2,3           y3,3           y4,3
y1,4           y2,4           y3,4           y4,4
y1,5           y2,5           y3,5           y4,5
y1,6           y2,6           y3,6           y4,6
y1,7           y2,7           y3,7           y4,7
y1,8           y2,8           y3,8           y4,8
y1,9           y2,9           y3,9           y4,9
y1,10          y2,10          y3,10          y4,10
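The group means:
$$\bar{y}_{1\cdot} = \frac{\sum_{j=1}^{10} y_{1j}}{10} \qquad \bar{y}_{2\cdot} = \frac{\sum_{j=1}^{10} y_{2j}}{10} \qquad \bar{y}_{3\cdot} = \frac{\sum_{j=1}^{10} y_{3j}}{10} \qquad \bar{y}_{4\cdot} = \frac{\sum_{j=1}^{10} y_{4j}}{10}$$

The (within) group variances:
$$\frac{\sum_{j=1}^{10} (y_{1j} - \bar{y}_{1\cdot})^2}{10-1} \qquad \frac{\sum_{j=1}^{10} (y_{2j} - \bar{y}_{2\cdot})^2}{10-1} \qquad \frac{\sum_{j=1}^{10} (y_{3j} - \bar{y}_{3\cdot})^2}{10-1} \qquad \frac{\sum_{j=1}^{10} (y_{4j} - \bar{y}_{4\cdot})^2}{10-1}$$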
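As a rough illustration of the group means and (within) group variances, the sketch below uses only Python's standard library. The numbers in `groups` are invented for this example (they are not the lecture's data), and the variable names are assumptions.

```python
from statistics import mean, variance

# Hypothetical data: k = 4 treatment groups, n = 10 observations per group.
# These values are invented for illustration only.
groups = {
    "Treatment 1": [12, 15, 11, 14, 13, 16, 12, 15, 14, 13],
    "Treatment 2": [18, 17, 19, 16, 20, 18, 17, 19, 18, 18],
    "Treatment 3": [14, 13, 15, 16, 14, 15, 13, 14, 16, 15],
    "Treatment 4": [10, 12, 11, 9, 13, 11, 10, 12, 11, 11],
}

# Group mean: sum of the 10 observations in the group divided by 10.
group_means = {name: mean(ys) for name, ys in groups.items()}

# Sample variance: squared deviations from the group mean, divided by (10 - 1).
group_vars = {name: variance(ys) for name, ys in groups.items()}

for name in groups:
    print(f"{name}: mean = {group_means[name]:.2f}, variance = {group_vars[name]:.2f}")
```

`statistics.variance` divides by n - 1, which matches the 10 - 1 denominator in the formulas above.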
Sum of Squares Within
(SSW), or Sum of Squares
Error (SSE)
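The (within) group variances:
$$\frac{\sum_{j=1}^{10} (y_{1j} - \bar{y}_{1\cdot})^2}{10-1} \qquad \frac{\sum_{j=1}^{10} (y_{2j} - \bar{y}_{2\cdot})^2}{10-1} \qquad \frac{\sum_{j=1}^{10} (y_{3j} - \bar{y}_{3\cdot})^2}{10-1} \qquad \frac{\sum_{j=1}^{10} (y_{4j} - \bar{y}_{4\cdot})^2}{10-1}$$

Sum of Squares Within (SSW) (or SSE, for chance error):
$$\sum_{j=1}^{10} (y_{1j} - \bar{y}_{1\cdot})^2 + \sum_{j=1}^{10} (y_{2j} - \bar{y}_{2\cdot})^2 + \sum_{j=1}^{10} (y_{3j} - \bar{y}_{3\cdot})^2 + \sum_{j=1}^{10} (y_{4j} - \bar{y}_{4\cdot})^2 = \sum_{i=1}^{4} \sum_{j=1}^{10} (y_{ij} - \bar{y}_{i\cdot})^2$$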
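The link between the group variances and SSW can be checked numerically. The sketch below computes SSW two ways: directly from the double sum, and by multiplying each group's variance by (n - 1) to recover its numerator. The data are invented for illustration and are not the lecture's data.

```python
from statistics import mean, variance

# Hypothetical data: 4 groups of 10 observations (invented for illustration).
groups = [
    [12, 15, 11, 14, 13, 16, 12, 15, 14, 13],
    [18, 17, 19, 16, 20, 18, 17, 19, 18, 18],
    [14, 13, 15, 16, 14, 15, 13, 14, 16, 15],
    [10, 12, 11, 9, 13, 11, 10, 12, 11, 11],
]

# Direct definition: sum over groups i and observations j of (y_ij - mean_i)^2.
ssw_direct = 0.0
for ys in groups:
    group_mean = mean(ys)
    ssw_direct += sum((y - group_mean) ** 2 for y in ys)

# Equivalent route: (n - 1) times each group's sample variance is that
# group's sum of squared deviations, so adding them up gives SSW again.
ssw_from_variances = sum((len(ys) - 1) * variance(ys) for ys in groups)

print(f"SSW (direct)         = {ssw_direct:.2f}")
print(f"SSW (from variances) = {ssw_from_variances:.2f}")
```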
Sum of Squares Between (SSB),
or Sum of Squares Regression
(SSR)
Overall mean of all 40 observations (“grand mean”):
$$\bar{y}_{\cdot\cdot} = \frac{\sum_{i=1}^{4} \sum_{j=1}^{10} y_{ij}}{40}$$

Sum of Squares Between (SSB). Variability of the group means compared to the grand mean (the variability due to the treatment):
$$SSB = 10 \times \sum_{i=1}^{4} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2$$
Total Sum of Squares
(SST)
Total sum of squares (TSS). Squared difference of every observation from the overall mean. (numerator of variance of Y!)
$$TSS = \sum_{i=1}^{4} \sum_{j=1}^{10} (y_{ij} - \bar{y}_{\cdot\cdot})^2$$
Partitioning of Variance
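$$\sum_{i=1}^{4} \sum_{j=1}^{10} (y_{ij} - \bar{y}_{i\cdot})^2 + 10 \times \sum_{i=1}^{4} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 = \sum_{i=1}^{4} \sum_{j=1}^{10} (y_{ij} - \bar{y}_{\cdot\cdot})^2$$
SSW + SSB = TSS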
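The partition can be verified numerically. The following is a minimal sketch, assuming invented data for 4 groups of 10 observations (not the lecture's data); it computes SSW, SSB, and TSS separately and checks that the first two add up to the third.

```python
import math
from statistics import mean

# Hypothetical data: 4 groups of 10 observations (invented for illustration).
groups = [
    [12, 15, 11, 14, 13, 16, 12, 15, 14, 13],
    [18, 17, 19, 16, 20, 18, 17, 19, 18, 18],
    [14, 13, 15, 16, 14, 15, 13, 14, 16, 15],
    [10, 12, 11, 9, 13, 11, 10, 12, 11, 11],
]

all_obs = [y for ys in groups for y in ys]
grand_mean = mean(all_obs)

# SSW: each observation compared with its own group mean.
ssw = sum((y - mean(ys)) ** 2 for ys in groups for y in ys)

# SSB: each group mean compared with the grand mean, weighted by group size
# (with 10 observations per group this is 10 x the sum over groups).
ssb = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups)

# TSS: each observation compared with the grand mean.
tss = sum((y - grand_mean) ** 2 for y in all_obs)

print(f"SSW = {ssw:.2f}, SSB = {ssb:.2f}, SSW + SSB = {ssw + ssb:.2f}, TSS = {tss:.2f}")
```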
ANOVA Table
Source of variation    d.f.    Sum of squares    Mean Sum of Squares    F-statistic    p-value
Total                  39      2257.1
Step 3) Fill in the ANOVA
table
Source of variation    d.f.    Sum of squares    Mean Sum of Squares    F-statistic    p-value
Total                  39      2257.1
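Filling in the remaining rows is mechanical once SSB and SSW are known. The sketch below assumes the standard one-way layout (k - 1 degrees of freedom between groups, N - k within, N - 1 in total). The function name `anova_table` and the sums of squares passed to it are hypothetical and are not the values from this example; only the total of 39 d.f. for 40 observations matches the table above. The p-value is left out because it needs the F-distribution's tail probability, which Python's standard library does not provide.

```python
def anova_table(ssb, ssw, k, n_total):
    """Return the rows of a one-way ANOVA table as a dict of dicts."""
    df_between = k - 1          # number of groups minus 1
    df_within = n_total - k     # observations minus number of groups
    ms_between = ssb / df_between
    ms_within = ssw / df_within
    return {
        "Between": {"df": df_between, "ss": ssb, "ms": ms_between,
                    "F": ms_between / ms_within},
        "Within": {"df": df_within, "ss": ssw, "ms": ms_within},
        "Total": {"df": n_total - 1, "ss": ssb + ssw},
    }

# Hypothetical sums of squares for k = 4 groups and 40 observations.
table = anova_table(ssb=252.5, ssw=57.0, k=4, n_total=40)

for source, row in table.items():
    cells = ", ".join(f"{name} = {value:.2f}" for name, value in row.items())
    print(f"{source}: {cells}")
```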
INTERPRETATION of ANOVA:
How much of the variance in height is explained by treatment
group?
Coefficient of
Determination
$$R^2 = \frac{SSB}{SSB + SSE} = \frac{SSB}{SST}$$
The amount of variation in the outcome variable (dependent
variable) that is explained by the predictor (independent
variable).
Beyond one-way ANOVA
Often, you may want to test more
than 1 treatment. ANOVA can
accommodate more than 1
treatment or factor, so long as they
are independent. Again, the
variation partitions beautifully!
Total 74 489,179
**$R^2 = 98113/489179 = 20\%$
School explains 20% of the variance in lunchtime calcium
intake in these kids.
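The coefficient of determination is a single division. The short sketch below redoes it with the two sums of squares quoted above; the function name `r_squared` is an assumption made for this example.

```python
def r_squared(ssb, sst):
    """Share of the total sum of squares explained by the grouping factor."""
    return ssb / sst

# Between-school sum of squares and total sum of squares quoted above.
r2 = r_squared(ssb=98113, sst=489179)
unexplained = 1 - r2  # share left to within-group (error) variation

print(f"R^2 = {r2:.3f}, unexplained share = {unexplained:.3f}")
```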
ANOVA summary
A statistically significant ANOVA (F-
test) only tells you that at least two of
the groups differ, but not which ones
differ.
• Scheffe (adjusts p)
Drug 1 vs. drug …    2 3 4 5 6 7 8 9 10
Arrange p-values:
6 9 7 10 5 2 8 4 3
Conformed?          2     4     6     8     10
Observed    Yes     20    50    75    60    30
            No      80    50    25    40    70
Expected    Yes     47    47    47    47    47
            No      53    53    53    53    53
Do observed and expected differ
more than expected due to
chance?
Chi-Square test
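$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$
$$\chi^2_4 = \frac{(20-47)^2}{47} + \frac{(50-47)^2}{47} + \frac{(75-47)^2}{47} + \frac{(60-47)^2}{47} + \frac{(30-47)^2}{47} + \frac{(80-53)^2}{53} + \frac{(50-53)^2}{53} + \frac{(25-53)^2}{53} + \frac{(40-53)^2}{53} + \frac{(70-53)^2}{53} = 85$$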
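The same sum can be evaluated in a few lines. The sketch below derives the expected counts from the row and column totals of the observed table and then adds up the ten terms; the variable names are assumptions. Evaluated this way the sum comes to roughly 79.5, somewhat below the 85 quoted above, and either value is far larger than the 4 degrees of freedom.

```python
# Observed counts: rows are Yes / No, columns are the five groups (2, 4, 6, 8, 10).
observed = [
    [20, 50, 75, 60, 30],  # conformed: Yes
    [80, 50, 25, 40, 70],  # conformed: No
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total x column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom: (rows - 1) x (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(f"expected Yes = {expected[0][0]:.0f}, expected No = {expected[1][0]:.0f}")
print(f"chi-square = {chi_square:.1f} on {df} df")
```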
The expected value and variance of a chi-square:
E(x)=df
Var(x)=2(df)
Rule of thumb: if the chi-square statistic is much greater than its degrees of freedom,
this indicates statistical significance. Here 85>>4.
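The rule of thumb can be made concrete by measuring how far the statistic lies above the chi-square mean (df) in units of its standard deviation, sqrt(2·df). The sketch below also computes an upper-tail probability using a closed form that holds only for even degrees of freedom, which is enough for df = 4. The function name `chi2_sf_even_df` is an assumption; a statistics library would normally be used for general df.

```python
import math

def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with a positive, even df.

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)^i / i!,
    which is valid for even degrees of freedom only.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only applies to positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

df = 4
statistic = 85  # value quoted above

# E(x) = df and Var(x) = 2(df), so the standard deviation is sqrt(2 * df).
sds_above_mean = (statistic - df) / math.sqrt(2 * df)
p_value = chi2_sf_even_df(statistic, df)

print(f"{sds_above_mean:.1f} standard deviations above the mean, p = {p_value:.2e}")
```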
Chi-square example: recall data…
8 435 453
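$$\hat{p}_{tumor/cellphone} = \frac{5}{352} = .014; \qquad \hat{p}_{tumor/nophone} = \frac{3}{91} = .033$$
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\frac{(\bar{p})(1-\bar{p})}{n_1} + \frac{(\bar{p})(1-\bar{p})}{n_2}}}; \qquad \bar{p} = \frac{8}{453} = .018$$
$$Z = \frac{(.014 - .033)}{\sqrt{\frac{(.018)(.982)}{352} + \frac{(.018)(.982)}{91}}} = \frac{-.019}{.0156} = -1.22$$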
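The two-proportion Z-test above can be sketched in code as follows. The function name `two_proportion_z` is an assumption. The code works from the raw counts (5 of 352 owners, 3 of 91 non-owners) without rounding intermediate values, so it gives Z of about -1.20 rather than the -1.22 obtained by hand from rounded figures.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Z statistic and two-sided p-value for H0: p1 = p2, using a pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) / n1 + pooled * (1 - pooled) / n2)
    z = (p1 - p2) / se
    # Two-sided normal tail area: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided

# Brain tumors among cell phone owners (5 of 352) and non-owners (3 of 91).
z, p_value = two_proportion_z(5, 352, 3, 91)
print(f"Z = {z:.2f}, two-sided p = {p_value:.2f}")
```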
Same data, but use Chi-square
test
                Brain tumor    No brain tumor
Own             5              347               352
Don’t own       3              88                91
                8              435               453

$p_{tumor} = \frac{8}{453} = .018$; $p_{cellphone} = \frac{352}{453} = .777$
$p_{tumor} \times p_{cellphone} = .018 * .777 = .014$
Expected in cell a = .014 * 453 = 6.3; 1.7 in cell c; 345.7 in cell b; 89.3 in cell d
Expected value in cell c = 1.7, so technically should use a Fisher’s exact here! Next term…
(R-1)*(C-1) = 1*1 = 1 df
$$\chi^2_1 = \frac{(8 - 6.3)^2}{6.3} + \frac{(3 - 1.7)^2}{1.7} + \frac{(89.3 - 88)^2}{89.3} + \frac{(347 - 345.7)^2}{345.7} = 1.48$$
NS
note: $Z^2 = 1.22^2 = 1.48$
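The relationship between the two tests can be checked numerically. The sketch below is computed from the four cell counts themselves (observed 5 in cell a, with margins 352 + 91 = 443) and keeps the expected values unrounded, so its result (about 1.44) is close to, but not identical with, the hand calculation. It also confirms that the chi-square statistic equals the square of the pooled two-proportion Z. Variable names are assumptions, and the `erfc` shortcut for the p-value applies to 1 degree of freedom only.

```python
import math

# Rows: own / don't own a cell phone; columns: brain tumor / no brain tumor.
table = [[5, 347],
         [3, 88]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Expected counts under independence, then the chi-square statistic.
expected = [[r * c / n for c in col_totals] for r in row_totals]
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(table, expected)
    for o, e in zip(obs_row, exp_row)
)

# For 1 df, P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2))

# Pooled two-proportion Z on the same counts, for comparison with chi-square.
p1, p2 = table[0][0] / row_totals[0], table[1][0] / row_totals[1]
pooled = col_totals[0] / n
z = (p1 - p2) / math.sqrt(pooled * (1 - pooled) * (1 / row_totals[0] + 1 / row_totals[1]))

smallest_expected = min(e for row in expected for e in row)
print(f"chi-square = {chi_square:.2f}, Z^2 = {z ** 2:.2f}, p = {p_value:.2f}")
print(f"smallest expected count = {smallest_expected:.2f}")
```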
Caveat
**When the sample size is very
small in any cell (expected
value<5), Fisher’s exact test is
used as an alternative to the chi-
square test.
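For a 2×2 table, Fisher's exact test can be sketched with the hypergeometric distribution: hold the row and column totals fixed and add up the probabilities of the tables that are at least as extreme as the one observed. The function name `fisher_exact_2x2` is an assumption, and the two-sided rule used here (sum every table no more probable than the observed one) is one common convention rather than the only one.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Return (two-sided p, lower-tail p) for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    # Hypergeometric probability of seeing x in cell a with all margins fixed.
    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest, highest = max(0, col1 - row2), min(col1, row1)
    probs = {x: prob(x) for x in range(lowest, highest + 1)}

    p_observed = probs[a]
    # Two-sided: every table at most as probable as the observed one
    # (small tolerance guards against floating-point ties).
    p_two_sided = sum(p for p in probs.values() if p <= p_observed * (1 + 1e-7))
    # Lower tail: cell a at or below its observed value.
    p_lower = sum(p for x, p in probs.items() if x <= a)
    return p_two_sided, p_lower

# Cell phone / brain tumor table: 5, 347 / 3, 88.
p_two_sided, p_lower = fisher_exact_2x2(5, 347, 3, 88)
print(f"two-sided p = {p_two_sided:.3f}, lower-tail p = {p_lower:.3f}")
```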
Binary or categorical
outcomes (proportions)
Outcome Variable    Are the observations correlated?    Alternative to the chi-square test if sparse cells:
                    independent    correlated