Notes Class5
Treatment   Coagulation times           Average
A           62 60 63 59                 61
B           63 67 71 64 65 66           66
C           68 66 71 67 68 68           68
D           56 62 60 61 63 64 63 59     61
Overall                                 64
Blood coagulation time
[Figure: plot of coagulation time (axis 56 to 72), including a "Combined" panel.]
Notation
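$k$ = number of treatment groups; $n_t$ = number of observations in group $t$; $N = \sum_t n_t$.
$Y_{ti}$ = response $i$ in group $t$; $\bar Y_{t\cdot}$ = average in group $t$; $\bar Y$ = overall average.
$$\sum_t \sum_i (Y_{ti} - \bar Y)^2 = \sum_t n_t (\bar Y_{t\cdot} - \bar Y)^2 + \sum_t \sum_i (Y_{ti} - \bar Y_{t\cdot})^2$$
$$S_T = S_B + S_W$$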
We assume that the data are random samples from four normal
distributions having the same variance $\sigma^2$, differing only (if at all)
in their means.
We can estimate the variance $\sigma^2$ within each treatment $t$, using the
sum of squared differences from the group average,
$S_t = \sum_i (Y_{ti} - \bar Y_{t\cdot})^2$.
Then
$$E(S_t) = (n_t - 1) \times \sigma^2.$$
Within group variability
$$S_W = S_1 + \cdots + S_k = \sum_t \sum_i (Y_{ti} - \bar Y_{t\cdot})^2$$
$$M_W = \frac{S_1 + \cdots + S_k}{(n_1 - 1) + \cdots + (n_k - 1)} = \frac{S_W}{N - k} = \frac{\sum_t \sum_i (Y_{ti} - \bar Y_{t\cdot})^2}{N - k}$$
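Between group variability
$$S_B = \sum_{t=1}^{k} n_t (\bar Y_{t\cdot} - \bar Y)^2$$
$$M_B = \frac{S_B}{k - 1} = \frac{\sum_t n_t (\bar Y_{t\cdot} - \bar Y)^2}{k - 1}$$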
The following are facts that we will exploit later for some formal
hypothesis testing:
The F distribution
If $Z_1 \sim \chi^2_m$ and $Z_2 \sim \chi^2_n$ are independent,
−→ Then $\dfrac{Z_1/m}{Z_2/n} \sim F_{m,n}$
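The definition can be checked numerically. The following is a minimal sketch, not part of the original notes: it draws chi-square variables with Python's standard `random` module (a chi-square with k df is a gamma with shape k/2 and scale 2) and compares the simulated mean of the ratio with the known mean of an F(m, n) distribution, n/(n − 2). The function name `simulate_f`, the seed, and the number of draws are illustrative choices.

```python
import random

def simulate_f(m, n, draws=20000, seed=1):
    """Draw values of (Z1/m)/(Z2/n) with Z1 ~ chi2_m and Z2 ~ chi2_n independent."""
    rng = random.Random(seed)
    values = []
    for _ in range(draws):
        # A chi-square with k degrees of freedom is Gamma(shape=k/2, scale=2).
        z1 = rng.gammavariate(m / 2, 2)
        z2 = rng.gammavariate(n / 2, 2)
        values.append((z1 / m) / (z2 / n))
    return values

ratios = simulate_f(20, 10)
sample_mean = sum(ratios) / len(ratios)
theoretical_mean = 10 / (10 - 2)  # mean of F(m, n) is n/(n-2) for n > 2
print(round(sample_mean, 2), theoretical_mean)
```

The simulated mean should land close to 1.25; the exact value depends on the seed.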
F distributions
[Figure: densities of F distributions with df = (20, 10), (20, 20) and (20, 50).]
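source               SS                                                      df       MS
between treatments   $S_B = \sum_t n_t (\bar Y_{t\cdot} - \bar Y)^2$         $k-1$    $M_B = S_B/(k-1)$
within treatments    $S_W = \sum_t \sum_i (Y_{ti} - \bar Y_{t\cdot})^2$      $N-k$    $M_W = S_W/(N-k)$
total                $S_T = \sum_t \sum_i (Y_{ti} - \bar Y)^2$               $N-1$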
Example
source   SS    df
total    340   23
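Only the total row of this table survives in the notes. As a sketch, the between and within rows can be recomputed from the coagulation data given at the start, using the definitions of $S_B$, $S_W$, $M_B$ and $M_W$. The function name `one_way_anova` is an illustrative choice, not something from the notes.

```python
# Coagulation times by treatment, copied from the data table in these notes.
data = {
    "A": [62, 60, 63, 59],
    "B": [63, 67, 71, 64, 65, 66],
    "C": [68, 66, 71, 67, 68, 68],
    "D": [56, 62, 60, 61, 63, 64, 63, 59],
}

def one_way_anova(groups):
    """Return (SB, SW, MB, MW, F) for a dict of group name -> observations."""
    all_obs = [y for obs in groups.values() for y in obs]
    N, k = len(all_obs), len(groups)
    grand = sum(all_obs) / N
    means = {t: sum(obs) / len(obs) for t, obs in groups.items()}
    # Between-group sum of squares: n_t * (group mean - overall mean)^2.
    SB = sum(len(obs) * (means[t] - grand) ** 2 for t, obs in groups.items())
    # Within-group sum of squares: squared deviations from each group mean.
    SW = sum((y - means[t]) ** 2 for t, obs in groups.items() for y in obs)
    MB, MW = SB / (k - 1), SW / (N - k)
    return SB, SW, MB, MW, MB / MW

SB, SW, MB, MW, F = one_way_anova(data)
print("between:", SB, "df 3, MS", MB)
print("within: ", SW, "df 20, MS", MW)
print("total:  ", SB + SW, "df 23")
print("F =", round(F, 2))
```

The total sum of squares returned by this sketch agrees with the surviving row (340 on 23 df).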
The ANOVA model
$$Y_{ti} = \mu + \tau_t + \epsilon_{ti}.$$
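Each observation splits as $Y_{ti} = \bar Y + (\bar Y_{t\cdot} - \bar Y) + (Y_{ti} - \bar Y_{t\cdot})$:

 observations      overall average     treatment deviations     residuals
  A  B  C  D         A  B  C  D           A  B  C  D            A  B  C  D
 62 63 68 56        64 64 64 64          −3  2  4 −3            1 −3  0 −5
 60 67 66 62        64 64 64 64          −3  2  4 −3           −1  1 −2  1
 63 71 71 60   =    64 64 64 64     +    −3  2  4 −3     +      2  5  3 −1
 59 64 67 61        64 64 64 64          −3  2  4 −3           −2 −2 −1  0
    65 68 63           64 64 64              2  4 −3              −1  0  2
    66 68 64           64 64 64              2  4 −3               0  0  3
          63                 64                   −3                     2
          59                 64                   −3                    −2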
Hypothesis testing
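We assume
$$Y_{ti} = \mu + \tau_t + \epsilon_{ti}, \qquad \epsilon_{ti} \sim \text{iid Normal}(0, \sigma^2).$$
We want to test
$$H_0: \tau_1 = \cdots = \tau_k = 0 \quad \text{versus} \quad H_a: H_0 \text{ is false}.$$
$$E(M_B) = \sigma^2 + \frac{\sum_t n_t \tau_t^2}{k - 1}$$
Therefore
$$E(M_B) = \sigma^2 \text{ if } H_0 \text{ is true}$$
Under $H_0$ we have
$$\frac{M_B}{M_W} \sim F_{k-1,\,N-k}.$$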
Therefore
• Calculate MB/MW.
[Figure: density of the F(3,20) distribution on the range 0 to 14, with the observed $M_B/M_W$ marked.]
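The comparison of $M_B/M_W$ with its null distribution can be approximated by simulation. This is a sketch under stated assumptions, not the method used in the notes: it generates normal data with equal means and the coagulation group sizes (4, 6, 6, 8), and counts how often the simulated ratio reaches the observed one. The seed and the number of simulations are arbitrary.

```python
import random

def f_statistic(groups):
    """MB/MW for a list of groups (each a list of observations)."""
    all_obs = [y for g in groups for y in g]
    N, k = len(all_obs), len(groups)
    grand = sum(all_obs) / N
    means = [sum(g) / len(g) for g in groups]
    SB = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    SW = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (SB / (k - 1)) / (SW / (N - k))

coag = [
    [62, 60, 63, 59],
    [63, 67, 71, 64, 65, 66],
    [68, 66, 71, 67, 68, 68],
    [56, 62, 60, 61, 63, 64, 63, 59],
]
observed = f_statistic(coag)

rng = random.Random(0)
sizes = [len(g) for g in coag]
n_sim = 5000
exceed = 0
for _ in range(n_sim):
    # Under H0 all groups share one mean; standard normal data suffice
    # because the ratio does not depend on the mean or the variance.
    fake = [[rng.gauss(0, 1) for _ in range(n)] for n in sizes]
    if f_statistic(fake) >= observed:
        exceed += 1
p_sim = exceed / n_sim
print(round(observed, 2), p_sim)
```

A very small simulated proportion indicates that the observed ratio lies far out in the right tail of the F(3, 20) reference distribution.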
Another example
[Figure: response (axis 200 to 2000) by treatment.]
With two groups ($k = 2$, $N = n_1 + n_2$), the mean squares in the ANOVA table become
$$M_B = S_B = n_1 (\bar Y_1 - \bar Y)^2 + n_2 (\bar Y_2 - \bar Y)^2$$
and
$$M_W = \frac{\sum_{i=1}^{n_1} (Y_{1i} - \bar Y_1)^2 + \sum_{i=1}^{n_2} (Y_{2i} - \bar Y_2)^2}{n_1 + n_2 - 2}$$
Two-sample t-test
$$t = \frac{\bar Y_1 - \bar Y_2}{s \sqrt{1/n_1 + 1/n_2}}$$
with
$$s^2 = \frac{\sum_{i=1}^{n_1} (Y_{1i} - \bar Y_1)^2 + \sum_{i=1}^{n_2} (Y_{2i} - \bar Y_2)^2}{n_1 + n_2 - 2}$$
Reference distributions
−→ Result: $\dfrac{M_B}{M_W} = t^2$
$$\frac{M_B}{M_W} \sim F_{1,\,n_1+n_2-2} \qquad t \sim t_{n_1+n_2-2}$$
$$F_{1,k} = t_k^2 \qquad F_{k,\infty} = \frac{\chi^2_k}{k}$$
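The identity $M_B/M_W = t^2$ can be verified on any two groups. As an illustration (the choice of groups is mine, not the notes'), the sketch below uses treatments A and B of the coagulation data and computes both sides from their definitions.

```python
import math

y1 = [62, 60, 63, 59]            # treatment A
y2 = [63, 67, 71, 64, 65, 66]    # treatment B
n1, n2 = len(y1), len(y2)
m1, m2 = sum(y1) / n1, sum(y2) / n2
grand = (sum(y1) + sum(y2)) / (n1 + n2)

# ANOVA with k = 2 groups: MB has k - 1 = 1 degree of freedom.
MB = n1 * (m1 - grand) ** 2 + n2 * (m2 - grand) ** 2
MW = (sum((y - m1) ** 2 for y in y1)
      + sum((y - m2) ** 2 for y in y2)) / (n1 + n2 - 2)
F = MB / MW

# Two-sample t statistic with the pooled standard deviation s = sqrt(MW).
s = math.sqrt(MW)
t = (m1 - m2) / (s * math.sqrt(1 / n1 + 1 / n2))

print(F, t ** 2)
```

Both printed numbers agree up to floating-point rounding, which is why the two-group ANOVA and the two-sample t-test give the same p-value.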
Fixed effects
[Figure: underlying group distributions with means µ1, ..., µ7, and the data.]
Random effects
[Figure: panel "Observed underlying group means", with means µ1, ..., µ6.]
As it turns out, we end up with the same test statistic and same
null distribution. For one-way ANOVA, that is!
Estimation
$$E(M_B) = \sigma^2 + n_0 \times \sigma_A^2$$
where
$$n_0 = \frac{1}{k - 1} \left( N - \frac{\sum_t n_t^2}{\sum_t n_t} \right)$$
[Figure: response (axis 25 to 60) by subject ID.]
The sample sizes for the 8 subjects were (14, 12, 11, 10, 10, 11,
15, 9), for a total sample size of 92. Thus, $n_0 \approx 11.45$.
source             SS     df   MS    F      P-value
between subjects   1485    7   212   4.60   0.0002
within subjects    3873   84    46
total              5358   91
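The value of $n_0$ and the mean squares in this table can be reproduced directly. The sketch below also solves $E(M_B) = \sigma^2 + n_0 \sigma_A^2$ and $E(M_W) = \sigma^2$ for the two variances by replacing expectations with observed mean squares; that moment-matching step is an assumption on my part, since the estimator itself did not survive in the notes.

```python
sizes = [14, 12, 11, 10, 10, 11, 15, 9]   # observations per subject
k = len(sizes)
N = sum(sizes)

# n0 = (N - sum(n_t^2) / sum(n_t)) / (k - 1)
n0 = (N - sum(n * n for n in sizes) / N) / (k - 1)

# Mean squares from the ANOVA table (SS / df).
MB = 1485 / 7
MW = 3873 / 84
F = MB / MW

# Moment-matching estimates (assumed, see the lead-in):
sigma2_hat = MW                    # within-subject variance
sigmaA2_hat = (MB - MW) / n0       # between-subject variance

print(round(n0, 2), round(F, 2))
print(round(sigma2_hat, 1), round(sigmaA2_hat, 1))
```

If $M_B$ were smaller than $M_W$, this estimate of $\sigma_A^2$ would be negative, which is usually reported as zero.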
Multiple comparisons
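If each of $T$ independent tests is carried out at level $\alpha'$, the chance of at least one false rejection is
$$\alpha = 1 - (1 - \alpha')^T$$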
“Unplanned” comparisons
What next?
Which of the µ’s are different from which others?
$k = 5 \;\longrightarrow\; \binom{k}{2} = 10.$
$k = 10 \;\longrightarrow\; \binom{k}{2} = 45.$
Bonferroni correction
Let α = Pr(reject at least one pairwise test | all µ’s the same)
≤ (no. tests) × Pr(reject test #1 | µ’s the same)
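The arithmetic behind these statements is short enough to sketch. The helper names `familywise_error` and `bonferroni_level` are illustrative; the first assumes independent tests, the second applies the Bonferroni bound, which needs no such assumption.

```python
from math import comb

def familywise_error(alpha_per_test, T):
    """Chance of at least one false rejection among T independent tests."""
    return 1 - (1 - alpha_per_test) ** T

def bonferroni_level(alpha, T):
    """Per-test level that keeps the overall error rate at or below alpha."""
    return alpha / T

for k in (5, 10):
    T = comb(k, 2)   # number of pairwise comparisons among k groups
    print(k, T,
          round(familywise_error(0.05, T), 3),
          bonferroni_level(0.05, T))
```

With k = 5 groups, testing each of the 10 pairs at level 0.05 gives roughly a 40% chance of at least one false rejection, which is what motivates the correction.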
Pairwise comparisons
$\alpha'' = \alpha / k = 0.05 / 6 = 0.0083$ (here $k = 6$ is the number of comparisons)

Comparison   p-value
A vs B       0.004
A vs C       < 0.001
A vs D       1.000
B vs C       0.159
B vs D       < 0.001
C vs D       < 0.001
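Applying the corrected level to this table is a simple filter. In the sketch below the entries reported as "< 0.001" are entered as 0.001, an upper bound that is an assumption for illustration only; the conclusion does not depend on their exact values.

```python
# p-values from the table; "< 0.001" is entered as the upper bound 0.001.
p_values = {
    ("A", "B"): 0.004,
    ("A", "C"): 0.001,
    ("A", "D"): 1.000,
    ("B", "C"): 0.159,
    ("B", "D"): 0.001,
    ("C", "D"): 0.001,
}

alpha = 0.05
threshold = alpha / len(p_values)   # Bonferroni: 0.05 / 6

significant = [pair for pair, p in p_values.items() if p < threshold]
not_significant = [pair for pair, p in p_values.items() if p >= threshold]

print(round(threshold, 4))
print("different:", significant)
print("not shown to differ:", not_significant)
```

Only A vs D and B vs C fail to reach the corrected level, consistent with A and D sharing the same sample average (61) and B and C being close (66 and 68).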
Another example
[Figure: response (axis 60 to 75) by treatment.]
ANOVA table
$\binom{5}{2} = 10$ pairwise comparisons −→ $\alpha' = 0.05/10 = 0.005$
For each pair, consider $T_{i,j} = \dfrac{\bar Y_{i\cdot} - \bar Y_{j\cdot}}{\hat\sigma \sqrt{1/n_i + 1/n_j}}$
Use $\hat\sigma = \sqrt{M_W}$ ($M_W$ = within-group mean square)
and refer to a $t$ distribution with df = 45.
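The ANOVA table for this example did not survive, so $M_W$ is not available here. As a stand-in, the sketch below computes the same statistic $T_{i,j}$ for the coagulation data from the start of the notes, where $M_W$ has 20 rather than 45 degrees of freedom. Only the statistics are computed; converting them to p-values needs a t distribution function, which the standard library does not provide.

```python
import math
from itertools import combinations

# Coagulation data (used as a stand-in; see the lead-in).
data = {
    "A": [62, 60, 63, 59],
    "B": [63, 67, 71, 64, 65, 66],
    "C": [68, 66, 71, 67, 68, 68],
    "D": [56, 62, 60, 61, 63, 64, 63, 59],
}
means = {t: sum(obs) / len(obs) for t, obs in data.items()}
N = sum(len(obs) for obs in data.values())
k = len(data)

# Pooled within-group mean square and its square root.
MW = sum((y - means[t]) ** 2 for t, obs in data.items() for y in obs) / (N - k)
sigma_hat = math.sqrt(MW)

def pair_statistic(i, j):
    """T_ij = (mean_i - mean_j) / (sigma_hat * sqrt(1/n_i + 1/n_j))."""
    se = sigma_hat * math.sqrt(1 / len(data[i]) + 1 / len(data[j]))
    return (means[i] - means[j]) / se

T = {(i, j): pair_statistic(i, j) for i, j in combinations(data, 2)}
for pair, value in T.items():
    print(pair, round(value, 2))
```

Using the pooled $M_W$ for every pair, rather than a separate variance per pair, is what gives each comparison the full N − k degrees of freedom.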
A comparison
[Figure: intervals for pairwise differences, including A:S, G:S, G:A, F:S, F:A, F:G, C:S and C:A, drawn three ways: uncorrected, Bonferroni, Tukey.]
Uncorrected:
Each interval, individually, had (in advance) a 95% chance of covering the true mean difference.
Corrected:
A      F      G      S      C
58.0   58.2   59.3   64.1   70.1
Table of differences:
      F      G      S      C
A    0.2    1.3    6.1   12.1
F           1.1    5.9   11.9
G                  4.8   10.8
S                         6.0
Example (continued)
$R_i = q_i \times \sqrt{M_W/10}$:
R2     R3     R4     R5
2.10   2.53   2.79   2.97
Results
A F G S C
58.0 58.2 59.3 64.1 70.1
Interpretation:
A≈F≈G<S<C
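The interpretation can be reproduced from the sorted means and the critical ranges $R_2, \ldots, R_5$. The rule used below, comparing a difference that spans $i$ sorted means with $R_i$, is my reading of the surviving fragments and is an assumption; it is also a simplified version in that every pair is tested, whereas stepwise range procedures typically stop testing inside a range already found non-significant.

```python
means = {"A": 58.0, "F": 58.2, "G": 59.3, "S": 64.1, "C": 70.1}
R = {2: 2.10, 3: 2.53, 4: 2.79, 5: 2.97}   # critical ranges R2..R5

names = sorted(means, key=means.get)        # groups ordered by sample mean
different = {}
for lo in range(len(names)):
    for hi in range(lo + 1, len(names)):
        span = hi - lo + 1                  # number of sorted means spanned
        diff = means[names[hi]] - means[names[lo]]
        different[(names[lo], names[hi])] = diff > R[span]

for pair, flag in different.items():
    print(pair, "different" if flag else "not shown to differ")
```

The three pairs among A, F and G stay below their critical ranges, while every pair involving S or C exceeds it, which gives A ≈ F ≈ G < S < C.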
Another example
D C A B E
29.6 32.9 40.0 40.7 48.8
Interpretation:
{D, C, A, B} < E and D < {A, B}
We have:
−→ 3 hospitals
−→ 4 subjects within each hospital
−→ 2 independent measurements per subject
Hospital    I                          II                         III
Subject     1     2     3     4       1     2     3     4       1     2     3     4
            58.5  77.8  84.0  70.1    69.8  56.0  50.7  63.8    56.6  77.8  69.9  62.1
            59.5  80.9  83.6  68.3    69.8  54.5  49.3  65.8    57.5  79.2  69.2  64.5
The model
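[Figure: two panels of the responses plotted by hospital (axes 40 to 90 and 40 to 100).]
$$Y_{ijk} = \mu + \alpha_i + \beta_{ij} + \epsilon_{ijk}$$
$\mu$ = overall mean
$\alpha_i$ = “effect” for $i$th hospital
$\beta_{ij}$ = “effect” for $j$th subject within $i$th hospital
$\epsilon_{ijk}$ = random error

$\alpha_i \sim \text{Normal}(0, \sigma_A^2)$                 $\alpha_i$ fixed; $\sum_i \alpha_i = 0$
$\beta_{ij} \sim \text{Normal}(0, \sigma_{B|A}^2)$           $\beta_{ij} \sim \text{Normal}(0, \sigma_{B|A}^2)$
$\epsilon_{ijk} \sim \text{Normal}(0, \sigma^2)$             $\epsilon_{ijk} \sim \text{Normal}(0, \sigma^2)$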
Example: sample means
Subject                 1      2      3      4      1      2      3      4      1      2      3      4
                        58.5   77.8   84.0   70.1   69.8   56.0   50.7   63.8   56.6   77.8   69.9   62.1
                        59.5   80.9   83.6   68.3   69.8   54.5   49.3   65.8   57.5   79.2   69.2   64.5
$\bar Y_{ij\cdot}$      59.00  79.35  83.80  69.20  69.80  55.25  50.00  64.80  57.05  78.50  69.55  63.30
$\bar Y_{\cdot\cdot\cdot}$ = 66.63
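source                    SS                                                                                            df
among groups              $SS_{among} = b\,n \sum_i (\bar Y_{i\cdot\cdot} - \bar Y_{\cdot\cdot\cdot})^2$                $a - 1$
subgroups within groups   $SS_{subgr} = n \sum_i \sum_j (\bar Y_{ij\cdot} - \bar Y_{i\cdot\cdot})^2$                    $a(b - 1)$
within subgroups          $SS_{within} = \sum_i \sum_j \sum_k (Y_{ijk} - \bar Y_{ij\cdot})^2$                           $ab(n - 1)$
TOTAL                     $\sum_i \sum_j \sum_k (Y_{ijk} - \bar Y_{\cdot\cdot\cdot})^2$                                 $abn - 1$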
ANOVA table
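SS              df            MS                                   F                                  expected MS
$SS_{among}$    $a - 1$       $\dfrac{SS_{among}}{a - 1}$          $\dfrac{MS_{among}}{MS_{subgr}}$   $\sigma^2 + n\,\sigma_{B|A}^2 + n\,b\,\sigma_A^2$
$SS_{subgr}$    $a(b - 1)$    $\dfrac{SS_{subgr}}{a(b - 1)}$       $\dfrac{MS_{subgr}}{MS_{within}}$  $\sigma^2 + n\,\sigma_{B|A}^2$
$SS_{within}$   $ab(n - 1)$   $\dfrac{SS_{within}}{ab(n - 1)}$                                        $\sigma^2$
$SS_{total}$    $abn - 1$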
Example
source df SS MS F P-value
TOTAL 23 2401.97
Variance components
$s^2$ represents $\dfrac{1.30}{113.95} = 1.1\%$
$s^2_{B|A}$ represents $\dfrac{94.94}{113.95} = 83.3\%$
$s^2_A$ represents $\dfrac{17.71}{113.95} = 15.6\%$
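The three variance components can be recomputed from the hospital data. The sketch below builds the nested sums of squares and then matches each mean square to its expected value from the nested ANOVA table; this moment-matching step is assumed, since the slide stating the estimators did not survive, though it does reproduce the numbers 1.30, 94.94 and 17.71 shown above. All variable names are illustrative.

```python
a, b, n = 3, 4, 2   # hospitals, subjects per hospital, measurements per subject

# Y[i][j] = the two measurements on subject j of hospital i.
Y = [
    [[58.5, 59.5], [77.8, 80.9], [84.0, 83.6], [70.1, 68.3]],
    [[69.8, 69.8], [56.0, 54.5], [50.7, 49.3], [63.8, 65.8]],
    [[56.6, 57.5], [77.8, 79.2], [69.9, 69.2], [62.1, 64.5]],
]

subj = [[sum(obs) / n for obs in hosp] for hosp in Y]   # subject means
hosp = [sum(m) / b for m in subj]                       # hospital means
grand = sum(hosp) / a                                   # overall mean

SS_among = b * n * sum((h - grand) ** 2 for h in hosp)
SS_subgr = n * sum((m - hosp[i]) ** 2 for i in range(a) for m in subj[i])
SS_within = sum((y - subj[i][j]) ** 2
                for i in range(a) for j in range(b) for y in Y[i][j])
SS_total = SS_among + SS_subgr + SS_within

MS_among = SS_among / (a - 1)
MS_subgr = SS_subgr / (a * (b - 1))
MS_within = SS_within / (a * b * (n - 1))

# Match mean squares to their expectations (assumed estimators):
s2 = MS_within                          # sigma^2
s2_BA = (MS_subgr - MS_within) / n      # sigma^2_{B|A}
s2_A = (MS_among - MS_subgr) / (n * b)  # sigma^2_A
total_var = s2 + s2_BA + s2_A

print(round(grand, 2), round(SS_total, 2))
print(round(s2, 2), round(s2_BA, 2), round(s2_A, 2), round(total_var, 2))
```

The total sum of squares from this sketch agrees with the surviving TOTAL row of the example table (2401.97 on 23 df).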
Note:
−→ $\text{var}(Y \mid A) = \sigma^2 + \sigma_{B|A}^2$
−→ $\text{var}(Y \mid A, B) = \sigma^2$
Subject averages
       I-1    I-2    I-3    I-4    II-1   II-2   II-3   II-4   III-1  III-2  III-3  III-4
       58.5   77.8   84.0   70.1   69.8   56.0   50.7   63.8   56.6   77.8   69.9   62.1
       59.5   80.9   83.6   68.3   69.8   54.5   49.3   65.8   57.5   79.2   69.2   64.5
ave    59.0   79.4   83.8   69.2   69.8   55.2   50.0   64.8   57.0   78.5   69.6   63.3
ANOVA table
source df SS MS F P-value
between 2 332.8 166.4 1.74 0.23
within 9 860.3 95.6
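This table is an ordinary one-way ANOVA with the twelve subject averages as data and hospitals as groups. A minimal sketch, using the unrounded averages (for example 79.35 rather than 79.4), reproduces it; the variable names are illustrative.

```python
# Unrounded subject averages, grouped by hospital.
averages = {
    "I":   [59.00, 79.35, 83.80, 69.20],
    "II":  [69.80, 55.25, 50.00, 64.80],
    "III": [57.05, 78.50, 69.55, 63.30],
}

all_avg = [y for obs in averages.values() for y in obs]
N, k = len(all_avg), len(averages)
grand = sum(all_avg) / N
means = {h: sum(obs) / len(obs) for h, obs in averages.items()}

SS_between = sum(len(obs) * (means[h] - grand) ** 2
                 for h, obs in averages.items())
SS_within = sum((y - means[h]) ** 2
                for h, obs in averages.items() for y in obs)
MS_between = SS_between / (k - 1)
MS_within = SS_within / (N - k)
F = MS_between / MS_within

print("between", k - 1, round(SS_between, 1), round(MS_between, 1), round(F, 2))
print("within ", N - k, round(SS_within, 1), round(MS_within, 1))
```

The F ratio here is the same as $MS_{among}/MS_{subgr}$ from the nested analysis, because averaging the two measurements per subject halves both mean squares.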
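You can have as many levels as you like. For example, here is a
three-level nested mixed ANOVA model:
$$Y_{ijkl} = \mu + \alpha_i + B_{ij} + C_{ijk} + \epsilon_{ijkl}$$
Assumptions: $B_{ij} \sim N(0, \sigma_{B|A}^2)$, $C_{ijk} \sim N(0, \sigma_{C|B}^2)$, $\epsilon_{ijkl} \sim N(0, \sigma^2)$.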
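Calculations
source                SS                                                                                                        df
among groups          $SS_{among} = b\,c\,n \sum_i (\bar Y_{i\cdot\cdot\cdot} - \bar Y_{\cdot\cdot\cdot\cdot})^2$               $a - 1$
among subgroups       $SS_{subgr} = c\,n \sum_i \sum_j (\bar Y_{ij\cdot\cdot} - \bar Y_{i\cdot\cdot\cdot})^2$                   $a(b - 1)$
among subsubgroups    $SS_{subsubgr} = n \sum_i \sum_j \sum_k (\bar Y_{ijk\cdot} - \bar Y_{ij\cdot\cdot})^2$                    $ab(c - 1)$
within subsubgroups   $SS_{within} = \sum_i \sum_j \sum_k \sum_l (Y_{ijkl} - \bar Y_{ijk\cdot})^2$                              $abc(n - 1)$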
ANOVA table
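SS                                                                  MS                                    F                                       expected MS
$c\,n \sum^{a} \sum^{b} (\bar Y_B - \bar Y_A)^2$                     $\dfrac{SS_{subgr}}{a(b - 1)}$        $\dfrac{MS_{subgr}}{MS_{subsubgr}}$     $\sigma^2 + n\,\sigma^2_{C \subset B} + n\,c\,\sigma^2_{B \subset A}$
$n \sum^{a} \sum^{b} \sum^{c} (\bar Y_C - \bar Y_B)^2$               $\dfrac{SS_{subsubgr}}{ab(c - 1)}$    $\dfrac{MS_{subsubgr}}{MS_{within}}$    $\sigma^2 + n\,\sigma^2_{C \subset B}$
$\sum^{a} \sum^{b} \sum^{c} \sum^{n} (Y - \bar Y_C)^2$               $\dfrac{SS_{within}}{abc(n - 1)}$                                             $\sigma^2$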
Unequal sample size
It is best to design your studies such that you have equal sample
sizes in each cell. However, once in a while this is not possible.
Even worse, the F tests for the upper levels in the ANOVA table no
longer have a clear null distribution.
Two-way ANOVA
          Treatment
Gender    1      2
Male      709    592
          679    538
          699    476
Female    657    508
          594    505
          677    539
Let
r be the number of rows in the two-way ANOVA,
[Figure: plot of the response (axis 500 to 700).]
Treatment
Gender 1 2
−→ This table shows the cell, row, and column means, plus the
overall mean.
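Two-way ANOVA table
source            SS                                                                                                                                    df
between rows      $SS_{rows} = c\,n \sum_i (\bar Y_{i\cdot\cdot} - \bar Y_{\cdot\cdot\cdot})^2$                                                         $r - 1$
between columns   $SS_{columns} = r\,n \sum_j (\bar Y_{\cdot j\cdot} - \bar Y_{\cdot\cdot\cdot})^2$                                                     $c - 1$
interaction       $SS_{interaction} = n \sum_i \sum_j (\bar Y_{ij\cdot} - \bar Y_{i\cdot\cdot} - \bar Y_{\cdot j\cdot} + \bar Y_{\cdot\cdot\cdot})^2$   $(r - 1)(c - 1)$
error             $SS_{within} = \sum_i \sum_j \sum_k (Y_{ijk} - \bar Y_{ij\cdot})^2$                                                                   $rc(n - 1)$
total             $SS_{total} = \sum_i \sum_j \sum_k (Y_{ijk} - \bar Y_{\cdot\cdot\cdot})^2$                                                            $rcn - 1$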
Example
Let Yijk be the kth item in the subgroup representing the ith group
of factor A (r levels) and the jth group of factor B (c levels). We
write
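$$y_{ijk} = \bar y_{\cdot\cdot\cdot} + (\bar y_{i\cdot\cdot} - \bar y_{\cdot\cdot\cdot}) + (\bar y_{\cdot j\cdot} - \bar y_{\cdot\cdot\cdot}) + (\bar y_{ij\cdot} - \bar y_{i\cdot\cdot} - \bar y_{\cdot j\cdot} + \bar y_{\cdot\cdot\cdot}) + (y_{ijk} - \bar y_{ij\cdot})$$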
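source            MS                                                                                                                                         expected MS
between rows      $\dfrac{c\,n \sum_i (\bar Y_{i\cdot\cdot} - \bar Y_{\cdot\cdot\cdot})^2}{r - 1}$                                                           $\sigma^2 + \dfrac{c\,n}{r - 1} \sum_i \alpha_i^2$
between columns   $\dfrac{r\,n \sum_j (\bar Y_{\cdot j\cdot} - \bar Y_{\cdot\cdot\cdot})^2}{c - 1}$                                                          $\sigma^2 + \dfrac{r\,n}{c - 1} \sum_j \beta_j^2$
interaction       $\dfrac{n \sum_i \sum_j (\bar Y_{ij\cdot} - \bar Y_{i\cdot\cdot} - \bar Y_{\cdot j\cdot} + \bar Y_{\cdot\cdot\cdot})^2}{(r - 1)(c - 1)}$   $\sigma^2 + \dfrac{n}{(r - 1)(c - 1)} \sum_i \sum_j \gamma_{ij}^2$
error             $\dfrac{\sum_i \sum_j \sum_k (Y_{ijk} - \bar Y_{ij\cdot})^2}{rc(n - 1)}$                                                                   $\sigma^2$
This is for fixed effects, and equal number of observations per cell!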
Example (continued)
source SS df MS F p-value
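The rows of this table were lost, but they can be recomputed from the gender by treatment data given earlier. The sketch below computes the cell, row, column and overall means, then the four sums of squares from the two-way decomposition, and finally F ratios against the error mean square, which is the appropriate denominator only under the fixed effects model. Variable names are illustrative and p-values are omitted because the standard library has no F distribution function.

```python
rows, cols, n = ["Male", "Female"], [1, 2], 3
r, c = len(rows), len(cols)

# cells[(gender, treatment)] = the three responses in that cell.
cells = {
    ("Male", 1): [709, 679, 699],   ("Male", 2): [592, 538, 476],
    ("Female", 1): [657, 594, 677], ("Female", 2): [508, 505, 539],
}

cell_mean = {key: sum(v) / n for key, v in cells.items()}
row_mean = {i: sum(cell_mean[(i, j)] for j in cols) / c for i in rows}
col_mean = {j: sum(cell_mean[(i, j)] for i in rows) / r for j in cols}
grand = sum(cell_mean.values()) / (r * c)

SS_rows = c * n * sum((row_mean[i] - grand) ** 2 for i in rows)
SS_cols = r * n * sum((col_mean[j] - grand) ** 2 for j in cols)
SS_inter = n * sum((cell_mean[(i, j)] - row_mean[i] - col_mean[j] + grand) ** 2
                   for i in rows for j in cols)
SS_within = sum((y - cell_mean[key]) ** 2 for key, v in cells.items() for y in v)
SS_total = sum((y - grand) ** 2 for v in cells.values() for y in v)

MS_within = SS_within / (r * c * (n - 1))
table = [
    ("between rows", SS_rows, r - 1),
    ("between columns", SS_cols, c - 1),
    ("interaction", SS_inter, (r - 1) * (c - 1)),
]
for name, ss, df in table:
    print(name, round(ss, 2), df, round(ss / df, 2), round(ss / df / MS_within, 2))
print("error", round(SS_within, 2), r * c * (n - 1), round(MS_within, 2))
print("total", round(SS_total, 2), r * c * n - 1)
```

The four component sums of squares add up to the total, mirroring the term-by-term decomposition of each observation.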
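Let $Y_{ijk}$ be the $k$th item in the subgroup representing the $i$th group
of factor A ($r$ levels) and the $j$th group of factor B ($c$ levels). We
write
$$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk}$$
Expected mean squares:
source            fixed effects                                                        random effects                                            mixed effects (rows fixed, columns random)
between rows      $\sigma^2 + \dfrac{c\,n}{r - 1} \sum_i \alpha_i^2$                   $\sigma^2 + n\,\sigma^2_{R \times C} + c\,n\,\sigma^2_R$  $\sigma^2 + n\,\sigma^2_{R \times C} + \dfrac{c\,n}{r - 1} \sum_i \alpha_i^2$
between columns   $\sigma^2 + \dfrac{r\,n}{c - 1} \sum_j \beta_j^2$                    $\sigma^2 + n\,\sigma^2_{R \times C} + r\,n\,\sigma^2_C$  $\sigma^2 + r\,n\,\sigma^2_C$
interaction       $\sigma^2 + \dfrac{n}{(r - 1)(c - 1)} \sum_i \sum_j \gamma_{ij}^2$   $\sigma^2 + n\,\sigma^2_{R \times C}$                     $\sigma^2 + n\,\sigma^2_{R \times C}$
error             $\sigma^2$                                                           $\sigma^2$                                                $\sigma^2$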
Physician
Concentration A B C
source df SS MS
physician 2 2.79 1.39
concentration 6 12.54 2.09
interaction 12 4.11 0.34
total 20
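Expected mean squares

One observation per cell ($n = 1$), no interaction:
source            fixed effects                                      random effects                mixed effects (rows fixed, columns random)
between rows      $\sigma^2 + \dfrac{c}{r - 1} \sum_i \alpha_i^2$    $\sigma^2 + c\,\sigma^2_R$    $\sigma^2 + \dfrac{c}{r - 1} \sum_i \alpha_i^2$
between columns   $\sigma^2 + \dfrac{r}{c - 1} \sum_j \beta_j^2$     $\sigma^2 + r\,\sigma^2_C$    $\sigma^2 + r\,\sigma^2_C$
error             $\sigma^2$                                         $\sigma^2$                    $\sigma^2$

One observation per cell ($n = 1$), with interaction:
source            fixed effects                                                        random effects                                        mixed effects (rows fixed, columns random)
between rows      $\sigma^2 + \dfrac{c}{r - 1} \sum_i \alpha_i^2$                      $\sigma^2 + \sigma^2_{R \times C} + c\,\sigma^2_R$    $\sigma^2 + \sigma^2_{R \times C} + \dfrac{c}{r - 1} \sum_i \alpha_i^2$
between columns   $\sigma^2 + \dfrac{r}{c - 1} \sum_j \beta_j^2$                       $\sigma^2 + \sigma^2_{R \times C} + r\,\sigma^2_C$    $\sigma^2 + r\,\sigma^2_C$
interaction       $\sigma^2 + \dfrac{1}{(r - 1)(c - 1)} \sum_i \sum_j \gamma_{ij}^2$   $\sigma^2 + \sigma^2_{R \times C}$                    $\sigma^2 + \sigma^2_{R \times C}$
error             $\sigma^2$                                                           $\sigma^2$                                            $\sigma^2$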