QRM - Week 3 Lecture - Canvas
GROUPS/CONDITIONS
Sharon Morein
[email protected]
Comparing 3 groups or more
This lecture:
Between-Subjects ANOVA
• Assumptions
• What to do if assumptions are violated?
• Interpretation and follow-up analyses
• Reporting results
Example – influence of drug on libido
• Each observation comprises 3 components:
• Grand mean
• Group effect (increment or decrement of the specific group mean)
• Noise/error (individual differences, measurement error etc.)
Partitioning of variance
systematic/explained variance (+ error variance) vs. error (unexplained variance)

SS_T = SS_M + SS_R    df_T = df_M + df_R
SS_R = Σ (x_ij − x̄_i)²    df_R = N − k
SS_M = Σ n_i (x̄_i − x̄_grand)²    df_M = k − 1
SS_T = Σ (x_ij − x̄_grand)²    df_T = N − 1
About the name: ANOVA = ANalysis Of VAriance – it’s all about the variances
Logic
• We calculate how much variability there is between all scores
Total Sum of squares (SST).
• We then calculate how much of this variability can be explained
by the ‘model’ we fit to the data
Model Sum of Squares (SSM) [variability due to the experimental
manipulation]
• And how much of this variability cannot be explained by our
‘model’,
Residual Sum of Squares (SSR) [variability due to error, e.g.,
individual differences in performance]
• We compare the amount of variability explained by the ‘model’
(experiment), to the error in the model (individual differences)
• This ratio is called the F-ratio.
• If the model explains a lot more variability than it leaves
unexplained, then the experimental manipulation has had a significant
effect on the outcome (DV).
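The steps above can be computed by hand; a minimal sketch in Python, with made-up scores for three hypothetical groups:

```python
# A minimal sketch of the variance partition behind one-way ANOVA,
# using invented scores for three hypothetical groups.
import numpy as np

groups = [
    np.array([3.0, 4.0, 5.0]),   # hypothetical group 1
    np.array([5.0, 6.0, 7.0]),   # hypothetical group 2
    np.array([7.0, 8.0, 9.0]),   # hypothetical group 3
]

all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()

# Total variability: every score vs. the grand mean
ss_t = ((all_scores - grand_mean) ** 2).sum()

# Model (between-groups): group means vs. the grand mean
ss_m = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

# Residual (within-groups): each score vs. its own group mean
ss_r = sum(((g - g.mean()) ** 2).sum() for g in groups)

k, n = len(groups), len(all_scores)
df_m, df_r = k - 1, n - k
f_ratio = (ss_m / df_m) / (ss_r / df_r)

print(ss_t, ss_m + ss_r)   # the two quantities match: SS_T = SS_M + SS_R
print(f_ratio)
```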
DF
• Degrees of Freedom (df) are the number of values that are free
to vary.
• In general, the df are one less than the number of values used
to calculate the SS*
• Example: how many values are free to vary in a group of 10
observations if I HAVE to have a mean of 70? [ans=9]
• A mathematical restriction that needs to be put in place when
estimating one statistic from an estimate of another [you lose
one df for each parameter estimated prior to estimating the
(residual) standard deviation]
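The "mean of 70" example above can be sketched in a few lines (the nine freely chosen values are invented for illustration):

```python
# Tiny illustration of degrees of freedom: with 10 observations and a
# mean fixed at 70, only 9 values are free to vary -- the last is forced.
import numpy as np

free_values = np.array([68, 71, 69, 72, 70, 73, 67, 70, 71], dtype=float)  # 9 chosen freely
forced_last = 10 * 70 - free_values.sum()   # determined by the constraint
sample = np.append(free_values, forced_last)
print(forced_last, sample.mean())  # the 10th value is fixed; mean is exactly 70
```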
MS_M = SS_M / df_M    MS_R = SS_R / df_R

• F has 2 df: F = MS_M / MS_R, with (df_M, df_R)
• The larger the F…
• F<1 …
• F(1, df_x) = t(df_x)²
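The F = t² identity is easy to check with scipy on two hypothetical groups:

```python
# Checking F(1, df) = t(df)^2: a two-group one-way ANOVA gives an F that
# equals the square of the independent-samples t (illustrative data).
import numpy as np
from scipy import stats

a = np.array([5.1, 6.2, 5.8, 6.0, 5.5])   # hypothetical group A
b = np.array([6.9, 7.4, 6.5, 7.1, 7.8])   # hypothetical group B

t_res = stats.ttest_ind(a, b)   # assumes equal variances, like the ANOVA
f_res = stats.f_oneway(a, b)
print(t_res.statistic ** 2, f_res.statistic)  # identical up to rounding
```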
Assumptions
• Errors ε_ij ~ NID(0, σ²_ε)
• Normal distribution within groups
• Independent Distribution of observations
• Homogeneity of variance (HOV)
• Variances for each experimental group do not differ (‘can be tested
with Levene’s test’, but bear in mind the n…)
• Non-parametric option
• Kruskal-Wallis: just like a Mann-Whitney but with more than 2 groups
• Combine everyone and order(rank) them: if the sum of ranks is
similar, the groups likely don’t differ
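A Kruskal-Wallis run in scipy (illustrative data):

```python
# Non-parametric alternative: Kruskal-Wallis ranks all scores together
# and compares the groups' ranks (data invented for illustration).
from scipy import stats

g1 = [6.4, 6.8, 7.2, 8.4, 9.1, 9.7]
g2 = [2.5, 3.7, 4.9, 5.4, 8.1]
g3 = [1.3, 4.1, 4.9, 5.5]

h, p = stats.kruskal(g1, g2, g3)
print(h, p)  # a large H (small p) suggests the groups' rank sums differ
```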
Results and follow-up
Omnibus: lets you in, but… not enough
• Multiple t-tests
• A bad idea: we need control of Type I error – else, inflation!
• Post Hoc Tests
• Not Planned (no hypothesis)
• Compare (all*) (pairs* of) means
• Contrasts/ Planned Comparisons
• Hypothesis driven
• Planned a priori
• Orthogonal
• Trend Analysis – for ordinal IVs only
Contrasts & Orthogonal Comparisons
• Breaking down the variability between groups selectively
• ONLY K-1 comparisons (K= total number of groups in ANOVA)
• Why “contrast”? We end up with two ‘means’ – weights sum to 0 so
always F(1,X)
• Simple (pairwise) versus complex contrasts (e.g., B and A below)
• Number of levels
• Linear trend only with 2 and more
• Quadratic trend only with 3 and more
• Random vs. fixed effects (round 1)
• Different calculations
• Strong implications in some domains
(e.g., imaging)
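A complex planned contrast can be sketched by hand. Everything below is hypothetical; the weights [2, −1, −1] sum to zero and compare group A with the average of B and C:

```python
# Sketch of a planned (complex) contrast: group A against the average of
# B and C, with weights that sum to zero (all data invented).
import numpy as np

groups = {
    "A": np.array([8.0, 9.0, 10.0]),
    "B": np.array([5.0, 6.0, 7.0]),
    "C": np.array([4.0, 5.0, 6.0]),
}
weights = np.array([2.0, -1.0, -1.0])   # sums to 0 -> a valid contrast

means = np.array([g.mean() for g in groups.values()])
ns = np.array([len(g) for g in groups.values()])

# Residual mean square from the one-way ANOVA partition
ss_r = sum(((g - g.mean()) ** 2).sum() for g in groups.values())
df_r = ns.sum() - len(groups)
ms_r = ss_r / df_r

estimate = (weights * means).sum()              # value of the contrast
se = np.sqrt(ms_r * (weights ** 2 / ns).sum())  # its standard error
t = estimate / se                               # compare to t(df_r); t^2 = F(1, df_r)
print(estimate, t)
```

Linear and quadratic trend contrasts work the same way, just with trend weights (e.g., [−1, 0, 1] and [1, −2, 1] for three equally spaced levels).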
Post-hoc Tests
• The Big Issue: too little or too much type I error control
• Too liberal vs. not enough power/ too conservative
• Per contrast vs. set of contrasts (family of contrasts)
• Many post hoc tests (e.g., Jamovi has 5, SPSS has 18 options!) – still lots of
disparity of opinions
• Some are very Liberal (LSD, N-K)
• Some intermediate (Bonferroni, Tukey, Holm)
• Some Conservative (Scheffe- all possible contrasts, including complex ones)
• Current (QRM) recommendations:
Assumptions (reasonably) met with equal n’s:
• Tukey HSD (Honest Significant Difference)
Safe Option with small k:
• Bonferroni, Holm
Unequal Variances:
• Games-Howell
Unequal Sample Sizes:
• Gabriel’s (small n), Hochberg’s GT2 (large n).
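For the Tukey HSD recommendation, scipy ships an implementation directly; a sketch with invented data:

```python
# With assumptions reasonably met and roughly equal n's, Tukey's HSD is
# one recommended post hoc; scipy.stats provides it (illustrative data).
import numpy as np
from scipy.stats import tukey_hsd

g1 = np.array([5.0, 5.5, 6.0, 5.2])
g2 = np.array([6.8, 7.2, 7.0, 7.4])
g3 = np.array([5.1, 5.4, 5.8, 5.3])

res = tukey_hsd(g1, g2, g3)
print(res.pvalue)  # k x k matrix of pairwise p-values, already corrected
```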
Post-hoc Tests
• Bonferroni
• Select some pairwise comparisons (let’s say n of them)
• New α = 0.05 / total number of tests made (e.g., 0.05/3 = .0167)
• Essentially t-tests, each evaluated at the new α level
• Tukey
• All pairwise comparisons
• Tends to be conservative with unequal n’s
• Scheffe
• All possible simple and complex comparisons
• Most conservative and reduced power
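Bonferroni in its simplest form is just the pairwise t-tests judged against the adjusted α (group names and data invented for illustration):

```python
# Bonferroni sketch: run the chosen pairwise t-tests, then judge each
# against alpha divided by the number of tests (made-up data).
from itertools import combinations
from scipy import stats

groups = {"placebo": [3.0, 2.5, 4.0, 3.2],
          "low":     [4.1, 4.8, 5.0, 4.5],
          "high":    [6.0, 6.5, 5.8, 6.2]}

pairs = list(combinations(groups, 2))
alpha_adj = 0.05 / len(pairs)          # 0.05 / 3 = .0167, as on the slide
for a, b in pairs:
    t, p = stats.ttest_ind(groups[a], groups[b])
    print(f"{a} vs {b}: p={p:.4f}, significant at {alpha_adj:.4f}: {p < alpha_adj}")
```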
Effect sizes in ANOVAs
• Eta-squared (biased: uses sum of squares and is a
function of what else we have in the model)
• proportion of variance attributable to the effect 𝑆𝑆𝑀 𝑆𝑆𝑒𝑓𝑓𝑒𝑐𝑡
𝑟2 η2
• Sample specific, overestimates the effect size 𝑆𝑆𝑇 𝑆𝑆𝑡𝑜𝑡𝑎𝑙
• Partial 𝑆𝑆𝑒𝑓𝑓𝑒𝑐𝑡
η𝑝 2
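Both effect sizes can be read straight off the sums of squares; a sketch with made-up groups:

```python
# Effect sizes from the ANOVA sums of squares: eta-squared and partial
# eta-squared (identical in a one-way design, where SS_T = SS_effect + SS_error).
import numpy as np

groups = [np.array([3.0, 4.0, 5.0]),
          np.array([5.0, 6.0, 7.0]),
          np.array([7.0, 8.0, 9.0])]   # invented data

all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()
ss_effect = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_error = sum(((g - g.mean()) ** 2).sum() for g in groups)
ss_total = ((all_scores - grand_mean) ** 2).sum()

eta_sq = ss_effect / ss_total
partial_eta_sq = ss_effect / (ss_effect + ss_error)
print(eta_sq, partial_eta_sq)  # equal here; they diverge with more factors
```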
3 groups: combine everyone and rank

A      rank    B      rank    C      rank
6.4    11      2.5    2       1.3    1
6.8    12      3.7    3       4.1    4
7.2    13      4.9    5.5     4.9    5.5
8.4    18      5.4    8       5.5    9
9.1    19      8.1    14
9.7    21
Sum ranks  131         58             42
Avg ranks  13.1        6.44           5.25

• Follow-up
• Post hocs
• Direct comparisons between groups (prevent α inflation!)
Repeated Measures (within subjects)
• Same participants contribute to different means
• Extension of dependent t-test, where there are more than 2
conditions/means to compare
• Minimizes/shrinks the error variance
• Advantages and disadvantages
• Useful when controlling for individual differences, more sensitive
• More economic/efficient (time and ££££)
• Carryover effects (practice, fatigue etc.) -> experimental design
essential
SS_T = SS_M + SS_R + SS_subjects
Shrinkage achieved!
Partitioning of variance
SS_R = Σ (x_ij − x̄_i)²    df_R = N − k
SS_M = Σ n_i (x̄_i − x̄_grand)²    df_M = k − 1
SS_T = Σ (x_ij − x̄_grand)²    df_T = N − 1
F = MS_M / MS_Res
Variation among individuals: we’ve got multiple measures for each person
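The repeated-measures partition can be sketched by pulling the subject means out of the error term (hypothetical 4 subjects × 3 conditions):

```python
# Sketch of the repeated-measures partition: subject variability is taken
# out of the error term, shrinking the residual (data invented).
import numpy as np

data = np.array([[8.0, 6.0, 4.0],    # rows = subjects, cols = conditions
                 [7.0, 5.0, 4.0],
                 [9.0, 8.0, 6.0],
                 [6.0, 4.0, 2.0]])

n_subj, k = data.shape
grand = data.mean()

ss_t = ((data - grand) ** 2).sum()
ss_m = n_subj * ((data.mean(axis=0) - grand) ** 2).sum()   # conditions
ss_subj = k * ((data.mean(axis=1) - grand) ** 2).sum()     # subjects
ss_res = ss_t - ss_m - ss_subj                             # what's left

df_m, df_res = k - 1, (k - 1) * (n_subj - 1)
f_ratio = (ss_m / df_m) / (ss_res / df_res)
print(ss_m, ss_subj, ss_res, f_ratio)
```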
Assumptions
• Normality
• Independence of observations (apart from the repeated measurements themselves)
• HOV doesn’t make sense here
• Sphericity (less restrictive form of compound symmetry)
• Need at least 3 means for sphericity
• The difference scores between each pair of means (treatment levels) need
to fulfil HOV
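An informal look at that last point: compute the variance of each pairwise difference score and see whether they are roughly equal (hypothetical data; the formal check is Mauchly's test, with Greenhouse-Geisser correction when it fails):

```python
# Sphericity, informally: the variances of the pairwise difference scores
# should be roughly equal across condition pairs (data invented).
import numpy as np
from itertools import combinations

data = np.array([[8.0, 6.0, 4.0],
                 [7.0, 5.0, 4.0],
                 [9.0, 8.0, 6.0],
                 [6.0, 4.0, 2.0]])   # subjects x conditions

for i, j in combinations(range(data.shape[1]), 2):
    diff = data[:, i] - data[:, j]
    print(f"var(cond{i+1} - cond{j+1}) = {diff.var(ddof=1):.3f}")
# Very unequal variances would hint at a sphericity violation.
```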