QRM - Week 3 Lecture - Canvas

The document discusses comparing more than two groups or conditions using analysis of variance (ANOVA). It covers assumptions, what to do if assumptions are violated, interpretation and follow-up analyses, and reporting of results. It also briefly discusses within-subjects ANOVA and the different post hoc tests that can be used as follow-ups.

COMPARING MORE THAN 2 GROUPS/CONDITIONS
Sharon Morein
[email protected]
Comparing 3 groups or more
This lecture:
Between-subjects ANOVA
• Assumptions
• What to do if assumptions are violated?
• Interpretation and follow-up analyses
• Reporting results

Within-subjects ANOVA (just a little bit today, more next week)
• What to do if assumptions are violated?
Independent ANOVA
• Can compare 2 means* and more (3, 4 etc.)
• Multiple t-tests inflate type I error [with α = .05 and k comparisons: FWE = 1 − (0.95)^k] → too liberal (see the numeric sketch below)
• Workhorse – certainly within experimental design

• ANOVA omnibus test (F)
• H0: μ1 = μ2 = μ3 = … = μk
• H1: at least one mean differs from at least one other
• Usually not particularly useful by itself – needs follow-up tests
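A minimal numeric sketch of the familywise-error inflation mentioned above (Python; the values of α and k are only illustrative):

```python
# Familywise error rate for k independent tests, each at alpha = .05:
# FWE = 1 - (1 - alpha)^k
alpha = 0.05
for k in (1, 3, 6, 10):
    fwe = 1 - (1 - alpha) ** k
    print(f"k = {k:2d} comparisons -> FWE = {fwe:.3f}")
# k = 3 -> .143, k = 6 -> .265, k = 10 -> .401  (far above the nominal .05)
```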
Example: 4 different treatments for MDD (so k = 4)
[Figure: score distributions for the treatment groups (Group 1, Group 2, Group 3, …)]
Example – influence of drug on libido
• Each observation is comprised of 3 components:
  • Grand mean
  • Increment or decrement of the specific group mean
  • Noise/error (individual differences, measurement error, etc.)
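Written as an equation (a standard way to express this decomposition; the symbols τ_j for the group effect and ε_ij for the error are conventional notation, not from the slide):

$x_{ij} = \underbrace{\bar{x}_{\text{grand}}}_{\text{grand mean}} + \underbrace{\tau_j}_{\text{group } j \text{ effect}} + \underbrace{\epsilon_{ij}}_{\text{noise/error}}$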
Partitioning of variance
• Model: systematic/explained variance (+ error variance)
• Residual: error (unexplained variance)

$SS_T = SS_M + SS_R$
$df_T = df_M + df_R$

$SS_M = \sum n_i(\bar{x}_i - \bar{x}_{grand})^2$,  $df_M = k - 1$
$SS_R = \sum (x_{ij} - \bar{x}_i)^2$,  $df_R = N - k$
$SS_T = \sum (x_{ij} - \bar{x}_{grand})^2$,  $df_T = N - 1$
About a name: it’s all about the variances
Logic
• We calculate how much variability there is between all scores
Total Sum of squares (SST).
• We then calculate how much of this variability can be explained
by the ‘model’ we fit to the data
Model Sum of Squares (SSM) [variability due to the experimental
manipulation]
• And how much of this variability cannot be explained by our
‘model’,
Residual Sum of Squares (SSR) [variability due to error, e.g.,
individual differences in performance]
• We compare the amount of variability explained by the ‘model’
(experiment) to the error in the model (individual differences)
• This ratio is called the F-ratio.
• If the model explains much more variability than it leaves unexplained,
then the experimental manipulation has had a significant effect
on the outcome (DV).
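A minimal sketch of this partition-and-ratio logic in Python (the scores are hypothetical; scipy's f_oneway is used only as a cross-check):

```python
import numpy as np
from scipy import stats

# Hypothetical scores for three groups (e.g., three treatment conditions)
groups = [np.array([4., 5., 6., 5.]),
          np.array([7., 8., 6., 7.]),
          np.array([9., 8., 10., 9.])]

all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()
k, N = len(groups), all_scores.size

# Partition the total variability
ss_t = np.sum((all_scores - grand_mean) ** 2)                       # SST
ss_m = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)   # SSM (model)
ss_r = sum(np.sum((g - g.mean()) ** 2) for g in groups)             # SSR (residual)

ms_m = ss_m / (k - 1)        # MSM
ms_r = ss_r / (N - k)        # MSR
F = ms_m / ms_r

print(f"SST = {ss_t:.2f} = SSM {ss_m:.2f} + SSR {ss_r:.2f}")
print(f"F({k - 1}, {N - k}) = {F:.2f}")

# Same F from scipy's one-way ANOVA
F_check, p = stats.f_oneway(*groups)
print(f"scipy: F = {F_check:.2f}, p = {p:.4f}")
```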
DF
• Degrees of Freedom (df) are the number of values that are free
to vary.
• In general, the df are one less than the number of values used
to calculate the SS*
• Example: how many values are free to vary in a group of 10
observations if I HAVE to have a mean of 70? [ans=9]
• A mathematical restriction that needs to be put in place when
estimating one statistic from an estimate of another [you lose
one df for each parameter estimated prior to estimating the
(residual) standard deviation]
$MS_M = \frac{SS_M}{df_M}$,  $MS_R = \frac{SS_R}{df_R}$

• Mean square – we don’t want to be influenced by the number of observations, so divide each SS by its df (extrapolating to the population)
F ratio – general behaviour

• Has 2 df (df_M and df_R): $F = \frac{MS_M}{MS_R}$
• The larger the F, the more variance the model explains relative to the error variance
• F < 1 means the model explains less variance than it leaves unexplained (never significant)
• For two groups, $F(1, df_x) = t(df_x)^2$ (shown numerically below)
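A quick numerical illustration of the F = t² relationship for two groups (Python; the data are hypothetical, any two-group dataset would do):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, size=20)   # hypothetical group 1
b = rng.normal(0.5, 1.0, size=20)   # hypothetical group 2

t, p_t = stats.ttest_ind(a, b)      # independent t-test, df = 38
F, p_f = stats.f_oneway(a, b)       # one-way ANOVA, F(1, 38)

print(t ** 2, F)    # identical up to rounding
print(p_t, p_f)     # identical p-values
```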
Assumptions
• Errors: $\epsilon_{ij} \sim NID(0, \sigma^2_\epsilon)$
• Normal distribution within groups
• Independent distribution of observations
• Homogeneity of variance (HOV)
  • Variances for each experimental group do not differ (can be tested with Levene’s test, but bear in mind the n…)

[FYI: sometimes you will see the assumptions stated on the observations (x_ij) and sometimes on the errors (ε_ij) – since one is a linear transformation of the other, the assumptions are the same]
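A minimal sketch of the HOV check with Levene's test, assuming scipy is available (the group scores are hypothetical):

```python
from scipy import stats

# Hypothetical scores for three groups
g1 = [4, 5, 6, 5, 7]
g2 = [7, 8, 6, 7, 9]
g3 = [9, 8, 10, 9, 12]

# Levene's test of homogeneity of variance (H0: group variances are equal)
W, p = stats.levene(g1, g2, g3, center='median')
print(f"Levene W = {W:.2f}, p = {p:.3f}")
# A significant p suggests an HOV violation, but with very small or very large n
# the test can be under- or over-powered, hence the "bear in mind the n" caveat.
```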
Violation of Assumptions
When group sizes are equal – ANOVA is ‘relatively’ robust to violations of normality and HOV [see also p. 536]
• Relatively robust (in terms of type I and II error control)
• But not to the assumption of independence
  • Observations between groups HAVE to be independent, with no correlation between them allowed: otherwise the actual type I error rate can climb really fast (well above the nominal α, even past .5)!
When group sizes are unequal –
• Normality violation: F is affected in non-predictable ways
• HOV violation:
  • Too conservative if the larger group has the larger variance
  • Too liberal if the larger group has the smaller variance (reduced α control)
Alternatives and corrections
• Transform the data

• Corrected F tests [get around some of the unequal-n problem]
  • HOV violation: Welch F (slightly better for power) → can be followed by bootstrapped 95% CIs
  • Brown-Forsythe (not popular)

• Non-parametric option
  • Kruskal-Wallis: just like a Mann-Whitney but with more than 2 groups
  • Combine everyone and order (rank) them: if the sums of ranks are similar, the groups likely don’t differ
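A sketch of the Welch F in Python. It assumes the pingouin package (its welch_anova function takes long-format data); the data frame and column names here are hypothetical:

```python
import pandas as pd
import pingouin as pg   # assumption: pingouin is installed and provides welch_anova

# Hypothetical long-format data: one score per row, with a group label
df = pd.DataFrame({
    "group": ["placebo"] * 5 + ["low"] * 5 + ["high"] * 5,
    "score": [4, 5, 6, 5, 7,  7, 8, 6, 7, 9,  9, 8, 10, 9, 12],
})

# Welch's F: does not assume homogeneity of variance
print(pg.welch_anova(data=df, dv="score", between="group"))
```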
Results and follow-up
Omnibus: lets you in, but… not enough
• Multiple t-tests
  • A bad idea – we need control of type I error, otherwise: inflation!
• Post hoc tests
  • Not planned (no hypothesis)
  • Compare (all*) (pairs* of) means
• Contrasts / planned comparisons
  • Hypothesis driven
  • Planned a priori
  • Orthogonal
• Trend analysis – for ordinal IVs only
Contrasts & Orthogonal Comparisons
• Breaking down the variability between groups selectively
• ONLY K-1 comparisons (K= total number of groups in ANOVA)
• Why “contrast”? We end up with two ‘means’ – weights sum to 0 so
always F(1,X)
• Simple (pairwise) versus complex contrasts (e.g., B and A below)

Orthogonal (independent) are ‘cleanest’ but depends on purpose


How to figure if 2 contrasts are orthogonal? multiplying the weights will
result in 0 [linear transformation of weights does not change contrast]
• If there are more than 3 groups (K>3), there will be different sets of
orthogonal contrasts

• Options to select from


Placebo Lo dose Hi dose Σ
• Can create your own
• Non-orthogonal Contrast 2 -1 -1 0
A
Be sure to justify
Contrast 0 1 -1 0
B
A*B 0 -1 1 0
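A quick check of the orthogonality rule using the weights from the table above (Python/numpy; a minimal sketch):

```python
import numpy as np

# Contrast weights over (Placebo, Lo dose, Hi dose)
A = np.array([2, -1, -1])   # placebo vs. the two drug groups (complex contrast)
B = np.array([0,  1, -1])   # lo dose vs. hi dose (simple, pairwise contrast)

print(A.sum(), B.sum())     # each contrast's weights sum to 0
print(np.dot(A, B))         # elementwise products summed = 0 -> A and B are orthogonal
```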
A special kind of contrast - trend analysis
• Only makes sense when k groups (IV) vary on ordinal (or
above) scale
• e.g., study duration, stimulus duration or retention time
• e.g., class size
• e.g., drug dosage

• Number of levels
• Linear trend only with 2 and more
• Quadratic trend only with 3 and more
• Random vs. fixed effects (round 1)
• Different calculations
• Strong implications in some domains
(e.g., imaging)
Post-hoc Tests
• The Big Issue: too little or too much type I error control
• Too liberal vs. not enough power/ too conservative
• Per contrast vs. set of contrasts (family of contrasts)
• Many post hoc tests (e.g., Jamovi has 5, SPSS has 18 options!) – still lots of
disparity of opinions
• Some are very Liberal (LSD, N-K)
• Some intermediate (Bonferroni, Tukey, Holm)
• Some Conservative (Scheffe- all possible contrasts, including complex ones)
• Current (QRM) recommendations:
Assumptions (reasonably) met with equal n’s:
• Tukey HSD (Honest Significant Difference)
Safe Option with small k:
• Bonferroni, Holm
Unequal Variances:
• Games-Howell
Unequal Sample Sizes:
• Gabriel’s (small n), Hochberg’s GT2 (large n).
Post-hoc Tests

$\alpha_{Bonferroni} = \frac{\alpha}{\text{number of tests}}$

• Bonferroni
  • Select some pairwise comparisons (let’s say n of them)
  • New αn = 0.05 / total number of tests made, e.g., 0.05 / 3 = .0167
  • Essentially t-tests, each evaluated at the new level αn
• Tukey
  • All pairwise comparisons
  • Tends to be conservative with unequal n’s
• Scheffe
  • All possible simple and complex comparisons
  • Most conservative, reduced power
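A sketch of two of these options in Python, assuming statsmodels is available; the scores and group labels are hypothetical:

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical scores and group labels (equal n per group)
scores = np.array([4, 5, 6, 5, 7,  7, 8, 6, 7, 9,  9, 8, 10, 9, 12], dtype=float)
groups = np.repeat(["placebo", "low", "high"], 5)

# Tukey HSD: all pairwise comparisons with familywise alpha = .05
print(pairwise_tukeyhsd(endog=scores, groups=groups, alpha=0.05))

# Bonferroni by hand: each of the 3 pairwise t-tests is judged at .05 / 3
alpha_per_test = 0.05 / 3   # ≈ .0167
```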
Effect sizes in ANOVAs
• Eta-squared (biased: uses sums of squares and is a function of what else we have in the model)
  • Proportion of variance attributable to the effect
  • Sample specific, overestimates the effect size

  $\eta^2 = r^2 = \frac{SS_M}{SS_T} = \frac{SS_{effect}}{SS_{total}}$

• Partial eta-squared
  • How much of the variance in scores is accounted for by the effect
  • Proportion of variance attributable to the effect (+ error)

  $\eta_p^2 = \frac{SS_{effect}}{SS_{effect} + SS_{error}}$

• Omega squared
  • Variance accounted for by the effect in the population

  $\omega^2 = \frac{SS_M - df_M \cdot MS_R}{SS_T + MS_R}$

• Intraclass correlation [not commonly used in traditional ANOVA designs]
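A small worked example of these effect-size formulas (the SS and df values are hypothetical):

```python
# Effect sizes from the ANOVA sums of squares (hypothetical values)
ss_m, ss_r, df_m, df_r = 48.0, 36.0, 2, 12    # SS_model, SS_residual and their df
ss_t = ss_m + ss_r                            # one-way design: SS_total
ms_r = ss_r / df_r                            # MS_residual

eta_sq   = ss_m / ss_t                               # eta-squared
omega_sq = (ss_m - df_m * ms_r) / (ss_t + ms_r)      # omega-squared (population estimate)
print(f"eta^2 = {eta_sq:.3f}, omega^2 = {omega_sq:.3f}")
# omega^2 < eta^2, reflecting the correction for sample-specific overestimation
```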
Non-parametric option to ANOVA
Kruskal–Wallis test
Common reporting:
“Judgements of the aesthetic quality of wine were significantly affected by quality expectations, H(2) = 9.66, p = .022.”

3 groups (value and rank for each observation):

          Group A          Group B          Group C
          value   rank     value   rank     value   rank
           6.4     11       2.5     2        1.3     1
           6.8     12       3.7     3        4.1     4
           7.2     13       4.9     5.5      4.9     5.5
           8.4     18       5.4     8        5.5     9
           9.1     19       8.1     14
           9.7     21
            …       …        …       …        …       …
Sum ranks          131              58               42
Avg ranks          13.1             6.44             5.25

• Follow-up
  • Post hocs
  • Direct comparisons between groups (prevent α inflation!)
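A sketch of the rank-then-test logic in Python using scipy. The values are only the rows visible in the table above, so the result will not reproduce the reported H(2) = 9.66 (the full dataset is not shown here):

```python
import numpy as np
from scipy import stats

# Hypothetical wine ratings from three expectation groups (partial data)
a = [6.4, 6.8, 7.2, 8.4, 9.1, 9.7]
b = [2.5, 3.7, 4.9, 5.4, 8.1]
c = [1.3, 4.1, 4.9, 5.5]

# Rank everyone together (average ranks for ties), then compare the rank sums
ranks = stats.rankdata(np.concatenate([a, b, c]))
print(ranks[:len(a)].sum(), ranks[len(a):len(a) + len(b)].sum(), ranks[-len(c):].sum())

# Kruskal-Wallis H test
H, p = stats.kruskal(a, b, c)
print(f"H(2) = {H:.2f}, p = {p:.3f}")
```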
Repeated Measures (within subjects)
• Same participants contribute to different means
• Extension of the dependent t-test, where there are more than 2 conditions/means to compare
• Minimizes/shrinks the error variance
• Advantages and disadvantages
  • Useful when controlling for individual differences, more sensitive
  • More economical/efficient (time and ££££)
  • Carryover effects (practice, fatigue, etc.) → experimental design essential

• Hypotheses – same as for the between-subjects ANOVA
Partitioning of variance
Variation between individuals ($SS_{subjects}$) is partitioned out of the error term:

$SS_T = SS_M + SS_R$, where $SS_R$ splits into $SS_{subjects}$ and $SS_{error}$
$df_T = df_M + df_W$

$SS_{error(residual)} = SS_{Total} - SS_M - SS_{Subjects}$

$F = \frac{MS_M}{MS_{Res}}$

Shrinkage achieved!
Partitioning of variance
Variation among individuals: we’ve got multiple measures for each person.

$SS_M = \sum n_i(\bar{x}_i - \bar{x}_{grand})^2$,  $df_M = k - 1$
$SS_R = \sum (x_{ij} - \bar{x}_i)^2$,  $df_R = N - k$
$SS_{subjects} = \sum (\bar{x}_{subject} - \bar{x}_{grand})^2$,  $df_{subjects} = n - 1$
$SS_W = \sum (x_{ij} - \bar{x}_{subject})^2$ (variation of each score around that person’s own mean)
$SS_{error/residual} = SS_{Total} - SS_M - SS_{Subjects}$,  $df_{error} = (k - 1)(n - 1)$
$SS_T = \sum (x_{ij} - \bar{x}_{grand})^2$,  $df_T = N - 1$

$F = \frac{MS_M}{MS_{Res}}$
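A minimal sketch of this repeated-measures partition in Python (hypothetical scores, 5 participants × 3 conditions):

```python
import numpy as np

# Hypothetical scores: rows = participants, columns = conditions
scores = np.array([[ 8., 7., 6.],
                   [ 9., 8., 6.],
                   [ 7., 6., 4.],
                   [ 8., 8., 5.],
                   [10., 8., 7.]])
n, k = scores.shape
grand = scores.mean()

ss_total    = np.sum((scores - grand) ** 2)
ss_model    = n * np.sum((scores.mean(axis=0) - grand) ** 2)   # conditions (SS_M)
ss_subjects = k * np.sum((scores.mean(axis=1) - grand) ** 2)   # individual differences
ss_error    = ss_total - ss_model - ss_subjects                # shrunken error term

df_model, df_error = k - 1, (k - 1) * (n - 1)
F = (ss_model / df_model) / (ss_error / df_error)
print(f"F({df_model}, {df_error}) = {F:.2f}")
```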
Assumptions
• Normality
• Independence of observations (except the repeated measurements from the same participant)
• HOV doesn’t make sense here
• Sphericity (a less restrictive form of compound symmetry)
  • Need at least 3 means for sphericity to be an issue
  • The differences between each pair of means (treatment levels) need to fulfil HOV
• To assess sphericity – Mauchly’s test (H0: the variances of the differences between conditions are equal)
  • If Mauchly’s is n.s. (and depending on n) – all good
  • But if Mauchly’s p < .05…
What if sphericity is violated?
Or we don’t trust Mauchly’s test:
Need to “pay a price” – correct the df
• Greenhouse-Geisser estimate
• Huynh-Feldt estimate

• Non-parametric alternative (Friedman’s test)
Non-parametric alternative to repeated measures ANOVA
e.g., if your data is ordinal:
Friedman’s ANOVA
• “The preference of participants did not significantly differ between the three chocolate brands, χ2(2) = 0.20, p = .91.”

       Original Measure        Ranked Measure
sub     A      B      C         A      B      C
 1      5      2      2         3     1.5    1.5
 2     4.5     4      5         2      1      3
 3      2      1      1         3     1.5    1.5
 …
10      2      5      1         2      3      1
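A sketch of Friedman's test in Python using scipy; the ratings are only the four participants visible in the table above, so the result will not reproduce the reported χ2(2) = 0.20 exactly:

```python
from scipy import stats

# Hypothetical preference ratings: one list per brand, one entry per participant
A = [5.0, 4.5, 2.0, 2.0]
B = [2.0, 4.0, 1.0, 5.0]
C = [2.0, 5.0, 1.0, 1.0]

# Friedman's test ranks the three brands within each participant
chi2, p = stats.friedmanchisquare(A, B, C)
print(f"chi2(2) = {chi2:.2f}, p = {p:.3f}")
```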
