week3_slides
Navin Souda
2024-04-15
Multiple Testing I
▶ So we run ANOVA and find statistically significant differences
in the effects/means
▶ But this tells us nothing about which effects are significant
▶ Isn’t that what the t-tests are for?
▶ Sort of, but we have to be careful about our significance level
▶ For testing a single effect (e.g. a binary factor), we usually use
a significance level of 0.05 - practically, this means that we
have at most a 5% chance of rejecting the null
hypothesis when the null hypothesis is actually true
▶ But as we include more tests, each of them separately has up to a
5% chance of a false rejection - the probability of them all being correct
simultaneously could actually be much less than 95%
▶ Some boring but important math: let $A_i$ be the event that test $i$ is
correct, for $i = 1, \dots, p$
▶ Then the probability that at least one test fails is
$P\left(\left(\bigcap_i A_i\right)^c\right) = 1 - P\left(\bigcap_i A_i\right)$, and if all the tests are
independent, then $1 - P\left(\bigcap_i A_i\right) = 1 - \prod_i P(A_i) = 1 - (1 - \alpha)^p$
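To get a feel for how quickly $1 - (1 - \alpha)^p$ grows, here is a small sketch in base R. The helper name `fwer` and the particular values of `p` are illustrative choices, not part of the slides; the second table shows what happens if each test is instead run at level $\alpha / p$ (the Bonferroni idea used later in the example).

```r
# Probability of at least one false rejection among p independent tests,
# each run at level alpha: 1 - (1 - alpha)^p
# (`fwer` is a hypothetical helper name used only for this illustration)
fwer <- function(p, alpha = 0.05) 1 - (1 - alpha)^p

p <- c(1, 3, 6, 10, 20)  # illustrative numbers of tests

# Unadjusted: every test at alpha = 0.05
unadjusted <- data.frame(tests = p, fwer = round(fwer(p), 3))
print(unadjusted)

# Bonferroni-style: every test at alpha / p, which keeps the overall
# error probability at or below alpha
adjusted <- data.frame(tests = p, fwer = round(fwer(p, alpha = 0.05 / p), 3))
print(adjusted)
```

With 6 pairwise comparisons (as for a 4-level factor), the unadjusted probability is already well above 0.05, which is the motivation for adjusting.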
Multiple Testing II
[Figure: plot by treatment1 (levels 1, 2, 3, 4)]
Example (cont.)
[Figure: response by treatment2 (levels a, b, c)]
Example (cont.)
We might want to start by determining whether it is useful to consider
the interaction term. Based on the following plots, does it seem like the
interaction effect would be significant? Why or why not?
[Figure: interaction plot of response, one line per treatment1 level (1, 2, 3, 4)]
Example (cont.)
[Figure: interaction plot of response against treatment1 (1, 2, 3, 4), one line per treatment2 level (a, b, c)]
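Interaction plots like these can be drawn with `interaction.plot()` from base R. The sketch below is only an illustration: the slides' data are not available, so `dat` here is simulated, and the effect sizes, the noise level, and the `rep` column are all assumptions. Only the column names `response`, `treatment1`, and `treatment2` come from the example.

```r
# Hypothetical stand-in for `dat` (simulated, NOT the data from the slides)
set.seed(1)
dat <- expand.grid(treatment1 = factor(1:4),
                   treatment2 = factor(c("a", "b", "c")),
                   rep = 1:3)
dat$response <- 300 + 40 * as.integer(dat$treatment1) +
  10 * as.integer(dat$treatment1) * (dat$treatment2 == "c") +
  rnorm(nrow(dat), sd = 15)

# One line per level of trace.factor, showing cell means of the response.
# Roughly parallel lines suggest little interaction; lines that cross or
# fan out suggest the effect of one factor depends on the other.
with(dat, interaction.plot(x.factor = treatment2, trace.factor = treatment1,
                           response = response))
with(dat, interaction.plot(x.factor = treatment1, trace.factor = treatment2,
                           response = response))
```

The plots are a visual check only; whether the interaction is significant is settled by the ANOVA F-test for the interaction term.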
Example (cont.)
Now that we’ve decided to include the interaction term, let’s fit
our ANOVA model. Based on the resulting output, what can we
say about the significance of our treatments with regard to the
response? Which levels of each treatment are significant?
A: Both treatments are significant, as is their interaction. The ANOVA
output alone doesn’t tell us which levels of each treatment differ
significantly.
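A two-way ANOVA with interaction can be fit along these lines. This is a sketch under assumptions: `dat` is simulated here because the slides' data are not available, so the numbers it prints will not match the output discussed in the example.

```r
# Hypothetical stand-in for `dat` (simulated, NOT the data from the slides)
set.seed(1)
dat <- expand.grid(treatment1 = factor(1:4),
                   treatment2 = factor(c("a", "b", "c")),
                   rep = 1:3)
dat$response <- 300 + 40 * as.integer(dat$treatment1) +
  10 * as.integer(dat$treatment1) * (dat$treatment2 == "c") +
  rnorm(nrow(dat), sd = 15)

# treatment1 * treatment2 expands to both main effects plus their interaction
fit <- aov(response ~ treatment1 * treatment2, data = dat)

# One F-test per term (treatment1, treatment2, treatment1:treatment2).
# Each test says whether the term matters overall, not which levels differ.
anova_table <- summary(fit)[[1]]
print(anova_table)
```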
Example (cont.) I
According to the Tukey HSD output, which levels of treatment 1
and treatment 2 have significant differences? Does the Bonferroni
adjustment agree with the Tukey HSD? Why or why not?
##
## Pairwise comparisons using t tests with pooled SD
##
## data: dat$response and dat$treatment1
##
##   1       2       3
## 2 1.5e-05 -       -
## 3 3.9e-10 0.0032  -
## 4 1.5e-13 1.6e-07 0.0095
##
## P value adjustment method: bonferroni
##
## Pairwise comparisons using t tests with pooled SD
##
## data: dat$response and dat$treatment2
##
##   a b
## b 1 -
## c 1 1
##
## P value adjustment method: bonferroni
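Output of this kind comes from `TukeyHSD()` and `pairwise.t.test()`. The sketch below shows the calls on simulated data (an assumption, since the slides' data are not available), so its numbers will differ from the tables above. One possible reason the two methods can disagree: `TukeyHSD()` uses the residual variance of the full fitted model, while `pairwise.t.test()` pools the SD over the groups of a single factor and ignores the other factor.

```r
# Hypothetical stand-in for `dat` (simulated, NOT the data from the slides)
set.seed(1)
dat <- expand.grid(treatment1 = factor(1:4),
                   treatment2 = factor(c("a", "b", "c")),
                   rep = 1:3)
dat$response <- 300 + 40 * as.integer(dat$treatment1) +
  10 * as.integer(dat$treatment1) * (dat$treatment2 == "c") +
  rnorm(nrow(dat), sd = 15)

fit <- aov(response ~ treatment1 * treatment2, data = dat)

# Tukey HSD: all pairwise differences within each main effect,
# with adjusted p-values and simultaneous confidence intervals
tukey <- TukeyHSD(fit, which = c("treatment1", "treatment2"))
print(tukey)

# Bonferroni: pairwise t-tests with pooled SD, p-values multiplied by the
# number of comparisons (capped at 1)
bonf1 <- pairwise.t.test(dat$response, dat$treatment1,
                         p.adjust.method = "bonferroni")
bonf2 <- pairwise.t.test(dat$response, dat$treatment2,
                         p.adjust.method = "bonferroni")
print(bonf1)
print(bonf2)
```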
Example (cont.)
[Figure: residual diagnostic plots for the fitted model. Recoverable panel titles: "Scale-Location" and "Constant Leverage: Residuals vs Factor Levels". Axes include Residuals and Standardized residuals; observations 15, 23, and 35 are labelled.]
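Diagnostic panels like these are what `plot()` produces for a fitted `aov` object. A minimal sketch, again on simulated data (an assumption; the slides' data are not available), so the labelled observations will not match the figure:

```r
# Hypothetical stand-in for `dat` (simulated, NOT the data from the slides)
set.seed(1)
dat <- expand.grid(treatment1 = factor(1:4),
                   treatment2 = factor(c("a", "b", "c")),
                   rep = 1:3)
dat$response <- 300 + 40 * as.integer(dat$treatment1) +
  10 * as.integer(dat$treatment1) * (dat$treatment2 == "c") +
  rnorm(nrow(dat), sd = 15)

fit <- aov(response ~ treatment1 * treatment2, data = dat)

# Default diagnostics in a 2 x 2 grid: residuals vs fitted, normal Q-Q,
# scale-location, and (for a balanced design, where leverage is constant)
# residuals vs factor levels
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)  # restore the previous plotting layout
```

Things to look for: no trend or funnel shape in the residual panels (constant variance), points close to the line in the Q-Q panel (approximate normality), and no single observation standing far apart from the rest.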