MITx: Statistics, Computation & Applications

Statistics Refresher
Lecture 3: Multiple Hypothesis Testing

Caroline Uhler (MIT) MITx: Statistics, Computation & Applications Lecture 3 1/9
Some quotes and research findings

Giovannucci et al., Journal of the National Cancer Institute 87 (1995):


Intake of tomato sauce (p-value of 0.001), tomatoes (p-value of 0.03),
and pizza (p-value of 0.05) reduces the risk of prostate cancer;
but, for example, tomato juice (p-value of 0.67), cooked spinach
(p-value of 0.51), and many other vegetables are not significant.

"Orange cars are less likely to have serious damages that are discovered
only after the purchase."

Jelly Beans and Acne

The problem of selective inference

http://imgs.xkcd.com/comics/significant.png
Wonder-syrup

randomized group of 1000 people

measure 100 variables before and after taking the syrup: weight,
blood pressure, etc.

perform a paired t-test with a significance level of 5%

V := # false significant tests. If the syrup has no effect and the tests are independent: V ∼ Binomial(100, 0.05)

⇒ on average 5 out of 100 variables show a significant effect!
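
The arithmetic behind this slide can be checked with a few lines of Python. This is a minimal sketch that assumes the syrup has no effect on any variable and that the 100 tests are independent; the variable names are chosen here for illustration only.

```python
m = 100        # number of variables tested (from the slide)
alpha = 0.05   # significance level of each individual test (from the slide)

# Expected number of false significant tests when V ~ Binomial(m, alpha).
expected_false = m * alpha

# Probability of at least one false significant test: 1 - P(V = 0).
p_at_least_one = 1 - (1 - alpha) ** m

print(f"E[V] = {expected_false:.1f}")
print(f"P(V >= 1) = {p_at_least_one:.4f}")
```

Under these assumptions, at least one spurious "effect" is nearly guaranteed (probability of about 0.994), which is why an uncorrected 5% level offers little protection when 100 tests are run.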

Different protection levels

Compute p-values using methods that control:

family-wise error rate (FWER) ≤ α, where

FWER = P(at least one false significant result)

false discovery rate (FDR) ≤ α, where

FDR = expected fraction of false significant results among all significant results
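
In symbols, the two error rates can be sketched as follows. Here V is the number of false significant results, and R is the total number of significant results; R and the max(R, 1) convention are notation assumed for this sketch, not taken from the slides.

```latex
\mathrm{FWER} = P(V \ge 1), \qquad
\mathrm{FDR} = E\!\left[ \frac{V}{\max(R, 1)} \right]
```

The max(R, 1) in the denominator sets the fraction to 0 when no test is significant. Since the fraction is at most 1 and is nonzero only when V ≥ 1, FDR ≤ FWER, so controlling the FWER is the stricter requirement.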

Corrections for multiple testing
Bonferroni correction:
Reject H0 when: m · p-value ≤ α
where m is the total number of hypothesis tests performed
Bonferroni correction implies FWER ≤ α
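
A minimal sketch of the Bonferroni rule in Python. The function name `bonferroni_reject` and the example p-values are hypothetical, chosen only to illustrate the rule stated above.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return one boolean per test: True where m * p-value <= alpha."""
    m = len(p_values)  # total number of hypothesis tests performed
    return [m * p <= alpha for p in p_values]


# Hypothetical p-values, for illustration only.
example_p = [0.001, 0.011, 0.02, 0.045, 0.27]
bonferroni_decisions = bonferroni_reject(example_p)
print(bonferroni_decisions)
```

With these five p-values only the first test is rejected, although four of them fall below the uncorrected 5% level.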

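Holm-Bonferroni correction:
Sort p-values in increasing order: p(1) ≤ · · · ≤ p(m)
For i = 1, 2, ..., m: reject the hypothesis belonging to p(i) as long as (m − i + 1)p(i) ≤ α; stop at the first i where this fails and reject none of the remaining hypotheses (more power than Bonferroni)
Holm-Bonferroni correction implies FWER ≤ α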
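
The Holm-Bonferroni rule is usually applied as a step-down procedure: go through the sorted p-values from smallest to largest and stop at the first one that fails the threshold. The sketch below assumes that reading; the function name `holm_reject` and the example p-values are hypothetical.

```python
def holm_reject(p_values, alpha=0.05):
    """Step-down Holm procedure; returns one boolean per test, in input order."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda idx: p_values[idx])
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if (m - rank + 1) * p_values[idx] <= alpha:
            reject[idx] = True
        else:
            break  # first failure: no larger p-value is rejected
    return reject


# Hypothetical p-values, for illustration only.
example_p = [0.001, 0.011, 0.02, 0.045, 0.27]
holm_decisions = holm_reject(example_p)
print(holm_decisions)
```

On these p-values the procedure rejects the two smallest. The plain Bonferroni rule m · p-value ≤ α would reject only the smallest, since 5 · 0.011 = 0.055 > 0.05.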

Benjamini-Hochberg correction:
Sort p-values in increasing order: p(1) ≤ · · · ≤ p(m)
Find the largest i with mp(i) /i ≤ α and reject H0 for p(1), . . . , p(i)
Benjamini-Hochberg correction implies FDR ≤ α (for independent tests)
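
The Benjamini-Hochberg rule is usually applied as a step-up procedure: find the largest rank i whose sorted p-value satisfies the threshold, then reject every hypothesis up to that rank. The sketch below assumes that reading; the function name `benjamini_hochberg_reject` and the example p-values are hypothetical.

```python
def benjamini_hochberg_reject(p_values, alpha=0.05):
    """Step-up Benjamini-Hochberg procedure; one boolean per test, in input order."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda idx: p_values[idx])
    k = 0  # largest rank i with m * p(i) / i <= alpha (0 if there is none)
    for rank, idx in enumerate(order, start=1):
        if m * p_values[idx] / rank <= alpha:
            k = rank
    reject = [False] * m
    for idx in order[:k]:
        reject[idx] = True  # reject everything up to and including rank k
    return reject


# Hypothetical p-values, for illustration only.
example_p = [0.001, 0.011, 0.02, 0.045, 0.27]
bh_decisions = benjamini_hochberg_reject(example_p)
print(bh_decisions)
```

On these p-values the three smallest are rejected. Controlling the FDR instead of the FWER allows more discoveries from the same data, at the price of tolerating some false ones among them.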
Commonly accepted practice

No correction for multiple testing when generating hypotheses (but report number of tests performed)

FDR ≤ 10% in exploratory analysis or screening: balance between high power and low # of false significant results

FWER ≤ 5% in confirmatory analysis (e.g., Food and Drug Administration (FDA))

References

Lecture by Yoav Benjamini, THE expert for multiple testing issues:

http://simons.berkeley.edu/talks/yoav-benjamini-2013-12-11a
