Module 4 - Hypothesis Testing
Module 3 Recap
Go to menti.com and enter the code 87 01 65 7
Timothy Dobbins
Learning objectives
Formulate a research question as a hypothesis
Understand how 95% confidence intervals can be used to test a hypothesis
Worked example: Section 4.7
The diastolic blood pressure of 733 Pima Native American women was
measured
Histogram showed that diastolic blood pressure was approximately Normally
distributed
Sample mean: 72.4 mmHg
Sample standard deviation: 12.38 mmHg
95% CI: 72.4 ± 1.96 × 12.4 / √733
= 71.5 to 73.3 mmHg
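The interval above can be reproduced from the quoted summary statistics. This is a minimal sketch in R; the variable names are illustrative, and the t-based multiplier is shown only for comparison with the 1.96 used on the slide.

```r
# Summary statistics quoted in the worked example
n    <- 733     # number of women
xbar <- 72.4    # sample mean (mmHg)
s    <- 12.38   # sample standard deviation (mmHg)

se <- s / sqrt(n)                                      # standard error of the mean

ci_z <- xbar + c(-1, 1) * 1.96 * se                    # multiplier used on the slide
ci_t <- xbar + c(-1, 1) * qt(0.975, df = n - 1) * se   # t-based multiplier, n - 1 df

round(ci_z, 1)
round(ci_t, 1)
```

With a sample this large the two multipliers are almost identical, so both intervals round to the same limits.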
Worked example: Section 4.7
Mean diastolic blood pressure: 72.4 mmHg (95% CI: 71.5 to 73.3)
How do we interpret this?
Are these results consistent with the sample being drawn from the general US
population (mean = 71 mmHg)?
Worked example: Section 4.7
95% confidence interval can give informal evidence that the sample is not
consistent with the population
i.e. that the sample is different from the population
Confidence interval gives a range within which we are reasonably confident that
the true mean lies
Does not give a quantitative assessment of evidence against a null hypothesis of
equality to a hypothesised mean value
Hypothesis testing
Hypothesis: an idea or explanation for something that is based on known facts but
has not yet been proved
(ref: https://dictionary.cambridge.org/dictionary/english/hypothesis)
Hypothesis testing: use data to decide between two competing claims about a
population parameter
Two hypotheses:
• Null hypothesis
• Alternative hypothesis
Hypothesis testing: general framework
1. Formulate the null and alternative hypotheses
2. Calculate a test statistic from data
3. Calculate a P-value from the test statistic
4. Calculate descriptive statistics
5. Make a conclusion
Null hypothesis
• Often written as H0 (H zero)
• The hypothesis of no difference, no change, no effect
• Boring hypothesis
• Skeptical hypothesis
Alternative hypothesis
• Often written as HA
• The hypothesis of a difference, a change, an effect
• Interesting hypothesis
Worked example: Section 4.7
The diastolic blood pressure of 733 Pima Native American women was
measured
Sample mean: 72.4 mmHg
Sample standard deviation: 12.4 mmHg
Are these results consistent with the sample being drawn from the general US
population (71 mmHg)?
Worked example: Section 4.7
H0: The mean DBP of Pima Native American women is the same as the general
US population
HA: The mean DBP of Pima Native American women is not the same as the
general US population
H0: The mean DBP of Pima Native American women is 71 mmHg
HA: The mean DBP of Pima Native American women is not 71 mmHg
H0: μ = 71
HA: μ ≠ 71
Hypothesis testing: general framework
1. Formulate the null and alternative hypotheses
2. Calculate a test statistic from data (done by software)
3. Calculate a P-value from the test statistic (done by software)
4. Calculate descriptive statistics
5. Make a conclusion
The test statistic
Depends on the test being carried out
Test statistic for one-sample t-test:
• how many standard errors the sample mean is from the hypothesised mean
• $t = \frac{\bar{x} - \mu}{SE(\bar{x})} = \frac{\bar{x} - \mu}{s / \sqrt{n}}$ with n–1 degrees of freedom
Worked example: Section 4.7
$t = \frac{72.4052 - 71}{12.38216 / \sqrt{733}} = 3.07$
with 732 degrees of freedom
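The same arithmetic can be checked by hand in R. This is a small sketch using the values quoted in the worked example; the names `mu0` and `t_stat` are illustrative only.

```r
xbar <- 72.4052    # sample mean (mmHg)
mu0  <- 71         # hypothesised mean under H0 (mmHg)
s    <- 12.38216   # sample standard deviation (mmHg)
n    <- 733        # sample size

se     <- s / sqrt(n)          # standard error of the mean
t_stat <- (xbar - mu0) / se    # number of standard errors between xbar and mu0
df     <- n - 1                # degrees of freedom

round(t_stat, 2)   # 3.07
df                 # 732
```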
The P-value
Formal definition: P is the probability of obtaining the observed test statistic or
more extreme if the null hypothesis is true
From Worked Example:
x̅ = 72.4 mmHg
s = 12.4 mmHg
SE(x̅) = 0.4573 mmHg
From Module 3: sample mean follows a t-distribution with mean μ, sd=0.4573
and (733–1) df
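If H0 is true, μ = 71
[Figure: sampling distribution of the sample mean if H0 is true; the area in the upper tail beyond the observed mean is 0.0011]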
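The upper-tail area of 0.0011 can be obtained from the t-distribution function in R. This is a sketch that takes the test statistic (3.0725) and degrees of freedom (732) from the worked example as given; the variable names are illustrative.

```r
t_stat <- 3.0725   # observed test statistic from the worked example
df     <- 732      # degrees of freedom (733 - 1)

# Probability of a t value at least this large if H0 is true
p_upper <- pt(t_stat, df = df, lower.tail = FALSE)

round(p_upper, 4)  # 0.0011
```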
Revisit our hypotheses
H0: The mean DBP of Pima Native American women is the same as the general
US population
HA: The mean DBP of Pima Native American women is not the same as the
general US population
H0 can be false because the Pima women have higher diastolic blood pressure
or because they have lower diastolic blood pressure
This is a two-tailed test
[Figure: sampling distribution if H0 is true, with an area of 0.0011 shaded in each tail]
P = 2 × 0.0011 = 0.0022
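For a two-tailed test, the P-value adds the areas in both tails. A brief sketch in R, again assuming the test statistic and degrees of freedom from the worked example; the variable names are illustrative.

```r
t_stat <- 3.0725
df     <- 732

p_lower_tail <- pt(-abs(t_stat), df = df)                     # area below -|t|
p_upper_tail <- pt(abs(t_stat), df = df, lower.tail = FALSE)  # area above +|t|
p_two        <- p_lower_tail + p_upper_tail

round(p_two, 4)    # 0.0022
```

Because the t-distribution is symmetric, the two tail areas are equal, which is why the P-value can be written as twice one tail.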
The P-value
Formal definition: P is the probability of obtaining the observed test statistic or
more extreme if the null hypothesis is true
Less formal definitions
• P is the probability of observing a test statistic like yours (or more extreme) if the null
hypothesis is true
• P reflects the strength of the evidence against the null hypothesis
• The smaller the P-value, the more evidence against the null hypothesis
The P-value
P reflects the strength of the evidence against the null hypothesis
Allows us to reject the null hypothesis if P is small (often <0.05)
The opposite of reject is not accept
Strength of evidence against the null hypothesis, by P-value:
• 0.10 to 1.0: Little or no evidence
• 0.05 to 0.10: Weak evidence
• 0.01 to 0.05: Evidence
• 0.001 to 0.01: Strong evidence
• 0 to 0.001: Very strong evidence
P below 0.05 is traditionally labelled "Significant"
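The evidence scale can be written as a small lookup. This is a sketch only: the function name `evidence_label` is hypothetical, and how a P-value exactly on a cut-point is labelled is an assumption, since the scale does not say.

```r
# Map P-values to the evidence labels on the scale.
# Assumption: each band includes its upper cut-point, e.g. (0.001, 0.01].
evidence_label <- function(p) {
  cut(p,
      breaks = c(0, 0.001, 0.01, 0.05, 0.10, 1),
      labels = c("Very strong evidence", "Strong evidence", "Evidence",
                 "Weak evidence", "Little or no evidence"),
      include.lowest = TRUE)
}

as.character(evidence_label(c(0.0022, 0.184)))
```

Here 0.0022 falls in the "Strong evidence" band and 0.184 in the "Little or no evidence" band.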
The P-value is not: the probability of the null hypothesis being true, given your
data
[Diagram: Data ✘ Probability that H0 is true]
Worked example: Section 4.7
$t = \frac{72.4052 - 71}{12.38216 / \sqrt{733}} = 3.07$ with 732 degrees of freedom
P = 0.0022
We have strong evidence to reject the null hypothesis
There is strong evidence that Pima women have a higher diastolic blood
pressure than the general US population
Imagine a different sample
[Figure: sampling distribution if H0 is true, with an area of 0.092 shaded in each tail]
P = 2 × 0.092 = 0.184
Little or no evidence of a difference
Two types of errors
Type I error: Null hypothesis is rejected when the null hypothesis is true
• False positive
• Minimise this by requiring strong levels of evidence
• Traditionally called "Significance level" (e.g. P<0.05 or P<0.01)
Type II error: Null hypothesis is not rejected when the null hypothesis is false
• False negative
• Minimise this by enrolling an adequately sized sample
• Power: to be discussed in Module 10
[Illustration: smoke detector icons labelled Type I Error and Type II Error. Dead Battery by Viktor Ostrovsky from the Noun Project; Smoke Detector by Gregor Cresnar from the Noun Project]
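A simulation can illustrate what a Type I error rate means in practice. The sketch below is not from the original material: the number of simulated studies, the sample size per study and the seed are arbitrary assumptions. It repeatedly draws samples for which H0 is true by construction and counts how often P falls below 0.05.

```r
set.seed(4)        # arbitrary seed, for reproducibility only
n_sims <- 2000     # number of simulated studies (assumption)
n      <- 50       # sample size in each study (assumption)
alpha  <- 0.05     # significance level

# H0 is true by construction: every sample is drawn from a population with mean 71
p_values <- replicate(
  n_sims,
  t.test(rnorm(n, mean = 71, sd = 12.4), mu = 71)$p.value
)

# Proportion of studies that (wrongly) reject H0: the Type I error rate
type1_rate <- mean(p_values < alpha)
type1_rate
```

The proportion should land close to 0.05, since requiring P<0.05 fixes the false positive rate at about 5% when the null hypothesis is true.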
Hypothesis test vs confidence interval
Hypothesis test: There is strong evidence (P=0.002) that Pima women have a
higher diastolic blood pressure than the general US population
Provide both to give a reader the level of evidence and the size of the effect (with
its confidence interval)
If P<0.05, the 95% Confidence Interval will not cross the null value
• Pima example: μ = 71
• Hypothesis test: There is strong evidence (P=0.002) that Pima women have a
higher diastolic blood pressure than the general US population
• Mean diastolic blood pressure: 72.4 mmHg (95% CI: 71.5 to 73.3)
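The link between P<0.05 and a 95% confidence interval that excludes the null value can be checked numerically. This sketch uses the summary statistics from the Pima example; the null values 72 and 73.5 are hypothetical and included only for contrast.

```r
xbar <- 72.40518   # sample mean (mmHg)
s    <- 12.38216   # sample standard deviation (mmHg)
n    <- 733
se   <- s / sqrt(n)
df   <- n - 1

ci <- xbar + c(-1, 1) * qt(0.975, df) * se    # t-based 95% confidence interval

# 71 is the null value from the example; 72 and 73.5 are hypothetical
mu0    <- c(71, 72, 73.5)
t_stat <- (xbar - mu0) / se
p_two  <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)

outside_ci <- mu0 < ci[1] | mu0 > ci[2]
data.frame(mu0,
           p_two        = round(p_two, 4),
           p_below_0.05 = p_two < 0.05,
           outside_ci   = outside_ci)
```

For each null value the last two columns agree: P is below 0.05 exactly when the null value lies outside the interval.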
Null value depends on the test
Number of groups   Type of data   Measure of effect            Null value
Two                Binary         Difference in proportions    0
Two                Binary         Odds ratio                   1
Two                Binary         Relative risk                1
Statistical significance vs practical importance
[Figure: confidence intervals for an effect size (e.g. difference in means) compared against the null value (0) and the practically important difference (defined by researcher). Five scenarios are shown: definitely important, possibly important, not important, inconclusive result, true negative]
Ref: Bland JM, Altman DG. Statistics Notes: One and two sided tests of significance. BMJ 1994;309:248
Putting it all together: Stata
Statistics >
Summaries, tables and tests >
Classical tests of hypotheses >
t test (mean-comparison test)
Putting it all together: Stata
One-sample t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
     dbp |     733    72.40518    .4573454    12.38216    71.50732    73.30305
------------------------------------------------------------------------------
    mean = mean(dbp)                                              t =   3.0725
Ho: mean = 71                                    degrees of freedom =      732

    Ha: mean < 71               Ha: mean != 71                 Ha: mean > 71
 Pr(T < t) = 0.9989         Pr(|T| > |t|) = 0.0022          Pr(T > t) = 0.0011
   1-sided test                2-sided test                  1-sided test
    HA: μ < 71                  HA: μ ≠ 71                    HA: μ > 71
Putting it all together: R
> t.test(pima$dbp, mu=71)

        One Sample t-test

data:  pima$dbp
t = 3.0725, df = 732, p-value = 0.002202
alternative hypothesis: true mean is not equal to 71
95 percent confidence interval:
 71.50732 73.30305
sample estimates:
mean of x
 72.40518

2-sided test, HA: μ ≠ 71
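The `pima` data are not included here, so the sketch below builds a synthetic stand-in with the same sample size, mean and standard deviation (an assumption made purely so the call can be run). It shows how the `alternative` argument of `t.test` selects the one-sided or two-sided hypotheses reported in the Stata output.

```r
set.seed(1)   # arbitrary seed; the rescaling below fixes the mean and SD

# Synthetic stand-in for pima$dbp: 733 values with mean 72.40518 and SD 12.38216
z   <- as.numeric(scale(rnorm(733)))   # standardised to mean 0, SD 1
dbp <- 72.40518 + 12.38216 * z

two_sided <- t.test(dbp, mu = 71)                            # HA: mu != 71 (default)
greater   <- t.test(dbp, mu = 71, alternative = "greater")   # HA: mu > 71
less      <- t.test(dbp, mu = 71, alternative = "less")      # HA: mu < 71

round(c(two_sided = two_sided$p.value,
        greater   = greater$p.value,
        less      = less$p.value), 4)
```

Because the t-test depends on the data only through n, the mean and the standard deviation, the synthetic values give the same three P-values as the real output: 0.0022, 0.0011 and 0.9989.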
Putting it all together: Stata
The average diastolic blood pressure of Pima women was estimated as 72.4 mmHg (95%
confidence interval: 71.5 to 73.3 mmHg).
There is strong evidence (t=3.07 with 732 df, P=0.002) that the average diastolic blood
pressure of Pima women is higher than the general US population (71 mmHg).
Guidelines for presenting results from hypothesis tests
Instead:
‘we did not see evidence of a drug effect’,
‘we were unable to demonstrate a difference between groups’,
or simply ‘there was no statistically significant difference in response rates’
Guidelines for presenting results from hypothesis tests
P-values just above 5% are not a trend, and they are not moving
Avoid: P=0.07 shows a ‘trend towards statistical significance’ or ‘approaches
statistical significance’
Alternative language: ‘although we saw some evidence of improved response
rates in patients receiving the novel procedure, differences between groups did
not meet conventional levels of statistical significance’
Summary
Hypothesis tests and 95% confidence intervals are complementary
Hypothesis tests formally assess your research question
• do not provide estimates of the size of the effect
95% confidence intervals can be used informally to test a hypothesis
• comparing the interval to the null value
• do not quantify the level of evidence against the null hypothesis
Summary
One-sample t-test
• compares the mean from a single group to a hypothesised value
One- vs two-tailed statistical tests
• use two-tailed unless you have very strong reasons not to
Report the results from your hypothesis test in clear, easy to read language
Statistical significance is not the
same as practical importance
Learning activities
Activity 4.2 discusses hypothesis tests for (b) case-control and (c) cohort
studies
These parts can be omitted if you are not studying Foundations of Epidemiology
Finally
Quiz 2
• Covers Modules 3 and 4
• Opens: Friday, 23 June 2023, 9:00 AM
• Closes: Monday, 3 July 2023, 12:00 PM