Module 4 - Hypothesis Testing - 1pp

Public Health module, UNSW

PHCM9795 Lecture 4

Lecture will begin at 18:05 (Sydney time)

Module 3 Recap
Go to menti.com and enter the code 87 01 65 7

Menti recap open until 29 June, 2023


PHCM9795
Foundations of Biostatistics
Module 4: Hypothesis testing

Timothy Dobbins
Learning objectives
Formulate a research question as a hypothesis

Understand how 95% confidence intervals can be used to test a hypothesis

Conduct a hypothesis test using a one-sample t-test

Understand the difference between statistical significance and clinical importance

Understand the difference between one- and two-tailed statistical tests

Report the results of a hypothesis test
Worked example: Section 4.7
The diastolic blood pressure of 733 Pima Native American women was
measured
Histogram showed that diastolic blood pressure was approximately Normally
distributed
Sample mean: 72.4 mmHg
Sample standard deviation: 12.38 mmHg
95% CI: 72.4 ± 1.96 × 12.4 / √733
= 71.5 to 73.3 mmHg
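The interval above can be reproduced from the summary statistics alone. The following is a minimal Python sketch, not part of the course materials (which use Stata and R); it assumes the rounded inputs and the Normal multiplier 1.96 shown on the slide, and the variable names are illustrative.

```python
import math

# Summary statistics quoted on the slide
n = 733        # sample size
mean = 72.4    # sample mean (mmHg)
sd = 12.4      # sample standard deviation (mmHg), rounded as on the slide

# Standard error of the mean: s / sqrt(n)
se = sd / math.sqrt(n)

# 95% confidence interval using the multiplier 1.96 from the slide
lower = mean - 1.96 * se
upper = mean + 1.96 * se

print(f"SE = {se:.3f} mmHg")
print(f"95% CI: {lower:.1f} to {upper:.1f} mmHg")
```

With these inputs the printed interval should agree with the 71.5 to 73.3 mmHg given above.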
Worked example: Section 4.7
Mean diastolic blood pressure: 72.4 mmHg (95% CI: 71.5 to 73.3)
How do we interpret this?
Worked example: Section 4.7
Mean diastolic blood pressure: 72.4 mmHg (95% CI: 71.5 to 73.3)
Are these results consistent with the sample being drawn from the general US
population (mean = 71 mmHg)?
Worked example: Section 4.7
95% confidence interval can give informal evidence that the sample is not
consistent with the population
i.e. that the sample is different from the population
Confidence interval gives a range within which we are reasonably confident that
the true mean lies
Does not give a quantitative assessment of evidence against a null hypothesis of
equality to a hypothesised mean value
Hypothesis testing
Hypothesis: an idea or explanation for something that is based on known facts but
has not yet been proved
(ref: https://dictionary.cambridge.org/dictionary/english/hypothesis)

In Statistics: a hypothesis is a claim about a population parameter

Hypothesis testing: use data to decide between two competing claims about a
population parameter

Two hypotheses:

• Null hypothesis
• Alternative hypothesis
Hypothesis testing: general framework
1. Formulate the null and alternative hypotheses
2. Calculate a test statistic from data
3. Calculate a P-value from the test statistic
4. Calculate descriptive statistics
5. Make a conclusion
Null hypothesis
• Often written as H0 (H zero)
• The hypothesis of no difference, no change, no effect
• Boring hypothesis
• Skeptical hypothesis

Alternative hypothesis
• Often written as HA
• The hypothesis of a difference, a change, an effect
• Interesting hypothesis
Worked example: Section 4.7
The diastolic blood pressure of 733 Pima Native American women was
measured
Sample mean: 72.4 mmHg
Sample standard deviation: 12.4 mmHg
Are these results consistent with the sample being drawn from the general US
population (71 mmHg)?
Worked example: Section 4.7
H0: The mean DBP of Pima Native American women is the same as the general
US population

HA: The mean DBP of Pima Native American women is not the same as the
general US population

H0: The mean DBP of Pima Native American women is 71 mmHg

HA: The mean DBP of Pima Native American women is not 71 mmHg

H0: μ = 71

HA: μ ≠ 71
Hypothesis testing: general framework
1. Formulate the null and alternative hypotheses
2. Calculate a test statistic from data (done by software)
3. Calculate a P-value from the test statistic (done by software)
4. Calculate descriptive statistics
5. Make a conclusion
The test statistic
Depends on the test being carried out
Test statistic for one-sample t-test:
• how many standard errors is the sample mean from the hypothesised mean
• $t = \frac{\bar{x} - \mu}{SE(\bar{x})} = \frac{\bar{x} - \mu}{s/\sqrt{n}}$ with n–1 degrees of freedom
Worked example: Section 4.7
$t = \frac{72.4052 - 71}{12.38216/\sqrt{733}} = 3.07$
with 732 degrees of freedom
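The calculation above can be checked by hand. A minimal Python sketch follows, assuming only the summary statistics from the worked example; the function name `one_sample_t` is hypothetical and is not part of Stata, R or the course materials.

```python
import math

def one_sample_t(mean, sd, n, mu0):
    """Return (t, df) for a one-sample t-test of H0: mu = mu0."""
    se = sd / math.sqrt(n)   # standard error of the mean
    t = (mean - mu0) / se    # number of standard errors between mean and mu0
    return t, n - 1

t, df = one_sample_t(mean=72.4052, sd=12.38216, n=733, mu0=71)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

A positive t means the sample mean lies above the hypothesised mean; the sign is lost once a two-tailed P-value is calculated, so it is worth noting separately.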
Hypothesis testing: general framework
1. Formulate the null and alternative hypotheses
2. Calculate a test statistic from data (done by software)
3. Calculate a P-value from the test statistic (done by software)
4. Calculate descriptive statistics
5. Make a conclusion
The P-value
Formal definition: P is the probability of obtaining the observed test statistic or
more extreme if the null hypothesis is true
From Worked Example:
x̅ = 72.4 mmHg
s = 12.4 mmHg
SE(x̅) = 0.4573 mmHg
From Module 3: sample mean follows a t-distribution with mean μ, sd=0.4573
and (733–1) df
If H0 is true, μ = 71
[Figure: t-distribution of the sample mean if H0 is true; the area in the tail beyond the observed sample mean is 0.0011]
Revisit our hypotheses
H0: The mean DBP of Pima Native American women is the same as the general
US population
HA: The mean DBP of Pima Native American women is not the same as the
general US population
H0 can be false because the Pima women have higher diastolic blood pressure
or because they have lower diastolic blood pressure
This is a two-tailed test
[Figure: distribution if H0 is true, with an area of 0.0011 in each tail]
P = 2 × 0.0011
= 0.0022
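Software obtains the tail areas from the t-distribution. The sketch below is one way to approximate them with the Python standard library only: it integrates the t density numerically with Simpson's rule. This is an illustrative approach chosen for self-containment, not how Stata or R compute the value, and the function names, integration width and step count are assumptions.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def upper_tail(t, df, width=60.0, steps=20000):
    """Approximate Pr(T > t) by Simpson's rule on [t, t + width]."""
    h = width / steps   # steps must be even for Simpson's rule
    total = t_pdf(t, df) + t_pdf(t + width, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(t + i * h, df)
    return total * h / 3

one_tail = upper_tail(3.0725, 732)
p_two_tailed = 2 * one_tail   # both tails count as "more extreme"
print(f"one tail = {one_tail:.4f}, two-tailed P = {p_two_tailed:.4f}")
```

Doubling the single tail area is valid here because the t-distribution is symmetric about zero.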
The P-value
Formal definition: P is the probability of obtaining the observed test statistic or
more extreme if the null hypothesis is true
Less formal definitions
• P is the probability of observing a test statistic like yours (or more extreme) if the null
hypothesis is true
• P reflects the strength of the evidence against the null hypothesis
• The smaller the P-value, the more evidence against the null hypothesis
The P-value
P reflects the strength of the evidence against the null hypothesis
Allows us to reject the null hypothesis if P is small (often <0.05)
The opposite of reject is not accept

We do not accept the null hypothesis:
• we fail to reject the null hypothesis
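Black and white? Shades of grey

Black and white:
• P < 0.05: Significant
• P ≥ 0.05: Not significant

Shades of grey:
• P from 0.10 to 1.0: Little or no evidence
• P from 0.05 to 0.10: Weak evidence
• P from 0.01 to 0.05: Evidence
• P from 0.001 to 0.01: Strong evidence
• P below 0.001: Very strong evidence

See Table 4.1 for guidelines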
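The graded scale above can be written as a small lookup. This Python sketch is illustrative only: the function name is hypothetical, and the treatment of values falling exactly on a boundary (for example P = 0.05) is an assumption, since the scale does not specify it.

```python
def strength_of_evidence(p):
    """Map a P-value to the descriptive labels on the 'shades of grey' scale."""
    if not 0 <= p <= 1:
        raise ValueError("P-value must be between 0 and 1")
    if p < 0.001:
        return "very strong evidence"
    if p < 0.01:
        return "strong evidence"
    if p < 0.05:
        return "evidence"
    if p < 0.10:
        return "weak evidence"
    return "little or no evidence"

for p in (0.0022, 0.184):
    print(p, "->", strength_of_evidence(p))
```

The labels describe evidence against the null hypothesis; they are guidelines for wording, not rigid cut-offs.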


The P-value
The probability of obtaining a test statistic like yours, or more extreme, if the null
hypothesis is true
Null hypothesis → probability of obtaining your test statistic (or more extreme)

The P-value is not: the probability of the null hypothesis being true, given your
data

✘ Data → probability that H0 is true
Worked example: Section 4.7
$t = \frac{72.4052 - 71}{12.38216/\sqrt{733}} = 3.07$ with 732 degrees of freedom
P = 0.0022
We have strong evidence to reject the null hypothesis
There is strong evidence that Pima women have a higher diastolic blood
pressure than the general US population
Imagine a different sample
[Figure: distribution if H0 is true, with a tail area of 0.092]
P = 2 × 0.092
= 0.184
Little or no evidence of a difference
Two types of errors
Type I error: Null hypothesis is rejected when the null hypothesis is true

• False positive
• Minimise this by requiring strong levels of evidence
• Traditionally called "Significance level" (e.g. P<0.05 or P<0.01)
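A short simulation can make the Type I error rate concrete. The Python sketch below repeatedly draws samples for which the null hypothesis is true and counts how often a test at the 5% level rejects it. All settings (sample size of 100, 2000 repetitions, the seed, and the critical value 1.984 for 99 degrees of freedom) are assumptions chosen for illustration and do not come from the course materials.

```python
import math
import random
import statistics

random.seed(1)   # fixed seed so the run is repeatable

MU0, SIGMA, N = 71.0, 12.4, 100
T_CRIT = 1.984   # approximate two-sided 5% critical value of t with 99 df
N_SIMS = 2000

rejections = 0
for _ in range(N_SIMS):
    # Draw a sample for which the null hypothesis is TRUE (mean = MU0)
    sample = [random.gauss(MU0, SIGMA) for _ in range(N)]
    se = statistics.stdev(sample) / math.sqrt(N)
    t = (statistics.mean(sample) - MU0) / se
    if abs(t) > T_CRIT:   # "significant" at the 5% level
        rejections += 1

type_i_rate = rejections / N_SIMS
print(f"Proportion of true null hypotheses rejected: {type_i_rate:.3f}")
```

The proportion should be close to 0.05, and requiring a smaller P-value (a larger critical value) would lower it.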

Type I Error
[Illustration: a smoke detector with a full battery and a toaster. Icons: Full Battery by Viktor Ostrovsky, Toaster by arejoenah and Smoke Detector by Gregor Cresnar, from the Noun Project]
Two types of errors
Type II error: Fail to reject the null hypothesis when the null hypothesis is false

• False negative
• Minimise this by enrolling an adequately sized sample
• Power: to be discussed in Module 10
Type II Error
[Illustration: a fire and a smoke detector with a dead battery. Icons: Dead Battery by Viktor Ostrovsky and Smoke Detector by Gregor Cresnar, from the Noun Project]
Hypothesis testing: general framework
1. Formulate the null and alternative hypotheses
2. Calculate a test statistic from data
3. Calculate a P-value from the test statistic
4. Calculate descriptive statistics
5. Make a conclusion
Hypothesis test vs confidence interval
Hypothesis test: There is strong evidence (P=0.002) that Pima women have a
higher diastolic blood pressure than the general US population

• Provides evidence against the null hypothesis


• Does not provide an estimate of the size of the difference
Mean and 95% confidence interval: Mean diastolic blood pressure: 72.4 mmHg
(95% CI: 71.5 to 73.3)

• Provides an estimate of the size of the difference
• Does not provide formal evidence against the null hypothesis
Hypothesis test and confidence interval are complementary

Provide both to give a reader the level of evidence and the size of the effect (with
its confidence interval)

Should give similar conclusions:

If P<0.05, the 95% Confidence Interval will not cross the null value

• Pima example: μ = 71
• Hypothesis test: There is strong evidence (P=0.002) that Pima women have a
higher diastolic blood pressure than the general US population
• Mean diastolic blood pressure: 72.4 mmHg (95% CI: 71.5 to 73.3)
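The agreement between the two approaches can be checked directly for the Pima example. This Python sketch assumes a two-sided 5% critical value of about 1.9632 for the t-distribution with 732 degrees of freedom (an approximation supplied here, not a value from the slides); the variable names are illustrative.

```python
# Summary statistics from the Pima worked example
mean, se, null_value = 72.40518, 0.4573454, 71.0
T_CRIT = 1.9632   # assumed two-sided 5% critical value of t with 732 df

# 95% confidence interval
lower, upper = mean - T_CRIT * se, mean + T_CRIT * se

# Two equivalent checks at the 5% level
ci_excludes_null = not (lower <= null_value <= upper)
t_is_significant = abs((mean - null_value) / se) > T_CRIT   # same as P < 0.05

print(f"95% CI: {lower:.1f} to {upper:.1f}")
print("CI excludes null value:", ci_excludes_null)
print("P < 0.05:", t_is_significant)
```

The two checks always agree when the interval and the test use the same standard error and critical value, because both compare the distance between the sample mean and the null value with the same multiple of the standard error.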
Null value depends on the test
Number of groups   Type of data   Measure of effect           Null value
----------------   ------------   -------------------------   ----------
One                Continuous     Mean                        μ
One                Binary         Proportion                  p
Two                Continuous     Difference in means         0
Two                Binary         Difference in proportions   0
Two                Binary         Odds ratio                  1
Two                Binary         Relative risk               1
Statistical significance vs practical importance
[Figure: five scenarios (definitely important, possibly important, not important, inconclusive result, true negative) plotted on an effect size axis (e.g. difference in means), with reference lines at the null value (0) and at the practically important difference defined by the researcher; each scenario is labelled as statistically significant or not statistically significant]

Adapted from Armitage, Berry, Matthews; Statistical Methods in Medical Research (2002).
Two-tailed vs one-tailed test
Pima example - H0: μ = 71
Two-tailed test - HA: μ ≠ 71
• Null hypothesis can be false because Pima have higher or lower blood pressure
One-tailed test: HA: μ > 71
• Null hypothesis can be false only if Pima have higher blood pressure
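The choice of alternative hypothesis changes which tail areas are counted. The Python sketch below illustrates this for the Pima test statistic. It uses the standard Normal distribution as an approximation to the t-distribution with 732 degrees of freedom, which is an assumption made to keep the example short, so the values will be close to, but not identical to, those reported by Stata or R.

```python
import math

def normal_upper_tail(z):
    """Pr(Z > z) for a standard Normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

t = 3.0725   # test statistic from the Pima example (732 df)

p_greater = normal_upper_tail(t)            # one-tailed, HA: mu > 71
p_less = 1 - p_greater                      # one-tailed, HA: mu < 71
p_two_tailed = 2 * min(p_greater, p_less)   # two-tailed, HA: mu != 71

print(f"HA: mu > 71   P = {p_greater:.4f}")
print(f"HA: mu < 71   P = {p_less:.4f}")
print(f"HA: mu != 71  P = {p_two_tailed:.4f}")
```

The one-tailed P-value in the observed direction is half the two-tailed value, which is why a one-tailed test needs to be justified before the data are seen.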
Two-tailed vs one-tailed test
Two-tailed tests are standard in health and medical research
Allows for detection of improvements in health as well as adverse events
"In general: a one sided test is appropriate when a large difference in [the null]
direction would lead to the same action as no difference at all
Expectation of a difference in a particular direction is not adequate justification
Example, Galloe et al found that oral magnesium significantly increased the risk
of cardiac events, rather than decreasing it as they had hoped"

Ref: Bland JM, Altman DG. Statistics Notes: One and two sided tests of significance. BMJ 1994;309:248
Putting it all together: Stata
Statistics >
Summaries, tables and tests >
Classical tests of hypotheses >
t test (mean-comparison test)
Putting it all together: Stata
One-sample t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
     dbp |     733    72.40518    .4573454    12.38216    71.50732    73.30305
------------------------------------------------------------------------------
    mean = mean(dbp)                                              t =   3.0725
Ho: mean = 71                                    degrees of freedom =      732

    Ha: mean < 71               Ha: mean != 71                 Ha: mean > 71
 Pr(T < t) = 0.9989         Pr(|T| > |t|) = 0.0022          Pr(T > t) = 0.0011
    1-sided test                 2-sided test                   1-sided test
    HA: μ < 71                   HA: μ ≠ 71                     HA: μ > 71
Putting it all together: R
> t.test(pima$dbp, mu=71)

	One Sample t-test

data:  pima$dbp
t = 3.0725, df = 732, p-value = 0.002202
alternative hypothesis: true mean is not equal to 71
95 percent confidence interval:
 71.50732 73.30305
sample estimates:
mean of x
 72.40518

2-sided test: HA: μ ≠ 71
Hypothesis testing: general framework
1. Formulate the null and alternative hypotheses
2. Calculate a test statistic from data
3. Calculate a P-value from the test statistic
4. Calculate descriptive statistics
5. Make a conclusion
Putting it all together: Stata
One-sample t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
     dbp |     733    72.40518    .4573454    12.38216    71.50732    73.30305
------------------------------------------------------------------------------
    mean = mean(dbp)                                              t =   3.0725
Ho: mean = 71                                    degrees of freedom =      732

    Ha: mean < 71               Ha: mean != 71                 Ha: mean > 71
 Pr(T < t) = 0.9989         Pr(|T| > |t|) = 0.0022          Pr(T > t) = 0.0011

The average diastolic blood pressure of Pima women was estimated as 72.4 mmHg (95%
confidence interval: 71.5 to 73.3 mmHg).

There is strong evidence (t=3.07 with 732 df, P=0.002) that the average diastolic blood
pressure of Pima women is higher than the general US population (71 mmHg).
Putting it all together: R
One Sample t-test

data: pima$dbp
t = 3.0725, df = 732, p-value = 0.002202
alternative hypothesis: true mean is not equal to 71
95 percent confidence interval:
71.50732 73.30305
sample estimates:
mean of x
72.40518

The average diastolic blood pressure of Pima women was estimated as 72.4 mmHg (95%
confidence interval: 71.5 to 73.3 mmHg).

There is strong evidence (t=3.07 with 732 df, P=0.002) that the average diastolic blood
pressure of Pima women is higher than the general US population (71 mmHg).
Guidelines for presenting results from hypothesis tests

Do not accept the null hypothesis

H0 is rejected or not rejected

If P ≥0.05 we cannot say:


‘the drug was ineffective’
‘there was no difference between groups’
or ‘response rates were unaffected’

Instead:
‘we did not see evidence of a drug effect’,
‘we were unable to demonstrate a difference between groups’,
or simply ‘there was no statistically significant difference in response rates’

Guidelines for reporting of statistics for clinical research in urology.
BJU Int. 2019 Mar;123(3):401-410. doi: 10.1111/bju.14640
Guidelines for presenting results from hypothesis tests

P-values just above 5% are not a trend, and they are not moving
Avoid: P=0.07 shows a ‘trend towards statistical significance’ or ‘approaches
statistical significance’
Alternative language: ‘although we saw some evidence of improved response
rates in patients receiving the novel procedure, differences between groups did
not meet conventional levels of statistical significance’

Guidelines for reporting of statistics for clinical research in urology.
BJU Int. 2019 Mar;123(3):401-410. doi: 10.1111/bju.14640
Guidelines for presenting results from hypothesis tests

P-values and 95% CIs do not quantify the probability of a hypothesis


P=0.03 does not mean that there is 3% probability that the findings are due to
chance
A 95% CI should not be interpreted as a 95% certainty the true parameter value is
in the range of the 95% CI

Guidelines for reporting of statistics for clinical research in urology.


BJU Int. 2019 Mar;123(3):401-410. doi: 10.1111/bju.14640
Reporting P-values
Report P values to a single significant figure, or two significant figures if P is close to 0.05 (say, 0.01 to 0.2)
Significant figures: the digits of your P-value counted from the first non-zero digit
Do not report ‘not significant’ (‘NS’) for P values of ≥0.05
Very low P values can be reported as P < 0.001 or similar
• Replace Stata's 0.0000 by <0.0001 (or 0.000 by <0.001)
• Replace R’s “p-value < 2.2e-16” by <0.0001
The following P values are reported appropriately:
<0.001, 0.004, 0.045, 0.13, 0.3, 1.

Guidelines for reporting of statistics for clinical research in urology.
BJU Int. 2019 Mar;123(3):401-410. doi: 10.1111/bju.14640
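The reporting rules above can be applied mechanically. The following Python sketch is one possible reading of them: the function name `format_p` is hypothetical, the lower reporting limit is fixed at 0.001 for simplicity, and the range 0.01 to 0.2 is taken as inclusive, which is an assumption.

```python
def format_p(p):
    """Format a P-value following the reporting guidance on the slide."""
    if not 0 <= p <= 1:
        raise ValueError("P-value must be between 0 and 1")
    if p < 0.001:
        return "<0.001"     # very low P-values
    if 0.01 <= p <= 0.2:
        return f"{p:.2g}"   # two significant figures near 0.05
    return f"{p:.1g}"       # otherwise one significant figure

for p in (2.2e-16, 0.002202, 0.045, 0.184, 0.3, 1.0):
    print(p, "->", format_p(p))
```

Applied to the Pima example, the software value 0.002202 is reported as 0.002, matching the wording used in the reported results.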
Finally
The P value is just one statistic that helps interpret a study; it does not determine
our interpretations.
Drawing conclusions for research or clinical practice from a clinical research
study requires evaluation of the strengths and weakness of study methodology,
the results of other pertinent data published in the literature, biological
plausibility, and effect size.
Sound and nuanced scientific judgment cannot be replaced by just checking
whether one of the many statistics in a paper is or is not P < 0.05.

Guidelines for reporting of statistics for clinical research in urology.
BJU Int. 2019 Mar;123(3):401-410. doi: 10.1111/bju.14640
Summary
Hypothesis tests and 95% confidence intervals are complementary
Hypothesis tests formally assess your research question
• do not provide estimates of the size of the effect
95% confidence intervals can be used informally to test a hypothesis
• comparing the interval to the null value
• do not quantify the level of evidence against the null hypothesis
Summary
One-sample t-test
• compares the mean from a single group to a hypothesised value
One- vs two-tailed statistical tests
• use two-tailed unless you have very strong reasons not to
Report the results from your hypothesis test in clear, easy to read language
Statistical significance is not the
same as practical importance
Learning activities
Activity 4.2 discusses hypothesis tests for (b) case-control and (c) cohort
studies
These parts can be omitted if you are not studying Foundations of Epidemiology
Finally
Quiz 2
• Covers Modules 3 and 4
• Opens: Friday, 23 June 2023, 9:00 AM
• Closes: Monday, 3 July 2023, 12:00 PM

Request for 5-minute feedback on Moodle - tell me what you think!


• What have you found most useful/interesting/surprising in this course so far?
• What have you found most challenging/frustrating/difficult in this course so far?
• What would you change if you were running this course?
