Module 3b - Random Sampling and Sampling Error

Public Health module, UNSW


Module 3b – Random sampling and sampling error

Katrina Blazek
PHCM9794: Foundations of Epidemiology
Learning outcomes
• State and distinguish between the main sources of bias in
epidemiological studies and distinguish between random and
systematic error
• Understand how to interpret statistical significance and confidence
intervals
Overview
Random sampling
Random sampling error
Confidence intervals and P values
Precision
Type I and Type II errors
A population of people
Let’s select 20 people randomly
This is our sample
We can describe the sample

Mean SBP = 126 mmHg
Random selection
Selection of participants for a study on the basis of chance
• Each participant in source population has same chance
(probability) of being included
• E.g. using a random number generator

Requires a sampling frame – a list of people in the source population, to which the random selection process is applied

Random selection tends to produce a representative sample (any one sample can still differ from the population by chance)
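A minimal sketch of random selection from a sampling frame, using Python's built-in random number generator. The frame of 1,000 ID numbers and the seed are hypothetical, chosen only for illustration.

```python
import random

# Hypothetical sampling frame: ID numbers for a source population of 1,000 people.
sampling_frame = [f"ID{n:04d}" for n in range(1, 1001)]

# Fixed seed (an arbitrary choice) so the same draw can be reproduced.
rng = random.Random(2024)

# Simple random sampling without replacement: every person in the frame
# has the same chance (20 in 1,000) of being selected.
sample = rng.sample(sampling_frame, k=20)

print(sample)
```

Because the draw is without replacement, nobody can be selected twice.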


What if we select a different sample?
Here’s our second sample
The two samples are different
Random sampling error
Sample 1: Mean SBP = 126 mmHg
Sample 2: Mean SBP = 121 mmHg
• Sample means differ from each other, just by chance
• May also differ from (unobserved) population mean
• This is OK. Handled with confidence intervals and p values
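Random sampling error can be illustrated with a small simulation. This is a sketch only: the population below is simulated (mean 125 mmHg and SD 20 mmHg are assumed values, not taken from the slides), so the printed means will not match the 126 and 121 mmHg shown above.

```python
import random
import statistics

rng = random.Random(1)  # arbitrary seed for reproducibility

# Hypothetical population: 10,000 SBP values (mmHg); mean 125 and SD 20 are assumptions.
population = [rng.gauss(125, 20) for _ in range(10_000)]
population_mean = statistics.mean(population)

# Two independent random samples of 20 people each.
sample_1 = rng.sample(population, k=20)
sample_2 = rng.sample(population, k=20)

mean_1 = statistics.mean(sample_1)
mean_2 = statistics.mean(sample_2)

print(f"Population mean: {population_mean:.1f} mmHg")
print(f"Sample 1 mean:   {mean_1:.1f} mmHg")
print(f"Sample 2 mean:   {mean_2:.1f} mmHg")
```

The two sample means differ from each other and from the population mean even though both samples were drawn the same way.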
Confidence intervals
Range of values within which we are reasonably confident the
true (unobserved) mean lies

Confidence intervals can be calculated for many types of statistics, e.g. proportion, risk ratio, odds ratio etc.

Most common are 95% confidence intervals
• 99% (wider) and 90% (narrower) sometimes used
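As a rough sketch, a confidence interval for a mean can be calculated as mean ± z × standard error. The readings below are hypothetical, and the normal approximation is an assumption made to keep the example short; with a sample this small, a t-based interval would be slightly wider.

```python
import math
import statistics
from statistics import NormalDist

def mean_ci(values, level=0.95):
    """Return (mean, lower, upper) using a normal approximation."""
    n = len(values)
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)     # about 1.96 for a 95% interval
    return mean, mean - z * se, mean + z * se

# Hypothetical SBP readings (mmHg) for a sample of 20 people.
sbp = [118, 142, 109, 131, 127, 150, 98, 122, 135, 116,
       140, 111, 129, 124, 137, 105, 133, 119, 146, 128]

for level in (0.90, 0.95, 0.99):
    mean, lower, upper = mean_ci(sbp, level)
    print(f"{level:.0%} CI: {lower:.1f} to {upper:.1f} mmHg (mean {mean:.1f})")
```

The same data give a narrower 90% interval and a wider 99% interval.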
95% Confidence Intervals (CI)
Sample 1: Mean SBP = 126 mmHg (95% CI 117 to 135 mmHg)
Sample 2: Mean SBP = 121 mmHg (95% CI 112 to 130 mmHg)
95% Confidence Interval
If we repeated the study many times, about 95% of the confidence intervals calculated would contain the true (unobserved) population mean

Can be interpreted as the level of confidence (95%) we have that the true value lies within the given range

NOT: 95% probability that the true mean lies in this interval
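The "repeat the study many times" interpretation can be checked by simulation. In this sketch the true mean, SD, sample size and number of repeats are all assumed values, and the interval uses a normal approximation.

```python
import math
import random
import statistics
from statistics import NormalDist

rng = random.Random(42)  # arbitrary seed for reproducibility

# Assumed values for the simulation (not from the slides).
TRUE_MEAN, TRUE_SD, N, REPEATS = 125.0, 20.0, 50, 2000
z = NormalDist().inv_cdf(0.975)  # about 1.96

covered = 0
for _ in range(REPEATS):
    # One simulated "study": a random sample of N people.
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(N)
    # Does this study's 95% CI contain the true mean?
    if mean - z * se <= TRUE_MEAN <= mean + z * se:
        covered += 1

coverage = covered / REPEATS
print(f"Proportion of intervals containing the true mean: {coverage:.3f}")
```

In a simulation we know the true mean; in a real study we do not, and we see only one of these intervals.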
Objective: To evaluate longer term symptoms and health outcomes
associated with post-covid-19 condition within a cohort of individuals
with a SARS-CoV-2 infection.

Results: 22.9% (95% confidence interval 20.4% to 25.6%) of individuals infected with SARS-CoV-2 did not fully recover by six months.

We are 95% confident that in the population the true proportion of those who
don’t recover from SARS-CoV-2 within 6 months is between 20.4% and 25.6%

BMJ 2023;381:e074425 | doi: 10.1136/bmj-2022-074425


What if we collected a 3rd, smaller sample?
Sample 1: Mean SBP = 126 mmHg (95% CI 117 to 135 mmHg)
Sample 2: Mean SBP = 121 mmHg (95% CI 112 to 130 mmHg)
Sample 3: Mean SBP = 115 mmHg (95% CI 98 to 132 mmHg). The interval is wider
Precision
The larger the sample, the more precise the estimate
• Increasing sample size decreases confidence interval width
• Decreasing sample size increases confidence interval width
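A short sketch of why precision improves with sample size: under a normal approximation, the width of a 95% CI for a mean is 2 × 1.96 × SD / √n. The SD of 20 mmHg is an assumed value.

```python
import math
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)  # about 1.96
ASSUMED_SD = 20.0                # hypothetical SD of SBP in mmHg

def ci_width(n, sd=ASSUMED_SD):
    """Width of a 95% CI for a mean (normal approximation)."""
    return 2 * z * sd / math.sqrt(n)

for n in (10, 20, 80, 320):
    print(f"n = {n:>3}: 95% CI width = {ci_width(n):.1f} mmHg")
```

Width shrinks with the square root of n, so halving the interval width takes four times as many people.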
Hypothesis testing
Confidence interval gives a likely range within which we are reasonably confident that the true value lies

Does not give a quantitative assessment of evidence against a null-hypothesis
Hypothesis testing in a nutshell
1. Formulate a null hypothesis – no difference (no effect)
2. Calculate a ‘test statistic’ from your data
3. Obtain a P value

(Covered in more detail in Foundations of Biostatistics)
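The three steps can be sketched with a simple z test. The sample mean (126 mmHg) and sample size (20) follow the earlier slides; the null value of 120 mmHg and the SD of 20 mmHg are assumptions made for illustration, and a t test would normally be preferred for a sample this small.

```python
import math
from statistics import NormalDist

# Step 1: null hypothesis - the population mean SBP is 120 mmHg (assumed value).
NULL_MEAN = 120.0

# Summary data: mean and n from the slides; the SD is an assumption.
sample_mean = 126.0
sample_sd = 20.0
n = 20

# Step 2: test statistic - how many standard errors the sample mean is from the null value.
standard_error = sample_sd / math.sqrt(n)
z = (sample_mean - NULL_MEAN) / standard_error

# Step 3: two-sided P value - probability of a result at least this extreme
# if the null hypothesis is true.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, P = {p_value:.2f}")
```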


What is P?
The probability of obtaining data like yours, or more extreme, if
the null hypothesis is true

NOT: The probability the null hypothesis is true

Reminder: Probabilities range from 0 (the event will never occur) to 1 (the event will always occur)
Statistical significance
Black and white?
• P from 0 to 0.05: Significant

Shades of grey
• P from 0.10 to 1.0: Little or no evidence
• P from 0.05 to 0.10: Weak evidence
• P from 0.01 to 0.05: Evidence
• P from 0.001 to 0.01: Strong evidence
• P from 0 to 0.001: Very strong evidence
Objective: To assess whether an easy-to-use multifaceted intervention for children presenting to primary care with respiratory tract infections would reduce antibiotic dispensing, without increasing hospital admissions for respiratory tract infection.

Result: No evidence was found that antibiotic dispensing differed between intervention practices … and control practices … (rate ratio 1.011, 95% confidence interval 0.992 to 1.029; P=0.25).

P=0.25 is the probability of obtaining a rate ratio (RR) at least as far from 1 as 1.011, just by chance, if the null hypothesis is true
Not the probability that there is no difference between groups

BMJ 2023;381:e072488 | doi: 10.1136/bmj-2022-072488
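The reported confidence interval and P value are two views of the same result. As a rough check, an approximate P value can be recovered from the rate ratio and its 95% CI by working on the log scale. This is a standard approximation, not necessarily the method the authors used, so small differences from the published value are expected.

```python
import math
from statistics import NormalDist

# Values reported in the abstract.
rate_ratio, ci_lower, ci_upper = 1.011, 0.992, 1.029

norm = NormalDist()
z_crit = norm.inv_cdf(0.975)  # about 1.96

# On the log scale the 95% CI spans 2 x 1.96 standard errors.
se_log_rr = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z_crit)

# Test statistic against the null hypothesis RR = 1 (log RR = 0).
z = math.log(rate_ratio) / se_log_rr

# Two-sided P value.
p_value = 2 * (1 - norm.cdf(abs(z)))

print(f"z = {z:.2f}, P = {p_value:.2f}")
```

The result is about 0.24, close to the reported P=0.25. The interval includes 1, which is consistent with P being above 0.05.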


Our conclusion might be wrong
Reality vs our study

There’s no effect (Null hypothesis is TRUE)
• Reject null hypothesis: Type I error ✗ (Probability = 𝜶)
• Do not reject null hypothesis: TRUE negative ✓

There is an effect (Null hypothesis is FALSE)
• Reject null hypothesis: TRUE positive ✓ (Power = 𝟏 − 𝜷)
• Do not reject null hypothesis: Type II error ✗ (Probability = 𝜷)

Image by Monika Grafik from Pixabay
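The probabilities 𝜶, 𝜷 and power can be estimated by simulating many studies. This is a sketch under assumed values: a null mean SBP of 120 mmHg, a known SD of 20 mmHg, samples of 20 people, and a true effect of +10 mmHg when the null hypothesis is false.

```python
import math
import random
import statistics
from statistics import NormalDist

ALPHA = 0.05
# Assumed values for the simulation (not from the slides).
NULL_MEAN, SD, N, REPEATS = 120.0, 20.0, 20, 4000

rng = random.Random(7)  # arbitrary seed for reproducibility
norm = NormalDist()

def rejects_null(true_mean):
    """Simulate one study and report whether a z test rejects the null."""
    sample = [rng.gauss(true_mean, SD) for _ in range(N)]
    z = (statistics.mean(sample) - NULL_MEAN) / (SD / math.sqrt(N))
    p_value = 2 * (1 - norm.cdf(abs(z)))
    return p_value < ALPHA

# Null hypothesis TRUE: every rejection is a Type I error.
type_i_rate = sum(rejects_null(NULL_MEAN) for _ in range(REPEATS)) / REPEATS

# Null hypothesis FALSE (true mean 130): rejections are true positives.
power = sum(rejects_null(130.0) for _ in range(REPEATS)) / REPEATS
type_ii_rate = 1 - power

print(f"Type I error rate (alpha): {type_i_rate:.3f}")
print(f"Power (1 - beta):          {power:.3f}")
print(f"Type II error rate (beta): {type_ii_rate:.3f}")
```

The Type I error rate stays close to the chosen 𝜶, while power depends on the sample size and the size of the true effect.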


First, the citizens commit a type I error by believing there is a wolf when there is not.

Second, the citizens commit a type II error by believing there is no wolf when there is one.

https://www.microsoft.com/en-au/p/the-boy-who-cried-wolf/8d6kgwzxmmst?activetab=pivot%3aoverviewtab

https://www.ncbi.nlm.nih.gov/books/NBK557530/
Clinical vs statistical significance

Webb, Bain & Page, 2020 (Chapter 6)


Other pitfalls
Multiple testing
• If 𝛼 = 0.05 then each test has a 1 in 20 chance of a false positive when the null hypothesis is true
• Can make alpha smaller (e.g. 0.01) or adjust for multiple testing

Complex sample designs
• Stratified, clustered and multi-stage sampling
• Analysis is complex (use advanced techniques)
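On multiple testing, a short sketch of how false positives accumulate. It assumes the tests are independent and that every null hypothesis is true. The Bonferroni correction shown here is one common adjustment, used as an example; the slides do not name a specific method.

```python
ALPHA = 0.05

def family_wise_error_rate(n_tests, alpha=ALPHA):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_alpha(n_tests, alpha=ALPHA):
    """Per-test significance threshold after a Bonferroni correction."""
    return alpha / n_tests

for n_tests in (1, 5, 20):
    unadjusted = family_wise_error_rate(n_tests)
    adjusted = family_wise_error_rate(n_tests, bonferroni_alpha(n_tests))
    print(f"{n_tests:>2} tests: P(at least one false positive) = {unadjusted:.3f}, "
          f"with Bonferroni = {adjusted:.3f}")
```

With 20 tests at 𝛼 = 0.05 the chance of at least one false positive is about 64%; the correction brings it back to just under 5%.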
“Pet owners had significantly lower systolic blood pressure and plasma triglycerides than non-owners.”

https://onlinelibrary.wiley.com/doi/epdf/10.5694/j.1326-5377.1992.tb137178.x

“While pet owners and non-pet owners had similar levels of systolic blood pressure, those with pets had significantly higher diastolic blood pressure.”

https://onlinelibrary.wiley.com/doi/epdf/10.5694/j.1326-5377.2003.tb05649.x
