Introduction To Key Statistical Concepts - 2024

The document discusses key statistical concepts in population science including observed versus true epidemiological quantities, random variation, hypothesis testing, and estimation. It explains how statistical analysis allows inference about unknown population parameters and quantification of uncertainty in estimates.


Population Science: Introduction to Key Statistical Concepts

Dr Clare Gillies
Associate Professor in Medical Statistics
University of Leicester

https://le.ac.uk/medicine
Learning Objectives
• Distinguish between observed epidemiological
quantities (incidence, prevalence, incidence rate
ratio, etc.) and their true or underlying values
• Discuss how observed epidemiological quantities
depart from their true values because of random
variation
• Describe how observed values help us learn about the
true values by:
– allowing us to test hypotheses about the true values
– allowing us to calculate a confidence interval that
describes the uncertainty surrounding an estimate
The three domains of public health

[Figure: health improvement, health protection and healthcare public health, shown with the disciplines epidemiology, health economics, sociology and psychology]
Role of statistics

• Population - all possible observations of an experimental/study variable. This is the population we are primarily interested in.

• Sample - a selection of observations taken from the population. The sample itself is not really of interest – we want to generalise to the population.

There will always be uncertainty when taking a sample. We need to quantify the uncertainty.
Purpose of statistics is to generalise to,
i.e. infer about, the population
Two types of error can occur in a study where you
take a sample, and both may influence the results:
Chance (Random error)
• Due to sampling variation
• This will reduce as sample size increases
Bias (Systematic error)
• Bias is quantified by the difference between the
expected value of the estimate and the true value
• Does not reduce as sample size increases
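The contrast between chance and bias can be seen in a small simulation. This is a sketch under illustrative assumptions (not from the lecture): a true prevalence of 0.15, and a hypothetical flawed sampling scheme that over-selects cases so that each sampled person is a case with probability 0.20. For each sample size, the sketch draws many repeated samples and reports the spread of the estimates (random error) and the average estimate minus the truth (bias).

```python
import random
import statistics

TRUE_PREVALENCE = 0.15    # assumed true prevalence (illustrative value)
BIASED_PREVALENCE = 0.20  # hypothetical flawed sampling that over-selects cases

def sample_estimates(p, n, reps, rng):
    """Return `reps` estimated prevalences, each from a sample of size n
    in which every sampled person is a case with probability p."""
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

def summarise(n, reps=300, seed=1):
    rng = random.Random(seed)
    unbiased = sample_estimates(TRUE_PREVALENCE, n, reps, rng)
    biased = sample_estimates(BIASED_PREVALENCE, n, reps, rng)
    return {
        "n": n,
        # chance: spread of the estimates from sample to sample
        "sd_of_estimates": statistics.stdev(unbiased),
        # bias: expected (average) estimate minus the true value
        "bias_good_sampling": statistics.mean(unbiased) - TRUE_PREVALENCE,
        "bias_flawed_sampling": statistics.mean(biased) - TRUE_PREVALENCE,
    }

for n in (50, 500, 2000):
    print(summarise(n))
```

As n increases, `sd_of_estimates` falls roughly in proportion to 1/√n. By contrast, `bias_flawed_sampling` stays near 0.05 however large the sample.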
Random variation: Truth vs Observed
• The true probability of getting a “tail” on a fair
coin is a half, but what we observe may depart
from this by random variation
• E.g. if everyone in this lecture theatre flips a fair
coin 10 times, the results we observe will vary
• Some people will get 5 tails, but others will get
4, 6 or 8, or even 0 or 10 tails
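The spread of results can be worked out exactly. Assuming each flip is independent with P(tail) = 0.5, the number of tails in 10 flips follows a binomial distribution. This short sketch prints the probability of each possible count.

```python
from math import comb

def prob_tails(k, n=10, p=0.5):
    """Binomial probability of exactly k tails in n flips of a coin with P(tail) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

for k in range(11):
    print(f"{k:2d} tails: {prob_tails(k):.4f}")
```

Even with a fair coin, exactly 5 tails occurs only about a quarter of the time (252/1024 ≈ 0.246). Getting 0 tails, or 10 tails, each has probability 1/1024.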
Truth vs observation
• Truth: the true or underlying tendency is for 4 cases of meningitis per month in Leicestershire
  Observation: in January, February and March this year, we observed 2, 5 and 4 cases respectively
• Truth: the true or underlying proportion of diabetic patients with foot problems is 15%
  Observation: in a random sample of 1,000 diabetics, the number with foot problems was 123, i.e. 12.3%

Our observed value is our best estimate of the true or underlying tendency
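This sketch shows how far monthly counts can stray from a true rate of 4 purely by chance. It assumes, although the lecture does not say so, that monthly meningitis counts follow a Poisson distribution with mean 4. It then computes the probability of each observed count and the observed average.

```python
from math import exp, factorial

TRUE_MEAN = 4  # true underlying tendency: 4 meningitis cases per month (from the slide)

def poisson_pmf(k, mu):
    """Probability of exactly k events when the expected count is mu (Poisson model, assumed)."""
    return exp(-mu) * mu**k / factorial(k)

observed = {"January": 2, "February": 5, "March": 4}
for month, count in observed.items():
    print(f"{month}: {count} cases, P = {poisson_pmf(count, TRUE_MEAN):.3f}")

# the observed mean is our best estimate of the true monthly rate
print("Estimated monthly rate:", sum(observed.values()) / len(observed))
```

Under this model, the most likely counts (3 and 4) each occur with probability of only about 0.195. Month-to-month departures from the true rate are therefore expected.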
Testing hypotheses about the true values
• A hypothesis is a statement that an underlying
truth of scientific interest takes a particular
quantitative value, e.g.
– the prevalence of tuberculosis in a given population is
2 per 10,000 people
– the coin is fair (i.e. probability of heads is 0.5)
– the new drug is neither better nor worse than the
standard treatment
– the new operation leads to neither more nor less post-
operative pain
Hypothesis Testing
• Calculate the probability of getting an observation as
extreme as, or more extreme than, the one observed
assuming that the hypothesis is true
• If the probability is very small, it is reasonable to conclude
that the observation and the stated hypothesis are
incompatible
• Therefore, a small probability is taken as evidence against the
hypothesis (it is not the probability that the hypothesis is true)
• This calculated probability is called the p-value
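The definition of a p-value can be applied directly by simulation. Generate many datasets assuming the hypothesis is true, then count how often the result is at least as extreme as the one observed. This sketch uses the fair-coin hypothesis and a hypothetical observation of 8 heads in 10 flips. Here "as extreme" means at least as far from 5 heads in either direction, which makes it a two-sided test.

```python
import random

def simulated_p_value(observed_heads, n_flips=10, n_sims=100_000, seed=42):
    """Monte Carlo p-value for the hypothesis 'the coin is fair'.

    'As extreme or more extreme' means at least as far from the expected
    count n_flips / 2 in either direction (two-sided).
    """
    rng = random.Random(seed)
    expected = n_flips / 2
    observed_distance = abs(observed_heads - expected)
    as_extreme = 0
    for _ in range(n_sims):
        # simulate n_flips tosses assuming the hypothesis (fair coin) is true
        heads = sum(rng.random() < 0.5 for _ in range(n_flips))
        if abs(heads - expected) >= observed_distance:
            as_extreme += 1
    return as_extreme / n_sims

print(simulated_p_value(8))  # hypothetical observation: 8 heads in 10 flips
```

The exact value for this case is 112/1024 ≈ 0.109, and the simulated estimate should land close to it.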
An Arbitrary Convention: p-value < 0.05
• Interpretation of p-value < 0.05:
– “Data inconsistent with the stated hypothesis”
– “Substantive evidence against the stated hypothesis”
– “Reasonable to reject the stated hypothesis”
– “There is a statistically significant
difference/association”
• Interpretation of p-value ≥ 0.05:
– There is not enough evidence to reject the null
hypothesis
“p-value ≥ 0.05” does NOT mean that the
hypothesis has been proven
Hypothesis Testing: This Coin is Fair
• Hypothesis: “the coin is fair”
• Observe data: 10 heads, 0 tails
• What observations would have been as extreme or more extreme
than this? 10 heads, 0 tails; 0 heads, 10 tails
• Calculate p-value for observed data: p = 2/1024 ≈ 0.002 (about 1 in 500)

• Since p < 0.05, reject the hypothesis that “the coin is fair” at the 5% significance level
• Conclusion: the data are inconsistent with the hypothesis that the coin
is fair, so there is strong evidence to reject it
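The p-value on this slide can be reproduced exactly with the binomial distribution. This sketch assumes independent flips. It adds up the probabilities of all outcomes at least as far from 5 heads as the observed 10 heads, which here means 10 heads or 0 heads.

```python
from math import comb

def exact_two_sided_p(heads, n=10):
    """Exact two-sided binomial p-value for H0: P(heads) = 0.5.

    Sums the probabilities of every outcome at least as far from n/2
    as the observed number of heads.
    """
    distance = abs(heads - n / 2)
    return sum(comb(n, k) for k in range(n + 1)
               if abs(k - n / 2) >= distance) / 2**n

p = exact_two_sided_p(10)
print(f"p = {p:.5f}")                # p = 0.00195 (= 2/1024, about 1 in 512)
print("Reject at 5% level:", p < 0.05)  # Reject at 5% level: True
```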
Hypothesis Testing: Limitations
• Rejecting a hypothesis is not always useful:
– ‘p-value < 0.05’ is arbitrary: nothing special happens
between ‘p = 0.049’ and ‘p = 0.051’
– statistical significance depends on sample size; a
larger sample size gives you more power to find a
significant result
– statistically significant ≠ clinically important
• However, values such as ‘p = 0.000001’ and ‘p = 0.6’ are usually
easy to interpret, and p-values are widely used
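This sketch illustrates how significance depends on sample size, using hypothetical data that we have assumed. It applies an exact two-sided binomial test for a fair coin to the same observed proportion of heads, 60%, first from 20 flips and then from 200 flips.

```python
from math import comb

def exact_two_sided_p(heads, n):
    """Exact two-sided binomial p-value for H0: the coin is fair (P(heads) = 0.5)."""
    distance = abs(heads - n / 2)
    extreme = sum(comb(n, k) for k in range(n + 1) if abs(k - n / 2) >= distance)
    return extreme / 2**n

# hypothetical data: 60% heads in both experiments
for heads, n in ((12, 20), (120, 200)):
    p = exact_two_sided_p(heads, n)
    print(f"{heads}/{n} heads: p = {p:.4f}, significant at 5%: {p < 0.05}")
```

With 12 heads out of 20, the p-value is about 0.50. With 120 heads out of 200, it falls below 0.01, even though the observed proportion is identical.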
Hypothesis Testing: Summary
• It is usual practice to test against a null hypothesis,
i.e. a hypothesis assuming that two things are
equal or that there is no effect or difference
• p < 0.05 provides evidence that the null hypothesis
is false / shows a statistically significant difference
or association
• p ≥ 0.05 (even p = 0.9999) does not in any sense
prove that the null hypothesis is true; it merely
shows that the observed findings are
consistent with the null hypothesis being true
Estimation: Statistical Variation
• Almost all observed quantities in medical science
are subject to variation by chance, e.g.
– number of NICU cots required on any given day
– mortality rates and incidence rates
– rate ratios
• As well as estimating a statistic (e.g. a
prevalence or rate ratio), we need to understand the
amount of uncertainty around it
Estimation: 95% Confidence Interval
• Observed rate ratio: risk of developing cardiovascular
disease in people with and without diabetes
[Figure: point estimate 1.89 with 95% confidence limits 1.29 and 2.49]

Point estimate (our ‘best guess’) is 1.89, i.e. the risk of CVD is almost twice as high in individuals with type 2 diabetes (T2DM) as in those without T2DM
• 95% Confidence Limits are 1.29 and 2.49
• RR 1.89 (95% CI: 1.29, 2.49)
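The rate ratio interval on this slide depends on the underlying study counts, which are not given. This sketch therefore applies the same idea to the foot-problem proportion from the earlier slide (123 out of 1,000). We assume the simple normal-approximation (Wald) interval. Other interval methods exist and give slightly different limits.

```python
import math

def wald_ci(cases, n, z=1.96):
    """Approximate 95% CI for a proportion: p_hat ± z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = cases / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of the proportion
    return p_hat - z * se, p_hat + z * se

# foot problems in a random sample of 1,000 diabetic patients (from the earlier slide)
lower, upper = wald_ci(123, 1000)
print(f"Proportion 0.123 (95% CI: {lower:.3f}, {upper:.3f})")  # Proportion 0.123 (95% CI: 0.103, 0.143)
```

This interval happens to exclude the stated true value of 15%. A single 95% interval can miss the truth, and about 5 in every 100 such intervals are expected to do so.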
What is a 95% confidence interval?
• Strictly speaking: a 95% confidence interval means that if
we were to take 100 different samples and compute a
95% confidence interval for each sample, then
approximately 95 of the 100 confidence intervals will
contain the true underlying population value
• Informally this is interpreted as: the 95% confidence
interval is the range within which we can be 95% confident
that the true underlying value lies
• The range is centred on the observed value because it is
always our ‘best guess’ at the true underlying value.
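The strict interpretation can be checked by simulation. This sketch assumes a population with a true proportion of 0.15, taken from the foot-problem example, and uses the normal-approximation interval. It draws 1,000 samples of size 1,000, builds a 95% CI from each, and reports the fraction of intervals that contain the true value. The result should be close to 0.95.

```python
import math
import random

def wald_ci(cases, n, z=1.96):
    """Normal-approximation 95% CI for a proportion."""
    p_hat = cases / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

def coverage(true_p=0.15, n=1000, n_samples=1000, seed=7):
    """Fraction of repeated samples whose 95% CI contains the true value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        # draw one sample of size n from a population with proportion true_p
        cases = sum(rng.random() < true_p for _ in range(n))
        lower, upper = wald_ci(cases, n)
        if lower <= true_p <= upper:
            hits += 1
    return hits / n_samples

print(f"Coverage: {coverage():.3f}")
```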
What happens to a confidence interval as sample size increases?
• Risk of developing cardiovascular disease in
people with and without diabetes
N=100 RR 1.89 (95% CI: 1.29, 2.49)
N=1000 RR 1.89 (95% CI: 1.49, 2.19)
N=10,000 RR 1.89 (95% CI: 1.79, 1.99)

As sample size increases, random variation and uncertainty decrease
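The narrowing follows from the standard error. Under a normal approximation, the half-width of a 95% CI for a mean is 1.96 × SD / √n. This sketch uses a hypothetical SD of 15 mmHg, which is not from the lecture. It shows that each tenfold increase in sample size shrinks the interval by a factor of √10 ≈ 3.16, and that doubling the variation doubles the width.

```python
import math

def ci_half_width(sd, n, z=1.96):
    """Half-width of a normal-approximation 95% CI for a mean: z * sd / sqrt(n)."""
    return z * sd / math.sqrt(n)

SD = 15.0  # hypothetical population standard deviation (e.g. SBP in mmHg)
for n in (100, 1000, 10_000):
    print(f"n = {n:>6}: mean ± {ci_half_width(SD, n):.2f}")

ratio = ci_half_width(2 * SD, 100) / ci_half_width(SD, 100)
print(f"Doubling the SD multiplies the width by {ratio:.1f}")
```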
Null values
• A null value is the value of an estimate that shows
equivalence between groups
• For a difference, the null value is 0
– E.g. average systolic blood pressure (SBP) in men over 70 = 120 mmHg
– Average SBP in women over 70 = 120 mmHg
– Mean difference = 120 - 120 = 0

• For a ratio, the null value is 1
– E.g. risk of gestational diabetes in white pregnant women =
16/100 = 0.16
– Risk of gestational diabetes in black pregnant women =
16/100 = 0.16
– Risk ratio = 0.16/0.16 = 1
95% Confidence Interval and the p-value
• Observed rate ratio: risk of developing cardiovascular
disease in people with and without diabetes
[Figure: point estimate 1.89 with 95% confidence limits 1.29 and 2.49]

Given that the confidence interval indicates the true underlying value
lies somewhere between 1.29 and 2.49, do you think this is
consistent with there being no difference in risk between the
two groups?
Remember the null value for a ratio is 1.00
95% Confidence Interval and the p-value
The confidence interval allows you to assess the
statistical significance of a result
The confidence interval and the p-value will agree on statistical
significance (provided both are calculated using the same method)

– null hypothesis value inside 95% CI → p ≥ 0.05
– null hypothesis value outside 95% CI → p < 0.05
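The decision rule above can be written in a few lines of code. The function name in this sketch is our own. It checks whether the null value (0 for a difference, 1 for a ratio) lies inside the 95% CI, and applies the check to the smoking and HbA1c examples on the slides that follow.

```python
NULL_DIFFERENCE = 0.0  # null value for a difference (e.g. mean difference)
NULL_RATIO = 1.0       # null value for a ratio (e.g. risk ratio, odds ratio)

def significant_at_5pct(lower, upper, null_value):
    """True if the 95% CI (lower, upper) excludes the null value, i.e. p < 0.05."""
    return not (lower <= null_value <= upper)

print(significant_at_5pct(2.10, 3.08, NULL_RATIO))       # smoking and lung cancer (OR): True
print(significant_at_5pct(0.16, 0.33, NULL_DIFFERENCE))  # HbA1c difference: True
```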
95% Confidence Interval and the p-value
• Does this result show a statistically significant
association?
– The odds of developing lung cancer in smokers
compared to non-smokers:
– OR 2.59 (95% CI: 2.10, 3.08)
– P=0.01
– The confidence interval shows a statistically
significant association between smoking and lung
cancer as it does not contain the null value of 1.
– The p-value shows a statistically significant
association between smoking and lung cancer as it is
less than 0.05
95% Confidence Interval and the p-value
• Does this result show a statistically significant
association?
– HbA1c levels were 0.24% (95% CI: 0.16 to 0.33%) higher in
South Asians compared to whites.
– P<0.001
– The confidence interval indicates a statistically
significant difference in HbA1c between the two
ethnic groups as it does not contain the null value of
0.
– The p-value indicates a statistically significant
difference in HbA1c between the two ethnic groups as
it is less than 0.05
95% Confidence Interval and the p-value
• Does this result show a statistically significant
association between risk of stroke and oral
contraceptive use? Explain your answer.
• The relative risk of stroke in women using oral
contraceptives compared to those who were not was
estimated as 1.23 (95% CI: 0.95 to 1.51),
p=0.065
95% Confidence Interval and the p-value
• Does this result show a statistically significant
association between systolic blood pressure and
sex? Explain your answer.
• The mean difference in systolic blood pressure
between men and women was estimated as 4.56
mmHg (95% CI: -2.16 to 6.72), p=0.073.
Summary
• The best estimate of an underlying truth from a study is
the observed value
• A hypothesis test allows us to make a statement about
the probability of observing data at least as extreme as
those observed if the stated hypothesis is true
• A p-value < 0.05 indicates a statistically significant
difference/association between two groups, and the null
hypothesis is rejected
Summary
• An informal way of thinking about the 95% confidence
interval is to consider it to be the range in which you are
95% sure that the true underlying population value lies
• The 95% confidence interval is wider:
– The greater the variation in the population values
– The smaller the size of the sample used to calculate it
• A confidence interval that does not span the null value
shows a statistically significant result

Resource: www.youtube.com/watch?v=tFWsuO9f74o
