SAMPLE SIZE
DETERMINATION
Ephrem Mannekulih (BSc, MSc)
Biostatistics and Health Informatics
Sample size determination
Sample Size: The number of study subjects selected to
represent a given study population.
Sample size should be sufficient to represent the characteristics
of a given study population.
Cont.…
How many people need to be studied in order to answer the
study objectives.
If the size is too small;
o We may fail to detect important effects
o May estimate effects too imprecisely
If the size is too large;
o It may be infeasible in terms of resources.
The eventual sample size is usually a compromise between
what is desirable and what is feasible.
Factors Affecting Sample size determination
Determining the right number of subjects to be studied depends
on the following factors:
1. Objective of the study
o Estimating single population mean or proportion
o Estimating difference between two population mean or
proportion
o Testing hypothesis about single population mean or
proportion
o Testing hypothesis about difference between two population
mean or proportion
o Estimating the effect size of certain variable on outcome of
interest
Cont.…
2. Accuracy of the measurements to be made
o The allowed deviation from the true population parameter
o It can be within 1% or 5%, etc.
3. Degree of confidence with which the results are to be concluded
o Commonly specified as 95%.
4. Degree of precision required for generalization
o Commonly specified as a power of 80% or 90%.
Cont.…
5. Design of the study
o Cross sectional, case control, cohort etc.
o Sample size calculation depends on the type of the
epidemiological study designs.
o Descriptive, observational and randomized controlled studies
have different formulas to calculate sample size.
6. The size of the population that the sample is to represent.
o A finite population correction is needed when the population size is less than 10,000, or
o When n/N > 0.05
Sample size based on objectives
The objective of the study could be;
I. Estimating single population mean or proportion
II. Estimating difference between two population means or
proportion
III. Testing hypothesis about single population mean or
proportion
IV. Testing hypothesis about difference between two
population means or proportion
V. Estimating the effect size of certain variable on outcome
of interest
I. Sample size for estimating a single
population mean
Objective: To estimate the population mean (µ) with a narrow
confidence interval and high precision
How: Estimate X̄ ± d units
where d = Margin of error
\[ d = Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \]
= Measure of precision
= Half of the width (w) of the CI
Steps:
1. Specify d (or w = 2d)
2. Use the known population σ² or the sample s²
Cont.…
Example.
o To estimate the mean survival time of HIV infected patients
taking ART
Cont…
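\[ n = \frac{Z_{\alpha/2}^2\,\sigma^2}{d^2} \]
Where;
n = sample size.
Zα/2 = standard normal value for the chosen level of confidence (1.96 for 95%).
σ² = variability (variance) of the variable of interest.
d = desired precision (margin of error)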
Example:
Find the minimum sample size needed to estimate the drop
in mean heart rate (µ) for a new study using a higher dose of
propranolol than the standard one. We require that the two-
sided 95% CI for µ be no wider than 5 beats per minute and
the sample sd for change in heart rate equals 10 beats per
minute.
o The CI width is 5, so d = 5/2 = 2.5
o n = (1.96)² × 10² / (2.5)² = 61.5, rounded up to 62 patients
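The propranolol calculation can be checked with a few lines of Python. This is a minimal sketch: the function name `n_single_mean` is made up for illustration, and it assumes the normal-approximation formula n = z² σ² / d² with the result rounded up to a whole subject.

```python
import math
from statistics import NormalDist

def n_single_mean(sd, d, confidence=0.95):
    """Sample size to estimate a mean within +/- d (hypothetical helper)."""
    # z_{alpha/2}: about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # n = z^2 * sd^2 / d^2, rounded up to a whole subject
    return math.ceil((z * sd / d) ** 2)

# CI no wider than 5 beats per minute -> d = 2.5; sd = 10
print(n_single_mean(sd=10, d=2.5))  # 62
```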
Cont.…
What if the population σ² is unknown?
o Conduct a pilot study
o Use findings from previous or similar studies
II. Sample size for estimating a single
population proportion
Objective: To estimate population proportion (P) with
narrow confidence interval and high precision
How: Estimate P ± d units
where d = Margin of error
\[ d = Z_{\alpha/2}\sqrt{\frac{P(1-P)}{n}} \]
= Measure of precision
= Half of the width (w) of the CI
Steps:
1. Specify d (or w = 2d)
2. Use an estimate of the population p (use p = 0.5 if no
information is available)
Cont.…
Example;
o To determine the proportion of immunological treatment failure
among HIV infected patients
Cont…
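\[ n = \frac{Z_{\alpha/2}^2\,p(1-p)}{d^2} \]
Where;
n = sample size.
Zα/2 = standard normal value for the chosen level of confidence (1.96 for 95%).
p = Proportion of the variable of interest.
d = desired precision (margin of error)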
Example
Suppose that you are interested to know the proportion of
infants who breastfed >18 months of age in a rural area.
Suppose that in a similar area, the proportion (p) of breastfed
infants was found to be 0.20. What sample size is required to
estimate the true proportion within ±3 percentage points with 95%
confidence? Let p=0.20, d=0.03, α=5%
Example
Suppose there is no prior information about the proportion (p)
who breastfeed
Assume p=q=0.5 (most conservative)
Then the required sample size increases
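The two breastfeeding examples can be worked through numerically. The sketch below assumes the formula n = z² p(1−p) / d², rounded up; the helper name `n_single_proportion` is illustrative only.

```python
import math
from statistics import NormalDist

def n_single_proportion(p, d, confidence=0.95):
    """Sample size to estimate a proportion within +/- d (hypothetical helper)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return math.ceil(z**2 * p * (1 - p) / d**2)

print(n_single_proportion(p=0.20, d=0.03))  # 683, using the prior estimate p = 0.20
print(n_single_proportion(p=0.50, d=0.03))  # 1068, the conservative choice p = 0.5
```

With no prior information the requirement rises from 683 to 1068 infants, which is the increase the second example refers to.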
What if p is unknown or not available?
Sample size should be calculated based on various assumptions
for approximate values of p.
(n computed for d = 0.05 and 95% confidence)
p    0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9
n    138   246   323   369   384   369   323   246   138
For a fixed level of precision (d), the required sample size
increases as P increases from 0 to 0.5, and then decreases in the
same way as P approaches 1
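The pattern in the table can be reproduced with a short loop. This sketch assumes z = 1.96 and d = 0.05 (the values that match the tabulated n) and rounds to the nearest whole number, as the table appears to do; the function name is hypothetical.

```python
def sample_sizes_by_p(d=0.05, z=1.96):
    """Return n = z^2 * p * (1 - p) / d^2 for p = 0.1, 0.2, ..., 0.9."""
    sizes = []
    for k in range(1, 10):
        p = k / 10
        sizes.append(round(z**2 * p * (1 - p) / d**2))
    return sizes

for k, n in enumerate(sample_sizes_by_p(), start=1):
    print(f"p = {k / 10:.1f}  n = {n}")
```

The product p(1−p) peaks at p = 0.5, which is why that value is used as the conservative choice when nothing is known about p.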
Example
A survey is planned to determine what proportion of the medical
students have regularly chewed khat. If no estimate of p is
available and a pilot sample cannot be drawn, what sample size
would be required if a 95% confidence is desired, and d=0.04 is
to be used.
Ans: 600 students
III. Sample size for Estimating the difference
between two population means
Objective: To estimate the difference between two
population means (µ1 - µ2) with a narrow confidence interval
and high precision
How: Estimate (X̄1 - X̄2) ± d units
\[ d = Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}} \]
Use σ1² and σ2², or estimate them using s1² and s2²
Cont.…
Example
o To estimate the difference in mean survival time between
HIV infected patients taking treatment X versus Y
o If equal sample size in both groups is required (n1 = n2 = n), then:
\[ n = \frac{Z_{\alpha/2}^2\,(\sigma_1^2+\sigma_2^2)}{d^2} \]
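Setting n1 = n2 = n in the margin-of-error expression and solving for n gives n = z² (σ1² + σ2²) / d² per group. A minimal sketch follows; the function name and the input values (both SDs equal to 10, d = 5) are made up for illustration and do not come from the slides.

```python
import math
from statistics import NormalDist

def n_per_group_diff_means(sd1, sd2, d, confidence=0.95):
    """Per-group size to estimate mu1 - mu2 within +/- d, equal groups."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z**2 * (sd1**2 + sd2**2) / d**2)

# Illustrative values only
print(n_per_group_diff_means(sd1=10, sd2=10, d=5))  # 31 per group
```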
IV. Sample size for Estimating the difference
between two population proportion
Objective: To estimate the difference between two
population proportions (P1 - P2) with a narrow confidence
interval and high precision
How: Estimate (P1 - P2) ± d units
\[ d = Z_{\alpha/2}\sqrt{\frac{P_1(1-P_1)}{n_1}+\frac{P_2(1-P_2)}{n_2}} \]
Use estimates of p1 and p2 (or p1 = p2 = 0.5 if unknown)
Cont.…
Example
o To determine the difference in the proportion of
immunological treatment failure among HIV infected
patients taking treatment X versus Y
o If equal sample sizes in both groups (n1 = n2 = n), then
\[ n = \frac{Z_{\alpha/2}^2\,[P_1(1-P_1)+P_2(1-P_2)]}{d^2} \]
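With equal group sizes, solving the margin-of-error expression for n gives n = z² [P1(1−P1) + P2(1−P2)] / d² per group. The sketch below assumes that form; the function name and the inputs (0.30, 0.20, d = 0.05) are hypothetical and chosen only to show the arithmetic.

```python
import math
from statistics import NormalDist

def n_per_group_diff_props(p1, p2, d, confidence=0.95):
    """Per-group size to estimate P1 - P2 within +/- d, equal groups."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z**2 * (p1 * (1 - p1) + p2 * (1 - p2)) / d**2)

# Illustrative values only
print(n_per_group_diff_props(p1=0.30, p2=0.20, d=0.05))  # 569 per group
```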
Sample Size for
Testing Hypothesis
The method of determining sample size based on hypothesis
testing considers the probability of both type I and type II
errors
The aim is to maintain a low probability of a Type I error (α)
and a low probability of a Type II error (β), and
to have enough subjects to detect a difference in population
means or proportions
Cont.…
Type I error (α) = The probability of rejecting H0 when it is true
o α = P(reject H0 | H0 true)
o Significance level of a test = α = probability of a Type I error
Type II error (β) = The probability of failing to reject H0 when it is
false
o β = P(do not reject H0 | H0 false)
o 1 - β = Power
Cont.…
Power (1-β) = the probability that H0 is rejected given that it is
false
o Power = P(reject H0 | H0 false)
If the power of a test is low, then there is little chance of
detecting a difference even if one really exists
A power of at least 80% is commonly recommended
(Power (1 - β) = 80%, Zβ = 0.84)
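The Z values used in the sample size formulas are quantiles of the standard normal distribution. As a quick check, they can be obtained from Python's standard library instead of a printed table:

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # quantile function of the standard normal

print(round(z(0.80), 2))   # 0.84 -> Z for 80% power
print(round(z(0.90), 2))   # 1.28 -> Z for 90% power
print(round(z(0.95), 2))   # 1.64 -> Z for a one-sided test at alpha = 0.05
print(round(z(0.975), 2))  # 1.96 -> Z for a two-sided test at alpha = 0.05
```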
Factors affecting the power
If α decreases, the power decreases
When the difference between 𝐻0 and 𝐻𝐴 increases, then the
power increases
When the variability (σ) increases, then the power decreases
If the sample size (n) increases, the power increases
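These four effects can be seen numerically. The sketch below uses an approximate power formula for a two-sided one-sample z-test, power ≈ Φ(|δ|√n/σ − Z at 1−α/2), which ignores the negligible chance of rejecting in the wrong tail. The function name and all input values are illustrative assumptions.

```python
import math
from statistics import NormalDist

def approx_power(delta, sd, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    # delta is the true difference between the H0 and HA means
    return nd.cdf(abs(delta) * math.sqrt(n) / sd - z_crit)

base = approx_power(delta=5, sd=10, n=32)
print(round(base, 2))                                 # about 0.81
print(approx_power(5, 10, 32, alpha=0.01) < base)     # smaller alpha -> lower power
print(approx_power(8, 10, 32) > base)                 # larger difference -> higher power
print(approx_power(5, 15, 32) < base)                 # larger sigma -> lower power
print(approx_power(5, 10, 64) > base)                 # larger n -> higher power
```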
Factors affecting the sample size
The sample size increases as σ² increases
The sample size increases as the significance level (α) is
made smaller (α decreases)
The sample size increases as the required power increases
The sample size decreases as the absolute value of the
difference between the 𝐻0 and 𝐻𝐴 increases
Sample Size for Testing Hypothesis
Sample size to test hypothesis about single population mean
Sample size to test hypothesis about single population
proportion
Sample size to test hypothesis about the difference between
two population means (paired or independent)
Sample size to test hypothesis about the difference between
two population proportions (paired or independent)
I. Sample size for testing hypothesis about single population
mean
Notation used
o µ0 = test value of the population mean under H0
o µa = hypothesized value of the population mean under HA
o σ² = variability in the variable of interest in the population
o 100(α)% = level of significance
o 100(1 − β)% = power of test
Cont.…
𝐻0 = there is no difference between the two mean
o 𝐻0 : 𝜇0 = 𝜇𝑎
HA = there is a difference between the two means
o 𝐻𝐴 : 𝜇0 ≠ 𝜇𝑎 for two sided test
o 𝐻𝐴 : 𝜇0 > 𝜇𝑎 or 𝜇0 < 𝜇𝑎 for one sided test
Cont.…
Testing hypothesis about single population mean
For one sided test
\[ n = \frac{(Z_{1-\alpha} + Z_{1-\beta})^2\,\sigma^2}{(\mu_0 - \mu_a)^2} \]
For two sided test
\[ n = \frac{(Z_{1-\alpha/2} + Z_{1-\beta})^2\,\sigma^2}{(\mu_0 - \mu_a)^2} \]
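Both versions of the single-mean formula fit in one function. This is a sketch only: the name `n_test_single_mean` and the inputs (test value 100, hypothesized mean 105, SD 10) are hypothetical, and n is rounded up.

```python
import math
from statistics import NormalDist

def n_test_single_mean(mu0, mua, sd, alpha=0.05, power=0.80, two_sided=True):
    """n = (Z_alpha + Z_beta)^2 * sd^2 / (mu0 - mua)^2."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2 if two_sided else 1 - alpha)
    z_beta = nd.inv_cdf(power)
    return math.ceil((z_alpha + z_beta) ** 2 * sd**2 / (mu0 - mua) ** 2)

# Illustrative values only
print(n_test_single_mean(100, 105, sd=10))                   # 32 (two sided)
print(n_test_single_mean(100, 105, sd=10, two_sided=False))  # 25 (one sided)
```

The one-sided test needs fewer subjects because its critical value (about 1.64) is smaller than the two-sided one (about 1.96).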
II. Sample size for testing hypothesis about two population mean
Notation used
o µ1 - µ2 = 0: test value of the difference between two population
means under H0
o µ1 and µ2 = hypothesized values of the two population means
o σ1² and σ2² = variability in the variable of interest in the two
populations
o 100(α)% = level of significance
o 100(1 − β)% = power of test
Cont.…
𝐻0 = there is no difference between the two population means
o 𝐻0 : μ1 - 𝜇2 = 0
𝐻𝐴 = there is difference between the two population means
o 𝐻𝐴 : μ1 ≠ 𝜇2 for two sided test
o 𝐻𝐴 : μ1 - 𝜇2 > 0 or μ1 - 𝜇2 < 0 for one sided test
Cont.…
Comparison between two means (Equal sample sizes)
For one sided test
\[ n_1 = n_2 = \frac{(Z_{1-\alpha} + Z_{1-\beta})^2\,(\sigma_1^2 + \sigma_2^2)}{(\mu_1 - \mu_2)^2} \]
For two sided test
\[ n_1 = n_2 = \frac{(Z_{1-\alpha/2} + Z_{1-\beta})^2\,(\sigma_1^2 + \sigma_2^2)}{(\mu_1 - \mu_2)^2} \]
Cont.…
Comparison between two means (Unequal sample sizes)
For one sided test
\[ n_1 = \frac{(Z_{1-\alpha} + Z_{1-\beta})^2\,(\sigma_1^2 + \sigma_2^2/\lambda)}{(\mu_1 - \mu_2)^2} \]
For two sided test
\[ n_1 = \frac{(Z_{1-\alpha/2} + Z_{1-\beta})^2\,(\sigma_1^2 + \sigma_2^2/\lambda)}{(\mu_1 - \mu_2)^2} \]
Where,
o n2 = λn1
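The equal-size formula is the special case λ = 1 of the unequal-size one, so a single function covers both. A minimal sketch, with a made-up function name and illustrative inputs (means 100 and 105, both SDs 10):

```python
import math
from statistics import NormalDist

def n_test_two_means(mu1, mu2, sd1, sd2, lam=1.0,
                     alpha=0.05, power=0.80, two_sided=True):
    """Return (n1, n2) with n2 = lam * n1 for comparing two means."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2 if two_sided else 1 - alpha)
    z_beta = nd.inv_cdf(power)
    n1 = math.ceil((z_alpha + z_beta) ** 2 * (sd1**2 + sd2**2 / lam)
                   / (mu1 - mu2) ** 2)
    return n1, math.ceil(lam * n1)

# Illustrative values only
print(n_test_two_means(100, 105, 10, 10))         # (63, 63)
print(n_test_two_means(100, 105, 10, 10, lam=2))  # (48, 96)
```

Doubling the second group lowers n1 from 63 to 48, but the total rises from 126 to 144, so unequal allocation costs more subjects overall.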
III. Sample size for testing hypothesis about single
population proportion
Notation used
o 𝑃0 = test value of the population proportion under 𝐻0
o 𝑃𝑎 = hypothesized value of the population proportion
o 100(𝛼)% = level of significance
o 100(1 − β)% = power of test
Cont.…
𝐻0 = there is no difference between the two proportion
o 𝐻0 : 𝑃0 = 𝑃𝑎
HA = there is a difference between the two proportions
o 𝐻𝐴 : 𝑃0 ≠ 𝑃𝑎 for two sided test
o 𝐻𝐴 : 𝑃0 > 𝑃𝑎 or 𝑃0 < 𝑃𝑎 for one sided test
Cont.…
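For one sided test
\[ n = \frac{\left[Z_{1-\alpha}\sqrt{P_0(1-P_0)} + Z_{1-\beta}\sqrt{P_a(1-P_a)}\right]^2}{(P_0 - P_a)^2} \]
For two sided test
\[ n = \frac{\left[Z_{1-\alpha/2}\sqrt{P_0(1-P_0)} + Z_{1-\beta}\sqrt{P_a(1-P_a)}\right]^2}{(P_0 - P_a)^2} \]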
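The single-proportion test formula uses the variance under H0 for the significance term and the variance under HA for the power term. The sketch below follows that structure; the function name and the inputs (P0 = 0.5, Pa = 0.6) are hypothetical.

```python
import math
from statistics import NormalDist

def n_test_single_proportion(p0, pa, alpha=0.05, power=0.80, two_sided=True):
    """Sample size to test H0: P = p0 against the alternative value pa."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2 if two_sided else 1 - alpha)
    z_beta = nd.inv_cdf(power)
    numerator = (z_alpha * math.sqrt(p0 * (1 - p0))
                 + z_beta * math.sqrt(pa * (1 - pa))) ** 2
    return math.ceil(numerator / (p0 - pa) ** 2)

# Illustrative values only
print(n_test_single_proportion(p0=0.5, pa=0.6))  # 194
```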
IV. Sample size for testing hypothesis about two population
proportion
Notation used
o 𝑃1 - 𝑃2 = 0 test value of the difference between two
population proportion under 𝐻0
o 𝑃1 and 𝑃2 = hypothesized value of the population proportions
o 100(𝛼)% = level of significance
o 100(1 − β)% = power of test
Cont.…
𝐻0 = there is no difference between the two proportion
o 𝐻0 : 𝑃1 - 𝑃2 = 0
HA = there is a difference between the two proportions
o 𝐻𝐴 : 𝑃1 ≠ 𝑃2 for two sided test
o 𝐻𝐴 : 𝑃1 - 𝑃2 > 0 or 𝑃1 - 𝑃2 < 0 for one sided test
Cont.…
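Comparison between two proportions (Equal
sample sizes)
For one sided test
\[ n = \frac{\left[Z_{1-\alpha}\sqrt{2\bar{P}(1-\bar{P})} + Z_{1-\beta}\sqrt{P_1(1-P_1) + P_2(1-P_2)}\right]^2}{(P_1 - P_2)^2} \]
For two sided test
\[ n = \frac{\left[Z_{1-\alpha/2}\sqrt{2\bar{P}(1-\bar{P})} + Z_{1-\beta}\sqrt{P_1(1-P_1) + P_2(1-P_2)}\right]^2}{(P_1 - P_2)^2} \]
\[ \text{Where } \bar{P} = \frac{P_1 + P_2}{2} \]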
Cont.…
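Comparison between two proportions (Unequal
sample sizes)
For one sided test
\[ n_1 = \frac{\left[Z_{1-\alpha}\sqrt{\bar{P}(1-\bar{P})\left(1+\frac{1}{\lambda}\right)} + Z_{1-\beta}\sqrt{P_1(1-P_1) + P_2(1-P_2)/\lambda}\right]^2}{(P_1 - P_2)^2} \]
For two sided test
\[ n_1 = \frac{\left[Z_{1-\alpha/2}\sqrt{\bar{P}(1-\bar{P})\left(1+\frac{1}{\lambda}\right)} + Z_{1-\beta}\sqrt{P_1(1-P_1) + P_2(1-P_2)/\lambda}\right]^2}{(P_1 - P_2)^2} \]
\[ \text{Where } \bar{P} = \frac{P_1 + \lambda P_2}{1+\lambda}, \qquad n_2 = \lambda n_1 \]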
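With λ = 1 the unequal-size expression reduces to the equal-size one, since (1 + 1/λ) becomes 2 and the pooled proportion becomes the simple average. The sketch below relies on that; the function name and the inputs (P1 = 0.30, P2 = 0.20) are illustrative assumptions.

```python
import math
from statistics import NormalDist

def n_test_two_proportions(p1, p2, lam=1.0,
                           alpha=0.05, power=0.80, two_sided=True):
    """Return (n1, n2) with n2 = lam * n1 for comparing two proportions."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2 if two_sided else 1 - alpha)
    z_beta = nd.inv_cdf(power)
    p_bar = (p1 + lam * p2) / (1 + lam)  # pooled proportion
    term_h0 = z_alpha * math.sqrt(p_bar * (1 - p_bar) * (1 + 1 / lam))
    term_ha = z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2) / lam)
    n1 = math.ceil((term_h0 + term_ha) ** 2 / (p1 - p2) ** 2)
    return n1, math.ceil(lam * n1)

# Illustrative values only
print(n_test_two_proportions(0.30, 0.20))  # (294, 294)
```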
V. Sample size for paired data difference in mean
Notations used
o n = sample size
o 𝜎𝑑 = standard deviation of the within pair difference
o 𝜇1 and 𝜇2 = hypothesized value of the two population
means
o 100(𝛼)% = level of significance
o 100(1 − β)% = power of test
Cont.…
H0 = there is no difference between the two means
o H0: µd = µ1 - µ2 = 0
HA = there is a difference between the two means
o 𝐻𝐴 : 𝜇𝑑 ≠ 0 for two sided test
o 𝐻𝐴 : 𝜇𝑑 > 0 or 𝜇𝑑 < 0 for one sided test
Cont.…
For one sided test
\[ n = \frac{\sigma_d^2\,(Z_{1-\alpha} + Z_{1-\beta})^2}{(\mu_1 - \mu_2)^2} \]
For two sided test
\[ n = \frac{\sigma_d^2\,(Z_{1-\alpha/2} + Z_{1-\beta})^2}{(\mu_1 - \mu_2)^2} \]
VI. Sample size for paired data difference in proportion
Notations used
o n = sample size
o 𝑃1 and 𝑃2 = hypothesized value of the two population
proportion
o 100(𝛼)% = level of significance
o 100(1 − β)% = power of test
Cont.…
𝐻0 = there is no difference between the two proportion
o 𝐻0 : 𝑃1 - 𝑃2 = 0
HA = there is a difference between the two proportions
o 𝐻𝐴 : 𝑃1 ≠ 𝑃2 for two sided test
o 𝐻𝐴 : 𝑃1 - 𝑃2 > 0 or 𝑃1 - 𝑃2 < 0 for one sided test
Cont.…
For one sided test
\[ n = \frac{P(1-P)\,(Z_{1-\alpha} + Z_{1-\beta})^2}{(P_1 - P_2)^2} \]
For two sided test
\[ n = \frac{P(1-P)\,(Z_{1-\alpha/2} + Z_{1-\beta})^2}{(P_1 - P_2)^2} \]
Cont.…
If the OR or RR and one of the proportions are known, we can
compute the unknown proportion by:
\[ P_1 = \frac{P_2}{\frac{1 - P_2}{OR} + P_2} \]
\[ P_1 = P_2 \times RR \]
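The conversion from an odds ratio or relative risk to the unknown proportion is a one-line calculation. In the sketch below the odds-ratio form is written as OR × P2 / (1 − P2 + OR × P2), which is the same expression with numerator and denominator multiplied by OR. Function names and input values are hypothetical.

```python
def p1_from_odds_ratio(p2, odds_ratio):
    """P1 implied by P2 and an odds ratio."""
    return odds_ratio * p2 / (1 - p2 + odds_ratio * p2)

def p1_from_relative_risk(p2, relative_risk):
    """P1 implied by P2 and a relative risk."""
    return p2 * relative_risk

# Illustrative values only
print(round(p1_from_odds_ratio(0.20, 2.0), 3))     # 0.333
print(round(p1_from_relative_risk(0.20, 1.5), 3))  # 0.3
```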
Sample size for
Different study Designs
Sample size calculation for case control
study
When the exposure variable is qualitative/categorical
Comparing odds of exposure between case and control
o Eg. To see the link between childhood abuse with
psychiatric disorder in adulthood
o NB. This formula is for independent case control study
Cont.…
Where;
r = ratio of control to cases
P1 = proportion of exposure in cases
P2 = proportion of exposure in controls
p* = Average proportion of exposure
P1 – P2 = Expected difference in proportion between cases
and controls
Zα/2 = standard normal value for the level of significance
Zβ = standard normal value for the desired power
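The formula itself did not survive on this slide. A commonly used form that matches the symbols defined above is n = ((r + 1)/r) × p*(1 − p*) × (Zβ + Zα/2)² / (P1 − P2)² cases, with r × n controls. The sketch below assumes that form, and also assumes p* = (P1 + P2)/2; treat it as an illustration rather than the slide's own formula. The function name and inputs are hypothetical.

```python
import math
from statistics import NormalDist

def n_case_control(p1, p2, r=1.0, alpha=0.05, power=0.80):
    """Return (cases, controls) under the assumed unmatched case-control formula."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = nd.inv_cdf(power)
    p_star = (p1 + p2) / 2               # assumption: simple average of exposures
    n_cases = math.ceil((r + 1) / r * p_star * (1 - p_star)
                        * (z_beta + z_alpha) ** 2 / (p1 - p2) ** 2)
    return n_cases, math.ceil(r * n_cases)

# Illustrative values only: 30% exposure in cases, 20% in controls
print(n_case_control(0.30, 0.20))       # (295, 295)
print(n_case_control(0.30, 0.20, r=2))  # fewer cases, twice as many controls
```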
Cont…
When the exposure variable is quantitative
Comparing odds of exposure between case and control
o Eg. To see the link between birth weight with diabetes in
adulthood
o NB. This formula is for independent case control study
Cont…
Where; r = ratio of control to cases
SD = Standard deviation
d = Expected mean difference between cases and
controls
Zα/2 = standard normal value for the level of significance
Zβ = standard normal value for the desired power
Sample size calculation for cohort study
Comparing the rate of events between people with and without
exposure
o Eg. To see the impact of physical exercise on
cardiovascular mortality
o Where
o NB. This formula is for independent cohort study
Cont…
Where r = ratio of control to cases
P0 = proportion of event in unexposed group
P1 = proportion of event in exposed group
m = the number of unexposed per exposed group
P1 - P0 = Expected difference in the proportion of events between
exposed and unexposed groups
Zα/2 = standard normal value for the level of significance
Zβ = standard normal value for the desired power
Sample size for interventional studies
When the event of interest is quantitative
To see the effect of intervention on particular outcome of
interest
o Eg. To see the effect of antihypertensive drug on the mean
blood pressure of an individual
Cont…
Where
SD = Standard deviation
d = Expected mean difference between interventional
and control group
Zα/2 = standard normal value for the level of significance
Zβ = standard normal value for the desired power
Sample size calculation for interventional
study
When Event of interest is qualitative
To see the effect of intervention on particular outcome of
interest
o Eg. To see the protective effect of drug on mortality in
patients with myocardial infarction
Cont…
Where
SD = Standard deviation
P1 – P2 = Expected difference in the proportion of events
between interventional and control group
P = average prevalence of events in two groups
Zα/2 = standard normal value for the level of significance
Zβ = standard normal value for the desired power
Sample size for qualitative studies
There are no fixed rules for sample size in qualitative research.
The size of the sample depends on what you try to find out,
and from what different informants or perspectives you try to
find that out
Sampling ends at saturation of ideas, i.e. when additional informants yield no new information
Points for Consideration
Sample size estimates might need to be adjusted to
compensate for non-response rate, patient dropout or loss
to follow-up, lack of compliance, etc.
If sampling is from a finite population of size N, then
\[ n = \frac{n_0}{1 + \frac{n_0}{N}} \]
where n0 is the sample size required for an infinite population. When N
is large in comparison to n (i.e., n/N ≤ 0.05), the finite
population correction may be ignored.
Apply a design effect for complex cluster sampling. Common
values: multiply n by 2, 3, …5.
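The adjustments listed above can be applied one after another to a calculated sample size. In the sketch below the finite population correction follows the formula given; the non-response inflation n / (1 − rate) is a common convention that is assumed here, not something stated on the slide. Function names and input values are hypothetical.

```python
import math

def finite_population_correction(n0, population_size):
    """n = n0 / (1 + n0 / N), rounded up."""
    return math.ceil(n0 / (1 + n0 / population_size))

def inflate_for_nonresponse(n, rate):
    """Assumed convention: divide by the expected response fraction."""
    return math.ceil(n / (1 - rate))

def apply_design_effect(n, deff):
    """Multiply by the design effect for cluster sampling."""
    return math.ceil(n * deff)

# Illustrative values only: n0 = 384, N = 2000, 10% non-response, design effect 2
n = finite_population_correction(384, 2000)
print(n)                                # 323
n = inflate_for_nonresponse(n, 0.10)
print(n)                                # 359
print(apply_design_effect(n, 2))        # 718
```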
Reading Assignment
Design Effect
Google for more!!!