
Sample Size Determination

#02
Wakgari Deressa, PhD.
Professor of Epidemiology & Public Health
Department of Preventive Medicine
School of Public Health, AAU

Jan 02-06, 2023


Learning Objectives
♣ Describe the importance of sample size
♣ Identify factors to be considered in sample
size calculation
♣ Describe the different sample size formulas
used for estimating the population mean
and proportion
♣ Calculate the sample size needed for the
population mean and proportion.
• An essential part of planning any study
is to decide how many people need to be
studied and how to choose them.
Sample Size (=n)
• Sample Size: The number of study
subjects selected to represent a given
study population.
• Important to make inferences based on
the findings from the sample.
• Should be sufficient to represent the
characteristics of interest of the study
population.
• Common questions:
– “How many subjects should I study?”
– Too small a sample = waste of time and resources; results have no practical use
– Too large a sample = waste of resources; data quality may be compromised; any small difference can be statistically significant
When deciding on sample size:

PRECISION vs. COST

Larger sample size = higher precision = higher cost

• Precision is related to confidence level & CI


Example
• A prevalence (=p) of 10% from a sample size
of n=20:
– would have a 95% CI of 1% to 31%,
– which is not very precise or informative.
• But, a prevalence of 10% from a sample of
size n=400:
– would have a 95% CI of 7% to 13%,
– which may be considered sufficiently precise
and accurate.
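The two intervals above can be checked numerically. The sketch below assumes the slide's intervals are exact binomial (Clopper-Pearson) intervals, which the slide does not state; the function names are illustrative only, and the limits are found by simple bisection using the standard library.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def bisect_increasing(f, lo=0.0, hi=1.0, tol=1e-9):
    """Root of an increasing function f on [lo, hi], by bisection."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_ci(x, n, alpha=0.05):
    """Exact (Clopper-Pearson) two-sided CI for x events out of n."""
    # Lower limit: the p at which P(X >= x) equals alpha/2.
    lower = 0.0 if x == 0 else bisect_increasing(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper limit: the p at which P(X <= x) equals alpha/2.
    upper = 1.0 if x == n else bisect_increasing(
        lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper

for x, n in [(2, 20), (40, 400)]:
    lo, hi = exact_ci(x, n)
    print(f"n={n}: prevalence {x / n:.0%}, 95% CI {lo:.1%} to {hi:.1%}")
```

The same 10% prevalence gives a much narrower interval with n=400 than with n=20, which is the point of the example.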
Margin of Error
(Precision of the estimate)
Sample size determination depends on the:
– Objective of the study
– Design of the study
• Descriptive/Comparative
– Degree of precision or accuracy – the
allowed deviation from the true population
parameter (can be within 1% or 5%, etc)
– Plan for statistical analysis
– Degree of confidence level required, usually
specified as 95%:
• For example: the confidence that the proportion in
the whole population is between (p-d) and (p+d).
• The feasible sample size is also
determined by the availability of
resources:
– time
– manpower
– transport
– available facility, and
– money
Sample Size: Components
n = sample size
s or σ = standard deviation
d = desired precision = half of the CI (width=2d)
Zα/2 = 1.96 at 95% confidence level
β = probability of Type II error; power = 1 − β (for comparative study)
p = Proportion or prevalence
r = Rate
1. Sample Size: Single Sample
• The aim is to have a large enough sample
with which to estimate a population mean
or proportion within a narrow interval with
high reliability.
• Concerned with the precision of the
estimate (“narrowness of the CI”).
estimate ± d units
Sample size for single sample
includes:
A. Sample size for estimating a single
population mean
B. Sample size to estimate a single
population proportion
A. Sample size for estimating
a single population mean
A 100(1−α)% C.I. for µ is:

$$\bar{x} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

We use the sample mean ($\bar{x}$) as a point estimate of µ.

α is to be chosen by the researcher; the most common values of α are 0.05, 0.01 and 0.1.
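• AIM: Estimate µ
• WANT: Estimate ($\bar{x}$) ± d units
where d = Margin of error
= Absolute precision
= Half of the width (w) of CI
Steps:
1. Specify d (or w = 2d)
2. Use known σ² or estimate using s²
3. Set d equal to Z times the standard error of the estimator of the parameter of interest, and solve for n:

$$d = Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \quad\Rightarrow\quad n = \frac{Z_{\alpha/2}^{2}\,\sigma^{2}}{d^{2}}$$

Where d = e in some textbooks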


Example:
1. Find the minimum sample size (n) needed to
estimate the drop in heart rate (µ) for a new study
using a higher dose of propranolol than the
standard one. We require that the two-sided 95% CI
for µ be no wider than 5 beats per minute and the
sample sd for change in heart rate equals 10 beats
per minute.
$$n = \frac{(1.96)^2 (10)^2}{(2.5)^2} = 61.5 \approx 62 \text{ patients}$$

To change the confidence level, replace the multiplier (1.96)² as follows: 90% CL = (1.64)², 99% CL = (2.58)².
2. Suppose that for a certain group of cancer patients, we
are interested in estimating the mean age at diagnosis.
We would like a 95% CI of 5 years wide. If the
population SD is 12 years, how large should our
sample be?
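• With d = 2.5 (half of the 5-year width): n = (1.96)²(12)²/(2.5)² = 88.5 ≈ 89
• Suppose d = 1
• Then the sample size increases: n = (1.96)²(12)²/(1)² = 553.2 ≈ 554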
3. A hospital administrator wishes to
estimate the mean weight of babies born
in the hospital. How large a sample of
birth records should be taken if she/he
wants a 95% CI of 0.5 wide? Assume that
a reasonable estimate of σ is 2. Ans: 246
birth records.
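The three examples above can be reproduced with a few lines of code. This is a minimal sketch of n = Z²σ²/d², rounded up to the next whole subject; the function name `n_for_mean` is illustrative and not from the slides.

```python
from math import ceil

def n_for_mean(sigma, d, z=1.96):
    """Sample size to estimate a mean within +/- d, given SD sigma."""
    return ceil((z * sigma / d) ** 2)

# Example 1: heart rate, CI width 5 (d = 2.5), SD = 10
print(n_for_mean(sigma=10, d=2.5))
# Example 2: age at diagnosis, CI width 5 (d = 2.5), SD = 12; then d = 1
print(n_for_mean(sigma=12, d=2.5), n_for_mean(sigma=12, d=1))
# Example 3: birth weight, CI width 0.5 (d = 0.25), SD = 2
print(n_for_mean(sigma=2, d=0.25))
```

Halving d roughly quadruples n, because d enters the formula squared.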
But the population σ² is most of the
time unknown
As a result, it has to be estimated from:
• Pilot or preliminary sample:
– Select a pilot sample and estimate σ² with
the sample variance, s²
• Previous experience or expert opinion
• Similar studies
B. Sample size to estimate a
single population proportion
• Aim: Estimate p
• Want: Estimate $\hat{p}$ ± d units, where d = Z•SE
(95% CI of width = 2d)
Steps:
1. Specify d (or w = 2d)
2. Use estimated p (use p = 0.5 if no
information)
3. Solve for n:

$$n = \frac{Z_{\alpha/2}^{2}\,p(1-p)}{d^{2}}$$
Examples
1. Suppose that you are interested in knowing
the proportion of infants who were breastfed beyond 18
months of age in a rural area. Suppose that
in a similar area, the proportion (p) of
breastfed infants was found to be 0.20.
What sample size is required to estimate
the true proportion within ±3 percentage points with
95% confidence? Let p=0.20, d=0.03, α=5%
• Suppose there is no prior information
about the proportion (p) who breastfeed
• Assume p=q=0.5 (most conservative)
• Then the required sample size increases
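The breastfeeding example can be worked through as follows. This is a small sketch of n = Z²p(1−p)/d², rounded up; `n_for_proportion` is an illustrative name, not something defined in the slides.

```python
from math import ceil

def n_for_proportion(p, d, z=1.96):
    """Sample size to estimate a proportion p within +/- d."""
    return ceil(z**2 * p * (1 - p) / d**2)

# Prior estimate p = 0.20, precision d = 0.03
print(n_for_proportion(p=0.20, d=0.03))
# No prior information: use the most conservative value p = 0.5
print(n_for_proportion(p=0.50, d=0.03))
```

With p = 0.20 this gives 683 infants; with p = 0.5 it rises to 1068.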
• An estimate of p is not always available.
• However, the formula may also be used for
sample size calculation based on various
assumptions for the values of p.
• P = 0.1 → n = (1.96)²(0.1)(0.9)/(0.05)² = 138
P = 0.2 → n = (1.96)²(0.2)(0.8)/(0.05)² = 246
P = 0.3 → n = (1.96)²(0.3)(0.7)/(0.05)² = 323
P = 0.5 → n = (1.96)²(0.5)(0.5)/(0.05)² = 384
P = 0.7 → n = (1.96)²(0.7)(0.3)/(0.05)² = 323
P = 0.8 → n = (1.96)²(0.8)(0.2)/(0.05)² = 246
P = 0.9 → n = (1.96)²(0.9)(0.1)/(0.05)² = 138
• For a fixed absolute precision (d), the
required sample size increases as P
increases from 0 to 0.5, and then
decreases in the same way as the
prevalence approaches 1.
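One way to see why P = 0.5 is the most conservative choice: for fixed Z and d, n is proportional to P(1 − P), which can be maximised directly.

$$\frac{d}{dP}\,P(1-P) = 1 - 2P = 0 \;\Rightarrow\; P = 0.5, \qquad n_{\max} = \frac{Z_{\alpha/2}^{2}\,(0.25)}{d^{2}} = \frac{(1.96)^2 (0.25)}{(0.05)^2} \approx 384$$

Because P(1 − P) is symmetric about 0.5, P and 1 − P always require the same sample size.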
2. Suppose that we wish to estimate the
prevalence of asthma in an adult population
with a 95% confidence interval of width 0.10,
i.e., a precision of ±0.05. An
estimate of the prevalence of asthma is
0.10.
Ans: n = (1.96)²(0.1)(0.9)/(0.05)² = 138
If we choose to double the precision (halve the margin of error) to give a 95% confidence interval
of width 0.05 (d = 0.025): n = (1.96)² × 0.1(1 − 0.1)/(0.025)² = 553
3) p = 0.26, d = 0.03, Z = 1.96 (i.e., for a 95% C.I.)

$$n = \frac{(1.96)^2 (0.26 \times 0.74)}{(0.03)^2} = 821.25 \approx 822$$
• Thus, the study should include at least 822 subjects.
Points for Consideration
1. Sample size estimates might need to be adjusted to compensate
for non-response rate, patient dropout or loss to follow-up, lack
of compliance, etc.
2. If sampling is from a finite population of size N, then:

$$n = \frac{n_0}{1 + \dfrac{n_0}{N}}$$

where n0 is the sample size for an infinite population. When N is
large in comparison to n (i.e., n/N ≤ 0.05), the finite population
correction may be ignored.
3. Design effect for complex cluster sampling. Common values:
multiply n by 2, 3, …5.
If the above sample is to be taken from a
relatively small population (say N = 3000),
the required minimum sample is
obtained from the above estimate by
applying the finite population correction (if the population
is less than 10,000, a smaller sample
size may be required):

$$n_{\text{final}} = \frac{821.25}{1 + \dfrac{821.25}{3000}} = 644.7 \approx 645$$
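The adjustments listed under "Points for Consideration" can be chained in code. The sketch below reproduces the finite population correction for N = 3000. The non-response adjustment n/(1 − rate) is one common convention and is an assumption here, as are the 10% non-response rate and the design effect of 2, which are hypothetical values chosen only for illustration.

```python
from math import ceil

def finite_population_correction(n0, N):
    """Adjust an infinite-population sample size n0 for a population of size N."""
    return n0 / (1 + n0 / N)

def inflate(n, nonresponse=0.0, design_effect=1.0):
    """Inflate n for a design effect and an anticipated non-response rate.

    Assumption: non-response is handled by dividing by (1 - rate).
    """
    return ceil(n * design_effect / (1 - nonresponse))

n0 = 821.25                                   # from the p = 0.26, d = 0.03 example
n_fpc = ceil(finite_population_correction(n0, N=3000))
print(n_fpc)

# Hypothetical: cluster design (design effect 2) and 10% non-response
print(inflate(n_fpc, nonresponse=0.10, design_effect=2.0))
```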
2. Sample Size: Two Samples
Sample size based on hypothesis testing

A. Estimation of the difference between two
population means
B. Estimation of the difference between two
population proportions
Ho = There is no difference between the
two groups
Ho: µ1 - µ2 = 0
P1 - P2 = 0
HA = There is a difference between the
two groups
HA: µ1 - µ2 ≠ 0
P1 - P2 ≠ 0
• Type I error (α) = The probability of rejecting Ho
when it is true
• Type II error (β) = The probability of not rejecting
Ho when it is false
• α = Level of significance = Probability
of making Type I error
1 – α = Confidence
• β = Probability of making Type II error
1 – β = Power
• We would like to maintain a low probability of
a Type I error (α) and low probability of a Type
II error (β) [high power = 1 - β].

• Type I error = judging an innocent person as
guilty
• Type II error = judging a guilty person as
innocent.
Consider that the different types of errors may
have different consequences
• Consider an outbreak that might be
associated with a restaurant
• What are the consequences of making a
Type I error?
– Close restaurant or recall product that is not the
problem.
• Harmful to business, does not solve the problem.
• Health department accused of incompetence.
• What are the consequences of making a
Type II error?
– Fail to close the restaurant or remove the
product.
• People continue to get sick.
• Health department accused of incompetence.
• Power (1 − β) = the probability that the test
correctly rejects a false H0.
• The power of a test is 1 – β; because β is the
probability that a test fails to reject a false H0
and power is the probability that it does reject.
• If the power of a test is low, then there is little
chance of detecting a difference even if one
really exists.
• Power is an important part of the design
of a study.
• Obtaining a larger sample size
decreases the probability of a Type II
error, so it increases the power.
• Whenever a study fails to reject its H0,
the test’s power comes into question.
– Power (1 - β) = 50%, Zβ = 0.00
– Power (1 - β) = 75%, Zβ = 0.67
– Power (1 - β) = 80%, Zβ = 0.84
– Power (1 – β) = 90%, Zβ = 1.28

• Power is one-sided and Zβ is always one-sided
• A power of at least 80% is commonly recommended.
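The Zβ values listed above are one-sided quantiles of the standard normal distribution, so they can be looked up programmatically rather than from a table. A minimal sketch using Python's standard library (the helper names are illustrative):

```python
from statistics import NormalDist

def z_for_power(power):
    """One-sided standard normal quantile Z_beta for a given power (1 - beta)."""
    return NormalDist().inv_cdf(power)

def z_for_confidence(level):
    """Two-sided quantile Z_{alpha/2} for a given confidence level (1 - alpha)."""
    alpha = 1 - level
    return NormalDist().inv_cdf(1 - alpha / 2)

for power in (0.50, 0.75, 0.80, 0.90):
    print(f"power {power:.0%}: Z_beta = {z_for_power(power):.2f}")
print(f"95% confidence: Z_alpha/2 = {z_for_confidence(0.95):.2f}")
```

The unrounded quantiles differ slightly from the two-decimal table values, which can shift a computed sample size by one or two subjects.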
A. Comparison between two
means (Equal sample sizes)

Sample size in each group:

$$n = \frac{(\sigma_1^{2} + \sigma_2^{2})\,(Z_{\alpha/2} + Z_{\beta})^{2}}{\Delta^{2}}$$

∆ = |μ1 − μ2|
The means and variances of the two
respective groups are (µ1, σ1²) and (µ2, σ2²).
Example
1. Determine the sample sizes required to detect a difference
of 5 mmHg in mean blood pressure between individuals
receiving placebo and those receiving drug with α = 5% and
power of 0.80.
• Assume σ1 = σ2 = 15 mmHg in each group.
• We are interested in testing:
Ho: μ1 − μ2 = 0, HA: μ1 − μ2 = 5
• n = (15² + 15²)(1.96 + 0.84)²/5² = 141.1
• We would need 142 individuals in each group.

2. Suppose that the true blood pressure distribution
among OC users is normal with mean µ1 and variance σ1².
Similarly, for non-users the distribution is normal
with mean µ2 and variance σ2². We wish to test the hypothesis
Ho: µ1 = µ2 versus HA: µ1 ≠ µ2. Determine the
appropriate sample size for the study using α
= 0.05 and a power of 80%. A small pilot study
showed: sample mean1 = 132.86,
s1 = 15.34, sample mean2 = 127.44, and s2 = 18.23.
Use the sample data to estimate population
parameters.
n = (15.34² + 18.23²)(1.96 + 0.84)²/(132.86 − 127.44)²
= 151.5 ≈ 152 in each group
B. Comparison between two
means (Unequal sample sizes)

$$n_1 = \frac{(\sigma_1^{2} + \sigma_2^{2}/\lambda)\,(Z_{\alpha/2} + Z_{\beta})^{2}}{\Delta^{2}}, \qquad n_2 = \lambda\,n_1$$

where λ = n2/n1
In some textbooks, λ = k = r
3. Suppose we anticipate twice as many non-OC
users as OC users entering the study
in the previous example. Determine the
sample size needed to achieve 80% power
using α = 0.05. Here λ = 2.

n1 = (15.34² + 18.23²/2)(1.96 + 0.84)²/(5.42)²
= 107.2 ≈ 108 OC users
and n2 = 2(108) = 216 non-OC users.
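The three two-mean examples follow the same arithmetic, so a single function with an allocation ratio covers them. This is a sketch using the rounded Z values from the slides (1.96 and 0.84); `n_two_means` and its parameter names are illustrative.

```python
from math import ceil

def n_two_means(sd1, sd2, delta, ratio=1.0, z_alpha=1.96, z_beta=0.84):
    """Group sizes (n1, n2) to detect a mean difference delta.

    ratio is lambda = n2/n1; ratio=1 gives equal group sizes.
    """
    n1 = ceil((sd1**2 + sd2**2 / ratio) * (z_alpha + z_beta) ** 2 / delta**2)
    n2 = ceil(ratio * n1)
    return n1, n2

# Example 1: SD 15 in both groups, difference of 5
print(n_two_means(15, 15, 5))
# Example 2: OC users vs non-users, equal groups
print(n_two_means(15.34, 18.23, 132.86 - 127.44))
# Example 3: twice as many non-OC users (lambda = 2)
print(n_two_means(15.34, 18.23, 132.86 - 127.44, ratio=2))
```

Unequal allocation lowers n1 from 152 to 108 but raises the total from 304 to 324.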
C. Comparison between two
proportions (Equal sample sizes)
• To test the hypothesis,
Ho: p1 − p2 = 0 vs HA: p1 − p2 ≠ 0,
|p1 − p2| = ∆
with α and power (1 − β)
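General formula using power of a study to
determine sample size (comparative cross-sectional (CS), case-control (CC),
cohort) for proportions:

$$n_1 = n_2 = \frac{\left[Z_{\alpha/2}\sqrt{2\bar{p}(1-\bar{p})} + Z_{\beta}\sqrt{p_1(1-p_1) + p_2(1-p_2)}\right]^{2}}{\Delta^{2}}$$

Where

∆ = p1 − p2
$\bar{p}$ = (p1 + p2)/2
p1 is the proportion in group 1
p2 is the proportion in group 2
Zα/2 is the quantile of the standard normal distribution for the type I error
Zβ is the quantile of the standard normal distribution for the type II error/power
n1 is the sample size for group 1
n2 is the sample size for group 2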
Example
• Let p1 = 0.35, p2 = 0.25, and Δ = p1 − p2 = 0.35 − 0.25 = 0.10
(using Zα/2 = 1.96 and Zβ = 0.84)
• We would need approximately 329
subjects in each group.
• The proportion with condition X for group A is
expected to be 70% and for group B it is 60%.
• A study is planned to show the difference at the
significance level of 1% and power of 90%.
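• p1 = 0.6; p2 = 0.7; $\bar{p}$ = (0.6 + 0.7)/2 = 0.65;
Zα/2 = 2.81; Zβ = 1.28 (power = 0.9).
• Note that 2.81 is larger than the usual two-sided 1% critical value (2.58), so this n is on the conservative side.
• Calculate the sample size for each group.

$$n_1 = \frac{\left(2.81\sqrt{2 \times 0.65 \times 0.35} + 1.28\sqrt{0.6 \times 0.4 + 0.7 \times 0.3}\right)^{2}}{(0.6 - 0.7)^{2}} \approx 759$$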
Comparison of two proportions
• A better (more conservative) suggestion for
sample size is:

$$n_{1,\text{adj}} = \frac{n}{4}\left[1 + \sqrt{1 + \frac{4}{n\,|p_1 - p_2|}}\right]^{2}$$

• Adjusted/conservative sample size is:

$$n_{1,\text{adj}} = \frac{759}{4}\left[1 + \sqrt{1 + \frac{4}{759 \times |0.6 - 0.7|}}\right]^{2} = 778.9 \approx 779$$
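The equal-allocation formula for two proportions and the conservative adjustment can be sketched as below. The function names are illustrative, and the Z values are passed in explicitly so that the rounded values from the slides can be used.

```python
from math import ceil, sqrt

def n_two_proportions(p1, p2, z_alpha=1.96, z_beta=0.84):
    """Per-group sample size to compare two proportions (equal group sizes)."""
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

def conservative_adjustment(n, p1, p2):
    """Adjusted n: (n/4) * [1 + sqrt(1 + 4 / (n * |p1 - p2|))]^2, rounded up."""
    return ceil(n / 4 * (1 + sqrt(1 + 4 / (n * abs(p1 - p2)))) ** 2)

# p1 = 0.35 vs p2 = 0.25 at alpha = 5%, power = 80%
print(n_two_proportions(0.35, 0.25))
# 60% vs 70% with the Z values used on the slide (2.81 and 1.28)
n = n_two_proportions(0.6, 0.7, z_alpha=2.81, z_beta=1.28)
print(n, conservative_adjustment(n, 0.6, 0.7))
```

The adjustment adds roughly 2/|p1 − p2| subjects per group, so it matters most when the difference to be detected is small.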
D. Comparison between two
proportions (Unequal sample sizes)
$$n_1 = \frac{\left[Z_{\alpha/2}\sqrt{\left(1 + \frac{1}{r}\right)\bar{P}(1-\bar{P})} + Z_{\beta}\sqrt{P_1(1-P_1) + \frac{P_2(1-P_2)}{r}}\right]^{2}}{(P_1 - P_2)^{2}}$$

Where

$$\bar{P} = \frac{P_1 + rP_2}{1 + r}$$

and r is the allocation ratio of group 2 to group 1, i.e., n2:n1
(n2 = r n1)

Note: This formula is quite general, and applies to cross-
sectional, case-control and cohort studies.
Example
• A study is proposed to study the effect of new
anticoagulant therapy. Patients are to be randomly
divided into two groups: one receives the anticoagulant
and the other placebo. The groups are then followed for
the incidence of major bleeding events over 3 years.
Suppose that 5% of treated patients and 22% of controls
are anticipated to experience a major event over 3 years.
How large should the sample be to have an
80% chance of finding a significant difference, with a
1:2 ratio of treated to control patients, at α = 5%?
Solution
p1 = 0.05, p2 = 0.22, $\bar{p}$ = (0.05 + 2 × 0.22)/(1 + 2) = 0.16
q1 = 0.95, q2 = 0.78, ∆ = 0.22 − 0.05 = 0.17
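The slide's final figure for this example is not reproduced in the text, so the sketch below only shows how the unequal-allocation formula could be evaluated for these inputs. It assumes Zα/2 = 1.96 and Zβ = 0.84 and keeps the pooled proportion unrounded; a hand calculation that rounds it to 0.16 may differ by one subject. The function name is illustrative.

```python
from math import ceil, sqrt

def n_two_proportions_unequal(p1, p2, r, z_alpha=1.96, z_beta=0.84):
    """Group sizes (n1, n2) for comparing two proportions with n2 = r * n1."""
    p_bar = (p1 + r * p2) / (1 + r)
    numerator = (z_alpha * sqrt((1 + 1 / r) * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2) / r)) ** 2
    n1 = ceil(numerator / (p1 - p2) ** 2)
    return n1, ceil(r * n1)

# Treated (p1 = 0.05) vs control (p2 = 0.22), one treated for every two controls
n_treated, n_control = n_two_proportions_unequal(0.05, 0.22, r=2)
print(n_treated, n_control)
```

With r = 1 the function reduces to the equal-sample-size formula, which is a quick sanity check on the implementation.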
• If the OR or RR and one of the proportions
are known, we can compute the unknown
proportion by:

$$P_1 = \frac{P_2}{P_2 + \dfrac{1 - P_2}{OR}} \quad \text{(OR known)}, \qquad P_1 = P_2 \times RR \quad \text{(RR known)}$$
Example
• A case-control study is planned to compare the efficacy of a
vaccine for the prevention of childhood
tuberculosis with a placebo. Let the proportion of
unvaccinated children be 30%, with an estimated
OR of at least 2.
P2 = 0.3, q2 = 0.7, OR = 2.0
P1 = 0.3/(0.3 + 0.7/2) = 0.462
• With equal numbers of cases and controls, what sample size
is required to detect this difference with 80% power at α =
5%?
n = 140 in each group
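The conversion from an odds ratio to P1, followed by the sample size calculation, can be sketched as follows. To match the slide, P1 is rounded to three decimals (0.462) before it enters the equal-allocation formula; with the unrounded value the result is one subject larger. Function names are illustrative.

```python
from math import ceil, sqrt

def p1_from_odds_ratio(p2, odds_ratio):
    """Proportion in group 1 implied by p2 and an odds ratio."""
    return p2 / (p2 + (1 - p2) / odds_ratio)

def n_two_proportions(p1, p2, z_alpha=1.96, z_beta=0.84):
    """Per-group sample size to compare two proportions (equal group sizes)."""
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

p2 = 0.3
p1 = round(p1_from_odds_ratio(p2, odds_ratio=2.0), 3)  # 0.462, as on the slide
print(p1, n_two_proportions(p1, p2))
```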
Specific Formula: Cohort & Intervention
$$n = \frac{(z_1 + z_2)^{2} \cdot 2\bar{p}(1-\bar{p})}{(p_1 - p_2)^{2}}$$

Where
n = Sample size in each group
z1 = 1.96 for 95% confidence level
z2 = 1.64 for 95% power
= 1.28 for 90% power
= 0.84 for 80% power
p1 = proportion of outcome in exposed group
p2 = proportion of outcome in control group
$\bar{p}$ = (p1 + p2)/2 = average of p1 and p2
p1 − p2 = Minimum meaningful difference in proportions between
exposed and control groups
Specific formula: Case-Control

$$n = \frac{(z_1 + z_2)^{2} \cdot 2\bar{p}(1-\bar{p})}{(p_2 - p_1)^{2}} \times \frac{c + 1}{2c}$$

p1 = Expected frequency of exposure in cases
p2 = Expected frequency of exposure in controls
$\bar{p}$ = average of p1 and p2
c = ratio of controls to cases
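The two "specific" formulas differ only by the factor (c + 1)/(2c), so one function can cover both. In this sketch n is taken to be the number of cases (with c × n controls), which is an assumption about how the slide's n should be read; the inputs reuse the vaccine example (p1 = 0.462, p2 = 0.3), and the 2:1 control-to-case ratio is hypothetical.

```python
from math import ceil

def n_simplified(p1, p2, c=1.0, z1=1.96, z2=0.84):
    """Simplified sample size: (z1+z2)^2 * 2*p_bar*(1-p_bar) / (p2-p1)^2 * (c+1)/(2c).

    c = 1 gives the cohort/intervention formula (n per group);
    c > 1 is read here as n cases with c * n controls (assumption).
    """
    p_bar = (p1 + p2) / 2
    n = (z1 + z2) ** 2 * 2 * p_bar * (1 - p_bar) / (p2 - p1) ** 2
    return ceil(n * (c + 1) / (2 * c))

print(n_simplified(0.462, 0.3))        # one control per case
print(n_simplified(0.462, 0.3, c=2))   # hypothetical: two controls per case
```

With c = 1 this gives 141, one more than the 140 obtained from the general formula, because the simplified version uses the pooled variance in both terms.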
Estimation of Single Rate
$$n = \frac{(Z_{\alpha/2})^{2}\, r}{d^{2}}$$

• The maternal mortality rate in a country is expected to
be 70 per 10,000 live births. A survey is planned to
determine the maternal mortality rate with a 95% CI of
60 to 80 per 10,000 live births. The required n would
be:

$$n = \frac{(Z_{\alpha/2})^{2}\, r}{d^{2}} = \frac{(1.96)^{2}\,(70/10000)}{(10/10000)^{2}} \approx 27{,}000 \text{ live births}$$
Using power of a study to
determine sample size
• Comparison between two rates

$$n = \frac{(Z_{\alpha/2} + Z_{\beta})^{2}\,(r_1 + r_2)}{(r_1 - r_2)^{2}}$$
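Both rate formulas are simple enough to check in code. The sketch below reproduces the maternal mortality example; the two-rate call uses hypothetical rates (70 vs 40 per 10,000) that are not from the slides and serve only to show the call. Function names are illustrative.

```python
from math import ceil

def n_single_rate(r, d, z=1.96):
    """Sample size to estimate a rate r within +/- d."""
    return ceil(z**2 * r / d**2)

def n_two_rates(r1, r2, z_alpha=1.96, z_beta=0.84):
    """Per-group sample size to compare two rates r1 and r2."""
    return ceil((z_alpha + z_beta) ** 2 * (r1 + r2) / (r1 - r2) ** 2)

# Maternal mortality: r = 70 per 10,000, d = 10 per 10,000
print(n_single_rate(70 / 10000, 10 / 10000))
# Hypothetical comparison: 70 vs 40 per 10,000, alpha = 5%, power = 80%
print(n_two_rates(70 / 10000, 40 / 10000))
```

The single-rate result is 26,892, which the slide rounds to about 27,000 live births.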
Using a statistical program to do
the calculations
• Epi Info for estimation of the population
proportion
– Descriptive population or cross-sectional (CS) survey
– Unmatched/matched case-control
– Cohort or comparative CS.
Epi Info Sample Size Calculation
Special considerations
• Sample size calculations shown above are
based on simple random sampling (SRS)
• Stratification will increase precision
compared with SRS of the same size, so a smaller
sample can be selected
• Cluster sampling will reduce precision
compared with SRS of the same size, so a larger
sample will be needed (design effect)
Summary
• Sample size calculations depend on a
number of assumptions:
– the hypothesized difference of interest, Δ
– the probability of Type I error (α)
– the probability of Type II error (β)
– the variance
• Choice of sample size depends on a
balance of reasonable assumptions, time,
effort, and finance
• Sample size calculations provide a minimum estimate
of the sample size required for the study.
