Sample Size Phcs
24 February 2016
Geneva, Switzerland
Presented to LFAs
Principles of Sampling
1
Steps for deriving a sample of health facilities common to
all sampling methods
Identify survey design
• experimental
• non-experimental
Establish sampling frame
• Geographic coverage
• Where does the program offer services?
• Seasonal variation?
• Facility types
• Community, Primary, Secondary, Tertiary
• Management authority (e.g. public/private)
• NGO, donor-funded, disease specific, etc.
• Exclusion criteria
• safety, accessibility, political, programmatic
• Any reduction of the sampling frame must be done in a purposeful manner to avoid the appearance of bias.
2
Survey of Sampling Methods Appropriate for Small N
• Purposive Sample
• Simple Random Sampling
• Cluster Sampling
• Lot Quality Assurance Sampling
3
Purposive Sampling
4
Purposive Sampling
Advantages:
• Simplicity of sampling and ease of research
• Helpful for pilot studies and for hypothesis generation
• Data collection can be facilitated in short duration of time
• Cost effectiveness
Disadvantages:
• Highly vulnerable to selection bias
• Generalizability unclear
• High level of sampling error
5
Simple Random Sampling (SRS) (1)
6
Simple Random Sampling (2)
• Disadvantages of SRS
• Often leads to high assessment implementation costs due to time and travel
required
• Requires a complete list of sample elements
7
Simple Random Sampling (3)
8
Simple Random Sampling (4)
9
Simple Random Sampling (5)
Relative variance:
• When the proportion of the parameter of interest in the population is small,
the margin of error is often too large for meaningful interpretation.
• Use relative variance (relative error rather than standard [or sampling] error) for the
margin of error.
• Relative error is also known as the coefficient of variation (cv).
• Statistically, cv = SE(p) / p
• Results in larger samples, particularly for small values of p.
10
Simple Random Sampling (3)
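The formula for calculating a sample size for SRS is:
$$ n = \frac{z^2 \, p \, (1 - p)}{e^2} $$
(z = normal deviate for the chosen confidence level, p = expected proportion, e = margin of error)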
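As a minimal sketch, the SRS formula n = z² · p(1 − p) / e² can be evaluated in Python. The function name `srs_sample_size` is illustrative, and z = 1.96 is assumed as the normal deviate for 95% confidence. Rounding to the nearest whole number appears to match the 95% figures tabulated later in this deck.

```python
def srs_sample_size(z: float, p: float, e: float) -> float:
    """Unrounded SRS sample size: n = z^2 * p * (1 - p) / e^2."""
    return (z ** 2) * p * (1.0 - p) / (e ** 2)


# Assumed inputs: 95% confidence (z = 1.96) and a 5% margin of error.
for prevalence in (0.5, 0.6, 0.9):
    n = srs_sample_size(z=1.96, p=prevalence, e=0.05)
    print(f"p = {prevalence:.0%}: n = {n:.2f}, rounded = {round(n)}")
```

The sample size is largest at p = 50% because p(1 − p) peaks there.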
11
Simple Random Sampling (6)
The formula for calculating a sample size for SRS with relative error is:
$$ n = \frac{z^2 \, (1 - p)}{e^2 \, p} $$
(here e is the margin of error expressed relative to p)
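A small sketch of the relative-error calculation, assuming the form n = z² · (1 − p) / (e² · p), which is the expression that reproduces the 95% relative-error figures tabulated in this deck. The function name and the choice of z = 1.96 are assumptions made for illustration.

```python
def srs_sample_size_relative(z: float, p: float, rel_e: float) -> float:
    """Unrounded SRS sample size when the margin of error is relative to p.

    Assumed form: n = z^2 * (1 - p) / (rel_e^2 * p).
    """
    return (z ** 2) * (1.0 - p) / ((rel_e ** 2) * p)


# Assumed inputs: 95% confidence (z = 1.96), 5% relative margin of error.
for prevalence in (0.5, 0.6, 0.8, 0.9):
    n = srs_sample_size_relative(z=1.96, p=prevalence, rel_e=0.05)
    print(f"p = {prevalence:.0%}: n = {n:.2f}, rounded = {round(n)}")
```

Dividing by p is what drives the sample size up sharply as p gets small.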
Where there is a predetermined population (e.g., total number of facilities in the country),
the sample size needs to be multiplied by the Finite Population Correction (FPC) factor.
The formula can be expressed as:
$$ \text{New } n = \frac{n}{1 + \dfrac{n - 1}{N}} $$
Where:
New n = the adjusted new sample size
N = the population size
n = the sample size obtained from the general formula
(the FPC should be applied if n/N ≥ 5%)
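The finite population correction, New n = n / (1 + (n − 1)/N), can be sketched as follows. The helper names and the example population of 1,000 facilities are hypothetical. The 5% threshold is the rule of thumb stated above.

```python
def fpc_adjust(n: float, population: float) -> float:
    """Finite population correction: new n = n / (1 + (n - 1) / N)."""
    return n / (1.0 + (n - 1.0) / population)


def adjusted_if_needed(n: float, population: float, threshold: float = 0.05) -> float:
    """Apply the FPC only when the sampling fraction n/N reaches the threshold."""
    if n / population >= threshold:
        return fpc_adjust(n, population)
    return n


# Hypothetical example: n = 384 from the general formula, N = 1000 facilities.
print(round(adjusted_if_needed(384, 1000), 1))
# With a very large population the sampling fraction is below 5%, so n is unchanged.
print(adjusted_if_needed(384, 100_000))
```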
13
Example: Stratified Random Sampling
Ekiti State, Nigeria: By facility type and
Sample size formula for HF sampling (with FPC & deff):
$$ n = \left[ \frac{z^2 \, p \, q + ME^2}{ME^2 + \dfrac{z^2 \, p \, q}{N}} \right] \times deff $$
In Excel: =(((D7*G7*H7)+F7)/(F7+D7*G7*H7/C7))*I7
Table columns: Strata, Total HF, square of normal deviate, ME, ME^2, p, q, deff, sample size, over sample
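The Excel expression above can be transcribed directly into Python. This is a sketch only. The mapping of cells to quantities (C7 = total health facilities in the stratum, D7 = square of the normal deviate, F7 = ME², G7 = p, H7 = q, I7 = deff) is inferred from the column headings and is an assumption. The input values below are hypothetical.

```python
def stratum_sample_size(total_hf: float, z_squared: float, me: float,
                        p: float, deff: float) -> float:
    """Transcription of =(((D7*G7*H7)+F7)/(F7+D7*G7*H7/C7))*I7.

    Assumed mapping: C7 = total_hf, D7 = z_squared, F7 = me**2,
    G7 = p, H7 = q = 1 - p, I7 = deff.
    """
    q = 1.0 - p
    numerator = z_squared * p * q + me ** 2
    denominator = me ** 2 + (z_squared * p * q) / total_hf
    return (numerator / denominator) * deff


# Hypothetical stratum: 100 facilities, 95% confidence (z^2 = 1.96^2),
# 10% margin of error, p = 0.5, design effect 1.2.
n = stratum_sample_size(total_hf=100, z_squared=1.96 ** 2, me=0.10, p=0.5, deff=1.2)
print(round(n, 1))
```

The FPC is built into the denominator, so small strata yield proportionally smaller samples before the design effect is applied.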
14
Sample sizes for SRS – differing levels of prevalence, confidence, and margin of error
Conf = 95%         Prevalence
Margin of error    50%    60%    70%    80%    90%
5%                 384    369    323    246    138
10%                 96     92     81     61     35
15%                 43     41     36     27     15
20%                 24     23     20     15      9
Conf = 90%         Prevalence
Margin of error    50%    60%    70%    80%    90%
5%                 269    258    226    172     97
10%                 67     65     56     43     24
15%                 30     29     25     19     11
20%                 17     16     14     11      6
15
Sample sizes for SRS with relative error – differing levels of prevalence, confidence, and margin of error
Conf = 95%         Prevalence
Margin of error    50%    60%    70%    80%    90%
5%                1537   1024    659    384    171
10%                384    256    165     96     43
15%                171    114     73     43     19
20%                 96     64     41     24     11
Conf = 90%         Prevalence
Margin of error    50%    60%    70%    80%    90%
5%                1076    717    461    269    120
10%                269    179    115     67     30
15%                120     80     51     30     13
20%                 67     45     29     17      7
16
Simple Random Sampling (8)
• Each estimate requires a different sample size – take the largest
• Adjust for non-response
• Adjust for equal reliability of domains (strata, e.g. urban/rural)
• If sampling rate in domain is ≥ 50% take a census
• If stratifying on size and don’t know the size, you can use a proxy measure:
• Client volume (e.g. indicator value), number of staff, number of beds
• Exact numbers can be rounded without loss of precision
• Large facilities should be sampled purposively to ensure inclusion
• Confidence level is more important than ME
• For our purposes, SRS is probably only useful for larger values of p and a wide margin of
error
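Two of the points above (take the largest sample size across estimates, then adjust for non-response) can be sketched together. The inflation n / (1 − non-response rate) is a common convention assumed here, not something this deck specifies. The indicator prevalences and the 10% non-response rate are hypothetical.

```python
import math


def srs_sample_size(z: float, p: float, e: float) -> float:
    """Unrounded SRS sample size: n = z^2 * p * (1 - p) / e^2."""
    return (z ** 2) * p * (1.0 - p) / (e ** 2)


def planned_sample_size(prevalences, z: float, e: float, nonresponse_rate: float) -> int:
    """Largest per-indicator sample size, inflated for expected non-response.

    Assumption: inflate by dividing by (1 - nonresponse_rate), then round up.
    """
    largest = max(srs_sample_size(z, p, e) for p in prevalences)
    return math.ceil(largest / (1.0 - nonresponse_rate))


# Hypothetical indicators with expected prevalence 60%, 80% and 90%;
# 95% confidence, 10% margin of error, 10% expected non-response.
print(planned_sample_size([0.6, 0.8, 0.9], z=1.96, e=0.10, nonresponse_rate=0.10))
```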
17
Cluster Sampling (1)
• Often results in increase in sample size relative to SRS due to fewer areas → ↓
representativeness → ↑ variance
• Design effect (deff): the variance of the estimate from the cluster sample relative to the
variance from a simple random sample of the same size
• Recommended to use design effect 1.2 for health facility sampling (with caveats)
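A brief sketch of how a design effect inflates an SRS sample size, assuming the usual relationship n_cluster = deff × n_SRS. With deff = 1.2 and z = 1.96 (an assumption for 95% confidence), this appears to reproduce the 95% cluster-sampling figures tabulated later in this deck. The function names are illustrative.

```python
def srs_sample_size(z: float, p: float, e: float) -> float:
    """Unrounded SRS sample size: n = z^2 * p * (1 - p) / e^2."""
    return (z ** 2) * p * (1.0 - p) / (e ** 2)


def cluster_sample_size(z: float, p: float, e: float, deff: float = 1.2) -> float:
    """SRS sample size multiplied by the design effect."""
    return deff * srs_sample_size(z, p, e)


# 95% confidence, 5% margin of error, design effect 1.2.
for prevalence in (0.5, 0.6, 0.9):
    n = cluster_sample_size(z=1.96, p=prevalence, e=0.05)
    print(f"p = {prevalence:.0%}: n = {round(n)}")
```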
19
Cluster Sampling (3)
20
Cluster Sampling (4)
Example
21
Cluster Sampling – Sample sizes for differing levels of prevalence, confidence, and margin of error
Conf = 95%         Prevalence
Margin of error    50%    60%    70%    80%    90%
5%                 461    443    387    295    166
10%                115    111     97     74     41
15%                 51     49     43     33     18
20%                 29     28     24     18     10
Conf = 90%         Prevalence
Margin of error    50%    60%    70%    80%    90%
5%                 323    310    271    207    116
10%                 81     77     68     52     29
15%                 36     34     30     23     13
20%                 20     19     17     13      7
22
Cluster Sampling
23
Cluster Sampling: Data Quality Example - Modified 2-stage cluster sample
Precision for varying number of clusters and sites per cluster - modified 2-stage cluster sample
                          Number of Districts (Clusters)
No. of HFs per Stratum    4      8      10     15     20     30
1                         0.472  0.249  0.212  0.164  0.138  0.110
25
Cluster Sampling: Data Quality Example - Modified 2-stage cluster sample
Procedure for Sampling Clusters
• Order districts and results alphabetically (or according to another method that will not introduce
periodicity or other forms of bias into the list of districts, e.g. serpentine method)
• Exclude districts for which it is not possible/advisable to conduct the assessment (e.g. due to
security concerns).
• Calculate the sampling interval - sum of indicator value over all clusters divided by the number of
clusters to sample.
• Select a starting point at random: a number between 1 and the sampling interval. The cluster
whose running sum first reaches this number is your first cluster.
• Calculate the running sum of cluster volume over all clusters.
• Add the sampling interval to the random starting point and find the resulting number in the
running sum column: the 2nd cluster is the first cluster whose running sum is equal to or greater
than that number.
• Proceed down the ordered list of clusters and select up to the desired number of clusters.
• Clusters with larger volume may be selected more than once. If a cluster is selected more than
once you will sample more health facilities per cluster in these clusters. (Number of times district is
selected as cluster x number of desired health facilities per cluster.)
• In the field, if a facility is deemed inaccessible, the next nearest non-sampled facility can be used.
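The cluster selection steps above (sampling interval, random start, running sum, repeated addition of the interval) can be sketched in Python. This is a minimal illustration, not the assessment tool itself. The district names and volumes are hypothetical, the function name is invented for the example, and the random start assumes the sampling interval is at least 1.

```python
import random
from itertools import accumulate


def select_clusters(districts, n_clusters, start=None, rng=None):
    """Systematic selection with probability proportional to volume.

    districts: ordered list of (name, indicator volume) after exclusions.
    Returns {district name: number of times selected}.
    """
    names = [name for name, _ in districts]
    running = list(accumulate(volume for _, volume in districts))
    interval = running[-1] / n_clusters          # sampling interval
    if start is None:
        rng = rng or random.Random()
        start = rng.uniform(1, interval)          # random start in [1, interval]
    hits = {}
    idx = 0
    for k in range(n_clusters):
        target = start + k * interval
        # Move down the list until the running sum reaches the target.
        while idx < len(running) - 1 and running[idx] < target:
            idx += 1
        hits[names[idx]] = hits.get(names[idx], 0) + 1
    return hits


# Hypothetical districts and volumes; a fixed start makes the example repeatable.
districts = [("A", 100), ("B", 300), ("C", 50), ("D", 550)]
selected = select_clusters(districts, n_clusters=4, start=120)
facilities_per_cluster = 3                         # hypothetical
print({name: times * facilities_per_cluster for name, times in selected.items()})
```

High-volume districts can be hit more than once, in which case the number of facilities sampled there is multiplied accordingly.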
26
Cluster Sampling: Data Quality Example - Modified 2-stage cluster sample
Example
27
Cluster Sampling: Data Quality Example - Modified 2-stage cluster sample
28
Cluster Sampling: Data Quality Example - Modified 2-stage cluster sample
Littoral  CS Cotonou 1 (DIST)        9  12  6  27
Littoral  Hôpital Camp Guézo         4  19  3  26
Littoral  Hôpital St Luc (Cotonou)   0   0  7   7
Littoral  HOMEL                      0
29
Cluster Sampling: Data Quality Example - Modified 2-stage cluster sample
Analysis:
30
Lot Quality Assurance Sample (LQAS)
• Often when small samples are necessary, classification is used to assess
elements of the population to make inferences to the larger population.
• LQAS is a method of classification which can also be seen as a stratified random
sampling design.
• LQAS consists of two stages:
• The first obtains random samples from, say, districts within a region in order to classify
each as belonging to one of two classes, often labeled “acceptable” or
“unacceptable”.
• The second combines information from such districts to obtain a measure of the region
as a whole (either as a stratified sample or a cluster sample).
• Regardless of the LQAS design selected for use, its primary purpose is the
classifications carried out at the local level.
• The second stage employs traditional sampling theory.
31
Lot Quality Assurance Sample (LQAS)
• A relatively rapid and inexpensive data collection approach that allows
program implementers to use small sample sizes and more frequent
sampling to categorize and prioritize areas by their performance on key
indicators.
• The lower expense is due to LQAS’s primary purpose being classification rather
than point estimation
• Since the 1980s it has been used to assess immunization coverage, post-
disaster assessment of health status, family planning and antenatal care,
growth and nutrition monitoring, diarrheal disease control, and quality
management, in urban zones, rural areas, or on a national scale in over 55
countries
32
Lot Quality Assurance Sample (LQAS)
34
Lot Quality Assurance Sample (LQAS)
• Advantages of LQAS:
• Can use smaller samples since response for each lot is binary (yes/no)
• Simplified data analysis
• Good for routine monitoring of program and data quality
• Wide range of applications
• Disadvantages of LQAS:
• Requires updated list of health facilities
• Facilities need to be selected randomly within lots
• Initial conceptual leap
• Smaller, but growing body of research
35
Lot Quality Assurance Sample (LQAS)
36
Lot Quality Assurance Sample (LQAS)
• Define lots: a ‘lot’ can be defined as a population unit assigned to health unit, a health
center, or even health records within a health center. An ideal lot is the smallest unit that
could provide meaningful information to a health planner when evaluating a health program
• PU = Benchmark for quality at or above which service quality is deemed
acceptable
• PL = Benchmark for quality below which service quality is deemed very unacceptable
• n = sample size
• α error = the probability of misclassifying an area with unacceptable performance as
acceptable (1- α = specificity), also known as ‘consumer risk’
• β error = the probability of misclassifying an area with acceptable performance as
unacceptable (1 - β = sensitivity), also known as ‘provider risk’
• d = the decision rule – the number of sampled elements that must be deemed acceptable in
order for the entire ‘lot’ to be deemed acceptable. If the number of acceptable sampled
elements is not reached the ‘lot’ must be rejected as unacceptable. (The decision rule
depends on all the other sampling parameters).
37
LQAS
Provider vs. Consumer Risk
• The provider error measures the risk, or probability, that an acceptable lot will be
classified as unacceptable
• The consumer error measures risk that an unacceptable lot will be classified as
acceptable.
• In the health care setting, the provider is the organization providing the health
care interventions (most often the government), while the consumer is the
intended beneficiaries of the health care intervention (e.g. the general
population, or pregnant women, or children less than five years of age, etc.)
• The complements of the provider and consumer errors (1 − β and 1 − α) are the sensitivity and
specificity, respectively. A low provider error typically results in a higher consumer error, and
vice versa. However, they should be kept as equal as possible.
• In general, a sample size of 19 permits both provider and consumer errors of ≤
10% which is typically sufficient confidence for both providers and consumers for
routine monitoring
38
LQAS – Sample size calculation
Sample sizes in LQAS are derived using the following binomial formula which calculates the
probability of finding a certain number of failures from a population for which, say, 80% are successes.
$$ P_a = \frac{n!}{a! \, (n - a)!} \; p^{a} \, q^{\,n - a} $$
Where:
Pa = the probability of selecting “a” successes (e.g. rapid diagnostic tests) in a sample of “n” elements
p = the benchmark for quality (80% in this example)
q = the expected proportion of failures (q = 1-p)
n = the sample size
a = the exact number of ‘successes’ in the sample (i.e. the acceptable performance)
n-a = the number of ‘failures’ in the sample (i.e. the unacceptable performance; in LQAS this
expression is referred to as “d”, the “decision rule”)
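The binomial formula above can be checked numerically. The sketch below uses Python's standard `math.comb`. The function name `binomial_probability` is illustrative, and the example values (n = 19, benchmark p = 80%) follow the example in the text.

```python
from math import comb


def binomial_probability(a: int, n: int, p: float) -> float:
    """P_a = n! / (a! (n - a)!) * p^a * q^(n - a), with q = 1 - p."""
    q = 1.0 - p
    return comb(n, a) * (p ** a) * (q ** (n - a))


n, p = 19, 0.80
# Probability of exactly a successes, for a few values of a.
for a in (13, 16, 19):
    print(f"a = {a}: P_a = {binomial_probability(a, n, p):.3f}")

# The probabilities over all possible values of a sum to 1.
print(round(sum(binomial_probability(a, n, p) for a in range(n + 1)), 6))
```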
39
LQAS – Table of cumulative probabilities, n=19
d 0.05 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 0.95
0 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.001 0.014 0.135 0.377
1 0.000 0.000 0.000 0.000 0.000 0.000 0.001 0.010 0.083 0.420 0.755
2 0.000 0.000 0.000 0.000 0.000 0.000 0.005 0.046 0.237 0.705 0.933
3 0.000 0.000 0.000 0.000 0.000 0.002 0.023 0.133 0.455 0.885 0.987
4 0.000 0.000 0.000 0.000 0.001 0.010 0.070 0.282 0.673 0.965 0.998
5 0.000 0.000 0.000 0.000 0.003 0.032 0.163 0.474 0.837 0.991 1.000
6 0.000 0.000 0.000 0.001 0.012 0.084 0.308 0.666 0.932 0.998 1.000
7 0.000 0.000 0.000 0.003 0.035 0.180 0.488 0.818 0.977 1.000 1.000
8 0.000 0.000 0.000 0.011 0.088 0.324 0.667 0.916 0.993 1.000 1.000
9 0.000 0.000 0.002 0.033 0.186 0.500 0.814 0.967 0.998 1.000 1.000
10 0.000 0.000 0.007 0.084 0.333 0.676 0.912 0.989 1.000 1.000 1.000
11 0.000 0.000 0.023 0.182 0.512 0.820 0.965 0.997 1.000 1.000 1.000
12 0.000 0.002 0.068 0.334 0.692 0.916 0.988 0.999 1.000 1.000 1.000
13 0.000 0.009 0.163 0.526 0.837 0.968 0.997 1.000 1.000 1.000 1.000
14 0.002 0.035 0.327 0.718 0.930 0.990 0.999 1.000 1.000 1.000 1.000
15 0.013 0.115 0.545 0.867 0.977 0.998 1.000 1.000 1.000 1.000 1.000
16 0.067 0.295 0.763 0.954 0.995 1.000 1.000 1.000 1.000 1.000 1.000
17 0.245 0.580 0.917 0.990 0.999 1.000 1.000 1.000 1.000 1.000 1.000
18 0.623 0.865 0.986 0.999 1.000 1.000 1.000 1.000 1.000 1.000 1.000
19 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
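The entries in the table above appear to be cumulative binomial probabilities of observing at most d failures in 19 sampled elements when true coverage equals the column heading. The sketch below reproduces a few entries under that reading and derives the two misclassification risks for a decision rule of 13 acceptable elements out of 19. The upper benchmark of 80% follows the earlier example. The lower benchmark of 50% is an assumption made for illustration.

```python
from math import comb


def prob_at_most_failures(max_failures: int, n: int, coverage: float) -> float:
    """P(number of failures <= max_failures) when true coverage is `coverage`."""
    q = 1.0 - coverage
    return sum(comb(n, k) * (q ** k) * (coverage ** (n - k))
               for k in range(max_failures + 1))


def prob_lot_accepted(n: int, rule: int, coverage: float) -> float:
    """P(at least `rule` acceptable elements), i.e. at most n - rule failures."""
    return prob_at_most_failures(n - rule, n, coverage)


n, rule = 19, 13
upper, lower = 0.80, 0.50          # lower benchmark is an assumed value
provider_risk = 1.0 - prob_lot_accepted(n, rule, upper)   # good lot rejected
consumer_risk = prob_lot_accepted(n, rule, lower)         # poor lot accepted
print(f"provider risk = {provider_risk:.3f}, consumer risk = {consumer_risk:.3f}")
```

Under these assumptions both risks fall below 10%, in line with the statement that a sample of 19 is typically sufficient for routine monitoring.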
40
LQAS – Operating Characteristic Curve (OC Curve) for n=19, d = 13
41
LQAS Table: Decision Rules for Sample Sizes of 12-30 and Coverage Targets / Average of 10% - 95%
Average Coverage (Baselines) / Annual Coverage Target (Monitoring and Evaluation)
LQAS Sample Size* 10% 15% 20% 25% 30% 35% 40% 45% 50% 55% 60% 65% 70% 75% 80% 85% 90% 95%
12 N/A N/A 1 1 2 2 3 4 5 5 6 7 7 8 8 9 10 11
13 N/A N/A 1 1 2 3 3 4 5 6 6 7 8 8 9 10 11 11
14 N/A N/A 1 1 2 3 4 4 5 6 7 8 8 9 10 11 11 12
15 N/A N/A 1 2 2 3 4 5 6 6 7 8 9 10 10 11 12 13
16 N/A N/A 1 2 2 3 4 5 6 7 8 9 9 10 11 12 13 14
17 N/A N/A 1 2 2 3 4 5 6 7 8 9 10 11 12 13 14 15
18 N/A N/A 1 2 2 3 5 6 7 8 9 10 11 11 12 13 14 16
19 N/A N/A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
20 N/A N/A 1 2 3 4 5 6 7 8 9 11 12 13 14 15 16 17
21 N/A N/A 1 2 3 4 5 6 8 9 10 11 12 13 14 16 17 18
22 N/A N/A 1 2 3 4 5 7 8 9 10 12 13 14 15 16 18 19
23 N/A N/A 1 2 3 4 6 7 8 10 11 12 13 14 16 17 18 20
24 N/A N/A 1 2 3 4 6 7 9 10 11 13 14 15 16 18 19 21
25 N/A 1 2 2 4 5 6 8 9 10 12 13 14 16 17 18 20 21
26 N/A 1 2 3 4 5 6 8 9 11 12 14 15 16 18 19 21 22
27 N/A 1 2 3 4 5 7 8 10 11 13 14 15 17 18 20 21 23
28 N/A 1 2 3 4 5 7 8 10 12 13 15 16 18 19 21 22 24
29 N/A 1 2 3 4 5 7 9 10 12 13 15 17 18 20 21 23 25
30 N/A 1 2 3 4 5 7 9 11 12 14 16 17 19 20 22 24 26
N/A: Not applicable, meaning LQAS cannot be used in this assessment because the coverage is either too low or too high to assess an SA
Alpha or beta errors are ≥ 10%
Alpha or beta errors are > 15%
42
LQAS = Online sample calculator
43
LQAS and quality
No. Health Facility Name Recounted Reported VF Acceptable (Y/N)
2 CS KANZEZNZE 97 106 0.92 Y
3 CS SAMBA 79 53 1.49
5 CS LUMU 265 271 0.98 Y
6 CS BURHIBA 127 127 1.00
7 CS KASHEKE 89 98 0.91 Y
9 CS TSHIKAJI 40 37 1.08 Y
11 CS LUBONDAIE 21 70 0.30 N
12 CS KATEBA 84 84 1.00 Y
13 CS TSHILOMBA 28 49 0.57 N
15 CS TSHUMBE 1 37 43 0.86 Y
16 CS TSHUMBE 2 80 82 0.98 Y
17 CS CINDUNDU 25 41 0.61 N
18 CS KATABAYI 45 43 1.05 Y
19 CS BILOMBA 68 72 0.94 Y
Total Acceptable 12
44
LQAS - Aggregating lots to derive a point estimate
• Lots can be aggregated to provide a point estimate provided there are sufficient
sample elements to ensure adequate precision (a total of 95 elements yields
±10% precision, at 95% confidence)
• Judgements about the health areas that make up the total sample can also be
made (acceptable/ unacceptable).
• “Rolling sample”: Assessments of lots can be staggered to yield the point estimate
over time (assess one lot per quarter, at the end of the year derive point estimate)
Sample sizes for LQAS with varying levels of precision (95% confidence)
Precision    Sample Size
0.13         57
0.11         76
0.10         95
0.09         114
0.08         133
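The precision figures for aggregated lots are consistent with the usual margin-of-error expression z · sqrt(p(1 − p)/n) evaluated at p = 0.5, the most conservative case. That reading is an assumption, as is z = 1.96 for 95% confidence. A minimal sketch:

```python
import math


def precision(n: int, z: float = 1.96, p: float = 0.5) -> float:
    """Margin of error for a proportion from n pooled elements.

    Assumes simple random sampling and the worst case p = 0.5 by default.
    """
    return z * math.sqrt(p * (1.0 - p) / n)


# Pooled sample sizes corresponding to 3, 4, 5, 6 and 7 lots of 19 elements.
for n in (57, 76, 95, 114, 133):
    print(f"n = {n}: precision = {precision(n):.3f}")
```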
45
LQAS - Aggregating lots to derive a point estimate
Calculate the rate of retention on ART through a record review at a sample of health facilities:
Example – Weighted estimate of parameter of interest
Service Area   n    d   (n-d)/n   N     wt.       wt. * (n-d)/n
1              19   5   0.74      176   176/660   0.20
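The weighted estimate can be sketched as the sum over service areas of wt. × (n − d)/n, with wt. = N divided by the total N, treating d as the number of sampled records that do not meet the criterion (as the (n − d)/n column suggests). Only the first service area below comes from the example. The other two areas are hypothetical values chosen so that the population totals 660.

```python
def weighted_estimate(areas):
    """areas: list of (n, d, N) per service area.

    Returns the population-weighted proportion: sum of (N / total N) * (n - d) / n.
    """
    total_population = sum(N for _, _, N in areas)
    return sum((N / total_population) * (n - d) / n for n, d, N in areas)


areas = [
    (19, 5, 176),   # service area 1, from the example above
    (19, 3, 200),   # hypothetical
    (19, 7, 284),   # hypothetical
]
# Contribution of service area 1 alone: (176 / 660) * (14 / 19).
print(round((176 / 660) * (19 - 5) / 19, 2))
print(round(weighted_estimate(areas), 3))
```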
47
Sampling Health Facility Staff
48
Sampling Health Facility Clients
49