Lecture 3 Sampling
February, 2025
Addis Ababa, Ethiopia
Outline
Introduction
Sampling
Types of sampling
Probability and non-probability sampling
Sample size determination for mean and
proportion
Introduction
The population is too large for us to consider
collecting information from all its members
Common terms used in
sampling
Population: any specified group of persons, things,
or measurements
Target/source/reference population
A collection of items that have something in common
for which we wish to draw conclusions at a particular
time
The whole group of interest to which we want to
generalize the results of the study
Study or sample population
The subset of target population that has at least
some chance of being sampled
The specific population from which data are collected
Sample
The subset of the population, about which information
is actually obtained or
A subset of population whose properties are to be
generalized to the larger population
Common terms used in
sampling
Sampling unit
The unit of selection in sampling process
The hierarchy of sampling
Study subjects
The actual participants in the study
Sample
Subjects who are selected
Sampling Frame
The list of potential subjects from which the sample is drawn
Study population
The Population from whom the study subjects would be obtained
Source population
The population to whom the results would be applied
Advantages of sampling
Feasibility: Sampling may be the only feasible
method of collecting information
Reduced cost: Sampling reduces demands on
resources such as finance, personnel, and
material.
Greater accuracy: Sampling may lead to better
accuracy of collecting data
Sampling error: Precise allowance can be made
for sampling error
Greater speed: Data can be collected and
summarized more quickly
Time efficient
Disadvantages of
sampling
There is always a sampling error
Key Points During Sampling
Steps in sampling
1. State the objectives
2. Define the target population
3. Define the data to be collected
4. Define the required precision and accuracy
5. Define the measurement instrument
6. Define sample frame, size and method then
select the sample
Sampling methods
There are two types of sampling
methods:
A. Probability methods
B. Non-probability methods
Probability sampling
methods
A sample obtained in such a way that every member of
the population has a known and non-zero probability of
being included in the sample
Probability sampling is
more complex,
more time-consuming, and
usually more costly than non-probability
sampling.
Probability sampling…
There are several different ways in which a probability
sample can be selected.
Involves random selection procedures
Generalization to the population is possible
Advantages and disadvantages
Advantages
Selection is based on the principle of randomization
or chance
Reliable estimates can be produced
Generalization can be made about the population
Disadvantages
More complex, more time-consuming and usually
more costly than non probability sampling
Classification
Probability sampling: simple random sampling, systematic sampling, stratified sampling, cluster sampling, multistage sampling
[Diagram: choice of method according to whether a sampling frame exists (frame / no frame), whether the population is homogeneous (homogeneous / not homogeneous), and whether it covers a wide area]
Simple random
sampling
This is the most common and the simplest of the
sampling methods
In this method, the subjects are chosen from the
population with equal probability of selection
Assumption of the study population:
Homogeneous population
Availability of frame
Selection may be done using:
Lottery method
Table of random numbers
Computer-generated random numbers
SRS….
Lottery methods
used for small population
each unit in the population is represented by a slip
of paper
put in a box and mixed
A sample of required size is drawn from the box.
Advantage
It is simple and easy to apply when the population is
small.
Every unit has a known and equal chance of
being selected.
It does not require additional information on the frame (such
as geographic areas)
The formulas are easy to use.
Example
Age at first sex and associated factors for early
sexual initiation among female students at BHU,
Ethiopia
There are a total of 6,000 students
We want to select a sample of 800 students
In this case, we assume homogeneity with respect to age at
first sex
The list of student IDs can be taken as the sampling frame
Hence we can use computer-generated random numbers to
select 800 students randomly
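The selection in this example could be sketched in Python as follows. This is a minimal illustration, not the study's actual procedure: the ID list `student_ids` (numbers 1 to 6000) and the seed are assumptions standing in for the real frame.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units from the frame without replacement, each with equal probability."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(frame, n)

# Hypothetical frame: IDs 1..6000 stand in for the real student ID list.
student_ids = list(range(1, 6001))
selected = simple_random_sample(student_ids, 800, seed=2025)
print(len(selected), sorted(selected)[:5])
```

Sampling without replacement guarantees that no student is selected twice.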
Systematic sampling
In systematic sampling individuals are chosen at
regular intervals from the sampling frame
Systematic sampling
Useful if the reference population is arranged in
some order:
Order of registration of patients
Numerical order of house numbers
Students' registration books
Steps in systematic random
sampling
1. Number the units on your frame from 1 to N (where N is the
total population size).
2. Determine the sampling interval (K) by dividing the number
of units in the population by the desired sample size.
K = N/n, where K = sampling interval,
N = population size, n = sample size
3. Draw a random number between 1 and K. This number is
called the random start and is the first number
included in your sample.
Let the selected number be j
4. Select every Kth unit after that first number: j, j+K,
j+2K, j+3K, ..., j+(n-1)K
Example
A systematic sample is to be selected from 1200
students of a school. The sample size selected is
100. The sampling interval (skip interval) is
K = 1200/100 = 12
The number of the first student to be included in
the sample is chosen randomly, for example by
blindly picking one out of twelve pieces of paper,
numbered 1-12.
If number 6 is picked, then every twelfth student
will be included in the sample, starting with
student number 6, until 100 students are
selected: the numbers selected would be 6, 18,
30, 42, etc.
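The school example can be reproduced with a short Python sketch. The function name `systematic_sample` and its parameters are illustrative assumptions, and the sketch assumes N is an exact multiple of n, as it is here.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Return n unit numbers (1-based) taken at a fixed interval K = N // n."""
    K = N // n  # sampling interval
    if start is None:
        start = random.Random(seed).randint(1, K)  # random start j in 1..K
    # Selected units: j, j+K, j+2K, ..., j+(n-1)K
    return [start + i * K for i in range(n)]

sample = systematic_sample(1200, 100, start=6)
print(sample[:4], sample[-1])  # [6, 18, 30, 42] 1194
```

The last selected unit is j + (n-1)K = 6 + 99 × 12 = 1194, which stays within the 1200 students on the frame.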
Merits/
advantages
Less time consuming
Easier to perform than simple random sampling.
Stratified random
sampling
It is used when the population is known to have
heterogeneity with regard to some factors and those
factors are used for stratification
The population is first divided into homogeneous,
mutually exclusive groups called strata.
A population can be stratified by any variable that is
available for all units prior to sampling (e.g., age,
sex, province of residence, income, etc.).
A separate sample is taken independently from each
stratum, by simple random sampling or systematic
sampling
Stratified sampling
…
The procedures are:
Divide the total population into different
homogeneous subgroups (strata)
Allocate the sample to each stratum (n_i)
Proportional: n_i = N_i (n/N)
Where n_i = sample size for each stratum
N_i = total population of each stratum
n = required sample size
N = total population
Disproportional (equal allocation)
Example
A survey is conducted on household water supply in
a district comprising 20,000 households, of which
20% are urban and 80% rural
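For a district like this one, proportional allocation n_i = N_i (n/N) could be sketched as below. The total sample size n = 300 is an assumption made only for illustration (the example does not state it), and `proportional_allocation` is a hypothetical helper name.

```python
def proportional_allocation(strata_sizes, n):
    """Allocate n across strata: n_i = N_i * (n / N), rounded to the nearest integer."""
    N = sum(strata_sizes.values())
    return {name: round(N_i * n / N) for name, N_i in strata_sizes.items()}

# 20,000 households: 20% urban (4,000) and 80% rural (16,000)
strata = {"urban": 4000, "rural": 16000}
allocation = proportional_allocation(strata, n=300)  # n = 300 is an assumed sample size
print(allocation)  # {'urban': 60, 'rural': 240}
```

With less convenient numbers, rounding can make the stratum sizes add up to slightly more or less than n, so the total should be checked after allocation.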
Cluster sampling
In many administrative surveys, studies are done on
large populations which may be geographically quite
dispersed
Using simple random sampling would be costly
and inconvenient
Sometimes it is too expensive to carry out SRS
Population may be large and scattered
Complete list of the study population unavailable
Travel costs can become expensive if interviewers
have to survey people from one end of the
country to the other
Cluster sampling is the most widely used method to reduce the
cost
Cluster sampling…
Cluster sampling is appropriate in such conditions:
a random sample of clusters is included in the
study
Steps in cluster
sampling
The reference population (homogeneous) is divided
into clusters.
These clusters are often geographic units (e.g. districts,
villages, etc.).
A number of clusters are selected randomly to
represent the total population, and then all units
within selected clusters are included in the sample.
No units from non-selected clusters are included in
the sample—they are represented by those from
selected clusters
This differs from stratified sampling, where some units are
selected from each group
All the units in the selected clusters are studied
Example
In a study of knowledge, attitudes, and practices
related to family planning in rural communities of
a region, a list is made of all the villages.
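Once the list of villages exists, one-stage cluster sampling could be sketched as follows. The village names, the household lists, and the choice of 4 clusters are all hypothetical; only the logic (pick clusters at random, then take every unit in them) comes from the steps described earlier.

```python
import random

def cluster_sample(clusters, k, seed=None):
    """Randomly pick k clusters and include every unit in each selected cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), k)  # the frame is the list of cluster names
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Hypothetical frame: 20 villages, each with 40 listed households.
villages = {
    f"village_{v:02d}": [f"v{v:02d}_hh{h:03d}" for h in range(1, 41)]
    for v in range(1, 21)
}
chosen, households = cluster_sample(villages, k=4, seed=1)
print(chosen, len(households))
```

Only a list of clusters is needed up front; households have to be listed only inside the selected villages.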
Merit and demerit
Merit/advantages-
A list of all the individual study units in the
reference population is not required.
It is sufficient to have a list of clusters.
Demerit/disadvantage -
Sampling error is usually higher than for a
simple random sample of the same size.
Modified cluster sampling
Divide each cluster into sub-clusters
and then take an equal number of subjects from each
E.g., WHO-EPI coverage evaluation technique: 3-7 subjects
from each
Multi-stage sampling
Many studies, especially large nationwide surveys,
will incorporate different sampling methods for
different groups, and may be done in several
stages
Multi-stage
sampling…
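Is appropriate when the reference population is large
and widely scattered.
Similar to cluster sampling, except that it involves
sampling units within each selected cluster rather than including all of them.
The primary sampling unit (PSU)
Is the sampling unit in the first sampling stage (usually of large size).
The secondary sampling unit (SSU)
Is the sampling unit in the second sampling stage, etc.
(e.g., household).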
Merit and demerit
Merit/advantage
Cuts the cost of preparing a sampling frame
Saves a great amount of time and effort
Demerit/disadvantage
Sampling error is increased compared with a
simple random sample
Gives less precise estimates than simple
random sampling for the same sample size
Example
In a study of utilization of pit latrines in a district,
150 homesteads are to be visited for interviews
with family members as well as for observations on
types and cleanliness of latrines.
The district is composed of six wards and each
ward has between six and nine villages.
The following four stage sampling procedure could
be performed:
Select three wards out of the six by simple
random sampling
For each ward, select five villages by simple
random sampling (15 villages in total)
Example
For each village select ten households. Because
simply choosing households in the center of the
village would produce a biased sample, the
following systematic sampling procedure is
proposed:
Go to the center of the village
Choose a direction at random
Walk in the chosen direction and select every third or every
fifth household (depending on the size of the village) until
you have the ten you need.
Decide beforehand whom to interview (for example the head
of the household, if present, or the oldest adult who lives there
and who is available.)
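The ward, village, and household stages of this example could be sketched in Python as below. The district data are invented for illustration, and the household stage uses simple random selection from a hypothetical household list in place of the walking procedure described above, which cannot be simulated without a map of each village.

```python
import random

def build_district(rng):
    """Hypothetical district: 6 wards, 6-9 villages per ward, 30-80 households per village."""
    district = {}
    for w in range(1, 7):
        villages = {}
        for v in range(1, rng.randint(6, 9) + 1):
            n_hh = rng.randint(30, 80)
            villages[f"w{w}_village_{v}"] = [f"w{w}_v{v}_hh{h}" for h in range(1, n_hh + 1)]
        district[f"ward_{w}"] = villages
    return district

def multistage_sample(district, rng, n_wards=3, n_villages=5, n_households=10):
    sample = []
    for ward in rng.sample(sorted(district), n_wards):                  # stage 1: wards
        villages = district[ward]
        for village in rng.sample(sorted(villages), n_villages):       # stage 2: villages
            sample.extend(rng.sample(villages[village], n_households))  # stage 3: households
    return sample

rng = random.Random(7)
homesteads = multistage_sample(build_district(rng), rng)
print(len(homesteads))  # 3 wards x 5 villages x 10 households = 150
```

Choosing the respondent within each household would be a further stage, handled by the rule decided beforehand.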
Non Probability sampling
techniques
A type of sampling where each study unit in the
population has an unknown probability of
inclusion in the sample
When to use Non probability
Sampling
Group that represents the target population
already exists.
Non probability
sampling…
Advantages
Used when a sampling frame does not exist.
Disadvantage
No random selection, so the sample may be unrepresentative.
Inappropriate for generalizing findings.
Classification
Non-probability
Sampling
Quota sampling…
Advantages
Is generally less expensive than random sampling.
Easy to administer, especially considering the tasks
of listing the whole population, randomly selecting
the sample and following-up on non-respondents can
be omitted from the procedure.
Is an effective sampling method when information is
urgently required and can be conducted without
sampling frames.
Disadvantages
It does not meet the basic requirement of
randomness.
Some units may have no chance of selection or the
chance of selection may be unknown
Therefore, the sample may be biased
Convenience/Haphazard
Sometimes referred to as haphazard or accidental
sampling
Selection of subjects based on easy availability &
accessibility
Example: people who happen to be walking by
Is a method in which for convenience sake the
study units that happen to be available at the time
of data collection are selected
Often used in face to face interviews
Convenience or haphazard
sampling…
Advantage
Easy to use
Can deliver accurate results when the population
is homogeneous.
E.g. scientists could use this method to determine
whether a lake is polluted or not.
Assuming that the lake water is well-mixed, any
sample would yield similar information
A scientist could safely draw water anywhere on
the lake without bothering about whether or not
the sample is representative
Disadvantage
Not normally representative of the target
population
Snowball
Involves a process of “chain referrals”
Thus the sample group appears to grow like a
rolling snowball
Suitable for locating key informants
You start with one or two key informants
and ask them if they know persons who
know a lot about your topic of interest
Used when trying to interview hard to
reach groups
Snowball sampling…
Often used in hidden populations which are difficult
for researchers to access;
example populations would be drug users or commercial
sex workers.
Exercise 1: Group
Choose appropriate sampling technique for the
following proposed research topics
1. Prevalence of Depression and Associated Factors Among
Urban Civil Servants, in AA city, Ethiopia
ERROR IN SAMPLING
ERROR IN SAMPLING….
Non-sampling error (systematic error)
A bias of which the investigator may not be aware
and which distorts the results.
It is a type of systematic error in the design or
conduct of a sampling procedure which results
in distortion of the sample
We can eliminate or reduce the non-sampling
error by careful design of the sampling
procedures, but not by increasing the sample size.
There are several possible sources of bias in
sampling.
Sample Size Determination…
Introduction
Among the questions that a researcher should
ask when planning a survey is "How large a
sample do I need?"
The answer will depend on the aims, nature and
scope of the study and on the expected result.
In planning a sample survey, the decision on the
size of sample is very important.
Introduction…
In general, sample size depends on:
The objective of the study
The study design
Degree of confidence with which to conclude.
The type of data analysis to be performed
The desired precision of the estimates one wishes
to achieve
The kind and number of comparisons that will be
made
The number of variables that have to be examined
simultaneously
How heterogeneous the population is
Sample size-Qualitative
studies
There are no fixed rules for sample size in
qualitative research.
The size of the sample depends on
What you try to find out
From what different informants or
Sample size-Quantitative
studies
Calculations are made
The bigger the sample, the better the study
becomes, up to a certain point
The desirable sample size depends on the
expected variation in the data (of the most
important variables):
The more varied the data are, the larger the
sample size we would need to attain the desired
level of accuracy
We can use either of the following methods
Using a census for small populations
calculate the sample size
1. Using a computer package
2. Using formulas to calculate a Sample Size
Using A Census For Small Populations
Using formulas to calculate a Sample Size
1. Rules of thumb
1. For small populations (N < 100), there is little point
in sampling. Survey the entire population.
2. If the population size is around 500 (give or take
100), 50% should be sampled.
3. If the population size is around 1500, 20% should
be sampled.
4. Beyond a certain point (N = 5000), the population
size is almost irrelevant and a sample size of 400
may be adequate.
5. Statistician – maximalist – at least 500
2. Confidence interval
approach
Given confidence interval
For prevalence study
proportion
Example:-
A population of cancer patients has a survival time
standard deviation of 43.4 months. How large a sample
is needed so that 95% of the sample means from samples
of this size fall within ±6 months of the
population mean? The population size is 480 patients.
Solution: σ = 43.4 months, CI = 95%, d = ±6 months, α/2 = 0.025
(z = 1.96), N = 480 patients.
$n = (Z_{\alpha/2})^2 \sigma^2 / d^2 = (1.96)^2 \times (43.4)^2 / 6^2 \approx 201$
Since n/N = 201/480 ≈ 42%, which is > 5%, the finite population
correction applies: $n_f = n / (1 + n/N) = 201 / (1 + 201/480) \approx 142$
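The arithmetic for the cancer-patient example can be checked with a small Python sketch. The helper `sample_size_mean` is a hypothetical name, and the optional finite population correction n / (1 + n/N) is an assumption here: it is the commonly used adjustment when the initial sample is more than 5% of the population.

```python
import math

def sample_size_mean(sigma, d, z=1.96, N=None):
    """Sample size to estimate a mean within +/- d; rounds up to a whole unit."""
    n0 = (z ** 2) * (sigma ** 2) / (d ** 2)
    if N is not None:
        n0 = n0 / (1 + n0 / N)  # finite population correction (assumed adjustment)
    return math.ceil(n0)

print(sample_size_mean(43.4, 6))         # 201 (the formula gives about 200.997)
print(sample_size_mean(43.4, 6, N=480))  # 142 after correcting for N = 480
```

Rounding up rather than to the nearest integer keeps the achieved precision at least as good as the one requested.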
Example:-
In a survey of school children to determine the
proportion of children immunized against polio, an
investigator set the maximum discrepancy between the
sample and population proportions of immunized children at
0.04, at a confidence level of 99%. Further, the
investigator had previous knowledge that the
prevalence among children in a similar community was
90%, and the total population of school children is 800.
Solution: d = 0.04, α = 0.01, α/2 = 0.005 (z = 2.58), p = 0.9, q = 0.1,
N = 800, n = ?
$n = (Z_{\alpha/2})^2 p q / d^2 = (2.58)^2 \times 0.9 \times 0.1 / (0.04)^2 = 374.4 \approx 375$
n/N = 375/800 ≈ 46.9%, which is > 5%, so the finite population
correction applies: $n_f = n / (1 + n/N) = 375 / (1 + 375/800) \approx 256$
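The same calculation for a proportion can be sketched in Python. The helper `sample_size_proportion` is a hypothetical name, and the finite population correction n / (1 + n/N) is again an assumed, commonly used adjustment. Note that an initial sample of about 375 out of 800 children is roughly 47% of the population, well above the 5% threshold.

```python
import math

def sample_size_proportion(p, d, z, N=None):
    """Sample size to estimate a proportion within +/- d; rounds up to a whole unit."""
    n0 = (z ** 2) * p * (1 - p) / (d ** 2)
    if N is not None:
        n0 = n0 / (1 + n0 / N)  # finite population correction (assumed adjustment)
    return math.ceil(n0)

print(sample_size_proportion(0.9, 0.04, z=2.58))         # 375 (formula gives about 374.4)
print(sample_size_proportion(0.9, 0.04, z=2.58, N=800))  # 256 after correcting for N = 800
```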
3. Hypothesis testing
approach
Comparison of two PROPORTIONS
The number of samples required in each group to
Hypothesis…
Sample sizes to estimate the difference
between two population means (assuming
that the two populations have a common
standard deviation).
Cont,…
Where n1 = the size of sample one,
Sample-case control, cohort
and cross sectional studies
Sample size - for test of significant difference
between two proportions, the following formula can
be used:
$$n = \frac{(Z_{\alpha/2} + Z_{\beta})^2 \,[\,p_1(1 - p_1) + p_2(1 - p_2)\,]}{(p_1 - p_2)^2}$$
Parameters:
n - size of sample in each group
p1, p2 - estimated population prevalence in the comparison
groups
β - probability of a type II error; power = 1 − β (the probability that, if the two proportions differ,
the test will produce a significant difference)
Usually a power of 80% is used
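A minimal Python sketch of the two-proportion formula, n = (Z_{α/2} + Z_β)² [p1(1 − p1) + p2(1 − p2)] / (p1 − p2)², is given below. The prevalences 0.50 and 0.40 are illustrative assumptions, not values from the lecture; the z-values come from the standard normal distribution in Python's `statistics` module.

```python
import math
from statistics import NormalDist

def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Sample size per group for detecting a difference between two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    numerator = (z_alpha + z_beta) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2))
    return math.ceil(numerator / (p1 - p2) ** 2)

# Assumed prevalences of 50% and 40% in the two comparison groups
print(sample_size_two_proportions(0.50, 0.40))  # 385 per group
```

The smaller the difference between p1 and p2, the larger the required sample, since the difference enters the denominator squared.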
Sample-case control, cohort and
cross sectional studies…
To test a hypothesis about the difference between
two population proportions
For Cohort, case control and cross sectional studies.
Sample size using statistical
software
As an alternative method, we can use EPI INFO
statistical software to calculate the sample size
required for the study.
Start page
Step 1: Go to the main menu at the top left
corner and you will see the following window
‘statCalc’
Step 2: Go to ‘statCalc’
Sample size
Population survey
For single population, the information needed are
Size of the study population (N)
Proportion of the variable of interest from previous study
(p)
Worst acceptable result (d)
Design effect
Sample size
Let us assume the target population for the
study has size
N = 120,000
The proportion of the variable of interest is not
known, which means there is no previous study,
and hence we decide to use 50 percent as
an estimate of the prevalence for that variable
Finally, different levels of confidence with the
corresponding sample sizes will be calculated and
displayed.
The choice depends on
The level of confidence you fixed
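A table of this kind can be approximated in Python. This is a rough sketch under stated assumptions, not a reproduction of the software's output: the margin of error d = 5% and design effect of 1 are illustrative choices, and the formula used (the single-proportion formula with the correction n / (1 + n/N)) may differ slightly from the one the software applies.

```python
import math
from statistics import NormalDist

def survey_sample_size(N, p, d, confidence, deff=1.0):
    """Sample size for a single proportion in a finite population of size N."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z ** 2 * p * (1 - p) / d ** 2
    n = n0 / (1 + n0 / N)        # finite population correction (assumed form)
    return math.ceil(deff * n)   # inflate by the design effect, then round up

# N = 120,000 and p = 50% come from the example; d = 5% is an assumption.
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}", survey_sample_size(120_000, 0.5, 0.05, level))
```

Using p = 50% maximizes p(1 − p), so it yields the largest, most conservative sample size when no previous estimate exists.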
Sample size
Sample size
Sample size for two populations: unmatched case
control study
Sample size
Sample size for two populations: cohort or cross
sectional
Thank you