Chapter 4 Sampling Distribution and Sample Size
03/26/2025 1
Population
A group of individuals, objects, or items
from among which samples are taken for
measurement.
Decisions must be made concerning the
population or individual units (persons,
households, etc.) to be investigated
The population under consideration should
be clearly and explicitly defined in terms
of place, time, and other relevant criteria
Source of Population
Sampling
It is rarely feasible to collect information on every member of a
population, or to study the characteristics of an entire population
(finite or infinite), because of time, cost,
and other constraints.
Thus we need a sample.
A sample is a finite subset of the statistical
individuals in a population, and the number of
individuals in a sample is called the sample size.
Sampling
• Sampling is the process involving the selection
of a finite number of elements from a given
population of interest, for purposes of inquiry.
• A main concern in sampling:
– Ensure that the sample represents the
population, and
– The findings can be generalized.
Sampling (continued)
• Inferences about the population are based on
the information from the sample drawn from
that population.
• However, due to the variability in the
characteristics of the population, scientific
sample designs should be applied to select a
representative sample.
• Sampling enables us to estimate the
characteristic of a population by directly
observing a portion of the population.
[Figure: diagram relating Sample, Information, and Population]
Common terms used in sampling
• Population: it is the collection of all items
of interest.
• Sample: It is a subset of the population.
• Sampling: It is the method by which we
select a sample from the population
• Reference population (or target
population): the population of interest to
whom the researchers would like to make
generalizations.
Common terms (continued)
• Sampling population: the actual group in which the
study is conducted, that is, the sample.
• Study population: the subset of the target population
from which a sample will be drawn.
• Study unit: the unit on which information will
be collected (on which measurement is done).
• Sampling frame: the list of all the sampling units
in the source population, from which a random
sample is to be drawn.
Common terms (continued)
Sampling scheme (design): the method of selecting
sampling units from the sampling frame,
e.g. random sample, convenience sample, …
Reasons for Sampling
• For an exploratory purpose: to get a general
impression of the total population.
• For the purpose of obtaining estimates of
certain characteristics of the population.
Example: researchers want to identify factors associated with
ART use among HIV/AIDS patients attending certain hospitals in a
given region.
Sample
Advantages of sampling:
• Probability Sampling
– Every element in the target population or universe [sampling
frame] has a known, non-zero probability of being chosen in the sample
for the survey being conducted (an equal probability under simple
random sampling).
– Scientific, operationally convenient and simple in theory.
– Results may be generalized.
• Non-Probability Sampling
– The probability that a given element in the universe [sampling frame]
is chosen in the sample is unknown, and some elements may have
no chance of being chosen.
– Operationally convenient and simple in theory.
– Results may not be generalized.
• Probability sampling is:
– more complex,
– more time-consuming and
– usually more costly than non-probability
sampling.
• It is a technique you can use to maximize the external
validity, or generalizability, of the results of the
study.
• However, because study samples are randomly
selected and their probability of inclusion can be
calculated,
– reliable estimates can be produced and
– inferences can be made about the population.
• There are several different ways in which a
probability sample can be selected.
• The method chosen depends on a number of
factors, such as
– the available sampling frame,
– how spread out the population is,
– how costly it is to survey members of the
population
When to use non-probability sampling
• A group that represents the target population already
exists.
• It is difficult or impossible to obtain a list of names for
sampling (e.g. homeless people, IV drug users).
• All of the cases of interest may not be identified ahead
of time.
• The population is rare.
Advantages of non-probability sampling
Disadvantages of non-probability sampling
Types of Sampling Methods
[Figure: tree diagram dividing sampling methods into probability samples (including multistage random sampling) and non-probability samples (including convenience and quota sampling)]
Simple Random Sampling
1. Simple random sampling
Each member of a population has an equal chance of being
included in the sample.
• To use the SRS method:
– Make a numbered list of all the units in the population,
i.e. the sampling frame (not always mandatory).
– Number each unit from 1 to N (where N is
the size of the population).
– Select the required number of units at random.
• The randomness of the sample is ensured by:
– Use of the “lottery” method
– A table of random numbers
– Computer programs
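The "computer programs" option can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function name, the frame size of 500, the sample size of 20 and the seed are all assumptions chosen for the example.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Draw unit numbers from a frame numbered 1..N, without replacement."""
    if not 0 < sample_size <= population_size:
        raise ValueError("sample_size must be between 1 and population_size")
    rng = random.Random(seed)               # the seed only makes the draw repeatable
    frame = range(1, population_size + 1)   # numbered list of units, 1..N
    # random.sample gives every unit the same chance of selection
    return sorted(rng.sample(frame, sample_size))


# Hypothetical frame of N = 500 units, from which n = 20 are selected.
selected = simple_random_sample(population_size=500, sample_size=20, seed=1)
print(selected)
```

Sorting the result is only for readability when matching the numbers back to the sampling frame.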
Random number table
• It is a table of random numbers constructed by a
process such that:
1. In any position in the table, each of the digits 0
through 9 has probability 1/10 of
occurring.
2. The occurrence of any number in one part of the table is
independent of the occurrence of any number in any
other part of the table.
• SRS has certain limitations:
– Difficult if the reference population is dispersed.
– Minority subgroups of interest may not be selected.
Assumption of the population
Availability of frame
Simple random sampling
How to Use Random Number Tables
Feed the information
The numbers generated
are:
Simple random sampling
Systematic Random Sampling
Steps in Drawing a Systematic Random Sample
Systematic random sampling
3. Stratified random sampling
Steps in Drawing a Stratified Random
Sample
Stratified …
Population of L strata; stratum l contains n_l units
Why do we need to create strata?
• It can make the sampling strategy more efficient.
4. Cluster sampling
• Sometimes it is too expensive to carry out SRS
– Population may be large and scattered.
– Complete list of the study population unavailable
– Travel costs can become expensive if interviewers have to
survey people from one end of the country to the other.
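• Cluster sampling is the method most widely used to reduce
cost.
• The clusters should be similar to one another, each internally
heterogeneous like a small copy of the population, unlike stratified
sampling, where the strata differ from one another and are
internally homogeneous.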
Cluster sampling
• Principle
– Whole population divided into groups
e.g. woreda, kebele or Got
– Random sample taken of these
groups (“clusters”)
– Within selected clusters, all units
e.g. households included
Steps in cluster sampling
• Cluster sampling divides the population into groups
or clusters.
• A number of clusters are selected randomly to
represent the total population, and then all units
within selected clusters are included in the sample.
• No units from non-selected clusters are included in
the sample—they are represented by those from
selected clusters.
• This differs from stratified sampling, where some
units are selected from each group.
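The steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the school sections, their sizes, the student labels and the seed are hypothetical, and the function name is invented for the example.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Select clusters at random, then include every unit in the chosen clusters.

    clusters: dict mapping a cluster name to the list of units it contains.
    Returns (names of chosen clusters, all units in those clusters).
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)        # SRS of clusters
    units = [u for name in chosen for u in clusters[name]]   # take ALL units
    return chosen, units


# Hypothetical school with five sections of slightly different sizes.
sections = {
    f"Section {i}": [f"S{i}-{j}" for j in range(1, 6 + i)]
    for i in range(1, 6)
}
chosen, students = cluster_sample(sections, n_clusters=2, seed=7)
print(chosen, len(students))
```

No unit from a non-selected section appears in `students`, which is the point of contrast with stratified sampling.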
Example: Cluster sampling
[Figure: a population divided into five clusters, Section 1 to Section 5]
Example
• In a school-based study, we assume students of
the same school are homogeneous.
• We can randomly select sections and include all
students of the selected sections only.
Cluster sampling
Cluster sampling
• Advantages
– Simple, as a complete list of sampling units
within the population is not required
– Less travel/resources required
• Disadvantages
– A potential problem is that members of the same cluster are
more likely to resemble each other than members of another
cluster (within-cluster homogeneity).
– This “dependence” needs to be taken into
account in the sample size and in the analysis
(“design effect”).
Difference Between Cluster and Stratified Sampling
Stratified sampling: take a simple random sample in every stratum.
Cluster sampling: take an SRS of clusters, then sample every unit in the chosen clusters.
5. Multi-stage sampling
• In a very large and diverse population,
sampling may be done in two or more stages
• Carried out in phases and usually involves more
than one sampling method.
• Design effect should be considered
Example hierarchy: Kebele (secondary sampling unit, SSU), then Sub-Kebele (tertiary sampling unit, TSU), then household (HH).
• In the first stage, large groups or clusters are identified and
selected. These clusters contain more population units than
are needed for the final sample.
• Reliability cannot be measured in non-probability
sampling; the only way to address data quality is to
compare some of the survey results with available
information about the population.
• Despite these drawbacks, non-probability
sampling methods can be useful when
descriptive comments about the sample itself
are desired.
1. Convenience or haphazard sampling
• Convenience sampling is sometimes referred to as
haphazard or accidental sampling.
• It is not normally representative of the target
population because sample units are only selected if
they can be accessed easily and conveniently.
• The obvious advantage is that the method is easy to
use, but that advantage is greatly offset by the
presence of bias.
• Although useful applications of the technique are
limited, it can deliver accurate results when the
population is homogeneous.
Convenience or haphazard sampling…
Volunteer sampling….
• In exchange, the volunteers accept the possibility of a
lengthy, demanding or sometimes unpleasant process.
• Sampling voluntary participants as opposed to the
general population may introduce strong biases.
• Often in opinion polling, only the people who care
strongly enough about the subject tend to respond.
• The silent majority does not typically respond, resulting
in large selection bias.
3. Purposive/Judgemental
• The researchers choose the sample based on who they
think would be appropriate for the study.
4. Quota sampling
• Quota sampling is generally less expensive than
random sampling.
• It is also easy to administer, especially considering that the
tasks of listing the whole population, randomly
selecting the sample and following up on
non-respondents can be omitted from the procedure.
• Quota sampling is an effective sampling method
when information is urgently required and can be
conducted without sampling frames.
• In many cases where the population has no suitable
frame, quota sampling may be the only appropriate
sampling method.
Quota sampling
5. Snowball sampling
• A technique for selecting a research sample where
existing study subjects recruit future subjects from among
their friends.
• Thus the sample group appears to grow like a rolling
snowball.
• This sampling technique is often used in hidden
populations which are difficult for researchers to access;
example populations would be drug users or commercial
sex workers.
Snowball sampling
• Because sample members are not selected from a
sampling frame, snowball samples are subject to numerous
biases. For example, people who have many friends are more
likely to be recruited into the sample.
• Involves a process of “chain referrals”
• You start with one or two key informants and ask them if they
know persons who know a lot about your topic of interest.
• Used when trying to interview hard to reach groups.
Sample Size Determination
• Sample Size: The number of study subjects selected to
represent a given study population.
• Important to make inferences based on the findings from
the sample.
• Should be sufficient to represent the characteristics of
interest of the study population.
• In estimating a certain characteristic of a population,
sample size calculations are important to ensure that
estimates are obtained with the required precision and
confidence.
• The accuracy required of the results determines the size
of the sample.
• In studies concerned with detecting an effect
(e.g. a difference between two groups),
sample size calculations are important to
ensure that the study can detect an association
if one exists.
• Common questions:
– “How many subjects should I study?”
– Too small a sample: waste of time and resources;
results have no practical use
– Too large a sample: waste of resources;
data quality compromised
• When deciding on sample size, balance cost against precision.
• The feasible sample size is also determined
by the availability of resources:
– time
– manpower
– transport
– available facility, and
– money
1. Sample Size: Single Sample
• The aim is to have a large enough sample with which
to estimate a population mean or proportion within a
narrow interval with high reliability.
• Concerned with the precision of the estimate
(“narrowness of the CI”).
estimate ± d units
d = Z × SE, where SE is the standard error of the
estimator of the parameter
of interest.
Sample size for single population mean
A population of cancer patients has a survival
standard deviation of 43.3 months. If one
wants to conduct a sample survey on this
population, how large a sample is needed so
that 95% of the means of samples of
size n will be within 6 months of the population
mean? The population size is 480 patients.
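One way to work this example is sketched below. It assumes the usual formula n0 = (Z·σ/d)² for a single mean with Z = 1.96, followed by a finite population correction of the form n = n0 / (1 + n0/N); the choice of that particular correction is an assumption, since other textbook variants exist, and the function names are invented for the illustration.

```python
import math


def n_for_mean(sd, d, z=1.96):
    """Uncorrected sample size so the mean is estimated within +/- d."""
    return (z * sd / d) ** 2


def finite_population_correction(n0, population_size):
    """Assumed correction: n = n0 / (1 + n0 / N)."""
    return n0 / (1 + n0 / population_size)


n0 = n_for_mean(sd=43.3, d=6)                        # about 200.07
n_adj = finite_population_correction(n0, 480)        # about 141.2

# Round up, because a fraction of a patient cannot be sampled.
print(math.ceil(n0), math.ceil(n_adj))
```

Under these assumptions about 201 patients would be needed if the population were treated as very large, and about 142 once the population size of 480 is taken into account.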
Example:
1. Find the minimum sample size needed to estimate the drop in
heart rate (µ) for a new study using a higher dose of
propranolol than the standard one. We require that the two-
sided 95% CI for µ be no wider than 5 beats per minute and
the sample sd for change in heart rate equals 10 beats per
minute.
n = (1.96)^2(10)^2/(2.5)^2 ≈ 62 patients
2. Suppose that for a certain group of cancer patients, we are
interested in estimating the mean age at diagnosis. We would
like a 95% CI of 5 years wide. If the population SD is 12 years,
how large should our sample be?
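Both examples can be checked with the same calculation. The sketch below assumes n = (Z·SD/d)² with Z = 1.96, where d is half the desired CI width (2.5 in both cases), and rounds up to a whole subject; the function name is invented for the illustration.

```python
import math


def sample_size_for_mean(sd, half_width, z=1.96):
    """n = (z * sd / d)^2, rounded up to a whole subject."""
    return math.ceil((z * sd / half_width) ** 2)


# Example 1: CI no wider than 5 beats per minute, so d = 2.5; sd = 10.
heart_rate_n = sample_size_for_mean(sd=10, half_width=2.5)

# Example 2: CI 5 years wide, so d = 2.5; population SD = 12 years.
age_n = sample_size_for_mean(sd=12, half_width=2.5)

print(heart_rate_n, age_n)
```

Under these assumptions the first example gives 62 patients, matching the worked answer, and the second gives 89.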
• Suppose d=1
• Then the sample size increases
B. Sample size to estimate a single population
proportion
• Aim: Estimate p
• Want: Estimate ± d units where d = Z•SE
(95% CI of width=2d)
Steps:
1. Specify d (or w = 2d)
2. Use estimated p (use p=0.5 if no information)
3. Solve for n
1. Suppose that you are interested in the
proportion of infants who are breastfed beyond 18 months of
age in a rural area. Suppose that in a similar area,
the proportion (p) of breastfed infants was found to
be 0.20. What sample size is required to estimate the
true proportion within ±3 percentage points with 95%
confidence? Let p=0.20, d=0.03, α=5%.
• Suppose there is no prior information about
the proportion (p) who breastfeed
• Assume p=q=0.5 (most conservative)
• Then the required sample size increases
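The breastfeeding example and its no-prior-information variant can be worked as below. The sketch assumes the standard formula n = Z²·p·(1−p)/d² with Z = 1.96 and rounds up to a whole infant; the function name is invented for the illustration.

```python
import math


def sample_size_for_proportion(p, d, z=1.96):
    """n = z^2 * p * (1 - p) / d^2, rounded up to a whole subject."""
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)


n_prior = sample_size_for_proportion(p=0.20, d=0.03)          # p from a similar area
n_conservative = sample_size_for_proportion(p=0.50, d=0.03)   # no prior information

print(n_prior, n_conservative)
```

Under these assumptions the required sample is 683 infants with p = 0.20 and 1068 with p = 0.5, which shows how much the conservative choice increases the sample size.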
• An estimate of p is not always available.
• However, the formula may also be used for
sample size calculation based on various
assumptions for the values of p.
• P = 0.1: n = (1.96)^2(0.1)(0.9)/(0.05)^2 = 138
P = 0.2: n = (1.96)^2(0.2)(0.8)/(0.05)^2 = 246
P = 0.3: n = (1.96)^2(0.3)(0.7)/(0.05)^2 = 323
P = 0.5: n = (1.96)^2(0.5)(0.5)/(0.05)^2 = 384
P = 0.7: n = (1.96)^2(0.7)(0.3)/(0.05)^2 = 323
P = 0.8: n = (1.96)^2(0.8)(0.2)/(0.05)^2 = 246
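The table above can be reproduced with a short loop. This sketch assumes Z = 1.96 and d = 0.05 as in the table, and rounds to the nearest whole number because that is what the listed values appear to do; rounding up instead would give 139 and 385 for P = 0.1 and P = 0.5.

```python
def n_for_proportion(p, d=0.05, z=1.96):
    """Unrounded n = z^2 * p * (1 - p) / d^2."""
    return z ** 2 * p * (1 - p) / d ** 2


assumed_values = [0.1, 0.2, 0.3, 0.5, 0.7, 0.8]
table = {p: round(n_for_proportion(p)) for p in assumed_values}

for p, n in table.items():
    print(f"P = {p}: n = {n}")
```

The output is symmetric around P = 0.5, where the required sample size is largest.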
Some Considerations
But the population variance σ^2 is usually unknown
• For a fixed absolute precision (d), the required
sample size increases as P increases from 0 to
0.5, and then decreases in the same way as
P approaches 1.
2. A survey is planned to determine what proportion
of medical students have regularly chewed
khat. If no estimate of p is available and a pilot
sample cannot be drawn, what sample size would
be required if 95% confidence is desired and
d=0.04 is to be used?
• Ans: with p=0.5, n = (1.96)^2(0.5)(0.5)/(0.04)^2 = 600.25, i.e. about 600 students
Some Considerations
Example 2
Sample size using statistical
software
• As an alternative method, we can use EPI INFO statistical
software to calculate the sample size required for the
study.
• Let us assume that the target population in which we want to
conduct the study has size N=100,000.
• Sample size determination for Epidemiological study
design
• The proportion of the variable of interest is not known which
means there is no previous study done and hence we decided
to use 50 percent as an estimate of the prevalence for that
variable.
• Then the steps that we need to follow to get the required
sample size using EPI INFO statistical software are given
below:
Steps to compute sample size
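As a hand cross-check on whatever the software reports, the same scenario (N = 100,000, p = 0.5) can be worked in Python. The sketch assumes a 95% confidence level, a margin of error d = 0.05 and a commonly used finite-population formula, n = N·p·q / ((d²/Z²)(N−1) + p·q); the margin of error is not stated in the text, and whether EPI INFO uses exactly this formula is also an assumption.

```python
import math


def sample_size_finite_population(N, p, d, z=1.96):
    """Assumed formula: n = N*p*q / ((d^2 / z^2) * (N - 1) + p*q)."""
    q = 1 - p
    return N * p * q / ((d ** 2 / z ** 2) * (N - 1) + p * q)


# Assumed inputs: N = 100,000 and p = 0.5 from the text; d = 0.05 is hypothetical.
n_exact = sample_size_finite_population(N=100_000, p=0.5, d=0.05)
n_required = math.ceil(n_exact)
print(n_required)
```

With these inputs the result is 383, slightly below the 384 obtained when the population is treated as infinite, because the finite population of 100,000 has only a small effect.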
Start page
Thank you!