Biostat Lecture Six
02/27/2025 [email protected] 1
Introduction
Researchers often use sample survey methodology to obtain information
about a larger population by selecting and measuring a sample from that
population.
Since the population is usually too large to study in full, we rely on the
information collected from a sample, mainly to minimize cost.
Inferences about the population are based on the information from the
sample drawn from that population.
Introduction…
A sample is a collection of individuals selected from a larger population.
Sampling
The process of selecting a portion of the population to represent the entire
population.
A main concern in sampling:
Ensure that the sample represents the population, and
The findings can be generalized.
Advantages of sampling
Disadvantages of sampling:
There is always a sampling error.
Chance of bias
Errors in sampling
1) Sampling error: Error that arises because a sample, rather than the whole population, is observed.
It cannot be avoided or totally eliminated, but it can be reduced (for example, by increasing the sample size).
2) Non-sampling error:
Observational error
Respondent error
A. Probability sampling
Involves random selection of a sample
Every sampling unit has a known and non-zero probability of selection into
the sample.
Involves the selection of a sample from a population, based on chance.
Probability sampling is:
More complex,
More time-consuming and
Usually more costly than non-probability sampling.
However, because study samples are randomly selected and their
probability of inclusion can be calculated:
reliable estimates can be produced and
inferences can be made about the population.
Most common probability sampling methods
Simple random sampling
Systematic random sampling
Stratified random sampling
Cluster sampling
Multi-stage sampling
1. Simple random sampling
The required number of individuals are selected at random from the sampling frame, a
list or a database of all individuals in the population.
Each member of a population has an equal chance of being included in the sample.
To use the SRS method:
Make a numbered list of all the units in the population
Each unit should be numbered from 1 to N (where N is the size of the population)
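The steps above can be sketched in Python. This is a minimal illustration, assuming a frame of N = 400 units and a sample of 10 (illustrative values); the function name simple_random_sample and the seed are hypothetical choices, not part of the lecture.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Draw a simple random sample of unit numbers from a frame numbered 1..N."""
    rng = random.Random(seed)  # seed is optional; it only makes the draw repeatable
    frame = range(1, population_size + 1)  # numbered list of all units
    # random.sample draws without replacement, so every unit has the same
    # chance of selection and no unit can be selected twice.
    return sorted(rng.sample(frame, sample_size))


srs_units = simple_random_sample(population_size=400, sample_size=10, seed=1)
print(srs_units)
```

Sampling is done without replacement here, which is the usual choice when drawing people or households from a list.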
2. Systematic random sampling
Important if the reference population is arranged in some order:
Order of registration of patients
Numerical order of house numbers
Student’s registration books
Steps in systematic random sampling
1. Number the units on your frame from 1 to N (where N is the total population size).
2. Determine the sampling interval (K) by dividing the number of units in the
population by the desired sample size.
3. Select a number between one and K at random. This number is called the random
start and would be the first number included in your sample.
Note: Systematic sampling should not be used when a cyclic repetition is inherent in the
sampling frame.
Example
To select a sample of 100 from a population of 400, you would need a sampling
interval of 400 ÷ 100 = 4. Therefore, K = 4.
You will need to select one unit out of every four units to end up with a total of 100
units in your sample.
Select a number between 1 and 4 from a table of random numbers.
If you choose 3, the third unit on your frame would be the first unit included in your
sample
The sample might consist of the following units to make up a sample of 100: 3 (the
random start), 7, 11, 15, 19, ..., 395, 399 (up to N, which is 400 in this case).
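The example can be reproduced with a short Python sketch. The function name systematic_sample is hypothetical, and using integer division for K when N is not an exact multiple of the sample size is an assumption made here for illustration.

```python
import random


def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Select every K-th unit from a frame numbered 1..N, beginning at a random start."""
    k = population_size // sample_size  # sampling interval K (assumes N >= n)
    if start is None:
        # Random start: a number between 1 and K inclusive.
        start = random.Random(seed).randint(1, k)
    return list(range(start, population_size + 1, k))[:sample_size]


units = systematic_sample(population_size=400, sample_size=100, start=3)
print(units[:5], units[-2:])  # [3, 7, 11, 15, 19] [395, 399]
```

Passing start=3 fixes the random start used in the example; leaving it out draws the start at random between 1 and K.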
3. Stratified random sampling
It is used when the population is known to be heterogeneous with regard to some
factors, and those factors are used for stratification.
Using stratified sampling, the population is divided into homogeneous, mutually
exclusive groups called strata.
A population can be stratified by any variable that is available for all units prior to
sampling (e.g., age, sex, province of residence, income, etc.)
A separate sample is taken independently from each stratum.
Any of the sampling methods mentioned in this section can be used to sample within
each stratum.
If you create strata within which units share similar characteristics (e.g.,
income) and are considerably different from units in other strata
(e.g., occupation, type of dwelling) then you would only need a small sample
from each stratum to get a precise estimate of total income for that stratum.
Then you could combine these estimates to get a precise estimate of total
income for the whole population.
If you use an SRS approach in the whole population without stratification, the
sample would need to be larger than the total of all stratum samples to get an
estimate of total income with the same level of precision.
Stratified sampling ensures an adequate sample size for sub-groups in the
population of interest.
When a population is stratified, each stratum becomes an independent
population and you will need to decide the sample size for each stratum.
Allocation of sample size to stratum
Proportionate allocation
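$n_j = \frac{n}{N} \times N_j$
where $n$ is the total sample size, $N$ is the total population size, and $N_j$ is the size of stratum $j$.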
Village            A     B     C     D     Total
Households (HHs)   100   150   120   130   500
Sample size        ?     ?     ?     ?     60
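The missing sample sizes can be worked out by giving each village a share of the sample equal to its share of the households. A minimal Python sketch follows; the function name proportionate_allocation is hypothetical, and rounding to whole households is an assumption added here.

```python
def proportionate_allocation(stratum_sizes, total_sample):
    """Give each stratum a share of the sample equal to its share of the population."""
    total_population = sum(stratum_sizes.values())
    return {
        name: total_sample * size / total_population
        for name, size in stratum_sizes.items()
    }


households = {"A": 100, "B": 150, "C": 120, "D": 130}
allocation = proportionate_allocation(households, total_sample=60)
rounded = {name: round(value) for name, value in allocation.items()}
print(rounded)  # {'A': 12, 'B': 18, 'C': 14, 'D': 16}
```

The unrounded shares are 12, 18, 14.4 and 15.6. Rounding happens to preserve the total of 60 here, but in general the rounded values may need a small adjustment to add up to the target.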
4. Cluster sampling
Sometimes it is too expensive to carry out simple random sampling:
Population may be large and scattered.
Complete list of the study population unavailable
Travel costs can become expensive if interviewers have to survey people
from one end of the country to the other.
Cluster sampling is the most widely used method for reducing cost.
Example
In a school based study, we assume students of the same school are
homogeneous.
We can randomly select sections and include all students of the selected
sections only.
The main advantage is cost reduction.
5. Multi-stage sampling
Similar to the cluster sampling, except that it involves picking a sample from
within each chosen cluster, rather than including all units in the cluster.
This type of sampling requires at least two stages.
The primary sampling unit (PSU) is the sampling unit in the first sampling
stage.
The secondary sampling unit (SSU) is the sampling unit in the second
sampling stage, etc.
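Woreda: primary sampling unit (PSU)
Kebele: secondary sampling unit (SSU)
Sub-Kebele: tertiary sampling unit (TSU)
Household (HH): final sampling unit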
In the first stage, large groups or clusters are identified and selected.
These clusters contain more population units than are needed for the final
sample.
In the second stage, population units are picked from within the selected
clusters (using any of the possible probability sampling methods) for a final
sample.
If more than two stages are used, the process of choosing population units
within clusters continues until there is a final sample.
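A two-stage design can be sketched in Python as follows. The frame is invented for illustration (6 kebeles with 20 households each), simple random sampling is assumed at both stages, and the function name two_stage_sample is hypothetical.

```python
import random


def two_stage_sample(clusters, n_clusters, n_units, seed=None):
    """Stage 1: randomly select clusters. Stage 2: randomly select units within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # stage 1: select the PSUs
    return {
        name: rng.sample(clusters[name], min(n_units, len(clusters[name])))
        for name in chosen  # stage 2: select units inside each chosen PSU
    }


# Hypothetical frame: 6 kebeles (clusters), each listing 20 households.
frame = {
    f"kebele_{i}": [f"kebele_{i}_hh_{j}" for j in range(1, 21)]
    for i in range(1, 7)
}
two_stage = two_stage_sample(frame, n_clusters=2, n_units=5, seed=0)
for kebele, selected_households in two_stage.items():
    print(kebele, selected_households)
```

Setting n_units to the full cluster size turns this into one-stage cluster sampling, where every unit in each selected cluster is included.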
B. Non-probability sampling
In non-probability sampling, every item has an unknown chance of being
selected.
In non-probability sampling, there is an assumption that there is an even
distribution of a characteristic of interest within the population.
This is what makes the researcher believe that any sample would be
representative and because of that, results will be accurate.
For probability sampling, randomness is a feature of the selection process, rather
than an assumption about the structure of the population.
In non-probability sampling, since elements are chosen arbitrarily, there is no
way to estimate the probability of any one element being included in the
sample.
Also, no assurance is given that each item has a chance of being included.
The most common types of non-probability sampling
2. Volunteer sampling
3. Judgment sampling
4. Quota sampling
Sampling Distributions
Introduction
Parameter: Population characteristics or descriptive measure taken
from the population e.g. μ, σ, P etc.
Sample statistic: Any quantity computed from values in a sample, e.g.
sample mean ($\bar{x}$), sample proportion, etc.
The values of population parameters are fixed.
Introduction…
A sampling distribution is a distribution of all possible values of a
statistic computed from samples of the same size randomly selected
from the same population.
Serves to answer probability questions about sample statistics
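For a very small population, the sampling distribution of the mean can be listed in full. The Python sketch below uses a made-up population of five values and samples of size 2 drawn without replacement; both are assumptions for illustration only.

```python
from itertools import combinations
from statistics import mean

population = [2, 4, 6, 8, 10]  # hypothetical population of N = 5 values

# Every possible sample of size 2 (without replacement) and its sample mean.
sample_means = [mean(sample) for sample in combinations(population, 2)]

print(sorted(sample_means))  # [3, 4, 5, 5, 6, 6, 7, 7, 8, 9]
print(mean(population), mean(sample_means))  # 6 6
```

The ten sample means form the sampling distribution of the mean for n = 2, and their average equals the population mean.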
Central Limit Theorem
The central limit theorem states that if you have a population with mean μ
and standard deviation σ then the distribution of the sample means will be
approximately normally distributed provided the sample size is sufficiently
large (usually n > 30).
If the population is normal, then the theorem holds true even for samples
smaller than 30.
For proportions, the normal approximation holds provided that both np and n(1-p)
are greater than 5, where n is the sample size and p is the probability of success
in the population.
So we can use the normal probability model to quantify uncertainty
when making inferences about a population mean based on the
sample mean.
When the sampling is done from a non-normally distributed
population, the central limit theorem is used.
The larger the sample size, the better will be the normal
approximation to the sampling distribution of the mean.
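The central limit theorem can be illustrated by simulation. The Python sketch below assumes an exponential population with mean 1 and standard deviation 1 (a skewed, non-normal distribution chosen only for illustration), samples of size n = 36, and 5000 repeated samples; none of these values come from the lecture.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(42)  # fixed seed so the simulation is repeatable
n = 36                   # size of each sample
repetitions = 5000       # number of samples drawn

# Draw many samples from a skewed population and record each sample mean.
simulated_means = [
    mean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(repetitions)
]

print("mean of sample means:", round(mean(simulated_means), 3))
print("SD of sample means:  ", round(stdev(simulated_means), 3))
print("theoretical sigma/sqrt(n):", round(1 / sqrt(n), 3))
```

The sample means should centre close to the population mean of 1, with a spread close to sigma divided by the square root of n, even though the population itself is far from normal.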
Applications of the sampling distribution of the sample mean
Helps in computing the probability of obtaining a sample with a
mean of some specified magnitude.
z-value for the sampling distribution of $\bar{x}$:
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
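Solution:
The sampling distribution of $\bar{x}$ has mean $\mu_{\bar{x}} = 8$
and standard error $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{36}} = 0.5$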
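$P(7.8 < \bar{x} < 8.2) = P\left(\frac{7.8 - 8}{3/\sqrt{36}} < \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < \frac{8.2 - 8}{3/\sqrt{36}}\right) = P(-0.4 < z < 0.4) = 0.3108$
[Figure: sampling distribution of $\bar{x}$ centered at $\mu_{\bar{x}} = 8$ with 7.8 and 8.2 marked, and the corresponding standard normal distribution centered at $\mu_z = 0$ with -0.4 and 0.4 marked.]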
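The probability in this solution can be checked numerically. The Python sketch below uses the values from the solution (mean 8, standard deviation 3, n = 36) and computes the standard normal CDF from math.erf; the helper name normal_cdf is hypothetical.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Cumulative probability P(Z <= z) for the standard normal distribution."""
    return 0.5 * (1 + erf(z / sqrt(2)))


mu, sigma, n = 8, 3, 36
standard_error = sigma / sqrt(n)      # 3 / 6 = 0.5
z_low = (7.8 - mu) / standard_error   # about -0.4
z_high = (8.2 - mu) / standard_error  # about 0.4
probability = normal_cdf(z_high) - normal_cdf(z_low)
print(round(probability, 4))  # 0.3108
```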
B. Distribution of the sample proportion
The sample proportion is derived from counts or frequency data.
Sample proportion = $\hat{p}$
Population proportion = p or π
Population proportion (p) = the proportion of population having some
characteristic
Properties of the sampling distribution of the sample proportion
Construction of the sampling distribution of the sample proportion is done
in a manner similar to that of the mean.
Applying the central limit theorem, the shape of the sampling distribution is
approximately normal provided that n is large enough.
The mean of the distribution, $\mu_{\hat{p}}$, will be equal to the true population
proportion, p, and the variance of the distribution, $\sigma_{\hat{p}}^2$, will be equal to
pq/n, where q = 1 - p.
How large does n need to be?
Central limit theorem for proportions:
$np > 5$
$n(1 - p) > 5$
z-Value for Proportions
Standardize $\hat{p}$ to a z value with the formula:
$z = \frac{\hat{p} - p}{\sigma_{\hat{p}}} = \frac{\hat{p} - p}{\sqrt{\frac{p(1 - p)}{n}}}$
Example
According to a recent estimate, 19.4% of the adult male population was obese. What is
the probability that in a random sample of size 150 from this population fewer than 15%
will be obese?
np = 150 × 0.194 = 29.1 > 5
nq = 150 × 0.806 = 120.9 > 5
Find the z score
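A minimal Python sketch of this calculation is given below, using p = 0.194, n = 150 and a sample proportion of 0.15 from the example. The helper name normal_cdf is hypothetical, and the probability comes from the normal approximation.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Cumulative probability P(Z <= z) for the standard normal distribution."""
    return 0.5 * (1 + erf(z / sqrt(2)))


p = 0.194      # population proportion of obese adult males
n = 150        # sample size
p_hat = 0.15   # sample proportion of interest

se_p_hat = sqrt(p * (1 - p) / n)   # standard error of the sample proportion
z_score = (p_hat - p) / se_p_hat   # about -1.36
prob_below = normal_cdf(z_score)   # P(sample proportion < 0.15)

print(round(z_score, 2), round(prob_below, 4))
```

The z score is about -1.36, so under the normal approximation the chance of seeing fewer than 15% obese in the sample is a little under 9%.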
THANKS FOR YOUR ATTENTION!!!!