Biostat Lecture Six

The document discusses sampling methods and distributions, highlighting the importance of sampling in research to make inferences about larger populations. It outlines various sampling techniques, including probability and non-probability methods, their advantages and disadvantages, and the concept of sampling distributions, particularly the Central Limit Theorem. The document emphasizes the need for proper sampling to ensure accurate representation and reliability of research findings.

Uploaded by

Telila Ayela
Sampling Methods and Sampling Distributions

TESHOME .D (BSc, MPH)

02/27/2025 [email protected] 1
Introduction
Researchers often use sample survey methodology to obtain information
about a larger population by selecting and measuring a sample from that
population.
Because the population is often too large to study in full, we rely on information
collected from a sample, mainly to minimize cost.
Inferences about the population are based on the information from the
sample drawn from that population.

Introduction…
A sample is a collection of individuals selected from a larger population.

For example, we may have a single sample composed of 50 individuals,
representing a population of 1000 people.
Researchers are not interested in the sample itself, but in what can be
learned from the sample—and how this information can be applied to the
entire population.

Sampling
The process of selecting a portion of the population to represent the entire
population.
A main concern in sampling:
Ensure that the sample represents the population, and
Ensure that the findings can be generalized.

Advantages of sampling

Feasibility: Sampling may be the only feasible method of collecting information.
Reduced cost: Sampling reduces demands on resources such as finance,
personnel, and material.
Greater accuracy: With fewer units to measure, data can be collected more carefully.

Greater speed: Data can be collected and summarized more quickly.

Disadvantages of sampling:
There is always a sampling error.

Sampling may create a feeling of discrimination within the population.

Difficulties in selecting a truly representative sample

Chance of bias

Errors in sampling
1) Sampling error: Errors that arise because only a sample, rather than the whole
population, is observed.
They cannot be avoided or totally eliminated but can be reduced, for example by increasing the sample size.

2) Non-sampling error:
 Observational error

 Respondent error

 Lack of preciseness of definition

 Errors in editing and tabulation of data


Sampling Methods
Two broad divisions:

A. Probability sampling methods

B. Non-probability sampling methods

A. Probability sampling
Involves random selection of a sample

Every sampling unit has a known and non-zero probability of selection into
the sample.
Involves the selection of a sample from a population, based on chance.
Probability sampling is:
 More complex,
 More time-consuming and
 Usually more costly than non-probability sampling.
However, because study samples are randomly selected and their
probability of inclusion can be calculated:
 reliable estimates can be produced and
 inferences can be made about the population.

The method chosen depends on a number of factors, such as:
the availability of a sampling frame,
how spread out the population is, and
how costly it is to survey members of the population.

Most common probability sampling methods
Simple random sampling
Systematic random sampling
Stratified random sampling
Cluster sampling
Multi-stage sampling

1. Simple random sampling

Basic and common sampling technique in quantitative research

The required number of individuals is selected at random from the sampling frame, a
list or database of all individuals in the population.
Each member of the population has an equal chance of being included in the sample.
To use an SRS method:
Make a numbered list of all the units in the population

Each unit should be numbered from 1 to N (where N is the size of the population)

Select the required number.


The randomness of the sample is ensured by:
Use of the "lottery" method (for small samples)
Table of random numbers
Computer programs
SRS has certain limitations:
Requires a sampling frame.
Difficult if the reference population is dispersed.
Minority subgroups of interest may not be selected.
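
The SRS procedure above (number the units 1 to N, then let a computer program pick the required number at random) can be sketched as follows. This is a minimal illustration using Python's standard library; the function name `simple_random_sample` and the `seed` parameter are assumptions made for this example, not part of the lecture.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw unit numbers from 1..N without replacement, each equally likely."""
    if not 0 < sample_size <= population_size:
        raise ValueError("sample_size must be between 1 and population_size")
    rng = random.Random(seed)             # seed only makes the draw repeatable
    frame = range(1, population_size + 1) # numbered list of all units (1..N)
    return sorted(rng.sample(frame, sample_size))

# Hypothetical use: 50 individuals out of a frame of 1000.
sample = simple_random_sample(1000, 50, seed=1)
print(len(sample), sample[:5])
```

Sorting the result is only for readability; the selection itself does not depend on it.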

2. Systematic random sampling

Sometimes called interval sampling

Selection of individuals from the sampling frame is done systematically.

Individuals are taken at regular intervals down the list

The starting point is chosen at random

Important if the reference population is arranged in some order:
Order of registration of patients
Numerical order of house numbers
Students' registration books

Individuals are taken at fixed intervals (every kth) based on the sampling fraction.

Steps in systematic random sampling
1. Number the units on your frame from 1 to N (where N is the total population size).

2. Determine the sampling interval (K) by dividing the number of units in the
population by the desired sample size.

3. Select a number between one and K at random. This number is called the random
start and would be the first number included in your sample.

4. Select every Kth unit after that first number

Note: Systematic sampling should not be used when a cyclic repetition is inherent in the
sampling frame.
Example
To select a sample of 100 from a population of 400, you would need a sampling
interval of 400 ÷ 100 = 4. Therefore, K = 4.
You will need to select one unit out of every four units to end up with a total of 100
units in your sample.
Select a number between 1 and 4 from a table of random numbers.

If you choose 3, the third unit on your frame would be the first unit included in your
sample
The sample might consist of the following units to make up a sample of 100: 3 (the
random start), 7, 11, 15, 19, ..., 395, 399 (up to N, which is 400 in this case).
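
The four steps of systematic random sampling can be sketched in a few lines. The function name `systematic_sample` and its parameters are assumptions for illustration; the sketch also assumes the interval K is taken as the integer part of N ÷ n when N is not an exact multiple of n.

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Select every K-th unit from a frame numbered 1..N, after a random start."""
    k = population_size // sample_size    # sampling interval K
    if k < 1:
        raise ValueError("sample_size cannot exceed population_size")
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start between 1 and K
    if not 1 <= start <= k:
        raise ValueError("start must lie between 1 and K")
    return list(range(start, population_size + 1, k))[:sample_size]

# The example above: N = 400, n = 100, K = 4, random start 3.
units = systematic_sample(400, 100, start=3)
print(units[:5], units[-2:], len(units))
```

Passing `start=3` reproduces the units listed in the example; omitting it lets the function choose the random start itself.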
3. Stratified random sampling

It is used when the population is known to be heterogeneous with regard to some
factors, and those factors are used for stratification.
Using stratified sampling, the population is divided into homogeneous, mutually
exclusive groups called strata.
A population can be stratified by any variable that is available for all units prior to
sampling (e.g., age, sex, province of residence, income, etc.)
A separate sample is taken independently from each stratum.

Any of the sampling methods mentioned in this section can be used to sample within
each stratum.
If you create strata within which units share similar characteristics (e.g.,
income) and are considerably different from units in other strata
(e.g., occupation, type of dwelling) then you would only need a small sample
from each stratum to get a precise estimate of total income for that stratum.
Then you could combine these estimates to get a precise estimate of total
income for the whole population.

If you use an SRS approach in the whole population without stratification, the
sample would need to be larger than the total of all stratum samples to get an
estimate of total income with the same level of precision.
Stratified sampling ensures an adequate sample size for sub-groups in the
population of interest.
 When a population is stratified, each stratum becomes an independent
population and you will need to decide the sample size for each stratum.

Allocation of sample size to stratum

Equal allocation: Allocate equal sample size to each stratum

Proportionate allocation:
$$ n_j = \frac{n}{N} \times N_j $$

n_j is the sample size of the jth stratum

N_j is the population size of the jth stratum

N = N_1 + N_2 + ... + N_k is the total population size

n = n_1 + n_2 + ... + n_k is the total sample size


Example: Proportionate Allocation

Village    A     B     C     D     Total
HHs        100   150   120   130   500
S. size    ?     ?     ?     ?     60
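
One way to fill in the missing sample sizes is to apply the proportionate allocation formula n_j = (n / N) × N_j to each village. The sketch below is illustrative; the function name `proportionate_allocation` is an assumption, and rounding to the nearest whole household is a simplifying choice that happens to sum to 60 here but is not guaranteed to do so for other data.

```python
def proportionate_allocation(stratum_sizes, total_sample):
    """Allocate the total sample to strata in proportion to their sizes."""
    total_population = sum(stratum_sizes.values())  # N
    return {
        name: round(total_sample * size / total_population)  # n_j = (n / N) * N_j
        for name, size in stratum_sizes.items()
    }

households = {"A": 100, "B": 150, "C": 120, "D": 130}
allocation = proportionate_allocation(households, 60)
print(allocation)  # {'A': 12, 'B': 18, 'C': 14, 'D': 16}
```

The unrounded values are 12, 18, 14.4 and 15.6, so villages C and D are rounded to 14 and 16 households.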

4. Cluster sampling
Sometimes it is too expensive to carry out simple random sampling:
Population may be large and scattered.
Complete list of the study population unavailable
Travel costs can become expensive if interviewers have to survey people
from one end of the country to the other.
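Cluster sampling is widely used to reduce cost.

Clusters should be similar to one another, with each cluster ideally as varied
internally as the population itself. This is unlike stratified sampling, where
strata are internally homogeneous and differ from one another.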
Steps in cluster sampling
Cluster sampling divides the population into groups or clusters.

A number of clusters are selected randomly to represent the total population,


and then all units within selected clusters are included in the sample.
No units from non-selected clusters are included in the sample—they are
represented by those from selected clusters.
This differs from stratified sampling, where some units are selected from
each group.
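
The cluster sampling steps above (group the population into clusters, select some clusters at random, include every unit in the selected clusters) can be sketched as follows. The function name `cluster_sample` and the section and student labels are hypothetical, chosen only for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and include all of their units."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)    # random choice of clusters
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Hypothetical frame: four school sections, each listing its students.
sections = {
    "Section 1": ["s01", "s02", "s03"],
    "Section 2": ["s04", "s05"],
    "Section 3": ["s06", "s07", "s08", "s09"],
    "Section 4": ["s10", "s11"],
}
chosen, students = cluster_sample(sections, 2, seed=3)
print(chosen, students)
```

Note that only a list of clusters is needed to draw the sample, not a list of every individual in the population.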

Example

In a school-based study, we assume students of the same school are
homogeneous.
We can randomly select sections and include all students of the selected
sections only.
The main advantage is cost reduction.

5. Multi-stage sampling
Similar to the cluster sampling, except that it involves picking a sample from
within each chosen cluster, rather than including all units in the cluster.
This type of sampling requires at least two stages.

The primary sampling unit (PSU) is the sampling unit in the first sampling
stage.
The secondary sampling unit (SSU) is the sampling unit in the second
sampling stage, etc.

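Example hierarchy of sampling units:
Woreda: primary sampling unit (PSU)
Kebele: secondary sampling unit (SSU)
Sub-Kebele: tertiary sampling unit (TSU)
Household (HH): final unit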

In the first stage, large groups or clusters are identified and selected.

These clusters contain more population units than are needed for the final
sample.
In the second stage, population units are picked from within the selected
clusters (using any of the possible probability sampling methods) for a final
sample.
If more than two stages are used, the process of choosing population units
within clusters continues until there is a final sample.
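
A two-stage version of this process can be sketched as below: clusters (PSUs) are selected at random in the first stage, then a simple random sample of units is drawn within each selected cluster in the second stage. The function name `two_stage_sample`, the kebele labels, and the household numbers are hypothetical, and SRS is assumed at both stages although any probability method could be used.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick clusters at random. Stage 2: pick units within each."""
    rng = random.Random(seed)
    selected = rng.sample(sorted(clusters), n_clusters)      # first stage (PSUs)
    sample = {}
    for name in selected:
        units = clusters[name]
        size = min(n_per_cluster, len(units))                 # small clusters: take all
        sample[name] = sorted(rng.sample(units, size))        # second stage
    return sample

# Hypothetical frame: five kebeles with 40 numbered households each.
kebeles = {f"Kebele {i}": list(range(1, 41)) for i in range(1, 6)}
print(two_stage_sample(kebeles, n_clusters=2, n_per_cluster=5, seed=7))
```

Unlike cluster sampling, only some units from each selected cluster end up in the final sample.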
B. Non-probability sampling
In non-probability sampling, every item has an unknown chance of being
selected.
In non-probability sampling, there is an assumption that there is an even
distribution of a characteristic of interest within the population.
This is what makes the researcher believe that any sample would be
representative and because of that, results will be accurate.

For probability sampling, randomness is a feature of the selection process, rather
than an assumption about the structure of the population.
In non-probability sampling, since elements are chosen arbitrarily, there is no
way to estimate the probability of any one element being included in the
sample.
Also, no assurance is given that each item has a chance of being included.

Reliability cannot be measured, and there is no way to measure the precision
of the resulting sample.
Despite these drawbacks, non-probability sampling methods can be useful
when descriptive comments about the sample itself are desired.
Secondly, they are quick, inexpensive and convenient.

There are also circumstances in research when it is unfeasible
or impractical to conduct probability sampling.

The most common types of non-probability sampling

1. Convenience or haphazard sampling

2. Volunteer sampling

3. Judgment sampling

4. Quota sampling

5. Snowball sampling technique

Sampling Distributions

Introduction
Parameter: A population characteristic or descriptive measure taken
from the population, e.g., μ, σ, P, etc.
Sample statistic: Any quantity computed from values in a sample, e.g.,
sample mean (x̄), sample proportion (p̂), etc.
The values of population parameters are fixed.

The values of statistics vary from one sample to another.

Introduction…
A sampling distribution is a distribution of all possible values of a
statistic computed from samples of the same size randomly selected
from the same population.
Serves to answer probability questions about sample statistics

When sampling a discrete, finite population, a sampling distribution
can be constructed.
However, this construction is difficult with a large population and
impossible with an infinite population.
Take a sample and calculate a statistic, e.g., the mean.

Take another sample (same size) and calculate the mean.

Repeat many times.

We do not expect all the sample means to be the same; they will vary.

Put all these sample statistics together to get a distribution of sample
statistics.
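
The repeated-sampling idea can be tried out with a short simulation. This is a sketch under stated assumptions: the population is artificial (10,000 values from a skewed exponential distribution with mean near 8), and the function name `sample_means`, the sample size of 36, and the 2,000 repetitions are choices made only for illustration.

```python
import random
import statistics

def sample_means(population, sample_size, n_samples, seed=None):
    """Draw many samples of the same size and return the mean of each."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]

# Artificial, right-skewed population (an assumption for this demo).
pop_rng = random.Random(0)
population = [pop_rng.expovariate(1 / 8) for _ in range(10_000)]

means = sample_means(population, sample_size=36, n_samples=2000, seed=1)
print("population mean:      ", round(statistics.mean(population), 2))
print("mean of sample means: ", round(statistics.mean(means), 2))
print("sigma / sqrt(n):      ", round(statistics.pstdev(population) / 36 ** 0.5, 2))
print("SD of sample means:   ", round(statistics.stdev(means), 2))
```

The sample means should cluster around the population mean, with a spread close to σ/√n and much smaller than the spread of the individual values.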

Central Limit Theorem
The central limit theorem states that if a population has mean μ
and standard deviation σ, then the distribution of the sample means will be
approximately normal, with mean μ and standard deviation σ/√n, provided the
sample size is sufficiently large (usually n > 30).
If the population is normal, then the sampling distribution of the mean is
normal even for samples smaller than 30.
For sample proportions, the normal approximation applies provided that np > 5
and n(1 − p) > 5, where n is the sample size and p is the probability of
success in the population.
So we can use the normal probability model to quantify uncertainty
when making inferences about a population mean based on the
sample mean.
When the sampling is done from a non-normally distributed
population, the central limit theorem is used.
The larger the sample size, the better will be the normal
approximation to the sampling distribution of the mean.

Applications of the sampling distribution of the sample mean
Helps in computing the probability of obtaining a sample with a
mean of some specified magnitude.
z-value for the sampling distribution of x̄:
$$ z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} $$

where: x̄ = sample mean
μ = population mean
σ = population standard deviation
n = sample size
Example
Suppose a population has mean μ = 8 and standard deviation σ = 3.
Suppose a random sample of size n = 36 is selected.
What is the probability that the sample mean is between 7.8 and 8.2?

Solution:

Even if the population is not normally distributed, the central limit
theorem can be used (n > 30),
so the sampling distribution of x̄ is approximately normal
with mean
$$ \mu_{\bar{x}} = 8 $$
and standard error
$$ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{36}} = 0.5 $$

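$$ P(7.8 < \bar{x} < 8.2) = P\left( \frac{7.8 - 8}{3/\sqrt{36}} < \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < \frac{8.2 - 8}{3/\sqrt{36}} \right) = P(-0.4 < z < 0.4) = 0.3108 $$

[Figure: the sampling distribution of x̄ (shaded from 7.8 to 8.2, centered at 8) is standardized to the standard normal distribution (shaded from z = -0.4 to z = 0.4, centered at 0); the shaded area is 0.1554 + 0.1554 = 0.3108.]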
B. Distribution of the sample proportion
The sample proportion is derived from counts or frequency data.

Easier and more reliable, does not depend on variance.

Sample proportion = p̂

Population proportion = p or π

Population proportion (p) = the proportion of population having some
characteristic

Sample proportion (p̂) provides an estimate of p:

$$ \hat{p} = \frac{x}{n} = \frac{\text{number of successes in the sample}}{\text{sample size}} $$

Properties of the sampling distribution of
sample proportion
Construction of the sampling distribution of the sample proportion is done
in a manner similar to that of the mean.
Applying the central limit theorem, the shape of the sampling distribution is
approximately normal provided that n is large enough.
The mean of the distribution, μp̂, will be equal to the true population
proportion, p, and the variance of the distribution, σ²p̂, will be equal to
pq/n, where q = 1 − p.
How large does n need to be?
Central limit theorem for proportions:

$$ np > 5 \quad \text{and} \quad n(1 - p) > 5 $$

z-Value for Proportions
Standardize p̂ to a z value with the formula:

$$ z = \frac{\hat{p} - p}{\sigma_{\hat{p}}} = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1 - p)}{n}}} $$

Example
According to a recent estimate, 19.4% of the adult male population was obese. What is
the probability that in a random sample of size 150 from this population fewer than 15%
will be obese?

Note: np = 150*0.194 = 29.1 > 5

nq=150 *0.806=120.9>5

n = 150, p = 0.194. Find P(p̂ < 0.15).

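Find the z score:

$$ z = \frac{\hat{p} - p}{\sqrt{p(1 - p)/n}} = \frac{0.15 - 0.194}{\sqrt{0.194 \times 0.806 / 150}} = \frac{-0.044}{0.0323} \approx -1.36 $$

A value of z = -1.36 gives an area of 0.0869, which is the probability
P(z < -1.36) = 0.0869.
The probability that p̂ < 0.15 is 0.0869.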
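
The obesity example can also be verified in code. This sketch again relies on `statistics.NormalDist`; the function name `prob_sample_proportion_below` is an assumption, and the check on np and n(1 − p) mirrors the rule of thumb stated earlier.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_proportion_below(p_hat, p, n):
    """Return (z, P(sample proportion < p_hat)) using the normal approximation."""
    if n * p <= 5 or n * (1 - p) <= 5:
        raise ValueError("normal approximation needs np > 5 and n(1 - p) > 5")
    standard_error = sqrt(p * (1 - p) / n)      # sqrt(p(1 - p) / n)
    z = (p_hat - p) / standard_error
    return z, NormalDist().cdf(z)

z, prob = prob_sample_proportion_below(0.15, p=0.194, n=150)
print(round(z, 2), round(prob, 4))
```

The computed probability is close to the table-based 0.0869; a small difference is expected because the table lookup rounds z to -1.36 before reading the area.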

THANKS FOR YOUR ATTENTION!!!!
