Sampling and Sample Size Calculation - 2017

The document outlines the objectives and methods of sampling, including definitions, sampling errors, and sample size calculations. It discusses the importance of sampling for obtaining information from large populations efficiently and accurately, and details various sampling techniques such as probability and non-probability sampling. Additionally, it provides formulas for calculating sample sizes and adjustments based on population size, emphasizing the need for representativeness in samples.

Sampling and Sample Size Calculation

Sep, 2024
Dr. Okube Tekeste
Objectives: sampling
To understand:
• Why we use sampling
• Definitions in sampling
• Sampling errors
• Main methods of sampling
• Sample size calculation
Why do we use sampling?
Get information from large populations with:
– Reduced costs
– Reduced field time
– Increased accuracy
– Enhanced methods
Research topic
• Prevalence and factors associated with
hypertension among adults attending Church
X
• Inclusion criteria: All adults attending Church X
• Exclusion criteria:
– Adults who are not willing to participate
– Those who are feeling mentally or physically unwell
Population:
• A population is the entire aggregation of cases that meet a designated set of criteria.
• In identifying the population, you must state the inclusion/exclusion criteria.
• The inclusion/exclusion criteria form the basis on which a decision is made about whether or not an individual is classified as a member of the population in question.
A Study Population
• A study population is the entire group of individuals,
events or objects that have common observable
characteristics.
• The population is the larger set from which the sample
was taken; contains all the subjects of interest.
For example:
I. All expectant women attending ANC at X hospital.
II. All under-fives in a given community.
III. All hypertensive patients attending KNH.
IV. All academic staff with a PhD degree.

• A sample consists of a subset of the units that compose the population.
• A sample is a set of observations drawn from a larger population.
• The sample is the numbers (data) collected.
• A sample should be representative of the population. A representative sample is one whose key characteristics closely approximate those of the population.
Definition of sampling
 Sampling refers to the process of selecting a portion
of the population to represent the entire population.
 Sampling is the use of a subset of the population to
represent the whole population.
• Sample size (if the target population > 10,000): $n = \frac{z^2 \, p \, q}{e^2}$
• Where z = 1.96, p = prevalence (20%), q = 1 - p, e = 0.05
• $n = \frac{1.96 \times 1.96 \times 0.2 \times 0.8}{0.0025} = 245.9 \approx 246$
 Procedure by which some members of a given
population are selected as representatives of the
entire population.
Sample size adjustment
• Since the target population is less than 10,000, the sample size is adjusted using the following formula:
• $n_f = \frac{n}{1 + n/N}$
• n = 246
• N = 500
• $n_f = \frac{246}{1 + 246/500} = 164.9 \approx 165$
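The two calculations above (initial sample size, then adjustment for a population under 10,000) can be checked with a short Python sketch. The function names are illustrative and not part of the slides, and rounding up to the next whole person is an assumption that happens to match the slide's figures.

```python
import math

def sample_size_proportion(z, p, e):
    """Initial sample size n = z^2 * p * q / e^2, with q = 1 - p."""
    q = 1 - p
    return (z ** 2) * p * q / (e ** 2)

def adjust_for_small_population(n, N):
    """Adjusted sample size nf = n / (1 + n / N)."""
    return n / (1 + n / N)

# Values from the slides: z = 1.96, p = 20%, e = 0.05, N = 500.
# Rounding up (math.ceil) is an assumption, not stated in the slides.
n = math.ceil(sample_size_proportion(z=1.96, p=0.20, e=0.05))
nf = math.ceil(adjust_for_small_population(n, N=500))
print(n, nf)
```

Keeping the two steps as separate functions makes it easy to see how much of the reduction comes from the population size alone.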


Qualitative
• Sample 10-30% of the population, e.g. 10% of 500 = 50
Definition of sampling terms
Sampling unit (element)
• Subject under observation on which information is
collected
– Example: children <5 years, type 2 DM patients,
hospital discharges, health events…

Sampling fraction
• Ratio between sample size and population size
– Example: 100 out of 2000 (5%)
Representativeness (validity)
A sample should accurately reflect distribution of
relevant variable in population

• Person e.g. age, sex


• Place e.g. urban vs. rural
• Time e.g. seasonality

Representativeness essential to generalise

Ensure representativeness before starting,

Confirm once completed


Types of sampling
Sampling techniques fall under two main groups:

 Probability sampling

 Non-probability sampling
Probability Sampling
• Probability sampling looks at the entire group of
individuals, events or objects that have common
observable characteristics.

• It has been found to give accurate results when one is studying groups that are too large to study in their entirety.
Probability Sampling
The following are the most commonly used
methods in probability sampling:
 Simple random sampling
 Systematic random sampling (e.g. sampling interval = 500/165 ≈ 3)
 Stratified sampling
 Cluster sampling
 Multi-stage sampling
Sampling methods

Simple Random Sampling


• This is the simplest form of probability sampling.
• It means that every sampling unit in the population
has an equal chance of being included in the
sample. You can draw a simple random sample
using the following steps:
• Make a list of all the units in the population to be
studied
• Decide on the sample size
• Select the required number of units using ballot or
lottery method or random numbers
Sampling methods
Simple Random Sampling…
• For example, to draw a random sample of 25 patients from a list of 250 using the ballot method:
• Give each patient a number (1 - 250)
• Write each number on a small piece of paper
• Fold the papers individually and put them in a box
• Shake the box vigorously to mix them
• Pick 25 pieces one by one and record the numbers
• Each patient is a sampling unit, and the patients whose numbers are drawn form the sample.
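The ballot method can be mimicked in Python with the standard library's random number generator. This is a minimal sketch: the function name and the seed value are illustrative assumptions, and the seed is only there to make the draw repeatable.

```python
import random

def simple_random_sample(unit_ids, sample_size, seed=None):
    """Draw sample_size units without replacement, each with equal chance."""
    rng = random.Random(seed)  # seed is optional; fixes the draw for repeatability
    return sorted(rng.sample(unit_ids, sample_size))

# Patients numbered 1 to 250, as in the ballot example.
patient_ids = list(range(1, 251))
chosen = simple_random_sample(patient_ids, 25, seed=2024)
print(chosen)
```

Sampling without replacement matches the ballot box, where a drawn paper is not put back.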
Systematic random sampling
• Principle
– Select sample at regular intervals based on sampling
fraction
• Advantages
– Simple
– Sampling error easily measured
• Disadvantages
– Need complete list of units
Systematic Random Sampling:
• Here you first decide the sample size you want
and then proceed to select the individuals or
units using a systematic method.
Systematic sampling

• N = 1200, and n = 60
• Sampling interval k = N/n = 1200/60 = 20 (sampling fraction = 60/1200 = 1/20)
• List persons from 1 to 1200
• Randomly select a number between 1 and 20 (e.g. 8)
• 1st person selected = the 8th on the list
• 2nd person = 8 + 20 = the 28th, etc.
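The same selection can be written as a short Python sketch. The function name is an assumption for illustration; the random start is drawn between 1 and the interval unless one is supplied, and the sketch assumes N is a multiple of n, as in the example.

```python
import random

def systematic_positions(N, n, start=None, seed=None):
    """Return the list positions selected by systematic sampling."""
    k = N // n  # sampling interval (assumes N is a multiple of n)
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start in 1..k
    return [start + i * k for i in range(n)]

# Example from the slide: N = 1200, n = 60, random start = 8.
positions = systematic_positions(1200, 60, start=8)
print(positions[:3], len(positions))
```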
Stratified sampling
• Principle :
– Divide sampling frame into homogeneous subgroups
(strata) e.g. age-group, occupation;

– Draw a random sample from each stratum.

Sampling methods
Stratified Sampling:
• This means dividing the sampling frame into smaller subgroups so that you can capture the variable aspects of each subgroup.
• This method is used when the study population is very variable, for example, different ethnic groups, different ecological areas, or age groups.
• It allows you to subdivide the population into subpopulations which are more homogeneous.
• You then apply simple random sampling to each subgroup or stratum.
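Example: proportional allocation (total population = 666, sample size = 200)

Stratum     Population   Calculation       Sample
Nursing     100          100/666 x 200     30
Pharmacy    200          200/666 x 200     60
Medicine    300          300/666 x 200     90
PH          66           66/666 x 200      20
Total       666                            200

Within a stratum, e.g. nursing: 100/30 ≈ 3, so pick a random start from 1, 2 or 3 and then take every 3rd person.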
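The allocation above gives each stratum a share of the sample equal to its share of the population. A minimal Python sketch follows; the function name is illustrative, and rounding to the nearest whole number is an assumption that reproduces the slide's figures (with other numbers the rounded shares may not add up exactly to the target).

```python
def proportional_allocation(strata_sizes, total_sample):
    """Allocate total_sample across strata in proportion to stratum size."""
    population = sum(strata_sizes.values())
    # Rounding to nearest is an assumption; shares may need a manual
    # adjustment if they do not sum to total_sample.
    return {
        name: round(size / population * total_sample)
        for name, size in strata_sizes.items()
    }

strata = {"Nursing": 100, "Pharmacy": 200, "Medicine": 300, "PH": 66}
allocation = proportional_allocation(strata, total_sample=200)
print(allocation, sum(allocation.values()))
```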
Stratified sampling
• Advantages
– Can acquire information about whole population and
individual strata
– Precision is increased if variability within strata is less than variability between strata (i.e. each stratum is homogeneous).

• Disadvantages
– Can be difficult to identify strata
– Loss of precision if small numbers in individual strata
• resolve by sampling proportionate to stratum
population
Sampling methods

Cluster Sampling:
• In this method, you randomly select groups or
clusters and not the individuals or cases.

• This method is used when it is not possible to obtain a sampling frame because the population is either too large or scattered over a large geographical area.
Cluster Sampling:
 For example, say you want to study patients
suffering from malaria in your district.
 It would be expensive and time consuming to
compile a list of all malaria patients who have
been hospitalized in your district.
 So the logical thing is to list all health facilities in
your district and then randomly select them
according to your sample size.
 Once you select them, you would then include all
the malaria patients in those health facilities in
your sample.
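The malaria example can be sketched in Python as follows. The facility names and patient identifiers are hypothetical placeholders, and the function name is an assumption; the point is only that the random draw is made on facilities (clusters), after which every patient in a chosen facility is included.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters, then include every member of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # draw clusters, not individuals
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members

# Hypothetical district: four facilities with their malaria patients.
facilities = {
    "Facility A": ["a1", "a2", "a3"],
    "Facility B": ["b1", "b2"],
    "Facility C": ["c1", "c2", "c3", "c4"],
    "Facility D": ["d1"],
}
chosen, patients = cluster_sample(facilities, n_clusters=2, seed=1)
print(chosen, patients)
```

Note that only the list of facilities is needed up front; no district-wide list of patients has to be compiled.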
Cluster sampling
• Advantages
– Simple as complete list of sampling units within population
not required

– Less travel/resources required

• Disadvantages
– A potential problem is that members of the same cluster are more likely to be alike (homogeneous) than members of different clusters.
– This "dependence" needs to be taken into account in the sample size and in the analysis ("design effect").
• This is a development of cluster sampling. It is called multi-stage sampling because successive stages in cluster sampling are used.
• This technique is meant for big enquiries extending to a considerably large geographical area like the entire country.
• If random sampling is applied at all stages, the sampling procedure is described as multistage random sampling.
Multi-Stage Sampling
• For example, if we intend to select 12 health facilities
in a district with 36 facilities, we can first group them
into the various clusters or levels of health care such as
community clinics, health centers, and hospitals.
• We can then randomly select 12 facilities from the 3
groups.
• This is two-stage sampling. If we want to select 30
prescriptions from each facility, we can continue from
this stage with the process described in selecting every
42nd prescription under systematic sampling. This is
three-stage sampling.
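A rough Python sketch of the three stages is given here. Several details are assumptions made for illustration and are not in the slides: 12 facilities per level, 4 facilities drawn from each level, and 1,260 prescriptions per facility (which gives an interval of 42 when 30 are wanted). The function and variable names are likewise illustrative.

```python
import random

def multistage_sample(facilities_by_level, per_level, per_facility, seed=None):
    """Group by level, randomly pick facilities, then sample systematically."""
    rng = random.Random(seed)
    selected = {}
    for level in sorted(facilities_by_level):            # stage 1: levels of care
        facilities = facilities_by_level[level]
        for name in rng.sample(sorted(facilities), per_level):  # stage 2: facilities
            prescriptions = facilities[name]
            k = len(prescriptions) // per_facility        # stage 3: sampling interval
            start = rng.randrange(k)                      # random start within interval
            selected[name] = prescriptions[start::k][:per_facility]
    return selected

# Hypothetical district: 3 levels x 12 facilities, 1260 prescriptions each.
levels = ["community clinic", "health centre", "hospital"]
facilities_by_level = {
    level: {f"{level} {i}": list(range(1, 1261)) for i in range(1, 13)}
    for level in levels
}
sample = multistage_sample(facilities_by_level, per_level=4, per_facility=30, seed=7)
print(len(sample), [len(v) for v in sample.values()][:3])
```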
Non Probability Sampling Methods:
• Non probability sampling methods are used when a
researcher is not interested in selecting a sample
that is representative of the population.
• They are mainly used in qualitative studies where the
focus is on in-depth information rather than making
generalisations.
• Some examples of non-probability sampling methods are:
1. Convenience sampling (available sample).
2. Quota sampling (some part of the population must be included).
3. Purposive sampling (units with particular characteristics).
Convenience Sampling
• In this method, you select cases or units of
observation as they become available.
• For example, a health worker wanting to study
attitudes of villagers towards family planning may
decide to interview all adults visiting Maternal
Child Health or Family Planning (MCH/FP) clinic on
that day.
• Such a sample is useful for giving a first impression
of a situation.
• However, it is not representative of the community.
Convenience Sampling…

• This sample is considered unrepresentative because some units can easily be missed out or under-selected.
Quota Sampling
• In this method, the researcher simply selects
subjects to fit in identified quotas, say for
example, a certain religion or social class.
• Quota sampling ensures that various groups
or quotas of the population are included in
the study according to some criteria.
• The selection is not random as the
individuals are just picked as they fit into the
identified quotas.
• e.g. if you want to study attitude of people towards
use of family planning methods, catholic…
Purposive Sampling
• Here the researcher simply picks individuals or cases that have the information or characteristics which they require.
• It is sometimes used in one of the stages in the
sampling procedure, for instance, to get the location
or district in which the units of observation have the
required characteristics.
• Once the units are selected, the researcher may then
apply random sampling to obtain the actual sample
of cases.
e.g.
• Assessment of post-traumatic stress disorder in the Garissa attack.
• Life experience of patients living with HIV/DM/Cancer…
 The governing criterion is to avoid systematic bias and
sampling error. Systematic bias results from errors in
the sampling procedures and increasing the sample
size cannot eliminate it.
 Systematic bias: it is the result of one or more of the
following factors:
• i) Inappropriate sampling frame: If the sampling frame is inappropriate, i.e. a biased representation of the universe, it will result in systematic bias.
• ii) Defective measuring device: If the measuring device is constantly in error, it will result in systematic bias. In survey research, systematic bias can result if the questionnaire or the interviewer is biased.
 iii) Non-respondents: If we are unable to
sample all the individuals initially included in
the sample, there may arise a systematic bias.
 iv) Indeterminacy Principle: Sometimes
individuals may act differently when kept under
observation than what they do when kept in
non-observed situations. This may cause
systematic bias.
 v) Natural bias in the Reporting of data: This
occurs when respondents report data such as
their incomes. People in general understate
their incomes if asked about it for tax
purposes, but they overstate the same if asked
for social-status or their affluence.
A good sampling design should be one that:
• Produces a truly representative sample.
• Produces a small sampling error.
 Is viable in the context of funds available for
the research study.
 Is able to control the systematic bias in a
better way.
 Produces results from the sample study that
can be applied in general for the whole
population with a reasonable level of
confidence.
Steps in estimating sample size
for descriptive survey
• Identify major study variable
• Determine type of estimate (%)
• Decide on desired precision of the estimate
• Decide on acceptable risk that estimate will fall outside
its real population value
• Adjust for estimated design effect
• Adjust for expected response rate
Sample size for descriptive survey
Simple random / systematic sampling
$$n = \frac{z^2 \, p \, q}{d^2} = \frac{1.96^2 \times 0.15 \times 0.85}{0.05^2} = \; ?$$

Cluster sampling
$$n = g \times \frac{z^2 \, p \, q}{d^2} = \frac{2 \times 1.96^2 \times 0.15 \times 0.85}{0.05^2} = \; ?$$

z: alpha risk expressed in z-score
p: expected prevalence
q: 1 - p
d: absolute precision
g: design effect
Sample size for descriptive survey
Simple random / systematic sampling
$$n = \frac{z^2 \, p \, q}{d^2} = \frac{1.96^2 \times 0.15 \times 0.85}{0.03^2} \approx 544$$

Cluster sampling
$$n = g \times \frac{z^2 \, p \, q}{d^2} = \frac{2 \times 1.96^2 \times 0.15 \times 0.85}{0.03^2} \approx 1088$$

z: alpha risk expressed in z-score
p: expected prevalence
q: 1 - p
d: absolute precision
g: design effect
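The effect of the precision d and the design effect g can be explored with a small Python sketch. The function name is illustrative; the inputs (z = 1.96, p = 0.15, g = 2, d = 0.05 or 0.03) are the ones used on these slides, and the values are printed unrounded so the rounding convention can be chosen separately.

```python
def survey_sample_size(z, p, d, g=1.0):
    """n = g * z^2 * p * q / d^2, with q = 1 - p and design effect g."""
    q = 1 - p
    return g * (z ** 2) * p * q / (d ** 2)

for d in (0.05, 0.03):
    srs = survey_sample_size(z=1.96, p=0.15, d=d)           # simple random / systematic
    cluster = survey_sample_size(z=1.96, p=0.15, d=d, g=2)  # cluster, design effect 2
    print(f"d={d}: simple random n={srs:.1f}, cluster n={cluster:.1f}")
```

Tightening the precision from 0.05 to 0.03 nearly triples the required sample, and a design effect of 2 doubles it again.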
Sample size determination

Fischer's formula: $n = \frac{z^2 \, p \, q}{d^2}$, where:

• n = sample size
• z = standard normal deviate at the desired confidence level; here it is taken at 95%, for which z = 1.96
• p = proportion of the population with the desired characteristic
• q = 1 - p = proportion of the population without the desired characteristic
• d = degree of precision; taken to be 5% (0.05)

Fischer's formula…

• Accordingly: $n = \frac{z^2 \, p (1 - p)}{d^2} = \frac{1.96^2 \times 0.5 \times 0.5}{0.05^2} \approx 384$
• Therefore, the sample size of the study will be 384.
Sample size adjustment
• $n = \frac{1.96^2 \times 0.5 \times 0.5}{0.05^2} \approx 384$
• Adjustment of the sample size if the target population is less than 10,000: $n_f = \frac{n}{1 + n/N}$, which makes the required sample smaller.
• Taking the value of p at 50% with this assumption (Abebe et al., 2003), the sample size will be: $n = \frac{100\%}{p} \times n = \frac{100\%}{50\%} \times 384 = 768$
Sample size determination
• Census: e.g. level of satisfaction among nurses who work at Mbagathi Hospital (e.g. if they are 100)
• A local health department wishes to estimate the prevalence of malnutrition among children under age 5 in the locality.
• How many children should be included in the sample so that the prevalence may be estimated to within 5 percentage points of the true value with 95% confidence?
• It is known that the true rate is unlikely to exceed 20%.
• a) p = 20%
• b) confidence level = 95%
• c) absolute precision d = 0.05
• $n = \frac{z^2 \, p (1 - p)}{d^2}$, where z is the value corresponding to 95% confidence (1.96)
• $n = \frac{1.96 \times 1.96 \times 0.2 \times 0.8}{0.0025} \approx 246$
 The Finite Population Correction (FPC) factor is routinely used in
calculating sample sizes for simple random samples. In fact, many
sample size formulas for simple random samples include the FPC as part
of the formula. It has very little effect on the sample size when the
sample is small relative to the population but it is important to apply the
FPC when the sample is large (10% or more) relative to the population.
Suppose n=384 and N=16450
• The sample size equation solving for n' (new sample size) when taking the FPC into account is:
• $n' = \frac{n}{1 + n/N}$
• where n is the sample size based on the calculations above, and N is the population size.
• Calculating the new sample size using the formula above, we find:
• $n' = \frac{384}{1 + 384/16450} \approx 375.2$
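To see when the FPC matters, the correction can be applied to the same initial sample for several population sizes. This is a minimal sketch: the function name is illustrative, and the population sizes other than 16450 are hypothetical values chosen only for comparison.

```python
def apply_fpc(n, N):
    """Finite population correction: n' = n / (1 + n / N)."""
    return n / (1 + n / N)

n = 384  # initial sample size from the calculation above
# 16450 comes from the example; the smaller N values are hypothetical.
for N in (16450, 5000, 1000, 500):
    corrected = apply_fpc(n, N)
    share = n / N  # initial sample as a fraction of the population
    print(f"N={N}: n'={corrected:.1f} (n is {share:.1%} of N)")
```

The correction is slight when n is a small share of N and becomes substantial as that share grows, which is the pattern described in the paragraph above.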
