
Methodology of Sampling

Good handbook for easy access to sampling

Uploaded by

Vale Diode

SAMPLING METHODS L-2
Prof Dr Najlaa Fawzi

POPULATION:
The largest collection of anything in which we
have an interest. If this collection has limits,
it is a finite population; if not, it is an
infinite population.

It can be:
A-Population of entities: the largest collection
of entities in which we have an interest at
a particular time (e.g. a population of humans);
each population member has many variables.
B-Population of values: the largest collection
of values of a random variable in which we
have an interest at a particular time
(e.g. blood urea values).

SAMPLING
• A sample is “a smaller (but hopefully
  representative) collection of units from a
  population used to determine truths about
  that population”.

Why sample?
• Resources (time, money) and workload
• Gives results with known accuracy that can
  be calculated mathematically

Sampling definition: Sampling is a technique
of selecting individual members or a subset of
the population to make statistical inferences
from them and estimate characteristics of
the whole population.

• What is your population of interest?
  To whom do you want to generalize
  your results?
    All doctors
    School children
    Women aged 15-45 years
    Other
• Can you sample the entire population?

SAMPLING

• 3 factors that influence sample representativeness:
    Sampling procedure
    Sample size
    Participation (response)

• When might you sample the entire population?
    When your population is very small
    When you have extensive resources
    When you don’t expect a very high response

Define the Population

Determine the Sampling Frame

Select Sampling Technique(s)

Determine the Sample Size

Implement the Sampling Process

Process
The sampling process comprises several stages:
• Defining the population of concern
• Specifying a sampling frame, a set of items
  or events possible to measure
• Specifying a sampling method for selecting
  items or events from the frame
• Determining the sample size
• Implementing the sampling plan
• Sampling and data collecting
• Reviewing the sampling process

Population definition
• A population can be defined as including
  all people or items with the characteristic
  one wishes to understand.
• Because there is very rarely enough time
  or money to gather information from
  everyone or everything in a population, the
  goal becomes finding a representative
  sample (or subset) of that population.

Population definition
• Note also that the population from which the sample
  is drawn may not be the same as the population about
  which we actually want information. Often there is
  large but not complete overlap between these two
  groups due to frame issues etc.
• Sometimes they may be entirely separate: for
  instance, we might study rats in order to get a
  better understanding of human health, or we might
  study records from people born in 2013 in order to
  make predictions about people born in 2014.

SAMPLING FRAME
The sampling frame is the list from which the
potential respondents are drawn.
• Registrar’s office
• Class lists
• Must assess sampling frame errors

• A sampling frame has the property that we
  can identify every single element and include
  any in our sample.
• The sampling frame must be representative
  of the population.

Types of sampling methods: there are probability
methods and non-probability methods. The
problem with the second type is that its results
cannot be generalized to the population.

Types of Samples
• Probability (Random) Samples
  • Simple random sample
  • Systematic random sample
  • Stratified random sample
  • Multistage sample
  • Multiphase sample
  • Cluster sample
• Non-Probability Samples
  • Convenience sample
  • Purposive sample
  • Quota sample
  • Snowball sample

PROBABILITY SAMPLING
• A probability sampling scheme is one in which
  every unit in the population has a chance
  (greater than zero) of being selected in the
  sample, and this probability can be accurately
  determined.
• When every element in the population does
  have the same probability of selection, this is
  known as an 'equal probability of selection'
  (EPS) design. Such designs are also referred
  to as 'self-weighting' because all sampled units
  are given the same weight.

In a population of 1000 members, every member has
a 1/1000 chance of being selected on any single
random draw. Probability sampling removes selection
bias from the choice of units and gives all members
a fair chance to be included in the sample.
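As a worked illustration of the 1/1000 figure: assuming simple random sampling without replacement of n units from a population of N (the symbols n, N and \pi_i are introduced here for illustration and do not appear in the slides), the chance that a given member i ends up in the sample is

```latex
\pi_i = \frac{n}{N},
\qquad n = 1,\ N = 1000 \;\Rightarrow\; \pi_i = \frac{1}{1000},
\qquad n = 20,\ N = 1000 \;\Rightarrow\; \pi_i = \frac{20}{1000} = 0.02 .
```

So the 1/1000 chance corresponds to drawing a single unit; larger samples raise every member's chance by the same factor.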

PROBABILITY SAMPLING
• Simple random sampling
• Systematic sampling
• Stratified random sampling
• Cluster sampling
• Multistage sampling
• Multiphase sampling

Stages in random sampling:

1. Define population
2. Develop sampling frame
3. Assign each unit a number
4. Randomly select the required amount of random numbers
5. Systematically select random numbers until it meets the sample size requirements

Simple random sampling is also known as ‘unrestricted random sampling’
• Used in clinical trials
SIMPLE RANDOM SAMPLING
• Applicable when the population is small, homogeneous &
  readily available
• All subsets of the frame of a given size are given an
  equal probability. Each element of the frame thus has an
  equal probability of selection.
• It provides for the greatest number of possible samples.
  This is done by assigning a number to each unit in the
  sampling frame.
• One of the best probability sampling techniques that
  helps in saving time and resources
• A table of random numbers or a lottery system is used
  to determine which units are to be selected.
• Estimates are easy to calculate.
• Simple random sampling is always an EPS design,
  but not all EPS designs are simple random
  sampling.

• Disadvantages
  • It needs a complete list of the study population,
    which is often difficult to obtain.
  • If the sampling frame is large, this method is impractical.
  • Minority subgroups of interest in the population may
    not be present in the sample in sufficient numbers
    for study.

REPLACEMENT OF SELECTED UNITS
• Sampling schemes may be without replacement
  ('WOR' - no element can be selected more than
  once in the same sample) or with replacement
  ('WR' - an element may appear multiple times in
  the one sample).
• For example, if we catch fish, measure them,
  and immediately return them to the water before
  continuing with the sample, this is a WR design,
  because we might end up catching and measuring
  the same fish more than once. However, if we do
  not return the fish to the water (e.g. if we eat
  the fish), this becomes a WOR design.
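The WR/WOR distinction can be sketched in Python. This is only an illustration: the pond of 10 labelled fish, the sample size of 5 and the seed are assumptions, not part of the slides.

```python
import random

rng = random.Random(0)  # fixed seed so the illustration is repeatable
fish = ["fish_%d" % i for i in range(1, 11)]  # hypothetical pond of 10 fish

# WOR: a fish that has been caught is not returned, so it cannot be drawn again.
wor_sample = rng.sample(fish, 5)

# WR: each fish is returned before the next catch, so repeats are possible.
wr_sample = rng.choices(fish, k=5)

print("WOR:", wor_sample)
print("WR: ", wr_sample)
```

A WOR sample never contains the same fish twice, while a WR sample may.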

Methods of simple random sampling:
• Lottery method
• Random number tables
• Computer software
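The computer-software method might look like the following minimal sketch. The frame of 1000 numbered units, the sample size of 20 and the function name `simple_random_sample` are assumptions chosen for illustration.

```python
import random

# Hypothetical sampling frame: units numbered 1..1000.
frame = list(range(1, 1001))

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

sample = simple_random_sample(frame, 20, seed=1)
print(sorted(sample))
```

Numbering the units first mirrors the lottery and random-number-table methods, where the numbers drawn identify the selected units.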

SYSTEMATIC SAMPLING
• Systematic sampling depends on arranging the
  target population according to some ordering
  scheme and then selecting elements at
  regular intervals through that ordered list.
• This method is preferred when the
  population is large, scattered and not
  homogenous.

Systematic sampling involves a random start
and then proceeds with the selection of every
kth element from then onwards.

In this case, k = (population size / sample size).
It is important that the starting point is not
automatically the first in the list, but is
instead randomly chosen from within the first
to the kth element in the list.

Systematic Random Sampling
• Based on sampling fraction: every Kth unit is chosen
  in the population list, where K is the sampling interval
• Sampling interval (K) = Total no. of units in
  population / Total no. of units in sample
• Applicable for large, non-homogenous populations
  where a complete list of individuals is available
• For example, if there is a population of 1000 from
  which a sample of 20 is to be chosen, then
  K = 1000/20 = 50; thus every 50th unit
  will be included in the sample (e.g. 1st, 51st, 101st, and so on).
  The first unit among the first 50 is chosen by simple
  random sampling.
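The example above (population of 1000, sample of 20, K = 50) can be sketched as follows. The function name `systematic_sample` and the seed are assumptions, and the sketch assumes the population size is an exact multiple of the sample size.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select every k-th unit after a random start within the first k units."""
    k = len(frame) // n          # sampling interval K = N / n (assumes N is a multiple of n)
    rng = random.Random(seed)
    start = rng.randrange(k)     # 0-based position of the first unit, chosen at random
    return frame[start::k][:n]

frame = list(range(1, 1001))     # population of 1000 numbered units
sample = systematic_sample(frame, 20, seed=3)
print(sample[:3], "...", sample[-1])
```

Only the first unit is random; every later unit is fixed by the interval.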

SYSTEMATIC SAMPLING
1. Define population
2. Develop sampling frame
3. Decide the sample size
4. Work out what fraction of the frame the sample size represents
5. Select according to fraction (100 sample from 1,000 frame, then 10%, so every 10th unit)
6. First unit selected by random numbers, then every nth unit selected (e.g. every 10th)

As described above, systematic sampling is an EPS
method, because all elements have the same
probability of selection (in the example given, one
in ten). It is not 'simple random sampling' because
different subsets of the same size have different
selection probabilities - e.g. the set
{4,14,24,...,994} has a one-in-ten probability of
selection, but the set {4,13,24,34,...} has zero
probability of selection.
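The argument can be checked by enumeration. The sketch below assumes a frame of 1000 units numbered 1..1000 and an interval of 10, matching the one-in-ten example; the variable names are illustrative.

```python
from math import comb

N, k = 1000, 10  # frame size and sampling interval (one in ten)

# Each random start 1..k yields one possible sample, so only k samples can occur.
possible = [set(range(start, N + 1, k)) for start in range(1, k + 1)]

# Every unit belongs to exactly one possible sample, hence selection probability 1/k (EPS).
membership = [sum(unit in s for s in possible) for unit in range(1, N + 1)]

print(len(possible), "possible systematic samples")
print(set(range(4, N + 1, k)) in possible)    # {4, 14, 24, ..., 994} can occur
print(any({4, 13} <= s for s in possible))    # 4 and 13 never occur together
print(comb(N, N // k) > len(possible))        # simple random sampling allows far more subsets
```

Each unit has the same chance of selection, yet most subsets of size 100 have no chance at all, which is why the design is EPS but not simple random sampling.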

• ADVANTAGES:
  • Sample easy to select
  • Suitable sampling frame can be identified easily
  • Sample evenly spread over entire reference
    population
  • Time and labour for sample collection is
    relatively small.
• DISADVANTAGES:
  • Sample may be biased if hidden periodicity in
    population coincides with that of selection.
  • Difficult to assess precision of estimate from
    one survey.

STRATIFIED SAMPLING

This method is used when the population is not
homogenous and is composed of diverse segments.
Where the population embraces a number of distinct
categories, the frame can be organized into
separate "strata." Each stratum is then sampled
as an independent sub-population, out of which
individual elements can be randomly selected.
Every unit in a stratum has the same chance of being
selected.

This method gives a more
representative sample than simple random
sampling in a given large population.

Stratified Random Sampling
• Non-homogenous population is converted to homogenous
  groups/classes (strata); a sample is drawn from each
  stratum at random, in proportion to its size
• Applicable for large non-homogenous populations
• Gives a more representative sample than simple random
  sampling
• None of the categories is under- or over-represented
• For example, in a population of 1000, a sample of 100
  is to be drawn for hemoglobin estimation; first the
  non-homogenous population is converted to homogenous
  strata (i.e. 700 males and 300 females), then 70 males
  and 30 females are drawn at random
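The hemoglobin example can be sketched as follows. The unit labels (M1, F1, ...) and the seed are assumptions; the stratum sizes and sample size are those of the example.

```python
import random

rng = random.Random(4)

# Population of 1000 split into two strata: 700 males and 300 females.
strata = {
    "male": [f"M{i}" for i in range(1, 701)],
    "female": [f"F{i}" for i in range(1, 301)],
}
n = 100
total = sum(len(units) for units in strata.values())

sample = {}
for name, units in strata.items():
    n_h = n * len(units) // total           # proportional share: 70 and 30 here
    sample[name] = rng.sample(units, n_h)   # simple random sample within the stratum

print({name: len(units) for name, units in sample.items()})
```

The integer division is exact for these figures; other stratum sizes would need an explicit rounding rule.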

1. Define population
2. Develop sampling frame according to characteristics required
3. Determine the proportion of each population variable of interest
4. Systematic sampling methods can then be followed to select sample unit

Using the same sampling fraction for all strata
ensures proportionate representation in the
sample.

Adequate representation of minority
subgroups of interest can be ensured by
stratification & varying the sampling fraction
between strata as required.

• Finally, since each stratum is treated as an
  independent population, different sampling
  approaches can be applied to different
  strata.

• Types of Stratified Samples
  • Proportional Stratified Sample:
    The number of sampling units drawn
    from each stratum is in proportion to
    the relative population size of that
    stratum
  • Disproportional Stratified Sample:
    The number of sampling units drawn
    from each stratum is allocated according
    to analytical considerations, e.g. as
    variability increases, the sample size of
    the stratum should increase
  • Optimal allocation stratified sample:
    The number of sampling units drawn
    from each stratum is determined on
    the basis of both size and variation.
    Calculated statistically
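The slides do not say which formula is meant by "calculated statistically". One standard choice, usually called Neyman allocation, is shown here as an assumption: with total sample size n, H strata, stratum sizes N_h and within-stratum standard deviations S_h (all symbols introduced for illustration),

```latex
n_h = n \cdot \frac{N_h S_h}{\sum_{j=1}^{H} N_j S_j}, \qquad h = 1, \dots, H .
```

Under this rule a stratum receives more sample units when it is larger or more variable; when all S_h are equal it reduces to proportional allocation.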

Advantage
• It is more representative
• It gives estimates with increased precision
• As the population is more concentrated,
  time and money will be saved.

Disadvantage
• Requires accurate information on proportions
  of each stratum
• It is a very difficult task to divide the
  population into homogenous strata.
• Stratified lists are costly to prepare. This may
  require considerable time, money and
  statistical expertise.

Disadvantages to using stratified sampling:
• First, the sampling frame of the entire population has
  to be prepared separately for each stratum.
• Second, when examining multiple criteria,
  stratifying variables may be related to some,
  but not to others, further complicating the
  design, and potentially reducing the utility of
  the strata.
• Finally, in some cases (such as designs with a
  large number of strata, or those with a
  specified minimum sample size per group),
  stratified sampling can potentially require a
  larger sample than would other methods.

Select a stratified random sample of 20 patients
from 200 patients.

                  Disease A   Disease B   Disease C   Disease D   TOTAL
No. of patients   100         60          20          20          200
%                 50          30          10          10          100

Out of 20 patients, the number to be selected from each stratum:

Disease A: (100/200) × 20 = 50% of 20 = 10
Disease B: (60/200) × 20 = 30% of 20 = 6
Disease C: (20/200) × 20 = 10% of 20 = 2
Disease D: (20/200) × 20 = 10% of 20 = 2
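The allocation arithmetic above can be reproduced with a short sketch. The function name `proportional_allocation` is hypothetical; the figures are those of the table.

```python
def proportional_allocation(strata_sizes, n):
    """Allocate n sample units to strata in proportion to stratum size."""
    total = sum(strata_sizes.values())
    # Shares are whole numbers here; otherwise a rounding rule has to be chosen.
    return {name: round(n * size / total) for name, size in strata_sizes.items()}

patients = {"Disease A": 100, "Disease B": 60, "Disease C": 20, "Disease D": 20}
allocation = proportional_allocation(patients, 20)
print(allocation)
```

The selected number of patients is then drawn at random within each disease group.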

Cluster Random Sampling
• Applicable when units of population are natural
  groups or clusters.
• Clusters are heterogeneous within themselves
  but homogenous with respect to each other
• Clusters are identified and included in a
  sample based on demographic parameters like
  age, sex, location, etc.
• Often used to evaluate vaccination coverage
  in EPI

• Cluster sampling is an example of 'two-stage
  sampling'.
  • First stage: a sample of areas is chosen;
  • Second stage: a sample of respondents within
    those areas is selected.
• Population divided into clusters of homogeneous
  units, usually based on geographical contiguity.
• Sampling units are groups rather than individuals.
• A sample of such clusters is then selected.
• All units from the selected clusters are studied.

Two types of cluster sampling methods
One-stage sampling. All of the elements within
selected clusters are included in the sample.

Two-stage sampling. A subset of elements
within selected clusters is randomly
selected for inclusion in the sample.
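The two variants can be contrasted in a short sketch. The population of 12 villages with 30 households each, the number of clusters chosen and the seed are all assumptions made for illustration.

```python
import random

rng = random.Random(7)

# Hypothetical population: 12 villages (clusters) of 30 households each.
clusters = {f"village_{c}": [f"v{c}_hh{h}" for h in range(1, 31)] for c in range(1, 13)}

# First stage (both variants): a random sample of clusters.
chosen = rng.sample(sorted(clusters), 4)

# One-stage: every element in the chosen clusters is included.
one_stage = [hh for c in chosen for hh in clusters[c]]

# Two-stage: a random subset of elements within each chosen cluster.
two_stage = [hh for c in chosen for hh in rng.sample(clusters[c], 5)]

print(len(one_stage), len(two_stage))
```

Only the four chosen villages need a household list, which is where the saving on frame preparation comes from.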

• Advantages:
  • Cuts down on the cost of preparing a sampling
    frame.
  • This can reduce travel and other administrative
    costs.
  • Requires list of all clusters, but only of individuals
    within chosen clusters
  • Can estimate characteristics of both cluster and
    population
  • Sampling interval is also calculated in CRS
  • Accuracy: Low error rate of only ± 5%

Disadvantages:
• Sampling error is higher than for a simple random
  sample of the same size.
• Clusters cannot be compared with each other.

Often used to evaluate vaccination coverage
in EPI

Use in India: Evaluation of immunization coverage
• WHO technique used in CRS: 30 × 7 technique (total
  = 210 children)
  • 30 clusters, each containing
  • 7 children who are 12-23 months of age and are
    completely immunized for
    primary immunization (till measles vaccine)

Difference Between Strata and Clusters
• Although strata and clusters are both non-overlapping
  subsets of the population, they differ in several ways.
• All strata are represented in the sample; but
  only a subset of clusters are in the sample.
• With stratified sampling, the best survey
  results occur when elements within strata are
  internally homogeneous.
• However, with cluster sampling, the best
  results occur when elements within clusters are
  internally heterogeneous.

Moreover, by avoiding the use of all sample
units in all selected clusters, multistage
sampling avoids the large, and perhaps
unnecessary, costs associated with traditional
cluster sampling.

MULTISTAGE SAMPLING
• Complex form of cluster sampling in which two or
  more levels of units are embedded one in the
  other.
• First stage: a random number of districts is chosen
  in all states.
• Followed by a random number of towns and villages.
• Then third-stage units will be houses.
• All ultimate units (houses, for instance) selected
  at the last step are surveyed.

• This technique is essentially the process of taking
  random samples of preceding random samples.
• Not as effective as true random sampling, but
  probably solves more of the problems inherent to
  random sampling.
• An effective strategy because it banks on multiple
  randomizations. As such, extremely useful.
• Multistage sampling is used frequently when a
  complete list of all members of the population does not
  exist or is inappropriate.

Multistage Random Sampling
• Is done in successive stages; each successive
  sampling unit is nested in the previous sampling
  unit. For example, in large country surveys,
  states are chosen, then districts, then
  villages, then every 10th person in the village as
  the final sampling unit.
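The nesting of stages can be sketched as below. The frame (5 states, 4 districts per state, 6 villages per district, 50 persons per village), the numbers chosen at each stage and the helper `units` are hypothetical.

```python
import random

rng = random.Random(11)

def units(prefix, n):
    """Hypothetical helper: label n units nested under the given prefix."""
    return [f"{prefix}{i}" for i in range(1, n + 1)]

sample = []
for state in rng.sample(units("S", 5), 2):                        # stage 1: states
    for district in rng.sample(units(f"{state}-D", 4), 2):         # stage 2: districts
        for village in rng.sample(units(f"{district}-V", 6), 3):   # stage 3: villages
            persons = units(f"{village}-P", 50)
            start = rng.randrange(10)                              # random start among the first 10
            sample.extend(persons[start::10])                      # every 10th person

print(len(sample))
```

Lists of districts, villages and persons are only needed inside the units selected at the previous stage.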

Advantage:
• Introduces flexibility in sampling. This
  method is very helpful in many large scale
  surveys where population list preparation is
  difficult.
• It is less expensive and less time consuming.
• It permits available resources to be
  concentrated on a limited number of units of the frame.

Disadvantages
• Sampling error is usually increased.
• Sampling units will be of unequal size at various
  stages, resulting in analytical difficulties.

MULTI PHASE SAMPLING
• Is done in successive phases
• Part of the information is collected from the whole
  sample & part from a subsample.
• In a study of nutrition, all the families in the original
  sample are covered for a KAP study in the 1st phase.
  A sub-sample of the families is then surveyed
  for dietary intake in the 2nd phase.
• Then a sub-sample of family members covered in the
  2nd phase is subjected to anthropometric
  examination in the 3rd phase.
• A further sub-sample from the 3rd phase is subjected to
  biochemical tests in the 4th phase.
• Thus the number of subjects or units is reduced in every
  succeeding phase, thereby reducing the magnitude of the
  complicated and costly procedures, which are reserved for
  the last phase.
• Survey by such procedure is less costly, less
  difficult & more purposeful
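The shrinking sub-samples of the nutrition study can be sketched as follows. The phase sizes (200, 80, 30, 10) and the seed are assumptions, and for simplicity every phase samples families, whereas the slides sample family members from the 3rd phase on.

```python
import random

rng = random.Random(5)
families = [f"family_{i}" for i in range(1, 201)]   # hypothetical original sample

phase1 = families                    # 1st phase: KAP study on all families
phase2 = rng.sample(phase1, 80)      # 2nd phase: dietary intake on a sub-sample
phase3 = rng.sample(phase2, 30)      # 3rd phase: anthropometric examination
phase4 = rng.sample(phase3, 10)      # 4th phase: biochemical tests

print([len(p) for p in (phase1, phase2, phase3, phase4)])
```

Each phase is drawn from the one before it, so the costliest procedure is applied to the fewest units.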

MATCHED RANDOM SAMPLING

A method of assigning participants to groups
in which pairs of participants are first
matched on some characteristic and then
individually assigned randomly to groups.

• The procedure for matched random sampling
  can be described in the following contexts:
• Two samples in which the members are clearly
  paired, or are matched explicitly by the
  researcher. For example, IQ measurements or
  pairs of identical twins.
• Those samples in which the same
  attribute, or variable, is measured
  twice on each subject, under
  different circumstances. Commonly
  called repeated measures.
• Examples include the times of a
  group of athletes for 1500m before
  and after a week of special training;
  the milk yields of cows before and
  after being fed a particular diet.
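Matching first and then randomly assigning within pairs might be sketched as follows. The participants, their ages, the group names and the rule of pairing neighbours after sorting are all assumptions for illustration.

```python
import random

rng = random.Random(2)

# Hypothetical participants with a matching characteristic (age).
ages = [34, 51, 29, 47, 33, 52, 28, 46]
participants = [("P%02d" % i, age) for i, age in enumerate(ages, start=1)]

# Step 1: match pairs on the characteristic by sorting and pairing neighbours.
ordered = sorted(participants, key=lambda p: p[1])
pairs = [ordered[i:i + 2] for i in range(0, len(ordered), 2)]

# Step 2: within each pair, assign one member to each group at random.
treatment, control = [], []
for pair in pairs:
    rng.shuffle(pair)
    treatment.append(pair[0])
    control.append(pair[1])

print(treatment)
print(control)
```

Both groups end up with the same number of participants and a similar age profile, because each pair contributes one member to each group.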

Uses of probability sampling
There are multiple uses of probability sampling.
They are:
• Reduce Sample Bias: Using the probability
  sampling method, the bias in the sample derived
  from a population is negligible to non-existent.
  The selection of the sample mainly reflects the
  understanding and the inference of the
  researcher. Probability sampling leads to higher
  quality data collection as the sample
  appropriately represents the population.

• Diverse Population: When the population is very large
  and diverse, it is essential to have adequate
  representation so that the data is not skewed towards
  one demographic. For example, if Square would like
  to understand the people that could use their
  point-of-sale devices, a survey conducted from a sample of
  people across the US from different industries and
  socio-economic backgrounds helps.

• Create an Accurate Sample: Probability sampling
  helps the researchers plan and create an accurate
  sample. This helps to obtain well-defined data.

Example: Identify each of the following examples as
categorical (qualitative) or numerical (quantitative) variables.

1. The place of residence for each student in a statistics
   class.
2. The amount of gasoline pumped by the next 10
   customers at the gas station.
3. The amount of water in the tanks of each of 25 homes in
   a certain city.
4. The color of the T-shirts worn by each of 20 children.
5. The length of time to complete a statistics homework
   assignment.

Example: Identify each of the following as examples of (1)
nominal, (2) ordinal, (3) discrete, or (4) continuous
variables:

1. The length of time until a pain reliever begins to work.
2. The number of chocolate chips in a cookie.
3. The number of colors used in a statistics textbook.
4. The brand of refrigerator in a home.
5. The overall satisfaction rating of a new car.
6. The number of files on a computer’s hard disk.
7. The pH level of the water in a swimming pool.
8. The number of doctors in a health center.

END OF PART ONE
