Research Methods: Sampling Designs
SAMPLING DESIGNS
Objectives
Define some of the key terms in sampling
Explain probability and non-probability
sampling methods and work through the major
types in each.
Justify the use of certain sampling designs
Population:
a set which includes all
measurements of interest
to the researcher
(The collection of all
responses, measurements,
or counts that are of interest)
Sample:
a subset of the population
Target population
- The group you would like to sample
from because this is the group you are
interested in generalizing to.
Theoretical population
(e.g. homeless males)
Accessible population.
Definition
Sampling is the process of selecting units
(e.g., people, organizations) from a
population of interest so that by studying
the sample we may fairly generalize our
results back to the population from which
they were chosen.
What is a sample?
Any subset of a target/accessible
population
The portion of a whole that is meant to
give the researcher an
insight into the characteristics of the
population from which the sample was
drawn.
Critical issues to consider
The sample design
Locating the subjects
Appropriate number of subjects
LOCATING THE SAMPLE
Sampling frame.
The listing of the accessible population
from which you'll draw your sample
A list of all the sampling units from which
the sample is drawn
E.g. a mailing list
Sampling Model.
Step 1: Identify the population you would like to
generalize to.
Step 2: Draw a fair sample from that population
and conduct your research with the sample.
Step 3: Because the sample is representative of the
population, you can automatically generalize your
results back to the population.
Challenges of the approach
Sometimes you don't know at the time of
your study who you might ultimately like
to generalize to.
You may not easily be able to draw a fair or
representative sample.
It is not possible to sample across all times
that you might like to generalize to.
Proximal Similarity Model
Proximal means nearby
Step 1: Identify different generalizability contexts.
Step 2: Develop a theory about which contexts are
more like our study and which are less so.
For instance, we might imagine several settings that
have people who are more similar to the people in our
study or people who are less similar. This also holds
for times and places. When we place different contexts
in terms of their relative similarities, we can call this
implicit theoretical ordering a gradient of similarity.
Step 3: Generalize.
How to generalize?
We conclude that we can generalize the
results of our study to other persons, places
or times that are more like our study (that is,
more proximally similar to it).
Notice that here we can never generalize
with certainty; it is always a question of
more or less similar.
Why do we need to sample?
Economy of expenditure: less costly
Greater speed: less field time
Greater scope: can be used where a complete
enumeration is impossible, e.g. destructive testing
Greater accuracy: can do a better job of
data collection
Practicability: when it is not possible to
study the whole population
Rationale 1: Economy of expenditure
Data can be readily seen
E.g. to establish brand preference for soap, it
costs too much money and takes too much
time to ask every person in the country, especially if the
investigation is to be repeated, as is done in
market research
Greater speed
Data can be collected and summarized
more readily especially when information
is required urgently
Greater scope
When testing involves complete destruction of a product,
a 100 per cent test of production
is out of the question.
E.g. to ascertain the pressure glass containers
can withstand, we put a sample of
containers under pressure until they break;
we therefore do not break every item
produced.
Greater accuracy
Sample results can be more accurate than
census results.
Quality data come from samples:
smaller numbers allow quality field study,
checks and tests are affordable at all stages,
and editing and analysis of results can be
carefully done
Practicability
It is more practical to interview a sample than the
whole population
You don’t really have to study all birds in
order to know how they live.
DESIGN PROCESS
1. DEFINE
POPULATION In terms of
Quota sampling
1. DETERMINE SAMPLE SIZE
The number of elements or units
of the population to be sampled
is decided.
6. SPECIFY SAMPLING PLAN
The operational procedure for selecting
the sample is described.
Respond to operational concerns:
How are households to be defined?
What if a selected household is vacant?
How do you distinguish between a family and a
household (e.g. 2 families and some distant
relative of one of them share the
same apartment)?
7. SELECT SAMPLE
Office or field work necessary
for the selection of the sample
is carried out.
SAMPLING DESIGNS
PROBABILITY SAMPLING METHODS
SIMPLE RANDOM SAMPLING
STRATIFIED RANDOM SAMPLING
QUASI-RANDOM SAMPLING
SUB-SAMPLING
CLUSTER SAMPLING
SEQUENTIAL SAMPLING
INTERPENETRATING SAMPLING
Probability sampling
Involves random selection procedures to
ensure that each unit of the sample is
chosen on the basis of chance. All units of
the study population should have an equal
or at least a known chance of being
included in the sample.
Requires that a listing of all study units (a
sampling frame) exists or can be compiled.
SIMPLE RANDOM
SAMPLING
Each possible sample of n different units
has an equal chance of being selected.
Can be unrestricted or restricted random
sampling
Restricted random sampling is done without
replacement so that no unit can appear
more than once in the sample
Does not ensure that the proportions of individuals
with certain characteristics in the sample will be the
same as those in the whole population
Make a numbered list of all units in the population
from which you wish to draw a sample
Decide on the size of the sample
Select the required number of sample units using
Lottery method
Table of random numbers
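The steps above can be sketched in Python. This is a minimal illustration, assuming a seeded pseudo-random number generator stands in for the lottery draw or a printed table of random numbers; the 50-unit frame and the function name `simple_random_sample` are hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Restricted simple random sampling: n distinct units, drawn without replacement."""
    if n > len(frame):
        raise ValueError("sample size cannot exceed the number of units in the frame")
    rng = random.Random(seed)  # seeded so the draw can be reproduced and audited
    return rng.sample(frame, n)  # every subset of size n is equally likely

# Hypothetical numbered list of all 50 units in the population (the sampling frame)
frame = [f"unit{i:02d}" for i in range(1, 51)]
sample = simple_random_sample(frame, 10, seed=1)
print(sample)
```

Seeding the generator plays the role of recording where you entered the random number table, so someone else can reproduce the same selection.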
Simple random sampling
Equal probability of selection method (EPSEM)
From a population of 4 members A, B, C and D you
need a sample of size n = 2
There are six possible samples without
replacement: AB, AC, AD, BC, BD, CD. Each of these
six samples has the same chance (1 in 6) of being the one selected.
Each member appears in three of the six possible samples,
therefore each member has 3 out of 6 chances of
being selected.
The probability of selecting a member is 3/6 or
1/2 (that is, n/N = 2/4)
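The equal-probability example can be checked by brute force. This sketch lists every possible sample of size 2 from A, B, C and D with `itertools.combinations` and counts how often each member appears; it only illustrates the arithmetic above.

```python
from fractions import Fraction
from itertools import combinations

population = ["A", "B", "C", "D"]
n = 2

# All possible samples of size n drawn without replacement
samples = list(combinations(population, n))
print(samples)  # [('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('C', 'D')]

# Inclusion probability of each member = samples containing it / all samples
inclusion = {
    member: Fraction(sum(member in s for s in samples), len(samples))
    for member in population
}
print(inclusion)  # each value equals Fraction(1, 2), i.e. n/N = 2/4
```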
Unrestricted random sampling
is done with replacement -a unit can
appear more than once in the sample.
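For contrast, unrestricted sampling returns each drawn unit to the frame before the next draw, so repeats are possible. A minimal sketch using `random.Random.choices`, which samples with replacement; the 20-unit frame and the seed are arbitrary assumptions.

```python
import random

# Hypothetical sampling frame of 20 numbered units
frame = [f"unit{i:02d}" for i in range(1, 21)]

rng = random.Random(7)
draws = rng.choices(frame, k=10)  # with replacement: every draw uses the full frame
repeats = len(draws) - len(set(draws))  # how many draws duplicated an earlier unit
print(draws)
print("repeated draws:", repeats)
```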
Some suggestions
If the population is finite, one can draw lots or
use a table of random numbers
If it is infinite, one can divide the lot into parts
and randomly select some of these units
Use whatever sample one can get if the population
cannot be easily counted/defined (e.g. fish in a
dam)
MERITS
Requires minimum advance knowledge
about the population
Is free from classification errors
Sampling error can easily be computed
Accuracy of estimates can easily be assessed
DEMERITS
Ignores researcher’s prior knowledge of population
Large samples may be required for reliability assurance
Cases selected may be too widely spaced geographically
Requires completely catalogued population: difficult
to get up-to-date records/list of population
QUASI-RANDOM SAMPLING
Uses sampling intervals/systematic sampling.
Samples are drawn by selecting every nth item from
the population
Also known as systematic sampling
Sampling interval n = population size / sample size
Randomly select a starting member/number (within the
first interval) and take every nth number from there on
E.g. n = 32000/320 = 100; if the random start is 6,
start with person/unit 6; the next is
6 + 100 = 106, then 106 + 100 = 206, etc.
Sample will comprise numbers 6, 106, 206, 306, etc.
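The procedure can be sketched as follows, assuming the frame is already a numbered list. The function name `systematic_sample` and its optional fixed `start` (used here to reproduce the 6, 106, 206 example) are illustrative; when the population size is not an exact multiple of the sample size, this sketch simply rounds the interval down.

```python
import random

def systematic_sample(frame, n, start=None, seed=None):
    """Take every k-th unit after a random start, where k = len(frame) // n."""
    k = len(frame) // n  # sampling interval (population size / sample size, rounded down)
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start within the first interval
    if not 1 <= start <= k:
        raise ValueError("start must fall within the first interval 1..k")
    # Positions start, start + k, start + 2k, ... (1-based), n positions in total
    return [frame[pos - 1] for pos in range(start, start + k * n, k)]

population = list(range(1, 32001))  # units numbered 1..32000
sample = systematic_sample(population, 320, start=6)
print(sample[:4])  # [6, 106, 206, 306]
```

Only the start is random; every later position follows from it, which is exactly the demerit listed next.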
Systematic sampling
MERITS
Is simple to follow
Distributes sample more evenly over
entire population
Less time consuming than simple random
sampling
DEMERITS
Not truly random: only the first item is randomly
selected; the rest are predetermined by a
constant interval
May result in biased samples: by chance every
nth member may fall on a critical point/day/event,
e.g. market day, Sunday, chisi, china chemadzimai,
or executive representatives if organizations are
arranged in order of superiority and you wish to
take every nth member
STRATIFIED RANDOM
SAMPLING
Used with heterogeneous populations:
populations with defined stratification
variables (low income and high income, male
and female, first class consumers, high class
consumers)
The sampling frame is divided into discrete
strata/groups in such a way that
there is greater homogeneity within groups
there is a marked difference between strata
Within each stratum:
Number each case/person
with a unique number
Select your sample using simple random or
systematic sampling
This technique is possible only when you know
what proportion of the study population
belongs to each group you are interested in
Produces a sample that includes representative groups of
study units with specific characteristics
Can be proportionate or disproportionate
Proportionate stratified random
sampling
Number of sample units in various strata are in
the same proportion as found in the population.
The larger the stratum, the more weight it
receives in the analysis
If a stratum accounts for 40% of the sampling
frame and a total sample of 100 units is selected,
then 40 sampling units are selected from that
stratum
A stratified sample of size n = 60 is to be taken
from a population of size N = 4000 which comprises
3 strata: X = 2000, Y = 1200 and Z = 800. If
allocation is to be proportional, how large a sample
should be taken from each stratum?
Sample x = n(X/N) = 2000/4000 x 60 = 30
Sample y = n(Y/N) = 1200/4000 x 60 = 18
Sample z = n(Z/N) = 800/4000 x 60 = 12
General formula: $n_i = \frac{N_i}{N} \times n$
where n_i = sample size for stratum i, N_i = stratum size,
n = sample size required, N = population size
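The general formula translates directly into code. A minimal sketch with a hypothetical function name; stratum labels are the X, Y, Z of the example. Allocations that do not come out as whole numbers would still need rounding so that the strata add back up to n, which this sketch does not attempt.

```python
def proportional_allocation(strata_sizes, n):
    """n_i = (N_i / N) * n for each stratum i."""
    N = sum(strata_sizes.values())  # population size
    return {name: n * N_i / N for name, N_i in strata_sizes.items()}

alloc = proportional_allocation({"X": 2000, "Y": 1200, "Z": 800}, 60)
print(alloc)  # {'X': 30.0, 'Y': 18.0, 'Z': 12.0}
```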
Disproportionate stratified
sampling
Strata are represented in the sample
in proportions other than those
with which they are found in the
population
In addition to its size, a stratum may receive
a weighting based on three factors:
Optimal allocation of sample
among strata
If variability in one stratum is greater than that in the others
(measured by their standard deviations) a proportionally
larger sample may be allocated to the more variable stratum
to obtain a meaningful representation of its cross section
If the cost of sampling varies from one stratum to the next
so that the sampling error can be reduced more cheaply in
one stratum than in the other, the cheaper stratum may have
a larger proportion of the sample size allocated to it.
If in the opinion of the researcher, sampling units of one
stratum are more important than those of the other and or
need greater representation in the sample, s/he can allocate
a larger proportion of the sample to that stratum
Disproportionate stratified
random sampling
A population is divided into 3 strata so that X
= 5000, $\sigma_1 = 15$; Y = 2000, $\sigma_2 = 18$ and Z = 3000,
$\sigma_3 = 5$. How should a sample of size n = 84 be
allocated to the three strata if we want optimum
allocation using disproportionate sampling?
Each stratum sample size is determined according
to the proportion which the stratum size
multiplied by its standard deviation ($\sigma_i$) bears to
the sum of the sizes of all strata multiplied by
their respective standard deviations:
$n_i = n \times \frac{N_i \sigma_i}{\sum_j N_j \sigma_j}$
From X = 5000, $\sigma_1 = 15$; Y = 2000, $\sigma_2 = 18$
and Z = 3000, $\sigma_3 = 5$, we need n = 84
Proportion of X = (5000 x 15)/[(5000 x 15)
+ (2000 x 18) + (3000 x 5)] = 75000/126000 = 25/42
Proportion of Y = (2000 x 18)/[(5000 x 15)
+ (2000 x 18) + (3000 x 5)] = 36000/126000 = 12/42
Proportion of Z = (3000 x 5)/[(5000 x 15)
+ (2000 x 18) + (3000 x 5)] = 15000/126000 = 5/42
Sample size for X = 84 x 25/42 = 50
Sample size for Y = 84 x 12/42 = 24
Sample size for Z = 84 x 5/42 = 10
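The same calculation in code: each stratum's share is proportional to its size times its standard deviation (the rule usually called Neyman allocation, which assumes sampling costs are equal across strata). The function name and the (size, standard deviation) input layout are assumptions made for illustration.

```python
def optimum_allocation(strata, n):
    """strata maps name -> (N_i, sigma_i); returns n_i = n * N_i*sigma_i / sum(N_j*sigma_j)."""
    weights = {name: N_i * sigma_i for name, (N_i, sigma_i) in strata.items()}
    total = sum(weights.values())  # 75000 + 36000 + 15000 = 126000 in the example
    return {name: n * w / total for name, w in weights.items()}

alloc = optimum_allocation({"X": (5000, 15), "Y": (2000, 18), "Z": (3000, 5)}, 84)
print(alloc)  # {'X': 50.0, 'Y': 24.0, 'Z': 10.0}
```

Stratum Z is a third of the population but gets only 10 of the 84 units, because its small standard deviation means fewer observations are needed to estimate it well.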
MERITS
Ensures representation of all groups
A random sample can be kept small in size
without losing its accuracy
Characteristics of each stratum can be
estimated, therefore comparisons can be made
DEMERITS
Accurate information on population proportions
is required for proportionate stratification
It is costly to prepare stratified lists of all
members
Faulty classifications are inevitable, and
variability therefore increases
SUB-SAMPLING
Sampling in stages
Population is divided into a number of
sampling units each is then divided into
smaller units.
A random sample is taken at each stage
This is why it is referred to as a multi-stage
sampling method
Example
To study 5000 households in Zimbabwe
First divide the country into 10 provinces and
randomly sample, e.g., 5 provinces
Divide the five provinces into districts and
randomly select 10 districts
Divide each district into villages and
randomly select, e.g., 2 villages per district, etc.
Then select 5000 households from the
selected villages at random
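The Zimbabwe example can be sketched as a chain of random draws, one per stage. Everything in the frame below is hypothetical (10 provinces, 8 districts per province, 6 villages per district, 300 households per village); only the stage sizes 5, 10, 2 and 5000 come from the example. In a real survey, lists are needed only for the units reached at each stage; this toy builds the whole frame up front for simplicity.

```python
import random

rng = random.Random(42)

# Hypothetical frame: province -> district -> village -> list of households
frame = {
    f"P{p}": {
        f"P{p}-D{d}": {
            f"P{p}-D{d}-V{v}": [f"P{p}-D{d}-V{v}-H{h}" for h in range(300)]
            for v in range(6)
        }
        for d in range(8)
    }
    for p in range(10)
}

# Stage 1: 5 of the 10 provinces
provinces = rng.sample(list(frame), 5)
# Stage 2: 10 districts from the selected provinces
districts = rng.sample([(p, d) for p in provinces for d in frame[p]], 10)
# Stage 3: 2 villages in each selected district
villages = [(p, d, v) for p, d in districts for v in rng.sample(list(frame[p][d]), 2)]
# Stage 4: 5000 households at random from the selected villages (20 x 300 = 6000 available)
households = rng.sample([h for p, d, v in villages for h in frame[p][d][v]], 5000)
print(len(provinces), len(districts), len(villages), len(households))  # 5 10 20 5000
```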
MERITS
Sampling lists, identification and
numbering are required only for
units selected at each stage
Cuts down on costs if sampling units are
geographically defined
DEMERITS
Errors increase as the number of sampling
units selected decreases
CLUSTER SAMPLING
Is diametrically opposed to stratified sampling
Cluster: a group of sampling units close to each other
i.e. crowding together in the same area or
neighborhood
First select at random natural groups of units/clusters
from the population
Choose some of the units within each cluster to make up
a sample
Units within each cluster should be as heterogeneous
as possible
There should be as small a difference as possible
between clusters
[Figure: Cluster sampling, with the population divided into Sections 1 to 5]
Example
To study grocery store sales in a town:
select at random a few localities
make a random selection of grocery stores
in the selected localities
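The grocery-store example can be sketched as two random draws. The town of 15 localities with 12 stores each, the choice of 4 localities and 5 stores per locality, and the seed are all hypothetical; the point is that store lists are consulted only for the localities actually chosen.

```python
import random

# Hypothetical town: each locality (cluster) maps to its grocery stores
town = {f"Locality{i}": [f"L{i}-Store{j}" for j in range(1, 13)] for i in range(1, 16)}

rng = random.Random(3)
chosen_localities = rng.sample(list(town), 4)  # random selection of clusters
# Random selection of stores within each chosen locality only
stores = {loc: rng.sample(town[loc], 5) for loc in chosen_localities}
for loc, picked in stores.items():
    print(loc, picked)
```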
MERITS
Less costly and time consuming: fieldwork is
localized
Only requires lists of members of selected
clusters
Same sample of clusters can be used again and
again for drawing random samples of elements
in subsequent searches
DEMERITS
Sometimes yields inaccurate estimates
about the population
A cluster is sometimes not representative of the
parent population: birds of the same
feathers cluster together
SEQUENTIAL SAMPLING
Sampling in installments
Researcher takes a small sample at a time
Popular in studies on destructible
items, e.g. explosives, manufactured goods: goods
that are lost once they are tested
Also used in problems involving alternatives
(accept or reject entire lot or continue further
analysis)
Each installment is selected at random.
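A sequential plan in the accept/reject/continue sense can be sketched as follows. This is an illustration under stated assumptions, not a standard plan: installments of 20 items are drawn at random from the lot, and the cumulative defect thresholds (accept at 0, 1, 3; reject at 3, 4, 4) are made up. The final stage's thresholds are set one apart so that a decision is always reached.

```python
import random

def sequential_inspection(lot, batch_size, accept_at, reject_at, rng):
    """Inspect random installments until the cumulative defect count
    reaches an accept or reject threshold (thresholds are hypothetical).

    lot: list of 0 (good) / 1 (defective) items.
    accept_at[i], reject_at[i]: cumulative limits after installment i + 1.
    Returns (decision, installments used, items inspected).
    """
    order = list(lot)
    rng.shuffle(order)  # consecutive slices of a shuffled lot = random draws without replacement
    defects = 0
    for stage, (acc, rej) in enumerate(zip(accept_at, reject_at), start=1):
        batch = order[(stage - 1) * batch_size: stage * batch_size]
        defects += sum(batch)
        if defects <= acc:
            return "accept", stage, stage * batch_size
        if defects >= rej:
            return "reject", stage, stage * batch_size
    return "undecided", len(accept_at), len(accept_at) * batch_size

rng = random.Random(11)
lot = [1] * 10 + [0] * 490  # hypothetical lot of 500 items, 10 defective
print(sequential_inspection(lot, 20, accept_at=[0, 1, 3], reject_at=[3, 4, 4], rng=rng))
```

A clearly good or clearly bad lot is settled after the first installment, which is why the method usually needs only small samples.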
MERITS
Usually permits use of small samples
DEMERITS
Sample units have to be observable units.
[Figure: Selecting respondents using a table of random numbers]
INTER-PENETRATING
SAMPLING
Means to draw two or more independent
replicas of one sample by some random
method.
Divide population into as many zones as
there are sampling units to be included in
each sample
MERITS
Findings of various sub-samples can be
compared with each other before arriving at
final conclusions
Allows assurance against unforeseen mishaps,
e.g. if you have to discard data from one or two
replicas
Allows you to study samples in a short time, if
you can assign different replicas to different
researchers
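The zone-based procedure can be sketched as follows. Assumptions: the population is an ordered list cut into equal zones (leftover units at the end are ignored), each replica takes one unit at random from every zone, and replicas are drawn independently, so a unit may appear in more than one replica. The measurements are made-up values used only to show replica means being compared.

```python
import random
from statistics import mean

def interpenetrating_samples(population, units_per_sample, replicas, rng):
    """Build independent replica samples, each with one random unit per zone."""
    zone_size = len(population) // units_per_sample  # one zone per sampling unit
    zones = [population[i * zone_size:(i + 1) * zone_size] for i in range(units_per_sample)]
    return [[rng.choice(zone) for zone in zones] for _ in range(replicas)]

rng = random.Random(5)
population = list(range(1000))  # hypothetical measurements 0..999
reps = interpenetrating_samples(population, units_per_sample=10, replicas=4, rng=rng)
for i, rep in enumerate(reps, start=1):
    print(f"replica {i}: mean = {mean(rep):.1f}")  # compare findings across replicas
```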
SAMPLING DESIGNS Cont.
NON PROBABILITY SAMPLING
METHODS
PURPOSIVE SAMPLING
QUOTA SAMPLING
CONVENIENCE SAMPLING
CHAIN REFERRAL SAMPLING
SELF SELECTION SAMPLING
USE OF INTERNET
PURPOSIVE SAMPLING
Also known as judgment sampling: a sample is chosen which is
thought to be typical of the population with
regard to the characteristics under study.
Used when working with small samples
Used in
case study research: you select cases that are
particularly informative
grounded theory designs: cases are taken according to
research questions and objectives
Sample sizes, which may or may not be fixed
prior to data collection, depend on
the resources and time available, as well as the
study’s objectives.
Purposive sample sizes are often determined on
the basis of theoretical saturation (the point in
data collection when new data no longer bring
additional insights to the research questions).
Purposive sampling is therefore most successful
when data review and analysis are done in
conjunction with data collection.
QUOTA SAMPLING
Selection of people from quotas /persons who
meet certain conditions
Used for large populations: may have high
sample representation depending on the quota
variables chosen
Can be set up quickly
Used when cost constraints are tight and data is
required quickly
Gives the researcher more control over sample
contents/can determine sample characteristics
Quota sampling, sometimes considered a type of purposive
sampling, is also common.
In quota sampling, we decide while designing
the study how many people with which
characteristics to include as participants.
Characteristics might include age, place of
residence, gender, class, profession, marital status,
use of a particular contraceptive method, HIV
status, etc.
The criteria we choose allow us to focus on
people we think would be most likely to
experience, know about, or have insights
into the research topic.
Then we go into the community and – using
recruitment strategies appropriate to the
location, culture, and study population – find
people who fit these criteria, until we meet
the prescribed quotas
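The recruitment step can be sketched as a loop that accepts people in the order they are met until each quota is full. The candidate list, the single gender criterion and the quota numbers are hypothetical. In practice the order of encounter depends on the interviewer, which is why the result is not a probability sample.

```python
from collections import Counter

def fill_quotas(candidates, quotas, key):
    """Accept candidates in encounter order while their group's quota is still open."""
    filled = Counter()
    sample = []
    for person in candidates:
        group = key(person)
        if filled[group] < quotas.get(group, 0):  # groups without a quota are skipped
            sample.append(person)
            filled[group] += 1
            if all(filled[g] >= q for g, q in quotas.items()):
                break  # every quota met: stop recruiting
    return sample, filled

# Hypothetical people met in the field: (id, gender)
met = [("p1", "male"), ("p2", "male"), ("p3", "male"), ("p4", "male"),
       ("p5", "female"), ("p6", "female"), ("p7", "female"), ("p8", "male")]
sample, filled = fill_quotas(met, {"male": 3, "female": 2}, key=lambda p: p[1])
print([p[0] for p in sample])  # ['p1', 'p2', 'p3', 'p5', 'p6']
```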
This method ensures that a certain
number of sample units from different
categories with specific characteristics
appear in the sample, so that all these
characteristics are represented
But it cannot claim to be representative of
the entire population
Calculations of quotas are based on
available data, so quotas are subject to bias
Choice of quota depends on
usefulness as a means of stratifying the data
ability to overcome likely variations between
groups in their availability for interviews
Quotas used in market research include
age, gender, socio-economic status and
social class
Researchers choose whom to interview
within these quota boundaries
Example
A researcher studying alcohol choice suspects that religion
might have a strong effect on people’s
attitudes towards certain brands of alcohol.
S/he is afraid of missing out the Wachitawa,
who are a minority in the area. S/he therefore
decides to include in the sample x people
from each category, including the Wachitawa,
and extends the data collection period (to
include drinking in secluded and/or discreet
periods or places) to obtain the desired sample
How do purposive and quota
sampling differ?
Purposive and quota sampling are similar
in that they both seek to identify
participants based on selected criteria.
However, quota sampling is more specific
with respect to sizes and proportions of
subsamples, with subgroups chosen to
reflect corresponding proportions in the
population.
If, for example, gender is a variable of interest
in how people experience HIV infection, a
quota sample would seek an equal balance of
HIV-positive men and HIV-positive women in
a given city, assuming a 1:1 gender ratio in the
population.
Studies employ purposive rather than quota
sampling when the number of participants is
more of a target than a steadfast requirement –
that is, an approximate rather than a strict
quota.
DIMENSIONAL SAMPLING
An extension of quota sampling
Identify various factors of interest in a
population
Obtain at least one respondent for every
combination of these factors
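The "every combination" requirement can be made concrete with `itertools.product`. The three factors and their levels are hypothetical; the helper `missing_cells` (also hypothetical) reports which combinations still lack a respondent.

```python
from itertools import product

# Hypothetical factors of interest and their levels
factors = {
    "gender": ["female", "male"],
    "residence": ["urban", "rural"],
    "age_band": ["18-34", "35-54", "55+"],
}
cells = list(product(*factors.values()))  # every combination of factor levels
print(len(cells))  # 12 -> at least 12 respondents are needed

def missing_cells(respondents):
    """Combinations not yet covered by at least one respondent."""
    covered = {tuple(r[f] for f in factors) for r in respondents}
    return [c for c in cells if c not in covered]

recruited = [{"gender": "female", "residence": "urban", "age_band": "18-34"}]
print(len(missing_cells(recruited)))  # 11
```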
CONVENIENCE SAMPLING
Selection of people who appear to be convenient:
easy to access/obtain according to the researcher
The sample selection process continues until the required
sample size is attained
Used where there is little variation in the population:
beer drinkers, TB patients, HIV positive persons,
company directors
Has lots of biases, and generalizations are flawed
Acceptable if samples are just for pilots to studies using
more structured samples
A method in which, for convenience's sake,
study units that happen to be available at the time and
date of data collection are selected in the
sample
The sample may be unrepresentative of the
population wanted; some units may be over-selected
or under-selected
Some will be missed altogether.
E.g. studying buying patterns only during working hours
CHAIN REFERRAL SAMPLING
Used when it is difficult to identify members of the desired
population; to find and recruit hidden populations
You need to
Make contact with one or two cases in the population
Ask the cases to identify further cases
Ask these new cases to identify further new cases etc
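The referral steps above can be sketched as a breadth-first walk over who refers whom. The referral network is entirely hypothetical. The sketch also shows the representation problem of the method: anyone not linked to the starting cases (here "x" and "y") is never reached.

```python
from collections import deque

def snowball_sample(seeds, referrals, target):
    """Chain-referral sampling: start from seed cases and follow their referrals."""
    sample = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < target:
        case = queue.popleft()
        sample.append(case)  # interview this case
        for ref in referrals.get(case, []):  # cases this person identifies
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return sample

# Hypothetical referral network: case -> cases they can identify
referrals = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"], "e": ["f"], "x": ["y"]}
print(snowball_sample(["a"], referrals, target=10))  # ['a', 'b', 'c', 'd', 'e', 'f']
```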
E.g. identifying professional sex workers, drug dealers, black market
forex dealers, orphans, disabled children, people working but
claiming unemployment benefits, people on indefinite sick leave
when they are doing something else, men who continue to live
with their parents into or beyond midlife
Problems of representation are huge: respondents identify those
similar to themselves, but samples have the desired characteristics
Is also known as snowball sampling or friend-of-a-friend sampling
SELF SELECTION SAMPLING
Occurs when you allow a case, e.g. an individual, to identify their
desire to take part in the research
You
Publicize your need for cases, either by advert through
appropriate media or by asking them to take part
Collect data from those who respond
Cases do so because of their own feelings and opinions about the
research question/s or stated objectives
E.g. in research on positive management of redundancy, the researcher
wrote a letter to the personnel trade press and generated a list of
self-selected organizations that were willing to devote time to
being interviewed
Drug tests, health related studies often ask for volunteers
Use of the Internet
Design or hire a company to design website for
your study (www.psychdata.com), or via e-
forum discussions, e-mails, personal contact
Usually each subject is asked to obtain an online
identification code and password and to
type in the appropriate survey
Present all your data collection instruments on
the website clearly and systematically so that
eligible subjects complete them at their
convenience
Download each participant’s response to the
database
What are the advantages and disadvantages
of selecting participants based on their
willingness to access an Internet Web and
complete surveys electronically?
Advantages of Internet use
Get a large number of responses more
efficiently and get data that is relatively
convenient to analyse
Possibility of getting geographically
heterogeneous samples not usually available
when using traditional strategies
Many subjects appreciate and trust the
anonymity they have by addressing a machine
rather than a physical being
Disadvantages of Internet use
We do not know enough about the
comparability of the samples
There may be bias in terms of computer
access and computer savvy
Subjects may fudge their responses using
the more impersonal format
Advantages of probability methods
Are based on the principle of randomization and are therefore unbiased:
you cannot influence the sample composition once the population
is operationally defined. Non-probability samples are influenced
by personal judgment
Enable calculation of sampling error and the degree of confidence
that can be placed in population estimates made on their basis
In random sampling each subject has a known probability of
being selected
Random sampling allows application of statistical sampling theory
to results, to generalise findings and test hypotheses
Probability samples are the best at ensuring precision and
representativeness
Why use non-probability
methods
Indefinite and small populations
Unavailability of sampling frames
Small budgets
Lack of time
Inexperienced researchers
Pressure for results
Disadvantages of non-probability sampling
Probability of being chosen is unknown
Cheaper, but unable to generalise
Potential for bias
SAMPLING ERROR
When we calculate a statistic from a sample and use
it as an estimate of the population parameter, we
subject ourselves to sampling error
It is the difference between the statistic and the
parameter
Arises from the inductive process of inferring: the fact
that only part of the population has been used to
estimate the parameter
The error is random and can be compensatory: it is
equally likely to occur in either direction
Can be reduced by increasing sample sizes
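A small simulation can illustrate these points. Assumptions: a made-up population of 10 000 normally distributed values, the mean as the statistic, and 200 repeated samples per size. The average absolute gap between sample mean and population mean shrinks as n grows, while the signed errors fall on both sides of zero.

```python
import random
from statistics import mean

rng = random.Random(2024)
population = [rng.gauss(50, 10) for _ in range(10_000)]  # hypothetical population
mu = mean(population)  # the population parameter

def sampling_errors(n, reps=200):
    """Signed error (sample mean - population mean) for reps random samples of size n."""
    return [mean(rng.sample(population, n)) - mu for _ in range(reps)]

for n in (10, 100, 1000):
    errors = sampling_errors(n)
    above = sum(e > 0 for e in errors)
    print(f"n={n}: mean |error| = {mean(abs(e) for e in errors):.2f}, "
          f"{above} of {len(errors)} errors above zero")
```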
NON SAMPLING ERROR
Systematic and cumulative errors that arise due to
factors such as
Defective sampling frame and faulty sampling
Inappropriate methods of data collection, coding,
tabulation
Untrained researchers
Inadequate inspection and supervision of researchers
Incomplete coverage
Can occur even in a census
Are not improved by increasing sample sizes but by
improving data collection and compiling techniques
and training researchers
Sampling bias
It is the distortion that arises when the way the
sample is selected makes it no longer representative of the
population.
Major source of bias is the use of non
probability sampling methods- because it is
not possible to specify the probability or
chance that each member of the population
has of being selected for the sample
Determining sample sizes
Determining sample size for non statistical
considerations depends on
Resources available
Nature of study
Method of sampling involved
Nature of respondents
Field conditions
Representativeness
The greater the variation within the
populations the larger the sample
The greater the desired precision of an
estimate the larger the sample size
The higher the confidence level in the
estimate, the larger the sample size
The greater the number of subgroups
within the sample, the larger the sample
The methods used to select the sample
must not lead to a biased sample ie the
sample must be truly representative of the
large universe
The characteristics of the sample must be
consistent with those of the population of
interest (if the population of HIV positive
people then the sample must consist of HIV
positive people)
The numerical estimates provided by the
sample must accurately represent the true
values in the population
Use of power analysis
Suppose in the design industry you wish to
compare the level of creativity of artisans who
received apprenticeship training and those who
went through the formal polytechnic; you need
to collect sufficient data for a difference to
appear in the results
The smaller the difference the more data you
must collect
To get the appropriate number of subjects
conduct a power analysis as it advises on how
many subjects are necessary to detect any
effects that result from independent variables
You need to have/suppose you are given
The size of the effect of the variables in the
population
The type of statistical test to be used
The level of significance /alpha level of the study
The level of power: expressed as a probability, it lets
you know how likely you are to avoid a type II error
(failing to reject a null hypothesis even
though it is false, i.e. when an effect does exist but
was not detected by the study)
As the probability of a type II error increases,
the power of a study decreases
Interpretation
Power =1-the probability of a type II error
If the probability of a type II error is .15,
power is 1-.15 = .85
An underpowered study is likely to obtain non-significant findings
Usually power calculations are done using
computer programmes, e.g. the SPSS module
Sample Power (www.spss.com)
e.g. aQuery Advisor 6.0 found at
http://www/statsolua.com ;
So if you need a power of .80 at a level of
significance of .05 for a t test (two independent groups),
you will have to use 64 participants per group
(128 in total) for medium effects
Effect size   N per group   Total N
Small         393           786
Medium        64            128
Large         26            52
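The figures in the table come from the exact t-test calculation used by power software. A rough, standard-library-only cross-check uses the normal-approximation formula n = 2(z_{1-α/2} + z_{1-β})² / d² per group. The effect sizes d = 0.2, 0.5 and 0.8 are Cohen's conventional small, medium and large values, which is an assumption since the slides do not state them. Because it ignores the t distribution, the approximation comes out slightly low for medium and large effects (63 and 25 instead of 64 and 26).

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided,
    two-sample comparison of means with standardized effect size d."""
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    z_beta = z.inv_cdf(power)           # about 0.84 for power = .80
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

for label, d in (("Small", 0.2), ("Medium", 0.5), ("Large", 0.8)):
    n = n_per_group(d)
    print(label, n, 2 * n)
# Small 393 786 / Medium 63 126 / Large 25 50
```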
Calculations assume that data will be
collected from all cases in the sample and
are based on
How confident you need to be that the
estimate is accurate (the level of
confidence in the estimate)
How accurate the estimate needs to be (the
margin of error that can be tolerated)
The proportion of responses you expect to
have some particular attribute
Formula
$n = p\% \times q\% \times \left(\frac{z}{e\%}\right)^2$
Where
n is minimum sample size required
p% is the proportion belonging to the specified
category
q% is the proportion not belonging to the specified
category
z = z value corresponding to the confidence required
e% is the margin of error required
z values for different levels of
confidence
90% certain z= 1.65
95% certain z = 1.96
99% certain z = 2.57
Adjusting minimum sample size
Used for populations of fewer than 10 000
Smaller sample sizes can be used without
affecting accuracy
This is known as adjusting minimum sample size
Formula
n’ = n/[1+(n/N)]
where
n’ is the adjusted minimum sample size
n is the minimum sample size
N is the total population
example
To answer a research question you need to
estimate the proportion of a total population of
4000 home care clients who receive a visit from
their home care assistant at least once a week.
You have been told that you need to be 95%
certain that the ‘estimate’ is accurate (the level of
confidence in the estimate). This corresponds to
a z score of 1.96. You have also been told that
your ‘estimate’ needs to be accurate to within
plus or minus 5% of the true percentage (the
margin of error that can be tolerated).
You still need to estimate the proportion of
respondents who receive a visit from their
home care assistant at least once a week. From
your pilot survey you discover that 12 out
of the 30 clients receive at least one visit
a week; in other words, 40% belong to the
specified category. This means 60% do not.
So now substitute the figures into the
equation for the sample size
calculation
n = 40 x 60 x (1.96/5)²
= 2400 x (0.392)²
= 2400 x 0.154 (rounded)
= 369.6
So minimum sample size = 370
Since the total population of home care
clients is 4000, the adjusted minimum
sample size can now be determined as
Calculating adjusted sample size
n’ = 369.6/[1 + (369.6/4000)]
= 369.6/(1 + 0.092)
= 369.6/1.092
= 338.46
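Both formulas can be checked in a few lines. This is a minimal sketch with hypothetical function names. It keeps full precision instead of rounding (0.392)² to 0.154, so it gives 368.79 (a minimum of 369) rather than 369.6, and an adjusted size of about 337.66 (338 when rounded up).

```python
from math import ceil

def min_sample_size(p_pct, e_pct, z):
    """n = p% x q% x (z / e%)^2, with q% = 100 - p%."""
    q_pct = 100 - p_pct
    return p_pct * q_pct * (z / e_pct) ** 2

def adjusted_sample_size(n, N):
    """n' = n / (1 + n/N), for populations of fewer than 10 000."""
    return n / (1 + n / N)

n = min_sample_size(p_pct=40, e_pct=5, z=1.96)  # 95% confidence, +/- 5%
n_adj = adjusted_sample_size(n, N=4000)
print(round(n, 2), ceil(n))          # 368.79 369
print(round(n_adj, 2), ceil(n_adj))  # 337.66 338
```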