Sampling and Sampling Methods
SAMPLING
• If the data you collect really are the same as you
would get from the rest, then you can draw
conclusions from those answers which you can relate
to the whole group.
The need to sample
Sampling is a valid alternative to a census when:
Sampling Frame
• Within this population, there will probably be only certain groups that are of interest to your study; this selected category is your sampling frame.
Populations can have the following characteristics:
• Homogeneous: all cases are similar, e.g., bottles of beer on a production line.
• Stratified: the population contains strata or layers, e.g., people with different levels of income: low, medium, high.
SAMPLING BREAKDOWN
[Figure: target population, study population and sample]
SAMPLING METHODS
Process
The sampling process comprises several stages:
1. Defining the population of concern
2. Specifying a sampling frame, a set of items
or events possible to measure
3. Specifying a sampling method for selecting
items or events from the frame
4. Determining the sample size
5. Implementing the sampling plan
6. Sampling and data collecting
7. Reviewing the sampling process
Sampling techniques
Probability sampling techniques give the most reliable
representation of the whole population.
Cluster sampling
Systematic sampling
Simple random sampling
• As the name suggests, this is a completely random method of selecting the sample. It is as easy as assigning numbers to the individuals in the population and then randomly choosing from those numbers through an automated process.
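As a minimal Python sketch of the numbering-and-drawing idea, `random.sample` can stand in for the automated process. The function name `simple_random_sample`, the population size of 450 and the seed are illustrative assumptions.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Number the units 1..N and draw n distinct numbers at random (SRSWOR)."""
    rng = random.Random(seed)                      # seeded generator for reproducibility
    unit_numbers = range(1, population_size + 1)   # the numbers assigned to the units
    return sorted(rng.sample(unit_numbers, sample_size))

# Example: 5 units chosen from a numbered list of 450 students
print(simple_random_sample(450, 5, seed=42))
```

Every set of 5 unit numbers is equally likely to be returned, which is what makes the draw a simple random sample.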
Simple random sampling
• Applicable when population is small, homogeneous &
readily available
• Estimates are easy to calculate.
• Simple random sampling is always an EPS (equal probability of selection) design, but not all EPS designs are simple random sampling.
Disadvantages
• If the sampling frame is large, this method is impracticable.
• Minority subgroups of interest in the population may not be present in the sample in sufficient numbers for study.
Selection of a Simple Random Sample
(i) Lottery Method.
(ii) Use of Table of Random Numbers.
Replacement of selected units
Sampling schemes may be
• without replacement ('WOR' - no element can be selected
more than once in the same sample)
• with replacement ('WR' - an element may appear multiple times in the same sample).
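The difference between the two schemes can be seen with Python's standard `random` module, where `random.sample` draws without replacement and `random.choices` draws with replacement. This is a sketch with a hypothetical population of 10 numbered units; the seed and sample sizes are arbitrary.

```python
import random

rng = random.Random(7)
units = list(range(1, 11))          # hypothetical population of 10 numbered units

# WOR: random.sample never returns the same unit twice
wor = rng.sample(units, 5)

# WR: random.choices draws each unit independently, so repeats are possible
wr = rng.choices(units, k=5)

# With replacement, a sample can even be larger than the population
wr_large = rng.choices(units, k=15)

print("WOR:", wor)
print("WR :", wr)
print("distinct units in WR sample of 15:", len(set(wr_large)))
```

Asking `rng.sample(units, 15)` would raise an error instead, because a WOR sample cannot exceed the population size.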
Random numbers grouped in threes: 295, 266, 413, 992, 979, 279,
795, 911, 317, 056, 244, 167, 952, 415, 451, 396, 720, 353, 561,
300, 269, 323, 707, 483, 340 …
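Example
Draw a random sample (without replacement) of size 5 from a population of 24 units.
Selected units: 11, 24, 15, 13, 03
(The same procedure applies to a class of 450 students, using three-digit random numbers and discarding any number greater than 450.)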
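A sketch of the table-of-random-numbers method applied to the three-digit groups listed above, assuming the class of 450 students is numbered 001 to 450: read the groups in order, skip any number outside 1 to 450, and skip repeats because sampling is without replacement. The helper name `select_from_table` is hypothetical.

```python
def select_from_table(random_numbers, population_size, sample_size):
    """Read random numbers in order, keeping valid, unseen unit numbers (SRSWOR)."""
    chosen = []
    for number in random_numbers:
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
            if len(chosen) == sample_size:
                return chosen
    raise ValueError("table extract too short for the requested sample size")

# Three-digit groups quoted above (the extract continues with "...")
table = [295, 266, 413, 992, 979, 279, 795, 911, 317, 56, 244, 167,
         952, 415, 451, 396, 720, 353, 561, 300, 269, 323, 707, 483, 340]

print(select_from_table(table, 450, 5))   # [295, 266, 413, 279, 317]
```

Calling the same helper with a population of only 24 units raises the `ValueError`, since no group in this extract lies between 1 and 24.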
• It may happen that the extract from the table of random numbers contains too few usable numbers to draw a random sample of the desired size.
• This difficulty can be overcome by assigning more than one number to each sampling unit.
• For the previous example (N = 24), the first unit may be assigned the numbers 1, 1 + 24, 1 + 2 × 24, 1 + 3 × 24, and so on, i.e., 1, 25, 49, 73, 97, 121, …
Similarly, the second unit may be assigned the numbers 2, 26, 50, 74, 98, 122, …
Finally, the last unit may be assigned the numbers 0, 24, 48, 72, 96, 120, …
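The assignment above amounts to taking the remainder after division by N = 24, with remainder 0 mapped to unit 24. The sketch below applies it to the three-digit groups quoted earlier; `unit_for_number` is a hypothetical helper. One rule is an added assumption not stated on the slide: numbers at or above the largest multiple of 24 in the 000 to 999 range (984 and up) are discarded, so that every unit keeps exactly 41 assigned numbers and equal probability of selection.

```python
def unit_for_number(number, population_size, digits=3):
    """Map a random number to a unit: 1, 25, 49, ... -> unit 1; 0, 24, 48, ... -> unit N.

    Returns None for numbers that are discarded so that every unit keeps the
    same count of assigned numbers (an assumption added to the slide's rule).
    """
    limit = (10 ** digits // population_size) * population_size  # 984 for N = 24
    if number >= limit:
        return None
    remainder = number % population_size
    return remainder if remainder != 0 else population_size

N = 24
table = [295, 266, 413, 992, 979, 279, 795, 911, 317, 56, 244, 167]

sample = []
for number in table:
    unit = unit_for_number(number, N)
    if unit is None or unit in sample:   # discarded number or repeated unit (WOR)
        continue
    sample.append(unit)
    if len(sample) == 5:
        break

print(sample)   # [7, 2, 5, 19, 15]
```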
Probability of Selection of a Unit
Let the size of the population be N, and let one of the N sampling units be chosen at a draw.
SRSWOR
The probability of drawing a given sampling unit at any specified draw = $1/N$
SRSWR
The probability of drawing a given sampling unit at each draw = $1/N$
Probability of Selection of a Sample
SRSWOR
• Total number of combinations to choose n sampling units out of N = $\binom{N}{n}$
• The probability of drawing a particular sample = $1/\binom{N}{n}$
SRSWR
• Total number of ordered samples of n sampling units drawn from N with replacement = $N^n$
• The probability of drawing a particular sample = $1/N^n$
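The two counts can be checked by brute-force enumeration. A sketch using Python's `itertools` and `math.comb`, with N = 5 and n = 2 chosen arbitrarily for illustration:

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb

N, n = 5, 2
units = range(1, N + 1)

# SRSWOR: every unordered set of n distinct units is one possible sample
wor_samples = list(combinations(units, n))
p_wor = Fraction(1, comb(N, n))

# SRSWR: every ordered sequence of n draws (repeats allowed) is one possible sample
wr_samples = list(product(units, repeat=n))
p_wr = Fraction(1, N ** n)

print(len(wor_samples), p_wor)   # 10 1/10
print(len(wr_samples), p_wr)     # 25 1/25
```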
Drawing of sample using R
```r
> heightdata
   name height
1  A 151
2  B 152
3  C 153
4  D 154
5  E 155
6  F 156
7  G 157
8  H 158
9  I 159
10 J 160
> names=heightdata$name
> names
[1] A B C D E F G H I J
Levels: A B C D E F G H I J
> heights=heightdata$height
> heights
[1] 151 152 153 154 155 156 157 158 159 160
```
Stratified Random sampling
Suppose 100 (n) students of a school with 1000 (N) students are asked about their favorite subject.
Students of the 8th grade can be expected to have different subject preferences from students of the 9th grade.
For the survey to deliver precise results, the population is therefore divided into strata, one per grade, and a random sample is drawn from each stratum.
Grade Number of students (Nh)
5 150
6 250
7 300
8 200
9 100
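Stratified Random sampling
Under proportional allocation, the sample size for each stratum is
$n_h = \frac{N_h}{N} \times n$
where $n_h$ = sample size for the h-th stratum, $N_h$ = number of units in the h-th stratum, N = population size and n = total sample size.
Example with four strata (N = 3103, n = 100):
• 0-20: (693/3103) × 100 ≈ 22
• 21-40: (1203/3103) × 100 ≈ 39
• 41-60: (802/3103) × 100 ≈ 26
• 61-80: (405/3103) × 100 ≈ 13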
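The allocation formula can be applied to both examples with a few lines of Python. This is a sketch; `proportional_allocation` is a hypothetical name, and rounding each $n_h$ to the nearest whole unit is an assumption.

```python
def proportional_allocation(stratum_sizes, sample_size):
    """n_h = (N_h / N) * n for each stratum, rounded to the nearest whole unit."""
    total = sum(stratum_sizes)                       # N
    return [round(size * sample_size / total) for size in stratum_sizes]

# Grade example: N_h for grades 5-9, total N = 1000, n = 100
print(proportional_allocation([150, 250, 300, 200, 100], 100))  # [15, 25, 30, 20, 10]

# Four-stratum example: N = 3103, n = 100
print(proportional_allocation([693, 1203, 802, 405], 100))      # [22, 39, 26, 13]
```

Both allocations happen to add up to n = 100. In general, rounding each stratum independently can leave the total one or two units away from n, and the largest strata are then usually adjusted by hand.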
Cluster sampling
• Cluster sampling is a way to randomly select participants when they are geographically spread out. The population is divided into smaller sections (clusters), each containing several elements, for example cities, families or universities, and a random selection of these clusters is then made.
Cluster sampling
• Cluster sampling is often carried out as 'two-stage sampling':
• First stage: a sample of areas is chosen;
• Second stage: a sample of respondents within those areas is selected.
• Population divided into clusters of homogeneous units, usually based on geographical contiguity.
• Sampling units are groups rather than individuals.
• A sample of such clusters is then selected.
• In single-stage cluster sampling, all units from the selected clusters are studied.
Cluster sampling
Advantages :
• Cuts down on the cost of preparing a sampling
frame.
• This can reduce travel and other administrative
costs.
Disadvantages
• Sampling error is usually higher than for a simple random sample of the same size.
• It is nevertheless often used to evaluate vaccination coverage in the EPI (Expanded Programme on Immunization).
Cluster sampling
• Identification of clusters
– List all cities, towns, villages and wards of cities in the target area under study, together with their populations.
– Calculate the cumulative population and divide the total by 30; this gives the sampling interval.
– Select a random number less than or equal to the sampling interval, having the same number of digits. The area whose cumulative population contains this number forms the 1st cluster.
– Random number + sampling interval = cumulative population figure that locates the 2nd cluster.
– 2nd cluster figure + sampling interval = 3rd cluster, and so on.
– The last (30th) cluster = 29th cluster + sampling interval.
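The identification steps above amount to systematic selection with probability proportional to population size. The sketch below assumes a hypothetical list of 10 area populations; `select_clusters` is a hypothetical name, and the "same number of digits" rule for the random start is not modelled.

```python
import bisect
import random
from itertools import accumulate

def select_clusters(populations, n_clusters=30, seed=None):
    """Pick n_clusters areas with probability proportional to population size."""
    cumulative = list(accumulate(populations))      # cumulative population totals
    interval = cumulative[-1] // n_clusters         # sampling interval (rounded down)
    rng = random.Random(seed)
    start = rng.randint(1, interval)                # random number <= sampling interval
    targets = [start + k * interval for k in range(n_clusters)]
    # Each target falls in the first area whose cumulative total reaches it;
    # a large area may contain several targets and so supply several clusters.
    return [bisect.bisect_left(cumulative, t) for t in targets]

# Hypothetical populations of 10 areas, listed in a fixed order
populations = [1200, 850, 3000, 450, 2300, 1700, 980, 4100, 620, 1500]
clusters = select_clusters(populations, seed=3)
print([i + 1 for i in clusters])   # 1-based area numbers of the 30 clusters
```

Rounding the interval down guarantees that the 30th target never exceeds the total population, so every target falls inside some listed area.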
Systematic Sampling
• Systematic sampling means choosing every “nth” individual to be part of the sample; for example, you can choose every 5th person. Members of the group are selected at regular intervals to form the sample. Provided the starting point is chosen at random, every member of the population has an equal chance of being selected.
Systematic Sampling
• Systematic sampling relies on arranging the target population
according to some ordering scheme and then selecting elements
at regular intervals through that ordered list.
• Systematic sampling involves a random start and then proceeds with the selection of every kth element from then onwards, where k = population size / sample size.
• It is important that the starting point is not automatically the first
in the list, but is instead randomly chosen from within the first to
the kth element in the list.
• A simple example would be to select every 10th name from the
telephone directory (an 'every 10th' sample, also referred to as
'sampling with a skip of 10').
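A minimal sketch of a random start followed by every kth element, assuming a hypothetical ordered list of 100 names. `systematic_sample` is a hypothetical name, and rounding k down when N/n is not a whole number is an assumption.

```python
import random

def systematic_sample(population, sample_size, seed=None):
    """Random start among the first k elements, then every k-th element after it."""
    k = len(population) // sample_size      # sampling interval (skip), rounded down
    rng = random.Random(seed)
    start = rng.randrange(k)                # 0-based index of the random start
    return population[start::k][:sample_size]

# Hypothetical ordered list of 100 names (e.g., a directory extract)
names = [f"person_{i:03d}" for i in range(1, 101)]
print(systematic_sample(names, 10, seed=1))   # an 'every 10th' sample
```

Only the start is random; once it is fixed, the rest of the sample is fully determined by the skip.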
Systematic Sampling
ADVANTAGES:
• Sample easy to select
• Suitable sampling frame can be identified easily
• Sample evenly spread over entire reference population
DISADVANTAGES:
• Sample may be biased if hidden periodicity in population coincides
with that of selection.
• Difficult to assess precision of estimate from one survey.
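To illustrate the first disadvantage, the sketch below uses hypothetical daily sales over 10 weeks in which days 5 and 6 of every week are busy, and samples them with a skip of 7, equal to the hidden period.

```python
# Hypothetical daily sales over 10 weeks: days 5 and 6 of each week are busy
sales = [100 if day % 7 in (5, 6) else 40 for day in range(70)]
true_mean = sum(sales) / len(sales)

k = 7   # skip equals the hidden period
possible_means = sorted({sum(sales[start::k]) / len(sales[start::k]) for start in range(k)})

print(round(true_mean, 2))   # 57.14
print(possible_means)        # [40.0, 100.0]
```

Whatever the random start, every selected day falls on the same weekday, so the estimate is either 40 or 100 and never close to the true mean.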
Types of Non-probability Sampling
Four main techniques used for a non-probability sample:
Convenience
Judgemental
Snowball
Quota
Convenience Sampling
• It is a non-probability sampling technique in which the sample is formed on the basis of ease of access, readiness to be part of the sample, availability at a given time, or any other practical consideration.
• Convenience sampling involves haphazardly selecting the cases that are easiest to obtain, such as the passer-by interviewed in a shopping center for a television program.
Judgmental Sampling
• In judgmental sampling, also called purposive sampling, the sample members are chosen only on the basis of the researcher’s knowledge and judgment.
• It enables you to select cases that will best enable you to
answer your research question(s) and to meet your
objectives.
Snowball Sampling
• Snowball sampling relies purely on referrals: existing participants refer the researcher to further participants, and that is how the sample is generated. The method is therefore also called chain-referral sampling.
• The sample keeps growing, like a rolling snowball, until the researcher has enough data to analyze and draw conclusive results that can help an organization make informed decisions.
Quota Sampling
• Members are selected on the basis of a pre-set standard. Because the sample is formed on the basis of specific attributes, it has these attributes in the same proportions as the total population. It is an extremely quick method of collecting samples.
• Quota sampling is therefore a type of stratified sample in
which selection of cases within strata is entirely non-random.
Estimation of Population Mean and Population Variance
• One of the main objectives after selecting a sample is to learn about the tendency of the data in the population to cluster around a central value and about the scatter (dispersion) of the data around that value.
• Popular choices are the arithmetic mean and the variance.
• The population mean is generally measured by the arithmetic mean (or the weighted arithmetic mean).
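Estimation of Population Mean and Variance: Notations
Population mean: $\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i$
Population variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (Y_i - \bar{Y})^2$; population mean square: $S^2 = \frac{1}{N-1}\sum_{i=1}^{N} (Y_i - \bar{Y})^2$
Sample mean: $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$
Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2$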
Estimation of Population Mean
Population: X1 = 1, X2 = 3, X3 = 5
Population mean = 3
Number of samples of size 2 = $\binom{N}{n} = \frac{N!}{(N-n)!\,n!} = \frac{3!}{1!\,2!} = 3$
Suppose the population mean is unknown.
Use sample arithmetic mean to estimate the
population mean.
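Estimation of Population Mean
The sample mean $\bar{y}$ is an unbiased estimator of $\bar{Y}$, with variance
SRSWOR: $\operatorname{var}(\bar{y}) = \frac{N-n}{N-1}\cdot\frac{\sigma^2}{n}$
SRSWR: $\operatorname{var}(\bar{y}) = \frac{\sigma^2}{n}$
where $\sigma^2$ is the population variance.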
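Both variance formulas can be checked exactly on the three-unit population X1 = 1, X2 = 3, X3 = 5 by listing every possible sample of size 2. This is a sketch using Python's `fractions` and `itertools`; the helper name `mean_and_variance_of_means` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

population = [1, 3, 5]                     # X1, X2, X3
N, n = len(population), 2
pop_mean = Fraction(sum(population), N)    # population mean = 3
sigma2 = sum((x - pop_mean) ** 2 for x in population) / N   # sigma^2 = 8/3

def mean_and_variance_of_means(samples):
    """Average of the sample means and their variance over equally likely samples."""
    means = [Fraction(sum(s), len(s)) for s in samples]
    centre = sum(means) / len(means)
    return centre, sum((m - centre) ** 2 for m in means) / len(means)

# SRSWOR: the 3 unordered samples {1,3}, {1,5}, {3,5}
wor_mean, wor_var = mean_and_variance_of_means(list(combinations(population, n)))
# SRSWR: the 3**2 = 9 equally likely ordered samples
wr_mean, wr_var = mean_and_variance_of_means(list(product(population, repeat=n)))

print(wor_mean, wor_var, Fraction(N - n, N - 1) * sigma2 / n)   # 3 2/3 2/3
print(wr_mean, wr_var, sigma2 / n)                               # 3 4/3 4/3
```

In both schemes the average of the sample means equals the population mean 3, and the enumerated variances match the formulas.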
Sampling and Non-Sampling Errors
• In a sample survey, since only a small portion of the population is studied, its results are bound to differ from the census results and thus have a certain amount of error.
• This error is always present, even when the sample is drawn at random and is highly representative.
• This error is attributed to fluctuations of sampling and is called
sampling error.
• Sampling error is due to the fact that only a subset of the
population (i.e., sample) has been used to estimate the
population parameters and draw inferences about the
population.
• Thus, sampling error is present only in a sample survey and is
completely absent in census method.
Sampling Errors
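Reasons:
1. Faulty selection of the sample, e.g., purposive or judgment sampling; random sampling can be used to avoid this.
2. Substitution: replacing a selected unit by a more convenient one introduces bias.
3. Faulty demarcation of sampling units, e.g., borderline cases in crop-cutting surveys.
4. Error due to bias in the estimation method.
5. Variability of the population: the more variable the population (as reflected in the population and sample variance), the larger the sampling error.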
Sampling Errors
• Sampling errors follow random or chance variations and tend to cancel each other out on averaging.
• A measure of the sampling error is provided by the standard error of the estimate.
• The standard error of the estimate is inversely proportional to the square root of the sample size.
Non Sampling Errors
• Non-sampling errors are not attributed to chance and are a
consequence of certain factors which are within human
control.
• These errors can be traced and may arise at any stage of the
enquiry, viz., planning and execution of the survey and
collection, processing and analysis of the data.
• Non-sampling errors are thus present in census surveys as well as in sample surveys.
• They are generally of larger magnitude in a census survey than in a sample survey.
Non Sampling Errors
1. Faulty planning, including vague and faulty definitions of the population or the
statistical units to be used, incomplete list of population-members
2. Vague and imperfect questionnaire which might result in incomplete or wrong
information.
3. Defective methods of interviewing and asking questions.
4. Vagueness about the type of the data to be collected.
5. Exaggerated or wrong answers to questions that appeal to the pride, prestige or self-interest of the respondents.
6. Personal bias of the investigator.
7. Lack of trained and qualified investigators and lack of supervisory staff.
8. Failure of respondents’ memory to recall the events or happenings in the past.
9. Non-response and Inadequate or Incomplete Response.
10. Improper coverage.
Non Sampling Errors
• In a census, sampling error is completely absent so the total error is
non-sampling error.
• A sample survey, on the other hand, contains both sampling and
non-sampling errors.
• In a sample survey non-sampling errors can be controlled by :
(i) Employing qualified and trained personnel
(ii) Using more sophisticated statistical techniques and equipment
(iii) Providing adequate supervisory checks on the field work
(iv) Pre-testing or conducting a pilot survey
(v) Thorough editing and scrutiny of the results
(vi) Effective checking of all the steps in the processing and analysis of the data
(vii) More effective follow-up of non-response cases
(viii) Imparting thorough training to the investigators for efficient conduct of the enquiry
Biased and Unbiased Errors
Biased Errors
(i) Bias on the part of the enumerator or investigator whose personal
beliefs and prejudices are likely to affect the results of the enquiry
(ii) Bias in the measuring instrument or the equipment used for
recording the observations.
(iii) Bias due to faulty collection of the data, and in the statistical
techniques and the formulae used for the analysis of the data.
(iv) Respondents’ bias: wrong answers, given deliberately or not.
(v) Bias due to non-response.
(vi) Bias in the technique of approximation, e.g., always rounding off in the same direction.
Biased and Unbiased Errors
Unbiased Errors
• Errors are unbiased if the estimated or approximated values are equally likely to err on either side, i.e., if the chance of making an over-estimate is about the same as the chance of making an under-estimate.
• Example: 385, 415, 355 and 445 rounded to the nearest hundred all become 400.
• Since these errors move in both directions, the errors in one direction are more or less neutralised by the errors in the opposite direction, and consequently the final result is not much affected.
• Such errors are therefore also called compensatory errors.
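The rounding example can be checked directly. This is a small Python sketch; `round_to_hundred` is a hypothetical helper that rounds halves up, a case that does not arise with these four values.

```python
values = [385, 415, 355, 445]

def round_to_hundred(x):
    """Round a non-negative integer to the nearest hundred (halves round up)."""
    return (x + 50) // 100 * 100

rounded = [round_to_hundred(v) for v in values]
errors = [r - v for r, v in zip(rounded, values)]

print(rounded)       # [400, 400, 400, 400]
print(errors)        # [15, -15, 45, -45]
print(sum(errors))   # 0: the over- and under-estimates compensate
```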
Type I, Type II Errors
Example
For the following population, consider all SRSWOR samples of size 3.
i 1 2 3 4 5
yi 5 8 3 11 9
$\bar{Y} = (5+8+3+11+9)/5 = 36/5 = 7.2$
Population mean square:
$\sum_{i=1}^{5}(y_i-\bar{Y})^2 = (5-7.2)^2 + (8-7.2)^2 + (3-7.2)^2 + (11-7.2)^2 + (9-7.2)^2 = 40.80$
$S^2 = \frac{1}{N-1}\sum_{i=1}^{N}(y_i-\bar{Y})^2 = \frac{1}{4}\times 40.80 = 10.2$
From a population of 5, a sample of size 3 can be drawn in $\binom{5}{3} = 10$ ways.
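For sample 1 (units 1, 2, 3; values 5, 8, 3):
$\bar{y} = (5+8+3)/3 = 16/3$
$\sum (y_i-\bar{y})^2 = (5-16/3)^2 + (8-16/3)^2 + (3-16/3)^2 = 38/3 \approx 12.67$
$s^2 = \frac{1}{2}\times\frac{38}{3} = \frac{19}{3} \approx 6.33$
$\bar{y} - \bar{Y} = 16/3 - 7.2 \approx -1.87$, so $(\bar{y} - \bar{Y})^2 \approx 3.48$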
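No. Units ȳ s² (ȳ-Ȳ) (ȳ-Ȳ)²
1 1,2,3 16/3 19/3 -1.87 3.48
2 1,2,4 24/3 27/3 0.80 0.64
3 1,2,5 22/3 13/3 0.13 0.02
4 1,3,4 19/3 52/3 -0.87 0.75
5 1,3,5 17/3 28/3 -1.53 2.35
6 1,4,5 25/3 28/3 1.13 1.28
7 2,3,4 22/3 49/3 0.13 0.02
8 2,3,5 20/3 31/3 -0.53 0.28
9 2,4,5 28/3 7/3 2.13 4.55
10 3,4,5 23/3 52/3 0.47 0.22
Total 216/3 306/3 13.60
$E(\bar{y}) = \frac{1}{10}\times\frac{216}{3} = 7.2 = \bar{Y}$
$E(s^2) = \frac{1}{10}\times\frac{306}{3} = 10.2 = S^2$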
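In SRSWOR
$\operatorname{Var}(\bar{y}) = \frac{1}{10}\times 13.6 = 1.36$, which agrees with $\frac{N-n}{Nn}S^2 = \frac{5-3}{5\times 3}\times 10.2 = \frac{2}{15}\times 10.2 = 1.36$
In SRSWR
$\operatorname{Var}(\bar{y}) = \frac{N-1}{Nn}S^2 = \frac{5-1}{5\times 3}\times 10.2 = \frac{4}{15}\times 10.2 = 2.72$
Hence $\operatorname{Var}_{\text{SRSWR}}(\bar{y}) > \operatorname{Var}_{\text{SRSWOR}}(\bar{y})$.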
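The whole example can be reproduced by enumeration with exact fractions. The sketch below lists the 10 SRSWOR samples with their means and $s^2$ values, and also enumerates the 5³ = 125 equally likely ordered SRSWR samples to confirm 2.72. It uses only Python's standard library; the helper names `mean` and `sample_var` are illustrative.

```python
from fractions import Fraction
from itertools import combinations, product

y = {1: 5, 2: 8, 3: 3, 4: 11, 5: 9}          # unit i -> y_i
N, n = len(y), 3
Y_bar = Fraction(sum(y.values()), N)                          # 36/5 = 7.2
S2 = sum((v - Y_bar) ** 2 for v in y.values()) / (N - 1)      # 51/5 = 10.2

def mean(values):
    return Fraction(sum(values), len(values))

def sample_var(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

# All 10 SRSWOR samples (unordered sets of distinct units)
rows = []
for units in combinations(y, n):
    vals = [y[u] for u in units]
    rows.append((units, mean(vals), sample_var(vals)))

E_mean = sum(r[1] for r in rows) / len(rows)                  # equals Y_bar
E_s2 = sum(r[2] for r in rows) / len(rows)                    # equals S2
var_wor = sum((r[1] - Y_bar) ** 2 for r in rows) / len(rows)

# All 5**3 = 125 equally likely ordered SRSWR samples
wr_means = [mean([y[u] for u in units]) for units in product(y, repeat=n)]
var_wr = sum((m - Y_bar) ** 2 for m in wr_means) / len(wr_means)

for units, m, s2 in rows:
    print(units, m, s2)
print(float(E_mean), float(E_s2), float(var_wor), float(var_wr))  # 7.2 10.2 1.36 2.72
```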
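• Explain why a random sample of size 25 is to be preferred to a random sample of size 20 to estimate the population mean.
Since $\operatorname{var}(\bar{y}) = \sigma^2/n$, the standard error is $\text{S.E.}(\bar{y}) = \sigma/\sqrt{n}$. For n = 25 this is σ/5 = 0.2σ, against σ/√20 ≈ 0.224σ for n = 20.
The larger the sample, the smaller the error.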