Sampling

The document discusses sampling and sampling methods. It defines sampling as selecting a small group from a large group to make conclusions about the whole group. Probability sampling techniques give a reliable representation by randomly selecting participants so that everyone has an equal chance of selection. Common probability sampling methods include simple random sampling, stratified random sampling, cluster sampling, and systematic sampling.

Uploaded by

Tanya Hinduja

SAMPLING and SAMPLING METHODS
SAMPLING
• If the data you collect really are the same as you
would get from the rest, then you can draw
conclusions from those answers which you can relate
to the whole group.

• This process of selecting just a small group of cases from out of a large group is called sampling.
SAMPLING
• A sample is “a smaller (but hopefully representative)
collection of units from a population used to determine
truths about that population” (Field, 2005)
• Why sample?
– Resources (time, money) and workload
– Gives results with known accuracy that can be
calculated mathematically
• The sampling frame is the list from which the potential
respondents are drawn
– Registrar’s office
– Class rosters
– Must assess sampling frame errors

The need to sample
Sampling is a valid alternative to a census when:

1. A survey of the entire population is impracticable

2. Budget constraints restrict data collection

3. Time constraints restrict data collection

4. Results from data collection are needed quickly


When doing a survey, the question inevitably arises:

• How representative is the sample of the whole population? In other words:
• How similar are the characteristics of the small group of cases chosen for the survey to those of all of the cases in the whole group?
(i) The Census Method or Complete Enumeration.
(ii) The Sample Method or Partial Enumeration.
Population in Research
• It does not necessarily mean a number of people,
it is a collective term used to describe the total
quantity of things (or cases) of the type which are
the subject of your study.

• So a population can consist of certain types of objects, organizations, people or even events.
Population
• Note also that the population from which the sample is
drawn may not be the same as the population about
which we actually want information.
• Often there is large but not complete overlap between these two groups, for example because of sampling frame issues.
• Sometimes they may be entirely separate - for instance,
we might study rats in order to get a better understanding
of human health, or we might study records from people
born in 2008 in order to make predictions about people
born in 2009.

Sampling Frame
• Within this population, there will probably be only certain groups that are of interest to your study. This selected category is your sampling frame.
Populations can have the following characteristics:
Characteristic | Explanation | Examples
homogeneous | all cases are similar | bottles of beer on a production line
stratified | contains strata or layers | people with different levels of income: low, medium, high
proportional stratified | contains strata of known proportions | percentages of different nationalities of students in a university
grouped by type | contains distinctive groups | apartment buildings: towers, slabs, villas, tenement blocks
grouped by location | different groups according to where they are | animals in different habitats: desert, equatorial forest, savannah, tundra
SAMPLING..
• What is your population of interest?
• To whom do you want to generalize your
results?
–All doctors
–School children
–Indians
–Women aged 15-45 years
–Other
• Can you sample the entire population?
SAMPLING...

• Three factors that influence sample representativeness:
– Sampling procedure
– Sample size
– Participation (response)

• When might you sample the entire population (Census Method)?
– When your population is very small
– When you have extensive resources
– When you don’t expect a very high response

SAMPLING BREAKDOWN
[Figure: the target population, the study population and the sample]
SAMPLING METHODS
Process
The sampling process comprises several stages:
1. Defining the population of concern
2. Specifying a sampling frame, a set of items
or events possible to measure
3. Specifying a sampling method for selecting
items or events from the frame
4. Determining the sample size
5. Implementing the sampling plan
6. Sampling and data collecting
7. Reviewing the sampling process

Sampling techniques
Probability sampling techniques give the most reliable
representation of the whole population.

Non-probability techniques, relying on the judgment of the researcher or on accident, cannot generally be used to make generalizations about the whole population.
Probability Sampling
• It is a sampling technique in which samples from a larger population are chosen using a method based on the theory of probability.
• For a participant to be part of a probability sample, he/she must be selected by random selection.
• The most important requirement of probability sampling is that everyone in your population has a known, non-zero chance of being selected.
• When every unit also has the same chance of selection, the design is called an 'Equal Probability of Selection' (EPS) design.
• Probability sampling uses statistical theory to randomly select a small group of people (the sample) from an existing large population and then predict that their responses, taken together, will match those of the overall population.
Types of Probability Sampling
Four main techniques used for a probability sample:

Simple random sampling

Stratified random sampling

Cluster sampling

Systematic sampling
Simple random sampling
• As the name suggests, it is a completely random method of selecting the sample. This sampling method is as easy as assigning numbers to the individuals (sampling units) and then randomly choosing from those numbers through an automated process.
Simple random sampling
• Applicable when population is small, homogeneous &
readily available
• Estimates are easy to calculate.
• Simple random sampling is always an EPS design, but not all
EPS designs are simple random sampling.

Disadvantages
• If the sampling frame is large, this method is impracticable.
• Minority subgroups of interest in population may not be
present in sample in sufficient numbers for study.

Selection of a Simple Random Sample
(i) Lottery Method.
(ii) Use of Table of Random Numbers.
Replacement of selected units
Sampling schemes may be
• without replacement ('WOR' - no element can be selected
more than once in the same sample)
• with replacement ('WR' - an element may appear multiple
times in the one sample).

For example, if we catch fish, measure them, and immediately return them to the water before continuing with the sample,
this is a WR design, because we might end up catching and
measuring the same fish more than once. However, if we do
not return the fish to the water (e.g. if we eat the fish), this
becomes a WOR design.
Example
Draw a random sample, without replacement, of 15 students from a class of 450 students.

Random digits grouped in threes: 295, 266, 413, 992, 979, 279, 795, 911, 317, 056, 244, 167, 952, 415, 451, 396, 720, 353, 561, 300, 269, 323, 707, 483, 340 …
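The selection rule is not spelled out in the example. A minimal sketch in R, assuming the students are labelled 001 to 450, that three-digit groups outside this range are discarded, and that a repeated label is skipped because the sample is drawn without replacement:

```r
# Three-digit groups read from the random number table (copied from the example)
draws <- c(295, 266, 413, 992, 979, 279, 795, 911, 317, 56, 244, 167, 952,
           415, 451, 396, 720, 353, 561, 300, 269, 323, 707, 483, 340)

N <- 450  # class size
n <- 15   # required sample size

# Keep only numbers that correspond to a student label (assumed to be 1..N)
valid <- draws[draws >= 1 & draws <= N]

# Without replacement: a label that comes up again is skipped
valid <- unique(valid)

# The first n usable labels form the sample
selected <- head(valid, n)
print(selected)
```

Under these assumptions the 25 listed groups contain exactly 15 usable labels, so the extract is just long enough for this sample.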
Example
Draw a random sample (without replacement) of size 5 from a population of 24 units.

Reading the same digits in groups of two and keeping only the numbers from 01 to 24, without repeats, gives the sample: 11, 24, 15, 13, 03
Example
• It may even happen that the extract given from the table of random numbers is so small that we are not able to draw a random sample of the desired size.
• This difficulty can be overcome by assigning more than one number to each of the sampling units.
• For the previous example (24 units), the first unit may be assigned the numbers:
1, 1 + 24, 1 + 2 × 24, 1 + 3 × 24, and so on, i.e.
1, 25, 49, 73, 97, 121, … and so on.
Similarly, the second unit may be assigned the numbers:
2, 26, 50, 74, 98, 122, … and so on.
Finally, the last (24th) unit may be assigned the numbers:
0, 24, 48, 72, 96, 120, …
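The assignment above amounts to taking the remainder after division by 24, with remainder 0 standing for the last unit. A minimal sketch in R, assuming three-digit random numbers (000 to 999) are used; the cut-off at 984 is an added assumption, not stated in the example, so that every unit receives the same count of numbers (41 each):

```r
# Three-digit groups from the earlier random number extract
draws <- c(295, 266, 413, 992, 979, 279, 795, 911, 317, 56)

N <- 24  # number of sampling units

# Largest multiple of N within 000..999; numbers at or above it are discarded
# (assumption made here to keep the selection probabilities equal)
limit <- (1000 %/% N) * N
usable <- draws[draws < limit]

# Remainder gives the unit; remainder 0 is the last unit
units <- usable %% N
units[units == 0] <- N

# Without replacement: skip repeats, then take the first 5 units
sample_units <- head(unique(units), 5)
print(sample_units)
```

Compared with discarding every number above 24, far fewer random numbers are wasted, which is the point of assigning several numbers to each unit.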
Probability of Selection of a Unit
Let the size of the population be N.
One out of N sampling units is to be chosen.
SRSWOR
The probability of drawing a sampling unit = $1/N$
SRSWR
The probability of drawing a sampling unit = $1/N$
Probability of Selection of a Sample
SRSWOR
• Total number of ways to choose n sampling units out of N sampling units = $\binom{N}{n}$
• The probability of drawing a sample = $1/\binom{N}{n}$
SRSWR
• Total number of ways to choose n sampling units out of N sampling units = $N^n$
• The probability of drawing a sample = $1/N^n$
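As a quick numerical illustration of these counts, here is a small R sketch. The values N = 10 and n = 3 are chosen only for illustration:

```r
N <- 10  # population size (assumed for illustration)
n <- 3   # sample size (assumed for illustration)

# SRSWOR: number of distinct samples and probability of any one of them
n_samples_wor <- choose(N, n)
p_sample_wor  <- 1 / n_samples_wor

# SRSWR: number of ordered samples and probability of any one of them
n_samples_wr <- N^n
p_sample_wr  <- 1 / n_samples_wr

cat("SRSWOR:", n_samples_wor, "samples, probability", p_sample_wor, "\n")
cat("SRSWR: ", n_samples_wr, "samples, probability", p_sample_wr, "\n")
```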
Drawing of sample using R
> heightdata
name height
1 A 151
2 B 152
3 C 153
4 D 154
5 E 155
6 F 156
7 G 157
8 H 158
9 I 159
10 J 160
> names=heightdata$name
> names
[1] A B C D E F G H I J
Levels: A B C D E F G H I J
> heights=heightdata$height
> heights
[1] 151 152 153 154 155 156 157 158 159 160
Drawing of sample using R

> sample(names, size=5, replace = FALSE)
[1] G F A B H
Levels: A B C D E F G H I J

Suppose we want this sample in terms of heights of persons.

> sample(heights, size=5, replace = FALSE)
[1] 152 156 154 155 158

With replace = TRUE the same unit can be drawn more than once (here F appears twice):

> sample(names, size=5, replace = TRUE)
[1] F F I E A
Levels: A B C D E F G H I J
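The transcript does not show how heightdata was created. A self-contained sketch, assuming a data frame with the ten names and heights printed above; the seed is arbitrary and only makes the draws repeatable:

```r
# Rebuild the data frame shown in the transcript (its construction is assumed)
heightdata <- data.frame(
  name   = LETTERS[1:10],
  height = 151:160
)

set.seed(1)  # arbitrary seed, only for repeatability

# SRSWOR: draw 5 row indices, so each name stays paired with its height
idx_wor <- sample(nrow(heightdata), size = 5, replace = FALSE)
sample_wor <- heightdata[idx_wor, ]

# SRSWR: the same row may be drawn more than once
idx_wr <- sample(nrow(heightdata), size = 5, replace = TRUE)
sample_wr <- heightdata[idx_wr, ]

print(sample_wor)
print(sample_wr)
```

Sampling row indices instead of a single column is a design choice: it returns the names and the heights of the same five students in one step.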
Stratified Random sampling
• It involves dividing a larger population into smaller groups (strata) that usually do not overlap but together represent the entire population. A sample is then drawn from each group separately. A common approach is to classify by sex, age, ethnicity and similar characteristics.
Stratified Random sampling

• Where the population embraces a number of distinct categories, the frame can be organized into separate "strata." Each stratum is then sampled as an independent sub-population, out of which individual elements can be randomly selected.
• Every unit in a stratum has same chance of being
selected.
• Using same sampling fraction for all strata ensures
proportionate representation in the sample.
• Adequate representation of minority subgroups of
interest can be ensured by stratification & varying
sampling fraction between strata as required.
Stratified Random sampling
• Finally, since each stratum is treated as an independent population,
different sampling approaches can be applied to different strata.

Drawbacks to using stratified sampling:

• A sampling frame of the entire population has to be prepared separately for each stratum.
• When examining multiple criteria, stratifying variables may be related to some, but not to others, further complicating the design, and potentially reducing the utility of the strata.
• In some cases (such as designs with a large number of strata, or those with a specified minimum sample size per group), stratified sampling can potentially require a larger sample than would other methods.
STRATIFIED SAMPLING
Draw a sample from each stratum
Stratified Random sampling
Let’s say 100 (n) students of a school having 1000 (N) students were asked questions about their favorite subject.
It is likely that the students of the 8th grade will have different subject preferences than the students of the 9th grade.
For the survey to deliver precise results, the ideal approach is to treat each grade as a separate stratum.
Grade | Number of students (Nh)
5 | 150
6 | 250
7 | 300
8 | 200
9 | 100
Stratified Random sampling
$n_h = (N_h / N) \times n$
$n_h$ = Sample size for the hth stratum
$N_h$ = Population size for the hth stratum
$N$ = Size of entire population
$n$ = Size of entire sample

Stratified Sample ($n_5$) = (150 / 1000) × 100 = 15
Stratified Sample ($n_6$) = (250 / 1000) × 100 = 25
Stratified Sample ($n_7$) = (300 / 1000) × 100 = 30
Stratified Sample ($n_8$) = (200 / 1000) × 100 = 20
Stratified Sample ($n_9$) = (100 / 1000) × 100 = 10
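The allocation above can be reproduced in a few lines of R. This is a sketch under stated assumptions: the student IDs 1..Nh within each grade are hypothetical labels, and the seed is arbitrary.

```r
# Stratum sizes by grade (from the table in the example)
Nh <- c("5" = 150, "6" = 250, "7" = 300, "8" = 200, "9" = 100)
N  <- sum(Nh)   # 1000 students in the school
n  <- 100       # total sample size

# Proportional allocation: stratum sample size = (stratum size / N) * n
nh <- n * Nh / N
print(nh)

# Draw an SRSWOR of the allocated size inside each stratum.
# IDs 1..Nh within a grade are hypothetical labels.
set.seed(42)  # arbitrary seed for repeatability
strat_sample <- lapply(names(Nh), function(g) sample(Nh[[g]], size = nh[[g]]))
names(strat_sample) <- names(Nh)
```

Each grade is sampled independently, so a different selection method could be substituted for any one stratum without changing the others.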
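Stratified Random sampling
Proportional allocation of a sample of n = 100 among four strata with N = 3103 in total (allocations rounded to the nearest whole number):

• 0-20: (693/3103) × 100 ≈ 22
• 21-40: (1203/3103) × 100 ≈ 39
• 41-60: (802/3103) × 100 ≈ 26
• 61-80: (405/3103) × 100 ≈ 13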
Cluster sampling
• It is a way to randomly select participants when they are geographically spread out. In cluster sampling each sampling unit consists of more than one element, for example a city, a family or a university. The clusters are formed by dividing the larger population into various smaller sections, and a sample of clusters is then selected.
Cluster sampling
• Cluster sampling is an example of 'two-stage sampling':
– First stage: a sample of areas is chosen;
– Second stage: a sample of respondents within those areas is selected.
• The population is divided into clusters of homogeneous units, usually based on geographical contiguity.
• Sampling units are groups rather than individuals.
• A sample of such clusters is then selected.
• All units from the selected clusters are studied.

Cluster sampling
Advantages:
• Cuts down on the cost of preparing a sampling frame.
• This can reduce travel and other administrative costs.
Disadvantages:
• Sampling error is higher than for a simple random sample of the same size.
Application:
• Often used to evaluate vaccination coverage in the Expanded Programme on Immunization (EPI).

Cluster sampling
• Identification of clusters
– List all cities, towns, villages & wards of cities in the target area under study, with their populations.
– Calculate the cumulative population & divide by 30; this gives the sampling interval.
– Select a random no. less than or equal to the sampling interval, having the same no. of digits. The area whose cumulative population contains this number forms the 1st cluster.
– Random no. + sampling interval = position of the 2nd cluster.
– 2nd cluster position + sampling interval = position of the 3rd cluster.
– Last or 30th cluster position = 29th cluster position + sampling interval.
Cluster sampling

Area | Freq (population) | cf (cumulative population) | Cluster no.
I | 2000 | 2000 | 1
II | 3000 | 5000 | 2
III | 1500 | 6500 | -
IV | 4000 | 10500 | 3
V | 5000 | 15500 | 4, 5
VI | 2500 | 18000 | 6
VII | 2000 | 20000 | 7
VIII | 3000 | 23000 | 8
IX | 3500 | 26500 | 9
X | 4500 | 31000 | 10
XI | 4000 | 35000 | 11, 12
XII | 4000 | 39000 | 13
XIII | 5000 | 44000 | 14, 15
XIV | 2000 | 46000 | -
XV | 3000 | 49000 | 16
XVI | 3500 | 52500 | 17
XVII | 4000 | 56500 | 18, 19
XVIII | 4500 | 61000 | 20
XIX | 4000 | 65000 | 21, 22
XX | 4000 | 69000 | 23
XXI | 2000 | 71000 | 24
XXII | 2000 | 73000 | -
XXIII | 3000 | 76000 | 25
XXIV | 3000 | 79000 | 26
XXV | 5000 | 84000 | 27, 28
XXVI | 2000 | 86000 | 29
XXVII | 1000 | 87000 | -
XXVIII | 1000 | 88000 | -
XXIX | 1000 | 89000 | 30
XXX | 1000 | 90000 | -

Sampling interval = 90000/30 = 3000
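The cluster column can be checked with a short R sketch that works directly from the cumulative population (cf) column. The random start of 1800 is a hypothetical value chosen for illustration; the table does not state the start that was actually used.

```r
# Cumulative population (cf) of the 30 areas I..XXX, taken from the table
cf <- c(2000, 5000, 6500, 10500, 15500, 18000, 20000, 23000, 26500, 31000,
        35000, 39000, 44000, 46000, 49000, 52500, 56500, 61000, 65000, 69000,
        71000, 73000, 76000, 79000, 84000, 86000, 87000, 88000, 89000, 90000)

n_clusters <- 30
interval <- cf[length(cf)] / n_clusters   # 90000 / 30 = 3000

# Hypothetical random start; in practice: start <- sample(interval, 1)
start <- 1800

# Selection points: start, start + interval, start + 2 * interval, ...
sel_points <- start + interval * (0:(n_clusters - 1))

# A point belongs to the first area whose cumulative population reaches it
area <- sapply(sel_points, function(p) which(cf >= p)[1])

# Number of clusters falling in each area (0 means the area is not sampled)
clusters_per_area <- tabulate(area, nbins = length(cf))
names(clusters_per_area) <- as.character(as.roman(seq_along(cf)))
print(clusters_per_area)
```

With this start, larger areas such as V, XI and XXV receive two clusters and small areas such as III receive none, which is how the procedure selects areas in proportion to their population.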

Systematic Sampling
• It is when you choose every “nth” individual to be a part of the sample. For example, you can choose every 5th person to be in the sample. Systematic sampling is an extension of probability sampling in which members of the group are selected at regular intervals to form a sample. With a random starting point, every member of the population has an equal opportunity of being selected.
Systematic Sampling
• Systematic sampling relies on arranging the target population
according to some ordering scheme and then selecting elements
at regular intervals through that ordered list.
• Systematic sampling involves a random start and then proceeds
with the selection of every kth element from then onwards. In
this case, k=(population size/sample size).
• It is important that the starting point is not automatically the first
in the list, but is instead randomly chosen from within the first to
the kth element in the list.
• A simple example would be to select every 10th name from the
telephone directory (an 'every 10th' sample, also referred to as
'sampling with a skip of 10').
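The random start and fixed skip described above can be sketched in R as follows. The population size of 100 and sample size of 10 are assumed values chosen so that k is a whole number:

```r
N <- 100     # population size (assumed for illustration)
n <- 10      # desired sample size (assumed for illustration)
k <- N / n   # sampling interval, here 10

set.seed(7)             # arbitrary seed for repeatability
start <- sample(k, 1)   # random start between 1 and k

# Every k-th position in the ordered list, from the random start onwards
selected <- seq(from = start, to = N, by = k)
print(selected)
```

Only one random number is needed for the whole sample, which is why the method is easy to apply but sensitive to any periodic pattern in the ordering of the list.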

Systematic Sampling

ADVANTAGES:
• Sample easy to select
• Suitable sampling frame can be identified easily
• Sample evenly spread over entire reference population
DISADVANTAGES:
• Sample may be biased if hidden periodicity in population coincides
with that of selection.
• Difficult to assess precision of estimate from one survey.

Types of Non-probability Sampling
Four main techniques used for a non-probability sample:

Convenience
Judgmental
Snowball
Quota
Convenience Sampling
• It is a non-probability sampling technique in which the sample is formed according to ease of access, readiness to be a part of the sample, availability at a given time slot or any other practical consideration for a particular element.
• Convenience sampling involves selecting haphazardly those
cases that are easiest to obtain for your sample, such as the
person interviewed at random in a shopping center for a
television program.
CONVENIENCE SAMPLING
– Use results that are easy to get
Judgmental Sampling
• In judgmental sampling, also called purposive sampling, the sample members are chosen only on the basis of the researcher’s knowledge and judgment.
• It enables you to select cases that will best enable you to
answer your research question(s) and to meet your
objectives.
Snowball Sampling
• Snowball sampling method is purely based on referrals and that
is how a researcher is able to generate a sample. Therefore this
method is also called the chain-referral sampling method.
• This sampling technique can go on and on, just like a snowball
increasing in size (in this case the sample size) till the time a
researcher has enough data to analyze, to draw conclusive
results that can help an organization make informed decisions.
Quota Sampling
• Selection of members in this sampling technique happens on
basis of a pre-set standard. In this case, as a sample is formed
on basis of specific attributes, the created sample will have the
same attributes that are found in the total population. It is an
extremely quick method of collecting samples.
• Quota sampling is therefore a type of stratified sample in
which selection of cases within strata is entirely non-random.
Estimation of Population Mean and Population Variance
• One of the main objectives after the selection of a sample is to know about the tendency of the data to cluster around the central value and the scatter of the data around the central value in the population.
• Popular choices are arithmetic mean and variance.
• Population mean is generally measured by arithmetic mean (or weighted arithmetic mean).
Estimation of Population Mean and Variance: Notations

Population Variance

Sample Variance
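A sketch of the notation, assuming the standard definitions; they agree with the divisors N − 1 and n − 1 and with the symbols σ², S² and s² used in the worked example and variance formulas elsewhere in the deck:

```latex
\begin{align*}
\bar{Y} &= \frac{1}{N}\sum_{i=1}^{N} Y_i
  && \text{(population mean)} \\
\bar{y} &= \frac{1}{n}\sum_{i=1}^{n} y_i
  && \text{(sample mean)} \\
\sigma^2 &= \frac{1}{N}\sum_{i=1}^{N} (Y_i - \bar{Y})^2, \qquad
S^2 = \frac{1}{N-1}\sum_{i=1}^{N} (Y_i - \bar{Y})^2
  && \text{(population variance)} \\
s^2 &= \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2
  && \text{(sample variance)}
\end{align*}
```

The two population quantities are related by σ² = ((N − 1)/N) S².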
Estimation of Population Mean
Population: X1 = 1, X2 = 3, X3 = 5
Population mean = 3
Number of samples of size 2 (without replacement) = $\binom{3}{2} = \frac{3!}{(3-2)!\,2!} = 3$
Suppose the population mean is unknown.
Use the sample arithmetic mean to estimate the population mean.
Estimation of Population Mean

Sample arithmetic mean is an unbiased estimator of population mean.
Sample 1 = (1,3), sample mean x̄1 = 2
Sample 2 = (3,5), sample mean x̄2 = 4
Sample 3 = (1,5), sample mean x̄3 = 3
Average of the sample means = (2 + 4 + 3)/3 = 3 = population mean
• ȳ is an unbiased estimator of Ȳ in SRSWOR
• ȳ is an unbiased estimator of Ȳ in SRSWR
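Both statements can be checked by brute force for the three-unit population (1, 3, 5). A minimal sketch in R that enumerates every possible sample of size 2 under each scheme:

```r
pop <- c(1, 3, 5)   # population from the example

# SRSWOR: each column of combn() is one possible sample of size 2
wor_samples <- combn(pop, 2)
wor_means   <- colMeans(wor_samples)   # 2, 3, 4 (in combn order)

# SRSWR: all 9 ordered samples of size 2
wr_samples <- expand.grid(first = pop, second = pop)
wr_means   <- rowMeans(wr_samples)

mean(pop)         # population mean
mean(wor_means)   # average of the SRSWOR sample means
mean(wr_means)    # average of the SRSWR sample means
```

All three printed values coincide, which is what unbiasedness means: the sample means average out to the population mean over all possible samples.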


Sample Mean Example
Y: Height of students in a class
N = 10 : Number of students in the class (Population size)
n = 3 : Number of students in the sample (Sample size)
Name of Student | Yi = Height of student (in centimeters)
A | Y1 = 151
B | Y2 = 152
C | Y3 = 153
D | Y4 = 154
E | Y5 = 155
F | Y6 = 156
G | Y7 = 157
H | Y8 = 158
I | Y9 = 159
J | Y10 = 160

Sample 1: 3rd, 7th and 9th student
y1 = Y3 = 153 cm, y2 = Y7 = 157 cm, y3 = Y9 = 159 cm
Sample mean 1 (ȳ1) = (153 + 157 + 159)/3 = 156.33 cm

Sample 2: 2nd, 5th and 4th student
y1 = Y2 = 152 cm, y2 = Y5 = 155 cm, y3 = Y4 = 154 cm
Sample mean 2 (ȳ2) = (152 + 155 + 154)/3 = 153.67 cm

Sample 3: 1st, 6th and 10th student
y1 = Y1 = 151 cm, y2 = Y6 = 156 cm, y3 = Y10 = 160 cm
Sample mean 3 (ȳ3) = (151 + 156 + 160)/3 = 155.67 cm

Population mean Ȳ = 155.5 cm

The total number of samples = $\binom{10}{3} = 120$
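The three sample means can be recomputed in R. This sketch assumes the heights 151 to 160 cm for students A to J and counts the samples without replacement:

```r
heights <- 151:160   # heights of students A..J in cm

# Positions of the students in the three samples of the example
samples <- list(c(3, 7, 9), c(2, 5, 4), c(1, 6, 10))

# Mean height within each sample
sample_means <- sapply(samples, function(idx) mean(heights[idx]))
round(sample_means, 2)

mean(heights)    # population mean
choose(10, 3)    # number of possible SRSWOR samples of size 3
```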


Variance of Sample Mean

• Population variability is generally measured by variance.
• Several samples can be drawn by SRSWR as well as SRSWOR from a population.
• Each sample will have a different sample mean.
• Sample mean is a statistic, i.e., a function of random variables.
• So the sample mean will also have a variance.
Variance of Sample Mean

SRSWOR: $\mathrm{Var}(\bar{y}) = \frac{N-n}{N-1} \cdot \frac{\sigma^2}{n}$

SRSWR: $\mathrm{Var}(\bar{y}) = \frac{\sigma^2}{n}$
Sampling and Non-Sampling Errors
• In a sample survey, since only a small portion of the population
is studied, its results are bound to differ from the census
results and thus have a certain amount of error.
• This error is always present, even when the sample is drawn at random and is highly representative.
• This error is attributed to fluctuations of sampling and is called
sampling error.
• Sampling error is due to the fact that only a subset of the
population (i.e., sample) has been used to estimate the
population parameters and draw inferences about the
population.
• Thus, sampling error is present only in a sample survey and is
completely absent in census method.
Sampling Errors
Reasons:
1. Faulty selection of the sample
e.g. purposive or judgment sampling; random sampling can be used to avoid this
2. Substitution
Replacing a selected unit with another one introduces bias
3. Faulty demarcation of sampling units
e.g. crop cutting surveys, border line cases
4. Error due to bias in the estimation method
5. Variability of the population
Population and sample variance
Sampling Errors
• Sampling errors follow random or chance variations and
tend to cancel out each other on averaging.
• A measure of the sampling error is provided by the
standard error of the estimate.
• Standard error of the estimate is inversely proportional
to the square root of the sample size
Non Sampling Errors
• Non-sampling errors are not attributed to chance and are a
consequence of certain factors which are within human
control.
• These errors can be traced and may arise at any stage of the
enquiry, viz., planning and execution of the survey and
collection, processing and analysis of the data.
• Non-sampling errors are thus present both in census surveys
as well as sample surveys
• They are larger in magnitude in a census survey than in a sample survey.
Non Sampling Errors
1. Faulty planning, including vague and faulty definitions of the population or the
statistical units to be used, incomplete list of population-members
2. Vague and imperfect questionnaire which might result in incomplete or wrong
information.
3. Defective methods of interviewing and asking questions.
4. Vagueness about the type of the data to be collected.
5. Exaggerated or wrong answers to the questions which appeal to the pride or prestige or self-interest of the respondents.
6. Personal bias of the investigator.
7. Lack of trained and qualified investigators and lack of supervisory staff.
8. Failure of respondents’ memory to recall the events or happenings in the past.
9. Non-response and Inadequate or Incomplete Response.
10. Improper coverage.
Non Sampling Errors
• In a census, sampling error is completely absent so the total error is
non-sampling error.
• A sample survey, on the other hand, contains both sampling and
non-sampling errors.
• In a sample survey non-sampling errors can be controlled by :
(i) Employing qualified and trained personnel
(ii) Using more sophisticated statistical techniques and equipment
(iii) Providing adequate supervisory checks on the field work
(iv) Pre-testing or conducting a pilot survey
(v) Thorough editing and scrutiny of the results
(vi) Effective checking of all the steps in the processing and analysis of the data
(vii) More effective follow up of non-response cases
(viii) Imparting thorough training to the investigators for efficient conduct of the
enquiry
Biased and Unbiased Errors

Biased Errors
(i) Bias on the part of the enumerator or investigator whose personal
beliefs and prejudices are likely to affect the results of the enquiry
(ii) Bias in the measuring instrument or the equipment used for
recording the observations.
(iii) Bias due to faulty collection of the data, and in the statistical
techniques and the formulae used for the analysis of the data.
(iv) Respondents’ bias: wrong answers given with or without purpose.
(v) Bias due to non-response.
(vi) Bias in the technique of approximation, e.g. rounding off.
Biased and Unbiased Errors
Unbiased Errors
• Errors are unbiased if the estimated or approximated values are likely to err on either side, i.e., if the chance of making an over-estimate is almost the same as the chance of making an under-estimate.
• Example: 385, 415, 355 and 445, each rounded to the nearest complete unit, i.e., hundred, all become 400; the errors (+15, −15, +45, −45) cancel.
• Since these errors move in both directions, the errors in one direction are more or less neutralised by the errors in the opposite direction and consequently the ultimate result is not much affected.
• Such errors are also called compensatory errors.
Type I, Type II Errors

Example
For the following population, consider all SRSWOR samples of size 3.

i | 1 | 2 | 3 | 4 | 5
yi | 5 | 8 | 3 | 11 | 9

1. Show that ȳ is an unbiased estimator of Ȳ.
2. Show that s² is an unbiased estimator of S².
3. Calculate the sampling variance of ȳ and show that it agrees with the formula $\frac{N-n}{nN} S^2$.
4. Verify that Var_SRSWR(ȳ) > Var_SRSWOR(ȳ).
N = 5
(Population Mean)

Ȳ = (5 + 8 + 3 + 11 + 9)/5 = 7.2
(Population Mean Square)

i | 1 | 2 | 3 | 4 | 5 | Total
(yi − Ȳ)² | (5 − 7.2)² | (8 − 7.2)² | (3 − 7.2)² | (11 − 7.2)² | (9 − 7.2)² | 40.80

S² = (1/4) × 40.80 = 10.2
From a population of 5, a sample of size 3 can be drawn in 5C3 = 10 ways.

For sample 1 (units 1, 2, 3; values 5, 8, 3):
ȳ = (5 + 8 + 3)/3 = 16/3
Σ(yi − ȳ)² = (5 − 16/3)² + (8 − 16/3)² + (3 − 16/3)² = 12.67
s² = (1/2) × 12.67 = 6.33 = 19/3
ȳ − Ȳ = 16/3 − 7.2 = −1.87, so (ȳ − Ȳ)² = 3.48

No | Units | Mean ȳ | s² | (ȳ − Ȳ) | (ȳ − Ȳ)²
1 | 1,2,3 | 16/3 | 19/3 | −1.87 | 3.48
2 to 10 | (remaining samples, computed in the same way)
Total | | 216/3 | 306/3 | | 13.60

E(ȳ) = (1/10) × 216/3 = 7.2 = Ȳ

E(s²) = (1/10) × 306/3 = 10.2 = S²
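In SRSWOR

Var(ȳ) from all 10 samples = (1/10) × 13.6 = 1.36

Var(ȳ) from the formula = $\frac{N-n}{nN} S^2$ = ((5 − 3)/(3 × 5)) × 10.2 = (2/15) × 10.2 = 1.36

In SRSWR
Var(ȳ) = $\frac{N-1}{nN} S^2$ = ((5 − 1)/(3 × 5)) × 10.2 = (4/15) × 10.2 = 2.72

Hence Var_SRSWR(ȳ) > Var_SRSWOR(ȳ).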
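The whole exercise can be verified by enumerating all 10 samples in R. A minimal sketch using only base functions; var() uses the divisor n − 1, which matches the definitions of S² and s² in the example:

```r
y <- c(5, 8, 3, 11, 9)   # population values from the example
N <- length(y)
n <- 3

Ybar <- mean(y)   # population mean
S2   <- var(y)    # population mean square (divisor N - 1)

samples <- combn(y, n)              # all 10 SRSWOR samples, one per column
ybar    <- colMeans(samples)        # sample means
s2      <- apply(samples, 2, var)   # sample variances (divisor n - 1)

mean(ybar)   # compare with Ybar (part 1)
mean(s2)     # compare with S2 (part 2)

# Part 3: variance of the sample mean over all samples versus the formula
var_wor_enum    <- mean((ybar - Ybar)^2)
var_wor_formula <- (N - n) / (n * N) * S2

# Part 4: the SRSWR variance from its formula
var_wr_formula  <- (N - 1) / (n * N) * S2

c(enumerated = var_wor_enum, srswor = var_wor_formula, srswr = var_wr_formula)
```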
• Explain why a random sample of size 25 is to be preferred to a random sample of size 20 to estimate the population mean.

Var(ȳ) = σ²/n
S.E.(ȳ) = σ/√n
For n = 25, S.E.(ȳ) = σ/5 = 0.200σ; for n = 20, S.E.(ȳ) = σ/√20 ≈ 0.224σ.
The larger the sample, the smaller the error.
