Unit 12
SYSTEMATIC SAMPLING
12.1 Introduction
     Objectives
12.2 Sampling: What and Why?
12.3 Preliminaries
12.4 Simple Random Sampling
12.5 Estimation of Population Parameters
12.6 Systematic Sampling
     Linear Systematic Sampling
     Circular Systematic Sampling
     Advantages and Limitations of Systematic Sampling
12.7 Summary
12.8 Solutions/Answers
Appendix A
12.1 INTRODUCTION
In our day-to-day routine we often have to make judgements about a large bulk
or population after studying a small portion, or sample, of it. For example, a
housewife tastes a spoonful of soup to see whether a little more salt is required before it is
served to guests; a quality control inspector inspects a sample from a tin of oil before
passing the whole tin as of acceptable quality; or a doctor takes a few drops of blood
from a patient to decide whether the patient has a malarial infection. These are typical
examples where it is not practical to examine the entire lot or population, and decisions
are made on the basis of sample information. Essentially, this is sampling. It
saves time and, if you have based your judgement on judicious observations, it serves
your purpose. Think also of situations where decisions are to be taken at macro levels, say,
at the national level: sampling has become an effective tool for generating information for
policy formulation at different administrative levels. The examples above share a
special feature: the soup, the oil in the tin and the blood in the patient are
known to be perfectly homogeneous, so that every part of the material
represents the whole exactly. Often, however, we are not in this simple situation.
For example, suppose we are interested in knowing the average weight of an adult
Indian male. Obviously, it will not be satisfactory to measure just a few adult males,
as not all adult males are of the same weight; they show considerable
heterogeneity. So how do we take a part of a large heterogeneous mass and draw
valid conclusions from it? Careful consideration is needed in selecting
samples and in making valid inferences from them. In the subject of 'sampling',
these considerations and criteria are developed on a scientific basis.
In this unit, we shall start by introducing the basic concepts of sampling. We shall
then discuss the simplest procedure for sample collection, i.e., simple random sampling
(SRS). In the process you will be introduced to the concept of random numbers,
simple random sampling as a method of selection, estimation, and considerations
for determining the sample size.
Further, the concept of systematic sampling, which is also a method of selection based
on a random sampling procedure, will be introduced along with the estimation approach.
Objectives
After reading this unit, you should be able to
select samples using simple random sampling and systematic sampling
procedures;
estimate the population mean, population variance and proportion, along with their
efficiency, in SRS;
select samples using systematic sampling by linear and circular systematic
procedures.
E1) In the cases mentioned above, identify the population and the individuals.
You may notice that observing or inspecting all the individuals in a population can be
very expensive in terms of time and money. In certain instances, like observing the
life of bulbs, inspection may be destructive, so that observing all the bulbs of a
population to calculate their average life is a meaningless exercise. So an
alternative approach is called for. Can you think of an alternative? One alternative is
to observe or measure only some individuals of a population, and to estimate the average
for the whole population based on the few measurements made. The idea is roughly
as follows. Suppose you want to find the total number of mangoes on a tree. It is
easy to count the number of mangoes on a "typical" branch. Multiply the number
obtained by the number of branches and you get an estimate. Do you think this
method will yield a good enough estimate? The method adopted in this approach is
called the method of sampling. You may realise that in this methodology your final
estimate of the total depends on the number of fruits on the selected branch. Of
course, if all the branches happen to bear the same number of fruits, it does not matter
which branch you happen to select. You may therefore want to know if all the
branches bear nearly the same number of fruits or, if they differ then by how many
they may differ. In order to do this, again using the principle of sampling, you may
select another branch to get an idea of the difference. If you select the branches with a
priori known probabilities, (for example, by ensuring that any set of two branches
have the same probability of being selected in the sample as any other set), you may
be able to use the theory of probability and statistical inference you have learnt in the
earlier blocks to calculate the likely error in your estimate. The theory and method of
sampling deal with the issue of how to select the sample, how to estimate the
population total or average and how to estimate the error in observing only a sample
instead of the entire population. The error that results from estimation based only on a
sample of observations is called sampling error. In contrast, there may also be
nonsampling errors while observing or measuring an individual. A census approach,
while free from sampling error, may suffer from nonsampling errors. In fact, in
measuring a large number of individuals, the nonsampling error may be large due to
'inspection fatigue'. The theory of sampling allows one to estimate the sampling
error but cannot assess the extent of nonsampling errors.
In order to have a better grasp of what we have discussed above, you may try this
exercise.
E2) List the advantages and disadvantages of using a sampling approach instead of
a census approach for studying a characteristic.
You may also recall at this stage that in Block-2, you learnt about sampling
distributions or the derived distribution of functions of a sample of observations. The
issues studied in that block are to be distinguished from the issues we will be
considering in this block. In Block-2, you had a conceptual infinite population, such
as the drying time required for a formulation of paint, and you had ten realizations of
the random variable 'drying time' on ten pieces of wood on which this formulation of
paint was applied. These ten observations were assumed to be ten independent
realizations of the same random variable having a theoretical distribution and you
derived the distribution of their average and other functions. The issues to be studied
in this block relate to sampling from a finite population and no theoretical distribution
of the characteristic is assumed.
12.3 PRELIMINARIES
Let us assume that we wish to find out the proportion of votes a particular party A is
expected to get in an election in a particular constituency.
An element is a unit for which information is sought. In this example the element is
a registered voter in the constituency. The study variable will be measured as one if
the voter prefers to vote for the party A, otherwise, the measurement will be taken as
zero.
The population as you already know, is an aggregate of elements about which the
inference is to be made; the collection of all the registered voters of the constituency
in this case constitutes the population. A population is finite if it consists of a finite
number of elements. In this unit we shall consider the case of finite population only.
If the units in the sample are selected using some random mechanism then such a
procedure is called random sampling or probability sampling. In this method
samples are selected according to certain laws of probability in which each unit of the
population has some definite probability of being selected in the sample. All other
sampling procedures, which are not based on random procedures but are based on
subjective judgement or convenience of the sampler, are known as non-random
sampling or non-probability sampling. They are also termed as purposive sampling
or judgement sampling. Clearly, inferences drawn on the basis of a purposive sample
can often be subjective and biased.
Random sampling is preferred over non-random sampling for a variety of reasons.
Besides eliminating the subjectivity in selection, it provides a measure of reliability
associated with the estimates developed from the samples. Thus, one can make
inferences from the sample with a known level of confidence. As stated earlier, in a
random sampling procedure every unit in the population is assigned a definite
probability of selection. The randomness associated with the sampling procedure is the
key to making valid inferences from the sample.
Samples are often selected by a one-draw-after-another, or unit by unit, selection
procedure. If the units selected at one draw are replaced in
the population before the next draw, the procedure is called a with replacement
(WR) procedure. If the units are not replaced in the population and the selection is
made from the remaining units, the selection is called without replacement
(WOR). If a population consists of N units and a sample of size n is to be selected, the
number of possible samples for the with replacement procedure is $N^n$. In the case of a without
replacement sample, if the order of sample units is ignored, there are

$$\binom{N}{n} = \frac{N!}{n!\,(N-n)!}$$

possible samples.
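Both counts can be checked by brute-force enumeration for a small population. The following is a minimal Python sketch; the five unit labels A to E and the helper names `wr_samples` and `wor_samples` are illustrative assumptions, not part of the text. Note that `itertools.product` lists ordered samples, which is how the count $N^n$ arises, while `itertools.combinations` ignores order.

```python
from itertools import combinations, product
from math import comb


def wr_samples(units, n):
    """All ordered with-replacement (WR) samples of size n: N**n of them."""
    return list(product(units, repeat=n))


def wor_samples(units, n):
    """All unordered without-replacement (WOR) samples: C(N, n) of them."""
    return list(combinations(units, n))


units = ["A", "B", "C", "D", "E"]  # hypothetical labels for N = 5 units
n = 3

print(len(wr_samples(units, n)), len(units) ** n)       # 125 125
print(len(wor_samples(units, n)), comb(len(units), n))  # 10 10
```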
You may now try to solve the following exercises to see whether you have grasped the
basic concepts of sampling discussed above.
E3) Define population, sampling unit and sampling frame for conducting surveys
on each of the following subjects.
a) Measurement of the volume of timber available in a forest.
b) Annual yield of apple fruit in a hilly district.
c) Study of nutrient contents of food consumed by the residents of a city.
E4) Consider a population consisting of 5 villages, the areas (in hectares) of
which are given below.

Village        A     B     C     D     E
Area (ha)    760   343   657   550   480

a) Enumerate all possible WOR samples of size 3. Also write the values of the study
variable (area) for the sampled units.
b) List all the WR samples of size 3 along with their area values.
Now, we consider the simplest of the random sampling procedures i.e., simple random
sampling.
In the simple random sampling procedure with unit by unit selection, every available unit has an
equal chance (probability) of selection at every draw. However, the converse is not
true, i.e. there are sampling schemes in which every unit gets the same chance of
selection but which are not simple random sampling methods, e.g. systematic
sampling. You shall learn about such sampling methods later in this unit. We now try
to answer a question which may be occurring to you.
How to select a simple random sample (SRS)?
We consider the selection of a simple random sample through the unit by unit selection
method. At every draw, equal probabilities are to be assigned to the available sampling
units of the population. Thus, a prerequisite for the selection is a random device by
which selections are to be made. The most commonly used procedures for selecting an
SRS are (1) the lottery method and (2) the use of random number tables. Let us
discuss these methods one by one.
Lottery method
As the name suggests, units from the given frame (of size N) are selected using any
procedure of generating a number randomly through lottery procedures. The simplest
method may be to write down the N numbers on identical slips of paper and draw one
of the slips after thoroughly mixing the slips. The number on the selected slip
indicates the unit selected. For instance, suppose we have a population of 400
individuals and we wish to draw a random sample of 40 individuals. We can number
the individuals of the population serially from 1 to 400. We can then take 400
identical slips of paper, write numbers 1 to 400 on them, put them in a box, mix them
thoroughly and pick out 40 slips, one by one without looking. This gives us a random
sample of 40 individuals. In the with replacement procedure, the slip is replaced before
the next draw, while in the without replacement procedure it is not. The drawing is
continued until the desired number of units has been selected.
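In a program the box of slips can be mimicked with a pseudorandom generator. The sketch below assumes that Python's `random` module is an acceptable stand-in for the physical lottery; `lottery_draw` is a hypothetical helper name. `Random.sample` draws without replacement (a drawn slip never goes back), while repeated `Random.choice` calls draw with replacement.

```python
import random


def lottery_draw(N, n, replace=False, seed=None):
    """Simulate drawing n slips numbered 1..N from a thoroughly mixed box.

    replace=True  -> each slip goes back before the next draw (WR)
    replace=False -> drawn slips stay out of the box (WOR)
    """
    rng = random.Random(seed)  # pseudorandom stand-in for mixing the slips
    slips = range(1, N + 1)
    if replace:
        return [rng.choice(slips) for _ in range(n)]
    return rng.sample(slips, n)


# 40 individuals out of 400, as in the example above
wor_sample = lottery_draw(400, 40, seed=2024)
wr_sample = lottery_draw(400, 40, replace=True, seed=2024)
print(sorted(wor_sample))
```

Fixing `seed` only makes a run reproducible; it plays the role of recording where in a random number table one started.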
Any other randomization device such as pack of cards or random disc etc, may be
used. However, the procedure becomes cumbersome if a large number of selections is to
be made, as numbering the slips becomes inconvenient and one has to be careful to
see that the slips are thoroughly mixed after each draw. This method is not so common
in random selections. We now discuss another method which makes use of random
number tables.
Through random number tables
Before discussing the method based on the use of random number tables, you may like
to know what random numbers are. Random numbers are numbers generated by a
random procedure involving repeated independent trials. Such numbers are generated
with the help of random digits 0 through 9. When we say random digits 0 through 9 it
is assumed that a trial of the procedure yields each of the ten digits with probability
0.1. One simple way of generating these random digits is to take ten cards of the same
size and write the digits 0 through 9 on the cards so that each card has a different digit.
Then take a large hat, say, toss in the cards and mix them well. Now choose a card at
random from the hat. Write down on a piece of paper the digit appearing on the card
you have chosen. Put the card back into the hat and mix the cards again. Repeat the
procedure by choosing a card at random, writing down the digit appearing on the card,
replacing, mixing, choosing again, and so on. The string of digits we write down
constitutes a string of random digits because it has been produced by a random device
supposed to yield each digit with probability 0.1 in independent trials. Random digits
can also be produced by using a modified roulette wheel in which the wheel is divided
into ten equal parts, each one corresponding to one of the ten digits.
Given random digits, we can get more complicated random numbers. Suppose we
have generated the sequence 3217900597 of ten digits. Then each digit is random,
and the two-digit numbers 32, 17, 90, 05, 97 obtained by taking the digits two
at a time are also random numbers, because they have been produced by a random
procedure in which each of the one hundred two-digit numbers 00 through 99 has
probability 0.01 of appearing and, moreover, the selections of these two-digit numbers are
independent. Taking the original ten-digit sequence and choosing the digits two at a
time going backward to get 79, 50, 09, 71, 23 also gives random two-digit numbers. In
the same way you might think of some other ways to get two-digit numbers using the
generated string, as long as the method does not use the same selection more than
once.
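The forward and backward readings described above amount to simple string slicing. A minimal sketch; the function names are illustrative, and the numbers are kept as strings so that leading zeros such as `05` survive (a trailing odd digit, if any, is simply left unused).

```python
def two_digit_numbers(digits):
    """Split a digit string into consecutive two-digit numbers, left to right."""
    return [digits[i:i + 2] for i in range(0, len(digits) - 1, 2)]


def two_digit_numbers_backward(digits):
    """Same split, but reading the string from right to left."""
    return two_digit_numbers(digits[::-1])


s = "3217900597"
print(two_digit_numbers(s))           # ['32', '17', '90', '05', '97']
print(two_digit_numbers_backward(s))  # ['79', '50', '09', '71', '23']
```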
E5) Give two more sequences of 5 two digit random numbers obtained by using
the string 3217900597 of ten digits.
In a similar manner, random numbers with three, four or even more digits can be
obtained by using the given string of random digits. There are several standard
random number tables available which give the arrangement of these numbers in a
rectangular manner. Some of these which are commonly used are prepared by
Tippett (1927), Fisher and Yates (1938), Kendall and Smith (1939), Rand Corporation
(1955), and Rao et al. (1974). One such random number table is reproduced in
Appendix A. You may note that this random number table can be read as single-digit,
two-digit, three-digit or four-digit numbers depending
on the size of the population you are sampling from. We shall now illustrate the use
of random number tables for selecting samples.
We shall discuss here three commonly used methods of using random number tables
for selection of simple random samples.
Direct Approach. The first step in the method is to assign serial numbers 1 to N to the
N population units. If the population size N is a K-digit number, then consider K-digit
random numbers, read either row-wise or column-wise, in the random number table.
The sample of the required size is then selected by drawing, one by one, random numbers
from 1 to N, and including the units bearing these serial numbers in the sample.
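The direct approach can be sketched as a filter over a stream of digits. In this Python sketch the digit stream is hypothetical: it was made up so that the accepted numbers match the households selected in Problem 1 below, with a few rejected values (87, 00, 62) inserted; it is not a transcription of Appendix A. `direct_approach` is an illustrative name.

```python
def direct_approach(digits, N, n, replace=False):
    """Select up to n serial numbers from 1..N by reading K-digit numbers in order.

    Numbers equal to 0 or greater than N are rejected; under WOR a number
    that has already been selected is also rejected. If the stream runs
    out first, fewer than n numbers are returned.
    """
    K = len(str(N))                   # number of digits in N
    chosen = []
    for i in range(0, len(digits) - K + 1, K):
        r = int(digits[i:i + K])
        if not 1 <= r <= N:
            continue                  # outside 1..N: reject
        if not replace and r in chosen:
            continue                  # WOR: repetition is dropped
        chosen.append(r)
        if len(chosen) == n:
            break
    return chosen


# hypothetical two-digit stream (not taken from Appendix A)
stream = "".join(["20", "87", "12", "03", "16", "00", "30",
                  "15", "24", "37", "62", "01", "15", "07"])
print(direct_approach(stream, 56, 10, replace=True))
print(direct_approach(stream, 56, 10))
```

Three of the fourteen numbers are wasted here, which illustrates why the modified procedures that follow are preferred.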
Problem 1: Consider a population of 56 households. Select a simple random sample
of 10 households by with replacement as well as without replacement methods.
Solution: Here the sampling unit is a household. The first step is to arrange the
households serially if they are not already so arranged. Since the population size is a
two-digit number, we have to use two-digit random numbers.
Alternatively, in the table of random numbers (Appendix A) the first two digits of any
column can be used. While using these tables, it is advisable to take a blind
start on the table by placing your finger on the table with closed eyes; let it be
column 7, row 6 of the first page of Appendix A. Then the first number is 20 and,
going down the page, subsequent random numbers between 1 and 56 are 12, 03, etc.
By selecting the first 10 random numbers from 1 to 56, without discarding repetitions, for
the with replacement procedure (WR), we obtain the serial numbers of the households
in the sample. These are given below:

20 12 03 16 30 15 24 37 01 15
For the without replacement procedure (WOR), repetitions have to be avoided: the
number 15, when it appears again at the tenth draw, should be dropped, and the next two
digits are then chosen. Thus, a WOR sample will consist of:

20 12 03 16 30 15 24 37 01 07
This procedure may involve a number of rejections of random numbers, since zero and
all the numbers greater than 56 appearing in the table are not considered for selection.
The use of random numbers therefore has to be modified. We now discuss two of
the commonly used modified procedures.
Quotient Approach. As before, let N be a K-digit number and let N' be the highest K-digit
multiple of N, so that N' = Nm for some integer m. Select a random number r
from 0 to N'-1. Then the unit having serial number (Q+1) is included in the sample,
where Q is the quotient when r is divided by m. Since each serial number corresponds
to exactly m values of r, every unit has the same chance of selection. For instance, if
N = 24 then N' = 96 and m = 4. Let a random number r = 49 be chosen from 0 to 95.
Dividing 49 by 4, one gets the quotient Q = 12. The unit bearing serial number
(Q+1) = 13 is then selected in the sample. The process is repeated by selecting each
time a new number r from 0 to 95 till a sample of the required size is obtained.
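A sketch of the quotient approach, assuming K-digit random numbers are supplied one at a time (the list of numbers below is hypothetical, not read from Appendix A). Numbers r with r ≥ N' are rejected; the helper names `quotient_unit` and `quotient_sample` are illustrative.

```python
def quotient_unit(r, N):
    """Map a K-digit random number r to a serial number in 1..N.

    Returns None when r must be rejected, i.e. when r >= N' = N * m.
    """
    K = len(str(N))
    m = (10 ** K - 1) // N      # N' = N * m is the highest K-digit multiple of N
    if not 0 <= r < N * m:
        return None
    return r // m + 1           # Q = quotient of r / m; select unit (Q + 1)


def quotient_sample(numbers, N, n, replace=False):
    """Apply the quotient approach to a sequence of K-digit random numbers."""
    sample = []
    for r in numbers:
        unit = quotient_unit(r, N)
        if unit is None or (not replace and unit in sample):
            continue            # rejected number, or repetition under WOR
        sample.append(unit)
        if len(sample) == n:
            break
    return sample


print(quotient_unit(49, 24))    # 13, as in the worked example
numbers = [49, 97, 3, 50, 88]   # hypothetical two-digit random numbers
print(quotient_sample(numbers, 24, 3))                # [13, 1, 23]
print(quotient_sample(numbers, 24, 3, replace=True))  # [13, 1, 13]
```

Only 97 is rejected here, whereas with the direct approach every number above 24 (that is, 49, 97, 50 and 88) would have been discarded.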
As we have mentioned earlier, while using the random number tables any starting
point can be used, and one can move in any predetermined direction along the rows or
columns. However, as a convention, column-wise selection is normally followed. If
more than one sample is to be selected in any problem, each should have an
independent starting point.
Besides the methods discussed above, some more methods for sample selection are
available in the literature. However, being operationally inconvenient, these are
usually not employed in practice.
To become conversant with the methods discussed above for selecting a simple
random sample, you may try the following exercise. While doing this exercise you
will also be convinced that the number of rejections in the quotient and remainder
approaches is much smaller than in the direct approach.
E6) Select a simple random sample of 10 households from the same population of 56
households by WR and WOR methods, using remainder approach and quotient
approach.
The whole purpose of sampling is to collect information about the population from
which the sample is drawn. It is used to study the unknown characteristics of the
population, called parameters. However, a sample cannot tell us the population
parameters exactly; it can only estimate them. In the next
section we shall see how this estimation is done.
In order to infer about population parameters, we compute various quantities from the
sample. These computed quantities from the sample are called statistics. In general,
we can say that a statistic is any quantity computed from a sample. Values such as
mean, variance and standard deviations derived from samples are sample statistics
which are then used to estimate population parameters and hence are called
estimators.
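For a population of N units with variate values $y_1, y_2, \ldots, y_N$, the population mean and variance are given by

$$\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} y_i = \frac{Y}{N} \qquad\text{and}\qquad \sigma^2 = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \bar{Y}\right)^2 \tag{1}$$

respectively, where $Y = \sum_{i=1}^{N} y_i$ is the population total. The corresponding formulas for the sample mean and variance are

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \qquad\text{and}\qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2$$

respectively, for a sample of size n with variate values $y_1, y_2, \ldots, y_n$. You may also
note here that a parameter is a fixed unknown quantity. For example, the average
height of the population of adult Indian males at a given time has a single fixed value.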
A statistic on the other hand is a variable quantity. The value of a statistic (or, an
estimator) computed from different samples would differ from sample to sample. We
know that the heights of adult Indian males vary. If two samples of the same size are
drawn from this population, it may happen that one sample has a few more tall
people than the other. Hence, the average height computed from one sample is likely
to be different from that computed from the other. Thus, there is a need to study this
variability in the statistic if it is to be used as an estimator. In other words, we need
to study the distribution of the estimator over all possible samples, known as its
sampling distribution.
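The distinction between fixed parameters and varying statistics is easy to see numerically. In this Python sketch the eight population values are hypothetical. `statistics.pvariance` uses the divisor N, as in σ² of formula (1), while `statistics.variance` uses n − 1, as in s². Each simple random sample (WOR, n = 4, drawn with `random.Random.sample`) gives its own ȳ and s², while Ȳ and σ² stay fixed.

```python
import random
from statistics import fmean, pvariance, variance

population = [12, 15, 9, 20, 14, 11, 18, 16]  # hypothetical y_1, ..., y_N

Y_bar = fmean(population)       # population mean (a parameter)
sigma2 = pvariance(population)  # population variance, divisor N


def describe(sample):
    """Return the sample mean and the sample variance s^2 (divisor n - 1)."""
    return fmean(sample), variance(sample)


print(f"Y_bar = {Y_bar}, sigma^2 = {sigma2}")  # fixed values
rng = random.Random(11)
for _ in range(3):
    sample = rng.sample(population, 4)           # one SRSWOR of size 4
    y_bar, s2 = describe(sample)
    print(sample, round(y_bar, 3), round(s2, 3))  # varies from sample to sample
```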
Household 1 2 3 4 5
Enumerate all possible samples (WOR) of size 2 and show that the sample mean
gives an unbiased estimate of population mean.
All possible samples and their corresponding sample means in this case are presented
in Table-2.
Average of the 10 sample means: 1580
It may be seen that the average of the sample means (1580) is equal to the population
mean. This shows that the sample mean is an unbiased estimator of the
population mean.
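The same check can be automated by enumerating every sample. The incomes below are hypothetical, since the data of Problem 2 are not reproduced here; `fractions.Fraction` keeps the arithmetic exact so that the equality is not blurred by rounding. The sketch also covers the with replacement case, using ordered samples.

```python
from fractions import Fraction
from itertools import combinations, product


def mean(values):
    """Exact arithmetic mean."""
    values = list(values)
    return Fraction(sum(values), len(values))


incomes = [900, 1100, 1250, 1400, 1600]  # hypothetical monthly incomes (Rs.)
n = 2

population_mean = mean(incomes)

# SRSWOR: all C(5, 2) = 10 equally likely samples
wor_means = [mean(s) for s in combinations(incomes, n)]
# SRSWR: all 5**2 = 25 equally likely ordered samples
wr_means = [mean(s) for s in product(incomes, repeat=n)]

print(population_mean, mean(wor_means), mean(wr_means))  # 1250 1250 1250
```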
In the same way, unbiasedness can be shown when all possible samples of
size 2 are drawn with replacement. We leave this for you to verify yourself.
E7) In Problem 2 above, enumerate all possible samples (WR) of size 2 and show
that the sample mean is an unbiased estimator of the population mean.
measures the divergence of the estimator from its expected value. If $\hat{\theta}$ is an estimator
of $\theta$ then

$$V(\hat{\theta}) = E\left[\hat{\theta} - E(\hat{\theta})\right]^2.$$
The positive square root of the sampling variance is termed the standard error (SE).
Thus, standard error is the standard deviation of the sampling distribution. It measures
the precision of the estimator particularly in view of the fluctuations due to specific
sampling design.
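Because the standard error is the standard deviation of the sampling distribution, it can be computed directly by listing all samples. The sketch below does this for a hypothetical population of four values under SRSWOR with n = 2, and compares the result with the standard SRSWOR result V(ȳ) = (σ²/n)(N − n)/(N − 1). The function name is illustrative.

```python
from fractions import Fraction
from itertools import combinations
from math import sqrt


def sampling_variance_of_mean(population, n):
    """Exact variance of y_bar over all equally likely SRSWOR samples of size n."""
    means = [Fraction(sum(s), n) for s in combinations(population, n)]
    centre = sum(means) / len(means)
    return sum((m - centre) ** 2 for m in means) / len(means)


population = [2, 4, 6, 8]  # hypothetical values
N, n = len(population), 2

var_enum = sampling_variance_of_mean(population, n)
se = sqrt(var_enum)        # standard error = SD of the sampling distribution

# standard SRSWOR result for comparison: (sigma^2 / n) * (N - n) / (N - 1)
mu = Fraction(sum(population), N)
sigma2 = sum((y - mu) ** 2 for y in population) / N
var_formula = sigma2 / n * Fraction(N - n, N - 1)

print(var_enum, var_formula, round(se, 3))  # 5/3 5/3 1.291
```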
$$\hat{V}(\bar{y}) = \frac{s^2}{n}, \qquad\text{where}\quad s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2.$$
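In practice σ² is unknown, so the variance of ȳ is estimated from the sample itself through s². A minimal sketch with hypothetical observations; the optional `N` argument is an assumption added for illustration and applies the usual finite-population factor (N − n)/N used when sampling without replacement.

```python
from math import sqrt
from statistics import variance


def estimated_var_of_mean(sample, N=None):
    """Estimate V(y_bar) by s^2 / n.

    If the population size N is given (sampling without replacement),
    multiply by the finite-population factor (N - n) / N.
    """
    n = len(sample)
    s2 = variance(sample)  # s^2 with divisor n - 1
    v = s2 / n
    return v if N is None else v * (N - n) / N


sample = [5, 9, 4]  # hypothetical observations
v = estimated_var_of_mean(sample)
print(v, sqrt(v))   # estimated variance of the mean and estimated SE
```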
To have a better understanding of the above formulas, you may try the following
exercises.
E8) Suppose we have a population of 5 students enrolled for a statistics course and a
counsellor wants to find the average amount of time spent by each student in
preparing for classes each week. The amount of time (in hours) each student
spends per week is given by 7,3,6,10 and 4. If the counsellor takes a sample
of three students WOR, obtain the sampling distribution of the sample mean.
Compute the population mean and the mean and standard error of the
sampling distribution.
E9) In the data of Problem 2, show that the sample mean square ($s^2$) is an
unbiased estimator of the population mean square ($S^2$).
Estimation of Proportion
Sometimes interest lies in estimating the population proportion. Examples such as the
proportion of persons below the poverty line, the proportion of female members in a
particular group or the proportion of persons getting degrees through distance education
are very common. In all these examples the population is considered as divided into
two parts on the basis of an attribute.
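For instance, a crop field may be irrigated or not irrigated. If it is irrigated, we say
that it possesses the characteristic 'irrigation'. If it is not irrigated, we say that it does
not possess the characteristic of irrigation. If we are interested in estimating
the proportion of irrigated fields, the population of N fields can be defined with variate
$y_i$ as

$$y_i = \begin{cases} 1, & \text{if the } i\text{th field is irrigated},\\ 0, & \text{otherwise}. \end{cases}$$

Thus,

$$\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} y_i = \frac{N_1}{N} = P = \text{proportion of irrigated fields},$$

where $N_1$ is the number of irrigated fields in the population. Similarly, for a sample of size n,

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{n_1}{n} = p,$$

where $n_1$ is the number of irrigated fields in the sample.

For SRSWR, the variance of p is

$$V(p) = \frac{PQ}{n}, \qquad Q = 1 - P.$$

For SRSWOR, the variance of p is

$$V(p) = \frac{N-n}{N-1}\cdot\frac{PQ}{n}.$$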
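The two variance formulas for p can be checked by enumerating every sample from a tiny population. The 0/1 population below (1 = irrigated) is hypothetical, and the function names are illustrative; exact `Fraction` arithmetic makes the agreement between enumeration and formula exact rather than approximate.

```python
from fractions import Fraction
from itertools import combinations, product


def exact_var_of_p(flags, n, replace):
    """Variance of p over all equally likely WR (ordered) or WOR samples."""
    samples = product(flags, repeat=n) if replace else combinations(flags, n)
    ps = [Fraction(sum(s), n) for s in samples]
    centre = sum(ps) / len(ps)  # equals P, since p is unbiased
    return sum((p - centre) ** 2 for p in ps) / len(ps)


def formula_var_of_p(P, n, N=None):
    """PQ/n for SRSWR; multiplied by (N - n)/(N - 1) for SRSWOR."""
    v = P * (1 - P) / n
    return v if N is None else v * Fraction(N - n, N - 1)


flags = [1, 0, 1, 1, 0]          # hypothetical: 1 = irrigated, 0 = not
N, n = len(flags), 2
P = Fraction(sum(flags), N)      # population proportion, 3/5

print(exact_var_of_p(flags, n, True), formula_var_of_p(P, n))      # 3/25 3/25
print(exact_var_of_p(flags, n, False), formula_var_of_p(P, n, N))  # 9/100 9/100
```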
Problem 3: Obtain the sampling distribution of the sample proportion and the
standard error of the proportion of households having monthly income more than
Rs.1550 in the population given in Problem 2 by considering simple random samples
(WOR) of two households.
Let us now work out the sampling distribution of the sample proportion by getting the
sample proportion of households with income exceeding Rs.1550 from each of the 10
possible samples listed in Table 2. In each sample we score a household as 0 if its
income is Rs.1550 or less and as 1 if it exceeds Rs.1550. Then the mean score in each
sample, shown in Table 3 below, gives the sample proportion.
Why don't you try this exercise now?
E10) The IT facility committee of IGNOU has a total of eight members whose ages in
years are 27, 32, 33, 26, 43, 52, 28 and 25. The committee has a rule which
requires a minimum age of 33 for a member to be the chairperson. Assume
that a simple random sample of size 4 is selected to provide an estimate of the
population proportion eligible to be chairperson. Find the mean and standard
deviation of the sampling distribution.
Some related questions are pertinent here. How accurately does the anthropologist
wish to know the percentage of people with blood group O? Suppose he answers that
he will be content if the percentage is correct within a tolerance limit d of ±5%. It is
also to be understood that, even with this specification of the tolerable limit of error, it is
not possible to ensure that the estimates fall within this margin in 100% of cases.
A level of confidence therefore has to be attached to the estimates. Let the confidence
level $(1 - \alpha)$ associated with the estimates be 95%.
Let us assume that p is normally distributed about P. It will then lie in the range
$(P \pm 2\sigma_p)$, where $\sigma_p$ is the standard deviation of p, apart from a one in twenty chance
(i.e. apart from a probability of $\alpha$).
In the case of SRSWR, $\sigma_p = \sqrt{PQ/n}$. Hence we can put

$$2\sqrt{\frac{PQ}{n}} = d, \qquad\text{i.e.,}\qquad n = \frac{4PQ}{d^2}.$$
At this stage some idea about P is needed in order to determine the sample size n.
Fortunately, we do not need a very accurate estimate of P for this purpose. In fact, with
d = 0.05, if P lies between 0.3 and 0.5 then the determined sample size lies between 336 and 400. To
be on the safe side, 400 may be taken as the initial estimate of n.
Note that the maximum value of PQ, with $0 \le P \le 1$ and $Q = 1 - P$, is attained at $P = 0.5$, and
its value is 0.25.
In the case of SRSWOR, for estimation of P, the formula for the sample size is given as

$$n = \frac{\dfrac{t^2 PQ}{d^2}}{1 + \dfrac{1}{N}\left(\dfrac{t^2 PQ}{d^2} - 1\right)},$$

where d is the margin of tolerable error and t is the abscissa of the normal curve that
cuts off an area of $\alpha$ at the tails, $(1-\alpha)$ being the confidence level of the estimate. If N is
large, a first approximation to n is $n_0 = \dfrac{t^2 PQ}{d^2}$.
In the example considered above the tolerable error is within 5%, so d = 0.05. We want
this at the 95% level of confidence, or in other words $\alpha = 0.05$. Assume that P =
0.5. From the standard normal distribution, the value of the variate corresponding to
a two-sided tail area of 5% is 1.96, or approximately 2, and hence

$$n_0 = \frac{2^2 \times 0.5 \times 0.5}{(0.05)^2} = 400.$$
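The arithmetic can be wrapped in a small function. This is a sketch, not a prescribed routine: `n0` implements n₀ = t²PQ/d² from the text, and `n_finite` assumes the commonly used finite-population adjustment n = n₀ / (1 + (n₀ − 1)/N) for SRSWOR. Results are left unrounded; in practice one rounds up to the next whole unit.

```python
def n0(P, d, t):
    """First approximation to the sample size for estimating a proportion."""
    Q = 1 - P
    return t ** 2 * P * Q / d ** 2


def n_finite(P, d, t, N):
    """Sample size for SRSWOR from a population of N units (assumed form)."""
    first = n0(P, d, t)
    return first / (1 + (first - 1) / N)


print(round(n0(0.5, 0.05, 2), 2))     # 400.0, the worked example
print(round(n0(0.3, 0.05, 2), 2))     # 336.0, the lower end quoted above
print(round(n0(0.5, 0.05, 1.96), 2))  # 384.16 with t = 1.96
print(round(n_finite(0.5, 0.05, 2, N=2000), 2))
```

The last line shows that for a population of 2000 units the adjustment reduces the requirement noticeably below 400.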
In simple random sampling, units are selected randomly at each draw. Now we shall
discuss a sampling technique which has the nice feature of selecting the whole sample
with just one random start.
The method is especially suitable in forestry, where it is used to select area units for
estimating the volume of timber. Other applications are in
industries, where items for sample checks are selected systematically in a production
process. The concept of systematic sampling is not confined to spatial
distributions; it can also be applied over time. In fact, in one application, the
estimation of fish catch from marine resources, boats at landing centres are sampled
systematically over time: boats arriving every two hours at selected
landing centres are observed.
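For orientation, here is a sketch of the usual linear scheme: a random start r between 1 and k, followed by every kth unit. The values N = 14 and k = 3 are chosen only for illustration, and `linear_systematic` is a hypothetical helper name. Listing all k possible samples shows that their sizes differ when N is not a multiple of k.

```python
import random


def linear_systematic(N, k, start):
    """Units start, start + k, start + 2k, ... that do not exceed N."""
    if not 1 <= start <= k:
        raise ValueError("random start must lie between 1 and k")
    return list(range(start, N + 1, k))


N, k = 14, 3
for r in range(1, k + 1):               # the k possible samples
    print(r, linear_systematic(N, k, r))
# 1 [1, 4, 7, 10, 13]
# 2 [2, 5, 8, 11, 14]
# 3 [3, 6, 9, 12]

start = random.Random(5).randint(1, k)  # one random start fixes the sample
print(linear_systematic(N, k, start))
```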
In the method described above, if N is not a multiple of n, then it may not be possible
to get samples of equal size. For example, if N = 14 and n = 3, the method
described above would lead to the following arrangement of units in an n × k table:
To overcome the difficulty of varying sample size when N ≠ nk, the
procedure is modified slightly so that a sample of constant size is always obtained.
This procedure is known as circular systematic sampling.
In this method, the N units may be regarded as arranged round a circle. A random
start is taken between 1 and N, and thereafter every kth unit, k being the integer nearest
to N/n, is selected in a circular manner until a sample of n units is chosen. Suppose
that a unit with random number i is selected. The sample will then consist of the
units corresponding to the serial numbers

$$i + jk \ \text{ if } i + jk \le N, \qquad i + jk - N \ \text{ if } i + jk > N, \qquad j = 0, 1, \ldots, n-1.$$
Let the random start be 7. Then the selected sample is 7, 10, 13, 2, 5. If we start from 9,
the selected sample is 9, 12, 1, 4, 7. In this way we can obtain 12 more samples, as
the total number of possible samples in this case is N = 14.
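The circular rule can be written with modular arithmetic: (i − 1 + jk) mod N, plus one, wraps past N back to 1. In this sketch k is taken as the integer nearest to N/n with halves rounded up, which is an assumption, since the text does not say how ties are broken. With N = 14 and n = 5 it reproduces both samples listed above.

```python
def circular_systematic(N, n, start):
    """Circular systematic sample of n units from 1..N with random start i."""
    k = int(N / n + 0.5)  # integer nearest to N/n (ties rounded up)
    return [(start - 1 + j * k) % N + 1 for j in range(n)]


N, n = 14, 5
print(circular_systematic(N, n, 7))  # [7, 10, 13, 2, 5]
print(circular_systematic(N, n, 9))  # [9, 12, 1, 4, 7]

# all N random starts give a sample of exactly n units
sizes = {len(circular_systematic(N, n, i)) for i in range(1, N + 1)}
print(sizes)                         # {5}
```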
The above method has the advantage of providing samples of the given size
irrespective of the random start. In case of linear systematic sampling the number of
possible samples is k while in case of circular systematic sampling it is N. When N is
a multiple of n then linear systematic sampling is normally preferred although one
could also go for circular systematic sampling. However, when N is not a multiple of
n then one should necessarily go for circular systematic sampling.
The systematic sampling has the nice feature of operational convenience because the
selection of the first unit determines the whole sample. This operation is easier to
understand and can be executed more speedily than simple random sampling.
Secondly, systematic samples are well spread over the population and there is no risk
that any large part of the population will be left unrepresented. For populations with
linear trend, systematic sampling is more efficient in comparison to simple random
sampling.
We now end this unit by giving a summary of what we have covered in it.
12.7 SUMMARY
12.8 SOLUTIONS/ANSWERS
E4) a) Average 558
b) Similarly make a table for all samples of size 3 with replacement.
There will be 5³ = 125 samples in all.
E5) One sequence could be 21,79,00,59,73. Similarly give another.
E6) Remainder approach: For N = 56, the highest two-digit multiple of N is N
itself. Using Appendix A, select a two-digit random number r such that 1 ≤ r ≤ 56.
r = 44 is one possibility. Dividing r by N, 44/56 gives quotient 0 and remainder
44. Thus select the unit with serial number 44. Likewise you can select the other
9 units. One such simple random sample (WOR) of 10 households
selected could be a sample of households with serial numbers
Average 1580
Total 10 1
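$$\text{Population mean} = \mu = \frac{7 + 3 + 6 + 10 + 4}{5} = \frac{30}{5} = 6.$$

$$\text{S.E.} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = \frac{2.45}{\sqrt{3}}\sqrt{\frac{5-3}{5-1}} = 1.00 \ \text{(approximately)}.$$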
$$\text{S.E.} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$$

(Using formula (5))

$$= \frac{1}{4} \times 19400 = 4850$$
Thus, from (i) and (ii),

$$E(s^2) = S^2,$$

i.e. the sample mean square provides an unbiased estimator of the population
mean square.
E10) The mean of the Sampling distribution is the population proportion P,
Source: Rao, C.R., Mitra, S.K., Matthai, A. and Ramamurthy, K.G. (1974). Formulae and Tables for Statistical
Work. Statistical Publishing Society, Indian Statistical Institute, Calcutta.