02 Simple Random Sampling
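The probability that a specified unit is selected at the $r$-th draw is:
$$p_r = \frac{N-1}{N}\cdot\frac{N-2}{N-1}\cdots\frac{N-r+1}{N-r+2}\cdot\frac{1}{N-r+1} = \frac{1}{N}$$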
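The telescoping product above (the probability that a specified unit is selected at the $r$-th draw) can be checked exactly. The following is a minimal Python sketch; the function name `prob_selected_at_draw` and the population size 10 are illustrative assumptions, not part of the notes.

```python
from fractions import Fraction


def prob_selected_at_draw(N: int, r: int) -> Fraction:
    """Probability that a specified unit is selected at the r-th draw
    when drawing one unit at a time without replacement from N units."""
    p = Fraction(1)
    # The unit must be missed on each of the first r - 1 draws ...
    for t in range(1, r):
        remaining = N - t + 1  # units still available before draw t
        p *= Fraction(remaining - 1, remaining)
    # ... and then be picked from the N - r + 1 units that remain.
    p *= Fraction(1, N - r + 1)
    return p


if __name__ == "__main__":
    N = 10  # hypothetical population size
    print([prob_selected_at_draw(N, r) for r in range(1, N + 1)])
```

Every entry of the printed list equals $1/N$, whatever the draw number.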
Lottery method:
The simplest method of selecting a simple random sample is the so-called lottery method, in
which each member of the population is identified by some means, such as a marble, a disk, a
piece of paper and the like. The identifiers are then placed in an urn or box and well mixed.
A sample of the required size is then selected. The lottery method is illustrated below by means
of an example.
Suppose we want to select $n$ candidates out of $N$. We assign the numbers 1 to $N$, one number to
each candidate, and write these numbers on $N$ slips, which are made as homogeneous as possible
in shape, size, color, etc. These slips are then put in a bag and thoroughly shuffled, and then $n$
slips are drawn one by one. The $n$ candidates corresponding to the numbers on the slips drawn
constitute a random sample.
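The slip-drawing procedure can be imitated in a few lines. This is only a sketch: Python's pseudo-random generator stands in for the physical shuffling, and the function name `lottery_sample`, the seed and the sizes used are assumptions chosen for illustration.

```python
import random


def lottery_sample(N: int, n: int, rng: random.Random) -> list[int]:
    """Draw n of N numbered slips, one by one, without replacement."""
    slips = list(range(1, N + 1))  # one slip per candidate, numbered 1..N
    rng.shuffle(slips)             # "thoroughly shuffle" the bag
    return [slips.pop() for _ in range(n)]  # draw n slips one at a time


if __name__ == "__main__":
    rng = random.Random(2024)  # arbitrary seed, only for reproducibility
    print(lottery_sample(N=50, n=5, rng=rng))
```

Because each drawn slip is removed from the bag, no candidate can appear twice in the sample.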
This method of selection is quite independent of the properties of the population. Generally,
cards are used in place of slips. We make one card correspond to one unit of the population by
writing on it the number of the unit. The pack of cards is a kind of miniature of the population
for sampling purposes. The cards are shuffled a number of times and then a card is drawn at random
from them. This is one of the most reliable methods of selecting a random sample.
Theoretically, the lottery method is free from human bias and thus ensures randomness.
However, the randomness of the lottery method depends on the assumption that the identifiers
(marbles, disks or pieces of paper) are thoroughly mixed so that the population can be regarded as
being arranged randomly. In practice, such satisfactory mixing is difficult to ensure, and thus the
use of random numbers remains the only practical option for selecting a sample.
In reality, a simple random sample is drawn unit by unit. If a list of the population units, that is, a
sampling frame is available, then the selection of a random sample may be easily accomplished
with the use of random numbers.
The units in the population are numbered from 1 to $N$. A series of random numbers between 1 and
$N$ is then drawn from the random number table, one after another. Once the first random
number is drawn, we may proceed in any direction (vertically, horizontally, diagonally)
or in any other systematic way to obtain the remaining units in the sample.
At any draw, the process used must give an equal chance of selection to every number between 1 and
$N$. The units that bear these $n$ numbers constitute our desired sample, and we
technically call these $n$ numbers a sample of size $n$.
It is important to keep in mind that, whatever procedure is used, we must ensure that the numbers
so selected are all different and that none is greater than the population size $N$.
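The selection rule just described (keep a number only if it lies between 1 and $N$ and has not been drawn before) can be sketched as follows. A pseudo-random generator stands in for the printed random number table; `digit_stream` and `select_with_rejection` are hypothetical helper names introduced only for this illustration.

```python
import random
from typing import Iterable, Iterator


def digit_stream(rng: random.Random, width: int) -> Iterator[int]:
    """Stand-in for a random number table: endless `width`-digit numbers."""
    while True:
        yield rng.randrange(10 ** width)  # 0 .. 10**width - 1, equally likely


def select_with_rejection(numbers: Iterable[int], N: int, n: int) -> list[int]:
    """Keep the first n distinct numbers that lie between 1 and N."""
    if n > N:
        raise ValueError("sample size cannot exceed population size")
    chosen: list[int] = []
    for k in numbers:
        if len(chosen) >= n:
            break
        if 1 <= k <= N and k not in chosen:  # reject 0, k > N and repeats
            chosen.append(k)
    return chosen


if __name__ == "__main__":
    rng = random.Random(7)  # arbitrary seed, only for reproducibility
    print(select_with_rejection(digit_stream(rng, width=3), N=150, n=5))
```

With three-digit numbers and $N = 150$, most of the numbers read are rejected, which is the inefficiency that modified procedures try to reduce.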
The use of a random number table involves a number of rejections, since all numbers greater than
$N$ appearing in the table are not considered for selection. The use of random numbers is,
therefore, modified, and some of these modified procedures are:
1) Remainder method.
2) Quotient method.
Remainder method:
Suppose that a simple random sample of fixed size is to be drawn from a population comprising
$N$ units. Let $N$ be an $r$-digit number and let $N'$ be the highest $r$-digit multiple of $N$. A
random number $k$ is chosen from 1 to $N'$, and the unit with the serial number equal to the
remainder obtained on dividing $k$ by $N$ is selected. The second and the subsequent units are
selected in a similar manner. If the remainder is zero, the last unit (unit $N$) is selected.
For example, suppose that a random sample of size 5 is to be selected from a population of
150 units. 150 is a 3-digit number and the highest 3-digit multiple of 150 is $150 \times 6 = 900$. A
random number 277 is chosen from 001 to 900. Dividing 277 by 150 gives the remainder 127, so the
unit labeled 127 in the population is selected.
To select the second unit, choose the next random number. This number is 130, which is less
than 150, so the remainder is 130 itself and unit 130 is our second unit in the sample.
The next random number is 802, which results in the remainder 52. The unit corresponding to this
number is our third selected unit.
Continuing this process, we arrive at the next two units. These are 108 and 91. So, the
units thus selected are 52, 91, 108, 127 and 130. Had there been any random number larger
than 900, we would have ignored it.
Note that the selection did not lead to any rejection of random numbers. That is, all of the first 5
random numbers could be used in the sample without any rejection.
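The remainder method can be sketched in Python as below. The helper names `highest_multiple` and `remainder_method` are illustrative, and skipping repeated units is an assumption made here so that the sample is drawn without replacement.

```python
def highest_multiple(N: int) -> int:
    """Highest multiple of N that has the same number of digits as N."""
    r = len(str(N))  # number of digits of N
    return ((10 ** r - 1) // N) * N


def remainder_method(random_numbers, N: int, n: int) -> list[int]:
    """Select n distinct units from 1..N using the remainder method."""
    N_prime = highest_multiple(N)
    sample: list[int] = []
    for k in random_numbers:
        if len(sample) >= n:
            break
        if not 1 <= k <= N_prime:  # numbers outside 1..N' are ignored
            continue
        unit = k % N or N          # remainder 0 means the last unit, N
        if unit not in sample:     # assumption: skip units already selected
            sample.append(unit)
    return sample


if __name__ == "__main__":
    # Random numbers taken from the worked example in the text.
    print(remainder_method([277, 130, 802, 108, 91], N=150, n=5))
    # prints [127, 130, 52, 108, 91]
```

Each unit label corresponds to exactly $N'/N$ of the admissible random numbers, so every unit keeps the same chance of selection while far fewer numbers are rejected.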
Quotient method:
Suppose that a simple random sample of fixed size is to be drawn from a population comprising
$N$ units. Let $N$ be an $r$-digit number and let $N'$ be the highest $r$-digit multiple of $N$, such that
$N'/N = q$. A random number $k$ is chosen from 0 to $N' - 1$, and the unit with the serial number
equal to the quotient plus 1, that is $\lfloor k/q \rfloor + 1$, obtained on dividing $k$ by $q$ is selected. The second and the subsequent
units are selected in a similar manner.
For example, suppose that a random sample of size 2 is to be selected from a population of
16 units. 16 is a 2-digit number and the highest 2-digit multiple of 16 is $16 \times 6 = 96$, such that
$96/16 = 6$. A random number 65 is chosen from 00 to 95. Dividing 65 by 6 gives the quotient 10. The
unit labeled $10 + 1 = 11$ in the population is selected. The second unit is selected in a similar manner.
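A sketch of the quotient method follows. It assumes one particular convention, namely that $k$ ranges over 0 to $N' - 1$ and that the selected serial number is the quotient plus 1, so that the labels run from 1 to $N$ with equal probability; other texts index the units differently. The function name `quotient_method` and the rule of skipping repeated units are likewise assumptions for illustration.

```python
def quotient_method(random_numbers, N: int, n: int) -> list[int]:
    """Select n distinct units from 1..N using the quotient method.

    Assumed convention: k in 0..N'-1 and unit = (k // q) + 1.
    """
    r = len(str(N))                     # number of digits of N
    N_prime = ((10 ** r - 1) // N) * N  # highest r-digit multiple of N
    q = N_prime // N
    sample: list[int] = []
    for k in random_numbers:
        if len(sample) >= n:
            break
        if not 0 <= k <= N_prime - 1:   # numbers outside 0..N'-1 are ignored
            continue
        unit = k // q + 1               # quotient 0..N-1 mapped to labels 1..N
        if unit not in sample:          # assumption: skip units already selected
            sample.append(unit)
    return sample


if __name__ == "__main__":
    # Count how many of the admissible numbers 0..95 lead to each of 16 units.
    counts: dict[int, int] = {}
    for k in range(96):
        unit = quotient_method([k], N=16, n=1)[0]
        counts[unit] = counts.get(unit, 0) + 1
    print(counts)
```

Under this convention each of the 16 labels is reached by exactly $q = 6$ of the 96 admissible random numbers.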
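$$Y = \sum_{i=1}^{N} y_i \quad \text{(population total)}, \qquad y = \sum_{i=1}^{n} y_i \quad \text{(sample total)}$$
$$\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} y_i = \frac{Y}{N} \quad \text{(population mean)}, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{y}{n} \quad \text{(sample mean)}$$
$$Y = N\bar{Y}, \qquad y = n\bar{y}, \qquad \hat{Y} = N\bar{y}$$
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \bar{Y}\right)^2$$
$$S^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(y_i - \bar{Y}\right)^2, \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2$$
$$(N-1)\,S^2 = N\sigma^2$$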
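To make the notation concrete, the sketch below computes the population total, mean, $\sigma^2$ and $S^2$ for a small made-up population and confirms the relation $(N-1)S^2 = N\sigma^2$. The data values and the function name `population_summary` are hypothetical; exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def population_summary(values):
    """Return (total, mean, sigma^2, S^2) of a finite population, exactly."""
    N = len(values)
    total = sum(Fraction(v) for v in values)
    mean = total / N
    ss = sum((Fraction(v) - mean) ** 2 for v in values)  # sum of squared deviations
    return total, mean, ss / N, ss / (N - 1)


if __name__ == "__main__":
    population = [3, 7, 8, 12, 15]  # hypothetical y-values, N = 5
    Y, Y_bar, sigma2, S2 = population_summary(population)
    print(Y, Y_bar, sigma2, S2)
    print((len(population) - 1) * S2 == len(population) * sigma2)
```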
Theorem:
The sample mean $\bar{y}$ for a simple random sample of size $n$ is an unbiased estimator of the
population mean $\bar{Y}$. Symbolically, $E(\bar{y}) = \bar{Y}$.
Proof:
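$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \quad\Longrightarrow\quad E(\bar{y}) = \frac{1}{n}\sum_{i=1}^{n} E(y_i)$$
$$E(y_i) = \sum_{j=1}^{N} y_j\, p(y_j) = \frac{1}{N}\sum_{j=1}^{N} y_j = \bar{Y}$$
$$E(\bar{y}) = \frac{1}{n}\sum_{i=1}^{n} \bar{Y} = \frac{n\bar{Y}}{n} = \bar{Y}$$
$$\hat{Y} = N\bar{y} \quad\Longrightarrow\quad E(\hat{Y}) = N\,E(\bar{y}) = N\bar{Y} = Y$$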
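The unbiasedness of the sample mean can be verified by brute force on a small population: under simple random sampling without replacement all $\binom{N}{n}$ samples are equally likely, so the expectation is the plain average of the sample means. The population values and the function name `expected_sample_mean` below are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations


def expected_sample_mean(values, n):
    """Average of the sample mean over all C(N, n) equally likely samples."""
    means = [Fraction(sum(s), n) for s in combinations(values, n)]
    return sum(means) / len(means)


if __name__ == "__main__":
    population = [3, 7, 8, 12, 15]  # hypothetical integer y-values
    pop_mean = Fraction(sum(population), len(population))
    for n in (1, 2, 3, 4):
        print(n, expected_sample_mean(population, n) == pop_mean)
```

Multiplying each sample mean by $N$ gives the corresponding check for the estimated total $\hat{Y} = N\bar{y}$.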
Theorem:
In simple random sampling of $n$ units without replacement from a population of $N$ units, the
variance of the sample mean is given by: $\operatorname{var}(\bar{y}) = \dfrac{N-n}{N}\cdot\dfrac{S^2}{n}$.
Proof:
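Writing $y = \sum_{i=1}^{n} y_i$ for the sample total, we have
$$\begin{aligned}
\operatorname{var}(y) &= E\left[y - E(y)\right]^2 = E\left(y - n\bar{Y}\right)^2 = E\left[\sum_{i=1}^{n} y_i - n\bar{Y}\right]^2 = E\left[\sum_{i=1}^{n}\left(y_i - \bar{Y}\right)\right]^2 \\
&= \sum_{i=1}^{n} E\left(y_i - \bar{Y}\right)^2 + \sum_{i \neq j}^{n} E\left[\left(y_i - \bar{Y}\right)\left(y_j - \bar{Y}\right)\right]
\end{aligned}$$
$$\operatorname{var}(y) = n\sigma^2 + \sum_{i \neq j}^{n} E\left[\left(y_i - \bar{Y}\right)\left(y_j - \bar{Y}\right)\right]$$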
Now, consider the second term on the right-hand side of the above equation. When the sample
is drawn without replacement, the probability of obtaining $y_k$ on the $i$-th draw is $\frac{1}{N}$, and the
probability of obtaining $y_m$ on the $j$-th draw, given that $y_k$ has been drawn, is $\frac{1}{N-1}$. Hence, the
probability of obtaining $y_k$ on the $i$-th draw and $y_m$ on the $j$-th draw is $\frac{1}{N(N-1)}$, and the
second term reduces to
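$$\begin{aligned}
E\left[\left(y_i - \bar{Y}\right)\left(y_j - \bar{Y}\right)\right] &= E(y_i y_j) - \bar{Y}E(y_i) - \bar{Y}E(y_j) + \bar{Y}^2 = E(y_i y_j) - \bar{Y}^2 \\
&= \sum_{k \neq m}^{N} y_k y_m\, p(y_k, y_m) - \bar{Y}^2 = \frac{1}{N(N-1)}\sum_{k \neq m}^{N} y_k y_m - \bar{Y}^2 \\
&= \frac{1}{N(N-1)}\left[\left(\sum_{k=1}^{N} y_k\right)^2 - \sum_{k=1}^{N} y_k^2\right] - \bar{Y}^2 \\
&= \frac{1}{N(N-1)}\left[N^2\bar{Y}^2 - \sum_{k=1}^{N} y_k^2 - N(N-1)\bar{Y}^2\right] \\
&= -\frac{1}{N(N-1)}\left[\sum_{k=1}^{N} y_k^2 - N\bar{Y}^2\right] = -\frac{1}{N(N-1)}\sum_{k=1}^{N}\left(y_k - \bar{Y}\right)^2 \\
&= -\frac{N\sigma^2}{N(N-1)} = -\frac{\sigma^2}{N-1}
\end{aligned}$$
$$\sum_{i \neq j}^{n} E\left[\left(y_i - \bar{Y}\right)\left(y_j - \bar{Y}\right)\right] = -\sum_{i \neq j}^{n}\frac{\sigma^2}{N-1} = -\frac{n(n-1)}{N-1}\sigma^2$$
$$\operatorname{var}(y) = n\sigma^2 - \frac{n(n-1)}{N-1}\sigma^2 = \frac{N-n}{N-1}\, n\sigma^2 = n\,\frac{N-n}{N-1}\cdot\frac{N-1}{N}S^2 = \frac{N-n}{N}\, nS^2$$
$$\operatorname{var}(\bar{y}) = \operatorname{var}\left(\frac{y}{n}\right) = \frac{1}{n^2}\operatorname{var}(y) = \frac{1}{n^2}\cdot\frac{N-n}{N-1}\, n\sigma^2 = \frac{N-n}{N-1}\cdot\frac{\sigma^2}{n}$$
$$\operatorname{var}(\bar{y}) = \frac{1}{n^2}\cdot\frac{N-n}{N}\, nS^2 = \frac{N-n}{N}\cdot\frac{S^2}{n}$$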
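When the sampling is done with replacement, we are left with only the first term, since
$E\left[\left(y_i - \bar{Y}\right)\left(y_j - \bar{Y}\right)\right] = 0$ for $i \neq j$, and consequently,
$$\operatorname{var}(y) = n\sigma^2 \quad\Longrightarrow\quad \operatorname{var}(\bar{y}) = \frac{\sigma^2}{n}$$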
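Both variance results, $\operatorname{var}(\bar{y}) = \frac{N-n}{N}\cdot\frac{S^2}{n}$ without replacement and $\operatorname{var}(\bar{y}) = \frac{\sigma^2}{n}$ with replacement, can be checked by enumerating every possible sample of a small population. The data and the function name `var_of_sample_mean` are hypothetical; without replacement the equally likely outcomes are the unordered samples, with replacement they are the ordered $n$-tuples.

```python
from fractions import Fraction
from itertools import combinations, product


def var_of_sample_mean(values, n, replacement=False):
    """Exact variance of the sample mean over all equally likely samples."""
    draws = product(values, repeat=n) if replacement else combinations(values, n)
    means = [Fraction(sum(s), n) for s in draws]
    centre = sum(means) / len(means)
    return sum((m - centre) ** 2 for m in means) / len(means)


if __name__ == "__main__":
    population = [3, 7, 8, 12, 15]  # hypothetical integer y-values
    N, n = len(population), 2
    mean = Fraction(sum(population), N)
    ss = sum((v - mean) ** 2 for v in population)
    sigma2, S2 = ss / N, ss / (N - 1)

    print(var_of_sample_mean(population, n) == Fraction(N - n, N) * S2 / n)
    print(var_of_sample_mean(population, n, replacement=True) == sigma2 / n)
```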
The variance of $\bar{y}$ above based on samples without replacement differs from that with
replacement by the factor $\frac{N-n}{N-1}$. In other words, $\operatorname{var}(\bar{y})$ in sampling without replacement is only
$\frac{N-n}{N-1}$ times its value in sampling with replacement.
Provided that $N$ is large compared with $n$, $\frac{N-n}{N-1} \approx 1 - \frac{n}{N} = 1 - f$, where $f = \frac{n}{N}$, and is less than 1 for
any $n$ such that $1 < n \le N$. Therefore, the variance of $\bar{y}$ without replacement is less than the
variance of $\bar{y}$ with replacement. That is, $\frac{N-n}{N-1}\cdot\frac{\sigma^2}{n} < \frac{\sigma^2}{n}$.
In other words, for the same sample size, the estimator tends to vary less around the
population characteristic under sampling without replacement than under sampling with replacement.
In this sense, sampling without replacement should be preferred over sampling with replacement.
The factor $1 - f$ is a correction factor for the finite size of the population and is called the finite
population correction (fpc). The sampling fraction $f = \frac{n}{N}$ is small when the sample is small
relative to the population. In that case, the factor $1 - f$ approaches 1 and can be ignored. In such
cases, the variance of $\bar{y}$ does not depend on $N$ and there is little or no practical difference
between the two methods.
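A short numerical illustration shows how quickly the correction factor approaches 1 as the population grows for a fixed sample size. The sample size 100 and the population sizes listed are arbitrary assumptions, and the function names are illustrative.

```python
def fpc_exact(N: int, n: int) -> float:
    """Exact ratio of var(y-bar) without replacement to that with replacement."""
    return (N - n) / (N - 1)


def fpc_approx(N: int, n: int) -> float:
    """Usual finite population correction 1 - f with f = n / N."""
    return 1 - n / N


if __name__ == "__main__":
    n = 100  # hypothetical fixed sample size
    for N in (200, 1_000, 10_000, 1_000_000):
        print(f"N={N:>9}  (N-n)/(N-1)={fpc_exact(N, n):.6f}  1-f={fpc_approx(N, n):.6f}")
```

For the largest population both factors are practically 1, which is the situation in which the correction can be ignored.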
Theorem:
In simple random sampling of $n$ units without replacement from a population of $N$ units, the
sample variance $s^2$ is an unbiased estimator of the population variance $S^2$. That is, $E(s^2) = S^2$.
Proof:
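$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2$$
$$\begin{aligned}
E(s^2) &= \frac{1}{n-1}E\left[\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2\right] = \frac{1}{n-1}E\left[\sum_{i=1}^{n}\left\{\left(y_i - \bar{Y}\right) - \left(\bar{y} - \bar{Y}\right)\right\}^2\right] \\
&= \frac{1}{n-1}\left[\sum_{i=1}^{n}E\left(y_i - \bar{Y}\right)^2 + nE\left(\bar{y} - \bar{Y}\right)^2 - 2E\left\{\left(\bar{y} - \bar{Y}\right)\sum_{i=1}^{n}\left(y_i - \bar{Y}\right)\right\}\right] \\
&= \frac{1}{n-1}\left[\sum_{i=1}^{n}E\left(y_i - \bar{Y}\right)^2 + nE\left(\bar{y} - \bar{Y}\right)^2 - 2E\left\{\left(\bar{y} - \bar{Y}\right)\left(n\bar{y} - n\bar{Y}\right)\right\}\right] \\
&= \frac{1}{n-1}\left[\sum_{i=1}^{n}E\left(y_i - \bar{Y}\right)^2 + nE\left(\bar{y} - \bar{Y}\right)^2 - 2nE\left(\bar{y} - \bar{Y}\right)^2\right] \\
&= \frac{1}{n-1}\left[\sum_{i=1}^{n}E\left(y_i - \bar{Y}\right)^2 - nE\left(\bar{y} - \bar{Y}\right)^2\right]
\end{aligned}$$
$$E(s^2) = \frac{1}{n-1}\left[n\sigma^2 - n\operatorname{var}(\bar{y})\right]$$
Now, for sampling without replacement, $\operatorname{var}(\bar{y}) = \frac{N-n}{N-1}\cdot\frac{\sigma^2}{n}$. So, we have from the above that
$$\begin{aligned}
E(s^2) &= \frac{1}{n-1}\left[n\sigma^2 - n\,\frac{N-n}{N-1}\cdot\frac{\sigma^2}{n}\right] = \frac{\sigma^2}{n-1}\left[n - \frac{N-n}{N-1}\right] = \frac{\sigma^2}{n-1}\cdot\frac{nN - n - N + n}{N-1} \\
&= \frac{\sigma^2}{n-1}\cdot\frac{N(n-1)}{N-1} = \frac{N\sigma^2}{N-1} = S^2, \quad \text{since } (N-1)S^2 = N\sigma^2
\end{aligned}$$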
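Again, for sampling with replacement, $\operatorname{var}(\bar{y}) = \frac{\sigma^2}{n}$. So, we have from the above that
$$E(s^2) = \frac{1}{n-1}\left[n\sigma^2 - n\cdot\frac{\sigma^2}{n}\right] = \frac{1}{n-1}(n-1)\sigma^2 = \sigma^2$$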
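The two expectations, $E(s^2) = S^2$ without replacement and $E(s^2) = \sigma^2$ with replacement, can be confirmed by averaging $s^2$ over every possible sample of a small population. The data and the function names `sample_variance` and `expected_s2` are assumptions made for this sketch.

```python
from fractions import Fraction
from itertools import combinations, product


def sample_variance(sample):
    """s^2 with divisor n - 1, computed exactly."""
    n = len(sample)
    mean = Fraction(sum(sample), n)
    return sum((y - mean) ** 2 for y in sample) / (n - 1)


def expected_s2(values, n, replacement=False):
    """Average of s^2 over all equally likely samples of size n."""
    draws = list(product(values, repeat=n) if replacement else combinations(values, n))
    return sum(sample_variance(s) for s in draws) / len(draws)


if __name__ == "__main__":
    population = [3, 7, 8, 12, 15]  # hypothetical integer y-values
    N = len(population)
    mean = Fraction(sum(population), N)
    ss = sum((v - mean) ** 2 for v in population)
    sigma2, S2 = ss / N, ss / (N - 1)

    print(expected_s2(population, 3) == S2)                        # without replacement
    print(expected_s2(population, 3, replacement=True) == sigma2)  # with replacement
```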
Theorem:
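The covariance between the sample means $\bar{x}$ and $\bar{y}$ in a simple random sample of $n$ units from
a population of $N$ units without replacement is: $\operatorname{cov}(\bar{x}, \bar{y}) = \frac{1-f}{n}S_{xy}$. Also, the correlation
coefficient between $\bar{x}$ and $\bar{y}$ is: $\rho_{\bar{x}\bar{y}} = \frac{S_{xy}}{S_x S_y} = \rho_{xy}$. Here,
$$S_{xy} = \frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{X}\right)\left(y_i - \bar{Y}\right) = \frac{N}{N-1}\sigma_{xy}, \qquad \sigma_{xy} = E\left[\left(x_i - \bar{X}\right)\left(y_i - \bar{Y}\right)\right]$$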
Proof:
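Let $u_i = x_i + y_i$ so that $\bar{u} = \bar{x} + \bar{y}$. The corresponding population mean of $u_i$ is $\bar{U} = \bar{X} + \bar{Y}$. So, we
have that
$$\begin{aligned}
E\left(\bar{u} - \bar{U}\right)^2 = \operatorname{var}(\bar{u}) &= \frac{1-f}{n}S_u^2 = \frac{1-f}{n}\cdot\frac{1}{N-1}\sum_{i=1}^{N}\left(u_i - \bar{U}\right)^2 \\
&= \frac{1-f}{n}\cdot\frac{1}{N-1}\sum_{i=1}^{N}\left[\left(x_i - \bar{X}\right) + \left(y_i - \bar{Y}\right)\right]^2 \\
&= \frac{1-f}{n}\left(S_x^2 + S_y^2 + 2S_{xy}\right)
\end{aligned}$$
Now, we have that
$$\begin{aligned}
E\left(\bar{u} - \bar{U}\right)^2 &= E\left[\left(\bar{x} + \bar{y}\right) - \left(\bar{X} + \bar{Y}\right)\right]^2 = E\left[\left(\bar{x} - \bar{X}\right) + \left(\bar{y} - \bar{Y}\right)\right]^2 \\
&= E\left(\bar{x} - \bar{X}\right)^2 + E\left(\bar{y} - \bar{Y}\right)^2 + 2E\left[\left(\bar{x} - \bar{X}\right)\left(\bar{y} - \bar{Y}\right)\right] \\
&= \operatorname{var}(\bar{x}) + \operatorname{var}(\bar{y}) + 2\operatorname{cov}(\bar{x}, \bar{y}) \\
&= \frac{1-f}{n}\left(S_x^2 + S_y^2\right) + 2\operatorname{cov}(\bar{x}, \bar{y})
\end{aligned}$$
So, we have from the above that
$$\frac{1-f}{n}\left(S_x^2 + S_y^2\right) + 2\operatorname{cov}(\bar{x}, \bar{y}) = \frac{1-f}{n}\left(S_x^2 + S_y^2 + 2S_{xy}\right)$$
$$\operatorname{cov}(\bar{x}, \bar{y}) = \frac{1-f}{n}S_{xy}$$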
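Again, we have that
$$\rho_{\bar{x}\bar{y}} = \frac{\operatorname{cov}(\bar{x}, \bar{y})}{\sqrt{\operatorname{var}(\bar{x})\operatorname{var}(\bar{y})}} = \frac{\frac{1-f}{n}S_{xy}}{\sqrt{\frac{1-f}{n}S_x^2\cdot\frac{1-f}{n}S_y^2}} = \frac{S_{xy}}{S_x S_y} = \rho_{xy}$$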
Thus, in simple random sampling, the correlation coefficient between the sample means is
independent of the sample size and is equal to the correlation between individual observations.
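The covariance result $\operatorname{cov}(\bar{x}, \bar{y}) = \frac{1-f}{n}S_{xy}$ can also be checked by enumerating all samples drawn without replacement from a small bivariate population. The paired values and the function names below are hypothetical and serve only as an illustration.

```python
from fractions import Fraction
from itertools import combinations


def S_xy(xs, ys):
    """Population covariance with divisor N - 1."""
    N = len(xs)
    X_bar, Y_bar = Fraction(sum(xs), N), Fraction(sum(ys), N)
    return sum((x - X_bar) * (y - Y_bar) for x, y in zip(xs, ys)) / (N - 1)


def cov_of_sample_means(xs, ys, n):
    """Exact cov(x-bar, y-bar) over all C(N, n) samples without replacement."""
    samples = list(combinations(range(len(xs)), n))  # samples as index sets
    x_bars = [Fraction(sum(xs[i] for i in s), n) for s in samples]
    y_bars = [Fraction(sum(ys[i] for i in s), n) for s in samples]
    ex = sum(x_bars) / len(samples)
    ey = sum(y_bars) / len(samples)
    return sum((a - ex) * (b - ey) for a, b in zip(x_bars, y_bars)) / len(samples)


if __name__ == "__main__":
    xs = [1, 2, 4, 5, 8]     # hypothetical x-values
    ys = [3, 7, 8, 12, 15]   # hypothetical y-values on the same units
    N = len(xs)
    for n in (2, 3, 4):
        f = Fraction(n, N)
        print(n, cov_of_sample_means(xs, ys, n) == (1 - f) / n * S_xy(xs, ys))
```

Since the same factor $\frac{1-f}{n}$ multiplies both variances and the covariance, it cancels in the correlation coefficient, which is why the correlation does not depend on $n$.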
Relative error of the estimators:
In sampling theory, standard errors serve as absolute measures of the precision of sample
estimators. The errors in the sample estimators can also be assessed in relative terms. We call
this relative measure the coefficient of variation. In simple random sampling, the coefficient of
variation of the sample mean is given by:
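$$CV(\bar{y}) = \frac{\sqrt{\operatorname{var}(\bar{y})}}{\bar{Y}} = \frac{\sqrt{\frac{1-f}{n}S_y^2}}{\bar{Y}} = \frac{S_y}{\bar{Y}}\sqrt{\frac{1-f}{n}} = CV(y)\sqrt{\frac{1-f}{n}}$$
$$\left[CV(\bar{y})\right]^2 = \frac{1-f}{n}\left[CV(y)\right]^2 \approx \frac{1}{n}\left[CV(y)\right]^2 \quad \text{for large } N$$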
y
Again, we have that
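$$CV(\hat{Y}) = \frac{\sqrt{\operatorname{var}(\hat{Y})}}{Y} = \frac{\sqrt{\operatorname{var}(N\bar{y})}}{N\bar{Y}} = \frac{N\sqrt{\operatorname{var}(\bar{y})}}{N\bar{Y}} = \frac{\sqrt{\operatorname{var}(\bar{y})}}{\bar{Y}} = CV(\bar{y})$$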
So, we can say that in simple random sampling, the coefficient of variation of an estimated total
is the same as that of an estimated mean.
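As a small numerical illustration, the sketch below computes the coefficient of variation of the sample mean and of the estimated total for a made-up population and shows that the two coincide. The data, the sample size and the function names are assumptions introduced for this example.

```python
import math


def _mean_and_S2(values):
    """Population mean and S^2 (divisor N - 1)."""
    N = len(values)
    mean = sum(values) / N
    S2 = sum((v - mean) ** 2 for v in values) / (N - 1)
    return mean, S2


def cv_sample_mean(values, n):
    """CV of y-bar under simple random sampling without replacement."""
    N = len(values)
    mean, S2 = _mean_and_S2(values)
    var_mean = (1 - n / N) * S2 / n
    return math.sqrt(var_mean) / mean


def cv_estimated_total(values, n):
    """CV of the estimated total N * y-bar under the same design."""
    N = len(values)
    _, S2 = _mean_and_S2(values)
    var_total = N ** 2 * (1 - n / N) * S2 / n  # var(N * y-bar) = N^2 var(y-bar)
    return math.sqrt(var_total) / sum(values)


if __name__ == "__main__":
    population = [3, 7, 8, 12, 15]  # hypothetical y-values
    print(cv_sample_mean(population, 2), cv_estimated_total(population, 2))
```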