Chapter4 Sampling Stratified Sampling
Stratified Sampling
An important objective in any estimation problem is to obtain an estimator of a population parameter
that can take care of the salient features of the population. If the population is homogeneous with
respect to the characteristic under study, then a sample drawn by simple random sampling (SRS) is
expected to be representative, and in turn, the sample mean will serve as a good estimator of the
population mean.
Moreover, the variance of the sample mean depends not only on the sample size and the sampling
fraction but also on the population variance. To increase the precision of an estimator, we need a
sampling scheme that reduces the effect of heterogeneity in the population. If the population is
heterogeneous with respect to the characteristic under study, then one such sampling procedure is
stratified sampling.
Example: Suppose we want to find the average height of the students in a school with classes 1 to 12.
The height varies a lot, since the students in class 1 are around 6 years old while the students in
class 10 are around 16 years old. So, one can divide all the students into different subpopulations or
strata, such as
Students of classes 1, 2, and 3: Stratum 1
Students of classes 4, 5, and 6: Stratum 2
Students of classes 7, 8, and 9: Stratum 3
Students of classes 10, 11, and 12: Stratum 4
Now draw the samples by SRS from each of the strata 1, 2, 3 and 4. All the drawn samples combined
together will constitute the final stratified sample for further analysis.
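The four-strata example above can be sketched in code. This is only an illustration and not part of the original notes: the student IDs, the stratum sizes, the sample sizes, and the helper name `draw_stratified_sample` are all assumptions made for the example.

```python
import random


def draw_stratified_sample(strata, sizes, seed=None):
    """Draw an SRSWOR sample of sizes[name] units from every stratum."""
    rng = random.Random(seed)
    # random.Random.sample draws without replacement within a stratum.
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}


# Hypothetical sampling frame: student IDs grouped into the four strata.
strata = {
    1: list(range(0, 300)),     # classes 1, 2, 3
    2: list(range(300, 580)),   # classes 4, 5, 6
    3: list(range(580, 830)),   # classes 7, 8, 9
    4: list(range(830, 1000)),  # classes 10, 11, 12
}
sizes = {1: 30, 2: 28, 3: 25, 4: 17}  # assumed sample sizes per stratum

sample = draw_stratified_sample(strata, sizes, seed=1)
# The final stratified sample is the union of the per-stratum samples.
final_sample = [unit for units in sample.values() for unit in units]
```

Each stratum is sampled independently, so the same function works for any number of strata.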
Sampling Theory| Chapter 4 | Stratified Sampling | Shalabh, IIT Kanpur
Notations:
We use the following symbols and notations:
N : Population size
k : Number of strata
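$N_i$ : Number of sampling units in the $i$th stratum, so that $N = \sum_{i=1}^{k} N_i$
$n_i$ : Number of sampling units drawn from the $i$th stratum, so that $n = \sum_{i=1}^{k} n_i$
[Figure: a population of $N$ units is divided into strata $1, 2, \ldots, k$, and samples of $n_1, n_2, \ldots, n_k$ units are drawn from the respective strata.]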
• Strata are constructed such that they are non-overlapping and homogeneous with respect to the
characteristic under study, such that $\sum_{i=1}^{k} N_i = N$.
• Draw a sample of size $n_i$ from the $i$th ($i = 1, 2, \ldots, k$) stratum using SRS (preferably WOR).
In cluster sampling, the clusters are constructed such that they are
• within heterogeneous and
• among homogeneous.
[Note: We discuss the cluster sampling later.]
Note that there are $k$ independent samples drawn through SRS of sizes $n_1, n_2, \ldots, n_k$ from each of the
strata. So, one can have $k$ estimators of a parameter based on the samples of sizes $n_1, n_2, \ldots, n_k$, respectively. Our
interest is not to have $k$ different estimators of the parameter; the ultimate goal is to have a single
estimator. In this case, an important issue is how to combine the different sample information together
into one estimator, which is good enough to provide information about the parameter.
We now consider the estimation of the population mean and population variance from a stratified sample.
$\bar{y}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} y_{ij}$ : sample mean from the $i$th stratum
$\bar{Y} = \frac{1}{N} \sum_{i=1}^{k} N_i \bar{Y}_i = \sum_{i=1}^{k} w_i \bar{Y}_i$ : population mean, where $w_i = \frac{N_i}{N}$.
unbiased estimator of $\bar{Y}$. Consider the stratum mean, which is defined as the weighted arithmetic mean
of strata sample means with strata sizes as weights given by
$$\bar{y}_{st} = \frac{1}{N} \sum_{i=1}^{k} N_i \bar{y}_i .$$
Now
$$\begin{aligned}
E(\bar{y}_{st}) &= \frac{1}{N} \sum_{i=1}^{k} N_i E(\bar{y}_i) \\
&= \frac{1}{N} \sum_{i=1}^{k} N_i \bar{Y}_i \\
&= \bar{Y}.
\end{aligned}$$
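As a small illustration of the estimator $\bar{y}_{st} = \frac{1}{N}\sum_i N_i \bar{y}_i$, the sketch below computes it from per-stratum samples. The function name `stratified_mean` and the numbers are made up for the example.

```python
def stratified_mean(N_sizes, samples):
    """Weighted mean of stratum sample means, with stratum sizes N_i as weights."""
    N = sum(N_sizes)
    total = 0.0
    for N_i, y_i in zip(N_sizes, samples):
        ybar_i = sum(y_i) / len(y_i)  # sample mean of the ith stratum
        total += N_i * ybar_i
    return total / N


# Hypothetical stratum sizes and observed samples.
N_sizes = [300, 200, 500]
samples = [[10, 12, 14], [20, 22], [30, 31, 32, 35]]
ybar_st = stratified_mean(N_sizes, samples)
```

Here the stratum sample means are 12, 21 and 32, so the estimate is (300·12 + 200·21 + 500·32)/1000 = 23.8.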
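Variance of $\bar{y}_{st}$
$$Var(\bar{y}_{st}) = \sum_{i=1}^{k} w_i^2 Var(\bar{y}_i) + \sum_{i=1}^{k} \sum_{j (\ne i) = 1}^{k} w_i w_j Cov(\bar{y}_i, \bar{y}_j).$$
Since all the samples have been drawn independently from each of the strata by SRSWOR,
$$Cov(\bar{y}_i, \bar{y}_j) = 0, \quad i \ne j,$$
$$Var(\bar{y}_i) = \frac{N_i - n_i}{N_i n_i} S_i^2$$
where
$$S_i^2 = \frac{1}{N_i - 1} \sum_{j=1}^{N_i} (Y_{ij} - \bar{Y}_i)^2 .$$
Thus
$$\begin{aligned}
Var(\bar{y}_{st}) &= \sum_{i=1}^{k} w_i^2 \frac{N_i - n_i}{N_i n_i} S_i^2 \\
&= \sum_{i=1}^{k} w_i^2 \left(1 - \frac{n_i}{N_i}\right) \frac{S_i^2}{n_i}.
\end{aligned}$$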
Observe that $Var(\bar{y}_{st})$ is small when $S_i^2$ is small. This observation suggests how to construct the strata:
if $S_i^2$ is small for all $i = 1, 2, \ldots, k$, then $Var(\bar{y}_{st})$ will also be small.
The total variation in the population is fixed and can be orthogonally partitioned into between and
within strata variations, i.e.,
Total variation = Between strata variation + Within strata variation ($S_i^2$).
Since the total is fixed, making the $S_i^2$ small necessarily makes the “Between strata variation” large. That is why it was
mentioned earlier that the strata are to be constructed such that they are within homogeneous (i.e., $S_i^2$
is small) and among heterogeneous (“Between strata variation” is large).
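Estimate of Variance
Since the samples have been drawn by SRSWOR,
$$E(s_i^2) = S_i^2$$
where $s_i^2 = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2$
and $\widehat{Var}(\bar{y}_i) = \frac{N_i - n_i}{N_i n_i} s_i^2$,
so
$$\begin{aligned}
\widehat{Var}(\bar{y}_{st}) &= \sum_{i=1}^{k} w_i^2 \widehat{Var}(\bar{y}_i) \\
&= \sum_{i=1}^{k} w_i^2 \frac{N_i - n_i}{N_i n_i} s_i^2 .
\end{aligned}$$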
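The variance estimate $\sum_i w_i^2 \frac{N_i - n_i}{N_i n_i} s_i^2$ under SRSWOR can be computed directly from the per-stratum samples. The following is a minimal sketch; the function name and the data are illustrative assumptions, and every stratum sample is assumed to contain at least two units so that $s_i^2$ is defined.

```python
def estimated_var_stratified_mean(N_sizes, samples):
    """Estimate Var(ybar_st) under SRSWOR within each stratum."""
    N = sum(N_sizes)
    var = 0.0
    for N_i, y_i in zip(N_sizes, samples):
        n_i = len(y_i)
        ybar_i = sum(y_i) / n_i
        # Sample mean square s_i^2 with divisor n_i - 1.
        s2_i = sum((y - ybar_i) ** 2 for y in y_i) / (n_i - 1)
        w_i = N_i / N
        var += w_i ** 2 * (N_i - n_i) / (N_i * n_i) * s2_i
    return var


# Hypothetical stratum sizes and observed samples.
N_sizes = [300, 200, 500]
samples = [[10, 12, 14], [20, 22], [30, 31, 32, 35]]
var_hat = estimated_var_stratified_mean(N_sizes, samples)
```

When a stratum is completely enumerated ($n_i = N_i$), its term vanishes because of the factor $N_i - n_i$.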
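Note: If SRSWR is used instead of SRSWOR for drawing the samples from each stratum, then in this
case
$$\bar{y}_{st} = \sum_{i=1}^{k} w_i \bar{y}_i$$
$$E(\bar{y}_{st}) = \bar{Y}$$
$$Var(\bar{y}_{st}) = \sum_{i=1}^{k} w_i^2 \frac{N_i - 1}{N_i n_i} S_i^2 = \sum_{i=1}^{k} w_i^2 \frac{\sigma_i^2}{n_i}$$
$$\widehat{Var}(\bar{y}_{st}) = \sum_{i=1}^{k} \frac{w_i^2 s_i^2}{n_i}$$
where $\sigma_i^2 = \frac{1}{N_i} \sum_{j=1}^{N_i} (Y_{ij} - \bar{Y}_i)^2$.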
effectively?
There are two aspects of choosing the sample sizes:
(i) Minimize the cost of the survey for a specified precision.
(ii) Maximize the precision for a given cost.
Note: The sample size cannot be determined by minimizing both the cost and variability
simultaneously. The cost function is directly proportional to the sample size, whereas variability is
inversely proportional to the sample size.
Based on different ideas, some allocation procedures are as follows:
1. Equal allocation
Choose the sample size ni to be the same for all the strata.
2. Proportional allocation
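For fixed $k$, select $n_i$ such that it is proportional to the stratum size $N_i$, i.e.,
$$n_i \propto N_i \quad \text{or} \quad n_i = C N_i$$
where $C$ is the constant of proportionality. Then
$$\sum_{i=1}^{k} n_i = \sum_{i=1}^{k} C N_i \quad \text{or} \quad n = CN \quad \text{or} \quad C = \frac{n}{N}.$$
Thus $n_i = \frac{n}{N} N_i$.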
Such allocation arises from considerations like operational convenience.
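Proportional allocation $n_i = \frac{n}{N} N_i$ is a one-line computation. The sketch below uses an assumed function name and made-up stratum sizes; in practice the resulting values usually have to be rounded to integers, which this sketch does not attempt.

```python
def proportional_allocation(N_sizes, n):
    """Return n_i = n * N_i / N for each stratum (not rounded)."""
    N = sum(N_sizes)
    return [n * N_i / N for N_i in N_sizes]


# Hypothetical stratum sizes with a total sample size of 100.
allocation = proportional_allocation([300, 200, 500], 100)
```

With these numbers the sampling fraction is 0.1 in every stratum, giving allocations of 30, 20 and 50 units.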
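$$\sum_{i=1}^{k} n_i = \sum_{i=1}^{k} C^* N_i S_i \quad \text{or} \quad n = C^* \sum_{i=1}^{k} N_i S_i \quad \text{or} \quad C^* = \frac{n}{\sum_{i=1}^{k} N_i S_i}.$$
Thus $n_i = \frac{n N_i S_i}{\sum_{i=1}^{k} N_i S_i}$.
This allocation arises when $Var(\bar{y}_{st})$ is minimized subject to the constraint $\sum_{i=1}^{k} n_i = n$ (prespecified).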
There are some limitations to the optimum allocation. The knowledge of $S_i$ ($i = 1, 2, \ldots, k$) is needed to
determine $n_i$. If more than one characteristic is under study, the characteristics may lead to conflicting allocations.
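The optimum allocation $n_i = n N_i S_i / \sum_i N_i S_i$ (commonly called Neyman allocation) can be sketched as follows. The function name and the values of $N_i$ and $S_i$ are assumptions for illustration; as noted above, the $S_i$ have to be known or guessed in advance.

```python
def optimum_allocation(N_sizes, S, n):
    """Return n_i proportional to N_i * S_i, scaled so the n_i sum to n."""
    products = [N_i * S_i for N_i, S_i in zip(N_sizes, S)]
    total = sum(products)
    return [n * p / total for p in products]


# Hypothetical stratum sizes and stratum standard deviations.
allocation = optimum_allocation([300, 200, 500], [2, 4, 1], 190)
```

Here $N_i S_i$ equals 600, 800 and 500, so a total sample of 190 is split as 60, 80 and 50: the small but highly variable second stratum receives the largest share.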
Choice of sample size based on the cost of the survey and variability
The cost of the survey depends upon the nature of the survey. A simple choice of the cost function is
$$C = C_0 + \sum_{i=1}^{k} C_i n_i$$
where
$C$ : total cost
$C_0$ : overhead cost, e.g., setting up the office, training people, etc.
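To find $n_i$ under this cost function, consider the Lagrangian function with a Lagrangian
multiplier $\lambda$ as
$$\begin{aligned}
\phi &= \sum_{i=1}^{k} \frac{w_i^2 S_i^2}{n_i} + \lambda^2 \sum_{i=1}^{k} C_i n_i - \sum_{i=1}^{k} \frac{w_i^2 S_i^2}{N_i} \\
&= \sum_{i=1}^{k} \left[ \frac{w_i S_i}{\sqrt{n_i}} - \lambda \sqrt{C_i n_i} \right]^2 + \text{terms independent of } n_i .
\end{aligned}$$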
How to determine $\lambda$?
There are two ways to determine $\lambda$.
(i) Minimize variability for a fixed cost.
(ii) Minimize cost for given variability.
We consider both cases.
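$$\text{or} \quad \lambda = \frac{\sum_{i=1}^{k} \sqrt{C_i}\, w_i S_i}{C_0^*} .$$
Substituting $\lambda$ in the expression for $n_i = \frac{1}{\lambda} \frac{w_i S_i}{\sqrt{C_i}}$, the optimum $n_i$ is obtained as
$$n_i^* = \frac{w_i S_i}{\sqrt{C_i}} \cdot \frac{C_0^*}{\sum_{i=1}^{k} \sqrt{C_i}\, w_i S_i} .$$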
$$n_i = \frac{w_i S_i}{\sqrt{C_i}} \cdot \frac{\sum_{i=1}^{k} w_i S_i \sqrt{C_i}}{V_0 + \sum_{i=1}^{k} \frac{w_i^2 S_i^2}{N_i}} .$$
So the required sample size to estimate $\bar{Y}$ such that the cost $C$ is the minimum for a
prespecified variance $V_0$ is $n = \sum_{i=1}^{k} n_i$.
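The fixed-cost formula $n_i^* = \frac{w_i S_i}{\sqrt{C_i}} \cdot \frac{C_0^*}{\sum_i \sqrt{C_i}\, w_i S_i}$ can be checked numerically. That formula implies $\sum_i C_i n_i^* = C_0^*$, so the sketch below treats its `budget` argument (standing for $C_0^*$) as the amount available for per-unit sampling costs; the function name and all numbers are assumptions for illustration.

```python
import math


def cost_optimal_allocation(w, S, unit_costs, budget):
    """Allocate n_i proportional to w_i * S_i / sqrt(C_i) so that sum(C_i * n_i) == budget."""
    denom = sum(math.sqrt(c) * w_i * s for w_i, s, c in zip(w, S, unit_costs))
    return [budget * (w_i * s / math.sqrt(c)) / denom
            for w_i, s, c in zip(w, S, unit_costs)]


# Hypothetical stratum weights, standard deviations and per-unit costs.
w = [0.3, 0.2, 0.5]
S = [2.0, 4.0, 1.0]
unit_costs = [4.0, 9.0, 1.0]
allocation = cost_optimal_allocation(w, S, unit_costs, budget=1000.0)
spent = sum(c * n_i for c, n_i in zip(unit_costs, allocation))
```

Strata that are large, variable, or cheap to sample receive more units, and the total sampling cost `spent` equals the budget up to floating-point error.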
Sample size under proportional allocation for fixed cost and for fixed variance
(i) If cost $C = C_0$ is fixed, then $C_0 = \sum_{i=1}^{k} C_i n_i$.
Under proportional allocation, $n_i = \frac{n}{N} N_i = n w_i$.
So $C_0 = n \sum_{i=1}^{k} w_i C_i$ or $n = \frac{C_0}{\sum_{i=1}^{k} w_i C_i}$. Thus $n_i = \frac{C_0 w_i}{\sum_{i=1}^{k} w_i C_i}$.
The required sample size to estimate $\bar{Y}$ in this case is $n = \sum_{i=1}^{k} n_i$.
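$$\text{or} \quad n = \frac{\sum_{i=1}^{k} w_i S_i^2}{V_0 + \sum_{i=1}^{k} \frac{w_i^2 S_i^2}{N_i}}$$
$$\text{or} \quad n_i = w_i \frac{\sum_{i=1}^{k} w_i S_i^2}{V_0 + \sum_{i=1}^{k} \frac{w_i^2 S_i^2}{N_i}} .$$
This is known as Bowley’s allocation.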
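$$\begin{aligned}
Var_{prop}(\bar{y}_{st}) &= \sum_{i=1}^{k} \frac{N_i - \frac{n}{N} N_i}{N_i \cdot \frac{n}{N} N_i} \left( \frac{N_i}{N} \right)^2 S_i^2 \\
&= \frac{N - n}{Nn} \sum_{i=1}^{k} \frac{N_i S_i^2}{N} \\
&= \frac{N - n}{Nn} \sum_{i=1}^{k} w_i S_i^2 .
\end{aligned}$$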
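With $n_i = \frac{n N_i S_i}{\sum_{i=1}^{k} N_i S_i}$,
$$\begin{aligned}
Var_{opt}(\bar{y}_{st}) &= \sum_{i=1}^{k} \left( \frac{1}{n_i} - \frac{1}{N_i} \right) w_i^2 S_i^2 \\
&= \sum_{i=1}^{k} \frac{w_i^2 S_i^2}{n_i} - \sum_{i=1}^{k} \frac{w_i^2 S_i^2}{N_i} \\
&= \sum_{i=1}^{k} \left[ w_i^2 S_i^2 \frac{\sum_{i=1}^{k} N_i S_i}{n N_i S_i} \right] - \sum_{i=1}^{k} \frac{w_i^2 S_i^2}{N_i} \\
&= \sum_{i=1}^{k} \left[ \frac{1}{n} \cdot \frac{N_i S_i}{N^2} \sum_{i=1}^{k} N_i S_i \right] - \sum_{i=1}^{k} \frac{w_i^2 S_i^2}{N_i} \\
&= \frac{1}{n} \left( \sum_{i=1}^{k} \frac{N_i S_i}{N} \right)^2 - \sum_{i=1}^{k} \frac{w_i^2 S_i^2}{N_i} \\
&= \frac{1}{n} \left( \sum_{i=1}^{k} w_i S_i \right)^2 - \frac{1}{N} \sum_{i=1}^{k} w_i S_i^2 .
\end{aligned}$$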
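In order to compare $Var_{SRS}(\bar{y})$ and $Var_{prop}(\bar{y}_{st})$, first we attempt to express $S^2$ as a function of $S_i^2$.
Consider
$$\begin{aligned}
(N - 1) S^2 &= \sum_{i=1}^{k} \sum_{j=1}^{N_i} (Y_{ij} - \bar{Y})^2 \\
&= \sum_{i=1}^{k} \sum_{j=1}^{N_i} \left[ (Y_{ij} - \bar{Y}_i) + (\bar{Y}_i - \bar{Y}) \right]^2 \\
&= \sum_{i=1}^{k} \sum_{j=1}^{N_i} (Y_{ij} - \bar{Y}_i)^2 + \sum_{i=1}^{k} \sum_{j=1}^{N_i} (\bar{Y}_i - \bar{Y})^2 \\
&= \sum_{i=1}^{k} (N_i - 1) S_i^2 + \sum_{i=1}^{k} N_i (\bar{Y}_i - \bar{Y})^2 .
\end{aligned}$$
Take $\frac{N_i - 1}{N_i} \approx 1$ and $\frac{N - 1}{N} \approx 1$.
Thus
$$S^2 = \sum_{i=1}^{k} \frac{N_i}{N} S_i^2 + \sum_{i=1}^{k} \frac{N_i}{N} (\bar{Y}_i - \bar{Y})^2$$
or
$$\frac{N - n}{Nn} S^2 = \frac{N - n}{Nn} \sum_{i=1}^{k} \frac{N_i}{N} S_i^2 + \frac{N - n}{Nn} \sum_{i=1}^{k} \frac{N_i}{N} (\bar{Y}_i - \bar{Y})^2 \quad \left( \text{premultiply by } \frac{N - n}{Nn} \text{ on both sides} \right)$$
$$Var_{SRS}(\bar{y}) = Var_{prop}(\bar{y}_{st}) + \frac{N - n}{Nn} \sum_{i=1}^{k} w_i (\bar{Y}_i - \bar{Y})^2 .$$
Since $\sum_{i=1}^{k} w_i (\bar{Y}_i - \bar{Y})^2 \ge 0$, it follows that $Var_{prop}(\bar{y}_{st}) \le Var_{SRS}(\bar{y})$.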
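Consider
$$\begin{aligned}
Var_{prop}(\bar{y}_{st}) - Var_{opt}(\bar{y}_{st}) &= \left[ \frac{N - n}{Nn} \sum_{i=1}^{k} w_i S_i^2 \right] - \left[ \frac{1}{n} \left( \sum_{i=1}^{k} w_i S_i \right)^2 - \frac{1}{N} \sum_{i=1}^{k} w_i S_i^2 \right] \\
&= \frac{1}{n} \left[ \sum_{i=1}^{k} w_i S_i^2 - \left( \sum_{i=1}^{k} w_i S_i \right)^2 \right] \\
&= \frac{1}{n} \sum_{i=1}^{k} w_i S_i^2 - \frac{1}{n} \bar{S}^2 \\
&= \frac{1}{n} \sum_{i=1}^{k} w_i (S_i - \bar{S})^2
\end{aligned}$$
where $\bar{S} = \sum_{i=1}^{k} w_i S_i$, and the larger gain in efficiency is achieved when $S_i$ differs from $\bar{S}$ more.
Combining the results in (a) and (b), we have $Var_{opt}(\bar{y}_{st}) \le Var_{prop}(\bar{y}_{st}) \le Var_{SRS}(\bar{y})$.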
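The ordering $Var_{opt} \le Var_{prop} \le Var_{SRS}$ can be checked on a small numerical example. The sketch below uses made-up stratum sizes, standard deviations and means; the function name is an assumption, and $S^2$ is obtained from the exact decomposition $(N-1)S^2 = \sum_i (N_i - 1) S_i^2 + \sum_i N_i (\bar{Y}_i - \bar{Y})^2$.

```python
def compare_variances(N_sizes, S, means, n):
    """Return (Var_SRS, Var_prop, Var_opt) for given population quantities."""
    N = sum(N_sizes)
    w = [N_i / N for N_i in N_sizes]
    fpc = (N - n) / (N * n)
    sum_wS2 = sum(w_i * s ** 2 for w_i, s in zip(w, S))
    S_bar = sum(w_i * s for w_i, s in zip(w, S))
    Y_bar = sum(w_i * m for w_i, m in zip(w, means))
    within = sum((N_i - 1) * s ** 2 for N_i, s in zip(N_sizes, S))
    between = sum(N_i * (m - Y_bar) ** 2 for N_i, m in zip(N_sizes, means))
    S2 = (within + between) / (N - 1)
    var_srs = fpc * S2
    var_prop = fpc * sum_wS2
    var_opt = S_bar ** 2 / n - sum_wS2 / N
    return var_srs, var_prop, var_opt


# Hypothetical population: three strata with well separated means.
var_srs, var_prop, var_opt = compare_variances(
    [300, 200, 500], [2.0, 4.0, 1.0], [10.0, 20.0, 30.0], n=100
)
```

In this example the stratum means differ a lot, so most of the gain comes from stratification itself, while the step from proportional to optimum allocation equals $\frac{1}{n} \sum_i w_i (S_i - \bar{S})^2$.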
$$s_i^2 = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 .$$
In stratified sampling,
$$Var(\bar{y}_{st}) = \sum_{i=1}^{k} w_i^2 \frac{N_i - n_i}{N_i n_i} S_i^2 .$$
assuming $\bar{y}_{st}$ is normally distributed and $\widehat{Var}(\bar{y}_{st})$ is well determined so that $t$ can be read from
normal distribution tables. If only a few degrees of freedom are provided by each stratum, then $t$ values
are obtained from the table of Student’s t-distribution.
The distribution of $\widehat{Var}(\bar{y}_{st})$ is generally complex. An approximate method of assigning an effective
where $g_i = \frac{N_i (N_i - n_i)}{n_i}$ and $\min(n_i - 1) \le n_e \le \sum_{i=1}^{k} (n_i - 1)$, assuming $y_{ij}$ are normally distributed.
$$n_1 = N_1$$
and
$$n_i = \frac{(n - N_1) w_i S_i}{\sum_{i=2}^{k} w_i S_i} ; \quad i = 2, 3, \ldots, k .$$
Suppose in the revised allocation we find that $n_2 > N_2$; then the revised allocation would be
$$n_1 = N_1$$
$$n_2 = N_2$$
$$n_i = \frac{(n - N_1 - N_2) w_i S_i}{\sum_{i=3}^{k} w_i S_i} ; \quad i = 3, 4, \ldots, k .$$
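The revision step above can be repeated until no stratum is allocated more units than it contains. The sketch below is one way to code that idea and rests on a few assumptions: it fixes every over-allocated stratum in a round at once (rather than one at a time), it works with $N_i S_i$, which gives the same proportions as $w_i S_i$, and it assumes $n \le N$ and $S_i > 0$. The function name and the data are hypothetical.

```python
def optimum_allocation_with_cap(N_sizes, S, n):
    """Optimum allocation in which no n_i is allowed to exceed N_i."""
    if n > sum(N_sizes):
        raise ValueError("total sample size cannot exceed the population size")
    alloc = [None] * len(N_sizes)
    free = set(range(len(N_sizes)))  # strata not yet fixed at n_i = N_i
    remaining = n                    # sample size left for the free strata
    while True:
        total = sum(N_sizes[i] * S[i] for i in free)
        trial = {i: remaining * N_sizes[i] * S[i] / total for i in free}
        over = [i for i in free if trial[i] > N_sizes[i]]
        if not over:
            for i in free:
                alloc[i] = trial[i]
            return alloc
        for i in over:
            # Take the whole stratum and re-allocate the rest.
            alloc[i] = N_sizes[i]
            free.discard(i)
            remaining -= N_sizes[i]


# Hypothetical example: the first stratum is tiny but extremely variable.
allocation = optimum_allocation_with_cap([10, 100, 200], [50.0, 2.0, 1.0], 60)
```

In this example the unrestricted formula would assign about 33 units to a stratum of only 10, so that stratum is fully enumerated and the remaining 50 units are split between the other two strata.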
In such cases, the formula for the minimum variance of $\bar{y}_{st}$ needs to be modified as
$$\text{Min } Var(\bar{y}_{st}) = \frac{\left( \sum^* w_i S_i \right)^2}{n^*} - \frac{\sum^* w_i S_i^2}{N}$$
where $\sum^*$ denotes the summation over the strata in which $n_i < N_i$ and $n^*$ is the revised total sample size
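where $Q_i = 1 - P_i$.
Also $Var(\bar{y}_{st}) = \sum_{i=1}^{k} \frac{N_i - n_i}{N_i n_i} w_i^2 S_i^2$.
So $Var(p_{st}) = \frac{1}{N^2} \sum_{i=1}^{k} \frac{N_i^2 (N_i - n_i)}{N_i - 1} \cdot \frac{P_i Q_i}{n_i}$.
$$\begin{aligned}
Var_{prop}(p_{st}) &= \frac{N - n}{N} \cdot \frac{1}{Nn} \sum_{i=1}^{k} \frac{N_i^2 P_i Q_i}{N_i - 1} \\
&\approx \frac{N - n}{Nn} \sum_{i=1}^{k} w_i P_i Q_i .
\end{aligned}$$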
The best choice of $n_i$ such that it minimizes the variance for fixed total sample size is
$$n_i \propto N_i \sqrt{\frac{N_i P_i Q_i}{N_i - 1}} \approx N_i \sqrt{P_i Q_i} .$$
Thus $n_i = n \frac{N_i \sqrt{P_i Q_i}}{\sum_{i=1}^{k} N_i \sqrt{P_i Q_i}}$.
Similarly, the best choice of $n_i$ such that the variance is minimum for fixed cost $C = C_0 + \sum_{i=1}^{k} C_i n_i$ is
$$n_i = \frac{n N_i \sqrt{\frac{P_i Q_i}{C_i}}}{\sum_{i=1}^{k} N_i \sqrt{\frac{P_i Q_i}{C_i}}} .$$
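Both allocation rules for proportions have the same shape, $n_i \propto N_i \sqrt{P_i Q_i / C_i}$, with $C_i = 1$ recovering the fixed-sample-size rule. The sketch below codes that common form; the function name, the stratum proportions and the costs are illustrative assumptions, and the $P_i$ would in practice have to be guessed beforehand.

```python
import math


def allocation_for_proportions(N_sizes, P, n, unit_costs=None):
    """Return n_i proportional to N_i * sqrt(P_i * Q_i / C_i), summing to n."""
    if unit_costs is None:
        unit_costs = [1.0] * len(N_sizes)  # equal costs: n_i ∝ N_i * sqrt(P_i * Q_i)
    weights = [N_i * math.sqrt(p * (1 - p) / c)
               for N_i, p, c in zip(N_sizes, P, unit_costs)]
    total = sum(weights)
    return [n * wt / total for wt in weights]


# Hypothetical strata of equal size with different proportions.
equal_cost = allocation_for_proportions([100, 100], [0.5, 0.1], 80)
with_costs = allocation_for_proportions([100, 100], [0.5, 0.1], 80, unit_costs=[1.0, 4.0])
```

With equal costs the stratum with $P_i = 0.5$ gets 50 of the 80 units, since $\sqrt{P_i Q_i}$ is largest there; making the second stratum four times as expensive shifts the allocation further towards the first.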
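$$\begin{aligned}
(N - 1) S^2 &= \sum_{i=1}^{k} \sum_{j=1}^{N_i} \left[ (Y_{ij} - \bar{Y}_i) + (\bar{Y}_i - \bar{Y}) \right]^2 \\
&= \sum_{i=1}^{k} \sum_{j=1}^{N_i} (Y_{ij} - \bar{Y}_i)^2 + \sum_{i=1}^{k} N_i (\bar{Y}_i - \bar{Y})^2 \\
&= \sum_{i=1}^{k} (N_i - 1) S_i^2 + \sum_{i=1}^{k} N_i (\bar{Y}_i - \bar{Y})^2 \\
&= \sum_{i=1}^{k} (N_i - 1) S_i^2 + N \left[ \sum_{i=1}^{k} w_i \bar{Y}_i^2 - \bar{Y}^2 \right] .
\end{aligned}$$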
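In order to estimate $S^2$, we need estimates of $S_i^2$, $\bar{Y}_i^2$ and $\bar{Y}^2$. We consider their estimation one by
one.
$$E(s_i^2) = S_i^2$$
So $\hat{S}_i^2 = s_i^2$.
$$\begin{aligned}
Var(\bar{y}_i) &= E(\bar{y}_i^2) - [E(\bar{y}_i)]^2 \\
&= E(\bar{y}_i^2) - \bar{Y}_i^2
\end{aligned}$$
or $\bar{Y}_i^2 = E(\bar{y}_i^2) - Var(\bar{y}_i)$.
An unbiased estimate of $\bar{Y}_i^2$ is
$$\hat{\bar{Y}}_i^2 = \bar{y}_i^2 - \widehat{Var}(\bar{y}_i) = \bar{y}_i^2 - \frac{N_i - n_i}{N_i n_i} s_i^2 .$$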
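$$(N - 1) S^2 = \sum_{i=1}^{k} (N_i - 1) S_i^2 + N \left[ \sum_{i=1}^{k} w_i \bar{Y}_i^2 - \bar{Y}^2 \right]$$
as
$$\begin{aligned}
\hat{S}^2 &= \frac{1}{N - 1} \sum_{i=1}^{k} (N_i - 1) \hat{S}_i^2 + \frac{N}{N - 1} \left[ \sum_{i=1}^{k} w_i \hat{\bar{Y}}_i^2 - \hat{\bar{Y}}^2 \right] \\
&= \frac{1}{N - 1} \sum_{i=1}^{k} (N_i - 1) s_i^2 + \frac{N}{N - 1} \left[ \sum_{i=1}^{k} w_i \left( \bar{y}_i^2 - \frac{N_i - n_i}{N_i n_i} s_i^2 \right) - \left( \bar{y}_{st}^2 - \sum_{i=1}^{k} w_i^2 \frac{N_i - n_i}{N_i n_i} s_i^2 \right) \right] \\
&= \frac{1}{N - 1} \sum_{i=1}^{k} (N_i - 1) s_i^2 + \frac{N}{N - 1} \left[ \sum_{i=1}^{k} w_i (\bar{y}_i - \bar{y}_{st})^2 - \sum_{i=1}^{k} w_i (1 - w_i) \frac{N_i - n_i}{N_i n_i} s_i^2 \right] .
\end{aligned}$$
Thus
$$\begin{aligned}
\widehat{Var}_{SRS}(\bar{y}) &= \frac{N - n}{Nn} \hat{S}^2 \\
&= \frac{N - n}{N (N - 1) n} \sum_{i=1}^{k} (N_i - 1) s_i^2 + \frac{N (N - n)}{n N (N - 1)} \left[ \sum_{i=1}^{k} w_i (\bar{y}_i - \bar{y}_{st})^2 - \sum_{i=1}^{k} w_i (1 - w_i) \frac{N_i - n_i}{N_i n_i} s_i^2 \right]
\end{aligned}$$
and
$$\widehat{Var}(\bar{y}_{st}) = \sum_{i=1}^{k} \frac{N_i - n_i}{N_i n_i} w_i^2 s_i^2 .$$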
If any other particular allocation is used, then substituting the appropriate ni under that allocation,
such gain can be estimated.
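The two estimates being compared can be computed from a single stratified sample. The sketch below follows the expressions for $\hat{S}^2$, $\widehat{Var}_{SRS}(\bar{y})$ and $\widehat{Var}(\bar{y}_{st})$ term by term; the function name and the data are assumptions for illustration, and each stratum sample is assumed to have at least two units.

```python
def estimate_gain(N_sizes, samples):
    """Return (estimated Var_SRS, estimated Var of ybar_st) from a stratified sample."""
    N = sum(N_sizes)
    n = sum(len(y_i) for y_i in samples)
    w = [N_i / N for N_i in N_sizes]
    ybar = [sum(y_i) / len(y_i) for y_i in samples]
    s2 = [sum((y - m) ** 2 for y in y_i) / (len(y_i) - 1)
          for y_i, m in zip(samples, ybar)]
    # v_i = (N_i - n_i) / (N_i n_i) * s_i^2, the estimated variance of ybar_i.
    v = [(N_i - len(y_i)) / (N_i * len(y_i)) * s
         for N_i, y_i, s in zip(N_sizes, samples, s2)]
    ybar_st = sum(w_i * m for w_i, m in zip(w, ybar))
    var_st = sum(w_i ** 2 * v_i for w_i, v_i in zip(w, v))
    within = sum((N_i - 1) * s for N_i, s in zip(N_sizes, s2))
    between = sum(w_i * (m - ybar_st) ** 2 for w_i, m in zip(w, ybar))
    correction = sum(w_i * (1 - w_i) * v_i for w_i, v_i in zip(w, v))
    S2_hat = (within + N * (between - correction)) / (N - 1)
    var_srs = (N - n) / (N * n) * S2_hat
    return var_srs, var_st


# Hypothetical stratum sizes and observed samples.
N_sizes = [300, 200, 500]
samples = [[10, 12, 14], [20, 22], [30, 31, 32, 35]]
var_srs_hat, var_st_hat = estimate_gain(N_sizes, samples)
```

The difference `var_srs_hat - var_st_hat` is then an estimate of the gain in precision due to stratification for the allocation actually used.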
The subsamples need not necessarily be independent. The assumption of independent subsamples
helps in obtaining an unbiased estimate of the variance of the composite estimator. This is even helpful
if the sample design is complicated and the expression for variance of the composite estimator is
complex.
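Let there be $g$ independent interpenetrating subsamples and $t_1, t_2, \ldots, t_g$ be $g$ unbiased estimators of
the parameter $\theta$, with $\hat{\theta} = \bar{t} = \frac{1}{g} \sum_{j=1}^{g} t_j$. Then
$$E(\hat{\theta}) = E(\bar{t}) = \theta$$
and
$$\widehat{Var}(\hat{\theta}) = \widehat{Var}(\bar{t}) = \frac{1}{g (g - 1)} \sum_{j=1}^{g} (t_j - \bar{t})^2 .$$
Note that
$$\begin{aligned}
E\left[ \widehat{Var}(\bar{t}) \right] &= \frac{1}{g (g - 1)} E\left[ \sum_{j=1}^{g} (t_j - \theta)^2 - g (\bar{t} - \theta)^2 \right] \\
&= \frac{1}{g (g - 1)} \left[ \sum_{j=1}^{g} Var(t_j) - g\, Var(\bar{t}) \right] \\
&= \frac{1}{g (g - 1)} (g^2 - g) Var(\bar{t}) = Var(\bar{t}) .
\end{aligned}$$
If the distribution of each estimator $t_j$ is symmetric about $\theta$, then the confidence interval of $\theta$ can be
obtained by
$$P\left[ \min(t_1, t_2, \ldots, t_g) < \theta < \max(t_1, t_2, \ldots, t_g) \right] = 1 - \left( \frac{1}{2} \right)^{g - 1} .$$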
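The subsample-based variance estimate $\frac{1}{g(g-1)} \sum_j (t_j - \bar{t})^2$ and the min-max interval with coverage $1 - (1/2)^{g-1}$ need only the $g$ subsample estimates. The sketch below uses assumed function names and made-up estimates.

```python
def subsample_variance_estimate(t):
    """Return (t_bar, estimated Var(t_bar)) from g independent subsample estimates."""
    g = len(t)
    t_bar = sum(t) / g
    var_hat = sum((t_j - t_bar) ** 2 for t_j in t) / (g * (g - 1))
    return t_bar, var_hat


def min_max_interval(t):
    """Return (lower, upper, coverage) for the interval [min t_j, max t_j]."""
    g = len(t)
    # The coverage holds when each t_j is symmetrically distributed about the parameter.
    return min(t), max(t), 1 - 0.5 ** (g - 1)


# Hypothetical estimates from g = 4 interpenetrating subsamples.
t = [10.0, 12.0, 11.0, 13.0]
t_bar, var_hat = subsample_variance_estimate(t)
lower, upper, coverage = min_max_interval(t)
```

With four subsamples the interval from the smallest to the largest estimate has coverage $1 - (1/2)^3 = 0.875$, and no explicit variance formula for the sampling design is required.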
Let $\hat{Y}_{ij(tot)}$ be an unbiased estimator of the total of the $j$th stratum based on the $i$th subsample,
$i = 1, 2, \ldots, L$; $j = 1, 2, \ldots, k$.
Note: This topic is to be read after the next module on the ratio method of estimation. Since it is related to
stratification, it is given here.
In post-stratification,
• draw a sample by simple random sampling from the population and carry out the survey.
• After the completion of the survey, stratify the sampling units to increase the precision of the
estimates.
Assume that the stratum size $N_i$ is fairly accurately known. Let $m_i$ be the number of sampled units that fall in the
$i$th stratum, so that $\sum_{i=1}^{k} m_i = n$.
Note that mi is a random variable (and that is why we are not using the symbol ni as earlier).
Assume n is large enough or the stratification is such that the probability that some mi = 0 is negligibly
small. In case, mi = 0 for some strata, two or more strata can be combined to make the sample size
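To find $E\left( \frac{1}{m_i} - \frac{1}{N_i} \right)$, proceed as follows:
Consider the estimate of the ratio based on the ratio method of estimation as
$$\hat{R} = \frac{\bar{y}}{\bar{x}} = \frac{\sum_{j=1}^{n} y_j}{\sum_{j=1}^{n} x_j}, \quad R = \frac{\bar{Y}}{\bar{X}} = \frac{\sum_{j=1}^{N} Y_j}{\sum_{j=1}^{N} X_j} .$$
We know that, to the first order of approximation,
$$E(\hat{R}) - R = \frac{N - n}{Nn} \cdot \frac{R S_X^2 - S_{XY}}{\bar{X}^2} .$$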
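Let
$$x_j = \begin{cases} 1 & \text{if the } j\text{th unit belongs to the } i\text{th stratum} \\ 0 & \text{otherwise} \end{cases}$$
and
$y_j = 1$ for all $j = 1, 2, \ldots, N$.
Then
$$\hat{R} = \frac{\bar{y}}{\bar{x}} = \frac{\sum_{j=1}^{n} y_j}{\sum_{j=1}^{n} x_j} = \frac{n}{n_i}, \quad R = \frac{\bar{Y}}{\bar{X}} = \frac{\sum_{j=1}^{N} Y_j}{\sum_{j=1}^{N} X_j} = \frac{N}{N_i},$$
$$S_X^2 = \frac{1}{N - 1} \left[ \sum_{j=1}^{N} X_j^2 - N \bar{X}^2 \right] = \frac{1}{N - 1} \left[ N_i - N \frac{N_i^2}{N^2} \right] = \frac{1}{N - 1} \left[ N_i - \frac{N_i^2}{N} \right],$$
$$S_{XY} = \frac{1}{N - 1} \left[ \sum_{j=1}^{N} X_j Y_j - N \bar{X} \bar{Y} \right] = \frac{1}{N - 1} \left[ N_i - N \cdot \frac{N_i}{N} \right] = 0,$$
$$E(\hat{R}) - R = E\left( \frac{n}{n_i} \right) - \frac{N}{N_i} = \frac{N (N - n)(N - N_i)}{n N_i^2 (N - 1)} .$$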
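Thus
$$\begin{aligned}
E\left( \frac{1}{n_i} \right) - \frac{1}{N_i} &= \frac{N}{n N_i} + \frac{N (N - n)(N - N_i)}{n^2 N_i^2 (N - 1)} - \frac{1}{N_i} \\
&\approx \frac{(N - n) N}{n (N - 1) N_i} \left[ 1 + \frac{N}{N_i n} - \frac{1}{n} \right],
\end{aligned}$$
where the last step replaces the leading term $\frac{N - n}{n N_i}$ by $\frac{(N - n) N}{n (N - 1) N_i}$, i.e., takes $\frac{N}{N - 1} \approx 1$. With $m_i$ in place of $n_i$,
$$E\left( \frac{1}{m_i} - \frac{1}{N_i} \right) \approx \frac{(N - n) N}{n (N - 1) N_i} \left[ 1 + \frac{N}{N_i n} - \frac{1}{n} \right] .$$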
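Now substitute this in the expression of $Var(\bar{y}_{post})$ as
$$\begin{aligned}
Var(\bar{y}_{post}) &= \sum_{i=1}^{k} w_i^2 E\left( \frac{1}{m_i} - \frac{1}{N_i} \right) S_i^2 \\
&\approx \sum_{i=1}^{k} w_i^2 S_i^2 \cdot \frac{(N - n) N}{(N - 1) n N_i} \left[ 1 + \frac{N}{n N_i} - \frac{1}{n} \right] \\
&= \frac{N - n}{n (N - 1)} \sum_{i=1}^{k} w_i^2 S_i^2 \cdot \frac{1}{w_i} \left[ 1 + \frac{1}{n w_i} - \frac{1}{n} \right] \\
&= \frac{N - n}{n^2 (N - 1)} \sum_{i=1}^{k} w_i S_i^2 \left[ n - 1 + \frac{1}{w_i} \right] \\
&= \frac{N - n}{n^2 (N - 1)} \sum_{i=1}^{k} (n w_i + 1 - w_i) S_i^2 \\
&= \frac{N - n}{n (N - 1)} \sum_{i=1}^{k} w_i S_i^2 + \frac{N - n}{n^2 (N - 1)} \sum_{i=1}^{k} (1 - w_i) S_i^2 .
\end{aligned}$$
Assuming $N - 1 \approx N$,
$$\begin{aligned}
Var(\bar{y}_{post}) &= \frac{N - n}{Nn} \sum_{i=1}^{k} w_i S_i^2 + \frac{N - n}{n^2 N} \sum_{i=1}^{k} (1 - w_i) S_i^2 \\
&= Var_{prop}(\bar{y}_{st}) + \frac{N - n}{N n^2} \sum_{i=1}^{k} (1 - w_i) S_i^2 .
\end{aligned}$$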
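The second term is the contribution to the variance of $\bar{y}_{post}$ due to the $m_i$'s not being proportionately
distributed. When every $S_i^2$ equals a common value $S_w^2$, the second term becomes
$$\begin{aligned}
\frac{N - n}{N n^2} \sum_{i=1}^{k} (1 - w_i) S_w^2 &= \frac{N - n}{N n^2} S_w^2 (k - 1) \quad \left( \text{since } \sum_{i=1}^{k} w_i = 1 \right) \\
&= \frac{k - 1}{n} \cdot \frac{N - n}{Nn} S_w^2 \\
&= \frac{k - 1}{n} Var(\bar{y}_{st}) .
\end{aligned}$$
The increase in the variance over $Var_{prop}(\bar{y}_{st})$ is small if the average sample size $\bar{n} = \frac{n}{k}$ per stratum is
reasonably large.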
Thus, a post-stratification with a large sample produces an estimator that is almost as precise as an
estimator in the stratified sampling with proportional allocation.
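The approximate result $Var(\bar{y}_{post}) = Var_{prop}(\bar{y}_{st}) + \frac{N - n}{N n^2} \sum_i (1 - w_i) S_i^2$ can be evaluated to see how small the extra term is. The function name and the population values below are assumptions for illustration; the formula itself relies on the approximation $N - 1 \approx N$ used above.

```python
def post_stratified_variance(N_sizes, S2, n):
    """Return (Var_prop, approximate Var of ybar_post) for stratum variances S2."""
    N = sum(N_sizes)
    w = [N_i / N for N_i in N_sizes]
    var_prop = (N - n) / (N * n) * sum(w_i * s for w_i, s in zip(w, S2))
    # Extra term due to the m_i not being proportionately distributed.
    extra = (N - n) / (N * n ** 2) * sum((1 - w_i) * s for w_i, s in zip(w, S2))
    return var_prop, var_prop + extra


# Hypothetical population with a common within-stratum variance.
var_prop, var_post = post_stratified_variance([300, 200, 500], [4.0, 4.0, 4.0], n=100)
relative_increase = (var_post - var_prop) / var_prop
```

With $k = 3$ strata of equal variance and $n = 100$, the relative increase is $(k - 1)/n = 0.02$, i.e., about 2 percent over proportional allocation.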