Chapter 4 Stratified Sampling
Stratified Sampling
An important objective in any estimation problem is to obtain an estimator of a population parameter that takes care of the salient features of the population. If the population is homogeneous with respect to the characteristic under study, then simple random sampling will yield a homogeneous sample, and in turn the sample mean will serve as a good estimator of the population mean. Thus, if the population is homogeneous with respect to the characteristic under study, the sample drawn through simple random sampling is expected to be representative. Moreover, the variance of the sample mean depends not only on the sample size and the sampling fraction but also on the population variance. In order to increase the precision of an estimator, we need a sampling scheme which reduces the effect of the heterogeneity in the population. If the population is heterogeneous with respect to the characteristic under study, then one such sampling procedure is stratified sampling.
Example: Suppose we want to find the average height of the students of classes 1 to 12 in a school. Height varies a lot across classes, since the students in class 1 are around 6 years old while the students in class 10 are around 16 years old. So one can divide all the students into different subpopulations or strata according to their classes and draw a sample from each stratum.
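[Figure: a population of $N$ units divided into $k$ strata; samples of $n_1, n_2, \ldots, n_k$ units are drawn from the respective strata, giving a total sample size $n = \sum_{i=1}^{k} n_i$.]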
Procedure of stratified sampling
Divide the population of $N$ units into $k$ strata. Let the $i$th stratum have $N_i$ units, $i = 1, 2, \ldots, k$.
• Strata are constructed such that they are nonoverlapping and homogeneous with respect to the characteristic under study, with $\sum_{i=1}^{k} N_i = N$.
• Draw a sample of size $n_i$ from the $i$th stratum ($i = 1, 2, \ldots, k$) using SRS (preferably WOR), independently from each stratum.
• All the sampling units drawn from the strata together constitute a stratified sample of size $n = \sum_{i=1}^{k} n_i$.
In cluster sampling, the clusters are constructed such that they are
• within heterogeneous and
• among homogeneous.
[Note: We discuss cluster sampling later.]
Note that there are $k$ independent samples of sizes $n_1, n_2, \ldots, n_k$, drawn through SRS from the respective strata. So one can have $k$ estimators of a parameter, based on the samples of sizes $n_1, n_2, \ldots, n_k$ respectively. Our interest is not in $k$ different estimators of the parameter; the ultimate goal is a single estimator. An important issue, therefore, is how to combine the information from the different samples into one estimator that is good enough to provide information about the parameter.
We now consider the estimation of the population mean and the population variance from a stratified sample.
Estimation of population mean and its variance
Let
$Y$: characteristic under study,
$y_{ij}$: value of the $j$th unit in the $i$th stratum, $j = 1, 2, \ldots, N_i$, $i = 1, 2, \ldots, k$,
$\bar{Y}_i = \frac{1}{N_i}\sum_{j=1}^{N_i} y_{ij}$: population mean of the $i$th stratum,
$\bar{y}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij}$: sample mean from the $i$th stratum,
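$\bar{Y} = \frac{1}{N}\sum_{i=1}^{k} N_i \bar{Y}_i = \sum_{i=1}^{k} w_i \bar{Y}_i$: population mean, where $w_i = N_i/N$.

One may consider the sample mean
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{k} n_i \bar{y}_i$$
as a possible estimator of $\bar{Y}$. Since $E(\bar{y}_i) = \bar{Y}_i$ under SRS, $E(\bar{y}) = \frac{1}{n}\sum_{i=1}^{k} n_i \bar{Y}_i$, which in general differs from $\bar{Y}$ (equality holds, for example, when $n_i/n = N_i/N$ for all $i$). So $\bar{y}$ turns out to be a biased estimator of $\bar{Y}$. Based on this, one can modify $\bar{y}$ so as to obtain an unbiased estimator of $\bar{Y}$. Consider the stratified mean, defined as the weighted arithmetic mean of the stratum sample means with the stratum sizes as weights:
$$\bar{y}_{st} = \frac{1}{N}\sum_{i=1}^{k} N_i \bar{y}_i = \sum_{i=1}^{k} w_i \bar{y}_i.$$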
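Now
$$E(\bar{y}_{st}) = \frac{1}{N}\sum_{i=1}^{k} N_i E(\bar{y}_i) = \frac{1}{N}\sum_{i=1}^{k} N_i \bar{Y}_i = \bar{Y},$$
so $\bar{y}_{st}$ is an unbiased estimator of $\bar{Y}$.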
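The contrast between $\bar{y}$ and $\bar{y}_{st}$ can be checked exactly on a tiny population by enumerating every possible stratified sample. The sketch below assumes a hypothetical two-stratum population and uses exact rational arithmetic (`fractions.Fraction`) so that the expectations come out exactly rather than approximately.

```python
from fractions import Fraction
from itertools import combinations, product

# Hypothetical population with k = 2 strata (values chosen only for illustration)
strata = [[2, 4, 6], [10, 20, 30, 40]]
sizes = [2, 2]                                  # n_1, n_2
N_i = [len(s) for s in strata]
N, n = sum(N_i), sum(sizes)


def mean(xs):
    return Fraction(sum(xs), len(xs))           # exact arithmetic


Y_bar = Fraction(sum(map(sum, strata)), N)      # population mean

# Every stratified sample = one SRSWOR sample per stratum; all are equally likely
samples = list(product(*(combinations(s, m) for s, m in zip(strata, sizes))))


def y_bar(sample):
    """Pooled sample mean (1/n) * sum_i n_i * ybar_i."""
    return Fraction(sum(sum(part) for part in sample), n)


def y_st(sample):
    """Stratified mean (1/N) * sum_i N_i * ybar_i."""
    return sum(Ni * mean(part) for Ni, part in zip(N_i, sample)) / N


E_ybar = sum(map(y_bar, samples)) / len(samples)
E_yst = sum(map(y_st, samples)) / len(samples)
print(Y_bar, E_ybar, E_yst)   # 16 29/2 16
```

Here the allocation $n_1 = n_2 = 2$ is not proportional to $N_1 = 3$ and $N_2 = 4$. As a result, the pooled mean is biased ($29/2 \neq 16$), while $\bar{y}_{st}$ averages to $\bar{Y}$ exactly.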
Variance of $\bar{y}_{st}$
$$Var(\bar{y}_{st}) = \sum_{i=1}^{k} w_i^2 Var(\bar{y}_i) + \sum_{i=1}^{k}\sum_{j=1,\, j \neq i}^{k} w_i w_j\, Cov(\bar{y}_i, \bar{y}_j).$$
Since all the samples have been drawn independently from each of the strata by SRSWOR,
$$Cov(\bar{y}_i, \bar{y}_j) = 0, \quad i \neq j,$$
$$Var(\bar{y}_i) = \frac{N_i - n_i}{N_i n_i} S_i^2,$$
where
$$S_i^2 = \frac{1}{N_i - 1}\sum_{j=1}^{N_i} (Y_{ij} - \bar{Y}_i)^2.$$
Thus
$$Var(\bar{y}_{st}) = \sum_{i=1}^{k} w_i^2 \frac{N_i - n_i}{N_i n_i} S_i^2 = \sum_{i=1}^{k} w_i^2 \left(1 - \frac{n_i}{N_i}\right)\frac{S_i^2}{n_i}.$$
Observe that $Var(\bar{y}_{st})$ is small when the $S_i^2$ are small. This observation suggests how to construct the strata: if $S_i^2$ is small for all $i = 1, 2, \ldots, k$, then $Var(\bar{y}_{st})$ will also be small. That is why it was mentioned earlier that the strata are to be constructed such that they are homogeneous within (i.e., $S_i^2$ is small) and heterogeneous among themselves.
For example, units that are geographically close tend to be more similar. The consumption pattern of households will be similar within a lower-income housing society and within a higher-income housing society, whereas it will differ a lot between the two housing societies.
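The variance formula $Var(\bar{y}_{st}) = \sum w_i^2 \frac{N_i - n_i}{N_i n_i} S_i^2$ can be confirmed by the same kind of exhaustive enumeration. This sketch assumes the same hypothetical two-stratum population as before and compares the exact variance over all equally likely stratified samples with the formula.

```python
from fractions import Fraction
from itertools import combinations, product

strata = [[2, 4, 6], [10, 20, 30, 40]]    # hypothetical population, k = 2
sizes = [2, 2]                            # n_i drawn by SRSWOR in stratum i
N_i = [len(s) for s in strata]
N = sum(N_i)
w = [Fraction(Ni, N) for Ni in N_i]       # w_i = N_i / N


def mean(xs):
    return Fraction(sum(xs), len(xs))


def S2(xs):
    """Stratum variance S_i^2 with divisor N_i - 1."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def y_st(sample):
    return sum(wi * mean(part) for wi, part in zip(w, sample))


Y_bar = sum(wi * mean(s) for wi, s in zip(w, strata))
samples = list(product(*(combinations(s, m) for s, m in zip(strata, sizes))))

# Exact variance over all equally likely stratified samples
var_enum = sum((y_st(s) - Y_bar) ** 2 for s in samples) / len(samples)

# Formula: sum_i w_i^2 (N_i - n_i) / (N_i n_i) S_i^2
var_formula = sum(wi ** 2 * Fraction(Ni - ni, Ni * ni) * S2(s)
                  for wi, Ni, ni, s in zip(w, N_i, sizes, strata))
print(var_enum, var_formula)   # 2018/147 2018/147
```

The cross-stratum covariance terms drop out because the sample from each stratum is chosen independently. This corresponds to the enumeration taking a Cartesian product of per-stratum samples.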
Estimate of Variance
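Since the samples have been drawn by SRSWOR,
$$E(s_i^2) = S_i^2, \quad \text{where } s_i^2 = \frac{1}{n_i - 1}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2,$$
and
$$\widehat{Var}(\bar{y}_i) = \frac{N_i - n_i}{N_i n_i} s_i^2,$$
so
$$\widehat{Var}(\bar{y}_{st}) = \sum_{i=1}^{k} w_i^2\, \widehat{Var}(\bar{y}_i) = \sum_{i=1}^{k} w_i^2 \frac{N_i - n_i}{N_i n_i} s_i^2.$$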
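As a sketch of how these two estimates are computed from data, the following function returns $\bar{y}_{st}$ and $\widehat{Var}(\bar{y}_{st})$ from per-stratum samples. It assumes SRSWOR within strata and at least two sampled units per stratum, so that $s_i^2$ is defined. The function name is hypothetical.

```python
from statistics import mean, variance


def stratified_estimate(samples, N_sizes):
    """Return (ybar_st, estimated Var(ybar_st)) for SRSWOR within strata.

    samples: per-stratum lists of observed values (each with n_i >= 2)
    N_sizes: stratum sizes N_i
    """
    N = sum(N_sizes)
    y_st, var_hat = 0.0, 0.0
    for ys, Ni in zip(samples, N_sizes):
        w = Ni / N                      # stratum weight w_i
        ni = len(ys)
        y_st += w * mean(ys)
        # statistics.variance uses divisor n_i - 1, i.e. it returns s_i^2
        var_hat += w ** 2 * (Ni - ni) / (Ni * ni) * variance(ys)
    return y_st, var_hat


# Hypothetical sample: 2 units from a stratum of 3, 2 units from a stratum of 4
est, v = stratified_estimate([[4, 6], [20, 40]], [3, 4])
print(round(est, 4), round(v, 4))   # 19.2857 16.3878
```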
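Note: If SRSWR is used instead of SRSWOR for drawing the samples from each stratum, then
$$\bar{y}_{st} = \sum_{i=1}^{k} w_i \bar{y}_i, \qquad E(\bar{y}_{st}) = \bar{Y},$$
$$Var(\bar{y}_{st}) = \sum_{i=1}^{k} w_i^2 \frac{\sigma_i^2}{n_i} = \sum_{i=1}^{k} w_i^2 \frac{N_i - 1}{N_i n_i} S_i^2,$$
$$\widehat{Var}(\bar{y}_{st}) = \sum_{i=1}^{k} \frac{w_i^2 s_i^2}{n_i},$$
where $\sigma_i^2 = \frac{1}{N_i}\sum_{j=1}^{N_i}(Y_{ij} - \bar{Y}_i)^2$.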
Advantages of stratified sampling
1. Data of known precision may be required for certain parts of the population.
This can be accomplished by a more careful investigation of a few strata.
Example: In order to know the direct impact of a hike in petrol prices, the population can be divided into strata like lower income group, middle income group and higher income group. The higher income group is expected to be more affected than the lower income group, so a more careful investigation can be made in the higher income group stratum.
2. Sampling problems may differ in different parts of the population.
Example: To study the consumption pattern of households, the people living in houses, hotels, hospitals, prisons, etc. are to be treated differently.
3. Administrative convenience can be exercised in stratified sampling.
Example: In taking a sample of villages from a big state, it is more administratively convenient
to consider the districts as strata so that the administrative setup at district level may be used for
this purpose. Such administrative convenience and the convenience in organization of field
work are important aspects in national level surveys.
4. A full cross-section of the population can be obtained through stratified sampling. With SRS, it is possible that some large part of the population remains unrepresented. Stratified sampling enables one to draw a sample representing different segments of the population to any desired extent. The desired degree of representation of some specified parts of the population is also possible.
5. Substantial gain in efficiency is achieved if the strata are formed intelligently.
6. In the case of a skewed population, stratification is important, since larger weight may have to be given to the few extremely large units, which in turn reduces the sampling variability.
7. When estimates are required not only for the population but also for the subpopulations, then
the stratified sampling is helpful.
8. When the sampling frame for subpopulations is more easily available than the sampling frame
for whole population, then stratified sampling is helpful.
9. If the population is large, then it is convenient to sample separately from the strata rather than from the entire population.
10. The population mean or population total can be estimated with higher precision by suitably
providing the weights to the estimates obtained from each stratum.
Allocation problem and choice of sample sizes in different strata
Question: How to choose the sample sizes n1 , n2 ,..., nk so that the available resources are used in an
effective way?
There are two aspects of choosing the sample sizes:
(i) Minimize the cost of survey for a specified precision.
(ii) Maximize the precision for a given cost.
Note: The sample size cannot be determined by minimizing both the cost and the variability simultaneously, because the cost increases with the sample size whereas the variability decreases as the sample size increases.
2. Proportional allocation
For fixed $k$, select $n_i$ proportional to the stratum size $N_i$, i.e.,
$$n_i \propto N_i \quad \text{or} \quad n_i = C N_i,$$
where $C$ is the constant of proportionality. Summing over the strata,
$$\sum_{i=1}^{k} n_i = \sum_{i=1}^{k} C N_i \quad \text{or} \quad n = CN \;\Rightarrow\; C = \frac{n}{N}.$$
Thus
$$n_i = \frac{n}{N} N_i.$$
Such an allocation arises from considerations like operational convenience.
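In practice the values $n_i = \frac{n}{N} N_i$ are rarely integers. The sketch below computes the real-valued proportional allocation and then rounds it with the largest-remainder rule so that the rounded sizes still add up to $n$. The rounding rule is an assumption of this sketch, not something prescribed by the text, and the function names and stratum sizes are hypothetical.

```python
def proportional_allocation(n, N_sizes):
    """Real-valued proportional allocation n_i = n * N_i / N."""
    N = sum(N_sizes)
    return [n * Ni / N for Ni in N_sizes]


def round_to_total(alloc, n):
    """Round real-valued n_i to integers that still sum to n.

    Uses the largest-remainder rule, an assumption made for this sketch;
    the text itself does not prescribe a rounding rule.
    """
    floors = [int(a) for a in alloc]
    shortfall = n - sum(floors)
    # hand the leftover units to strata with the largest fractional parts
    order = sorted(range(len(alloc)), key=lambda i: alloc[i] - floors[i], reverse=True)
    for i in order[:shortfall]:
        floors[i] += 1
    return floors


alloc = proportional_allocation(30, [400, 350, 250])   # hypothetical N_i
print(alloc, round_to_total(alloc, 30))   # [12.0, 10.5, 7.5] [12, 11, 7]
```

Ties in the fractional parts are broken in stratum order, because Python's `sorted` is stable even with `reverse=True`.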
3. Neyman or optimum allocation
This allocation considers the size of the strata as well as their variability:
$$n_i \propto N_i S_i \quad \text{or} \quad n_i = C^* N_i S_i,$$
where $C^*$ is the constant of proportionality. Summing over the strata,
$$\sum_{i=1}^{k} n_i = \sum_{i=1}^{k} C^* N_i S_i \quad \text{or} \quad n = C^* \sum_{i=1}^{k} N_i S_i \;\Rightarrow\; C^* = \frac{n}{\sum_{i=1}^{k} N_i S_i}.$$
Thus
$$n_i = \frac{n N_i S_i}{\sum_{i=1}^{k} N_i S_i}.$$
This allocation arises when $Var(\bar{y}_{st})$ is minimized subject to the constraint $\sum_{i=1}^{k} n_i = n$ (prespecified).
The optimum allocation has some limitations. The knowledge of $S_i$ ($i = 1, 2, \ldots, k$) is needed to determine $n_i$. If there is more than one characteristic under study, they may lead to conflicting allocations.
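To see the effect of taking $S_i$ into account, the sketch below computes the Neyman allocation and compares $Var(\bar{y}_{st})$ under Neyman and proportional allocation. It assumes hypothetical stratum sizes and treats the standard deviations $S_i$ as known, which is exactly the limitation noted above. Real-valued $n_i$ are used throughout.

```python
def neyman_allocation(n, N_sizes, S):
    """Real-valued Neyman allocation n_i = n N_i S_i / sum_j N_j S_j."""
    total = sum(Ni * Si for Ni, Si in zip(N_sizes, S))
    return [n * Ni * Si / total for Ni, Si in zip(N_sizes, S)]


def var_yst(alloc, N_sizes, S):
    """Var(ybar_st) = sum_i w_i^2 (1/n_i - 1/N_i) S_i^2 under SRSWOR."""
    N = sum(N_sizes)
    return sum((Ni / N) ** 2 * (1 / ni - 1 / Ni) * Si ** 2
               for ni, Ni, Si in zip(alloc, N_sizes, S))


# Hypothetical strata: sizes N_i and (assumed known) standard deviations S_i
N_sizes, S, n = [500, 300, 200], [2.0, 5.0, 10.0], 50
ney = neyman_allocation(n, N_sizes, S)
prop = [n * Ni / sum(N_sizes) for Ni in N_sizes]
print([round(x, 2) for x in ney], var_yst(ney, N_sizes, S), var_yst(prop, N_sizes, S))
```

The smallest stratum gets the largest share of the sample here because its $S_i$ is five times that of the first stratum.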
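Suppose the cost of the survey is given by the cost function
$$C = C_0 + \sum_{i=1}^{k} C_i n_i,$$
where
$C$: total cost,
$C_0$: overhead cost, e.g., setting up an office, training people, etc.,
$C_i$: cost per sampled unit in the $i$th stratum.
To find $n_i$ under this cost function, consider the Lagrangian function with Lagrangian multiplier $\lambda$ as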
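$$\phi = Var(\bar{y}_{st}) + \lambda^2 (C - C_0) = \sum_{i=1}^{k}\left(\frac{1}{n_i} - \frac{1}{N_i}\right) w_i^2 S_i^2 + \lambda^2 \sum_{i=1}^{k} C_i n_i$$
$$= \sum_{i=1}^{k} \frac{w_i^2 S_i^2}{n_i} + \lambda^2 \sum_{i=1}^{k} C_i n_i - \sum_{i=1}^{k} \frac{w_i^2 S_i^2}{N_i}$$
$$= \sum_{i=1}^{k}\left(\frac{w_i S_i}{\sqrt{n_i}} - \lambda\sqrt{C_i n_i}\right)^2 + \text{terms independent of } n_i.$$
Thus $\phi$ is minimized when each squared term vanishes, i.e., when
$$n_i = \frac{1}{\lambda}\frac{w_i S_i}{\sqrt{C_i}}.$$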
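How to determine $\lambda$?
There are two ways to determine $\lambda$:
(i) Minimize the variability for a fixed cost.
(ii) Minimize the cost for a given variability.
We consider both cases.
(i) For a fixed cost $C_0^* = \sum_{i=1}^{k} C_i n_i$, substituting $n_i = \frac{1}{\lambda}\frac{w_i S_i}{\sqrt{C_i}}$ gives $C_0^* = \frac{1}{\lambda}\sum_{i=1}^{k}\sqrt{C_i}\, w_i S_i$, or
$$\lambda = \frac{\sum_{i=1}^{k}\sqrt{C_i}\, w_i S_i}{C_0^*}.$$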
Substituting $\lambda$ in the expression $n_i = \frac{1}{\lambda}\frac{w_i S_i}{\sqrt{C_i}}$, the optimum $n_i$ is obtained as
$$n_i^* = \frac{w_i S_i}{\sqrt{C_i}}\cdot\frac{C_0^*}{\sum_{i=1}^{k}\sqrt{C_i}\, w_i S_i}.$$
The required sample size to estimate $\bar{Y}$ such that the variance is minimum for the given cost $C_0^*$ is
$$n = \sum_{i=1}^{k} n_i^*.$$
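(ii) For a prespecified variance $V_0$, setting
$$Var(\bar{y}_{st}) = \sum_{i=1}^{k}\frac{w_i^2 S_i^2}{n_i} - \sum_{i=1}^{k}\frac{w_i^2 S_i^2}{N_i} = V_0$$
with $n_i = \frac{1}{\lambda}\frac{w_i S_i}{\sqrt{C_i}}$ gives $\lambda = \left(V_0 + \sum_{i=1}^{k}\frac{w_i^2 S_i^2}{N_i}\right)\Big/\sum_{i=1}^{k} w_i S_i\sqrt{C_i}$, and hence
$$n_i = \frac{w_i S_i}{\sqrt{C_i}}\cdot\frac{\sum_{i=1}^{k} w_i S_i\sqrt{C_i}}{V_0 + \sum_{i=1}^{k}\frac{w_i^2 S_i^2}{N_i}}.$$
So the required sample size to estimate $\bar{Y}$ such that the cost $C$ is minimum for a prespecified variance $V_0$ is $n = \sum_{i=1}^{k} n_i$.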
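The two cost-based allocations can be sketched as follows, assuming hypothetical weights $w_i$, standard deviations $S_i$, unit costs $C_i$ and stratum sizes $N_i$. Here `cost` plays the role of $C_0^* = \sum C_i n_i$ (overhead excluded), as in case (i) above. The check at the end confirms that the first allocation spends exactly the budget and that the second attains exactly the target variance $V_0$.

```python
import math


def optimum_fixed_cost(cost, w, S, c):
    """n_i* = (w_i S_i / sqrt(c_i)) * cost / sum_j sqrt(c_j) w_j S_j.

    cost is the budget C_0^* spent on sampling units, i.e. sum_i c_i n_i.
    """
    denom = sum(math.sqrt(ci) * wi * Si for wi, Si, ci in zip(w, S, c))
    return [wi * Si / math.sqrt(ci) * cost / denom for wi, Si, ci in zip(w, S, c)]


def optimum_fixed_variance(V0, w, S, c, N_sizes):
    """n_i = (w_i S_i / sqrt(c_i)) * sum_j w_j S_j sqrt(c_j) / (V0 + sum_j w_j^2 S_j^2 / N_j)."""
    num = sum(wi * Si * math.sqrt(ci) for wi, Si, ci in zip(w, S, c))
    den = V0 + sum(wi ** 2 * Si ** 2 / Ni for wi, Si, Ni in zip(w, S, N_sizes))
    return [wi * Si / math.sqrt(ci) * num / den for wi, Si, ci in zip(w, S, c)]


def variance(alloc, w, S, N_sizes):
    """Var(ybar_st) = sum_i w_i^2 S_i^2 (1/n_i - 1/N_i)."""
    return sum(wi ** 2 * Si ** 2 * (1 / ni - 1 / Ni)
               for ni, wi, Si, Ni in zip(alloc, w, S, N_sizes))


# Hypothetical inputs: weights, standard deviations, unit costs, stratum sizes
w, S, c, N_sizes = [0.5, 0.3, 0.2], [2.0, 5.0, 10.0], [1.0, 4.0, 9.0], [500, 300, 200]
n_cost = optimum_fixed_cost(200.0, w, S, c)
n_var = optimum_fixed_variance(0.5, w, S, c, N_sizes)
print(sum(ci * ni for ci, ni in zip(c, n_cost)), variance(n_var, w, S, N_sizes))
```

Both rules give $n_i \propto w_i S_i/\sqrt{C_i}$; they differ only in the constant, which is fixed by the budget in one case and by $V_0$ in the other.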
Sample size under proportional allocation for fixed cost and for fixed variance
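(i) If the cost $C = C_0$ is fixed, then $C_0 = \sum_{i=1}^{k} C_i n_i$. Under proportional allocation, $n_i = \frac{n}{N} N_i = n w_i$, so
$$C_0 = n\sum_{i=1}^{k} w_i C_i \quad\text{or}\quad n = \frac{C_0}{\sum_{i=1}^{k} w_i C_i}.$$
Thus
$$n_i = \frac{C_0 w_i}{\sum_{i=1}^{k} w_i C_i}.$$
The required sample size to estimate $\bar{Y}$ in this case is $n = \sum_{i=1}^{k} n_i$.
(ii) If the variance $V_0$ is fixed, then
$$V_0 = \sum_{i=1}^{k}\left(\frac{1}{n w_i} - \frac{1}{N_i}\right) w_i^2 S_i^2 = \frac{1}{n}\sum_{i=1}^{k} w_i S_i^2 - \sum_{i=1}^{k}\frac{w_i^2 S_i^2}{N_i},$$
so
$$n = \frac{\sum_{i=1}^{k} w_i S_i^2}{V_0 + \sum_{i=1}^{k}\frac{w_i^2 S_i^2}{N_i}} \quad\text{or}\quad n_i = w_i\,\frac{\sum_{i=1}^{k} w_i S_i^2}{V_0 + \sum_{i=1}^{k}\frac{w_i^2 S_i^2}{N_i}}.$$
This is known as Bowley's allocation.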
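Under proportional allocation only the total $n$ needs to be computed, since $n_i = n w_i$. The sketch below, using the same hypothetical inputs as the cost-optimal example, computes $n$ for a fixed cost and for a fixed variance (Bowley's allocation). The tests confirm that the budget is spent exactly and that the target variance is attained.

```python
def proportional_fixed_cost(C0, w, c):
    """n = C0 / sum_i w_i c_i and n_i = n w_i."""
    n = C0 / sum(wi * ci for wi, ci in zip(w, c))
    return [n * wi for wi in w]


def proportional_fixed_variance(V0, w, S, N_sizes):
    """Bowley's allocation: n = sum w_i S_i^2 / (V0 + sum w_i^2 S_i^2 / N_i), n_i = n w_i."""
    n = sum(wi * Si ** 2 for wi, Si in zip(w, S)) / (
        V0 + sum(wi ** 2 * Si ** 2 / Ni for wi, Si, Ni in zip(w, S, N_sizes)))
    return [n * wi for wi in w]


# Hypothetical weights, standard deviations, unit costs and stratum sizes
w, S, c, N_sizes = [0.5, 0.3, 0.2], [2.0, 5.0, 10.0], [1.0, 4.0, 9.0], [500, 300, 200]
n_c = proportional_fixed_cost(200.0, w, c)
n_v = proportional_fixed_variance(0.5, w, S, N_sizes)
print([round(x, 2) for x in n_c], [round(x, 2) for x in n_v])
```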
Variances under different allocations
Now we derive the variance of yst under proportional and optimum allocations.
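Under optimum allocation, $n_i = \frac{n N_i S_i}{\sum_{i=1}^{k} N_i S_i}$, so
$$V_{opt}(\bar{y}_{st}) = \sum_{i=1}^{k}\left(\frac{1}{n_i} - \frac{1}{N_i}\right) w_i^2 S_i^2 = \sum_{i=1}^{k}\frac{w_i^2 S_i^2}{n_i} - \sum_{i=1}^{k}\frac{w_i^2 S_i^2}{N_i}$$
$$= \sum_{i=1}^{k} w_i^2 S_i^2\,\frac{\sum_{j=1}^{k} N_j S_j}{n N_i S_i} - \sum_{i=1}^{k}\frac{w_i^2 S_i^2}{N_i}$$
$$= \frac{1}{n}\sum_{i=1}^{k}\frac{N_i S_i}{N}\cdot\frac{\sum_{j=1}^{k} N_j S_j}{N} - \sum_{i=1}^{k}\frac{w_i^2 S_i^2}{N_i}$$
$$= \frac{1}{n}\left(\sum_{i=1}^{k}\frac{N_i S_i}{N}\right)^2 - \frac{1}{N}\sum_{i=1}^{k} w_i S_i^2 = \frac{1}{n}\left(\sum_{i=1}^{k} w_i S_i\right)^2 - \frac{1}{N}\sum_{i=1}^{k} w_i S_i^2.$$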
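A quick numerical check of the closed form: evaluating $\sum w_i^2 (1/n_i - 1/N_i) S_i^2$ directly at the Neyman allocation should agree with $\frac{1}{n}(\sum w_i S_i)^2 - \frac{1}{N}\sum w_i S_i^2$. The stratum values below are hypothetical.

```python
def v_opt_closed_form(n, N_sizes, S):
    """(1/n)(sum w_i S_i)^2 - (1/N) sum w_i S_i^2."""
    N = sum(N_sizes)
    w = [Ni / N for Ni in N_sizes]
    return (sum(wi * Si for wi, Si in zip(w, S)) ** 2 / n
            - sum(wi * Si ** 2 for wi, Si in zip(w, S)) / N)


def v_direct(alloc, N_sizes, S):
    """sum w_i^2 (1/n_i - 1/N_i) S_i^2 for an arbitrary allocation."""
    N = sum(N_sizes)
    return sum((Ni / N) ** 2 * (1 / ni - 1 / Ni) * Si ** 2
               for ni, Ni, Si in zip(alloc, N_sizes, S))


N_sizes, S, n = [500, 300, 200], [2.0, 5.0, 10.0], 50   # hypothetical values
total = sum(Ni * Si for Ni, Si in zip(N_sizes, S))
neyman = [n * Ni * Si / total for Ni, Si in zip(N_sizes, S)]
print(v_opt_closed_form(n, N_sizes, S), v_direct(neyman, N_sizes, S))
```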
Comparison of variances of the sample mean under SRS with the stratified mean under proportional and optimal allocation:
(a) Proportional allocation:
$$V_{SRS}(\bar{y}) = \frac{N - n}{Nn} S^2,$$
$$V_{prop}(\bar{y}_{st}) = \frac{N - n}{Nn}\sum_{i=1}^{k}\frac{N_i S_i^2}{N}.$$
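In order to compare $V_{SRS}(\bar{y})$ and $V_{prop}(\bar{y}_{st})$, we first express $S^2$ as a function of the $S_i^2$. Consider
$$(N-1)S^2 = \sum_{i=1}^{k}\sum_{j=1}^{N_i}(Y_{ij} - \bar{Y})^2 = \sum_{i=1}^{k}\sum_{j=1}^{N_i}\left[(Y_{ij} - \bar{Y}_i) + (\bar{Y}_i - \bar{Y})\right]^2$$
$$= \sum_{i=1}^{k}\sum_{j=1}^{N_i}(Y_{ij} - \bar{Y}_i)^2 + \sum_{i=1}^{k}\sum_{j=1}^{N_i}(\bar{Y}_i - \bar{Y})^2$$
(the cross-product term vanishes because $\sum_{j=1}^{N_i}(Y_{ij} - \bar{Y}_i) = 0$ for each $i$)
$$= \sum_{i=1}^{k}(N_i - 1)S_i^2 + \sum_{i=1}^{k} N_i(\bar{Y}_i - \bar{Y})^2.$$
Dividing by $N$,
$$\frac{N-1}{N}S^2 = \sum_{i=1}^{k}\frac{N_i - 1}{N}S_i^2 + \sum_{i=1}^{k}\frac{N_i}{N}(\bar{Y}_i - \bar{Y})^2.$$
Assume that $N_i$ is large enough to permit the approximations
$$\frac{N_i - 1}{N_i} \approx 1 \quad\text{and}\quad \frac{N - 1}{N} \approx 1.$$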
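Thus
$$S^2 = \sum_{i=1}^{k}\frac{N_i}{N}S_i^2 + \sum_{i=1}^{k}\frac{N_i}{N}(\bar{Y}_i - \bar{Y})^2.$$
Premultiplying both sides by $\frac{N-n}{Nn}$,
$$\frac{N-n}{Nn}S^2 = \frac{N-n}{Nn}\sum_{i=1}^{k}\frac{N_i}{N}S_i^2 + \frac{N-n}{Nn}\sum_{i=1}^{k}\frac{N_i}{N}(\bar{Y}_i - \bar{Y})^2,$$
or
$$V_{SRS}(\bar{y}) = V_{prop}(\bar{y}_{st}) + \frac{N-n}{Nn}\sum_{i=1}^{k} w_i(\bar{Y}_i - \bar{Y})^2.$$
Since $\sum_{i=1}^{k} w_i(\bar{Y}_i - \bar{Y})^2 \geq 0$,
$$V_{prop}(\bar{y}_{st}) \leq V_{SRS}(\bar{y}),$$
i.e., stratified sampling with proportional allocation is at least as precise as SRS, and the gain is larger when the stratum means differ more.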
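(b) Optimum allocation:
$$V_{prop}(\bar{y}_{st}) - V_{opt}(\bar{y}_{st}) = \frac{N-n}{Nn}\sum_{i=1}^{k} w_i S_i^2 - \left[\frac{1}{n}\left(\sum_{i=1}^{k} w_i S_i\right)^2 - \frac{1}{N}\sum_{i=1}^{k} w_i S_i^2\right]$$
$$= \frac{1}{n}\left[\sum_{i=1}^{k} w_i S_i^2 - \left(\sum_{i=1}^{k} w_i S_i\right)^2\right] = \frac{1}{n}\left[\sum_{i=1}^{k} w_i S_i^2 - \bar{S}^2\right] = \frac{1}{n}\sum_{i=1}^{k} w_i(S_i - \bar{S})^2,$$
where
$$\bar{S} = \sum_{i=1}^{k} w_i S_i.$$
Since this difference is nonnegative, $V_{opt}(\bar{y}_{st}) \leq V_{prop}(\bar{y}_{st})$, with equality only when all the $S_i$ are equal.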
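The two comparisons can be put side by side numerically. The sketch below assumes hypothetical stratum sizes, standard deviations and means. It uses the large-$N_i$ approximation derived above for $V_{SRS}$, and checks that $V_{prop} - V_{opt}$ equals $\frac{1}{n}\sum w_i (S_i - \bar{S})^2$.

```python
def compare_allocations(n, N_sizes, S, means):
    """Return (V_SRS, V_prop, V_opt, (1/n) sum w_i (S_i - S_bar)^2).

    V_SRS uses the large-N_i approximation
    V_SRS = V_prop + (N - n)/(N n) * sum w_i (Ybar_i - Ybar)^2.
    """
    N = sum(N_sizes)
    w = [Ni / N for Ni in N_sizes]
    f = (N - n) / (N * n)
    Y_bar = sum(wi * m for wi, m in zip(w, means))
    v_prop = f * sum(wi * Si ** 2 for wi, Si in zip(w, S))
    v_srs = v_prop + f * sum(wi * (m - Y_bar) ** 2 for wi, m in zip(w, means))
    S_bar = sum(wi * Si for wi, Si in zip(w, S))
    v_opt = S_bar ** 2 / n - sum(wi * Si ** 2 for wi, Si in zip(w, S)) / N
    gap = sum(wi * (Si - S_bar) ** 2 for wi, Si in zip(w, S)) / n
    return v_srs, v_prop, v_opt, gap


# Hypothetical strata: sizes, standard deviations and means
v_srs, v_prop, v_opt, gap = compare_allocations(
    50, [500, 300, 200], [2.0, 5.0, 10.0], [10.0, 20.0, 40.0])
print(v_srs, v_prop, v_opt)
```

The first gap, $V_{SRS} - V_{prop}$, is driven by differences between stratum means. The second gap, $V_{prop} - V_{opt}$, is driven by differences between stratum standard deviations.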
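Under SRSWOR within each stratum, $s_i^2 = \frac{1}{n_i - 1}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2$ is an unbiased estimator of $S_i^2$.
In stratified sampling,
$$Var(\bar{y}_{st}) = \sum_{i=1}^{k} w_i^2\frac{N_i - n_i}{N_i n_i}S_i^2.$$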
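$$\widehat{Var}(\bar{y}_{st}) = \sum_{i=1}^{k} w_i^2\frac{N_i - n_i}{N_i n_i}s_i^2 = \sum_{i=1}^{k}\frac{w_i^2 s_i^2}{n_i} - \sum_{i=1}^{k}\frac{w_i^2 s_i^2}{N_i} = \sum_{i=1}^{k}\frac{w_i^2 s_i^2}{n_i} - \frac{1}{N}\sum_{i=1}^{k} w_i s_i^2.$$
The second term in this expression represents the reduction due to the finite population correction.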
The confidence limits of $\bar{Y}$ can be obtained as
$$\bar{y}_{st} \pm t\sqrt{\widehat{Var}(\bar{y}_{st})},$$
assuming that $\bar{y}_{st}$ is normally distributed and that $\widehat{Var}(\bar{y}_{st})$ is well determined, so that $t$ can be read from normal distribution tables. If only a few degrees of freedom are provided by each stratum, then the $t$ values are obtained from the table of Student's $t$-distribution. An approximate effective number of degrees of freedom ($n_e$) for $\widehat{Var}(\bar{y}_{st})$ is
$$n_e = \frac{\left(\sum_{i=1}^{k} g_i s_i^2\right)^2}{\sum_{i=1}^{k}\frac{g_i^2 s_i^4}{n_i - 1}}, \quad\text{where } g_i = \frac{N_i(N_i - n_i)}{n_i},$$
and
$$\min_i(n_i - 1) \leq n_e \leq \sum_{i=1}^{k}(n_i - 1).$$
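The sketch below computes $\bar{y}_{st}$, $\widehat{Var}(\bar{y}_{st})$ and the effective degrees of freedom $n_e$ from a stratified sample. It forms a normal-theory interval with `statistics.NormalDist`, because the Python standard library has no Student $t$ quantile. When $n_e$ is small, the text recommends a $t$ quantile with $n_e$ degrees of freedom instead; for example, `scipy.stats.t.ppf(0.975, n_e)` could be used if SciPy is available. The sample values and stratum sizes are hypothetical.

```python
from statistics import NormalDist, mean, variance


def stratified_ci(samples, N_sizes, level=0.95):
    """Return (ybar_st, Var_hat, n_e, (lower, upper)) using a normal quantile.

    For small n_e the text recommends a Student t quantile with n_e degrees of
    freedom instead (the standard library has no t quantile function).
    """
    N = sum(N_sizes)
    y_st, var_hat, num, den = 0.0, 0.0, 0.0, 0.0
    for ys, Ni in zip(samples, N_sizes):
        ni, w = len(ys), Ni / N
        s2 = variance(ys)                    # s_i^2 with divisor n_i - 1
        g = Ni * (Ni - ni) / ni              # g_i = N_i (N_i - n_i) / n_i
        y_st += w * mean(ys)
        var_hat += w ** 2 * (Ni - ni) / (Ni * ni) * s2
        num += g * s2
        den += (g * s2) ** 2 / (ni - 1)      # g_i^2 s_i^4 / (n_i - 1)
    n_e = num ** 2 / den
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * var_hat ** 0.5
    return y_st, var_hat, n_e, (y_st - half, y_st + half)


# Hypothetical stratified sample and stratum sizes
samples = [[4, 6, 5], [20, 40, 30, 25], [50, 60]]
y_st, v, n_e, (lo, hi) = stratified_ci(samples, [30, 40, 20])
print(round(y_st, 3), round(v, 3), round(n_e, 2), (round(lo, 2), round(hi, 2)))
```

Note that $\widehat{Var}(\bar{y}_{st}) = \frac{1}{N^2}\sum g_i s_i^2$, so the numerator of $n_e$ is built from the same terms as the variance estimate.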
Modification of optimal allocation
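Sometimes in the optimal allocation, the size of a subsample exceeds the stratum size. In such a case, replace $n_i$ by $N_i$ and reallocate the remaining sample among the other strata. For example, if $n_1 > N_1$, take
$$n_1 = N_1 \quad\text{and}\quad n_i = \frac{(n - N_1) w_i S_i}{\sum_{i=2}^{k} w_i S_i}, \quad i = 2, 3, \ldots, k.$$
Suppose in the revised allocation we find that $n_2 > N_2$; then the revised allocation would be
$$n_1 = N_1, \quad n_2 = N_2, \quad n_i = \frac{(n - N_1 - N_2) w_i S_i}{\sum_{i=3}^{k} w_i S_i}, \quad i = 3, 4, \ldots, k.$$
In such cases, the formula for the minimum variance of $\bar{y}_{st}$ needs to be modified as
$$\text{Min } Var(\bar{y}_{st}) = \frac{\left(\sum^* w_i S_i\right)^2}{n^*} - \frac{\sum^* w_i S_i^2}{N},$$
where $\sum^*$ denotes the summation over the strata that are not completely enumerated (those with $n_i < N_i$) and $n^*$ is the revised total sample size in these strata.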
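The reallocation rule can be sketched as a loop: compute the Neyman shares for the strata that are still free, fix every stratum whose share exceeds $N_i$ at $n_i = N_i$, and repeat with the remaining sample. This sketch assumes all $S_i > 0$ and $n \leq N$. Capping all offending strata in one pass, rather than one at a time as in the text, gives the same result, because the shares of the remaining strata can only grow after each cap. The function names and data are hypothetical.

```python
def capped_neyman(n, N_sizes, S):
    """Neyman allocation with n_i fixed at N_i wherever the share exceeds N_i.

    Assumes every S_i > 0 and n <= sum(N_sizes).
    """
    if n > sum(N_sizes):
        raise ValueError("n cannot exceed N")
    alloc = [0.0] * len(N_sizes)
    free = set(range(len(N_sizes)))        # strata not fixed at n_i = N_i
    remaining = n
    while True:
        total = sum(N_sizes[i] * S[i] for i in free)
        trial = {i: remaining * N_sizes[i] * S[i] / total for i in free}
        over = [i for i in free if trial[i] > N_sizes[i]]
        if not over:
            for i in free:
                alloc[i] = trial[i]
            return alloc
        for i in over:                     # completely enumerate these strata
            alloc[i] = N_sizes[i]
            remaining -= N_sizes[i]
            free.remove(i)


def min_variance(alloc, N_sizes, S):
    """(sum* w_i S_i)^2 / n* - sum* w_i S_i^2 / N over strata with n_i < N_i."""
    N = sum(N_sizes)
    star = [i for i, (ni, Ni) in enumerate(zip(alloc, N_sizes)) if ni < Ni]
    n_star = sum(alloc[i] for i in star)
    ws = sum(N_sizes[i] / N * S[i] for i in star)
    return ws ** 2 / n_star - sum(N_sizes[i] / N * S[i] ** 2 for i in star) / N


alloc = capped_neyman(60, [10, 200, 300], [100.0, 5.0, 2.0])   # hypothetical
print(alloc, min_variance(alloc, [10, 200, 300], [100.0, 5.0, 2.0]))
```

A completely enumerated stratum contributes nothing to the variance, which is why $\sum^*$ in the modified formula runs only over the remaining strata.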
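where $Q_i = 1 - P_i$.
Also
$$Var(\bar{y}_{st}) = \sum_{i=1}^{k}\frac{N_i - n_i}{N_i n_i}w_i^2 S_i^2.$$
So, using $S_i^2 = \frac{N_i}{N_i - 1}P_i Q_i$,
$$Var(p_{st}) = \frac{1}{N^2}\sum_{i=1}^{k}\frac{N_i^2(N_i - n_i)}{N_i - 1}\cdot\frac{P_i Q_i}{n_i}.$$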
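$$Var_{prop}(p_{st}) = \frac{N-n}{N}\cdot\frac{1}{Nn}\sum_{i=1}^{k}\frac{N_i^2 P_i Q_i}{N_i - 1} \approx \frac{N-n}{Nn}\sum_{i=1}^{k} w_i P_i Q_i \quad (\text{taking } N_i - 1 \approx N_i),$$
$$\widehat{Var}_{prop}(p_{st}) = \frac{N-n}{Nn}\sum_{i=1}^{k} w_i\frac{n_i p_i q_i}{n_i - 1},$$
where $p_i$ is the sample proportion in the $i$th stratum and $q_i = 1 - p_i$.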
The best choice of $n_i$ that minimizes the variance for a fixed total sample size is
$$n_i \propto N_i\sqrt{\frac{N_i P_i Q_i}{N_i - 1}} \approx N_i\sqrt{P_i Q_i}.$$
Thus
$$n_i = n\,\frac{N_i\sqrt{P_i Q_i}}{\sum_{i=1}^{k} N_i\sqrt{P_i Q_i}}.$$
Similarly, the best choice of $n_i$ such that the variance is minimum for a fixed cost $C = C_0 + \sum_{i=1}^{k} C_i n_i$ is
$$n_i = n\,\frac{N_i\sqrt{P_i Q_i/C_i}}{\sum_{i=1}^{k} N_i\sqrt{P_i Q_i/C_i}}.$$
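Both allocation rules for proportions can be sketched with one function, assuming anticipated stratum proportions $P_i$ are available from prior information. Passing unit costs $C_i$ gives the cost-optimal version; omitting them gives the fixed-sample-size rule $n_i \propto N_i\sqrt{P_i Q_i}$. The function name and inputs are hypothetical.

```python
import math


def proportion_allocation(n, N_sizes, P, c=None):
    """n_i = n * N_i sqrt(P_i Q_i / c_i) / sum_j N_j sqrt(P_j Q_j / c_j).

    With c=None all unit costs are equal, which gives the fixed-sample-size
    rule n_i proportional to N_i sqrt(P_i Q_i).
    """
    if c is None:
        c = [1.0] * len(N_sizes)
    weights = [Ni * math.sqrt(Pi * (1 - Pi) / ci) for Ni, Pi, ci in zip(N_sizes, P, c)]
    total = sum(weights)
    return [n * wt / total for wt in weights]


# Hypothetical strata with anticipated proportions P_i
alloc = proportion_allocation(40, [100, 100], [0.5, 0.1])
print([round(x, 2) for x in alloc])
```

Because $\sqrt{P_i Q_i}$ changes slowly for $P_i$ between about 0.3 and 0.7, this allocation is often close to proportional allocation unless some $P_i$ are near 0 or 1.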
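$$(N-1)S^2 = \sum_{i=1}^{k}\sum_{j=1}^{N_i}(Y_{ij} - \bar{Y})^2 = \sum_{i=1}^{k}\sum_{j=1}^{N_i}\left[(Y_{ij} - \bar{Y}_i) + (\bar{Y}_i - \bar{Y})\right]^2$$
$$= \sum_{i=1}^{k}\sum_{j=1}^{N_i}(Y_{ij} - \bar{Y}_i)^2 + \sum_{i=1}^{k} N_i(\bar{Y}_i - \bar{Y})^2$$
$$= \sum_{i=1}^{k}(N_i - 1)S_i^2 + \sum_{i=1}^{k} N_i(\bar{Y}_i - \bar{Y})^2$$
$$= \sum_{i=1}^{k}(N_i - 1)S_i^2 + N\left(\sum_{i=1}^{k} w_i\bar{Y}_i^2 - \bar{Y}^2\right).$$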
In order to estimate $S^2$, we need estimates of $S_i^2$, $\bar{Y}_i^2$ and $\bar{Y}^2$. We consider their estimation one by one.
Since $E(s_i^2) = S_i^2$, we take $\hat{S}_i^2 = s_i^2$.
Next,
$$Var(\bar{y}_i) = E(\bar{y}_i^2) - [E(\bar{y}_i)]^2 = E(\bar{y}_i^2) - \bar{Y}_i^2,$$
or $\bar{Y}_i^2 = E(\bar{y}_i^2) - Var(\bar{y}_i)$. An unbiased estimate of $\bar{Y}_i^2$ is
$$\hat{\bar{Y}}_i^2 = \bar{y}_i^2 - \widehat{Var}(\bar{y}_i) = \bar{y}_i^2 - \frac{N_i - n_i}{N_i n_i}s_i^2.$$
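Similarly,
$$Var(\bar{y}_{st}) = E(\bar{y}_{st}^2) - [E(\bar{y}_{st})]^2 = E(\bar{y}_{st}^2) - \bar{Y}^2 \;\Rightarrow\; \bar{Y}^2 = E(\bar{y}_{st}^2) - Var(\bar{y}_{st}).$$
So an estimate of $\bar{Y}^2$ is
$$\hat{\bar{Y}}^2 = \bar{y}_{st}^2 - \widehat{Var}(\bar{y}_{st}) = \bar{y}_{st}^2 - \sum_{i=1}^{k}\frac{N_i - n_i}{N_i n_i}w_i^2 s_i^2.$$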
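Substituting these estimates in the expression $(N-1)S^2 = \sum_{i=1}^{k}(N_i - 1)S_i^2 + N\left(\sum_{i=1}^{k} w_i\bar{Y}_i^2 - \bar{Y}^2\right)$, $S^2$ can be estimated as
$$\hat{S}^2 = \frac{1}{N-1}\sum_{i=1}^{k}(N_i - 1)\hat{S}_i^2 + \frac{N}{N-1}\left(\sum_{i=1}^{k} w_i\hat{\bar{Y}}_i^2 - \hat{\bar{Y}}^2\right)$$
$$= \frac{1}{N-1}\sum_{i=1}^{k}(N_i - 1)s_i^2 + \frac{N}{N-1}\left[\sum_{i=1}^{k} w_i\left(\bar{y}_i^2 - \frac{N_i - n_i}{N_i n_i}s_i^2\right) - \left(\bar{y}_{st}^2 - \sum_{i=1}^{k}\frac{N_i - n_i}{N_i n_i}w_i^2 s_i^2\right)\right]$$
$$= \frac{1}{N-1}\sum_{i=1}^{k}(N_i - 1)s_i^2 + \frac{N}{N-1}\left[\sum_{i=1}^{k} w_i(\bar{y}_i - \bar{y}_{st})^2 - \sum_{i=1}^{k} w_i(1 - w_i)\frac{N_i - n_i}{N_i n_i}s_i^2\right].$$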
Thus
$$\widehat{Var}_{SRS}(\bar{y}) = \frac{N-n}{Nn}\hat{S}^2 = \frac{N-n}{N(N-1)n}\sum_{i=1}^{k}(N_i - 1)s_i^2 + \frac{N-n}{n(N-1)}\left[\sum_{i=1}^{k} w_i(\bar{y}_i - \bar{y}_{st})^2 - \sum_{i=1}^{k} w_i(1 - w_i)\frac{N_i - n_i}{N_i n_i}s_i^2\right]$$
and
$$\widehat{Var}(\bar{y}_{st}) = \sum_{i=1}^{k}\frac{N_i - n_i}{N_i n_i}w_i^2 s_i^2.$$
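These two estimates let one judge, from a single stratified sample, how much stratification has gained over SRS. The sketch below implements the final expressions for $\hat{S}^2$, $\widehat{Var}_{SRS}(\bar{y})$ and $\widehat{Var}(\bar{y}_{st})$, assuming SRSWOR within strata and at least two units per stratum. The data and function name are hypothetical.

```python
from statistics import mean, variance


def estimate_srs_vs_stratified(samples, N_sizes):
    """Return (S2_hat, Var_hat_SRS(ybar), Var_hat(ybar_st)) from one stratified sample."""
    N = sum(N_sizes)
    n = sum(len(ys) for ys in samples)
    w = [Ni / N for Ni in N_sizes]
    ybar = [mean(ys) for ys in samples]
    s2 = [variance(ys) for ys in samples]                  # s_i^2
    fpc = [(Ni - len(ys)) / (Ni * len(ys)) for ys, Ni in zip(samples, N_sizes)]
    y_st = sum(wi * yi for wi, yi in zip(w, ybar))
    between = sum(wi * (yi - y_st) ** 2 for wi, yi in zip(w, ybar))
    correction = sum(wi * (1 - wi) * fi * si for wi, fi, si in zip(w, fpc, s2))
    S2_hat = (sum((Ni - 1) * si for Ni, si in zip(N_sizes, s2)) / (N - 1)
              + N / (N - 1) * (between - correction))
    var_srs_hat = (N - n) / (N * n) * S2_hat
    var_st_hat = sum(wi ** 2 * fi * si for wi, fi, si in zip(w, fpc, s2))
    return S2_hat, var_srs_hat, var_st_hat


# Hypothetical stratified sample and stratum sizes
S2_hat, v_srs, v_st = estimate_srs_vs_stratified(
    [[4, 6, 5], [20, 40, 30, 25], [50, 60]], [30, 40, 20])
print(round(v_srs / v_st, 2))   # estimated relative efficiency of stratification
```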
Interpenetrating subsampling
Suppose a sample consists of two or more subsamples which are drawn according to the same sampling scheme, such that each subsample yields an estimate of the parameter. Such subsamples are called interpenetrating subsamples.
The subsamples need not necessarily be independent. The assumption of independent subsamples helps in obtaining an unbiased estimate of the variance of the composite estimator. This is especially helpful if the sample design is complicated and the expression for the variance of the composite estimator is complex.
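Let $t_1, t_2, \ldots, t_g$ be independent unbiased estimators of $\theta$ based on $g$ interpenetrating subsamples, let $\bar{t} = \frac{1}{g}\sum_{j=1}^{g} t_j$ be the composite estimator, and let $\widehat{Var}(\bar{t}) = \frac{1}{g(g-1)}\sum_{j=1}^{g}(t_j - \bar{t})^2$. Since $\sum_{j=1}^{g}(t_j - \bar{t})^2 = \sum_{j=1}^{g}(t_j - \theta)^2 - g(\bar{t} - \theta)^2$ and, by independence, $Var(\bar{t}) = \frac{1}{g^2}\sum_{j=1}^{g} Var(t_j)$,
$$E\left[\widehat{Var}(\bar{t})\right] = \frac{1}{g(g-1)}E\left[\sum_{j=1}^{g}(t_j - \theta)^2 - g(\bar{t} - \theta)^2\right] = \frac{1}{g(g-1)}\left[\sum_{j=1}^{g} Var(t_j) - g\,Var(\bar{t})\right] = \frac{1}{g(g-1)}(g^2 - g)Var(\bar{t}) = Var(\bar{t}).$$
If the distribution of each estimator $t_j$ is continuous and symmetric about $\theta$, then a confidence interval for $\theta$ can be obtained from
$$P\left[\min(t_1, t_2, \ldots, t_g) < \theta < \max(t_1, t_2, \ldots, t_g)\right] = 1 - \left(\frac{1}{2}\right)^{g-1}.$$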
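Both results are simple to compute once the subsample estimates are available. The sketch below assumes $g$ independent estimates $t_1, \ldots, t_g$ and returns $\bar{t}$, the unbiased variance estimate $\widehat{Var}(\bar{t})$, and the min-max interval with its coverage $1 - (1/2)^{g-1}$. That coverage is valid only under the symmetry and continuity assumption stated above. The estimates used are hypothetical.

```python
from statistics import mean


def interpenetrating_summary(t):
    """t: estimates t_1, ..., t_g from g independent interpenetrating subsamples.

    Returns (t_bar, Var_hat(t_bar), (min t_j, max t_j), coverage), where the
    coverage 1 - (1/2)^(g-1) holds when each t_j is continuous and symmetric
    about theta.
    """
    g = len(t)
    t_bar = mean(t)
    var_hat = sum((tj - t_bar) ** 2 for tj in t) / (g * (g - 1))
    coverage = 1 - 0.5 ** (g - 1)
    return t_bar, var_hat, (min(t), max(t)), coverage


# Hypothetical estimates from g = 4 subsamples
t_bar, v, interval, cov = interpenetrating_summary([10.0, 12.0, 11.0, 13.0])
print(t_bar, round(v, 4), interval, cov)   # 11.5 0.4167 (10.0, 13.0) 0.875
```

The variance estimate needs no knowledge of the design within each subsample, which is the practical appeal of interpenetrating subsamples for complicated designs.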
Implementation of interpenetrating subsamples in stratified sampling
Consider the setup of stratified sampling. Suppose that each stratum provides independent interpenetrating subsamples, so that, based on each stratum, there are $L$ independent interpenetrating subsamples drawn according to the same sampling scheme.
Let $\hat{Y}_{ij(tot)}$ be an unbiased estimator of the total of the $j$th stratum based on the $i$th subsample, $i = 1, 2, \ldots, L$; $j = 1, 2, \ldots, k$.
Post Stratification
Sometimes the stratum to which a unit belongs may be known only after the field survey. For example, the age of persons, their educational qualifications, etc. cannot be known in advance. In such cases, we adopt the post stratification procedure to increase the precision of the estimates.
Note: This topic is to be read after the next module on the ratio method of estimation. Since it is related to stratification, it is given here.
In post stratification,
• draw a sample by simple random sampling from the population and carry out the survey;
• after the completion of the survey, stratify the sampling units to increase the precision of the estimates.
Assume that the stratum size $N_i$ is fairly accurately known. Let $m_i$ denote the number of sampled units that fall in the $i$th stratum, so that
$$\sum_{i=1}^{k} m_i = n.$$
Note that $m_i$ is a random variable (and that is why we are not using the symbol $n_i$ as earlier). Assume that $n$ is large enough, or that the stratification is such, that the probability that some $m_i = 0$ is negligibly small. If $m_i = 0$ for some strata, two or more strata can be combined to make the sample size nonzero before evaluating the final estimates.
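To find $E\left(\frac{1}{m_i} - \frac{1}{N_i}\right)$, proceed as follows. Consider the estimate of the ratio based on the ratio method of estimation,
$$\hat{R} = \frac{\sum_{j=1}^{n} y_j}{\sum_{j=1}^{n} x_j} = \frac{\bar{y}}{\bar{x}}, \qquad R = \frac{\sum_{j=1}^{N} Y_j}{\sum_{j=1}^{N} X_j} = \frac{\bar{Y}}{\bar{X}}.$$
We know that, approximately,
$$E(\hat{R}) - R = \frac{N-n}{Nn}\cdot\frac{R S_X^2 - S_{XY}}{\bar{X}^2}.$$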
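Let
$$x_j = \begin{cases} 1 & \text{if the } j\text{th unit belongs to the } i\text{th stratum},\\ 0 & \text{otherwise},\end{cases}$$
and $y_j = 1$ for all $j = 1, 2, \ldots, N$. Then
$$\hat{R} = \frac{\sum_{j=1}^{n} y_j}{\sum_{j=1}^{n} x_j} = \frac{n}{m_i}, \qquad R = \frac{\sum_{j=1}^{N} y_j}{\sum_{j=1}^{N} x_j} = \frac{N}{N_i},$$
$$S_X^2 = \frac{1}{N-1}\left(\sum_{j=1}^{N} x_j^2 - N\bar{X}^2\right) = \frac{1}{N-1}\left(N_i - \frac{N_i^2}{N}\right) = \frac{N_i(N - N_i)}{N(N-1)},$$
$$S_{XY} = \frac{1}{N-1}\left(\sum_{j=1}^{N} x_j y_j - N\bar{X}\bar{Y}\right) = \frac{1}{N-1}\left(N_i - N\cdot\frac{N_i}{N}\cdot 1\right) = 0.$$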
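Hence, since $S_{XY} = 0$,
$$E(\hat{R}) - R = E\left(\frac{n}{m_i}\right) - \frac{N}{N_i} = \frac{N(N-n)(N-N_i)}{nN_i^2(N-1)}.$$
Thus
$$E\left(\frac{1}{m_i} - \frac{1}{N_i}\right) = \frac{N}{nN_i} + \frac{N(N-n)(N-N_i)}{n^2N_i^2(N-1)} - \frac{1}{N_i} \approx \frac{(N-n)N}{n(N-1)N_i}\left(1 + \frac{N}{N_i n} - \frac{1}{n}\right),$$
where the approximation uses $\frac{N-1}{N} \approx 1$.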
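Now substitute this in the expression of $Var(\bar{y}_{post})$:
$$Var(\bar{y}_{post}) = \sum_{i=1}^{k} w_i^2 E\left(\frac{1}{m_i} - \frac{1}{N_i}\right)S_i^2 \approx \sum_{i=1}^{k} w_i^2 S_i^2\cdot\frac{N-n}{(N-1)n}\cdot\frac{N}{N_i}\left(1 + \frac{N}{nN_i} - \frac{1}{n}\right)$$
$$= \frac{N-n}{n(N-1)}\sum_{i=1}^{k} w_i^2 S_i^2\cdot\frac{1}{w_i}\left(1 + \frac{1}{nw_i} - \frac{1}{n}\right) = \frac{N-n}{n^2(N-1)}\sum_{i=1}^{k} w_i S_i^2\left(n - 1 + \frac{1}{w_i}\right)$$
$$= \frac{N-n}{n^2(N-1)}\sum_{i=1}^{k}(nw_i + 1 - w_i)S_i^2 = \frac{N-n}{n(N-1)}\sum_{i=1}^{k} w_i S_i^2 + \frac{N-n}{n^2(N-1)}\sum_{i=1}^{k}(1 - w_i)S_i^2.$$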
Assuming $N - 1 \approx N$,
$$V(\bar{y}_{post}) = \frac{N-n}{Nn}\sum_{i=1}^{k} w_i S_i^2 + \frac{N-n}{Nn^2}\sum_{i=1}^{k}(1 - w_i)S_i^2 = V_{prop}(\bar{y}_{st}) + \frac{N-n}{Nn^2}\sum_{i=1}^{k}(1 - w_i)S_i^2.$$
The second term is the contribution to the variance of $\bar{y}_{post}$ due to the $m_i$'s not being proportionately distributed.
If $S_i^2 \approx S_w^2$, say, for all $i$, then the last term in the expression is
$$\frac{N-n}{Nn^2}\sum_{i=1}^{k}(1 - w_i)S_w^2 = \frac{N-n}{Nn^2}S_w^2(k-1) \quad\left(\text{since } \sum_{i=1}^{k} w_i = 1\right)$$
$$= \frac{k-1}{n}\cdot\frac{N-n}{Nn}S_w^2 = \frac{k-1}{n}Var_{prop}(\bar{y}_{st}).$$
The increase in the variance over $Var_{prop}(\bar{y}_{st})$ is small if the average sample size $\bar{n} = n/k$ per stratum is reasonably large.
Thus post stratification with a large sample produces an estimator which is almost as precise as an estimator in stratified sampling with proportional allocation.
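The approximate penalty for post stratifying can be sketched numerically. The function below evaluates $V_{prop}(\bar{y}_{st})$ and the approximate $V(\bar{y}_{post})$ from hypothetical stratum sizes and variances $S_i^2$. With a common within-stratum variance, the extra term equals $\frac{k-1}{n}V_{prop}(\bar{y}_{st})$, as derived above.

```python
def post_stratified_variance(n, N_sizes, S2):
    """Approximate V(ybar_post) = V_prop(ybar_st) + (N - n)/(N n^2) sum (1 - w_i) S_i^2."""
    N = sum(N_sizes)
    w = [Ni / N for Ni in N_sizes]
    v_prop = (N - n) / (N * n) * sum(wi * s for wi, s in zip(w, S2))
    extra = (N - n) / (N * n ** 2) * sum((1 - wi) * s for wi, s in zip(w, S2))
    return v_prop, v_prop + extra


# Hypothetical case with a common within-stratum variance S_w^2 = 4 and k = 4
v_prop, v_post = post_stratified_variance(200, [1000, 800, 700, 500], [4.0] * 4)
print(v_post / v_prop)   # relative increase over proportional allocation
```

With $n = 200$ and $k = 4$, the post-stratified estimator loses only about $(k-1)/n = 1.5\%$ relative to proportional allocation in this example.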