Design Analys Sample Survey

1) Stratified random sampling divides a heterogeneous population into homogeneous strata before sampling to improve precision. 2) The population is divided into mutually exclusive strata, and a simple random sample is drawn from each stratum. 3) The overall population mean can be estimated as a weighted average of the stratum means, where the weights are the stratum sizes. This overall estimator is unbiased.

Uploaded by

EPAH SIRENGO

STA 3105 Design and Analysis of Sample Survey

3. STRATIFIED RANDOM SAMPLING


The objective of any sampling method is usually to estimate the unknown population parameters with the highest possible precision, i.e. the variance of the estimators should be minimized. If the population is heterogeneous, as it will be in most situations, then a sample taken via simple random sampling (SRS) might yield estimates with high variability. In a survey where precision is a main consideration, a strategy that addresses heterogeneity must therefore be found. One way of achieving higher precision is to divide the originally heterogeneous population into subpopulations that are largely homogeneous with respect to the survey characteristic. These subpopulations are known as strata. Once the strata have been formed, we take an SRS from each stratum. From the sample taken from each stratum we compute the sample mean and sample variance, and combining these gives estimators with better precision.
Notation used:
Let the population consist of $N$ units. Divide the population into $k$ strata, where the $i$th stratum consists of $N_i$ units.
Let the strata be mutually disjoint so that $\sum_{i=1}^{k} N_i = N_1 + N_2 + \cdots + N_k = N$.
After formation of the strata, let the sample size from the $i$th stratum be $n_i$, such that
$$n_1 + n_2 + \cdots + n_k = n,$$
where $n$ is the required sample size for the entire study.
Let $Y_{ij}$ be the value of the characteristic for the $j$th unit in the $i$th stratum (written $y_{ij}$ when the unit belongs to the sample).
Let $\bar{Y}_i$ be the $i$th stratum population mean, i.e. $\bar{Y}_i = \frac{1}{N_i}\sum_{j=1}^{N_i} Y_{ij}$.
Let $\bar{y}_i$ be the sample mean of the $i$th stratum, $\bar{y}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij}$.
Let $S_i^2$ be the population variance in the $i$th stratum, $S_i^2 = \frac{1}{N_i-1}\sum_{j=1}^{N_i}\left(Y_{ij}-\bar{Y}_i\right)^2$, and let $s_i^2$ be the corresponding sample variance, $s_i^2 = \frac{1}{n_i-1}\sum_{j=1}^{n_i}\left(y_{ij}-\bar{y}_i\right)^2$.
JKUAT is ISO:2008 certified

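From the description of stratified random sampling, where an SRS is taken within each stratum, it is clear that
$$E(\bar{y}_i) = \bar{Y}_i, \qquad E(s_i^2) = S_i^2, \qquad \operatorname{var}(\bar{y}_i) = \frac{N_i - n_i}{N_i\, n_i}\, S_i^2.$$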

Proposition
$\bar{y}_w$ defined as
$$\bar{y}_w = \sum_{i=1}^{k} \frac{N_i}{N}\,\bar{y}_i = \sum_{i=1}^{k} w_i\,\bar{y}_i$$
can be used as an estimator of the overall population mean $\bar{Y}$.


Theorem
$\bar{y}_w$ is an unbiased estimator of $\bar{Y}$ in the context of stratified random sampling.
Proof: $\bar{y}_w$ is unbiased for $\bar{Y}$ if $E(\bar{y}_w) = \bar{Y}$. Now $E(\bar{y}_w) = E\left(\sum_{i=1}^{k} w_i \bar{y}_i\right)$, where $w_i = N_i/N$ is the stratum weight. Hence
$$E(\bar{y}_w) = \sum_{i=1}^{k} w_i E(\bar{y}_i) = \sum_{i=1}^{k} w_i \bar{Y}_i = \frac{1}{N}\sum_{i=1}^{k} N_i \bar{Y}_i = \frac{1}{N}\sum_{i=1}^{k}\sum_{j=1}^{N_i} Y_{ij} = \bar{Y},$$
which completes the proof.
Next, suppose we are interested in the precision involved in using $\bar{y}_w$; then we require $\operatorname{var}(\bar{y}_w)$.

Dr. Orwa G. O. Department of Actuarial Science and Statistics 13


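$$\begin{aligned}
\operatorname{var}(\bar{y}_w) &= \operatorname{var}\left(\sum_{i=1}^{k} w_i \bar{y}_i\right) \\
&= \sum_{i=1}^{k} w_i^2 \operatorname{var}(\bar{y}_i) \\
&= \sum_{i=1}^{k} w_i^2 \cdot \frac{N_i - n_i}{N_i}\cdot\frac{S_i^2}{n_i} \\
&= \frac{1}{N^2}\sum_{i=1}^{k} N_i\,(N_i - n_i)\,\frac{S_i^2}{n_i},
\end{aligned}$$
where the second line holds because the samples are drawn independently in the different strata. In practice we estimate $S_i^2$ by $s_i^2$.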


We note that $\operatorname{var}(\bar{y}_w)$ depends on $S_i^2$, the variance within the $i$th stratum. The implication is that the more homogeneous the units within each stratum, the greater the precision of the weighted mean.
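The estimator $\bar{y}_w$ and its estimated variance can be computed directly from the stratum samples. The following is a minimal sketch, not part of the original notes: the function names and the two-stratum data are hypothetical, and $S_i^2$ is replaced by the sample variance $s_i^2$.

```python
from statistics import mean, variance

def stratified_mean(samples, sizes):
    """Weighted mean: sum over strata of (N_i / N) * ybar_i."""
    N = sum(sizes)
    return sum(N_i / N * mean(y) for y, N_i in zip(samples, sizes))

def stratified_variance(samples, sizes):
    """Estimated var(ybar_w) = (1/N^2) * sum N_i (N_i - n_i) s_i^2 / n_i."""
    N = sum(sizes)
    total = 0.0
    for y, N_i in zip(samples, sizes):
        n_i = len(y)
        # statistics.variance uses the divisor n_i - 1, i.e. s_i^2.
        total += N_i * (N_i - n_i) * variance(y) / n_i
    return total / N**2

# Hypothetical data: two strata of sizes 60 and 40 with small samples.
samples = [[10, 12, 11, 13], [20, 22, 21]]
sizes = [60, 40]
estimate = stratified_mean(samples, sizes)
variance_estimate = stratified_variance(samples, sizes)
print(estimate, variance_estimate)
```

Each stratum needs at least two sampled units for $s_i^2$ to be defined.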

3.1. ALLOCATION OF SAMPLE SIZES IN STRATIFIED SAMPLING


It has been seen that the sample sizes contribute to the overall precision when estimating the population mean under stratified sampling. Because of this, procedures for allocating sample sizes must be statistically sound. We have established that $\operatorname{var}(\bar{y}_w) = \sum_{i=1}^{k} w_i^2 \left(\frac{N_i - n_i}{N_i}\right)\frac{S_i^2}{n_i}$, which depends on $n_i$. The $n_i$ can be fixed by the sampler at will, but there are two main ways of fixing these sample sizes.
• Proportional allocation
• Optimal allocation

PROPORTIONAL ALLOCATION
Under this allocation scheme the idea is to allocate the sample in such a way that the sampling fraction is the same in every stratum, i.e. $f_i = n_i/N_i$ is constant for $i = 1, 2, \ldots, k$. This implies that
$$\frac{n_1}{N_1} = \frac{n_2}{N_2} = \cdots = \frac{n_k}{N_k} = c,$$
so that $n_i = cN_i$ and accordingly $\sum_{i=1}^{k} n_i = c\sum_{i=1}^{k} N_i$, i.e. $n = cN$. Thus $c = n/N$, which is the usual sampling fraction. It is clear that $n_i \propto N_i$, hence the name proportional allocation. In other words, each stratum is represented according to its size, and we may use this to simplify
$$\operatorname{var}(\bar{y}_w) = \sum_{i=1}^{k} w_i^2 \left(\frac{N_i - n_i}{N_i}\right)\frac{S_i^2}{n_i}.$$
But $n_i = cN_i = nN_i/N$, so $(N_i - n_i)/N_i = 1 - n/N$ and $w_i^2/n_i = w_i/n$. Substituting into the right-hand side gives
$$\operatorname{var}(\bar{y}_w)_{prop} = \left(\frac{1}{n} - \frac{1}{N}\right)\sum_{i=1}^{k} w_i S_i^2.$$
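The proportional rule $n_i = nN_i/N$ and the resulting simplification of the variance can be checked numerically. This is a sketch under stated assumptions: the stratum sizes and variances below are hypothetical, the function names are illustrative, and the allocations are left unrounded.

```python
def proportional_allocation(sizes, n):
    """n_i = n * N_i / N (unrounded)."""
    N = sum(sizes)
    return [n * N_i / N for N_i in sizes]

def var_general(sizes, alloc, S2):
    """sum of w_i^2 * (N_i - n_i)/N_i * S_i^2 / n_i for any allocation."""
    N = sum(sizes)
    return sum((N_i / N) ** 2 * (N_i - n_i) / N_i * s2 / n_i
               for N_i, n_i, s2 in zip(sizes, alloc, S2))

def var_prop(sizes, n, S2):
    """(1/n - 1/N) * sum of w_i * S_i^2, valid under proportional allocation."""
    N = sum(sizes)
    return (1 / n - 1 / N) * sum(N_i / N * s2 for N_i, s2 in zip(sizes, S2))

sizes = [500, 300, 200]   # hypothetical stratum sizes N_i
S2 = [4.0, 9.0, 16.0]     # hypothetical stratum variances S_i^2
n = 100

alloc = proportional_allocation(sizes, n)
general = var_general(sizes, alloc, S2)
simplified = var_prop(sizes, n, S2)
print(alloc, general, simplified)
```

In practice the $n_i$ must be rounded to integers, after which the two expressions agree only approximately.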


OPTIMUM ALLOCATION
Here there are two approaches
• minimize variance subject to a fixed sample size or subject to a fixed cost.
• minimize the total cost subject to a fixed variance.
The allocation of the $n_i$ to the different strata in accordance with either approach is usually referred to as optimum allocation. In both schemes we consider the simplest cost function, defined as $C = a + \sum_{i=1}^{k} c_i n_i$, where $a$ is the standing (fixed) cost and $c_i$ is the cost of sampling one unit within stratum $i$.
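COMPARING SRS WITH STRATIFIED RANDOM SAMPLING WITH PROPORTIONAL ALLOCATION
We have that $\operatorname{var}(\bar{y}_w)_{prop} = \left(\frac{1}{n} - \frac{1}{N}\right)\sum_{i=1}^{k} w_i S_i^2$ and $\operatorname{var}(\bar{y})_{SRSWOR} = \left(\frac{1}{n} - \frac{1}{N}\right)S^2$. From first principles, in a stratified scheme the overall variance should be computed as follows.
$$\begin{aligned}
S^2 &= \frac{1}{N-1}\sum_{i=1}^{k}\sum_{j=1}^{N_i}\left(Y_{ij} - \bar{Y}\right)^2 \\
&= \frac{1}{N-1}\sum_{i=1}^{k}\sum_{j=1}^{N_i}\left(Y_{ij} - \bar{Y}_i + \bar{Y}_i - \bar{Y}\right)^2 \\
&= \frac{1}{N-1}\sum_{i=1}^{k}\sum_{j=1}^{N_i}\left[\left(Y_{ij} - \bar{Y}_i\right)^2 + 2\left(Y_{ij} - \bar{Y}_i\right)\left(\bar{Y}_i - \bar{Y}\right) + \left(\bar{Y}_i - \bar{Y}\right)^2\right] \\
&= \frac{1}{N-1}\left[\sum_{i=1}^{k}\sum_{j=1}^{N_i}\left(Y_{ij} - \bar{Y}_i\right)^2 + \sum_{i=1}^{k} N_i\left(\bar{Y}_i - \bar{Y}\right)^2\right]
\end{aligned}$$
The cross-product term vanishes because $\sum_{j=1}^{N_i}\left(Y_{ij} - \bar{Y}_i\right) = 0$ within each stratum. Hence
$$(N-1)\,S^2 = \sum_{i=1}^{k}(N_i - 1)\,S_i^2 + \sum_{i=1}^{k} N_i\left(\bar{Y}_i - \bar{Y}\right)^2.$$
In the above equation we assume that $N_i$ and $N$ are sufficiently large that $N_i - 1 \simeq N_i$ and $N - 1 \simeq N$, so we may write
$$N S^2 = \sum_{i=1}^{k} N_i S_i^2 + \sum_{i=1}^{k} N_i\left(\bar{Y}_i - \bar{Y}\right)^2.$$
This implies
$$S^2 = \sum_{i=1}^{k} w_i S_i^2 + \sum_{i=1}^{k} w_i\left(\bar{Y}_i - \bar{Y}\right)^2,$$
but
$$\operatorname{var}(\bar{y})_{SRSWOR} = \left(\frac{1}{n} - \frac{1}{N}\right)S^2,$$
and so
$$\begin{aligned}
\operatorname{var}(\bar{y})_{SRSWOR} &= \left(\frac{1}{n} - \frac{1}{N}\right)\left[\sum_{i=1}^{k} w_i S_i^2 + \sum_{i=1}^{k} w_i\left(\bar{Y}_i - \bar{Y}\right)^2\right] \\
&= \left(\frac{1}{n} - \frac{1}{N}\right)\sum_{i=1}^{k} w_i S_i^2 + \left(\frac{1}{n} - \frac{1}{N}\right)\sum_{i=1}^{k} w_i\left(\bar{Y}_i - \bar{Y}\right)^2 \\
&= \operatorname{var}(\bar{y}_w)_{prop} + \left(\frac{1}{n} - \frac{1}{N}\right)\sum_{i=1}^{k} w_i\left(\bar{Y}_i - \bar{Y}\right)^2.
\end{aligned}$$
Since the second term is non-negative, we conclude that
$$\operatorname{var}(\bar{y}_w)_{prop} \le \operatorname{var}(\bar{y})_{SRSWOR}.$$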
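The exact decomposition $(N-1)S^2 = \sum (N_i - 1)S_i^2 + \sum N_i(\bar{Y}_i - \bar{Y})^2$ that drives this comparison can be verified on a small population. The sketch below uses a made-up population of three strata; the values are hypothetical and chosen only for illustration.

```python
from statistics import mean, variance

# Hypothetical population, already split into three strata.
strata = [[2, 4, 6, 8], [10, 11, 12], [20, 25, 30, 35, 40]]
population = [y for stratum in strata for y in stratum]
N = len(population)
grand_mean = mean(population)

# Left-hand side: (N - 1) * S^2 computed from the whole population.
lhs = (N - 1) * variance(population)

# Right-hand side: within-stratum part plus between-stratum part.
within = sum((len(s) - 1) * variance(s) for s in strata)
between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in strata)
rhs = within + between

print(lhs, rhs)
```

The larger the between-stratum part, the greater the gain from stratifying with proportional allocation rather than using SRS.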
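NEYMAN VS PROPORTIONAL ALLOCATION
Under Neyman allocation the variance is
$$\operatorname{var}(\bar{y}_w)_{Neyman} = \frac{\left(\sum_{i=1}^{k} w_i S_i\right)^2}{n} - \frac{\sum_{i=1}^{k} w_i S_i^2}{N}.$$
Writing $\bar{S} = \sum_{i=1}^{k} w_i S_i$ and using $\operatorname{var}(\bar{y}_w)_{prop} = \left(\frac{1}{n} - \frac{1}{N}\right)\sum_{i=1}^{k} w_i S_i^2$, we have
$$\begin{aligned}
\operatorname{var}(\bar{y}_w)_{prop} - \operatorname{var}(\bar{y}_w)_{Neyman} &= \frac{1}{n}\sum_{i=1}^{k} w_i S_i^2 - \frac{1}{n}\left(\sum_{i=1}^{k} w_i S_i\right)^2 \\
&= \frac{1}{n}\sum_{i=1}^{k} w_i\left(S_i^2 - S_i\bar{S}\right) \\
&= \frac{1}{n}\sum_{i=1}^{k} w_i\left[\left(S_i^2 - 2S_i\bar{S} + \bar{S}^2\right) + \left(S_i\bar{S} - \bar{S}^2\right)\right] \\
&= \frac{1}{n}\sum_{i=1}^{k} w_i\left(S_i - \bar{S}\right)^2 + \frac{1}{n}\sum_{i=1}^{k} w_i\left(S_i\bar{S} - \bar{S}^2\right).
\end{aligned}$$
The last sum vanishes because $\sum_{i=1}^{k} w_i = 1$ and $\sum_{i=1}^{k} w_i S_i = \bar{S}$. It can be shown that
$$\frac{1}{n}\sum_{i=1}^{k} w_i\left(S_i - \bar{S}\right)^2 \ge 0,$$
so that
$$\operatorname{var}(\bar{y}_w)_{prop} - \operatorname{var}(\bar{y}_w)_{Neyman} = \frac{1}{n}\sum_{i=1}^{k} w_i\left(S_i - \bar{S}\right)^2 \;\Rightarrow\; \operatorname{var}(\bar{y}_w)_{prop} \ge \operatorname{var}(\bar{y}_w)_{Neyman}.$$
We can therefore conclude that Neyman allocation is a more efficient strategy than proportional allocation, with equality only when all the $S_i$ are equal.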
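The gap between the two variances can be checked numerically against the weighted spread of the stratum standard deviations. This is a minimal sketch assuming hypothetical weights $w_i$ and standard deviations $S_i$; the function names are illustrative and $\bar{S} = \sum w_i S_i$.

```python
def var_prop(w, S, n, N):
    """(1/n - 1/N) * sum of w_i * S_i^2."""
    return (1 / n - 1 / N) * sum(wi * si**2 for wi, si in zip(w, S))

def var_neyman(w, S, n, N):
    """(sum of w_i S_i)^2 / n - (sum of w_i S_i^2) / N."""
    s_bar = sum(wi * si for wi, si in zip(w, S))
    return s_bar**2 / n - sum(wi * si**2 for wi, si in zip(w, S)) / N

# Hypothetical stratum weights and standard deviations.
w = [0.5, 0.3, 0.2]
S = [2.0, 3.0, 4.0]
n, N = 100, 1000

s_bar = sum(wi * si for wi, si in zip(w, S))
gap = var_prop(w, S, n, N) - var_neyman(w, S, n, N)
spread = sum(wi * (si - s_bar) ** 2 for wi, si in zip(w, S)) / n
print(gap, spread)
```

The two printed numbers agree up to floating-point error, and the gap shrinks to zero as the $S_i$ become equal.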
Exercise 1.
A sample of 30 students is to be drawn from a population of 300 students belonging to two colleges A and B. The means and standard deviations of their marks are given below:

             Total number of students   Mean ($\bar{y}_i$)   SD ($S_i$)
College A    200                        30                   10
College B    100                        60                   40

Use the information to confirm that Neyman's allocation scheme is more efficient than proportional allocation.
Exercise 2.
Investigate the relationship between $\operatorname{var}(\bar{y}_w)_{Neyman}$ and $\operatorname{var}(\bar{y})_{SRSWOR}$.


3.2. MINIMIZING VARIANCE


MINIMIZING VARIANCE SUBJECT TO FIXED SAMPLE SIZES
First we consider the case where the total sample size is fixed. The procedure is due to Neyman, hence minimizing the variance subject to a fixed sample size is known as Neyman allocation.
In this case, the task is to minimize
$$\operatorname{var}(\bar{y}_w) = \frac{1}{N^2}\sum_{i=1}^{k} N_i\,(N_i - n_i)\,\frac{S_i^2}{n_i}$$
subject to the condition that $\sum_{i=1}^{k} n_i = n$ (i.e. $n$ is predetermined or fixed prior to sampling).
We wish to minimize unconditionally a function built up from $\operatorname{var}(\bar{y}_w)$ and the given condition. To achieve this we use the Lagrangian procedure and write
$$\varphi = \operatorname{var}(\bar{y}_w) + \lambda\left(\sum_{i=1}^{k} n_i - n\right),$$
where $\lambda$ is the Lagrange multiplier. Accordingly, for the minimum we require
$$\frac{\partial\varphi}{\partial n_i} = 0$$
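such that
$$\begin{aligned}
\frac{\partial\varphi}{\partial n_i} &= \frac{\partial}{\partial n_i}\left\{\frac{1}{N^2}\sum_{i=1}^{k} N_i\,(N_i - n_i)\,\frac{S_i^2}{n_i} + \lambda\left(\sum_{i=1}^{k} n_i - n\right)\right\} \\
&= \frac{\partial}{\partial n_i}\,\frac{1}{N^2}\left[N_1(N_1 - n_1)\frac{S_1^2}{n_1} + N_2(N_2 - n_2)\frac{S_2^2}{n_2} + \cdots + N_i(N_i - n_i)\frac{S_i^2}{n_i} + \cdots + N_k(N_k - n_k)\frac{S_k^2}{n_k}\right] \\
&\quad + \frac{\partial}{\partial n_i}\,\lambda\left(n_1 + n_2 + \cdots + n_i + \cdots + n_k - n\right) \\
&= \frac{1}{N^2}\left(\frac{-N_i^2 S_i^2}{n_i^2}\right) + \lambda.
\end{aligned}$$
Setting this to zero,
$$\frac{-w_i^2 S_i^2}{n_i^2} + \lambda = 0 \;\Rightarrow\; n_i^2 = \frac{w_i^2 S_i^2}{\lambda} \;\Rightarrow\; n_i = \frac{w_i S_i}{\sqrt{\lambda}}.$$
Now we observe that $\dfrac{\partial^2\varphi}{\partial n_i^2} > 0$.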
This means that the solution minimizes $\operatorname{var}(\bar{y}_w)$. The only remaining task is to find the value of $\lambda$, which is still an unknown constant. To compute $\lambda$ we sum both sides of $n_i = w_i S_i/\sqrt{\lambda}$ over the strata, i.e.
$$\sum_{i=1}^{k} n_i = \sum_{i=1}^{k}\frac{w_i S_i}{\sqrt{\lambda}} \;\Rightarrow\; n = \frac{\sum_{i=1}^{k} w_i S_i}{\sqrt{\lambda}},$$
so that
$$\lambda = \left(\frac{\sum_{i=1}^{k} w_i S_i}{n}\right)^2, \qquad \sqrt{\lambda} = \frac{\sum_{i=1}^{k} w_i S_i}{n},$$


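which may be substituted into the equation for $n_i$, yielding
$$n_i = \frac{w_i S_i}{\sqrt{\lambda}} = \frac{w_i S_i}{\frac{1}{n}\sum_{i=1}^{k} w_i S_i} = \frac{n\,(w_i S_i)}{\sum_{i=1}^{k} w_i S_i}.$$
Substituting this result for $n_i$ into $\operatorname{var}(\bar{y}_w)$ we obtain
$$\begin{aligned}
\operatorname{var}(\bar{y}_w)_{opt} &= \frac{1}{N^2}\sum_{i=1}^{k} N_i\,(N_i - n_i)\,\frac{S_i^2}{n_i} \\
&= \frac{1}{N^2}\sum_{i=1}^{k} N_i\left(\frac{N_i}{n_i} - 1\right)S_i^2 \\
&= \frac{1}{N^2}\sum_{i=1}^{k} N_i\left(\frac{N_i\sum_{h=1}^{k} w_h S_h}{n\,w_i S_i} - 1\right)S_i^2 \\
&= \frac{1}{n}\left(\sum_{i=1}^{k} w_i S_i\right)^2 - \frac{1}{N}\sum_{i=1}^{k} w_i S_i^2.
\end{aligned}$$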
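The Neyman rule $n_i = n\,w_i S_i / \sum w_i S_i$ and the closed form for the resulting variance can be checked against each other. The following is a sketch, not the notes' own procedure: the function names and the three-stratum data are hypothetical, and allocations are left unrounded.

```python
def neyman_allocation(sizes, S, n):
    """n_i = n * N_i S_i / sum(N_h S_h); the common factor 1/N cancels."""
    products = [N_i * s for N_i, s in zip(sizes, S)]
    total = sum(products)
    return [n * p / total for p in products]

def var_from_allocation(sizes, S, alloc):
    """(1/N^2) * sum of N_i (N_i - n_i) S_i^2 / n_i for any allocation."""
    N = sum(sizes)
    return sum(N_i * (N_i - n_i) * s**2 / n_i
               for N_i, s, n_i in zip(sizes, S, alloc)) / N**2

def var_opt_closed_form(sizes, S, n):
    """(1/n) (sum of w_i S_i)^2 - (1/N) sum of w_i S_i^2."""
    N = sum(sizes)
    w = [N_i / N for N_i in sizes]
    return (sum(wi * s for wi, s in zip(w, S)) ** 2 / n
            - sum(wi * s**2 for wi, s in zip(w, S)) / N)

sizes = [500, 300, 200]   # hypothetical stratum sizes N_i
S = [2.0, 3.0, 4.0]       # hypothetical stratum standard deviations S_i
n = 100

alloc = neyman_allocation(sizes, S, n)
direct = var_from_allocation(sizes, S, alloc)
closed = var_opt_closed_form(sizes, S, n)
print(alloc, direct, closed)
```

If the rule ever returns $n_i > N_i$ for some stratum, that stratum would have to be sampled completely and the remainder reallocated; the sketch does not handle that case.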

MINIMIZING VARIANCE FOR A FIXED COST


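Here we take the total cost to be fixed at $C = a + \sum_{i=1}^{k} c_i n_i$, and we require the minimum variance subject to this constraint. We convert this conditional minimization into an unconditional one by writing
$$\varphi = \operatorname{var}(\bar{y}_w) + \mu\left(a + \sum_{i=1}^{k} c_i n_i - C\right),$$
where $\mu$ is a Lagrange multiplier. So we minimize unconditionally
$$\begin{aligned}
\varphi &= \sum_{i=1}^{k}\frac{N_i^2}{N^2}\left(\frac{1}{n_i} - \frac{1}{N_i}\right)S_i^2 + \mu\left(a + \sum_{i=1}^{k} c_i n_i - C\right) \\
&= \sum_{i=1}^{k}\left(\frac{w_i^2 S_i^2}{n_i} + \mu c_i n_i\right) + \cdots \\
&= \sum_{i=1}^{k}\left(\frac{w_i S_i}{\sqrt{n_i}} - \sqrt{\mu c_i n_i}\right)^2 + \cdots,
\end{aligned}$$
where the omitted terms, including $2\sum_{i=1}^{k} w_i S_i\sqrt{\mu c_i}$, do not depend on the $n_i$.
Now $\varphi$ is a minimum when $\dfrac{w_i S_i}{\sqrt{n_i}} = \sqrt{\mu c_i n_i}$, and accordingly
$$n_i = \frac{w_i S_i}{\sqrt{\mu c_i}},$$
so
$$\sum_{i=1}^{k} n_i = \sum_{i=1}^{k}\frac{w_i S_i}{\sqrt{\mu c_i}} \;\Rightarrow\; n = \sum_{i=1}^{k}\frac{w_i S_i}{\sqrt{\mu c_i}} \;\Rightarrow\; \sqrt{\mu} = \frac{1}{n}\sum_{i=1}^{k}\frac{w_i S_i}{\sqrt{c_i}},$$
so that
$$n_i = \frac{w_i S_i/\sqrt{c_i}}{\frac{1}{n}\sum_{i=1}^{k} w_i S_i/\sqrt{c_i}} = \frac{n\cdot N_i S_i/\sqrt{c_i}}{\sum_{i=1}^{k} N_i S_i/\sqrt{c_i}},$$
meaning that $n_i \propto N_i S_i/\sqrt{c_i}$.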

• The larger the size of the stratum, the larger should be the size of the sample selected from it.

• The larger the variability within the stratum, the larger should be the size of the sample selected from that stratum.

• The cheaper the cost of sampling within the $i$th stratum, the larger should be the size of the sample from that stratum.
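The rule $n_i \propto N_i S_i/\sqrt{c_i}$ can be turned into actual sample sizes once a budget is given. The sketch below assumes the linear cost function $C = a + \sum c_i n_i$ from the text and scales the allocation so that the budget is spent exactly; the function name, parameter names and all numbers are hypothetical.

```python
from math import sqrt

def cost_optimal_allocation(sizes, S, costs, budget, fixed_cost=0.0):
    """Allocate n_i proportional to N_i * S_i / sqrt(c_i), scaled so that
    fixed_cost + sum(c_i * n_i) equals the budget."""
    shares = [N_i * s / sqrt(c) for N_i, s, c in zip(sizes, S, costs)]
    # Variable cost incurred if the shares themselves were the sample sizes.
    pattern_cost = sum(c * sh for c, sh in zip(costs, shares))
    scale = (budget - fixed_cost) / pattern_cost
    return [scale * sh for sh in shares]

sizes = [500, 300, 200]   # hypothetical stratum sizes N_i
S = [2.0, 3.0, 4.0]       # hypothetical stratum standard deviations S_i
costs = [1.0, 4.0, 9.0]   # hypothetical per-unit costs c_i

alloc = cost_optimal_allocation(sizes, S, costs, budget=1000.0, fixed_cost=100.0)
spent = 100.0 + sum(c * n_i for c, n_i in zip(costs, alloc))
print(alloc, spent)
```

With equal costs $c_i$ the shares reduce to $N_i S_i$, which is the Neyman allocation.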

Example.
A stratified population has 5 strata. The stratum sizes $N_i$, means $\bar{Y}_i$ and variances $S_i^2$ of some variable $Y$ are as follows:

Stratum   $N_i$   $\bar{Y}_i$   $S_i^2$
1         117     7.3           1.31
2         98      6.9           2.03
3         74      11.2          1.13
4         41      9.1           1.96
5         45      9.6           1.74

Calculate the overall population mean and variance.
For a stratified simple random sample of size 80, determine the appropriate stratum sample sizes under proportional allocation and Neyman allocation.
Solution: $\bar{y}_{str} = \sum_{h=1}^{5} w_h \bar{y}_h$, where
$$N = \sum_{h=1}^{5} N_h = 117 + 98 + 74 + 41 + 45 = 375,$$
$$\bar{y}_1 = 7.3,\quad \bar{y}_2 = 6.9,\quad \bar{y}_3 = 11.2,\quad \bar{y}_4 = 9.1,\quad \bar{y}_5 = 9.6,$$
$$w_h = \frac{N_h}{N}:\quad w_1 = \frac{117}{375},\quad w_2 = \frac{98}{375},\quad w_3 = \frac{74}{375},\quad w_4 = \frac{41}{375},\quad w_5 = \frac{45}{375}.$$

Overall population mean:
$$\begin{aligned}
\bar{y}_{str} &= \sum_{h=1}^{5} w_h \bar{y}_h \\
&= \left(\frac{117}{375}\right)7.3 + \left(\frac{98}{375}\right)6.9 + \left(\frac{74}{375}\right)11.2 + \left(\frac{41}{375}\right)9.1 + \left(\frac{45}{375}\right)9.6 \\
&= 8.4378 \approx 8.44
\end{aligned}$$
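The variance of the stratified mean is
$$\operatorname{var}(\bar{y}_{str}) = \sum_{h=1}^{5} w_h^2\left(\frac{N_h - n_h}{N_h}\right)\frac{S_h^2}{n_h}.$$

$S^2 = \frac{1}{N-1}\left(\sum_{i=1}^{N} y_i^2 - N\bar{y}^2\right)$, but $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \Rightarrow n = \frac{\sum_{i=1}^{n} y_i}{\bar{y}}$, and the stratum sample sizes are taken as

$$n_1 = \frac{117}{7.3} \approx 16,\quad n_2 = \frac{98}{6.9} \approx 14,\quad n_3 = \frac{74}{11.2} \approx 7,\quad n_4 = \frac{41}{9.1} \approx 5,\quad n_5 = \frac{45}{9.6} \approx 5.$$

Then the variance is:
$$\begin{aligned}
\operatorname{var}(\bar{y}_{str}) &= \left(\frac{117}{375}\right)^2\left(\frac{117-16}{117}\right)\frac{1.31}{16} + \left(\frac{98}{375}\right)^2\left(\frac{98-14}{98}\right)\frac{2.03}{14} \\
&\quad + \left(\frac{74}{375}\right)^2\left(\frac{74-7}{74}\right)\frac{1.13}{7} + \left(\frac{41}{375}\right)^2\left(\frac{41-5}{41}\right)\frac{1.96}{5} \\
&\quad + \left(\frac{45}{375}\right)^2\left(\frac{45-5}{45}\right)\frac{1.74}{5} \\
&\approx 0.0296
\end{aligned}$$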
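The overall population variance $S^2$ asked for in the example can also be obtained from the stratum summaries alone, using the exact decomposition $(N-1)S^2 = \sum (N_h - 1)S_h^2 + \sum N_h(\bar{Y}_h - \bar{Y})^2$ from the comparison with SRS. The sketch below is an alternative route rather than the calculation shown in the notes, and it assumes the tabulated $S_h^2$ use the divisor $N_h - 1$.

```python
# Stratum summaries from the example table.
sizes = [117, 98, 74, 41, 45]
means = [7.3, 6.9, 11.2, 9.1, 9.6]
variances = [1.31, 2.03, 1.13, 1.96, 1.74]

N = sum(sizes)
pop_mean = sum(N_h * m for N_h, m in zip(sizes, means)) / N

# Within-stratum and between-stratum sums of squares.
within = sum((N_h - 1) * v for N_h, v in zip(sizes, variances))
between = sum(N_h * (m - pop_mean) ** 2 for N_h, m in zip(sizes, means))

pop_variance = (within + between) / (N - 1)
print(round(pop_mean, 4), round(pop_variance, 4))
```

For these data the population mean is about 8.44 and $S^2$ is about 4.31, most of which comes from the differences between the stratum means.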
Under proportional allocation:

$$n_1 = \frac{80}{375}(117) = 24.96,\quad n_2 = \frac{80}{375}(98) = 20.91,\quad n_3 = \frac{80}{375}(74) = 15.79,$$

$$n_4 = \frac{80}{375}(41) = 8.75,\quad n_5 = \frac{80}{375}(45) = 9.6.$$

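Under Neyman allocation:

$$n_h = \frac{w_h S_h}{\frac{1}{n}\sum_{h=1}^{5} w_h S_h}$$
$$\begin{aligned}
\sum_{h=1}^{5} w_h S_h &= \left(\frac{117}{375}\right)\sqrt{1.31} + \left(\frac{98}{375}\right)\sqrt{2.03} + \left(\frac{74}{375}\right)\sqrt{1.13} + \left(\frac{41}{375}\right)\sqrt{1.96} + \left(\frac{45}{375}\right)\sqrt{1.74} \\
&\approx 1.2506
\end{aligned}$$
$$\frac{1}{n}(1.2506) = \frac{1}{80}\times 1.2506 \approx 0.015632$$

$$n_1 = \frac{\frac{117}{375}\sqrt{1.31}}{0.015632} \approx 22.84,\quad n_2 = \frac{\frac{98}{375}\sqrt{2.03}}{0.015632} \approx 23.82,\quad n_3 = \frac{\frac{74}{375}\sqrt{1.13}}{0.015632} \approx 13.42,$$

$$n_4 = \frac{\frac{41}{375}\sqrt{1.96}}{0.015632} \approx 9.79,\quad n_5 = \frac{\frac{45}{375}\sqrt{1.74}}{0.015632} \approx 10.13.$$
These sample sizes sum to 80, as required.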
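The allocation arithmetic for this example is easy to re-check by machine. The following sketch uses the stratum sizes and variances from the example table, takes $S_h = \sqrt{S_h^2}$, and leaves the allocations unrounded; the variable names are illustrative.

```python
from math import sqrt

# Stratum sizes and variances from the example table.
sizes = [117, 98, 74, 41, 45]
variances = [1.31, 2.03, 1.13, 1.96, 1.74]
n = 80
N = sum(sizes)

# Proportional allocation: n_h = n * N_h / N.
proportional = [n * N_h / N for N_h in sizes]

# Neyman allocation: n_h = n * N_h S_h / sum(N_h S_h).
products = [N_h * sqrt(v) for N_h, v in zip(sizes, variances)]
neyman = [n * p / sum(products) for p in products]

for h, (p, q) in enumerate(zip(proportional, neyman), start=1):
    print(f"stratum {h}: proportional {p:.2f}, Neyman {q:.2f}")
```

Both allocations must sum to 80, which gives a quick consistency check on any hand calculation.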


Exercise 3.
All the farms in a country are stratified by farm size, and the mean number of hectares of wheat per farm is recorded in each stratum, with the following results.

Farm size (hectares)   No. of farms   Mean wheat (hectares)   Standard deviation
0-20                   368            2.7                     2.1
21-40                  425            8.1                     3.6
41-60                  389            12.1                    3.9
61-80                  316            16.9                    5.1
81-100                 174            20.8                    6.1
101-120                98             25.2                    6.5
121+                   138            31.8                    9.1

For a sample of 100 farms, compute the sample sizes in each stratum under stratified simple random sampling with:
• Proportional allocation
• Neyman allocation.


4. RATIO METHOD OF ESTIMATION


In developing the theory of sample surveys, most cases so far have considered only estimates based on simple averages of sample values. There are, however, other methods which make use of auxiliary information and which, under certain conditions, give more reliable estimates of the population parameters.
One such method is the ratio method of estimation, which forms a basis for all other methods that use auxiliary information.
$Y_i$ is the survey measurement for the $i$th unit of the population.
$X_i$ is the value of the auxiliary measurement for the $i$th unit ($i = 1, 2, \ldots, N$).
Note that the $X_i$ are assumed known for all the units in the population.
$y_i$ is the measurement for the $i$th unit in the sample.
$x_i$ is the value of the auxiliary measurement for the $i$th unit in the sample.
$$r_i = \frac{y_i}{x_i} \;\Rightarrow\; \bar{r} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{x_i} = \frac{1}{n}\sum_{i=1}^{n} r_i$$
$$R_i = \frac{Y_i}{X_i}$$
$$R_N = \frac{\bar{Y}}{\bar{X}} = \frac{\sum_{i=1}^{N} Y_i}{\sum_{i=1}^{N} X_i}$$
which gives the population ratio.

Usually we estimate $R_N$ using $R_n$, which is defined as:
$$R_n = \frac{\bar{y}}{\bar{x}} = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i}$$

$R_n$ provides an estimate of the population value $R_N$, and the product of $R_n$ and $\bar{X}$ provides an estimate of the population mean $\bar{Y}$:
$$\bar{y}_R = \bar{X} R_n = \bar{X}\,\frac{\bar{y}_n}{\bar{x}_n}$$
$\bar{y}_R$ will be used as an estimator of $\bar{Y}$, and throughout we shall assume that $\bar{X}$ is known. In ratio estimation, we shall assume that the $X_i$'s and $Y_i$'s are always positively correlated.
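The ratio estimator is simple to compute once the sample pairs $(x_i, y_i)$ and the known population mean $\bar{X}$ are available. A minimal sketch follows; the function name, the parameter `x_bar_pop` and the sample values are hypothetical.

```python
def ratio_estimate(y, x, x_bar_pop):
    """Return (R_n, ybar_R), where R_n = sum(y) / sum(x) and
    ybar_R = Xbar * R_n estimates the population mean of Y."""
    if len(y) != len(x):
        raise ValueError("y and x must have the same length")
    r_n = sum(y) / sum(x)
    return r_n, x_bar_pop * r_n

# Hypothetical sample of four units; Xbar = 7.5 is assumed known.
y = [12.0, 15.0, 9.0, 20.0]
x = [6.0, 8.0, 4.0, 10.0]
r_n, y_bar_r = ratio_estimate(y, x, x_bar_pop=7.5)
print(r_n, y_bar_r)
```

Note that $R_n$ is the ratio of the sample totals, which in general differs from the mean of the individual ratios $\bar{r}$.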
Exercise 4.
Check why in ratio estimation the survey measurement and the auxiliary measurement
should always be positively correlated for all i = 1, 2, . . . , N .

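EXPECTED VALUE OF $\bar{y}_R$
We shall first show that $\bar{y}_R \neq 0$ and that $\operatorname{Bias}(\bar{y}_R) \neq 0$. Observe that $\operatorname{Bias}(\bar{y}_R) = E(\bar{y}_R) - \bar{Y}$, since $\bar{y}_R$ is used as an estimator of $\bar{Y}$. But
$$\begin{aligned}
E(\bar{y}_R) - \bar{Y} &= E\left(\bar{X}R_n - \bar{X}R_N\right) \\
&= \bar{X}\,E\left(R_n - R_N\right) \\
&= \bar{X}\left(E(R_n) - R_N\right).
\end{aligned}$$
Now $R_N = \dfrac{E(\bar{y}_n)}{E(\bar{x}_n)} = \dfrac{\bar{Y}}{\bar{X}}$.
Thus, using $\bar{y} = R_n\bar{x}$,
$$\begin{aligned}
E(\bar{y}_R) - \bar{Y} &= \bar{X}\left(E(R_n) - \frac{E(\bar{y})}{E(\bar{x})}\right) \\
&= \bar{X}\left(E(R_n) - \frac{E(R_n\bar{x})}{E(\bar{x})}\right)
\end{aligned}$$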
