Design Analys Sample Survey
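$n_1 + n_2 + \cdots + n_k = n$,
where $n$ is the required sample size for the entire study.
Let $Y_{ij}$ be the value of the characteristic for the $j$th unit in the $i$th stratum, $j = 1, \ldots, N_i$. Here $N_i$ is the size of the $i$th stratum and $N = N_1 + N_2 + \cdots + N_k$.
Let $\bar{Y}_i$ be the population mean of the $i$th stratum, $\bar{Y}_i = \frac{1}{N_i}\sum_{j=1}^{N_i} Y_{ij}$.
Let $\bar{y}_i$ be the sample mean of the $i$th stratum, $\bar{y}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij}$.
Let $S_i^2$ be the population variance in the $i$th stratum, $S_i^2 = \frac{1}{N_i - 1}\sum_{j=1}^{N_i}\left(Y_{ij} - \bar{Y}_i\right)^2$. It is estimated by the sample variance $s_i^2 = \frac{1}{n_i - 1}\sum_{j=1}^{n_i}\left(y_{ij} - \bar{y}_i\right)^2$.
Department of Actuarial Science and Statistics. JKUAT is ISO:2008 certified.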
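Proposition. The weighted (stratified) mean $\bar{y}_w$ is defined as
$$\bar{y}_w = \sum_{i=1}^{k} \frac{N_i}{N}\,\bar{y}_i = \sum_{i=1}^{k} w_i\,\bar{y}_i, \qquad w_i = \frac{N_i}{N}.$$
The samples in different strata are drawn independently, so the covariances between the $\bar{y}_i$ vanish. With simple random sampling without replacement within each stratum, $\operatorname{var}(\bar{y}_i) = \frac{N_i - n_i}{N_i}\cdot\frac{S_i^2}{n_i}$. Hence
$$
\begin{aligned}
\operatorname{var}(\bar{y}_w) &= \operatorname{var}\left(\sum_{i=1}^{k} w_i\,\bar{y}_i\right) = \sum_{i=1}^{k} w_i^2 \operatorname{var}(\bar{y}_i) \\
&= \sum_{i=1}^{k} w_i^2 \cdot \frac{N_i - n_i}{N_i}\cdot\frac{S_i^2}{n_i} \\
&= \frac{1}{N^2}\sum_{i=1}^{k} N_i\,\frac{N_i - n_i}{n_i}\,S_i^2 \\
&= \frac{1}{N^2}\sum_{i=1}^{k} N_i\,(N_i - n_i)\,\frac{S_i^2}{n_i}.
\end{aligned}
$$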
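The last form of $\operatorname{var}(\bar{y}_w)$ can be evaluated once the $N_i$, $n_i$ and $S_i^2$ are known. The Python sketch below assumes they are supplied as plain lists. The function name `var_stratified_mean` and the three-stratum figures are hypothetical and serve only as an illustration.

```python
def var_stratified_mean(N_h, n_h, S2_h):
    """Variance of the weighted mean under SRSWOR within each stratum:
    var(ybar_w) = (1/N^2) * sum_i N_i (N_i - n_i) S_i^2 / n_i."""
    N = sum(N_h)  # population size N = N_1 + ... + N_k
    total = sum(Ni * (Ni - ni) * S2 / ni for Ni, ni, S2 in zip(N_h, n_h, S2_h))
    return total / N**2


# Hypothetical population with k = 3 strata (illustrative values only).
N_h = [50, 30, 20]        # stratum sizes N_i
n_h = [10, 6, 4]          # stratum sample sizes n_i
S2_h = [4.0, 9.0, 25.0]   # stratum variances S_i^2
print(var_stratified_mean(N_h, n_h, S2_h))  # 0.388
```

Plugging the same inputs into the $w_i^2$ form of the formula gives the same value, which is a quick check on the algebra.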
We note that $\operatorname{var}(\bar{y}_w)$ depends on $S_i^2$, the variance within the $i$th stratum. So the more homogeneous the units within each stratum, the more precise the weighted mean.
the population mean of stratified sampling. For this reason, the procedures used to allocate sample sizes must be statistically sound. We have established that
$$\operatorname{var}(\bar{y}_w) = \sum_{i=1}^{k} w_i^2\left(\frac{N_i - n_i}{N_i}\right)\frac{S_i^2}{n_i},$$
which depends on the $n_i$. Usually the sampler is free to fix $n_i$, but there are two main ways of fixing these sample sizes:
• Proportional allocation
• Optimal allocation
PROPORTIONAL ALLOCATION
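Under this allocation scheme the sample is allocated so that the sampling fraction is the same in every stratum, i.e. $f_i = \frac{n_i}{N_i}$ is constant for $i = 1, 2, \ldots, k$. This implies that
$$\frac{n_1}{N_1} = \frac{n_2}{N_2} = \cdots = \frac{n_k}{N_k} = \frac{n}{N}.$$
The common value must equal $n/N$ because $\sum_{i=1}^{k} n_i = n$ and $\sum_{i=1}^{k} N_i = N$. Hence $n_i = n\,\frac{N_i}{N} = n\,w_i$.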
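Here is a minimal sketch of proportional allocation, $n_i = n N_i / N$. The function name and the example figures are hypothetical. The values are returned unrounded. In practice they must still be rounded to whole units while keeping $\sum n_i = n$.

```python
def proportional_allocation(N_h, n):
    """Return n_i = n * N_i / N for each stratum (unrounded)."""
    N = sum(N_h)
    return [n * Ni / N for Ni in N_h]


N_h = [50, 30, 20]                        # hypothetical stratum sizes
alloc = proportional_allocation(N_h, 20)  # hypothetical total sample size n = 20
print(alloc)                              # [10.0, 6.0, 4.0]
# The sampling fraction n_i / N_i is the same (n / N = 0.2) in every stratum.
print([ni / Ni for ni, Ni in zip(alloc, N_h)])
```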
OPTIMUM ALLOCATION
Here there are two approaches
• minimize variance subject to a fixed sample size or subject to a fixed cost.
• minimize the total cost subject to a fixed variance.
Allocating the $n_i$ to the strata by either approach is usually referred to as optimum allocation. In both schemes we consider the simplest cost function,
$$C = a + \sum_{i=1}^{k} c_i\,n_i,$$
where $a$ is the standing (fixed overhead) cost and $c_i$ is the cost of sampling one unit within stratum $i$.
COMPARING SRS WITH STRATIFIED RANDOM SAMPLING WITH PROPORTIONAL ALLOCATION
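Substituting $n_i = n w_i$ into $\operatorname{var}(\bar{y}_w)$ gives the first of the following two variances:
$$\operatorname{var}(\bar{y}_w)_{\text{prop}} = \left(\frac{1}{n} - \frac{1}{N}\right)\sum_{i=1}^{k} w_i S_i^2 \quad\text{and}\quad \operatorname{var}(\bar{y})_{\text{SRSWOR}} = \left(\frac{1}{n} - \frac{1}{N}\right) S^2.$$
Now, working from first principles, the overall variance $S^2$ of a stratified population can be decomposed as follows.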
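$$
\begin{aligned}
S^2 &= \frac{1}{N-1}\sum_{i=1}^{k}\sum_{j=1}^{N_i}\left(Y_{ij} - \bar{Y}\right)^2 \\
&= \frac{1}{N-1}\sum_{i=1}^{k}\sum_{j=1}^{N_i}\left[\left(Y_{ij} - \bar{Y}_i\right) + \left(\bar{Y}_i - \bar{Y}\right)\right]^2 \\
&= \frac{1}{N-1}\sum_{i=1}^{k}\sum_{j=1}^{N_i}\left[\left(Y_{ij} - \bar{Y}_i\right)^2 + 2\left(Y_{ij} - \bar{Y}_i\right)\left(\bar{Y}_i - \bar{Y}\right) + \left(\bar{Y}_i - \bar{Y}\right)^2\right] \\
&= \frac{1}{N-1}\left[\sum_{i=1}^{k}\sum_{j=1}^{N_i}\left(Y_{ij} - \bar{Y}_i\right)^2 + \sum_{i=1}^{k} N_i\left(\bar{Y}_i - \bar{Y}\right)^2\right]
\end{aligned}
$$
The cross-product term vanishes because $\sum_{j=1}^{N_i}(Y_{ij} - \bar{Y}_i) = 0$ within every stratum. Therefore
$$(N-1)\,S^2 = \sum_{i=1}^{k}(N_i - 1)\,S_i^2 + \sum_{i=1}^{k} N_i\left(\bar{Y}_i - \bar{Y}\right)^2.$$
In the above equation we assume that $N_i$ and $N$ are large enough that $N_i - 1 \approx N_i$ and $N - 1 \approx N$. We may then write
$$N S^2 \approx \sum_{i=1}^{k} N_i S_i^2 + \sum_{i=1}^{k} N_i\left(\bar{Y}_i - \bar{Y}\right)^2.$$
Dividing by $N$, this implies
$$S^2 \approx \sum_{i=1}^{k} w_i S_i^2 + \sum_{i=1}^{k} w_i\left(\bar{Y}_i - \bar{Y}\right)^2.$$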
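But
$$\operatorname{var}(\bar{y})_{\text{SRSWOR}} = \left(\frac{1}{n} - \frac{1}{N}\right) S^2,$$
and so
$$
\begin{aligned}
\operatorname{var}(\bar{y})_{\text{SRSWOR}} &\approx \left(\frac{1}{n} - \frac{1}{N}\right)\left[\sum_{i=1}^{k} w_i S_i^2 + \sum_{i=1}^{k} w_i\left(\bar{Y}_i - \bar{Y}\right)^2\right] \\
&= \left(\frac{1}{n} - \frac{1}{N}\right)\sum_{i=1}^{k} w_i S_i^2 + \left(\frac{1}{n} - \frac{1}{N}\right)\sum_{i=1}^{k} w_i\left(\bar{Y}_i - \bar{Y}\right)^2 \\
&= \operatorname{var}(\bar{y}_w)_{\text{prop}} + \left(\frac{1}{n} - \frac{1}{N}\right)\sum_{i=1}^{k} w_i\left(\bar{Y}_i - \bar{Y}\right)^2.
\end{aligned}
$$
The last term is non-negative, so we conclude that
$$\operatorname{var}(\bar{y}_w)_{\text{prop}} \le \operatorname{var}(\bar{y})_{\text{SRSWOR}},$$
with equality only when all the stratum means $\bar{Y}_i$ are equal.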
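Both the exact identity $(N-1)S^2 = \sum (N_i-1)S_i^2 + \sum N_i(\bar{Y}_i - \bar{Y})^2$ and the resulting comparison can be checked numerically. The sketch below uses a small hypothetical population of three strata, with values invented for illustration. It relies on Python's `statistics.variance`, which divides by (size − 1) and therefore matches the definitions of $S^2$ and $S_i^2$ used here.

```python
from statistics import mean, variance

# Hypothetical population split into k = 3 strata of 4 units each (illustrative only).
strata = [[2, 4, 6, 8], [10, 11, 13, 14], [20, 22, 25, 29]]
n = 6  # hypothetical total sample size; proportional allocation gives n_i = 2

population = [y for s in strata for y in s]
N = len(population)
Y_bar = mean(population)

# Exact decomposition: (N - 1) S^2 = within-strata part + between-strata part.
lhs = (N - 1) * variance(population)
within = sum((len(s) - 1) * variance(s) for s in strata)
between = sum(len(s) * (mean(s) - Y_bar) ** 2 for s in strata)
print(lhs, within + between)

# Variances of the two estimators of the population mean.
fpc = 1 / n - 1 / N
var_srs = fpc * variance(population)
var_prop = fpc * sum(len(s) / N * variance(s) for s in strata)
print(f"SRSWOR: {var_srs:.4f}  proportional: {var_prop:.4f}")
```

The hypothetical stratum means here are far apart, so the between-strata term is large and proportional allocation comes out much more precise than SRSWOR.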
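NEYMAN VS PROPORTIONAL ALLOCATION
Under Neyman allocation, the variance of the weighted mean is
$$\operatorname{var}(\bar{y}_w)_{\text{Neyman}} = \frac{\left(\sum_{i=1}^{k} w_i S_i\right)^2}{n} - \frac{\sum_{i=1}^{k} w_i S_i^2}{N}.$$
Write $\bar{S} = \sum_{i=1}^{k} w_i S_i$ and subtract this from $\operatorname{var}(\bar{y}_w)_{\text{prop}} = \left(\frac{1}{n} - \frac{1}{N}\right)\sum_{i=1}^{k} w_i S_i^2$. The $1/N$ terms cancel:
$$
\begin{aligned}
\operatorname{var}(\bar{y}_w)_{\text{prop}} - \operatorname{var}(\bar{y}_w)_{\text{Neyman}} &= \frac{1}{n}\sum_{i=1}^{k} w_i S_i^2 - \frac{1}{n}\left(\sum_{i=1}^{k} w_i S_i\right)^2 \\
&= \frac{1}{n}\sum_{i=1}^{k} w_i\left(S_i^2 - S_i\bar{S}\right) \\
&= \frac{1}{n}\sum_{i=1}^{k} w_i\left(S_i^2 - 2S_i\bar{S} + \bar{S}^2 + S_i\bar{S} - \bar{S}^2\right) \\
&= \frac{1}{n}\sum_{i=1}^{k} w_i\left(S_i - \bar{S}\right)^2 + \frac{1}{n}\sum_{i=1}^{k} w_i\left(S_i\bar{S} - \bar{S}^2\right).
\end{aligned}
$$
The last sum is zero because $\sum_{i=1}^{k} w_i S_i = \bar{S}$ and $\sum_{i=1}^{k} w_i = 1$. Therefore
$$\operatorname{var}(\bar{y}_w)_{\text{prop}} - \operatorname{var}(\bar{y}_w)_{\text{Neyman}} = \frac{1}{n}\sum_{i=1}^{k} w_i\left(S_i - \bar{S}\right)^2 \ge 0$$
$$\Rightarrow \operatorname{var}(\bar{y}_w)_{\text{prop}} \ge \operatorname{var}(\bar{y}_w)_{\text{Neyman}}.$$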
Hence Neyman allocation is at least as efficient as proportional allocation. It is strictly more efficient unless all the stratum standard deviations $S_i$ are equal.
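The identity $\operatorname{var}(\bar{y}_w)_{\text{prop}} - \operatorname{var}(\bar{y}_w)_{\text{Neyman}} = \frac{1}{n}\sum w_i (S_i - \bar{S})^2$ can be checked numerically. In this sketch the weights $w_i$, standard deviations $S_i$, $n$ and $N$ are hypothetical inputs, not data from these notes.

```python
def var_prop(w, S, n, N):
    """(1/n - 1/N) * sum_i w_i S_i^2."""
    return (1 / n - 1 / N) * sum(wi * Si**2 for wi, Si in zip(w, S))


def var_neyman(w, S, n, N):
    """(sum_i w_i S_i)^2 / n - (sum_i w_i S_i^2) / N."""
    S_bar = sum(wi * Si for wi, Si in zip(w, S))
    return S_bar**2 / n - sum(wi * Si**2 for wi, Si in zip(w, S)) / N


# Hypothetical inputs: k = 3 strata with weights summing to 1.
w, S, n, N = [0.5, 0.3, 0.2], [2.0, 3.0, 5.0], 20, 100

S_bar = sum(wi * Si for wi, Si in zip(w, S))
gap = var_prop(w, S, n, N) - var_neyman(w, S, n, N)
spread = sum(wi * (Si - S_bar) ** 2 for wi, Si in zip(w, S)) / n
print(gap, spread)  # equal up to rounding, and never negative
```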
Exercise 1.
A sample of 30 students is to be drawn from a population of 300 students belonging to two colleges, A and B. The means and standard deviations of their marks are given below:

| College | Number of students ($N_i$) | Mean mark ($\bar{y}_i$) | Standard deviation ($S_i$) |
|---|---|---|---|
| A | 200 | 30 | 10 |
| B | 100 | 60 | 40 |

Use this information to confirm that Neyman allocation is more efficient than proportional allocation.
Exercise 2.
Investigate the relationship between $\operatorname{var}(\bar{y}_w)_{\text{Neyman}}$ and $\operatorname{var}(\bar{y})_{\text{SRSWOR}}$.
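such that
$$
\begin{aligned}
\frac{\partial \phi}{\partial n_i} &= \frac{\partial}{\partial n_i}\left\{\frac{1}{N^2}\sum_{j=1}^{k} N_j\left(N_j - n_j\right)\frac{S_j^2}{n_j} + \lambda\left(\sum_{j=1}^{k} n_j - n\right)\right\} \\
&= \frac{\partial}{\partial n_i}\left\{\frac{1}{N^2}\sum_{j=1}^{k}\left(\frac{N_j^2 S_j^2}{n_j} - N_j S_j^2\right)\right\} + \frac{\partial}{\partial n_i}\Big(\lambda\left(n_1 + n_2 + \cdots + n_i + \cdots + n_k - n\right)\Big) \\
&= -\frac{1}{N^2}\cdot\frac{N_i^2 S_i^2}{n_i^2} + \lambda.
\end{aligned}
$$
Set $\partial\phi/\partial n_i = 0$ and write $w_i = N_i/N$:
$$-\frac{w_i^2 S_i^2}{n_i^2} + \lambda = 0 \;\Rightarrow\; n_i^2 = \frac{w_i^2 S_i^2}{\lambda} \;\Rightarrow\; n_i = \frac{w_i S_i}{\sqrt{\lambda}}.$$
Now we observe that $\frac{\partial^2 \phi}{\partial n_i^2} = \frac{2 w_i^2 S_i^2}{n_i^3} > 0$, so this choice minimizes $\operatorname{var}(\bar{y}_w)$. The only remaining task is to find the value of $\lambda$, which is still an unknown constant. To compute $\lambda$ we sum both sides of $n_i = w_i S_i/\sqrt{\lambda}$ over the strata:
$$\sum_{i=1}^{k} n_i = \sum_{i=1}^{k}\frac{w_i S_i}{\sqrt{\lambda}} \;\Rightarrow\; n = \frac{\sum_{i=1}^{k} w_i S_i}{\sqrt{\lambda}} \;\Rightarrow\; \lambda = \left(\frac{\sum_{i=1}^{k} w_i S_i}{n}\right)^2, \qquad \sqrt{\lambda} = \frac{\sum_{i=1}^{k} w_i S_i}{n}.$$
Substituting $\sqrt{\lambda}$ back gives the Neyman allocation
$$n_i = n\,\frac{w_i S_i}{\sum_{j=1}^{k} w_j S_j} = n\,\frac{N_i S_i}{\sum_{j=1}^{k} N_j S_j}.$$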
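Substituting $\sqrt{\lambda} = \frac{1}{n}\sum w_i S_i$ into $n_i = w_i S_i/\sqrt{\lambda}$ gives $n_i = n\,N_i S_i / \sum_j N_j S_j$. Below is a minimal Python sketch of that final formula. The function name and the example inputs are hypothetical. The sketch returns unrounded values and does not handle the case $n_i > N_i$, which a practical implementation would need to cap.

```python
def neyman_allocation(N_h, S_h, n):
    """Neyman allocation n_i = n * N_i S_i / sum_j N_j S_j (unrounded).

    Equivalent to n_i = w_i S_i / sqrt(lambda) with sqrt(lambda) = (1/n) sum_j w_j S_j.
    """
    total = sum(Ni * Si for Ni, Si in zip(N_h, S_h))
    return [n * Ni * Si / total for Ni, Si in zip(N_h, S_h)]


# Hypothetical strata: sizes N_i and standard deviations S_i.
alloc = neyman_allocation([50, 30, 20], [2.0, 3.0, 5.0], 29)
print(alloc)  # [10.0, 9.0, 10.0]
```

The third stratum is the smallest but also the most variable, so it receives as many sample units as the largest stratum.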
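Substituting the Neyman allocation into the variance gives the minimum variance:
$$
\begin{aligned}
\operatorname{var}(\bar{y}_w)_{\text{opt}} &= \frac{1}{N^2}\sum_{i=1}^{k} N_i\left(N_i - n_i\right)\frac{S_i^2}{n_i} \\
&= \frac{1}{N^2}\sum_{i=1}^{k} N_i\left(\frac{N_i}{n_i} - 1\right)S_i^2 \\
&= \frac{1}{N^2}\sum_{i=1}^{k} N_i\left(\frac{N_i \sum_{j=1}^{k} w_j S_j}{n\,w_i S_i} - 1\right)S_i^2 \\
&= \frac{1}{n}\left(\sum_{i=1}^{k} w_i S_i\right)^2 - \frac{1}{N}\sum_{i=1}^{k} w_i S_i^2.
\end{aligned}
$$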
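To check the last step numerically, the sketch below evaluates the general variance formula at the Neyman allocation and compares it with the closed form $\frac{1}{n}(\sum w_i S_i)^2 - \frac{1}{N}\sum w_i S_i^2$. The function names and inputs are hypothetical.

```python
def var_general(N_h, n_h, S_h):
    """(1/N^2) * sum_i N_i (N_i - n_i) S_i^2 / n_i."""
    N = sum(N_h)
    return sum(Ni * (Ni - ni) * Si**2 / ni for Ni, ni, Si in zip(N_h, n_h, S_h)) / N**2


def var_opt_closed_form(N_h, S_h, n):
    """(1/n) (sum_i w_i S_i)^2 - (1/N) sum_i w_i S_i^2."""
    N = sum(N_h)
    w = [Ni / N for Ni in N_h]
    S_bar = sum(wi * Si for wi, Si in zip(w, S_h))
    return S_bar**2 / n - sum(wi * Si**2 for wi, Si in zip(w, S_h)) / N


N_h, S_h, n = [50, 30, 20], [2.0, 3.0, 5.0], 29          # hypothetical inputs
total = sum(Ni * Si for Ni, Si in zip(N_h, S_h))
n_h = [n * Ni * Si / total for Ni, Si in zip(N_h, S_h)]  # Neyman allocation
print(var_general(N_h, n_h, S_h), var_opt_closed_form(N_h, S_h, n))
```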
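To minimize $\operatorname{var}(\bar{y}_w)$ subject to a fixed total cost $C$, consider
$$\phi = \sum_{i=1}^{k}\left(\frac{N_i}{N}\right)^2\left(\frac{1}{n_i} - \frac{1}{N_i}\right)S_i^2 + \mu\left(a + \sum_{i=1}^{k} c_i n_i - C\right).$$
Complete the square in each stratum, using $w_i^2 S_i^2/N_i = w_i S_i^2/N$:
$$
\begin{aligned}
\phi &= \sum_{i=1}^{k}\left[\left(\frac{w_i S_i}{\sqrt{n_i}}\right)^2 + \left(\sqrt{\mu c_i n_i}\right)^2\right] - \frac{1}{N}\sum_{i=1}^{k} w_i S_i^2 + \mu\,(a - C) \\
&= \sum_{i=1}^{k}\left(\frac{w_i S_i}{\sqrt{n_i}} - \sqrt{\mu c_i n_i}\right)^2 + 2\sum_{i=1}^{k} w_i S_i\sqrt{\mu c_i} - \frac{1}{N}\sum_{i=1}^{k} w_i S_i^2 + \mu\,(a - C).
\end{aligned}
$$
Only the first sum depends on the $n_i$. Therefore $\phi$ is minimum when $\frac{w_i S_i}{\sqrt{n_i}} = \sqrt{\mu c_i n_i}$, and accordingly
$$n_i = \frac{w_i S_i}{\sqrt{\mu c_i}}.$$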
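so
$$\sum_{i=1}^{k} n_i = \sum_{i=1}^{k}\frac{w_i S_i}{\sqrt{\mu c_i}} \;\Rightarrow\; n = \frac{1}{\sqrt{\mu}}\sum_{i=1}^{k}\frac{w_i S_i}{\sqrt{c_i}} \;\Rightarrow\; \sqrt{\mu} = \frac{1}{n}\sum_{i=1}^{k}\frac{w_i S_i}{\sqrt{c_i}},$$
so that
$$n_i = \frac{w_i S_i/\sqrt{c_i}}{\frac{1}{n}\sum_{j=1}^{k} w_j S_j/\sqrt{c_j}} = n\cdot\frac{N_i S_i/\sqrt{c_i}}{\sum_{j=1}^{k} N_j S_j/\sqrt{c_j}},$$
meaning that $n_i \propto \dfrac{N_i S_i}{\sqrt{c_i}}$.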
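• The larger the stratum, the larger the sample that should be selected from it.
• The greater the variability within a stratum, the larger the sample that should be selected from it.
• The cheaper it is to sample a unit in a stratum, the larger the sample that should be selected from it.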
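The cost-optimal rule $n_i = n\,\frac{N_i S_i/\sqrt{c_i}}{\sum_j N_j S_j/\sqrt{c_j}}$ can be sketched the same way. The function name and the example strata sizes, standard deviations and unit costs $c_i$ are hypothetical. When every stratum has the same cost, the rule reduces to the Neyman allocation.

```python
from math import sqrt


def cost_optimal_allocation(N_h, S_h, c_h, n):
    """n_i proportional to N_i S_i / sqrt(c_i), scaled so the n_i sum to n (unrounded)."""
    scores = [Ni * Si / sqrt(ci) for Ni, Si, ci in zip(N_h, S_h, c_h)]
    total = sum(scores)
    return [n * s / total for s in scores]


# Hypothetical strata: sizes, standard deviations and per-unit sampling costs.
alloc = cost_optimal_allocation([50, 30, 20], [2.0, 3.0, 5.0], [4.0, 9.0, 1.0], 36)
print(alloc)  # [10.0, 6.0, 20.0]
```

Here the cheap third stratum receives the largest share, even though it is the smallest stratum.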
Example.
A stratified population has 5 strata. The stratum sizes $N_i$ and means $\bar{Y}_i$ and $S_i^2$ of some
$$N = \sum_{h=1}^{5} N_h = 117 + 98 + 74 + 41 + 45 = 375$$
$$\bar{y}_1 = 7.3,\quad \bar{y}_2 = 6.9,\quad \bar{y}_3 = 11.2,\quad \bar{y}_4 = 9.1,\quad \bar{y}_5 = 9.6$$
$$w_h = \frac{N_h}{N}:\quad w_1 = \frac{117}{375},\; w_2 = \frac{98}{375},\; w_3 = \frac{74}{375},\; w_4 = \frac{41}{375},\; w_5 = \frac{45}{375}$$
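$$
\begin{aligned}
\bar{y}_{\text{str}} = \sum_{h=1}^{5} w_h\,\bar{y}_h &= \frac{117}{375}(7.3) + \frac{98}{375}(6.9) + \frac{74}{375}(11.2) + \frac{41}{375}(9.1) + \frac{45}{375}(9.6) \\
&= \frac{3164.2}{375} = 8.4379 \approx 8.44
\end{aligned}
$$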
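$$\operatorname{var}(\bar{y}_{\text{str}}) = \sum_{h=1}^{5} w_h^2\,\frac{N_h - n_h}{N_h}\cdot\frac{S_h^2}{n_h}$$
$$S^2 = \frac{1}{N-1}\left(\sum_{i=1}^{N} y_i^2 - N\bar{y}^2\right), \quad\text{but}\quad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} \;\Rightarrow\; n = \frac{\sum_{i=1}^{n} y_i}{\bar{y}}$$
$$n_1 = \frac{117}{7.3} \approx 16,\quad n_2 = \frac{98}{6.9} \approx 14,\quad n_3 = \frac{74}{11.2} \approx 7,\quad n_4 = \frac{41}{9.1} \approx 5,\quad n_5 = \frac{45}{9.6} \approx 5$$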
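$$
\begin{aligned}
\operatorname{var}(\bar{y}_{\text{str}}) &= \left(\frac{117}{375}\right)^2\frac{117 - 16}{117}\cdot\frac{1.31}{16} + \left(\frac{98}{375}\right)^2\frac{98 - 14}{98}\cdot\frac{2.03}{14} + \left(\frac{74}{375}\right)^2\frac{74 - 7}{74}\cdot\frac{1.13}{7} \\
&\quad + \left(\frac{41}{375}\right)^2\frac{41 - 5}{41}\cdot\frac{1.96}{5} + \left(\frac{45}{375}\right)^2\frac{45 - 5}{45}\cdot\frac{1.74}{5} \\
&\approx 0.02963
\end{aligned}
$$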
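The arithmetic of this example can be reproduced in a few lines of Python. The stratum sizes, sample means, variances $S_h^2$ and sample sizes $n_h$ below are the ones used in the example. Only the variable names are introduced here.

```python
N_h = [117, 98, 74, 41, 45]            # stratum sizes N_h
ybar_h = [7.3, 6.9, 11.2, 9.1, 9.6]    # stratum sample means
S2_h = [1.31, 2.03, 1.13, 1.96, 1.74]  # stratum variances S_h^2
n_h = [16, 14, 7, 5, 5]                # stratum sample sizes n_h

N = sum(N_h)                  # 375
w_h = [Nh / N for Nh in N_h]  # stratum weights w_h = N_h / N

ybar_str = sum(w * y for w, y in zip(w_h, ybar_h))
var_str = sum(
    w**2 * (Nh - nh) / Nh * S2 / nh
    for w, Nh, nh, S2 in zip(w_h, N_h, n_h, S2_h)
)
print(f"{ybar_str:.4f} {var_str:.5f}")  # 8.4379 0.02963
```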
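Under proportional allocation with $n = 80$, $n_h = \frac{n}{N}N_h$:
$$n_1 = \frac{80}{375}(117) = 24.96,\quad n_2 = \frac{80}{375}(98) \approx 20.91,\quad n_3 = \frac{80}{375}(74) \approx 15.79,\quad n_4 = \frac{80}{375}(41) \approx 8.75,\quad n_5 = \frac{80}{375}(45) = 9.6$$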
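Under Neyman allocation, $n_h = w_h S_h/\sqrt{\lambda}$ with $\sqrt{\lambda} = \frac{1}{n}\sum_{h=1}^{5} w_h S_h$. Here
$$\sum_{h=1}^{5} w_h S_h = \frac{117}{375}\sqrt{1.31} + \frac{98}{375}\sqrt{2.03} + \frac{74}{375}\sqrt{1.13} + \frac{41}{375}\sqrt{1.96} + \frac{45}{375}\sqrt{1.74} \approx 1.2506,$$
so
$$\sqrt{\lambda} = \frac{1}{n}(1.2506) = \frac{1.2506}{80} \approx 0.015632$$
$$n_1 = \frac{\frac{117}{375}\sqrt{1.31}}{0.015632} \approx 22.84,\quad n_2 = \frac{\frac{98}{375}\sqrt{2.03}}{0.015632} \approx 23.82,\quad n_3 = \frac{\frac{74}{375}\sqrt{1.13}}{0.015632} \approx 13.42,$$
$$n_4 = \frac{\frac{41}{375}\sqrt{1.96}}{0.015632} \approx 9.79,\quad n_5 = \frac{\frac{45}{375}\sqrt{1.74}}{0.015632} \approx 10.13,$$
which sum to $n = 80$.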
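Both allocations with $n = 80$ can be checked the same way. This sketch uses the example's $N_h$ and $S_h^2$ and computes $\sqrt{\lambda} = \frac{1}{n}\sum w_h S_h$ directly rather than from a rounded intermediate value.

```python
from math import sqrt

N_h = [117, 98, 74, 41, 45]
S2_h = [1.31, 2.03, 1.13, 1.96, 1.74]
n = 80

N = sum(N_h)
w_h = [Nh / N for Nh in N_h]

# Proportional allocation: n_h = n * w_h.
prop = [n * w for w in w_h]

# Neyman allocation: n_h = w_h S_h / sqrt(lambda), sqrt(lambda) = (1/n) * sum w_h S_h.
wS = [w * sqrt(S2) for w, S2 in zip(w_h, S2_h)]
root_lambda = sum(wS) / n
neyman = [x / root_lambda for x in wS]

print([round(x, 2) for x in prop])
print([round(x, 2) for x in neyman])
print(round(sum(neyman), 6))  # 80.0
```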
Exercise 3.
All the farms in a country are stratified by farm size, and the mean number of hectares of wheat per farm is recorded for each stratum, with the following results:

| Farm size | No. of farms | Mean wheat | standard |
|---|---|---|---|
$$R_N = \frac{\bar{Y}}{\bar{X}} = \frac{\sum_{i=1}^{N} Y_i}{\sum_{i=1}^{N} X_i}$$
The sample ratio $R_n = \bar{y}_n/\bar{x}_n$ estimates the population ratio $R_N$, and the product of $R_n$ and $\bar{X}$ estimates the population mean $\bar{Y}$:
$$\bar{y}_R = \bar{X} R_n = \bar{X}\,\frac{\bar{y}_n}{\bar{x}_n}.$$
$\bar{y}_R$ will be used as an estimator of $\bar{Y}$, and throughout we shall assume that $\bar{X}$ is known. In ratio estimation, we shall assume that the $X_i$'s and $Y_i$'s are always positively correlated.
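Here is a minimal sketch of the ratio estimator $\bar{y}_R = \bar{X}\,\bar{y}_n/\bar{x}_n$. It assumes paired sample observations and a known population mean $\bar{X}$. The function name and the numbers are hypothetical.

```python
def ratio_estimate_of_mean(y_sample, x_sample, X_bar):
    """ybar_R = X_bar * R_n, where R_n = ybar_n / xbar_n = sum(y) / sum(x)."""
    if len(y_sample) != len(x_sample):
        raise ValueError("y and x must be observed on the same sample units")
    R_n = sum(y_sample) / sum(x_sample)  # the sample sizes cancel in ybar_n / xbar_n
    return X_bar * R_n


# Hypothetical sample of 3 units with known auxiliary mean X_bar = 3.5.
print(ratio_estimate_of_mean([4.0, 6.0, 10.0], [2.0, 3.0, 5.0], 3.5))  # 7.0
```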
Exercise 4.
Explain why, in ratio estimation, the survey measurements $Y_i$ and the auxiliary measurements $X_i$, $i = 1, 2, \ldots, N$, should be positively correlated.
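EXPECTED VALUE OF $\bar{y}_R$
We shall first show that $\bar{y}_R$ is biased, i.e. that $\operatorname{Bias}(\bar{y}_R) \neq 0$. Because $\bar{y}_R$ is used as an estimator of $\bar{Y}$, we have $\operatorname{Bias}(\bar{y}_R) = E(\bar{y}_R) - \bar{Y}$. But
$$
\begin{aligned}
E(\bar{y}_R) - \bar{Y} &= E(\bar{X} R_n) - \bar{X} R_N \\
&= \bar{X}\,E(R_n - R_N) \\
&= \bar{X}\left(E(R_n) - R_N\right).
\end{aligned}
$$
Now, since the sample means are unbiased for the population means, $R_N = \frac{\bar{Y}}{\bar{X}} = \frac{E(\bar{y}_n)}{E(\bar{x}_n)}$. Write $\bar{y} = \bar{y}_n$ and $\bar{x} = \bar{x}_n$, and use $\bar{y}_n = R_n\bar{x}_n$. Thus
$$
\begin{aligned}
E(\bar{y}_R) - \bar{Y} &= \bar{X}\left(E(R_n) - \frac{E(\bar{y})}{E(\bar{x})}\right) \\
&= \bar{X}\left(E(R_n) - \frac{E(R_n\bar{x})}{E(\bar{x})}\right).
\end{aligned}
$$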
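The bias can be seen exactly on a toy example by enumerating every possible SRSWOR sample. The population below is hypothetical: four units with positively correlated $X_i$ and $Y_i$. The code averages $\bar{y}_R$ over all $\binom{4}{2}$ equally likely samples. It also evaluates the last expression of the derivation, $\bar{X}\left(E(R_n) - E(\bar{y})/E(\bar{x})\right)$.

```python
from itertools import combinations
from statistics import mean

# Hypothetical population of N = 4 units; X is the auxiliary variable.
Y = [2.0, 5.0, 6.0, 11.0]
X = [1.0, 2.0, 4.0, 5.0]
n = 2
Y_bar, X_bar = mean(Y), mean(X)

R_vals, ybars, xbars = [], [], []
for units in combinations(range(len(Y)), n):  # all SRSWOR samples, each equally likely
    ybar = mean(Y[i] for i in units)
    xbar = mean(X[i] for i in units)
    ybars.append(ybar)
    xbars.append(xbar)
    R_vals.append(ybar / xbar)

E_yR = mean(X_bar * R for R in R_vals)  # E(ybar_R) over all samples
bias = E_yR - Y_bar
formula = X_bar * (mean(R_vals) - mean(ybars) / mean(xbars))
print(round(bias, 4), round(formula, 4))
```

The computed bias is small but non-zero, and it matches the last line of the derivation.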