Multistage Sampling (Chapter 13)

Multistage sampling refers to sampling plans where the sampling is carried out in stages
using smaller and smaller sampling units at each stage. In a two-stage sampling design, a
sample of primary units is selected and then a sample of secondary units is selected within
each primary unit. This handout outlines the development of estimators under the general
setting of two-stage sampling, considers the allocation question under the setting of
equal-sized primary and secondary units, and briefly examines three-stage sampling.
The simplest version of two-stage sampling is to use simple random sampling at each stage:
an SRS of primary units, and an SRS of secondary units within each selected primary unit.
The primary units do not need to be the same size and you do not need to select the same
number of secondary units within each primary unit.
Stratified random sampling and cluster sampling can be viewed as special cases of two-stage sampling. A stratified random sample is a census of the primary units (the strata)
followed by an SRS of the secondary units within each primary unit. A cluster sample
is an SRS of the primary units (the clusters) followed by a census of the secondary
units within each selected primary unit.
We can use any probability sampling plan at each stage of a multistage plan and the
plans can be different at each stage. The formulas developed below are only for an SRS
at each stage. It is possible to derive formulas for other situations.
Example: In order to estimate the condition of highways under its jurisdiction and the cost
of urgent repairs, the state Department of Transportation selected a number of highway
miles in two stages. In the first stage, a number of highways were selected at random and
without replacement from the list of all highways maintained by the Department. In the second stage, a number of one-mile segments were selected at random and without replacement
from the total length of each selected highway; for example, if the length of highway 101 is
73 miles, it is seen as consisting of 73 one-mile segments (highway miles), from which a
number are selected at random. Highway engineers then visit the selected segments, inspect
the pavement condition, rate the condition of the segment, and estimate the cost of urgently
needed repairs.
For the purpose of this problem, assume there are 352 highways in the state, with a total length of 28,950 miles. A simple random sample of five highways was selected without
replacement. From each selected highway, approximately 10% of its one-mile segments were
then selected. The inspection results were as follows:

| Highway Number | Length (miles) | Selected One-Mile Segments | Number Rated Excellent | Cost of Urgent Repairs (in $1,000) |
|---|---|---|---|---|
| 155 | 85 | 10 | 2 | 90 |
| 489 | 120 | 15 | 1 | 110 |
| 283 | 47 | 5 | 0 | 60 |
| 698 | 98 | 10 | 0 | 100 |
| 311 | 34 | 5 | 1 | 30 |

For example, Highway 155 has a length of 85 miles. Ten of its 85 one-mile segments were
selected and inspected. Two of these segments were rated Excellent. The total cost of urgent
repairs on the 10 selected segments was $90,000.
(a) Estimate the proportion and number of state highway miles that are in Excellent condition.
(b) Estimate the average cost per highway mile and the total cost of urgently needed repairs.
First, why would a two-stage sampling plan be adopted for this highway problem in
the first place? Why not an SRS?

Multistage samples are used primarily for cost or feasibility (practicality) reasons. For
example, to select an SRS of households in the U.S. would be extremely difficult because
no list of all households exists. However, we could proceed in stages: an SRS of counties
in the U.S., an SRS of blocks within each county, and an SRS of households within
each block. You would then only need to have a list of households within each block
that was selected. Two-stage sampling also has the flexibility to sample more intensely
in primary units which are larger or more variable. The disadvantage of two-stage
sampling is that the variances of the resulting estimators are likely to be larger than
for an SRS of the same total number of secondary units. This may well be more than
offset by the cost efficiency of two-stage sampling.
Note that a two-stage sample can never be better than a cluster sample with the same
number of primary units selected because a census within each primary unit is the best
you can do.

Notation for Two-Stage Sampling:

$N$ = the number of primary units in the population,
$n$ = the number of primary units in the sample,
$M_i$ = the number of secondary units in the $i$th primary unit,
$m_i$ = the size of the sample in the $i$th primary unit,
$y_{ij}$ = the response of the $j$th secondary unit within the $i$th primary unit,
$y_i = \sum_{j=1}^{M_i} y_{ij}$ = the total in the $i$th primary unit,
$\mu_i = y_i / M_i$ = the mean response in the $i$th primary unit,
$\bar{y}_i = \frac{1}{m_i}\sum_{j=1}^{m_i} y_{ij}$ = the sample mean response in the $i$th primary unit.

In two-stage problems, we are generally interested in:

$$\tau = \sum_{i=1}^{N} y_i = \sum_{i=1}^{N}\sum_{j=1}^{M_i} y_{ij} \quad \text{(the total of the } y\text{-values for the population)},$$

$$\mu = \frac{\tau}{M}, \quad \text{where } M = \sum_{i=1}^{N} M_i = \text{the total number of secondary units in the population}.$$

For stratified random sampling: $n = N$ (census of primary units).

For cluster sampling: $m_i = M_i$ (census of secondary units).

Unbiased Estimation of the Population Total $\tau$ and Mean $\mu$.

First, an unbiased estimator of the total in the $i$th cluster ($y_i$) is $\hat{y}_i = M_i\bar{y}_i$. With these estimated cluster totals, estimators for the population total and mean are given by:

$$\hat{\tau} = \frac{N}{n}\sum_{i=1}^{n} M_i\bar{y}_i, \qquad \hat{\mu} = \frac{\hat{\tau}}{M} = \frac{N}{nM}\sum_{i=1}^{n} M_i\bar{y}_i.$$

R code to compute the estimated total for the highway example follows:

> N <- 352; n <- 5; M <- 28950
> Mi <- c(85,120,47,98,34)   # total no. of segments on the highways sampled
> mi <- c(10,15,5,10,5)      # no. of segments sampled
> yi <- c(2,1,0,0,1)         # no. of Excellent segments among those sampled
> # Unbiased estimation of total number of segments rated Excellent
> # =============================================================
> yhati <- (Mi/mi)*yi        # yi is the sample count, so this is Mi * ybar_i:
>                            # estimated no. of Excellent segments on each highway
> yhati
[1] 17.0 8.0 0.0 0.0 6.8
>
> tauhat <- (N/n)*sum(yhati) # estimated total no. of Excellent segments
> tauhat
[1] 2238.72
So an unbiased estimate of the total number of highway segments rated as Excellent is $\hat{\tau} = 2238.7$. What is $\hat{\mu}$?

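It can be shown that in two-stage sampling, the variance of the estimator of $\mu$ is given as:

$$\mathrm{Var}(\hat{\mu}) = \frac{N^2}{M^2}\left(\frac{N-n}{N}\right)\frac{\sigma_u^2}{n} + \frac{N}{nM^2}\sum_{i=1}^{N} M_i^2\left(\frac{M_i - m_i}{M_i}\right)\frac{\sigma_i^2}{m_i},$$

where:

$$\sigma_u^2 = \frac{1}{N-1}\sum_{i=1}^{N}(y_i - \mu_1)^2, \qquad \sigma_i^2 = \frac{1}{M_i - 1}\sum_{j=1}^{M_i}(y_{ij} - \mu_i)^2,$$

and $\mu_1 = \tau/N$ is the mean of the primary-unit totals.

Also, $\mathrm{Var}(\hat{\tau}) = M^2\,\mathrm{Var}(\hat{\mu})$.

If $n = N$, then the first term $= 0$, and the second term $= \mathrm{Var}(\hat{\mu})$ for stratified random sampling.
If $m_i = M_i$, then the second term $= 0$, and the first term $= \mathrm{Var}(\hat{\mu})$ for cluster sampling.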

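The estimated variance of $\hat{\mu}$ is given by:

$$\widehat{\mathrm{Var}}(\hat{\mu}) = \frac{N^2}{M^2}\left(\frac{N-n}{N}\right)\frac{s_u^2}{n} + \frac{N}{nM^2}\sum_{i=1}^{n} M_i^2\left(\frac{M_i - m_i}{M_i}\right)\frac{s_i^2}{m_i},$$

where:

$$s_u^2 = \frac{1}{n-1}\sum_{i=1}^{n}(\hat{y}_i - \hat{\mu}_1)^2, \qquad s_i^2 = \frac{1}{m_i - 1}\sum_{j=1}^{m_i}(y_{ij} - \bar{y}_i)^2,$$

with $\hat{\mu}_1 = \frac{1}{n}\sum_{i=1}^{n}\hat{y}_i$.

There are two levels of approximation in $s_u^2$: we use $n$ for $N$ and $\hat{y}_i$ for $y_i$ (the primary unit total).

Similarly, $\widehat{\mathrm{Var}}(\hat{\tau}) = M^2\,\widehat{\mathrm{Var}}(\hat{\mu})$ (since $\hat{\tau} = M\hat{\mu}$).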

In the highway example, we are counting the number of one-mile segments rated as Excellent; hence, we have binary data ($y_{ij} = 0$ or $1$). So the mean in this example is the proportion of one-mile segments rated as Excellent. Here, then, the within-primary-unit sample variance is:

$$s_i^2 = \frac{m_i}{m_i - 1}\,\hat{p}_i(1 - \hat{p}_i) \quad \text{(the binomial variance)}.$$
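As a quick check of the binomial form of $s_i^2$, the R sketch below rebuilds 0/1 responses for Highway 155 (2 Excellent out of 10 sampled segments; the order of the 0s and 1s is an arbitrary assumption and does not affect the result) and compares R's usual sample variance with $\frac{m_i}{m_i-1}\hat{p}_i(1-\hat{p}_i)$.

```r
# Hypothetical 0/1 responses for Highway 155: 2 of 10 sampled segments rated Excellent
y155 <- c(1, 1, 0, 0, 0, 0, 0, 0, 0, 0)
m <- length(y155)
p_hat <- mean(y155)                                  # sample proportion, 0.2
s2_direct <- var(y155)                               # usual sample variance (divisor m - 1)
s2_binomial <- (m / (m - 1)) * p_hat * (1 - p_hat)   # binomial form from the text
c(s2_direct, s2_binomial)                            # both equal 16/90 (about 0.1778)
```

This matches the first entry of `si2` in the R output below.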
So, to finish answering part (a) of the highway problem (where we already estimated the total number of segments in Excellent condition to be $\hat{\tau} = 2238.7$), the estimated proportion of highway miles in Excellent condition, as well as standard errors for both this proportion and the total, are computed in the R code below. The code uses

$$s_u^2 = \frac{1}{n-1}\sum_{i=1}^{n}(\hat{y}_i - \hat{\mu}_1)^2, \qquad \hat{p}_i = \bar{y}_i, \qquad s_i^2 = \frac{m_i}{m_i-1}\,\hat{p}_i(1-\hat{p}_i), \quad i = 1, \dots, 5,$$

$$\widehat{\mathrm{Var}}(\hat{\tau}) = N(N-n)\frac{s_u^2}{n} + \frac{N}{n}\sum_{i=1}^{n} M_i(M_i - m_i)\frac{s_i^2}{m_i},$$

$$\hat{p} = \hat{\mu} = \frac{\hat{\tau}}{M}, \qquad \widehat{SE}(\hat{p}) = \frac{1}{M}\widehat{SE}(\hat{\tau}) = \sqrt{\frac{1}{M^2}\widehat{\mathrm{Var}}(\hat{\tau})}.$$

> su2 <- var(yhati)
> su2
[1] 49.248
> phati <- yi/mi   # proportion of sampled segments rated Excellent on each highway
> phati
[1] 0.20000000 0.06666667 0.00000000 0.00000000 0.20000000
> si2 <- (mi/(mi-1))*phati*(1-phati)   # estimated variance within each primary unit
> si2
[1] 0.17777778 0.06666667 0.00000000 0.00000000 0.20000000
> var1 <- (N*(N-n)*su2)/n                  # Term 1 of variance
> var2 <- (N/n)*sum((Mi*(Mi-mi)*si2)/mi)   # Term 2 of variance
> c(var1,var2)
[1] 1203069.54   14697.64
> var.tauhat <- var1 + var2
> SE.tauhat <- sqrt(var.tauhat)   # SE of estimate of total
> SE.tauhat
[1] 1103.525
> c(tauhat-qt(.975,n-1)*SE.tauhat,tauhat+qt(.975,n-1)*SE.tauhat)   # 95% CI
[1] -825.1563 5302.5963
>
> phat <- tauhat/M   # estimate of proportion Excellent
> phat
[1] 0.07733057
> SE.phat <- SE.tauhat/M   # SE of estimate of proportion
> SE.phat
[1] 0.0381183
> c(phat-qt(.975,n-1)*SE.phat,phat+qt(.975,n-1)*SE.phat)   # 95% CI
[1] -0.02850281  0.18316395
Note that the confidence interval extends below 0. Since the estimated proportions within each highway are near 0, our sample sizes are too small to assume a normal sampling distribution for $\hat{p}$. We might consider bootstrapping.
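One way to try the bootstrapping idea is a primary-unit ("ultimate cluster") bootstrap: resample the $n = 5$ highways with replacement, carrying along each highway's estimated total $\hat{y}_i$, and recompute $\hat{p}$ each time. This R sketch is only an illustration under stated assumptions: it ignores the first-stage finite population correction, the seed and the number of replicates $B$ are arbitrary choices, and with only five primary units the bootstrap distribution is very coarse, so the percentile interval should not be taken as authoritative.

```r
# Primary-unit ("ultimate cluster") bootstrap for the proportion rated Excellent.
# Assumptions (not from the handout): resample highways with replacement,
# no finite population correction, B = 10000 replicates, arbitrary seed.
set.seed(475)
N <- 352; n <- 5; M <- 28950
Mi <- c(85, 120, 47, 98, 34)
mi <- c(10, 15, 5, 10, 5)
yi <- c(2, 1, 0, 0, 1)                 # Excellent counts among sampled segments
yhati <- (Mi / mi) * yi                # estimated Excellent segments per highway

B <- 10000
phat_boot <- replicate(B, {
  idx <- sample.int(n, n, replace = TRUE)   # resample highways
  (N / n) * sum(yhati[idx]) / M             # recompute p-hat = tau-hat / M
})
phat <- (N / n) * sum(yhati) / M
ci_boot <- quantile(phat_boot, c(0.025, 0.975))   # percentile interval
c(phat = phat, ci_boot)
```

Because every resampled $\hat{p}$ is a nonnegative combination of nonnegative $\hat{y}_i$ values, this percentile interval cannot extend below 0, unlike the $t$ interval above.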

Ratio Estimation in Two-Stage Sampling: If the sizes of the primary units (highways) are linearly related (through the origin) to the values of the response (number rated Excellent, or cost of urgent repairs), a ratio estimator may provide a better estimator of the population total or mean. The estimators are given by:

$$\hat{\mu}_r = \frac{\sum_{i=1}^{n} M_i\bar{y}_i}{\sum_{i=1}^{n} M_i}, \qquad \hat{\tau}_r = M\hat{\mu}_r.$$

The variance of $\hat{\mu}_r$ is approximately:

$$\mathrm{Var}(\hat{\mu}_r) \approx \frac{N^2}{M^2}\left(\frac{N-n}{N}\right)\frac{1}{n}\left(\frac{1}{N-1}\sum_{i=1}^{N}(y_i - M_i\mu)^2\right) + \frac{N}{nM^2}\sum_{i=1}^{N} M_i^2\left(\frac{M_i - m_i}{M_i}\right)\frac{\sigma_i^2}{m_i}.$$

The first term measures the variability between clusters (the variation in cluster (highway) totals), with the difference in highway lengths accounted for; the second term is the within-cluster variability (the same as earlier).

If $y_i$ (the total for cluster $i$) is roughly proportional to $M_i$ (the size of cluster $i$), then the squared deviations $(y_i - M_i\mu)^2$ in the first term above should be small. This is the situation where ratio estimation should be used. The estimated variance $\widehat{\mathrm{Var}}(\hat{\mu}_r)$ is given on page 147 of the text.
The estimators and corresponding SEs for ratio estimation are computed via R, using

$$\hat{r} = \hat{\mu}_r = \frac{\sum \hat{y}_i}{\sum M_i} = \frac{\sum M_i\bar{y}_i}{\sum M_i}, \qquad \hat{\tau}_r = M\hat{\mu}_r, \qquad s_r^2 = \frac{1}{n-1}\sum_{i=1}^{n}(\hat{y}_i - M_i\hat{\mu}_r)^2,$$

$$\widehat{\mathrm{Var}}(\hat{\tau}_r) = N(N-n)\frac{s_r^2}{n} + \frac{N}{n}\sum_{i=1}^{n} M_i(M_i - m_i)\frac{s_i^2}{m_i}.$$

> # Ratio Estimation of total segments and proportion rated Excellent
> # ===================================
> rhat <- sum(yhati)/sum(Mi)   # estimate of the proportion
> rhat
[1] 0.0828125
> tauhat.r <- M*rhat           # estimate of the total
> tauhat.r
[1] 2397.422
> sr2 <- (1/(n-1))*sum((yhati - Mi*rhat)^2)
> sr2
[1] 49.96548
> var.tauhat.r <- (N*(N-n)*sr2)/n + (N/n)*sum((Mi*(Mi-mi)*si2)/mi)
> sqrt(var.tauhat.r)           # SE of estimate of total
[1] 1111.438
> sqrt(var.tauhat.r/M^2)       # SE of estimate of proportion
[1] 0.03839164

$\widehat{SE}(\hat{\mu}_r) = 0.038$, which is about the same as with the earlier unbiased estimator (0.038), as there was no real relationship between the highway length and the number of Excellent segments on the highway.

When it is appropriate (that is, when the response is roughly proportional to primary-unit size), a ratio estimator will generally be better if there is a lot of variation in the highway lengths (the same idea as before).
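Because the same computations are repeated for the Excellent counts and for the repair costs in part (b), it may help to wrap them in one function. The R sketch below is a hypothetical helper (`two_stage_est` is our name, not from the text) that returns the unbiased and ratio estimates of the total and mean with their estimated SEs, using the formulas above. It takes the within-unit sample variances $s_i^2$ as an input, so it works for binary and continuous responses alike.

```r
# Hypothetical helper: unbiased and ratio estimators for two-stage SRS.
# ysample_tot = sample total on each selected primary unit (the 'yi' in the R code above)
# s2          = within-primary-unit sample variances s_i^2
two_stage_est <- function(N, M, Mi, mi, ysample_tot, s2) {
  n <- length(Mi)
  yhat <- (Mi / mi) * ysample_tot                    # estimated primary-unit totals
  within <- (N / n) * sum(Mi * (Mi - mi) * s2 / mi)  # second variance term (shared)
  # Unbiased estimator
  tau_u <- (N / n) * sum(yhat)
  se_tau_u <- sqrt(N * (N - n) * var(yhat) / n + within)
  # Ratio estimator
  r_hat <- sum(yhat) / sum(Mi)
  sr2 <- sum((yhat - Mi * r_hat)^2) / (n - 1)
  se_tau_r <- sqrt(N * (N - n) * sr2 / n + within)
  data.frame(
    estimator = c("unbiased", "ratio"),
    total     = c(tau_u, M * r_hat),
    se_total  = c(se_tau_u, se_tau_r),
    mean      = c(tau_u / M, r_hat),
    se_mean   = c(se_tau_u / M, se_tau_r / M)
  )
}

# Part (a): segments rated Excellent
Mi <- c(85, 120, 47, 98, 34); mi <- c(10, 15, 5, 10, 5)
yi <- c(2, 1, 0, 0, 1)
p_i <- yi / mi
est_a <- two_stage_est(N = 352, M = 28950, Mi = Mi, mi = mi,
                       ysample_tot = yi, s2 = (mi / (mi - 1)) * p_i * (1 - p_i))
est_a
```

For part (b), the same call with `ysample_tot = c(90, 110, 60, 100, 30)` and the assumed `s2 = c(3.1, 3.5, 4.8, 2.9, 2.5)^2` applies identical formulas to the cost data.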
Part (b) of the Highway Example: Estimate the average cost per highway mile and the total cost of urgently needed repairs. Here, we have $N = 352$ total highways, $n = 5$ sampled highways, and $M = 28950$ total one-mile segments, and:
$M_i$ = the number of segments for highway $i$,
$m_i$ = the number of sampled segments for highway $i$,
$y_i$ = the total cost of repairs on the sampled segments of highway $i$ (this was the number of sampled segments rated Excellent on highway $i$ earlier),
$\bar{y}_i$ = the average cost per sampled segment for highway $i$,
$M_i\bar{y}_i = \hat{y}_i$ = the estimated total cost for highway $i$.
Again, we can use either the unbiased estimators of the mean and total or the ratio estimators. The ratio estimators would be expected to be better if there's a linear relationship (through the origin) between the length of a highway and the estimated total repair costs, or, equivalently, if there's little variation between the average repair costs per mile on the different highways.
There's an important piece of information missing in the data table above which we need to calculate the SEs of our estimates for the cost data. Can you see what it is?

In the R analysis below, some values are assumed for the missing information, but we'll also see that the values we assume make little difference in the SEs.
> N <- 352; n <- 5; M <- 28950
> Mi <- c(85,120,47,98,34)        # total no. of segments on the highways sampled
> mi <- c(10,15,5,10,5)           # no. of segments sampled
> yi <- c(90,110,60,100,30)       # total cost on sampled segments (in $1,000)
> si <- c(3.1,3.5,4.8,2.9,2.5)    # assumed within-highway SDs (not given in the data)
> si2 <- si^2
>
> # Unbiased estimation of total cost of repairs and mean cost per segment
> # =============================================================
> yhati <- (Mi/mi)*yi             # estimated total cost on each highway
> yhati
[1] 765 880 564 980 204
>
> tauhat <- (N/n)*sum(yhati)      # estimated total cost
> tauhat
[1] 238867.2
>
> su2 <- var(yhati)
> su2
[1] 94311.8
> var1 <- (N*(N-n)*su2)/n                  # Term 1 of variance of tauhat
> var2 <- (N/n)*sum((Mi*(Mi-mi)*si2)/mi)   # Term 2 of variance of tauhat
> c(var1,var2)
[1] 2303924100    2393449
> var.tauhat <- var1 + var2
> SE.tauhat <- sqrt(var.tauhat)   # SE of estimate of total
> SE.tauhat
[1] 48024.14
> c(tauhat-qt(.975,n-1)*SE.tauhat,tauhat+qt(.975,n-1)*SE.tauhat)   # 95% CI
[1] 105530.8 372203.6
>
> muhat <- tauhat/M               # estimate of mean cost per segment
> muhat
[1] 8.251026
> SE.muhat <- SE.tauhat/M         # SE of estimate of mean cost per segment
> SE.muhat
[1] 1.658865
> c(muhat-qt(.975,n-1)*SE.muhat,muhat+qt(.975,n-1)*SE.muhat)   # 95% CI
[1] 3.645279 12.856773
>
> # Ratio Estimation of total cost and mean cost per segment
> # ===================================
> rhat <- sum(yhati)/sum(Mi)
> tauhat.r <- M*rhat              # estimate of the total cost
> tauhat.r
[1] 255800.4
> sr2 <- (1/(n-1))*sum((yhati - Mi*rhat)^2)
> sr2
[1] 19283.25
> var.tauhat.r <- (N*(N-n)*sr2)/n + (N/n)*sum((Mi*(Mi-mi)*si2)/mi)
> sqrt(var.tauhat.r)              # SE of estimate of total cost
[1] 21759.14
>
> rhat                            # estimated mean cost per segment
[1] 8.835938
> sqrt(var.tauhat.r/M^2)          # SE of estimated mean cost per segment
[1] 0.751611

We were not given the standard deviations of the costs for each sampled highway and
we were not given the individual data values (the cost for each of the sampled sections
on a highway) from which to compute them. However, we can also see that the within
highway variability contributed little to the estimated variance of our estimators. If we
had assumed a standard deviation of $10,000 on each highway (very high, considering
that the average costs ranged from $6,000 to $12,000), the SE of the unbiased estimate
of the total cost would have increased from $48,024 to only $48,214 (and would have
decreased to $47,999 if all the standard deviations were 0). This points out something
important in two-stage designs: it is generally the variability between the primary units
and the sample size of the primary units that determine the accuracy of the estimators.
Note that for the cost data, the ratio estimator decreased the SEs of the estimates of
the total and mean by over half. This was because there was a relationship between
the lengths of the highways and the estimated total cost of repairs.
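The sensitivity claim above can be checked directly. This R sketch recomputes the SE of the unbiased estimate of the total cost under three scenarios for the within-highway standard deviations: the assumed values used in the analysis, all equal to 0, and all equal to 10 (that is, $10,000, since costs are in $1,000). The helper name `se_tauhat` is ours, for illustration only.

```r
# Sensitivity of SE(tau-hat) to the assumed within-highway SDs (costs in $1,000)
N <- 352; n <- 5
Mi <- c(85, 120, 47, 98, 34)
mi <- c(10, 15, 5, 10, 5)
yi <- c(90, 110, 60, 100, 30)            # total cost on sampled segments
yhati <- (Mi / mi) * yi                  # estimated total cost per highway

# Hypothetical helper: SE of the unbiased total for a vector of within-highway SDs
se_tauhat <- function(si) {
  var1 <- N * (N - n) * var(yhati) / n              # between-highway term
  var2 <- (N / n) * sum(Mi * (Mi - mi) * si^2 / mi)  # within-highway term
  sqrt(var1 + var2)
}

scenarios <- list(
  assumed  = c(3.1, 3.5, 4.8, 2.9, 2.5),
  all_zero = rep(0, 5),
  all_ten  = rep(10, 5)
)
round(sapply(scenarios, se_tauhat), 1)
```

The three SEs come out at roughly $48,024, $47,999 and $48,214, so the between-highway term dominates in every scenario.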

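Comparison of $\mathrm{Var}(\hat{\tau})$ and $\widehat{\mathrm{Var}}(\hat{\tau})$: Recall:

$$\mathrm{Var}(\hat{\tau}) = N(N-n)\frac{\sigma_u^2}{n} + \frac{N}{n}\sum_{i=1}^{N} M_i(M_i - m_i)\frac{\sigma_i^2}{m_i},$$

$$\widehat{\mathrm{Var}}(\hat{\tau}) = N(N-n)\frac{s_u^2}{n} + \frac{N}{n}\sum_{i=1}^{n} M_i(M_i - m_i)\frac{s_i^2}{m_i}.$$

It was mentioned earlier that $\widehat{\mathrm{Var}}(\hat{\tau})$ is an unbiased estimator of $\mathrm{Var}(\hat{\tau})$.

Although $s_i^2$ is an unbiased estimator of $\sigma_i^2$, the second term in the expression for $\widehat{\mathrm{Var}}(\hat{\tau})$,

$$\frac{N}{n}\sum_{i=1}^{n} M_i(M_i - m_i)\frac{s_i^2}{m_i},$$

is not an unbiased estimator of the second term in $\mathrm{Var}(\hat{\tau})$ above, but is an unbiased estimator of

$$\sum_{i=1}^{N} M_i(M_i - m_i)\frac{\sigma_i^2}{m_i},$$

that is, of the second term without the $N/n$ constant.
So the 2nd piece in $\widehat{\mathrm{Var}}(\hat{\tau})$ underestimates the corresponding 2nd piece in $\mathrm{Var}(\hat{\tau})$.
And the 1st piece in $\widehat{\mathrm{Var}}(\hat{\tau})$ overestimates the corresponding 1st piece in $\mathrm{Var}(\hat{\tau})$. Why?

It is easy to show with an example that $E(s_u^2) \ge \sigma_u^2$. How?

So $s_u^2$ overestimates $\sigma_u^2$ because it includes both the variability between primary units
and the variability within primary units.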
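A small Monte Carlo experiment can illustrate these claims: $\hat{\tau}$ is unbiased, $\widehat{\mathrm{Var}}(\hat{\tau})$ is unbiased for $\mathrm{Var}(\hat{\tau})$ even though its two pieces are individually biased, and $s_u^2$ tends to overestimate $\sigma_u^2$. The R sketch below uses a made-up population (20 primary units of 5 to 15 secondary units each, with normally distributed responses whose means differ across primary units), samples $n = 6$ primary units and roughly a third of the secondary units in each, and compares simulation averages with the population quantities. The population, sample sizes, seed, and number of replications are all arbitrary assumptions chosen for illustration.

```r
# Monte Carlo check of two-stage SRS results on a hypothetical population
set.seed(2024)
N <- 20
Mi <- sample(5:15, N, replace = TRUE)               # primary unit sizes
pop <- lapply(seq_len(N), function(i)
  rnorm(Mi[i], mean = 10 + 2 * (i %% 5), sd = 3))   # responses y_ij
n <- 6
mi <- pmax(2, round(Mi / 3))                        # second-stage sample sizes

# Population quantities from the formulas in the text
yi_pop <- vapply(pop, sum, numeric(1))              # primary-unit totals y_i
tau <- sum(yi_pop)
sigma_u2 <- var(yi_pop)                             # divisor N - 1, centered at tau / N
sigma_i2 <- vapply(pop, var, numeric(1))            # divisor M_i - 1
var_tau <- N * (N - n) * sigma_u2 / n +
  (N / n) * sum(Mi * (Mi - mi) * sigma_i2 / mi)

one_sample <- function() {
  units <- sample.int(N, n)                         # first stage: SRS of primary units
  yhat <- numeric(n); within <- numeric(n)
  for (k in seq_len(n)) {
    i <- units[k]
    y <- pop[[i]][sample.int(Mi[i], mi[i])]         # second stage: SRS within unit i
    yhat[k] <- Mi[i] * mean(y)
    within[k] <- Mi[i] * (Mi[i] - mi[i]) * var(y) / mi[i]
  }
  su2 <- var(yhat)
  c(tauhat = (N / n) * sum(yhat),
    varhat = N * (N - n) * su2 / n + (N / n) * sum(within),
    su2 = su2)
}

sims <- replicate(20000, one_sample())              # 3 x 20000 matrix
c(tau = tau, mean_tauhat = mean(sims["tauhat", ]))
c(var_tau = var_tau, mc_var = var(sims["tauhat", ]),
  mean_varhat = mean(sims["varhat", ]))
c(sigma_u2 = sigma_u2, mean_su2 = mean(sims["su2", ]))
```

Up to Monte Carlo error, the first two lines should show agreement, while the last line should show the average $s_u^2$ sitting above $\sigma_u^2$.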

Allocation in Two-Stage Sampling: A practical question in developing a two-stage sampling plan is how to allocate resources to the sampling of primary units versus secondary units.
Here, we consider the special case of:
1. Equal-sized primary units: $M_1 = \cdots = M_N = \bar{M}$.
2. Equal-sized samples within primary units: $m_1 = \cdots = m_n = m$.
The total sample size then is $mn$, and $N\bar{M} = M$, where:
$\bar{M}$ = the number of secondary units per primary unit,
$N$ = the total number of primary units,
$M$ = the total number of secondary units.

Under the allocation assumptions of equal-sized primary units and equal-sized samples within primary units, the unbiased and ratio estimators of the total are identical:

$$\hat{\tau}_r = M\hat{\mu}_r = M\,\frac{\sum_{i=1}^{n} M_i\bar{y}_i}{\sum_{i=1}^{n} M_i} = \frac{M}{n\bar{M}}\sum_{i=1}^{n} M_i\bar{y}_i = \frac{N}{n}\sum_{i=1}^{n} M_i\bar{y}_i = M\,\frac{1}{n}\sum_{i=1}^{n}\bar{y}_i = M\bar{y} = \hat{\tau}.$$

So, we take the average of all responses and multiply it by the number of secondary units. Also: $\hat{\mu} = \hat{\tau}/M = \bar{y}$.

Working with the variance of $\hat{\mu}$:

$$\mathrm{Var}(\hat{\mu}) = \frac{\mathrm{Var}(\hat{\tau})}{M^2} = \frac{N(N-n)}{M^2}\,\frac{\sigma_u^2}{n} + \frac{N}{M^2 n}\sum_{i=1}^{N}\bar{M}(\bar{M} - m)\frac{\sigma_i^2}{m}.$$

Note:

$$\sigma_u^2 = \frac{1}{N-1}\sum_{i=1}^{N}(y_i - \mu_1)^2 = \frac{1}{N-1}\sum_{i=1}^{N}(\bar{M}\mu_i - \bar{M}\mu)^2 = \bar{M}^2\,\frac{1}{N-1}\sum_{i=1}^{N}(\mu_i - \mu)^2 = \bar{M}^2\sigma_b^2,$$

where $\sigma_b^2$ = the variability between primary units. Therefore

$$\mathrm{Var}(\hat{\mu}) = \frac{N(N-n)\bar{M}^2\sigma_b^2}{\bar{M}^2N^2 n} + \frac{N\bar{M}(\bar{M}-m)}{\bar{M}^2N^2 nm}\sum_{i=1}^{N}\sigma_i^2 = \left(\frac{N-n}{N}\right)\frac{\sigma_b^2}{n} + \left(\frac{\bar{M}-m}{\bar{M}}\right)\frac{1}{nm}\,\frac{1}{N}\sum_{i=1}^{N}\sigma_i^2 = (1-f_1)\frac{\sigma_b^2}{n} + (1-f_2)\frac{\sigma_w^2}{mn},$$

where $\sigma_w^2 = \frac{1}{N}\sum_{i=1}^{N}\sigma_i^2$ is the average within-primary-unit variability, and $f_1 = n/N$ and $f_2 = m/\bar{M}$ are the sampling fractions at the first and second stages, respectively.

Note: If we increase $m$ (the number of secondary units sampled per primary unit), we can drive the 2nd term in the variance above to zero, but this will have no effect on the 1st term.
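The decomposition can be turned into a small R function to see how each stage contributes. The sketch below evaluates the two terms $(1-f_1)\sigma_b^2/n$ and $(1-f_2)\sigma_w^2/(mn)$ for increasing $m$ with $n$ held fixed. The inputs ($\sigma_b^2 = 4.435$, $\sigma_w^2 = 4.225$, $\bar{M} = 90$, $N = 352$, $n = 5$) are illustrative values borrowed from the highway example later in this handout, not results; the function name `var_muhat_equal` is ours.

```r
# Var(mu-hat) under equal-sized primary units and equal second-stage samples
var_muhat_equal <- function(n, m, N, Mbar, sb2, sw2) {
  f1 <- n / N; f2 <- m / Mbar                   # first- and second-stage sampling fractions
  c(between = (1 - f1) * sb2 / n,               # first term: unaffected by m
    within  = (1 - f2) * sw2 / (m * n))         # second term: shrinks as m grows
}

# Illustrative (assumed) values
N <- 352; Mbar <- 90; n <- 5; sb2 <- 4.435; sw2 <- 4.225
m_values <- c(1, 5, 20, 45, 90)
var_terms <- sapply(m_values, function(m) var_muhat_equal(n, m, N, Mbar, sb2, sw2))
colnames(var_terms) <- paste0("m=", m_values)
round(var_terms, 5)
```

The `between` row stays constant across columns while the `within` row falls to exactly 0 at $m = \bar{M}$, which is the point made in the note above.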

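Goal: We want to find the values of $n$ and $m$ that minimize $\mathrm{Var}(\hat{\mu})$. Suppose we fix $nm = c$; this assumes the cost is the same for all possible choices of $n$ and $m$.

First, note that:

$$\mathrm{Var}(\hat{\mu}) = \left(1 - \frac{n}{N}\right)\frac{\sigma_b^2}{n} + \left(1 - \frac{m}{\bar{M}}\right)\frac{\sigma_w^2}{nm} = \frac{\sigma_b^2}{n} - \frac{\sigma_b^2}{N} + \frac{\sigma_w^2}{nm} - \frac{\sigma_w^2}{n\bar{M}},$$

where the middle two terms are fixed for any choice of $n, m$ with $nm = c$. So, we want to minimize

$$\frac{1}{n}\left(\sigma_b^2 - \frac{\sigma_w^2}{\bar{M}}\right)$$

with respect to $n$.

If $\sigma_b^2 > \sigma_w^2/\bar{M}$, then we should make $n$ as large as possible (taking $m$ as small as possible, down to $m = 1$).

So if the primary unit size, $\bar{M}$, is large (it usually is), then $\sigma_w^2/\bar{M}$ is small, and $\sigma_b^2$ will typically be the larger of the two.

If $\sigma_b^2 < \sigma_w^2/\bar{M}$, then we should make $n$ as small as possible (taking $m$ as large as possible, up to $m = \bar{M}$).

All of this ignores any differences in cost.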
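The rule above can be confirmed by brute force. The R sketch below enumerates every factorization $nm = c$ for an assumed budget of $c = 60$ secondary units, evaluates $\mathrm{Var}(\hat{\mu})$ for each feasible pair, and reports the minimizing $m$. The first scenario uses highway-like values $\sigma_b^2 = 4.435$, $\sigma_w^2 = 4.225$, $\bar{M} = 90$; the second shrinks $\sigma_b^2$ to a hypothetical $0.01$ so that $\sigma_b^2 < \sigma_w^2/\bar{M}$. The function name `best_m` is ours.

```r
# Brute-force search over (n, m) with n * m = c fixed
best_m <- function(c_total, N, Mbar, sb2, sw2) {
  m <- which(c_total %% seq_len(c_total) == 0)   # all divisors of c_total
  n <- c_total / m
  ok <- n <= N & m <= Mbar                       # keep feasible allocations only
  m <- m[ok]; n <- n[ok]
  v <- (1 - n / N) * sb2 / n + (1 - m / Mbar) * sw2 / (n * m)
  m[which.min(v)]
}

N <- 352; Mbar <- 90; sw2 <- 4.225
best_m(60, N, Mbar, sb2 = 4.435, sw2 = sw2)   # sigma_b^2 > sigma_w^2 / Mbar: returns 1
best_m(60, N, Mbar, sb2 = 0.01,  sw2 = sw2)   # sigma_b^2 < sigma_w^2 / Mbar: returns 60
```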

Cost Considerations in Allocation: Suppose we let $C = c_0 + c_1 n + c_2 nm$, where:
$c_0$ = the startup cost,
$c_1$ = the cost of selecting a primary unit (travel, time, etc.),
$c_2$ = the cost of sampling a secondary unit once we've selected a primary unit.
Suppose we fix the total cost $C$. Then, the optimal allocation for $m$ (that which minimizes $\mathrm{Var}(\hat{\mu})$ for fixed $C$) is:

$$m_{\text{opt}} = \sqrt{\frac{c_1\,\sigma_w^2}{c_2\left(\sigma_b^2 - \sigma_w^2/\bar{M}\right)}}.$$

Note that this optimal choice for $m$ does not depend in any way on the total cost $C$.
If $c_1$ increases relative to $c_2$, it makes sense that $m_{\text{opt}}$ will increase, because the cost of sampling primary units increases.
Often, if $\bar{M}$ is large, then $\sigma_w^2/\bar{M} \approx 0$, and $m_{\text{opt}} \approx \sqrt{\dfrac{c_1\,\sigma_w^2}{c_2\,\sigma_b^2}}$. In this case, we need only know the relative costs and relative variabilities.
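To see that the closed form really minimizes the variance for a fixed budget, one can substitute $n = (C - c_0)/(c_1 + c_2 m)$ into $\mathrm{Var}(\hat{\mu})$ and minimize numerically over $m$. This R sketch does that with `optimize()`, treating $m$ as continuous; the budget, costs, and variance components are made-up values for illustration.

```r
# Numerical check of m_opt for a fixed total cost C (all inputs are hypothetical)
N <- 352; Mbar <- 90
sb2 <- 4.435; sw2 <- 4.225
c0 <- 100; c1 <- 25; c2 <- 1; C <- 1100

var_at_m <- function(m) {
  n <- (C - c0) / (c1 + c2 * m)                  # n implied by the budget
  (1 - n / N) * sb2 / n + (1 - m / Mbar) * sw2 / (n * m)
}

m_numeric <- optimize(var_at_m, interval = c(1, Mbar))$minimum
m_closed  <- sqrt(c1 * sw2 / (c2 * (sb2 - sw2 / Mbar)))
c(numeric = m_numeric, closed_form = m_closed)
```

Changing `C` (while keeping it above `c0`) moves $n$ but should leave the minimizing $m$ unchanged, consistent with the note that $m_{\text{opt}}$ does not depend on $C$.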

Back to the Highway Example: Suppose it takes 1/2 hour to actually sample a one-mile segment ($c_2$). It might be much more costly to select a primary unit, and suppose we guess:

$$\frac{c_1}{c_2} \approx 25.$$

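Suppose we have preliminary data (say on 4 highways with 5 segments each) and we conduct an analysis of variance (ANOVA) to estimate the two variance components $\sigma_b^2$ and $\sigma_w^2$, given below:

$$\sigma_b^2 = \frac{1}{N-1}\sum_{i=1}^{N}(\mu_i - \mu)^2, \qquad \sigma_w^2 = \frac{1}{N}\sum_{i=1}^{N}\sigma_i^2, \quad \text{where } \sigma_i^2 = \frac{1}{\bar{M}-1}\sum_{j=1}^{\bar{M}}(y_{ij} - \mu_i)^2.$$

Recall that $s_b^2 = \frac{1}{n-1}\sum_{i=1}^{n}(\bar{y}_i - \bar{y})^2$ overestimates $\sigma_b^2$.

And $s_w^2 = \frac{1}{n}\sum_{i=1}^{n}s_i^2$ is an unbiased estimate of $\sigma_w^2$ (since $s_i^2$ is unbiased for $\sigma_i^2$).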

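Conducting an ANOVA on the $y_{ij}$'s with the primary units (highways) as factors yields the partitioning:

$$\sum_{i=1}^{n}\sum_{j=1}^{m}(y_{ij} - \bar{y})^2 = m\sum_{i=1}^{n}(\bar{y}_i - \bar{y})^2 + \sum_{i=1}^{n}\sum_{j=1}^{m}(y_{ij} - \bar{y}_i)^2.$$

This gives the following ANOVA table:

| Source of Variance | Degrees of Freedom | Sums of Squares | Mean Squares | E(MS) |
|---|---|---|---|---|
| Between | $n-1$ | $m\sum_{i}(\bar{y}_i - \bar{y})^2$ | MSB $= m s_b^2$ | |
| Within | $n(m-1)$ | $\sum_{i}\sum_{j}(y_{ij} - \bar{y}_i)^2$ | MSW $= s_w^2$ | $\sigma_w^2$ |
| Total | $nm-1$ | $\sum_{i}\sum_{j}(y_{ij} - \bar{y})^2$ | | |

$$E(\text{MSW}) = E\left[\frac{1}{n}\sum_{i=1}^{n}\frac{1}{m-1}\sum_{j=1}^{m}(y_{ij} - \bar{y}_i)^2\right] = E\left[\frac{1}{n}\sum_{i=1}^{n}s_i^2\right] = \sigma_w^2.$$

We want to use the expected mean squares above to estimate $\sigma_b^2$. How?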

R Code to Estimate $\sigma_b^2$ and $\sigma_w^2$ via ANOVA: Reconsider the highway repair example with hypothetical data on repair costs.

> hwy <- rep(c("A","B","C","D"),rep(5,4))
> hwy
 [1] "A" "A" "A" "A" "A" "B" "B" "B" "B" "B" "C" "C" "C" "C" "C"
[16] "D" "D" "D" "D" "D"
> repcost <- c(3,6,7,9,4,12,8,14,9,10,6,8,10,7,10,5,4,8,6,6)
> repcost
 [1]  3  6  7  9  4 12  8 14  9 10  6  8 10  7 10  5  4  8  6  6
> d <- data.frame(hwy,repcost)
> a <- aov(repcost~hwy,data=d)   # Conducts an ANOVA of repair costs on the highway ID
> summary(a)
            Df Sum Sq Mean Sq F value   Pr(>F)
hwy          3 79.200  26.400  6.2485 0.005184 **
Residuals   16 67.600   4.225
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> sigb2 <- (1/5)*(26.4 - 4.225)   # Estimate of sigma-b-sq
> sigb2
[1] 4.435
> 4.225/90                        # Estimate of sigma-w-sq / Mbar
[1] 0.04694444

Here $\hat{\sigma}_b^2 = \frac{1}{m}(\text{MSB} - \text{MSW})$ with $m = 5$.
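The mean squares in the ANOVA output can also be computed directly from their definitions, which makes the link to $s_b^2$ and $s_w^2$ explicit. The R sketch below reuses the hypothetical repair-cost data, computes MSB $= m s_b^2$ and MSW $= s_w^2$ by hand, forms $\hat{\sigma}_b^2 = (\text{MSB} - \text{MSW})/m$, and reads the same mean squares out of `summary(aov(...))` for comparison.

```r
# Hypothetical preliminary data: 4 highways x 5 one-mile segments
hwy <- rep(c("A", "B", "C", "D"), each = 5)
repcost <- c(3, 6, 7, 9, 4, 12, 8, 14, 9, 10, 6, 8, 10, 7, 10, 5, 4, 8, 6, 6)
n <- 4; m <- 5

ybar_i <- tapply(repcost, hwy, mean)            # highway sample means
s_b2 <- var(ybar_i)                             # s_b^2, divisor n - 1
s_w2 <- mean(tapply(repcost, hwy, var))         # s_w^2 = average of the s_i^2
MSB <- m * s_b2
MSW <- s_w2
sigb2_hat <- (MSB - MSW) / m                    # estimate of sigma_b^2

# Same mean squares from R's ANOVA table, for comparison
ms_aov <- summary(aov(repcost ~ hwy))[[1]][["Mean Sq"]]
rbind(by_hand = c(MSB = MSB, MSW = MSW), aov = ms_aov)
sigb2_hat
```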

Since $\hat{\sigma}_w^2/\bar{M} = 0.0469$ is small relative to $\hat{\sigma}_b^2$, and hence effectively negligible, the approximate optimal allocation for $m$ is given by:

$$m_{\text{opt}} = \sqrt{\frac{c_1\,\sigma_w^2}{c_2\left(\sigma_b^2 - \sigma_w^2/\bar{M}\right)}} \approx \sqrt{\frac{c_1\,\sigma_w^2}{c_2\,\sigma_b^2}} = \sqrt{\frac{c_1}{c_2}\cdot\frac{4.225}{4.435}}.$$

We had guessed that $c_1/c_2 = 25$. Then $m_{\text{opt}} \approx 4.88$, so we might use 5 one-mile segments per highway.

Had we guessed that $c_1/c_2 = 10$, then $m_{\text{opt}} \approx 3.09$ and we would have used 3 one-mile segments per highway.
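The arithmetic for both cost-ratio guesses is easy to script. This R sketch (the function name `m_opt` is ours) evaluates both the exact formula and the approximation that drops $\sigma_w^2/\bar{M}$, using the ANOVA estimates $\hat{\sigma}_b^2 = 4.435$ and $\hat{\sigma}_w^2 = 4.225$ and the value $\bar{M} = 90$ used in the R output above.

```r
# Optimal number of secondary units per primary unit
m_opt <- function(cost_ratio, sb2, sw2, Mbar = Inf) {
  # cost_ratio = c1 / c2; Mbar = Inf gives the approximation that ignores sw2 / Mbar
  sqrt(cost_ratio * sw2 / (sb2 - sw2 / Mbar))
}

sb2 <- 4.435; sw2 <- 4.225
ratios <- c(25, 10)
rbind(approx = m_opt(ratios, sb2, sw2),
      exact  = m_opt(ratios, sb2, sw2, Mbar = 90))
```

Both versions round to the same whole-number choices (5 and 3 segments per highway), which supports treating $\sigma_w^2/\bar{M}$ as negligible here.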

The value of $n$ is now determined by the overall budget (or cost). Recall that the total cost was given by $C = c_0 + c_1 n + c_2 nm$. With $m = 5$:

$$C = c_0 + c_1 n + 5c_2 n \implies n = \frac{C - c_0}{c_1 + 5c_2}.$$
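As a worked instance of this last step, suppose (purely hypothetically) that the budget is $C = 1000$ hours of field time with a startup cost of $c_0 = 100$ hours. Taking $c_2 = 0.5$ hour per segment, as in the highway example, and $c_1 = 25c_2 = 12.5$ hours per highway, the R sketch below computes $n$ and rounds down so the budget is not exceeded.

```r
# Number of primary units affordable under a fixed budget (hypothetical costs, in hours)
C <- 1000; c0 <- 100
c2 <- 0.5; c1 <- 25 * c2
m <- 5
n_exact <- (C - c0) / (c1 + m * c2)
n <- floor(n_exact)                        # round down to stay within budget
c(n_exact = n_exact, n = n, cost_used = c0 + c1 * n + c2 * n * m)
```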

PPS Sampling in Two-Stage Problems: As with cluster sampling, Hansen-Hurwitz estimation can be employed in two-stage sampling if selection of the primary units is made proportional to size with replacement. For details of the forms of the resulting unbiased estimators,
see pages 148-149 in Chapter 13.

Horvitz-Thompson Estimator in Two-Stage Sampling: Recall that the Horvitz-Thompson estimator can be applied in virtually any sampling problem. Recall also that this estimator depends on the inclusion probabilities for the units in the population. How do we find these inclusion probabilities, the $\pi_{(ij)}$?
The inclusion probability of the $j$th secondary unit within the $i$th primary unit is:

$$\pi_{(ij)} = \frac{n}{N}\cdot\frac{m_i}{M_i}.$$

With these inclusion probabilities, the Horvitz-Thompson estimator of the population total is:

$$\hat{\tau} = \sum_{i=1}^{n}\sum_{j=1}^{m_i}\frac{y_{ij}}{\pi_{(ij)}} = \frac{N}{n}\sum_{i=1}^{n}\frac{M_i}{m_i}\sum_{j=1}^{m_i} y_{ij} = \frac{N}{n}\sum_{i=1}^{n} M_i\bar{y}_i,$$

which is what we found earlier as the SRS estimator.
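The equivalence can be checked numerically on the Excellent-condition data from part (a). Since each sampled segment has a 0/1 response, the R sketch below rebuilds hypothetical segment-level vectors with the right number of 1s per highway (their order is irrelevant), divides each $y_{ij}$ by $\pi_{(ij)}$, and sums.

```r
# Horvitz-Thompson estimate from segment-level 0/1 data (part (a) of the highway example)
N <- 352; n <- 5
Mi <- c(85, 120, 47, 98, 34)
mi <- c(10, 15, 5, 10, 5)
excellent <- c(2, 1, 0, 0, 1)

# Segment-level responses and inclusion probabilities pi_(ij) = (n/N)(m_i/M_i)
y <- unlist(lapply(seq_len(n), function(i)
  c(rep(1, excellent[i]), rep(0, mi[i] - excellent[i]))))
pi_ij <- rep((n / N) * (mi / Mi), times = mi)

tau_ht  <- sum(y / pi_ij)
tau_srs <- (N / n) * sum(Mi * excellent / mi)   # (N/n) * sum of M_i * ybar_i
c(HT = tau_ht, SRS = tau_srs)                   # both 2238.72
```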

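Three-Stage Sampling: Suppose now that we sample (3 stages):
$n$ out of $N$ primary units,
$m$ out of $M$ secondary units (within each selected primary unit),
$t$ out of $T$ tertiary units (within each selected secondary unit).

For this three-stage sampling plan, the variance of the estimated mean is:

$$\mathrm{Var}(\hat{\mu}) = (1-f_1)\frac{\sigma_1^2}{n} + (1-f_2)\frac{\sigma_2^2}{mn} + (1-f_3)\frac{\sigma_3^2}{nmt}, \qquad f_1 = \frac{n}{N},\; f_2 = \frac{m}{M},\; f_3 = \frac{t}{T},$$

and its estimate is

$$\widehat{\mathrm{Var}}(\hat{\mu}) = (1-f_1)\frac{s_1^2}{n} + f_1(1-f_2)\frac{s_2^2}{nm} + f_1 f_2(1-f_3)\frac{s_3^2}{nmt}.$$

Again, $s_1^2$ overestimates $\sigma_1^2$ and $s_2^2$ overestimates $\sigma_2^2$, while $s_3^2$ is unbiased for $\sigma_3^2$ (so the third piece, with its $f_1 f_2$ factor, underestimates the corresponding term); nevertheless, $\widehat{\mathrm{Var}}(\hat{\mu})$ is unbiased for $\mathrm{Var}(\hat{\mu})$.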
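For completeness, the estimated variance formula can be coded as a small R function. The sketch below (the name `var_hat_3stage` and all inputs are hypothetical) shows how the sampling fractions weight each stage, and how the third-stage term drops out when every tertiary unit is observed ($t = T$).

```r
# Estimated variance of mu-hat under three-stage SRS
# (n1, N1) = (n, N) primary; (n2, N2) = (m, M) secondary; (n3, N3) = (t, T) tertiary
var_hat_3stage <- function(s1sq, s2sq, s3sq, n1, N1, n2, N2, n3, N3) {
  f1 <- n1 / N1; f2 <- n2 / N2; f3 <- n3 / N3   # sampling fractions at each stage
  (1 - f1) * s1sq / n1 +
    f1 * (1 - f2) * s2sq / (n1 * n2) +
    f1 * f2 * (1 - f3) * s3sq / (n1 * n2 * n3)
}

# Hypothetical sample variances and sizes
v_partial <- var_hat_3stage(s1sq = 4, s2sq = 2, s3sq = 1,
                            n1 = 5, N1 = 50, n2 = 4, N2 = 20, n3 = 3, N3 = 10)
v_census3 <- var_hat_3stage(s1sq = 4, s2sq = 2, s3sq = 1,
                            n1 = 5, N1 = 50, n2 = 4, N2 = 20, n3 = 10, N3 = 10)
c(partial_third_stage = v_partial, census_third_stage = v_census3)
```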
