Lec. Note E5
Sometimes it is impossible to develop a frame for the elements that we would like to sample. We might
be able to develop a frame for clusters of elements, though, such as city blocks rather than households
or clinics rather than patients. In such situations, a random sample of clusters is taken from the total
number of clusters and all the units in each cluster sampled are surveyed. This sampling method is called
“cluster sampling”.
A cluster sample is a probability sample in which each sampling unit is a collection, or cluster, of
elements. Cluster sampling is generally employed because of cost-effectiveness or because no adequate
frame for elements is available. However, cluster sampling may be better than either simple or stratified
random sampling if the measurements within clusters are heterogeneous and the cluster means are nearly
equal. The ideal situation for cluster sampling is, then, to have each cluster contain measurements as
different as possible but to have the cluster means equal. This condition is in contrast to that for stratified
random sampling in which strata are to be homogeneous but stratum means are to differ. For example,
suppose we wish to estimate the average income per household in a large city. If we use simple random
sampling, we will need a frame listing all households (elements) in the city, which would be difficult
and costly to obtain. We cannot avoid this problem by using stratified random sampling because a frame
is still required for each stratum in the population. Rather than draw a simple random sample of elements,
we could divide the city into regions such as blocks (or clusters of elements) and select a simple random
sample of blocks from the population. This task is easily accomplished by using a frame that lists all city
blocks. Then the income of every household within each sampled block could be measured. Cluster
sampling is an effective design for obtaining a specified amount of information at a minimum cost under
the following conditions:
1. A good frame listing population elements either is not available or is very costly to obtain, while
a frame listing clusters is easily obtained.
2. The cost of obtaining observations increases as the distance separating the elements increases.
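The design described above (draw a simple random sample of blocks from a frame of blocks, then measure every household in each sampled block) can be sketched in Python as follows. This is only an illustration: the function name, the block labels, and the income figures are made up and do not come from the notes.

```python
import random

def one_stage_cluster_sample(frame, m, seed=None):
    """Draw a simple random sample of m clusters (without replacement)
    and return every element of each sampled cluster."""
    rng = random.Random(seed)
    sampled_ids = rng.sample(sorted(frame), m)
    # Single-stage: all elements of a sampled cluster are observed.
    return {cid: list(frame[cid]) for cid in sampled_ids}

# Hypothetical frame of clusters: block id -> household incomes (made-up values)
frame = {
    "block-1": [32, 41, 28],
    "block-2": [55, 47],
    "block-3": [39, 36, 44, 30],
    "block-4": [61, 58, 49],
}

sample = one_stage_cluster_sample(frame, m=2, seed=1)
```

Only the list of blocks is needed as a frame; no list of individual households is required before sampling.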
Elements other than people are often sampled in clusters. An automobile forms a nice cluster of four
tires for studies of tire wear and safety. A circuit board manufactured for a computer forms a cluster of
semiconductors for testing. An orange tree forms a cluster of oranges for investigating an insect
infestation. A plot in a forest contains a cluster of trees for estimating timber volume or proportions of
diseased trees.
Notice the main difference between the optimal construction of strata and the construction of clusters.
Strata are to be as homogeneous (alike) as possible within, but one stratum should differ as much as
possible from another with respect to the characteristic being measured. Clusters, on the other hand,
should be as heterogeneous (different) as possible within, and one cluster should look very much like
another in order for the economic advantages of cluster sampling to pay off.
One Stage (Single Stage) Cluster Sampling
Cluster sampling is simple random sampling with each sampling unit containing a collection or cluster
of elements. If each element within a sampled cluster is measured, the result is a single-stage cluster
sample. Hence, the estimators of the population mean, total, and proportion are similar to those for
simple random sampling. In particular, the sample mean is a good estimator of the population mean.
Estimation Procedure for Cluster Sampling
Let M = number of clusters in the population,
m = number of clusters in the sample,
L = number of elements in each cluster (equal-sized clusters), so that N = ML and n = mL.

Sampling fraction:
$$f = \frac{m}{M} = \frac{n}{N}$$
Estimator for the Population Mean
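$$\bar{y} = \frac{1}{n}\sum_{i=1}^{m}\sum_{j=1}^{N_i} y_{ij} = \frac{1}{n}\sum_{i=1}^{m}\sum_{j=1}^{L} y_{ij}, \qquad N = ML, \quad n = mL$$

$$\bar{y} = \frac{1}{m}\sum_{i=1}^{m}\left(\frac{1}{L}\sum_{j=1}^{L} y_{ij}\right) = \frac{1}{m}\sum_{i=1}^{m}\bar{y}_i, \quad \text{where } \bar{y}_i \text{ is the cluster mean}$$

$$S.E.(\bar{y}) = \sqrt{\frac{1-f}{m}\sum_{i=1}^{m}\frac{(\bar{y}_i - \bar{y})^2}{m-1}} = \sqrt{\frac{1-f}{m(m-1)}\left[\sum_{i=1}^{m}\bar{y}_i^2 - m\bar{y}^2\right]}$$

Here $N_i = \bar{N} = L$, i.e. $N_1 = N_2 = N_3 = \dots = N_m = L$, and
$$f = \frac{n}{N} = \frac{mL}{ML} = \frac{m}{M}$$

$$S.E.(\hat{Y}) = S.E.(y_T) = \sqrt{\frac{N^2(1-f)}{m}\sum_{i=1}^{m}\frac{(\bar{y}_i - \bar{y})^2}{m-1}} = \sqrt{\frac{N^2(1-f)}{m(m-1)}\left[\sum_{i=1}^{m}\bar{y}_i^2 - m\bar{y}^2\right]}$$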
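As a rough check on the formulas for equal-sized clusters, the following Python sketch computes the sample mean, its standard error, and the standard error of the total. The function name and the data are illustrative only. The point estimate of the total is taken as N times the sample mean, which is an assumption here because the notes give only its standard error.

```python
import math

def equal_cluster_estimates(clusters, M):
    """clusters: list of m lists, each holding the L element values of one
    sampled cluster; M: number of clusters in the population."""
    m = len(clusters)
    L = len(clusters[0])
    if any(len(c) != L for c in clusters):
        raise ValueError("all clusters must have the same size L")
    f = m / M                                   # sampling fraction m/M = n/N
    cluster_means = [sum(c) / L for c in clusters]
    y_bar = sum(cluster_means) / m              # mean of the cluster means
    ss = sum((yi - y_bar) ** 2 for yi in cluster_means)
    se_mean = math.sqrt((1 - f) / m * ss / (m - 1))
    N = M * L
    total = N * y_bar                           # assumed estimator of the total
    se_total = N * se_mean                      # sqrt(N^2 (1-f)/m * ss/(m-1))
    return y_bar, se_mean, total, se_total

# Made-up data: m = 3 sampled clusters of size L = 3 out of M = 10 clusters
y_bar, se_mean, total, se_total = equal_cluster_estimates(
    [[1, 2, 3], [2, 3, 4], [3, 4, 5]], M=10
)
```

The deviation form of the sum of squares is used instead of the shortcut form because it is less sensitive to rounding when the cluster means are nearly equal.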
Estimator for the Population Proportion
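$$\hat{P} = \frac{1}{m}\sum_{i=1}^{m} p_i, \quad \text{where } p_i \text{ is the proportion for the } i\text{th cluster}$$

$$S.E.(\hat{P}) = \sqrt{\frac{1-f}{m}\sum_{i=1}^{m}\frac{(p_i - \hat{P})^2}{m-1}} = \sqrt{\frac{1-f}{m(m-1)}\left[\sum_{i=1}^{m} p_i^2 - m\hat{P}^2\right]}$$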
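The proportion estimator has the same structure as the mean, with cluster proportions in place of cluster means. A minimal Python sketch, with a hypothetical function name and made-up proportions:

```python
import math

def cluster_proportion(p, M):
    """p: list of the m sampled cluster proportions p_i;
    M: number of clusters in the population."""
    m = len(p)
    f = m / M
    P_hat = sum(p) / m                          # unweighted mean of the p_i
    ss = sum((pi - P_hat) ** 2 for pi in p)
    se = math.sqrt((1 - f) / m * ss / (m - 1))
    return P_hat, se

# Made-up data: 3 sampled clusters out of M = 30
P_hat, se_P = cluster_proportion([0.2, 0.4, 0.6], M=30)
```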
Example: A city is divided into 415 clusters (blocks). Twenty-five of the clusters are sampled, and
interviews are conducted at every household in each of the 25 blocks sampled. The data on incomes are
presented in the table below. Use the data to estimate the per-capita income in the city and place a bound
on the error of estimation.
[Figure: random selection of m sample clusters from the M clusters in the population]
Estimator for the Population Mean
There are two estimators: (a) the group mean $\bar{y}_G$ and (b) the element mean $\bar{y}_E$.
Standard Error of the Estimator for the Population Mean
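$$S.E.(\bar{y}_G) = \sqrt{\frac{1-f}{m(m-1)}\left[\sum_{i=1}^{m}\bar{y}_i^2 - m\bar{y}_G^2\right]}, \qquad f = \frac{n}{N}$$

$$\bar{y}_E = \frac{\sum_{i=1}^{m}\sum_{j=1}^{N_i} y_{ij}}{\sum_{i=1}^{m} N_i} = \frac{1}{n}\sum_{i=1}^{m}\sum_{j=1}^{N_i} y_{ij}, \qquad n = \sum_{i=1}^{m} N_i$$

$$S.E.(\bar{y}_E) = \sqrt{\frac{(1-f)\,m}{n^2(m-1)}\sum_{i=1}^{m}\left(y_i - \bar{y}_E N_i\right)^2}, \quad \text{where } y_i = \sum_{j=1}^{N_i} y_{ij}$$

$$S.E.(\bar{y}_E) = \sqrt{\frac{(1-f)\,m}{n^2(m-1)}\left[\sum_{i=1}^{m} y_i^2 + \bar{y}_E^2\sum_{i=1}^{m} N_i^2 - 2\bar{y}_E\sum_{i=1}^{m} y_i N_i\right]}$$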
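The two estimators for unequal-sized clusters can be compared with a short Python sketch. The function name and data are illustrative. The group mean is taken here as the unweighted mean of the cluster means, which is an assumption, since the notes name the estimator but its defining formula does not appear in this section.

```python
import math

def group_and_element_means(clusters, N):
    """clusters: list of m lists of element values (sizes may differ);
    N: number of elements in the population, so that f = n/N."""
    m = len(clusters)
    sizes = [len(c) for c in clusters]          # N_i
    totals = [sum(c) for c in clusters]         # y_i = sum_j y_ij
    n = sum(sizes)
    f = n / N

    # (a) group mean: assumed to be the unweighted mean of the cluster means
    means = [t / s for t, s in zip(totals, sizes)]
    y_G = sum(means) / m
    se_G = math.sqrt((1 - f) / (m * (m - 1))
                     * sum((yb - y_G) ** 2 for yb in means))

    # (b) element mean: all sampled values pooled and divided by n
    y_E = sum(totals) / n
    ss_E = sum((t - y_E * s) ** 2 for t, s in zip(totals, sizes))
    se_E = math.sqrt((1 - f) * m / (n ** 2 * (m - 1)) * ss_E)
    return y_G, se_G, y_E, se_E

# Made-up data: 3 sampled clusters of sizes 2, 3, 1 from a population of N = 60
y_G, se_G, y_E, se_E = group_and_element_means([[1, 3], [2, 4, 6], [5]], N=60)
```

The two estimates differ whenever cluster size and cluster mean are related, because the element mean weights each cluster by its size.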
(a) Method - A
$$S.E.(\hat{P}) = \sqrt{\frac{1-f}{m(m-1)}\left[\sum_{i=1}^{m} p_i^2 - m\hat{P}^2\right]}$$
(b) Method - B
$$\hat{P} = \sum_{i=1}^{m} w_i p_i, \qquad w_i = \frac{N_i}{\sum_{i=1}^{m} N_i} = \frac{N_i}{n}$$

$$S.E.(\hat{P}) = \sqrt{\frac{(1-f)\,m}{n^2(m-1)}\sum_{i=1}^{m} N_i^2\left(p_i - \hat{P}\right)^2}$$
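A small Python sketch of the weighted (Method B) proportion estimator and its standard error follows. The function name, argument names, and data are hypothetical, and the sampling fraction is taken as f = n/N.

```python
import math

def weighted_cluster_proportion(counts, sizes, N):
    """counts: number of elements with the attribute in each sampled cluster;
    sizes: cluster sizes N_i; N: number of elements in the population."""
    m = len(sizes)
    n = sum(sizes)
    f = n / N
    p = [a / s for a, s in zip(counts, sizes)]              # p_i
    P_hat = sum((s / n) * pi for s, pi in zip(sizes, p))    # sum of w_i * p_i
    ss = sum(s ** 2 * (pi - P_hat) ** 2 for s, pi in zip(sizes, p))
    se = math.sqrt((1 - f) * m / (n ** 2 * (m - 1)) * ss)
    return P_hat, se

# Made-up data: 3 sampled clusters of sizes 2, 4, 2 with 1 "success" each
P_B, se_B = weighted_cluster_proportion([1, 1, 1], [2, 4, 2], N=80)
```

With weights proportional to cluster size, the estimate equals the pooled proportion (total count divided by n).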
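Estimator for the Population Ratio
$$\hat{R} = \frac{\bar{y}}{\bar{x}} = \frac{\sum_{i=1}^{m}\sum_{j=1}^{N_i} y_{ij}}{\sum_{i=1}^{m}\sum_{j=1}^{N_i} x_{ij}}, \qquad y_i = \sum_{j=1}^{N_i} y_{ij}, \quad x_i = \sum_{j=1}^{N_i} x_{ij}$$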
Standard Error of the Estimator for the Population Ratio
$$S.E.(\hat{R}) = \sqrt{\frac{(1-f)\,m}{n^2\bar{x}^2(m-1)}\left[\sum_{i=1}^{m} y_i^2 + \hat{R}^2\sum_{i=1}^{m} x_i^2 - 2\hat{R}\sum_{i=1}^{m} y_i x_i\right]}$$
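The ratio estimator and its standard error can be sketched in Python from the cluster totals alone. The function name and data are illustrative. The sketch assumes that the mean of x is taken per sampled element, so that n times that mean equals the sum of the cluster totals x_i. The bracketed sum of squares is computed in its equivalent residual form, the sum of (y_i - R x_i)^2.

```python
import math

def cluster_ratio(y_totals, x_totals, f):
    """y_totals, x_totals: cluster totals y_i and x_i for the m sampled
    clusters; f: sampling fraction."""
    m = len(y_totals)
    sum_x = sum(x_totals)                        # equals n * x_bar (assumed)
    R_hat = sum(y_totals) / sum_x
    # Residual form of sum y_i^2 + R^2 sum x_i^2 - 2 R sum y_i x_i
    ss = sum((y - R_hat * x) ** 2 for y, x in zip(y_totals, x_totals))
    se = math.sqrt((1 - f) * m / (sum_x ** 2 * (m - 1)) * ss)
    return R_hat, se

# Made-up cluster totals for m = 3 sampled clusters, with f = 0.1
R_hat, se_R = cluster_ratio([4, 12, 5], [2, 6, 2], f=0.1)
```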
Two-Stage Cluster Sampling – Unequal-sized clusters in both the first stage and second stage
First Stage
Select m clusters out of a total of M clusters (e.g., schools) using a suitable sampling
procedure.
Second Stage
If the cluster is a school, select one or more classes from a school as second-stage clusters using
a suitable sampling procedure.
Then, all units within the selected second-stage clusters are surveyed.
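[Figure: random selection of m first-stage clusters from the M clusters in the population; the selected clusters have sizes $N_1, N_2, N_3, \dots, N_i, \dots, N_m$, from which $n_1, n_2, n_3, \dots, n_i, \dots, n_m$ second-stage units are sampled]

$$\bar{y} = \frac{1}{m}\sum_{i=1}^{m} N_i \bar{y}_i$$
where $\bar{y}$ is the estimator of the average number of students per cluster (school).

$$S.E.(\bar{y}) = \sqrt{\frac{1}{mM}\sum_{i=1}^{m} N_i^2\left(\frac{1}{n_i} - \frac{1}{N_i}\right)\sum_{j=1}^{n_i}\frac{(y_{ij} - \bar{y}_i)^2}{n_i - 1} + \left(\frac{1}{m} - \frac{1}{M}\right)\sum_{i=1}^{m}\frac{(N_i\bar{y}_i - \bar{y})^2}{m-1}}$$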
To estimate the mean by ratio method, we estimate the total number of students Yˆ in the whole
population and the total number of classes X̂ in the whole population.
$$\hat{X} = x_T = \frac{M}{m}\sum_{i=1}^{m} N_i$$
Then, the estimated average number of students per class in the educational division is
$$\hat{R} = \frac{\hat{Y}}{\hat{X}} = \frac{\dfrac{M}{m}\sum_{i=1}^{m} N_i\bar{y}_i}{\dfrac{M}{m}\sum_{i=1}^{m} N_i} = \text{average number of students per class}$$
Standard Error of the Estimator
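$$S.E.(\hat{R}) = \sqrt{\frac{M}{m\hat{X}^2}\sum_{i=1}^{m} N_i^2\left(\frac{1}{n_i} - \frac{1}{N_i}\right)\sum_{j=1}^{n_i}\frac{(y_{ij} - \bar{y}_i)^2}{n_i - 1} + \frac{M^2}{\hat{X}^2}\left(\frac{1}{m} - \frac{1}{M}\right)\sum_{i=1}^{m}\frac{\left(N_i\bar{y}_i - N_i\hat{R}\right)^2}{m-1}}$$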
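The two-stage ratio estimate and its standard error involve a within-cluster and a between-cluster component, which the following Python sketch computes separately. The function and argument names and the data are hypothetical. The sketch assumes simple random sampling at both stages and at least two sampled second-stage units per cluster, so that each within-cluster variance is defined.

```python
import math

def two_stage_ratio(samples, sizes, M):
    """samples[i]: observed values y_ij for the n_i sampled second-stage units
    of first-stage cluster i; sizes[i]: N_i; M: first-stage clusters in the
    population. Assumes every n_i >= 2."""
    m = len(samples)
    means = [sum(s) / len(s) for s in samples]                   # y_bar_i
    X_hat = M / m * sum(sizes)
    Y_hat = M / m * sum(Ni * yb for Ni, yb in zip(sizes, means))
    R_hat = Y_hat / X_hat

    # Within-cluster (second-stage) component
    within = 0.0
    for s, Ni, yb in zip(samples, sizes, means):
        ni = len(s)
        s2 = sum((y - yb) ** 2 for y in s) / (ni - 1)
        within += Ni ** 2 * (1 / ni - 1 / Ni) * s2

    # Between-cluster (first-stage) component
    between = sum((Ni * yb - Ni * R_hat) ** 2
                  for Ni, yb in zip(sizes, means)) / (m - 1)

    var = (M / (m * X_hat ** 2) * within
           + M ** 2 / X_hat ** 2 * (1 / m - 1 / M) * between)
    return X_hat, Y_hat, R_hat, math.sqrt(var)

# Made-up data: m = 3 of M = 10 first-stage clusters, 2 units sampled in each
X_hat, Y_hat, R2, se_R2 = two_stage_ratio(
    [[2, 4], [1, 3], [3, 5]], sizes=[4, 6, 5], M=10
)
```

Keeping the two components separate makes it easy to see whether most of the sampling error comes from the choice of first-stage clusters or from subsampling within them.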
Example
To estimate the number of preschool children in a town, a sample survey was conducted adopting a two-
stage cluster sample with unequal-sized clusters. A first-stage sample of ten census blocks (clusters) was
selected randomly (using a simple random sampling procedure) from a total of 120 census blocks. Each
of the selected census blocks was divided into small groups or clusters of approximately 10 households.
A secondary sample of one such cluster from each of the selected census blocks was selected again using
a simple random sampling procedure. The following table gives the number of households (Ni) in each
of the selected census blocks, the number of households in each cluster selected (ni), and the number of
preschool children (yij) in each household in the selected clusters of households.
Block   Ni   ni   yij (number of preschool children in each sampled household)
8       63   9    3  1  2  0  1  0  0  0  1  -  -  -
9       70   10   1  1  2  1  0  1  3  4  1  1  -  -
10      67   12   5  2  1  0  1  2  1  2  1  2  3  1