Lec. Note E5

The document discusses cluster sampling, a method used when a complete frame of elements is unavailable, allowing researchers to sample clusters instead. It outlines the advantages of cluster sampling, including cost-effectiveness and situations where elements are heterogeneous within clusters. The document also details estimation procedures, standard errors, and examples of one-stage and two-stage cluster sampling.

Uploaded by

Hansi Anjula
Copyright
© All Rights Reserved
Lecture Notes - 05 Cluster Sampling

Sometimes it is impossible to develop a frame for the elements that we would like to sample. We might
be able to develop a frame for clusters of elements, though, such as city blocks rather than households
or clinics rather than patients. In such situations, a random sample of clusters is taken from the total
number of clusters, and all the units in each sampled cluster are surveyed. This sampling method is called
“cluster sampling”.
A cluster sample is a probability sample in which each sampling unit is a collection, or cluster, of
elements. Cluster sampling is generally employed because of cost-effectiveness or because no adequate
frame for elements is available. However, cluster sampling may be better than either simple or stratified
random sampling if the measurements within clusters are heterogeneous and the cluster means are nearly
equal. The ideal situation for cluster sampling is, then, to have each cluster contain measurements as
different as possible but to have the cluster means equal. This condition is in contrast to that for stratified
random sampling in which strata are to be homogeneous but stratum means are to differ. For example,
suppose we wish to estimate the average income per household in a large city. If we use simple random
sampling, we will need a frame listing all households (elements) in the city, which would be difficult
and costly to obtain. We cannot avoid this problem by using stratified random sampling because a frame
is still required for each stratum in the population. Rather than draw a simple random sample of elements,
we could divide the city into regions such as blocks (or clusters of elements) and select a simple random
sample of blocks from the population. This task is easily accomplished by using a frame that lists all city
blocks. Then the income of every household within each sampled block could be measured. Cluster
sampling is an effective design for obtaining a specified amount of information at a minimum cost under
the following conditions:
1. A good frame listing population elements either is not available or is very costly to obtain, while
a frame listing clusters is easily obtained.
2. The cost of obtaining observations increases as the distance separating the elements increases.
Elements other than people are often sampled in clusters. An automobile forms a nice cluster of four
tires for studies of tire wear and safety. A circuit board manufactured for a computer forms a cluster of
semiconductors for testing. An orange tree forms a cluster of oranges for investigating an insect
infestation. A plot in a forest contains a cluster of trees for estimating timber volume or proportions of
diseased trees.
Notice the main difference between the optimal construction of strata and the construction of clusters.
Strata are to be as homogeneous (alike) as possible within, but one stratum should differ as much as
possible from another with respect to the characteristic being measured. Clusters, on the other hand,
should be as heterogeneous (different) as possible within, and one cluster should look very much like
another in order for the economic advantages of cluster sampling to pay off.

One Stage (Single Stage) Cluster Sampling
Cluster sampling is simple random sampling with each sampling unit containing a collection or cluster
of elements. If each element within a sampled cluster is measured, the result is a single-stage cluster
sample. Hence, the estimators of the population mean, total, and proportion are similar to those for
simple random sampling. In particular, the sample mean is a good estimator of the population mean.
Estimation Procedure for Cluster Sampling
Let
$M$ = number of clusters in the population
$m$ = number of clusters in the sample
$N_i$ = number of units or elements in the $i$th cluster, $i = 1, 2, \ldots, M$

$$N = \sum_{i=1}^{M} N_i, \qquad n = \sum_{i=1}^{m} N_i$$

(i) If the clusters are of equal size

$$N_i = \bar{N} = L, \quad \text{i.e. } N_1 = N_2 = N_3 = \cdots = N_m = L$$

$$\text{Sampling fraction} = f = \frac{m}{M} = \frac{n}{N}, \qquad N = ML, \quad n = mL$$
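Estimator for the Population Mean

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{m}\sum_{j=1}^{N_i} y_{ij} = \frac{1}{n}\sum_{i=1}^{m}\sum_{j=1}^{L} y_{ij}, \qquad N = ML, \quad n = mL$$

$$\bar{y} = \frac{1}{m}\sum_{i=1}^{m}\left(\frac{1}{L}\sum_{j=1}^{L} y_{ij}\right) = \frac{1}{m}\sum_{i=1}^{m}\bar{y}_i, \quad \text{where } \bar{y}_i \text{ is the cluster mean}$$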
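Standard Error of the Estimator for the Population Mean

$$S.E.(\bar{y}) = \sqrt{\frac{1-f}{m}\sum_{i=1}^{m}\frac{(\bar{y}_i - \bar{y})^2}{m-1}} = \sqrt{\frac{(1-f)}{m(m-1)}\left[\sum_{i=1}^{m}\bar{y}_i^2 - m\bar{y}^2\right]}$$

where $N_i = \bar{N} = L$, i.e. $N_1 = N_2 = N_3 = \cdots = N_m = L$, and

$$f = \frac{n}{N} = \frac{mL}{ML} = \frac{m}{M}$$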
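
The two forms of the sum of squares in the standard error of the mean are equivalent. A short derivation, using only the fact that $\sum_{i=1}^{m}\bar{y}_i = m\bar{y}$ (which follows from the definition of $\bar{y}$ for equal-sized clusters):

```latex
\begin{align*}
\sum_{i=1}^{m}(\bar{y}_i - \bar{y})^2
  &= \sum_{i=1}^{m}\bar{y}_i^2 - 2\bar{y}\sum_{i=1}^{m}\bar{y}_i + m\bar{y}^2 \\
  &= \sum_{i=1}^{m}\bar{y}_i^2 - 2m\bar{y}^2 + m\bar{y}^2 \\
  &= \sum_{i=1}^{m}\bar{y}_i^2 - m\bar{y}^2
\end{align*}
```

The second form is convenient for hand calculation because it needs only the sum of the squared cluster means and the overall mean.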

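Estimator for the Population Total

$$\hat{Y} = y_T = N\,\frac{\sum_{i=1}^{m}\sum_{j=1}^{L} y_{ij}}{n} = M\,\frac{\sum_{i=1}^{m}\sum_{j=1}^{L} y_{ij}}{m} = \frac{N}{n}\sum_{i=1}^{m}\sum_{j=1}^{L} y_{ij} = \frac{M}{m}\sum_{i=1}^{m}\sum_{j=1}^{L} y_{ij} = \frac{1}{f}\sum_{i=1}^{m}\sum_{j=1}^{L} y_{ij}$$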

Standard Error of the Estimator for the Population Total

$$S.E.(\hat{Y}) = S.E.(y_T) = \sqrt{\frac{N^2(1-f)}{m}\sum_{i=1}^{m}\frac{(\bar{y}_i - \bar{y})^2}{m-1}} = \sqrt{\frac{N^2(1-f)}{m(m-1)}\left[\sum_{i=1}^{m}\bar{y}_i^2 - m\bar{y}^2\right]}$$
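
The equal-size formulas above can be sketched in a few lines of Python. This is only an illustration: the function name `one_stage_equal` and the small data set are hypothetical and do not come from the notes; the calculation itself follows the estimators for the mean and total and their standard errors as given above, with $f = m/M$.

```python
from math import sqrt

def one_stage_equal(clusters, M):
    """One-stage cluster estimates for equal-sized clusters.

    clusters: list of m lists, each holding the L values measured in one sampled cluster
    M: number of clusters in the population
    """
    m = len(clusters)
    L = len(clusters[0])
    if any(len(c) != L for c in clusters):
        raise ValueError("all clusters must have the same size L")
    f = m / M                      # sampling fraction f = m/M
    N = M * L                      # population size N = ML
    cluster_means = [sum(c) / L for c in clusters]
    y_bar = sum(cluster_means) / m
    # between-cluster variance of the cluster means
    s2_between = sum((yb - y_bar) ** 2 for yb in cluster_means) / (m - 1)
    se_mean = sqrt((1 - f) / m * s2_between)
    return {
        "mean": y_bar,
        "se_mean": se_mean,
        "total": N * y_bar,
        "se_total": N * se_mean,
    }

# Hypothetical data: m = 3 clusters of L = 2 elements, drawn from M = 10 clusters.
result = one_stage_equal([[1, 3], [3, 5], [5, 7]], M=10)
print(result)
```

Because the standard error of the total is just $N$ times the standard error of the mean, the function computes the latter once and scales it.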

Estimator for the Population Proportion

$$\hat{P} = \frac{1}{m}\sum_{i=1}^{m} p_i, \quad \text{where } p_i \text{ is the proportion for the } i\text{th cluster}$$

Standard Error of the Estimator for the Population Proportion

$$S.E.(\hat{P}) = \sqrt{\frac{(1-f)}{m}\sum_{i=1}^{m}\frac{(p_i - \hat{P})^2}{m-1}} = \sqrt{\frac{(1-f)}{m(m-1)}\left[\sum_{i=1}^{m} p_i^2 - m\hat{P}^2\right]}$$

Example: A city is to be divided into 415 clusters. Twenty-five of the clusters will be sampled, and
interviews are conducted at every household in each of the 25 blocks sampled. The data on incomes are
presented in the table below. Use the data to estimate the per-capita income in the city and place a bound
on the error of estimation.

(ii) If the clusters are of different sizes

$$N_i \neq \bar{N}; \qquad N_1, N_2, N_3, \ldots, N_M; \qquad \sum_{i=1}^{M} N_i = N$$

M clusters → random selection → m clusters

Estimator for the Population Mean

There are two estimators: (a) the group mean ($\bar{y}_G$) and (b) the element mean ($\bar{y}_E$).

(a) Group Mean ($\bar{y}_G$)

$$\bar{y}_G = \frac{1}{m}\sum_{i=1}^{m}\bar{y}_i, \quad \text{where } \bar{y}_i \text{ is the cluster mean}$$

Standard Error of the Estimator for the Population Mean

$$S.E.(\bar{y}_G) = \sqrt{\frac{(1-f)}{m(m-1)}\left[\sum_{i=1}^{m}\bar{y}_i^2 - m\bar{y}_G^2\right]}, \qquad f = \frac{n}{N}$$

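(b) Element Mean ($\bar{y}_E$)

$$\bar{y}_E = \frac{\sum_{i=1}^{m}\sum_{j=1}^{N_i} y_{ij}}{n} = \frac{\sum_{i=1}^{m}\sum_{j=1}^{N_i} y_{ij}}{\sum_{i=1}^{m} N_i}, \qquad n = \sum_{i=1}^{m} N_i$$

$$S.E.(\bar{y}_E) = \sqrt{\frac{(1-f)\,m}{n^2(m-1)}\sum_{i=1}^{m}\left(y_i - \bar{y}_E N_i\right)^2}, \quad \text{where } y_i = \sum_{j=1}^{N_i} y_{ij}$$

$$S.E.(\bar{y}_E) = \sqrt{\frac{(1-f)\,m}{n^2(m-1)}\left[\sum_{i=1}^{m} y_i^2 + \bar{y}_E^2\sum_{i=1}^{m} N_i^2 - 2\bar{y}_E\sum_{i=1}^{m} y_i N_i\right]}$$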
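
The group mean and the element mean generally differ when cluster sizes are unequal. The sketch below computes both, with their standard errors, from the formulas above. The function name `unequal_cluster_means`, the data, and the value of `f` are hypothetical choices for illustration; `f` is passed in by the caller because the population size $N$ may not be known.

```python
from math import sqrt

def unequal_cluster_means(clusters, f):
    """Group mean and element mean for one-stage clusters of unequal size.

    clusters: list of m lists of element values (sizes may differ)
    f: sampling fraction, supplied by the caller
    """
    m = len(clusters)
    sizes = [len(c) for c in clusters]       # N_i for the sampled clusters
    totals = [sum(c) for c in clusters]      # y_i = cluster totals
    n = sum(sizes)

    # (a) group mean: unweighted average of the cluster means
    means = [t / s for t, s in zip(totals, sizes)]
    y_G = sum(means) / m
    # sum of squared deviations, algebraically equal to sum(ybar_i^2) - m*y_G^2
    se_G = sqrt((1 - f) / (m * (m - 1)) * sum((v - y_G) ** 2 for v in means))

    # (b) element mean: total of all values divided by the number of elements
    y_E = sum(totals) / n
    ss = sum((t - y_E * s) ** 2 for t, s in zip(totals, sizes))
    se_E = sqrt((1 - f) * m / (n ** 2 * (m - 1)) * ss)
    return y_G, se_G, y_E, se_E

# Hypothetical sample of m = 3 clusters with sizes 2, 3 and 1; f = 0.1 is assumed.
y_G, se_G, y_E, se_E = unequal_cluster_means([[2, 4], [1, 2, 3], [6]], f=0.1)
print(y_G, se_G, y_E, se_E)
```

In this toy sample the single-element cluster pulls the group mean above the element mean, because the group mean gives every cluster the same weight regardless of its size.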

Estimator for the Population Proportion

(a) Method - A

$$\hat{P} = \frac{1}{m}\sum_{i=1}^{m} p_i, \quad \text{where } p_i \text{ is the proportion for the } i\text{th cluster}$$

Standard Error of the Estimator for the Population Proportion

$$S.E.(\hat{P}) = \sqrt{\frac{(1-f)}{m(m-1)}\left[\sum_{i=1}^{m} p_i^2 - m\hat{P}^2\right]}$$
(b) Method - B

$$\hat{P} = \sum_{i=1}^{m} w_i p_i, \qquad w_i = \frac{N_i}{\sum_{i=1}^{m} N_i} = \frac{N_i}{n}$$

Standard Error of the Estimator for the Population Proportion

$$S.E.(\hat{P}) = \sqrt{\frac{(1-f)\,m}{n^2(m-1)}\sum_{i=1}^{m} N_i^2\left(p_i - \hat{P}\right)^2}$$
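
Methods A and B can give different answers when cluster sizes vary. The following sketch applies both to the same sample; the function name `cluster_proportions`, the cluster sizes, the counts, and `f` are hypothetical values chosen only to make the difference visible.

```python
from math import sqrt

def cluster_proportions(sizes, counts, f):
    """Proportion estimates from unequal-sized clusters (Method A and Method B).

    sizes:  N_i, number of elements in each sampled cluster
    counts: number of elements with the attribute in each sampled cluster
    f:      sampling fraction, supplied by the caller
    """
    m = len(sizes)
    n = sum(sizes)
    p = [a / s for a, s in zip(counts, sizes)]   # cluster proportions p_i

    # Method A: unweighted mean of the cluster proportions
    P_A = sum(p) / m
    se_A = sqrt((1 - f) / (m * (m - 1)) * sum((pi - P_A) ** 2 for pi in p))

    # Method B: proportions weighted by w_i = N_i / n
    P_B = sum(s / n * pi for s, pi in zip(sizes, p))
    ss = sum(s ** 2 * (pi - P_B) ** 2 for s, pi in zip(sizes, p))
    se_B = sqrt((1 - f) * m / (n ** 2 * (m - 1)) * ss)
    return P_A, se_A, P_B, se_B

# Hypothetical sample: three clusters of 10, 20 and 10 elements; f = 0.1 is assumed.
P_A, se_A, P_B, se_B = cluster_proportions([10, 20, 10], [5, 5, 2], f=0.1)
print(P_A, se_A, P_B, se_B)
```

Method B reduces to the overall sample proportion (here 12 of 40 elements), while Method A treats each cluster proportion as a single observation.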

Estimator for the Population Ratio

$$\hat{R} = \frac{\bar{y}}{\bar{x}} = \frac{\sum_{i=1}^{m}\sum_{j=1}^{N_i} y_{ij}}{\sum_{i=1}^{m}\sum_{j=1}^{N_i} x_{ij}}, \qquad y_i = \sum_{j=1}^{N_i} y_{ij}, \quad x_i = \sum_{j=1}^{N_i} x_{ij}$$

Standard Error of the Estimator for the Population Ratio

$$S.E.(\hat{R}) = \sqrt{\frac{(1-f)\,m}{n^2\bar{x}^2(m-1)}\left[\sum_{i=1}^{m} y_i^2 + \hat{R}^2\sum_{i=1}^{m} x_i^2 - 2\hat{R}\sum_{i=1}^{m} y_i x_i\right]}$$
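
A minimal sketch of the ratio estimator and its standard error, working from cluster totals. It assumes that $\bar{x}$ is the per-element sample mean of $x$ (the sum of the $x_i$ divided by $n$), which is consistent with $\hat{R} = \bar{y}/\bar{x}$ above. The function name `cluster_ratio`, the data, and `f` are hypothetical.

```python
from math import sqrt

def cluster_ratio(y_totals, x_totals, sizes, f):
    """Ratio estimate R = sum(y_i) / sum(x_i) from one-stage cluster totals.

    y_totals, x_totals: cluster totals y_i and x_i for the m sampled clusters
    sizes: N_i, number of elements in each sampled cluster
    f: sampling fraction, supplied by the caller
    """
    m = len(sizes)
    n = sum(sizes)
    R = sum(y_totals) / sum(x_totals)
    x_bar = sum(x_totals) / n                 # assumed: per-element mean of x
    # sum of (y_i - R*x_i)^2 equals the expanded bracket in the S.E. formula
    ss = sum((y - R * x) ** 2 for y, x in zip(y_totals, x_totals))
    se = sqrt((1 - f) * m / (n ** 2 * x_bar ** 2 * (m - 1)) * ss)
    return R, se

# Hypothetical cluster totals for m = 3 clusters of sizes 2, 3 and 1; f = 0.1 is assumed.
R_hat, se_R = cluster_ratio([6, 6, 6], [3, 4, 5], [2, 3, 1], f=0.1)
print(R_hat, se_R)
```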

Two-Stage Cluster Sampling – Unequal-sized clusters in both the first stage and second stage
First Stage
Selection of “m” clusters out of a total of “M” clusters (e.g. Schools) using a suitable sampling
procedure.
Second Stage
If the cluster is a school, select one or more classes from a school as second-stage clusters using
a suitable sampling procedure.
Then, all units within the selected second-stage clusters are surveyed.

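M clusters → random selection → m clusters

The $m$ sampled clusters contain $N_1, N_2, N_3, \ldots, N_i, \ldots, N_m$ units, from which $n_1, n_2, n_3, \ldots, n_i, \ldots, n_m$ units are selected at the second stage.

$N_i$ = number of units in the $i$th cluster

$n_i$ = number of units selected from the $i$th sampled cluster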

If we are going to estimate the number of students in an educational division,


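$y_{ij}$ = number of students in the $j$th class in the $i$th sampled cluster

$$\bar{y}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij}, \quad \text{where } \bar{y}_i \text{ is the average class size}$$

$$\bar{y} = \frac{1}{m}\sum_{i=1}^{m} N_i\bar{y}_i, \quad \text{where } \bar{y} \text{ is the estimator of the average number of students per cluster (school)}$$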

Total Number of Students in the Educational Division

$$\hat{Y} = y_T = \frac{M}{m}\sum_{i=1}^{m} N_i\bar{y}_i$$

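Standard Error of the Estimator

$$S.E.(\hat{Y}) = S.E.(y_T) = \sqrt{\frac{M}{m}\sum_{i=1}^{m} N_i^2\left(\frac{1}{n_i} - \frac{1}{N_i}\right)\sum_{j=1}^{n_i}\frac{(y_{ij} - \bar{y}_i)^2}{n_i - 1} + M^2\left(\frac{1}{m} - \frac{1}{M}\right)\sum_{i=1}^{m}\frac{\left(N_i\bar{y}_i - \frac{1}{m}\sum_{k=1}^{m} N_k\bar{y}_k\right)^2}{m-1}}$$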

To estimate the mean by the ratio method, we estimate the total number of students $\hat{Y}$ in the whole
population and the total number of classes $\hat{X}$ in the whole population.

Since we took a sample of $m$ clusters from $M$ clusters,

$$\hat{Y} = y_T = \frac{M}{m}\sum_{i=1}^{m} N_i\bar{y}_i$$

$$\hat{X} = x_T = \frac{M}{m}\sum_{i=1}^{m} N_i$$

Then, the estimated average number of students per class in the educational division is

$$\hat{R} = \frac{\hat{Y}}{\hat{X}} = \frac{\frac{M}{m}\sum_{i=1}^{m} N_i\bar{y}_i}{\frac{M}{m}\sum_{i=1}^{m} N_i} = \text{average number of students per class}$$
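Standard Error of the Estimator

$$S.E.(\hat{R}) = \sqrt{\frac{M}{m\hat{X}^2}\sum_{i=1}^{m} N_i^2\left(\frac{1}{n_i} - \frac{1}{N_i}\right)\sum_{j=1}^{n_i}\frac{(y_{ij} - \bar{y}_i)^2}{n_i - 1} + \frac{M^2}{\hat{X}^2}\left(\frac{1}{m} - \frac{1}{M}\right)\sum_{i=1}^{m}\frac{\left(N_i\bar{y}_i - N_i\hat{R}\right)^2}{m-1}}$$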

Example
To estimate the number of preschool children in a town, a sample survey was conducted adopting a two-
stage cluster sample with unequal-sized clusters. A first-stage sample of ten census blocks (clusters) was
selected randomly (using a simple random sampling procedure) from a total of 120 census blocks. Each
of the selected census blocks was divided into small groups or clusters of approximately 10 households.
A secondary sample of one such cluster from each of the selected census blocks was selected again using
a simple random sampling procedure. The following table gives the number of households (Ni) in each
of the selected census blocks, the number of households in each cluster selected (ni), and the number of
preschool children (yij) in each household in the selected clusters of households.

Number of preschool children in the jth household in the ith census block

                 Household (j)
 i   Ni   ni     1   2   3   4   5   6   7   8   9  10  11  12
 1   82   10     2   1   3   0   3   4   1   2   1   1   -   -
 2   79    9     3   0   1   2   0   1   0   1   2   -   -   -
 3   69   10     5   2   0   3   1   0   2   1   3   2   -   -
 4   70   10     4   2   1   3   4   2   2   1   1   1   -   -
 5   92   11     3   1   2   3   0   1   0   1   0   1   2   -
 6   80   10     0   0   3   2   1   0   2   1   1   0   -   -
 7   91   11     1   0   0   0   1   2   1   0   1   4   1   -
 8   63    9     3   1   2   0   1   0   0   0   1   -   -   -
 9   70   10     1   1   2   1   0   1   3   4   1   1   -   -
10   67   12     5   2   1   0   1   2   1   2   1   2   3   1
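
The two-stage formulas can be applied to the preschool-children data as follows. This is a sketch rather than an official solution: the function name `two_stage` is hypothetical, and it assumes that the households in the selected group are treated as a simple random sample of $n_i$ households out of the $N_i$ in the block, so that the two-stage standard error formulas above apply directly.

```python
from math import sqrt

M = 120                                            # census blocks in the town
N = [82, 79, 69, 70, 92, 80, 91, 63, 70, 67]       # households per sampled block (N_i)
y = [                                              # children per sampled household (y_ij)
    [2, 1, 3, 0, 3, 4, 1, 2, 1, 1],
    [3, 0, 1, 2, 0, 1, 0, 1, 2],
    [5, 2, 0, 3, 1, 0, 2, 1, 3, 2],
    [4, 2, 1, 3, 4, 2, 2, 1, 1, 1],
    [3, 1, 2, 3, 0, 1, 0, 1, 0, 1, 2],
    [0, 0, 3, 2, 1, 0, 2, 1, 1, 0],
    [1, 0, 0, 0, 1, 2, 1, 0, 1, 4, 1],
    [3, 1, 2, 0, 1, 0, 0, 0, 1],
    [1, 1, 2, 1, 0, 1, 3, 4, 1, 1],
    [5, 2, 1, 0, 1, 2, 1, 2, 1, 2, 3, 1],
]

def two_stage(M, N, y):
    """Two-stage estimates of the total and of the per-unit ratio, with standard errors."""
    m = len(y)
    ybar = [sum(row) / len(row) for row in y]      # second-stage means
    t = [Ni * yb for Ni, yb in zip(N, ybar)]       # estimated cluster totals N_i * ybar_i
    # within-cluster (second-stage) component, shared by both standard errors
    within = sum(
        Ni ** 2 * (1 / len(row) - 1 / Ni)
        * sum((v - yb) ** 2 for v in row) / (len(row) - 1)
        for Ni, row, yb in zip(N, y, ybar)
    )
    Y_hat = M / m * sum(t)
    t_mean = sum(t) / m
    between_Y = sum((ti - t_mean) ** 2 for ti in t) / (m - 1)
    se_Y = sqrt(M / m * within + M ** 2 * (1 / m - 1 / M) * between_Y)

    X_hat = M / m * sum(N)
    R_hat = Y_hat / X_hat
    between_R = sum((ti - Ni * R_hat) ** 2 for ti, Ni in zip(t, N)) / (m - 1)
    se_R = sqrt(M / (m * X_hat ** 2) * within
                + M ** 2 / X_hat ** 2 * (1 / m - 1 / M) * between_R)
    return Y_hat, se_Y, R_hat, se_R

Y_hat, se_Y, R_hat, se_R = two_stage(M, N, y)
print(f"total = {Y_hat:.1f} (S.E. {se_Y:.1f}), per household = {R_hat:.3f} (S.E. {se_R:.3f})")
```

With the table's data the point estimates come to about 12,958 preschool children in the town, or roughly 1.42 children per household.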

Advantages of Cluster Sampling


1. It introduces flexibility in the sampling method
2. It is helpful in the large-scale survey where the preparation of a list is difficult, time-consuming,
or expensive
3. It is valuable in underdeveloped countries, where no detailed and accurate sampling frame is
available.
Disadvantages of Cluster Sampling
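1. For the same number of sampled elements, it is usually less precise than simple or stratified random
sampling, because elements within a cluster tend to be alike.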
Exercise
In a survey on unemployment conducted in a particular district, residential houses were considered as
clusters and households as elements. The clusters were selected by simple random sampling out of 684
clusters. The following data show the number of unemployed persons in sampled households. Estimate
the average number of unemployed persons per household and its standard error.
Cluster   Number of unemployed persons
 1        2, 1, 3
 2        5, 3, 1
 3        2, 3
 4        1, 2, 4
 5        2, 4, 0, 0, 1
 6        0, 0, 3, 2, 2
 7        2, 2, 1, 0, 3
 8        1, 1, 2, 4, 5, 0
 9        2, 2, 3
10        2
11        3, 2, 0
12        4, 0
13        2, 2
14        1, 2, 1
15        2
