Generalization of Multistage Cluster Sampling Using Finite Population
April 2013. Vol. 3, No. 1 ISSN 2305-8269
International Journal of Engineering and Applied Sciences
© 2012 EAAS & ARF. All rights reserved
www.eaas-journal.org
ABSTRACT
This paper generalizes the use of multistage cluster sampling design in estimating the population total
where all units within the clusters are considered. The focus is on a special design where a certain number of
visits is considered for estimating the population size and a weighted factor is introduced. The
Keywords: Unequal probability sampling, Two-stage sampling, Hansen-Hurwitz estimator, Horvitz-Thompson estimator
INTRODUCTION
Many estimation procedures have been developed for multistage cluster sampling designs. Some of these procedures are well known; see, for example, Cochran (1977); Kalton (1983); Henry (1990); Thompson (1992); Fink (2002); Okafor (2002); and Tate and Hudgens (2007). More recent are the work of Nafiu (2012) on the comparison of estimates arising from one-, two- and three-stage designs, and that of Nafiu et al. (2012) on an alternative estimation procedure for a three-stage cluster sampling design.
Variability in multistage sampling includes the following:
(i) In one-stage cluster sampling, the estimate varies due to one source: different samples of primary units yield different estimates.
(ii) In two-stage cluster sampling, the estimate varies due to two sources: different samples of primary units and then different samples of secondary units within primary units.
(iii) In three-stage cluster sampling, the estimate varies due to three sources: different samples of primary units, then different samples of secondary units within primary units, and then different samples of tertiary units within secondary units.
(iv) In general, there are as many sources of variability as there are stages of subsampling. Thus, variances and variance estimators for multistage cluster sampling contain the sum of one component of variability per stage.
AIM AND OBJECTIVES OF THIS STUDY
The aim of this research is to generalize the estimation procedure for the multistage sampling scheme. The main objectives are to:
(i) investigate some of the existing estimators used in multistage cluster sampling designs.
(ii) develop a new estimator that is more efficient than the already existing estimators and generalize it.
(iii) apply this newly generalized estimator to a real life situation. That is, the estimation of the population total of diabetic patients in Niger
State for four (4) different years: 2005 – 2008 (four data sets).
MATERIALS AND METHODS
In this section, we derive a generalized form of the multistage cluster sampling design, following the procedure given by Nafiu (2012). The generalization is described as:
1. Select first unit $n$ (i.e. the number of primary units in the sample)
2. Select second unit $m_i$ (i.e. the number of secondary units in the ith primary unit)
3. Select third unit $k_{ij}$ (i.e. the number of tertiary units in the jth secondary unit of the ith primary unit)
Let $y_{iju}$ be the value obtained for the uth third-stage unit in the jth second-stage unit drawn from the ith primary unit. The relevant population total for the overall sample in a three-stage design is given as follows:
$$Y = \sum_{i=1}^{N} \sum_{j=1}^{M} \sum_{u=1}^{K} y_{iju} \qquad (1)$$
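As a minimal sketch of the triple sum in equation (1), the population can be held as nested lists indexed by primary unit i, secondary unit j and tertiary unit u. The function name and the toy values below are hypothetical and only illustrate the indexing; ragged lists are allowed here, whereas equation (1) is written with fixed upper limits.

```python
def population_total(y):
    """Sum y[i][j][u] over primary units i, secondary units j, tertiary units u."""
    return sum(value for primary in y for secondary in primary for value in secondary)


# Hypothetical toy population (made-up values, not data from the paper).
toy_population = [
    [[3, 5], [2]],      # primary unit 1: two secondary units
    [[4], [1, 1, 6]],   # primary unit 2: two secondary units
]
total = population_total(toy_population)
```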
For any estimate $\hat{Y}_h$ in the hth cell based on completely arbitrary probabilities of selection, the total variance is then the sum of the variances for all strata. The symbol $E$ is used for the operator of expectation, $V$ for the variance, and $\hat{V}$ for the unbiased estimate of $V$. We may then write
$$V(\hat{Y}_h) = V_1\big(E_{>1}(\hat{Y}_h)\big) + E_1\big(V_{>1}(\hat{Y}_h)\big) \qquad (2)$$
where ">1" is the symbol to represent all stages of sampling after the first.
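The decomposition in equation (2) can be checked exactly on a small case. The sketch below assumes a hypothetical two-stage design (one cluster drawn with equal probability, then one unit drawn from it) and made-up cluster values; it enumerates every outcome with exact fractions and compares the total variance with the sum of the between-stage and within-stage components.

```python
from fractions import Fraction

# Hypothetical population: 3 clusters with made-up unit values.
clusters = [[1, 3], [2, 2, 5], [4]]
N = len(clusters)


def mean(values):
    values = list(values)
    return sum(values, Fraction(0)) / len(values)


def variance(values):
    """Population variance (equal weights) of a list of values."""
    values = list(values)
    mu = mean(values)
    return mean((v - mu) ** 2 for v in values)


def estimates_in(cluster):
    """Estimate of the total for each possible stage-2 draw: N * M_i * y."""
    return [Fraction(N * len(cluster) * y) for y in cluster]


# Conditional mean and variance over stage 2, given the stage-1 cluster.
cond_means = [mean(estimates_in(c)) for c in clusters]
cond_vars = [variance(estimates_in(c)) for c in clusters]

between = variance(cond_means)  # V_1(E_{>1}(estimate))
within = mean(cond_vars)        # E_1(V_{>1}(estimate))

# Total variance computed directly over all (cluster, unit) outcomes,
# each weighted by its selection probability (1/N) * (1/M_i).
outcomes = [(est, Fraction(1, N * len(c))) for c in clusters for est in estimates_in(c)]
grand_mean = sum(est * p for est, p in outcomes)
total_var = sum(p * (est - grand_mean) ** 2 for est, p in outcomes)
```

Because the arithmetic is exact, `total_var` and `between + within` agree without rounding error, and `grand_mean` equals the true total of the toy population.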
For instance, the state consists of $N$ local government areas, out of which a simple random sample of $n$ local government areas is selected. Each local government area consists of $M$ cities, out of which a simple random sample without replacement of $m_i$ cities is selected. Finally, from each selected city containing $K$ hospitals, $k_{ij}$ hospitals are selected at random without replacement, and the number of diabetic patients in each selected hospital is collected.
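The selection procedure described above can be sketched as three nested simple random draws without replacement. The frame layout, the place names and the counts are hypothetical, and fixed sample sizes per stage are assumed for simplicity.

```python
import random


def three_stage_sample(frame, n, m, k, seed=0):
    """Draw a three-stage sample by simple random sampling without replacement.

    frame: {lga: {city: [patient_count_per_hospital, ...]}} (hypothetical layout)
    n, m, k: numbers of LGAs, cities per LGA and hospitals per city to select,
    each capped at the number available.
    """
    rng = random.Random(seed)
    sample = {}
    for lga in rng.sample(sorted(frame), min(n, len(frame))):
        cities = frame[lga]
        sample[lga] = {}
        for city in rng.sample(sorted(cities), min(m, len(cities))):
            hospitals = cities[city]
            chosen = rng.sample(range(len(hospitals)), min(k, len(hospitals)))
            sample[lga][city] = [hospitals[u] for u in chosen]
    return sample


# Made-up frame used only to exercise the function.
frame = {
    "LGA-1": {"City-a": [12, 7, 9], "City-b": [4, 6]},
    "LGA-2": {"City-c": [10, 3]},
    "LGA-3": {"City-d": [8], "City-e": [5, 5, 2]},
}
sample = three_stage_sample(frame, n=2, m=1, k=1)
```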
Then;
∑ ∑ (4)
An unbiased estimator of the population total at the jth secondary unit in the ith primary unit in the sample is:
$$\hat{Y}_{ij} = \frac{1}{f_{ij}} \sum_{u=1}^{k_{ij}} y_{iju} \qquad (5)$$
where $f_{ij}$ is the known sampling fraction for tertiary units in the jth secondary unit of the ith primary unit.
An unbiased estimator of the population total in the ith primary unit in the sample is:
$$\hat{Y}_{i} = \frac{M}{m_i} \sum_{j=1}^{m_i} \hat{Y}_{ij} \qquad (6)$$
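Finally, an unbiased estimator of the population total as given by Nafiu et al. (2012) for the diabetic patients undergoing treatment in all the hospitals at the jth secondary unit (city) in the ith primary unit (local government area) is:
$$\hat{Y} = \frac{N}{n} \sum_{i=1}^{n} \frac{M}{m_i} \sum_{j=1}^{m_i} \frac{1}{f_{ij}} \sum_{u=1}^{k_{ij}} y_{iju} \qquad (7)$$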
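A minimal sketch of a three-stage expansion estimate of the total, assuming simple random sampling without replacement at every stage, so that each stage is inflated by the inverse of its sampling fraction. The data layout and key names are hypothetical, and cluster sizes are allowed to vary by unit, which may be more general than the notation of equation (7).

```python
def three_stage_total(sample, N):
    """Expansion estimate of the population total under three-stage SRSWOR.

    sample: list of sampled primary units; each is a dict with
        'M': number of secondary units in that primary unit,
        'secondary': list of sampled secondary units, each a dict with
            'K': number of tertiary units in that secondary unit,
            'y': list of observed tertiary values.
    N: number of primary units in the population.
    """
    n = len(sample)
    total = 0.0
    for primary in sample:
        m_i = len(primary["secondary"])
        primary_total = 0.0
        for sec in primary["secondary"]:
            k_ij = len(sec["y"])
            # K / k_ij is the inverse of the third-stage sampling fraction.
            primary_total += sec["K"] / k_ij * sum(sec["y"])
        total += primary["M"] / m_i * primary_total
    return N / n * total


# Made-up sample used only to exercise the function.
example = [
    {"M": 3, "secondary": [{"K": 4, "y": [2, 4]}, {"K": 2, "y": [5]}]},
    {"M": 2, "secondary": [{"K": 3, "y": [1, 2, 3]}]},
]
estimate = three_stage_total(example, N=4)
```

When every unit at every stage is observed, all inflation factors equal one and the estimate reduces to the plain sum of the observed values.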
An unbiased estimator of the variance of $\hat{Y}$ given in equation (7) is obtained by replacing the
population variances with the sample variances as follows:
̂( ̂ ) ∑ ∑ ∑ ( )
(8)
̂
∑
where
∑
∑ ( )
and ̂ represents one-stage, two-stage or three-stage as the case may be.
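For orientation, the sketch below implements the standard textbook variance estimator for a three-stage design with simple random sampling without replacement at each stage, in which population variances are replaced by sample variances stage by stage. It is an assumption that this matches the form of equation (8); the data layout and names are hypothetical, and a stage with a single sampled unit is given a zero variance contribution, which is a simplification.

```python
from statistics import variance


def _s2(values):
    """Sample variance with divisor (len - 1); taken as 0 for a single value."""
    return variance(values) if len(values) > 1 else 0.0


def three_stage_variance(sample, N):
    """Estimated variance of the three-stage expansion estimate of the total.

    sample: list of dicts {'M': int, 'secondary': [{'K': int, 'y': [...]}, ...]}
    N: number of primary units in the population.
    """
    n = len(sample)
    primary_totals, later_stages = [], 0.0
    for primary in sample:
        M_i, secs = primary["M"], primary["secondary"]
        m_i = len(secs)
        sec_totals, third = [], 0.0
        for sec in secs:
            K_ij, y = sec["K"], sec["y"]
            k_ij = len(y)
            sec_totals.append(K_ij / k_ij * sum(y))
            # Third-stage component for this secondary unit.
            third += K_ij**2 * (1 - k_ij / K_ij) * _s2(y) / k_ij
        primary_totals.append(M_i / m_i * sum(sec_totals))
        # Second-stage component for this primary unit.
        second = M_i**2 * (1 - m_i / M_i) * _s2(sec_totals) / m_i
        later_stages += second + M_i / m_i * third
    # First-stage component, based on the estimated primary-unit totals.
    first = N**2 * (1 - n / N) * _s2(primary_totals) / n
    return first + N / n * later_stages
```

The square root of the returned value gives a standard error for the estimated total. With a complete enumeration at every stage, each finite population correction is zero and the estimated variance is zero.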
EMPIRICAL STUDY
In this section, an empirical study is carried out to assess the performance of the generalized multistage cluster selection procedure. To carry out the empirical study, the sampling standard errors and the corresponding coefficients of variation of one-stage, two-stage and three-stage cluster designs were obtained for all the cases and the populations.
There are eight (8) categories of data used in this paper. The first four (4) data sets were obtained and used as illustration, while the second four (4) data sets are of secondary type and were collected from the Niger State Ministry of Health, Minna, Niger State, Nigeria. We constructed a sampling frame from all diabetic patients with chronic eye disease (Glaucoma and Retinopathy) in the twenty-five (25) Local Government Areas of the state between years 2005 and 2008, as contained in Nafiu (2012). The standard errors obtained for the estimated population totals are as shown in Table 2.
ESTIMATED POPULATION TOTALS AND STANDARD ERRORS
Tables 1 and 2 give the estimated population totals and their corresponding standard errors, using equations (7) and (8) respectively.
RANKING OF COEFFICIENTS OF VARIATION FOR THE ESTIMATED POPULATION TOTALS
This is given by the ratio of the standard error of the estimated population total to the estimated population total itself, expressed as a percentage. That is;
$$CV(\hat{Y}) = \frac{SE(\hat{Y})}{\hat{Y}} \times 100\%$$
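The ratio just described can be computed directly. This is a minimal sketch; the function name is hypothetical and the inputs are whatever standard error and estimated total a given design produces.

```python
def coefficient_of_variation(standard_error, estimated_total):
    """Return the standard error as a percentage of the estimated total."""
    if estimated_total == 0:
        raise ValueError("coefficient of variation is undefined for a zero total")
    return 100.0 * standard_error / estimated_total
```

For example, a standard error of 5 on an estimated total of 200 corresponds to a coefficient of variation of 2.5 percent.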
The rankings of the coefficients of variation, in ascending order, for the one-stage, two-stage and three-stage
sampling schemes are given in Table 3 below.
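Ranking in ascending order amounts to sorting the designs by their coefficient of variation and assigning rank 1 to the smallest. The sketch below uses neutral placeholder labels and made-up values, not results from the paper; ties, if any, keep their input order and still receive distinct ranks.

```python
def rank_ascending(cv_by_design):
    """Map each design to its rank, with 1 for the smallest coefficient of variation."""
    ordered = sorted(cv_by_design.items(), key=lambda item: item[1])
    return {design: rank for rank, (design, _) in enumerate(ordered, start=1)}


# Placeholder labels and made-up percentages for illustration only.
ranks = rank_ascending({"A": 3.0, "B": 1.5, "C": 2.0})
```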
DISCUSSION OF RESULTS
Table 3 gives the ranking of the coefficients of variation, in ascending order, for the estimated population totals, and it shows that the higher-stage estimator has the lowest ranking of coefficient of variation. That is, in general, the coefficients of variation show that a higher-stage sampling scheme performs better than a lower-stage sampling scheme.
CONCLUSION AND RECOMMENDATIONS
When an unbiased estimator of high precision and an unbiased sample estimate of its variance are required, the multistage sampling system employing a cluster scheme at each stage is particularly appropriate. Higher-order multistage cluster sampling design gives the best results. Therefore, it is recommended that higher-stage cluster sampling designs be employed when considering multistage sampling.
REFERENCES
1. Cochran, W. G. 1977. Sampling Techniques. Third Edition. John Wiley and Sons: New York.
2. Fink, A. 2002. How to Sample in Surveys. Sage Publications: Thousand Oaks, CA.
3. Henry, G. T. 1990. Practical Sampling. Sage Publications: Thousand Oaks, CA.
4. Kalton, G. 1983. Introduction to Survey Sampling. Sage Publications: Thousand Oaks, CA.
5. Nafiu, L. A. 2012. An Alternate Estimation Method for Multistage Cluster Sampling in Finite Population. Unpublished Ph.D Thesis, University of Ilorin, Ilorin, Nigeria.
6. Nafiu, L. A., Oshungade, I. O. and Adewara, A. A. 2012. "Alternative Estimation Method for a Three-Stage Cluster Sampling in Finite Population". American Journal of Mathematics and Statistics (AJMS). 2(6): 12-17.
7. Okafor, F. 2002. Sample Survey Theory with Applications. Afro-Orbis Publications: Nigeria.
8. Tate, J. E. and Hudgens, M. G. 2007. "Estimating Population Size with Two-stage and Three-stage Sampling Designs". American Journal of Epidemiology. 165(11): 1314-1320.
9. Thompson, S. K. 2002. Sampling. John Wiley and Sons: New York.