Survey Sampling


Notes on Sample Survey
Chapter 1
Introduction

Statistics is the science of data.

Data are numerical values that carry information.

Statistical tools can be used on a data set to draw statistical inferences, which in turn serve various purposes. For example, governments use such data to formulate policies for the welfare of the people, and marketing companies use data from consumer surveys to improve their products and provide better services to customers. Such data are obtained through sample surveys. Sample surveys are conducted throughout the world by governmental as well as non-governmental agencies. For example, the "National Sample Survey Organization (NSSO)" conducts surveys in India, "Statistics Canada" conducts surveys in Canada, and agencies of the United Nations like the "World Health Organization (WHO)" and the "Food and Agricultural Organization (FAO)" conduct surveys in different countries.

Sampling theory provides the tools and techniques for data collection, keeping in mind the objectives to be fulfilled and the nature of the population.

There are two ways of obtaining the information:


1. Sample surveys
2. Complete enumeration or census

Sample surveys collect information on a fraction of the total population, whereas a census collects information on the whole population. Some surveys, e.g., economic surveys, agricultural surveys, etc., are conducted regularly. Others are need-based and are conducted when some need arises, e.g., consumer satisfaction surveys at a newly opened shopping mall to assess the satisfaction level with the amenities provided in the mall.

Sampling unit:
An element or a group of elements on which the observations can be taken is called a sampling unit.
The objective of the survey helps in determining the definition of sampling unit.

For example, if the objective is to determine the total income of all the persons in the household, then
the sampling unit is household. If the objective is to determine the income of any particular person in
the household, then the sampling unit is the income of the particular person in the household. So the
definition of sampling unit depends and varies as per the objective of the survey. Similarly, in another
example, if the objective is to study the blood sugar level, then the sampling unit is the value of blood
sugar level of a person. On the other hand, if the objective is to study the health conditions, then the
sampling unit is the person on whom the readings on the blood sugar level, blood pressure and other
factors will be obtained. These values will together classify the person as healthy or unhealthy.

Population:
Collection of all the sampling units in a given region at a particular point of time or over a particular period is called the population. For example, if the medical facilities in a hospital are to be surveyed through the patients, then all the patients registered in the hospital during the time period of the survey will be the population. Similarly, if the production of wheat in a district is to be studied, then all the fields cultivating wheat in that district will constitute the population. The total number of sampling units in the population is the population size, generally denoted by N. The population size can be finite or infinite (N is large).

Census:
The complete count of the population is called a census. Observations on all the sampling units in the population are collected in a census. For example, in India the census is conducted every tenth year, in which observations on all the persons staying in India are collected.

Sample:
One or more sampling units selected from the population according to some specified procedure constitute a sample. A sample consists of only a portion of the population units.

In the context of sample surveys, a collection of units like households, people, cities, countries etc. is
called a finite population.
A census is a 100% sample and it is a complete count of the population.

Representative sample:
When all the salient features of the population are present in the sample, it is called a representative sample. Ideally, every sample should be a representative sample.

For example, if a population has 30% males and 70% females, then we also expect the sample to have
nearly 30% males and 70% females.

In another example, if we take out a handful of wheat from a 100 kg bag of wheat, we expect the same quality of wheat in the hand as inside the bag. Similarly, it is expected that a drop of blood will give the same information as all the blood in the body.

Sampling frame:
The list of all the units of the population to be surveyed constitutes the sampling frame. All the
sampling units in the sampling frame have identification particulars. For example, all the students in a
particular university listed along with their roll numbers constitute the sampling frame. Similarly, the
list of households with the name of head of family or house address constitutes the sampling frame. In
another example, the residents of a city area may be listed in more than one frame - as per automobile
registration as well as the listing in the telephone directory.

Ways to ensure representativeness:


There are two possible ways to ensure that the selected sample is representative.

1. Random sample or probability sample:


The selection of units in the sample from a population is governed by the laws of chance or probability.
The probability of selection of a unit can be equal as well as unequal.

2. Non-random sample or purposive sample:
The selection of units in the sample from population is not governed by the probability laws.

For example, the units may be selected on the basis of the personal judgment of the surveyor. Persons volunteering to take some medical test or to drink a new type of coffee also constitute a sample selected on non-random grounds.

Another type of sampling is Quota Sampling. The survey in this case is continued until a
predetermined number of units with the characteristic under study are picked up.

For example, in order to conduct an experiment for rare type of disease, the survey is continued till
the required number of patients with the disease are collected.

Advantages of sampling over complete enumeration:


1. Reduced cost and enlarged scope.
Sampling involves the collection of data on a smaller number of units in comparison to complete enumeration, so the cost involved in the collection of information is reduced. Further,
additional information can be obtained at little cost in comparison to conducting another
separate survey. For example, when an interviewer is collecting information on health
conditions, then he/she can also ask some questions on health practices. This will provide
additional information on health practices and the cost involved will be much less than
conducting an entirely new survey on health practices.

2. Organization of work:
It is easier to manage the organization of collection of smaller number of units than all the units
in a census. For example, in order to draw a representative sample from a state, it is easier to
manage to draw small samples from every city than drawing the sample from the whole state at
a time. This ultimately results in more accuracy in the statistical inferences because better
organization provides better data and in turn, improved statistical inferences are obtained.

3. Greater accuracy:
The persons involved in the collection of data are trained personnel. They can collect the data more accurately when they have to cover a smaller number of units rather than a large number.

4. Urgent information required:


The data from a sample can be quickly summarized. For example, the forecasting of crop production can be done more quickly on the basis of a sample of data than by first collecting all the observations.

5. Feasibility:
Conducting the experiment on a smaller number of units, particularly when the units are destroyed, is more feasible. For example, in determining the life of bulbs, it is more feasible to fuse a minimum number of bulbs. Similarly, in any medical experiment, it is more feasible to use fewer animals.

Type of surveys:
There are various types of surveys which are conducted on the basis of the objectives to be fulfilled.

1. Demographic surveys:
These surveys are conducted to collect the demographic data, e.g., household surveys, family size,
number of males in families, etc. Such surveys are useful in the policy formulation for any city, state or
country for the welfare of the people.

2. Educational surveys:
These surveys are conducted to collect the educational data, e.g., how many children go to school, how
many persons are graduate, etc. Such surveys are conducted to examine the educational programs in
schools and colleges. Generally, schools are selected first and then the students from each school constitute the sample.

3. Economic surveys:
These surveys are conducted to collect the economic data, e.g., data related to export and import of
goods, industrial production, consumer expenditure etc. Such data is helpful in constructing the indices
indicating the growth in a particular sector of economy or even the overall economic growth of the
country.

4. Employment surveys:
These surveys are conducted to collect the employment related data, e.g., employment rate, labour
conditions, wages, etc. in a city, state or country. Such data helps in constructing various indices to
know the employment conditions among the people.

5. Health and nutrition surveys:


These surveys are conducted to collect the data related to health and nutrition issues, e.g., number of
visits to doctors, food given to children, nutritional value etc. Such surveys are conducted in cities,
states as well as countries by the national and international organizations like UNICEF, WHO etc.

6. Agricultural surveys:
These surveys are conducted to collect agriculture related data to estimate, e.g., the acreage and production of crops, livestock numbers, use of fertilizers, use of pesticides and other related topics. The government bases its planning related to food issues for the people on such surveys.

7. Marketing surveys:
These surveys are conducted to collect the data related to marketing. They are conducted by major
companies, manufacturers or those who provide services to consumers. Such data is used for
knowing the satisfaction and opinion of consumers as well as in developing the sales, purchase and
promotional activities etc.

8. Election surveys:
These surveys are conducted to study the outcome of an election or a poll. For example, such polls are
conducted in democratic countries to have the opinions of people about any candidate who is contesting
the election.

9. Public polls and surveys:
These surveys are conducted to collect the public opinion on any particular issue. Such surveys are
generally conducted by the news media and the agencies which conduct polls and surveys on the
current topics of interest to public.

10. Campus surveys:


These surveys are conducted among the students of an educational institution to study the educational programs, living facilities, dining facilities, sports activities, etc.

Principal steps in a sample survey:


The broad steps to conduct any sample surveys are as follows:

1. Objective of the survey:


The objective of the survey has to be clearly defined and well understood by the person planning to
conduct it. It is expected from the statistician to be well versed with the issues to be addressed in
consultation with the person who wants to get the survey conducted. In complex surveys, sometimes the
objective is forgotten and data is collected on those issues which are far away from the objectives.

2. Population to be sampled:
Based on the objectives of the survey, decide the population from which the information can be
obtained. For example, population of farmers is to be sampled for an agricultural survey whereas the
population of patients has to be sampled for determining the medical facilities in a hospital.

3. Data to be collected:
It is important to decide which data are relevant for fulfilling the objectives of the survey and to ensure that no essential data are omitted. Sometimes too many questions are asked and some of the outcomes are never utilized. This lowers the quality of the responses and in turn results in lower efficiency of the statistical inferences.

4. Degree of precision required:
The results of any sample survey are always subject to some uncertainty. Such uncertainty can be reduced by taking larger samples or using superior instruments, but this involves more cost and more time. So it is very important to decide on the required degree of precision in the data. This needs to be conveyed to the surveyor also.

5. Method of measurement:
The choice of measuring instrument and the method of measuring the data from the population need to be specified clearly. For example, the data may be collected through interview, questionnaire, personal visit, or a combination of these approaches. The forms on which the data are to be recorded, so that the data can be transferred to mechanical equipment for easy summarization, also need to be prepared accordingly.

6. The frame:
The sampling frame has to be clearly specified. The population is divided into sampling units such that
the units cover the whole population and every sampling unit is tagged with identification. The list of
all sampling units is called the frame. The frame must cover the whole population and the units must
not overlap each other in the sense that every element in the population must belong to one and only
one unit. For example, the sampling unit can be an individual member in the family or the whole
family.

7. Selection of sample:
The size of the sample needs to be specified for the given sampling plan. This helps in determining and
comparing the relative cost and time of different sampling plans. The method and plan adopted for
drawing a representative sample should also be detailed.

8. The Pre-test:
It is advised to try the questionnaire and field methods on a small scale. This may reveal some troubles
and problems beforehand which the surveyor may face in the field in large scale surveys.

9. Organization of the field work:
How to conduct the survey, how to handle administrative issues, providing proper training to surveyors, procedures, and plans for handling non-response and missing observations are some of the issues which need to be addressed in organizing the survey work in the field. The procedure for early checking of the quality of returns should be prescribed. It should also be clarified how to handle the situation when a respondent is not available.

10. Summary and analysis of data:


It is to be noted that, based on the objectives of the survey, a suitable statistical tool is chosen to answer the relevant questions. In order to use the statistical tool, a valid data set is required, and this dictates the choice of responses to be obtained for the questions in the questionnaire; e.g., the data may be qualitative, quantitative, nominal, ordinal, etc. After the completed questionnaires are received, they need to be edited to amend recording errors and delete erroneous data. The tabulating procedures, methods of estimation and the tolerable amount of error in the estimation need to be decided before the start of the survey. Different methods of estimation may be available to answer the same query from the same data set, so the data collected need to be compatible with the chosen estimation procedure.

11. Information gained for future surveys:


Completed surveys act as a guide for improved sample surveys in the future. Besides this, they also supply various types of prior information required to use various statistical tools, e.g., the mean, variance, nature of variability, cost involved, etc. It is generally seen that in any complex survey things do not always go as planned. Precautions and alerts noted from completed surveys help in avoiding mistakes in the execution of future surveys.

Variability control in sample surveys:
The variability control is an important issue in any statistical analysis. A general objective is to draw
statistical inferences with minimum variability. There are various types of sampling schemes which are
adopted in different conditions. These schemes help in controlling the variability at different stages.
Such sampling schemes can be classified in the following way.

1. Before selection of sampling units


• Stratified sampling
• Cluster sampling
• Two stage sampling
• Double sampling etc.

2. At the time of selection of sampling units


• Systematic sampling
• Varying probability sampling

3. After the selection of sampling units


• Ratio method of estimation
• Regression method of estimation
Note that the ratio and regression methods are methods of estimation and not methods of drawing samples.

Methods of data collection


There are various ways of data collection. Some of them are as follows:

1. Physical observations and measurements:


The surveyor contacts the respondent personally and meets him or her. The surveyor observes the sampling unit and records the data, and can always use prior experience to collect the data in a better way. For example, a young man reporting his age as 60 years can easily be spotted and corrected by the surveyor.

2. Personal interview:
The surveyor is supplied with a well prepared questionnaire. The surveyor goes to the respondents and
asks the same questions mentioned in the questionnaire. The data in the questionnaire is then filled up
accordingly based on the responses from the respondents.

3. Mail enquiry:
The well prepared questionnaire is sent to the respondents through postal mail, e-mail, etc. The respondents are requested to fill in the questionnaire and send it back. In the case of postal mail, the questionnaires are often accompanied by a self addressed envelope with postage stamps to avoid any non-response due to the cost of postage.

4. Web based enquiry:


The survey is conducted online through internet based web pages. There are various websites which provide such a facility. The questionnaire is prepared in the website's format and a link is sent to the respondents through email. By clicking on the link, the respondent is taken to the website and the answers are given online. These answers are recorded, and the responses as well as their summary statistics are sent to the surveyor. The respondents need an internet connection to support data collection through this procedure.

5. Registration:
The respondent is required to register the data at some designated place. For example, the number of births and deaths, along with the details provided by the family members, are recorded at the city municipal office.

6. Transcription from records:


The sample of data is collected from the already recorded information. For example, the details of the
number of persons in different families or number of births/deaths in a city can be obtained from the
city municipal office directly.

The methods in (1) to (5) provide primary data, which means the data are collected directly from the source. The method in (6) provides secondary data, which means the data are obtained from already existing records rather than directly from the source.

Chapter 2
Simple Random Sampling

Simple random sampling (SRS) is a method of selecting a sample of n sampling units out of a population of N sampling units such that every sampling unit has an equal chance of being chosen.

The samples can be drawn in two possible ways.


• The sampling units are chosen without replacement, in the sense that units once chosen are not placed back in the population.
• The sampling units are chosen with replacement, in the sense that chosen units are placed back in the population.

1. Simple random sampling without replacement (SRSWOR):

SRSWOR is a method of selecting n units out of the N units one by one such that at any stage of selection, each of the remaining units has the same chance of being selected; the probability that any specified unit of the population is selected at any given draw works out to 1/N.

2. Simple random sampling with replacement (SRSWR):

SRSWR is a method of selecting n units out of the N units one by one such that at each stage of selection, each unit has an equal chance of being selected, i.e., 1/N.

Procedure of selection of a random sample:


The procedure of selection of a random sample follows these steps:
1. Identify the N units in the population with the numbers 1 to N.
2. Choose any random number arbitrarily in the random number table and start reading numbers from there.
3. Choose the sampling unit whose serial number corresponds to the random number drawn from the table of random numbers.
4. In the case of SRSWR, all the random numbers are accepted even if repeated more than once. In the case of SRSWOR, if any random number is repeated, it is ignored and more numbers are drawn.

Such a process can be implemented through programming using the discrete uniform distribution. Any number between 1 and N can be generated from this distribution, and the corresponding unit can be selected into the sample by associating an index with each sampling unit. Many statistical software packages like R, SAS, etc. have inbuilt functions for drawing a sample using SRSWOR or SRSWR.
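As a minimal sketch of this procedure (the population labels, sizes and seed below are illustrative assumptions, not values from the notes), both schemes can be drawn with Python's standard `random` module:

```python
import random

N, n = 10, 4                          # illustrative population and sample sizes
population = list(range(1, N + 1))    # units identified with the numbers 1 to N

random.seed(1)                        # fixed seed, for reproducibility only

# SRSWOR: a chosen unit is not placed back, so no label can repeat
srswor_sample = random.sample(population, n)

# SRSWR: each draw is made from the full population, so labels may repeat
srswr_sample = [random.choice(population) for _ in range(n)]

print(srswor_sample, srswr_sample)
```

Here `random.sample` plays the role of the without-replacement draw, while repeated `random.choice` calls give the with-replacement draw.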

Notations:
The following notations will be used in further notes:

N: Number of sampling units in the population (Population size).


n: Number of sampling units in the sample (sample size)
Y: The characteristic under consideration
Yi : Value of the characteristic for the i th unit of the population

\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i : sample mean

\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i : population mean

S^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})^2 = \frac{1}{N-1}\left(\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\right)

\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(Y_i - \bar{Y})^2 = \frac{1}{N}\left(\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\right)

s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right)

Probability of drawing a sample:

1. SRSWOR:
If n units are selected by SRSWOR, the total number of possible samples is \binom{N}{n}, so the probability of selecting any one of these samples is \frac{1}{\binom{N}{n}}.

Note that a unit can be selected at any one of the n draws. Let u_i be the i-th unit selected in the sample. This unit can be selected either at the first draw, the second draw, ..., or the n-th draw. Let P_j(i) denote the probability of selection of u_i at the j-th draw, j = 1, 2, ..., n. Then

P(i) = P_1(i) + P_2(i) + \dots + P_n(i) = \frac{1}{N} + \frac{1}{N} + \dots + \frac{1}{N} \;(n \text{ times}) = \frac{n}{N}.

Now if u_1, u_2, \dots, u_n are the n units selected in the sample, the probability of their selection is

P(u_1, u_2, \dots, u_n) = P(u_1)\,P(u_2)\cdots P(u_n).

Note that when the second unit is to be selected, (n - 1) units remain to be selected from the (N - 1) units left in the population; when the third unit is to be selected, (n - 2) units remain to be selected from (N - 2) units, and so on. If P(u_1) = \frac{n}{N}, then

P(u_2) = \frac{n-1}{N-1}, \;\dots,\; P(u_n) = \frac{1}{N-n+1}.

Thus

P(u_1, u_2, \dots, u_n) = \frac{n}{N}\cdot\frac{n-1}{N-1}\cdot\frac{n-2}{N-2}\cdots\frac{1}{N-n+1} = \frac{1}{\binom{N}{n}}.

Alternative approach:
The probability of drawing a sample in SRSWOR can alternatively be found as follows.

Let u_{i(k)} denote the i-th unit drawn at the k-th draw; the i-th unit can be any one of the N units. Then s_o = (u_{i(1)}, u_{i(2)}, \dots, u_{i(n)}) is an ordered sample, in which the order in which the units are drawn (u_{i(1)} at the first draw, u_{i(2)} at the second draw, and so on) is also taken into account. The probability of selecting such an ordered sample is

P(s_o) = P(u_{i(1)})\,P(u_{i(2)} \mid u_{i(1)})\,P(u_{i(3)} \mid u_{i(1)} u_{i(2)}) \cdots P(u_{i(n)} \mid u_{i(1)} u_{i(2)} \cdots u_{i(n-1)}).

Here P(u_{i(k)} \mid u_{i(1)} u_{i(2)} \cdots u_{i(k-1)}) is the probability of drawing u_{i(k)} at the k-th draw given that u_{i(1)}, u_{i(2)}, \dots, u_{i(k-1)} have already been drawn in the first (k - 1) draws. This probability is

P(u_{i(k)} \mid u_{i(1)} u_{i(2)} \cdots u_{i(k-1)}) = \frac{1}{N-k+1}.

So

P(s_o) = \prod_{k=1}^{n} \frac{1}{N-k+1} = \frac{(N-n)!}{N!}.

The number of orders in which a sample of size n can be drawn is n!, and the probability of drawing the sample in one given order is \frac{(N-n)!}{N!}. So the probability of drawing a sample in which the order of the draws is irrelevant is

n!\,\frac{(N-n)!}{N!} = \frac{1}{\binom{N}{n}}.
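These counts can be checked numerically; a small sketch with assumed illustrative sizes (N = 8, n = 3, not values from the notes):

```python
from math import comb, factorial

N, n = 8, 3                                  # illustrative sizes

p_ordered = factorial(N - n) / factorial(N)  # probability of one ordered sample
p_unordered = factorial(n) * p_ordered       # n! orderings of the same set of units

# agrees with the direct count of C(N, n) equally likely unordered samples
assert abs(p_unordered - 1 / comb(N, n)) < 1e-15
print(p_unordered)
```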

2. SRSWR:
When n units are selected by SRSWR, the total number of possible samples is N^n, so the probability of drawing any one sample is \frac{1}{N^n}.

Alternatively, let u_i be the i-th unit selected in the sample. This unit can be selected either at the first draw, the second draw, ..., or the n-th draw. At any stage, there are always N units in the population in the case of SRSWR, so the probability of selection of u_i at any stage is 1/N for all i = 1, 2, ..., n. Then the probability of selection of the n units u_1, u_2, \dots, u_n in the sample is

P(u_1, u_2, \dots, u_n) = P(u_1)\,P(u_2)\cdots P(u_n) = \frac{1}{N}\cdot\frac{1}{N}\cdots\frac{1}{N} = \frac{1}{N^n}.

Probability of drawing a unit

1. SRSWOR:
Let A_r denote the event that a particular unit u_j is not selected at the r-th draw, and let \bar{A}_k be its complement (u_j is selected at the k-th draw). The probability of selecting the j-th unit at the k-th draw is

P(\text{selection of } u_j \text{ at the } k\text{-th draw}) = P(A_1 \cap A_2 \cap \dots \cap A_{k-1} \cap \bar{A}_k)

= P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1 A_2) \cdots P(A_{k-1} \mid A_1 A_2 \cdots A_{k-2})\,P(\bar{A}_k \mid A_1 A_2 \cdots A_{k-1})

= \left(1-\frac{1}{N}\right)\left(1-\frac{1}{N-1}\right)\left(1-\frac{1}{N-2}\right)\cdots\left(1-\frac{1}{N-k+2}\right)\frac{1}{N-k+1}

= \frac{N-1}{N}\cdot\frac{N-2}{N-1}\cdots\frac{N-k+1}{N-k+2}\cdot\frac{1}{N-k+1}

= \frac{1}{N}.

2. SRSWR:
P(\text{selection of } u_j \text{ at the } k\text{-th draw}) = \frac{1}{N}.
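A quick Monte Carlo check of this result; the population size, the watched unit and the draw number below are arbitrary illustrative choices:

```python
import random

random.seed(0)
N, trials = 6, 200_000
j, k = 3, 4                    # watch unit 3 at the 4th draw (arbitrary)

hits = 0
for _ in range(trials):
    draws = random.sample(range(1, N + 1), k)   # first k SRSWOR draws
    if draws[k - 1] == j:                       # unit j appears at draw k
        hits += 1

print(hits / trials)           # should be close to 1/N = 1/6
```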

Estimation of population mean and population variance


One of the main objectives after selecting a sample is to learn about the tendency of the data to cluster around a central value and the scatter of the data around it. Among the various indicators of central tendency and dispersion, the popular choices are the arithmetic mean and the variance. So the population mean and population variability are generally measured by the arithmetic mean (or weighted arithmetic mean) and the variance, respectively. There are various popular estimators for estimating the population mean and population variance. Among them, the sample arithmetic mean and sample variance are more popular than other estimators. One of the reasons to use these estimators is that they possess nice statistical properties. Moreover, they are also obtained through well established statistical estimation procedures like maximum likelihood estimation, least squares estimation, the method of moments, etc. under several standard statistical distributions. One may also consider other indicators like the median, mode, geometric mean and harmonic mean for measuring central tendency, and the mean deviation, absolute deviation, Pitman nearness, etc. for measuring dispersion. The properties of such estimators can be studied by numerical procedures like bootstrapping.
1. Estimation of population mean

Consider the sample arithmetic mean \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i as an estimator of the population mean \bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i, and verify that \bar{y} is an unbiased estimator of \bar{Y} under the two cases.

SRSWOR
Let t_i = \sum_{j=1}^{n} y_j denote the total of the i-th of the \binom{N}{n} possible samples. Then

E(\bar{y}) = \frac{1}{n} E\left(\sum_{i=1}^{n} y_i\right) = \frac{1}{n} E(t_i) = \frac{1}{n}\left[\frac{1}{\binom{N}{n}} \sum_{i=1}^{\binom{N}{n}} t_i\right] = \frac{1}{n\binom{N}{n}} \sum_{i=1}^{\binom{N}{n}} \left(\sum_{j=1}^{n} y_j\right).

When n units are sampled from N units without replacement, each unit of the population can occur together with units selected out of the remaining (N - 1) units, and each unit occurs in \binom{N-1}{n-1} of the \binom{N}{n} possible samples. So

\sum_{i=1}^{\binom{N}{n}} \left(\sum_{j=1}^{n} y_j\right) = \binom{N-1}{n-1} \sum_{i=1}^{N} Y_i.

Now

E(\bar{y}) = \frac{(N-1)!}{(n-1)!\,(N-n)!}\cdot\frac{n!\,(N-n)!}{n\,N!} \sum_{i=1}^{N} Y_i = \frac{1}{N}\sum_{i=1}^{N} Y_i = \bar{Y}.
Thus \bar{y} is an unbiased estimator of \bar{Y}. Alternatively, the following approach can also be adopted to show the unbiasedness property:

E(\bar{y}) = \frac{1}{n}\sum_{j=1}^{n} E(y_j) = \frac{1}{n}\sum_{j=1}^{n}\left[\sum_{i=1}^{N} Y_i\,P_j(i)\right] = \frac{1}{n}\sum_{j=1}^{n}\left[\sum_{i=1}^{N} Y_i\cdot\frac{1}{N}\right] = \frac{1}{n}\sum_{j=1}^{n} \bar{Y} = \bar{Y},

where P_j(i) denotes the probability of selection of the i-th unit at the j-th draw.

SRSWR

E(\bar{y}) = \frac{1}{n} E\left(\sum_{i=1}^{n} y_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(y_i) = \frac{1}{n}\sum_{i=1}^{n}\left(Y_1 P_1 + Y_2 P_2 + \dots + Y_N P_N\right) = \frac{1}{n}\sum_{i=1}^{n} \bar{Y} = \bar{Y},

where P_i = \frac{1}{N} for all i = 1, 2, \dots, N is the probability of selection of a unit. Thus \bar{y} is an unbiased estimator of the population mean under SRSWR also.
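The unbiasedness under both schemes can be verified exhaustively for a small assumed population by averaging \bar{y} over every possible, equally likely sample (the population values below are illustrative, not from the notes):

```python
from itertools import combinations, product
from statistics import mean

Y = [2, 5, 7, 8, 11, 13]          # a small illustrative population
N, n = len(Y), 3

# every possible sample is equally likely in each scheme
wor_means = [mean(s) for s in combinations(Y, n)]    # C(N, n) SRSWOR samples
wr_means = [mean(s) for s in product(Y, repeat=n)]   # N^n ordered SRSWR samples

assert abs(mean(wor_means) - mean(Y)) < 1e-9   # E(ybar) = Ybar under SRSWOR
assert abs(mean(wr_means) - mean(Y)) < 1e-9    # E(ybar) = Ybar under SRSWR
```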

Variance of the estimate
Assume that each observation y_i has variance \sigma^2. Then

V(\bar{y}) = E(\bar{y} - \bar{Y})^2

= E\left[\frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{Y})\right]^2

= E\left[\frac{1}{n^2}\sum_{i=1}^{n}(y_i - \bar{Y})^2 + \frac{1}{n^2}\sum_{i \ne j}\sum (y_i - \bar{Y})(y_j - \bar{Y})\right]

= \frac{1}{n^2}\sum_{i=1}^{n} E(y_i - \bar{Y})^2 + \frac{1}{n^2}\sum_{i \ne j}\sum E(y_i - \bar{Y})(y_j - \bar{Y})

= \frac{1}{n^2}\sum_{i=1}^{n} \sigma^2 + \frac{K}{n^2}

= \frac{N-1}{Nn} S^2 + \frac{K}{n^2},

where K = \sum_{i \ne j}\sum E(y_i - \bar{Y})(y_j - \bar{Y}) and \sigma^2 = \frac{N-1}{N} S^2. Now we find K under the setups of SRSWOR and SRSWR.

SRSWOR

K = \sum_{i \ne j}\sum E(y_i - \bar{Y})(y_j - \bar{Y}).

Consider

E(y_i - \bar{Y})(y_j - \bar{Y}) = \frac{1}{N(N-1)} \sum_{k \ne \ell}\sum (Y_k - \bar{Y})(Y_\ell - \bar{Y}).

Since

\left[\sum_{k=1}^{N}(Y_k - \bar{Y})\right]^2 = \sum_{k=1}^{N}(Y_k - \bar{Y})^2 + \sum_{k \ne \ell}\sum (Y_k - \bar{Y})(Y_\ell - \bar{Y}),

0 = (N-1)S^2 + \sum_{k \ne \ell}\sum (Y_k - \bar{Y})(Y_\ell - \bar{Y}),

we have

E(y_i - \bar{Y})(y_j - \bar{Y}) = \frac{1}{N(N-1)}\left[-(N-1)S^2\right] = -\frac{S^2}{N}.

Thus K = -n(n-1)\frac{S^2}{N}, and so, substituting the value of K, the variance of \bar{y} under SRSWOR is

V(\bar{y}_{WOR}) = \frac{N-1}{Nn}S^2 - \frac{1}{n^2}\,n(n-1)\frac{S^2}{N} = \frac{N-n}{Nn} S^2.

SRSWR

K = \sum_{i \ne j}\sum E(y_i - \bar{Y})(y_j - \bar{Y}) = \sum_{i \ne j}\sum E(y_i - \bar{Y})\,E(y_j - \bar{Y}) = 0,

because the i-th and j-th draws (i \ne j) are independent. Thus the variance of \bar{y} under SRSWR is

V(\bar{y}_{WR}) = \frac{N-1}{Nn} S^2.
It is to be noted that if N is infinite (large enough), then

V(\bar{y}) = \frac{S^2}{n}

in both the cases of SRSWOR and SRSWR. So the factor \frac{N-n}{N} is responsible for the change in the variance of \bar{y} when the sample is drawn from a finite population rather than an infinite one. This is why \frac{N-n}{N} is called the finite population correction (fpc). Note that \frac{N-n}{N} = 1 - \frac{n}{N}, so the fpc is close to 1 when the ratio of sample size to population size, \frac{n}{N}, is very small or negligible. The term \frac{n}{N} is called the sampling fraction. In practice, the fpc can be ignored whenever \frac{n}{N} < 5\%, and for many purposes even when it is as high as 10%. Ignoring the fpc results in overestimation of the variance of \bar{y}.
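A small numeric illustration of the fpc, with assumed values (N = 1000, n = 40, S² = 25, i.e., a 4% sampling fraction):

```python
N, n, S2 = 1000, 40, 25.0             # illustrative values

fpc = (N - n) / N                     # finite population correction
v_exact = fpc * S2 / n                # V(ybar) under SRSWOR
v_infinite = S2 / n                   # treating the population as infinite

print(fpc, v_exact, v_infinite)       # 0.96, 0.6, 0.625
```

Ignoring the fpc here inflates the variance from 0.6 to 0.625, a mild overestimation, in line with the rule of thumb for small sampling fractions.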

Efficiency of \bar{y} under SRSWOR over SRSWR

V(\bar{y}_{WOR}) = \frac{N-n}{Nn} S^2

V(\bar{y}_{WR}) = \frac{N-1}{Nn} S^2 = \frac{N-n}{Nn} S^2 + \frac{n-1}{Nn} S^2 = V(\bar{y}_{WOR}) + \text{a positive quantity}.

Thus

V(\bar{y}_{WR}) > V(\bar{y}_{WOR}),

and so SRSWOR is more efficient than SRSWR.
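Both variance formulas, and the efficiency comparison, can be verified by full enumeration on a small assumed population (values below are illustrative):

```python
from itertools import combinations, product
from statistics import mean

Y = [3, 6, 7, 10, 14]                 # illustrative population
N, n = len(Y), 2
Ybar = mean(Y)
S2 = sum((y - Ybar) ** 2 for y in Y) / (N - 1)

# exact variances of ybar, enumerating every equally likely sample
v_wor = mean((mean(s) - Ybar) ** 2 for s in combinations(Y, n))
v_wr = mean((mean(s) - Ybar) ** 2 for s in product(Y, repeat=n))

assert abs(v_wor - (N - n) / (N * n) * S2) < 1e-9   # (N-n)S^2/(Nn)
assert abs(v_wr - (N - 1) / (N * n) * S2) < 1e-9    # (N-1)S^2/(Nn)
assert v_wor < v_wr                                  # SRSWOR is more efficient
```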

Estimation of variance from a sample


Since the expressions for the variance of the sample mean involve S^2, which is based on population values, these expressions cannot be used in real-life applications. In order to estimate the variance of \bar{y} on the basis of a sample, an estimator of S^2 (or equivalently \sigma^2) is needed. Consider s^2 as an estimator of S^2 (or \sigma^2) and investigate its bias under SRSWOR and SRSWR.

Consider

s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left[(y_i - \bar{Y}) - (\bar{y} - \bar{Y})\right]^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n}(y_i - \bar{Y})^2 - n(\bar{y} - \bar{Y})^2\right].

Taking expectation,

E(s^2) = \frac{1}{n-1}\left[\sum_{i=1}^{n} E(y_i - \bar{Y})^2 - n\,E(\bar{y} - \bar{Y})^2\right] = \frac{1}{n-1}\left[\sum_{i=1}^{n} Var(y_i) - n\,Var(\bar{y})\right] = \frac{1}{n-1}\left[n\sigma^2 - n\,Var(\bar{y})\right].

In case of SRSWOR,

V(\bar{y}_{WOR}) = \frac{N-n}{Nn} S^2

and so

E(s^2) = \frac{n}{n-1}\left[\sigma^2 - \frac{N-n}{Nn} S^2\right] = \frac{n}{n-1}\left[\frac{N-1}{N} S^2 - \frac{N-n}{Nn} S^2\right] = S^2.

In case of SRSWR,

V(\bar{y}_{WR}) = \frac{N-1}{Nn} S^2

and so

E(s^2) = \frac{n}{n-1}\left[\sigma^2 - \frac{N-1}{Nn} S^2\right] = \frac{n}{n-1}\left[\frac{N-1}{N} S^2 - \frac{N-1}{Nn} S^2\right] = \frac{N-1}{N} S^2 = \sigma^2.

Hence

E(s^2) = S^2 in SRSWOR, and E(s^2) = \sigma^2 in SRSWR.

An unbiased estimate of Var ( y ) is


N −n 2
Vˆ ( yWOR ) = s in case of SRSWOR and
Nn
N −1 N 2
Vˆ ( yWR ) = . s
Nn N − 1
s2
= in case of SRSWR.
n
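Given one observed sample, these variance estimates are simple to compute. A minimal Python sketch follows, with a hypothetical population size and sample values:

```python
# Sketch: estimating Var(ybar) from a single observed sample.
# N and the sample values are hypothetical.
N = 100
y = [12.0, 15.0, 11.0, 14.0, 13.0]
n = len(y)
ybar = sum(y) / n
s2 = sum((v - ybar) ** 2 for v in y) / (n - 1)   # sample mean square s^2

var_hat_wor = (N - n) / (N * n) * s2   # unbiased for Var(ybar) under SRSWOR
var_hat_wr = s2 / n                    # unbiased for Var(ybar) under SRSWR
```

Here s² = 2.5, so the SRSWOR estimate (0.475) is smaller than the SRSWR estimate (0.5), reflecting the finite population correction.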

Standard errors
The standard error of ȳ is defined as √Var(ȳ).
In order to estimate the standard error, one simple option is to consider the square root of the estimate of the
variance of the sample mean.

• Under SRSWOR, a possible estimator is σ̂(ȳ) = √((N−n)/(Nn)) · s.

• Under SRSWR, a possible estimator is σ̂(ȳ) = √((N−1)/(Nn)) · s.

It is to be noted that this estimator does not possess the same properties as V̂(ȳ).
The reason is that if θ̂ is an estimator of θ, then √θ̂ is not necessarily an estimator of √θ.
In fact, σ̂(ȳ) is a negatively biased estimator under SRSWOR.

The approximate expressions for the large-N case are as follows:

(Reference: Sampling Theory of Surveys with Applications, P.V. Sukhatme, B.V. Sukhatme, S.
Sukhatme, C. Asok, Iowa State University Press and Indian Society of Agricultural Statistics,
1984, India)

Consider s as an estimator of S.
Let

    s² = S² + ε with E(ε) = 0, E(ε²) = Var(s²).

Write

    s = (S² + ε)^{1/2}
      = S (1 + ε/S²)^{1/2}
      = S (1 + ε/(2S²) − ε²/(8S⁴) + ...),

assuming ε is small compared to S²; as n becomes large, the probability of such an
event approaches one. Neglecting the powers of ε higher than two and taking expectation, we have

    E(s) = S [1 − Var(s²)/(8S⁴)],

where

    Var(s²) = (2S⁴/(n−1)) [1 + ((n−1)/(2n))(β₂ − 3)]  for large N,

    μ_j = (1/N) Σ_{i=1}^N (Y_i − Ȳ)^j,

    β₂ = μ₄/S⁴ : coefficient of kurtosis.

Thus

    E(s) = S [1 − 1/(4(n−1)) − (β₂ − 3)/(8n)]

    Var(s) = S² − S² [1 − Var(s²)/(8S⁴)]²
           ≈ Var(s²)/(4S²)
           = (S²/(2(n−1))) [1 + ((n−1)/(2n))(β₂ − 3)].

Note that for a normal distribution, β₂ = 3 and we obtain

    Var(s) = S²/(2(n−1)).

Both Var(s) and Var(s²) are inflated due to nonnormality to the same extent, by the inflation factor

    [1 + ((n−1)/(2n))(β₂ − 3)],

and this does not depend on the coefficient of skewness.

This is an important result to be kept in mind while determining the sample size, in which it is
assumed that S² is known. If the inflation factor is ignored and the population is non-normal, then the
reliability of s² may be misleading.

Alternative approach:
The results for the unbiasedness property and the variance of the sample mean can also be proved in an
alternative way as follows:

(i) SRSWOR
With the ith unit of the population, we associate a random variable a_i defined as follows:

    a_i = 1, if the ith unit occurs in the sample
        = 0, if the ith unit does not occur in the sample (i = 1, 2, ..., N).

Then,

    E(a_i) = 1 × Probability that the ith unit is included in the sample
           = n/N,  i = 1, 2, ..., N
    E(a_i²) = 1 × Probability that the ith unit is included in the sample
            = n/N,  i = 1, 2, ..., N
    E(a_i a_j) = 1 × Probability that the ith and jth units are included in the sample
               = n(n−1)/(N(N−1)),  i ≠ j = 1, 2, ..., N.

From these results, we can obtain

    Var(a_i) = E(a_i²) − (E(a_i))² = n(N−n)/N²,  i = 1, 2, ..., N
    Cov(a_i, a_j) = E(a_i a_j) − E(a_i)E(a_j) = −n(N−n)/(N²(N−1)),  i ≠ j = 1, 2, ..., N.
We can rewrite the sample mean as

    ȳ = (1/n) Σ_{i=1}^N a_i y_i.

Then

    E(ȳ) = (1/n) Σ_{i=1}^N E(a_i) y_i = Ȳ

and

    Var(ȳ) = (1/n²) Var(Σ_{i=1}^N a_i y_i)
           = (1/n²) [Σ_{i=1}^N Var(a_i) y_i² + Σ_{i≠j} Cov(a_i, a_j) y_i y_j].

Substituting the values of Var(a_i) and Cov(a_i, a_j) in the expression of Var(ȳ) and simplifying, we
get

    Var(ȳ) = ((N−n)/(Nn)) S².

To show that E(s²) = S², consider

    s² = (1/(n−1)) [Σ_{i=1}^n y_i² − nȳ²] = (1/(n−1)) [Σ_{i=1}^N a_i y_i² − nȳ²].

Hence, taking expectation, we get

    E(s²) = (1/(n−1)) [Σ_{i=1}^N E(a_i) y_i² − n {Var(ȳ) + Ȳ²}].

Substituting the values of E(a_i) and Var(ȳ) in this expression and simplifying, we get E(s²) = S².
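The two results just derived, E(ȳ) = Ȳ and Var(ȳ) = ((N−n)/(Nn)) S² under SRSWOR, can be illustrated by simulation. The following Python sketch uses an arbitrary population of 20 units and a generous tolerance for Monte Carlo error (`random.sample` draws without replacement):

```python
# Monte Carlo sketch: unbiasedness of ybar and the SRSWOR variance
# formula. Population, sample size and replication count are arbitrary.
import random

random.seed(0)
Y = list(range(1, 21))           # population of N = 20 units
N, n = len(Y), 5
Ybar = sum(Y) / N
S2 = sum((y - Ybar) ** 2 for y in Y) / (N - 1)
theory = (N - n) / (N * n) * S2  # (N-n)/(Nn) * S^2

# random.sample draws n distinct units, i.e. SRSWOR
means = [sum(random.sample(Y, n)) / n for _ in range(100_000)]
emp_mean = sum(means) / len(means)
emp_var = sum((m - emp_mean) ** 2 for m in means) / len(means)

assert abs(emp_mean - Ybar) < 0.05   # E(ybar) is close to Ybar
assert abs(emp_var - theory) < 0.1   # empirical Var is close to theory
```

For this population Ȳ = 10.5, S² = 35 and the theoretical variance is 5.25; the empirical values agree to within simulation error.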

(ii) SRSWR
Let a random variable a_i associated with the ith unit of the population denote the number of times
the ith unit occurs in the sample, i = 1, 2, ..., N. So a_i assumes the values 0, 1, 2, ..., n. The joint
distribution of a₁, a₂, ..., a_N is the multinomial distribution given by

    P(a₁, a₂, ..., a_N) = n! / (Π_{i=1}^N a_i!) · (1/N^n),

where Σ_{i=1}^N a_i = n. For this multinomial distribution, we have

    E(a_i) = n/N,
    Var(a_i) = n(N−1)/N²,  i = 1, 2, ..., N,
    Cov(a_i, a_j) = −n/N²,  i ≠ j = 1, 2, ..., N.

We rewrite the sample mean as

    ȳ = (1/n) Σ_{i=1}^N a_i y_i.

Hence, taking expectation of ȳ and substituting the value E(a_i) = n/N, we obtain

    E(ȳ) = Ȳ.

Further,

    Var(ȳ) = (1/n²) [Σ_{i=1}^N Var(a_i) y_i² + Σ_{i≠j} Cov(a_i, a_j) y_i y_j].

Substituting the values Var(a_i) = n(N−1)/N² and Cov(a_i, a_j) = −n/N² and simplifying, we get

    Var(ȳ) = ((N−1)/(Nn)) S².

To prove that E(s²) = ((N−1)/N) S² = σ² in SRSWR, consider

    (n−1) s² = Σ_{i=1}^n y_i² − nȳ² = Σ_{i=1}^N a_i y_i² − nȳ²,

    (n−1) E(s²) = Σ_{i=1}^N E(a_i) y_i² − n {Var(ȳ) + Ȳ²}
                = (n/N) Σ_{i=1}^N y_i² − ((N−1)/N) S² − nȲ²
                = ((n−1)(N−1)/N) S²

so that

    E(s²) = ((N−1)/N) S² = σ².

Estimator of population total:

Sometimes it is also of interest to estimate the population total, e.g., total household income, total
expenditures, etc. Let

    Y_T = Σ_{i=1}^N Y_i = NȲ

denote the population total, which can be estimated by

    Ŷ_T = Nȳ.

Obviously,

    E(Ŷ_T) = N E(ȳ) = NȲ

    Var(Ŷ_T) = N² Var(ȳ)
             = N² ((N−n)/(Nn)) S² = (N(N−n)/n) S²  for SRSWOR
             = N² ((N−1)/(Nn)) S² = (N(N−1)/n) S²  for SRSWR

and the estimates of the variance of Ŷ_T are

    V̂(Ŷ_T) = (N(N−n)/n) s²  for SRSWOR
            = N² s²/n  for SRSWR.
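As a small worked example, the SRSWOR case can be computed directly. The population size and the sampled household incomes below are hypothetical:

```python
# Sketch: estimating a population total and its variance estimate under
# SRSWOR. The household counts and incomes are hypothetical.
N = 500                                   # number of units in the population
y = [200.0, 350.0, 275.0, 300.0, 375.0]   # observed sample
n = len(y)
ybar = sum(y) / n
s2 = sum((v - ybar) ** 2 for v in y) / (n - 1)

Y_total_hat = N * ybar                    # N * ybar estimates Y_T
var_hat = N * (N - n) / n * s2            # N(N-n)/n * s^2
```

Here ȳ = 300, so the estimated total is 150000; the large variance estimate reflects the tiny sampling fraction n/N = 1%.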

Confidence limits for the population mean

Now we construct the 100(1−α)% confidence interval for the population mean. Assume that the
population is normally distributed N(μ, σ²) with mean μ and variance σ². Then (ȳ − Ȳ)/√Var(ȳ)
follows N(0,1) when σ² is known. If σ² is unknown and is estimated from the sample, then
(ȳ − Ȳ)/√V̂(ȳ) follows a t-distribution with (n−1) degrees of freedom. When σ² is known, the
100(1−α)% confidence interval is given by

    P[−Z_{α/2} ≤ (ȳ − Ȳ)/√Var(ȳ) ≤ Z_{α/2}] = 1 − α

or

    P[ȳ − Z_{α/2} √Var(ȳ) ≤ Ȳ ≤ ȳ + Z_{α/2} √Var(ȳ)] = 1 − α

and the confidence limits are

    [ȳ − Z_{α/2} √Var(ȳ), ȳ + Z_{α/2} √Var(ȳ)],

where Z_{α/2} denotes the upper α/2 point of the N(0,1) distribution. Similarly, when σ² is unknown,
the 100(1−α)% confidence interval is

    P[−t_{α/2} ≤ (ȳ − Ȳ)/√V̂(ȳ) ≤ t_{α/2}] = 1 − α

or

    P[ȳ − t_{α/2} √V̂(ȳ) ≤ Ȳ ≤ ȳ + t_{α/2} √V̂(ȳ)] = 1 − α

and the confidence limits are

    [ȳ − t_{α/2} √V̂(ȳ), ȳ + t_{α/2} √V̂(ȳ)],

where t_{α/2} denotes the upper α/2 point of the t-distribution with (n−1) degrees of freedom.
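The σ²-unknown case can be sketched in Python. The sample below is hypothetical, and the t quantile for n−1 = 9 degrees of freedom at α = 0.05 is hard-coded from standard tables (a library such as SciPy could supply it instead):

```python
# Sketch: 95% confidence limits for Ybar with sigma^2 unknown (SRSWOR).
from math import sqrt
from statistics import mean, stdev

t_975_9df = 2.262                         # t quantile, 9 df, from tables
N = 1000
y = [4.2, 5.1, 3.8, 4.9, 5.5, 4.0, 4.7, 5.2, 4.4, 4.8]
n = len(y)
ybar, s = mean(y), stdev(y)
se = sqrt((N - n) / (N * n)) * s          # sqrt of estimated Var(ybar)

lower = ybar - t_975_9df * se
upper = ybar + t_975_9df * se
```

The interval is centered at ȳ and its half-width is t_{α/2} times the estimated standard error, including the fpc.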

Determination of sample size

The size of the sample is needed before the survey starts and goes into operation. One point to be
kept in mind is that when the sample size increases, the variance of estimators decreases but the cost
of the survey increases, and vice versa. So there has to be a balance between the two aspects. The
sample size can be determined on the basis of prescribed values of the standard error of the sample mean,
the error of estimation, the width of the confidence interval, the coefficient of variation of the sample mean,
the relative error of the sample mean or the total cost, among several others.

An important constraint in determining the sample size is that information regarding the
population standard deviation S should be known for these criteria. The reason and need for this
will be clear when we derive the sample size in the next section. A question arises about how to
have information about S beforehand. One possible solution is to conduct a pilot
survey, collect a preliminary sample of small size, estimate S and use it as the known value of S.
Alternatively, such information can also be collected from past data, past experience, long
association of the experimenter with the experiment, prior information, etc.

Now we find the sample size under different criteria assuming that the samples have been drawn
using SRSWOR. The case of SRSWR can be derived similarly.
1. Prespecified variance
The sample size is to be determined such that the variance of ȳ does not exceed a given value, say
V. In this case, find n such that

    Var(ȳ) ≤ V
    ((N−n)/(Nn)) S² ≤ V
    1/n − 1/N ≤ V/S²
    1/n − 1/N ≤ 1/n_e
    n ≥ n_e / (1 + n_e/N),

where n_e = S²/V.
It may be noted here that n_e can be known only when S² is known. This reason compels us to assume
that S should be known. The same reason will also be seen in the other cases.
The smallest sample size needed in this case is

    n_smallest = n_e / (1 + n_e/N).

If N is large, then the required n is

    n ≥ n_e and n_smallest = n_e.
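A minimal Python sketch of this rule follows; the values of S², V and N are hypothetical, and in practice S² would come from a pilot survey or past data:

```python
# Sketch of the prespecified-variance rule: n >= n_e / (1 + n_e/N),
# with n_e = S^2 / V. All inputs are hypothetical.
from math import ceil

def n_for_variance(S2, V, N):
    ne = S2 / V
    return ceil(ne / (1 + ne / N))   # round up to the next whole unit

n_needed = n_for_variance(400.0, 4.0, 2000)   # n_e = 100
```

With n_e = 100 and N = 2000 the fpc reduces the requirement slightly below 100 (to 96 after rounding up); for very large N the answer is n_e itself.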

2. Prespecified estimation error

It may be possible to have some prior knowledge of the population mean Ȳ, and it may be required that
the sample mean ȳ should not differ from it by more than a specified amount e of absolute
estimation error, where e is a small quantity. Such a requirement can be satisfied by associating a
probability (1 − α) with it and can be expressed as

    P[|ȳ − Ȳ| ≤ e] = 1 − α.

Since ȳ follows N(Ȳ, ((N−n)/(Nn)) S²), assuming the normal distribution for the population, we can write

    P[|ȳ − Ȳ|/√Var(ȳ) ≤ e/√Var(ȳ)] = 1 − α,

which implies that

    e/√Var(ȳ) = Z_{α/2}
    Z²_{α/2} Var(ȳ) = e²
    Z²_{α/2} ((N−n)/(Nn)) S² = e²

or

    n = (Z_{α/2} S/e)² / [1 + (1/N)(Z_{α/2} S/e)²],

which is the required sample size. If N is large, then

    n = (Z_{α/2} S/e)².
 
 
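This criterion can be sketched in Python as below; z = 1.96 corresponds to α = 0.05, and S, e and N are hypothetical inputs:

```python
# Sketch of the prespecified-absolute-error rule.
from math import ceil

def n_for_abs_error(S, e, N, z=1.96):
    n0 = (z * S / e) ** 2          # large-N sample size (z S / e)^2
    return ceil(n0 / (1 + n0 / N))

n_needed = n_for_abs_error(20.0, 2.0, 5000)   # n0 = 384.16
```

The large-N value (z S/e)² = 384.16 is shrunk by the fpc to about 357 for N = 5000.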

3. Prespecified width of confidence interval

If the requirement is that the width of the confidence interval of ȳ with confidence coefficient
(1 − α) should not exceed a prespecified amount W, then the sample size n is determined such that

    2 Z_{α/2} √Var(ȳ) ≤ W,

assuming σ² is known and the population is normally distributed. This can be expressed as

    2 Z_{α/2} √((N−n)/(Nn)) S ≤ W
    4 Z²_{α/2} (1/n − 1/N) S² ≤ W²
    1/n ≤ 1/N + W²/(4 Z²_{α/2} S²)

or

    n ≥ (4 Z²_{α/2} S²/W²) / [1 + 4 Z²_{α/2} S²/(N W²)].

The minimum sample size required is

    n_smallest = (4 Z²_{α/2} S²/W²) / [1 + 4 Z²_{α/2} S²/(N W²)].

If N is large, then

    n ≥ 4 Z²_{α/2} S²/W²

and the minimum sample size needed is

    n_smallest = 4 Z²_{α/2} S²/W².
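A short Python sketch of this rule, with hypothetical S, W and N:

```python
# Sketch of the confidence-interval-width rule.
from math import ceil

def n_for_ci_width(S, W, N, z=1.96):
    n0 = 4 * z**2 * S**2 / W**2    # large-N sample size
    return ceil(n0 / (1 + n0 / N))

n_needed = n_for_ci_width(10.0, 5.0, 1000)   # n0 = 61.4656
```

Halving W quadruples the large-N sample size, since n₀ is proportional to 1/W².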

4. Prespecified coefficient of variation

The coefficient of variation (CV) is defined as the ratio of the standard error (or standard deviation)
and the mean. The knowledge of the coefficient of variation has played an important role in sampling
theory as this information has helped in deriving efficient estimators.

If it is desired that the coefficient of variation of ȳ should not exceed a given or prespecified
value of the coefficient of variation, say C₀, then the required sample size n is to be determined such
that

    CV(ȳ) ≤ C₀
    √Var(ȳ)/Ȳ ≤ C₀
    ((N−n)/(Nn)) S²/Ȳ² ≤ C₀²
    1/n − 1/N ≤ C₀²/C²

or

    n ≥ (C²/C₀²) / [1 + C²/(N C₀²)]

is the required sample size, where C = S/Ȳ is the population coefficient of variation.
The smallest sample size needed in this case is

    n_smallest = (C²/C₀²) / [1 + C²/(N C₀²)].

If N is large, then

    n ≥ C²/C₀² and n_smallest = C²/C₀².
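In Python this criterion reads as follows; the population CV C is treated as known, and the numbers are hypothetical:

```python
# Sketch of the coefficient-of-variation rule.
from math import ceil

def n_for_cv(C, C0, N):
    n0 = (C / C0) ** 2             # large-N sample size (C/C0)^2
    return ceil(n0 / (1 + n0 / N))

n_needed = n_for_cv(0.8, 0.05, 10_000)   # n0 = 256
```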

5. Prespecified relative error

When ȳ is used for estimating the population mean Ȳ, the relative estimation error is defined
as (ȳ − Ȳ)/Ȳ. If it is required that such relative estimation error should not exceed a prespecified value
R with probability (1 − α), then such a requirement can be satisfied by expressing it as

    P[|ȳ − Ȳ|/√Var(ȳ) ≤ RȲ/√Var(ȳ)] = 1 − α.

Assuming the population to be normally distributed, ȳ follows N(Ȳ, ((N−n)/(Nn)) S²).

So it can be written that

    RȲ/√Var(ȳ) = Z_{α/2}
    Z²_{α/2} ((N−n)/(Nn)) S² = R²Ȳ²
    1/n − 1/N = R²/(C² Z²_{α/2})

or

    n = (Z_{α/2} C/R)² / [1 + (1/N)(Z_{α/2} C/R)²],

where C = S/Ȳ is the population coefficient of variation and should be known.
If N is large, then

    n = (Z_{α/2} C/R)².
 R 
 
 
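A Python sketch of the relative-error criterion, again with hypothetical C, R and N:

```python
# Sketch of the prespecified-relative-error rule.
from math import ceil

def n_for_rel_error(C, R, N, z=1.96):
    n0 = (z * C / R) ** 2          # large-N sample size (z C / R)^2
    return ceil(n0 / (1 + n0 / N))

n_needed = n_for_rel_error(0.5, 0.1, 5000)   # n0 = 96.04
```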

6. Prespecified cost
Let an amount of money C be designated for the sample survey to collect n observations, C₀ be
the overhead cost and C₁ be the cost of collection of one unit in the sample. Then the total cost C
can be expressed as

    C = C₀ + n C₁

or

    n = (C − C₀)/C₁

is the required sample size.

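Since the budget cannot be exceeded, the computed n is rounded down in practice. A one-line Python sketch with hypothetical budget figures:

```python
# Sketch of the cost rule n = (C - C0)/C1; budget figures hypothetical.
from math import floor

def n_for_budget(C, C0, C1):
    return floor((C - C0) / C1)   # round down so the budget is not exceeded

n_affordable = n_for_budget(10_000, 1_000, 45)
```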
Chapter 3
Sampling For Proportions and Percentages
In many situations, the characteristic under study on which the observations are collected is
qualitative in nature. For example, the responses of customers in many marketing surveys are
based on replies like 'yes' or 'no', 'agree' or 'disagree', etc. Sometimes the respondents are
asked to arrange several options in an order like first choice, second choice, etc. Sometimes
the objective of the survey is to estimate the proportion or the percentage of brown-eyed
persons, unemployed persons, graduate persons or persons favoring a proposal, etc. In such
situations, the first question is how to do the sampling, and the second is how to estimate the
population parameters like the population mean, population variance, etc.

Sampling procedure:
The same sampling procedures that are used for drawing a sample in the case of quantitative
characteristics can also be used for drawing a sample for a qualitative characteristic. So the
sampling procedure remains the same irrespective of the nature of the characteristic under study,
either qualitative or quantitative. For example, the SRSWOR and SRSWR procedures for
drawing the samples remain the same for qualitative and quantitative characteristics. Similarly,
other sampling schemes like stratified sampling, two stage sampling, etc. also remain the same.

Estimation of population proportion:

The population proportion in the case of a qualitative characteristic can be estimated in a similar
way as the estimation of the population mean in the case of a quantitative characteristic.

Consider a qualitative characteristic based on which the population can be divided into two
mutually exclusive classes, say C and C*. For example, if C is the part of the population of
persons saying 'yes' or 'agreeing' with the proposal, then C* is the part of the population of persons
saying 'no' or 'disagreeing' with the proposal. Let A be the number of units in C and (N − A)
units in C* in a population of size N. Then the proportion of units in C is

    P = A/N

and the proportion of units in C* is

    Q = (N − A)/N = 1 − P.

An indicator variable Y can be associated with the characteristic under study, and then for
i = 1, 2, ..., N

    Y_i = 1 if the ith unit belongs to C
        = 0 if the ith unit belongs to C*.

Now the population total is

    Y_TOTAL = Σ_{i=1}^N Y_i = A

and the population mean is

    Ȳ = (Σ_{i=1}^N Y_i)/N = A/N = P.

Suppose a sample of size n is drawn from a population of size N by simple random sampling.

Let a be the number of units in the sample which fall into class C and (n − a) units fall into class
C*; then the sample proportion of units in C is

    p = a/n,

which can be written as

    p = a/n = (Σ_{i=1}^n y_i)/n = ȳ.

Since Σ_{i=1}^N Y_i² = Σ_{i=1}^N Y_i = A = NP (as each Y_i is 0 or 1), we can write S² and s² in terms of P and Q as follows:

    S² = (1/(N−1)) Σ_{i=1}^N (Y_i − Ȳ)²
       = (1/(N−1)) (Σ_{i=1}^N Y_i² − NȲ²)
       = (1/(N−1)) (NP − NP²)
       = (N/(N−1)) PQ.

Similarly, Σ_{i=1}^n y_i² = a = np, and

    s² = (1/(n−1)) Σ_{i=1}^n (y_i − ȳ)²
       = (1/(n−1)) (Σ_{i=1}^n y_i² − nȳ²)
       = (1/(n−1)) (np − np²)
       = (n/(n−1)) pq.

Note that the quantities ȳ, Ȳ, s² and S² have been expressed as functions of the sample and
population proportions. Since the sample has been drawn by simple random sampling, and the
sample proportion is the same as the sample mean, the properties of the sample proportion under
SRSWOR and SRSWR can be derived using the properties of the sample mean directly.

1. SRSWOR
Since the sample mean ȳ is an unbiased estimator of the population mean Ȳ, i.e. E(ȳ) = Ȳ in the case of
SRSWOR, so

    E(p) = E(ȳ) = Ȳ = P

and p is an unbiased estimator of P.

Using the expression for Var(ȳ), the variance of p can be derived as

    Var(p) = Var(ȳ) = ((N−n)/(Nn)) S²
           = ((N−n)/(Nn)) · (N/(N−1)) PQ
           = ((N−n)/(N−1)) · PQ/n.

Similarly, using the estimate of Var(ȳ), the estimate of the variance of p can be derived as

    V̂(p) = V̂(ȳ) = ((N−n)/(Nn)) s²
          = ((N−n)/(Nn)) · (n/(n−1)) pq
          = ((N−n)/(N(n−1))) pq.
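These SRSWOR expressions are straightforward to compute; the survey counts in the Python sketch below are hypothetical:

```python
# Sketch: sample proportion and its estimated variance under SRSWOR.
N, n, a = 5000, 200, 64       # population size, sample size, 'yes' count
p = a / n
q = 1 - p
var_hat_p = (N - n) / (N * (n - 1)) * p * q   # (N-n) pq / (N(n-1))
```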
2. SRSWR
Since the sample mean ȳ is an unbiased estimator of the population mean Ȳ in the case of SRSWR,
so for the sample proportion

    E(p) = E(ȳ) = Ȳ = P,

i.e., p is an unbiased estimator of P.
Using the expression for the variance of ȳ and its estimate in the case of SRSWR, the variance of p and
its estimate can be derived as follows:

    Var(p) = Var(ȳ) = ((N−1)/(Nn)) S²
           = ((N−1)/(Nn)) · (N/(N−1)) PQ
           = PQ/n

    V̂(p) = (n/(n−1)) · pq/n
          = pq/(n−1).

Estimation of population total or total number of count

It is easy to see that an estimate of the population total A (or total number of count) is

    Â = Np = Na/n,

its variance is

    Var(Â) = N² Var(p)

and the estimate of the variance is

    V̂(Â) = N² V̂(p).

Confidence interval estimation of P

If N and n are large, then (p − P)/√Var(p) approximately follows N(0,1). With this approximation, we
can write

    P[−Z_{α/2} ≤ (p − P)/√Var(p) ≤ Z_{α/2}] = 1 − α

and the 100(1 − α)% confidence interval of P is

    [p − Z_{α/2} √Var(p), p + Z_{α/2} √Var(p)].

It may be noted that in this case a discrete random variable is being approximated by a
continuous random variable, so a continuity correction 1/(2n) can be introduced in the confidence
limits, and the limits become

    [p − Z_{α/2} √Var(p) − 1/(2n), p + Z_{α/2} √Var(p) + 1/(2n)].
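A Python sketch of the approximate 95% limits with the 1/(2n) continuity correction; the counts are hypothetical:

```python
# Sketch: approximate 95% confidence limits for P (SRSWOR variance
# estimate, continuity correction 1/(2n)). Counts are hypothetical.
from math import sqrt

N, n, a = 5000, 200, 64
p = a / n
var_hat_p = (N - n) / (N * (n - 1)) * p * (1 - p)
z = 1.96
half = z * sqrt(var_hat_p) + 1 / (2 * n)   # half-width, widened by 1/(2n)
lower, upper = p - half, p + half
```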

Use of hypergeometric distribution:

When SRS is applied for the sampling of a qualitative characteristic, the methodology is to
draw the units one by one, and so the probability of selection of every unit remains the same at
every step. If n sampling units are selected together from N units, then the probability of
selection of units does not remain the same as in the case of SRS.

Consider a situation in which the sampling units in a population are divided into two mutually
exclusive classes. Let P and Q be the proportions of sampling units in the population belonging
to classes '1' and '2' respectively. Then NP and NQ are the total numbers of sampling units in
the population belonging to classes '1' and '2', respectively, and so NP + NQ = N. The
probability that in a sample of n units selected out of N units by SRS, n₁ selected units belong to
class '1' and n₂ selected units belong to class '2', is governed by the hypergeometric
distribution:

    P(n₁) = C(NP, n₁) C(NQ, n₂) / C(N, n).

As N grows large, the hypergeometric distribution tends to the binomial distribution, and P(n₁) is
approximated by

    P(n₁) = C(n, n₁) P^{n₁} (1 − P)^{n₂}.

Inverse sampling
In general, it is understood in the SRS methodology for a qualitative characteristic that the
attribute under study is not a rare attribute. If the attribute is rare, then the procedure of
estimating the population proportion P by the sample proportion a/n is not suitable. Some such
situations are, e.g., estimation of the frequency of a rare type of gene, the proportion of some rare type
of cancer cells in a biopsy, the proportion of a rare type of blood cells affecting the red blood cells,
etc. In such cases, the methodology of inverse sampling can be used.

In the methodology of inverse sampling, the sampling is continued until a predetermined
number of units possessing the attribute under study occur in the sampling, which is useful for
estimating the population proportion. The sampling units are drawn one by one with equal
probability and without replacement. The sampling is discontinued as soon as the number of
units in the sample possessing the characteristic or attribute equals a predetermined number.

Let m denote the predetermined number indicating the number of units possessing the
characteristic. The sampling is continued till m such units are obtained. Therefore, the
sample size n required to attain m becomes a random variable.

Probability distribution function of n

In order to find the probability distribution function of n, consider the stage of drawing of
samples t such that at t = n the sample completes the m units with the attribute. Thus the first
(n − 1) draws would contain (m − 1) units possessing the characteristic out of NP
units; equivalently, there are (n − m) units which do not possess the characteristic out of NQ
such units in the population. Note that the last draw must ensure that the unit selected possesses
the characteristic.

So the probability distribution function of n can be expressed as

    P(n) = P[in a sample of (n−1) units drawn from N, (m−1) units possess the attribute]
           × P[the unit drawn at the nth draw possesses the attribute]
         = [C(NP, m−1) C(NQ, n−m) / C(N, n−1)] · (NP − m + 1)/(N − n + 1),  n = m, m+1, ..., m + NQ.

Note that the first term (in square brackets) is derived using the hypergeometric distribution as the
probability of drawing a sample of size (n − 1) in which (m − 1) units are from NP units and
(n − m) units are from NQ units. The second term, (NP − m + 1)/(N − n + 1), is the probability associated
with the last draw, where it is assumed that we get a unit possessing the characteristic.

Note that Σ_{n=m}^{m+NQ} P(n) = 1.

Estimate of population proportion
Consider the expectation of (m−1)/(n−1):

    E[(m−1)/(n−1)] = Σ_{n=m}^{m+NQ} ((m−1)/(n−1)) P(n)
                   = Σ_{n=m}^{m+NQ} ((m−1)/(n−1)) · [C(NP, m−1) C(NQ, n−m)/C(N, n−1)] · (NP−m+1)/(N−n+1)
                   = P Σ_n [C(NP−1, m−2) C(NQ, n−m)/C(N−1, n−2)] · (NP−m+1)/(N−n+1),

which is obtained by replacing NP by NP − 1, m by (m − 1) and n by (n − 1) in the earlier
step. The remaining sum is the total probability of the sample size in inverse sampling from a
population of size (N − 1) containing (NP − 1) units with the attribute, so it equals one. Thus

    E[(m−1)/(n−1)] = P.

So P̂ = (m−1)/(n−1) is an unbiased estimator of P.

Estimate of variance of P̂
Now we derive an estimate of the variance of P̂. By definition,

    Var(P̂) = E(P̂²) − [E(P̂)]² = E(P̂²) − P².

Thus

    V̂(P̂) = P̂² − estimate of P².

In order to obtain an estimate of P², consider the expectation of (m−1)(m−2)/((n−1)(n−2)), i.e.,

    E[(m−1)(m−2)/((n−1)(n−2))] = Σ_{n≥m} [(m−1)(m−2)/((n−1)(n−2))] P(n)
        = (P(NP−1)/(N−1)) Σ_{n≥m} [(NP−m+1)/(N−n+1)] · C(NP−2, m−3) C(NQ, n−m)/C(N−2, n−3),

where the last term inside the sum is obtained by replacing NP by (NP − 2), N by
(N − 2) and m by (m − 2) in the probability distribution function of the hypergeometric distribution.
The sum equals one, so this solves further to

    E[(m−1)(m−2)/((n−1)(n−2))] = NP²/(N−1) − P/(N−1).

Thus an unbiased estimate of P² is

    estimate of P² = ((N−1)/N) · (m−1)(m−2)/((n−1)(n−2)) + P̂/N
                   = ((N−1)/N) · (m−1)(m−2)/((n−1)(n−2)) + (1/N) · (m−1)/(n−1).

Finally, an estimate of the variance of P̂ is

    V̂(P̂) = P̂² − estimate of P²
          = ((m−1)/(n−1))² − [((N−1)/N) · (m−1)(m−2)/((n−1)(n−2)) + (1/N) · (m−1)/(n−1)]
          = ((m−1)/(n−1)) [(m−1)/(n−1) − ((N−1)/N) · (m−2)/(n−2) − 1/N].

For large N, the hypergeometric distribution tends to the negative binomial distribution with
probability mass function C(n−1, m−1) P^m Q^{n−m}. So

    P̂ = (m−1)/(n−1)

and

    V̂(P̂) = (m−1)(n−m)/((n−1)²(n−2)) = P̂(1−P̂)/(n−2).
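The unbiasedness of (m−1)/(n−1) can be illustrated by simulation. In the Python sketch below the population setup (N, A, m) and the replication count are arbitrary choices; each replication draws units without replacement until the m-th unit with the attribute appears:

```python
# Monte Carlo sketch of inverse sampling: draw units WOR until m units
# with the attribute appear, then estimate P by (m-1)/(n-1).
import random

random.seed(1)
N, A, m = 1000, 50, 5                 # A units possess the rare attribute
P = A / N
units = [1] * A + [0] * (N - A)

estimates = []
for _ in range(10_000):
    random.shuffle(units)             # a shuffled list gives WOR draws
    hits = n = 0
    while hits < m:                   # stop at the m-th unit with attribute
        hits += units[n]
        n += 1
    estimates.append((m - 1) / (n - 1))

avg = sum(estimates) / len(estimates)
assert abs(avg - P) < 0.005           # (m-1)/(n-1) is unbiased for P
```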

Estimation of proportion for more than two classes
We have assumed up to now that there are only two classes into which the population can be
divided based on a qualitative characteristic. There can be situations when the population is to
be divided into more than two classes. For example, the taste of a coffee can be divided into
four categories: very strong, strong, mild and very mild. Similarly, in another example, the
damage to a crop due to a storm can be classified into categories like heavily damaged, damaged,
minor damage and no damage.

These types of situations can be represented by dividing the population of size N into, say k,
mutually exclusive classes C₁, C₂, ..., C_k. Corresponding to these classes, let

    P₁ = C₁/N, P₂ = C₂/N, ..., P_k = C_k/N

be the proportions of units in the classes C₁, C₂, ..., C_k respectively.

Let a sample of size n be observed such that c₁, c₂, ..., c_k numbers of units have been drawn from
C₁, C₂, ..., C_k respectively. Then the probability of observing c₁, c₂, ..., c_k is

    P(c₁, c₂, ..., c_k) = C(C₁, c₁) C(C₂, c₂) ... C(C_k, c_k) / C(N, n).

The population proportions P_i can be estimated by p_i = c_i/n, i = 1, 2, ..., k.
It can be easily shown that

    E(p_i) = P_i,  i = 1, 2, ..., k,
    Var(p_i) = ((N−n)/(N−1)) · P_i Q_i/n

and

    V̂(p_i) = ((N−n)/N) · p_i q_i/(n−1).

For estimating the number of units in the ith class,

    Ĉ_i = N p_i,
    Var(Ĉ_i) = N² Var(p_i)

and

    V̂(Ĉ_i) = N² V̂(p_i).

The confidence intervals can be obtained based on a single p_i as in the case of two classes.

If N is large, then the probability of observing c₁, c₂, ..., c_k can be approximated by the multinomial
distribution given by

    P(c₁, c₂, ..., c_k) = (n!/(c₁! c₂! ... c_k!)) P₁^{c₁} P₂^{c₂} ... P_k^{c_k}.

For this distribution,

    E(p_i) = P_i,  i = 1, 2, ..., k,
    Var(p_i) = P_i(1 − P_i)/n

and

    V̂(p_i) = p_i(1 − p_i)/n.

Chapter 4
Stratified Sampling
An important objective in any estimation problem is to obtain an estimator of a population parameter
which can take care of all the salient features of the population. If the population is homogeneous with
respect to the characteristic under study, then the method of simple random sampling will yield a
homogeneous sample, and the sample mean will serve as a good estimator of the population mean. Thus, if the
population is homogeneous with respect to the characteristic under study, then the sample drawn
through simple random sampling is expected to provide a representative sample. Moreover, the
variance of the sample mean depends not only on the sample size and sampling fraction but also on the
population variance. One way to increase the precision of an estimator is therefore to use a sampling scheme
which reduces the heterogeneity in the population. If the population is heterogeneous with respect to
the characteristic under study, then one such sampling procedure is stratified sampling.

The basic idea behind stratified sampling is to

• divide the whole heterogeneous population into smaller groups or subpopulations, such that the
sampling units are homogeneous with respect to the characteristic under study within the
subpopulation and
• heterogeneous with respect to the characteristic under study between/among the subpopulations.
Such subpopulations are termed strata.
• treat each subpopulation as a separate population and draw a sample by SRS from each stratum.
[Note: Stratum is singular and strata is plural.]

Example: In order to find the average height of students in a school with classes 1 to 12, note that the height
varies a lot, as the students in class 1 are of age around 6 years and the students in class 10 are of age
around 16 years. So one can divide all the students into different subpopulations or strata such as

Students of classes 1, 2 and 3: Stratum 1
Students of classes 4, 5 and 6: Stratum 2
Students of classes 7, 8 and 9: Stratum 3
Students of classes 10, 11 and 12: Stratum 4

Notations:
We use the following symbols and notations:
N : population size
k : number of strata
N_i : number of sampling units in the ith stratum, with N = Σ_{i=1}^k N_i
n_i : number of sampling units to be drawn from the ith stratum
n = Σ_{i=1}^k n_i : total sample size


Procedure of stratified sampling

Divide the population of N units into k strata. Let the ith stratum have N_i, i = 1, 2, ..., k, number of units.

• Strata are constructed such that they are nonoverlapping and homogeneous with respect to the
characteristic under study, with Σ_{i=1}^k N_i = N.
• Draw a sample of size n_i from the ith (i = 1, 2, ..., k) stratum using SRS (preferably WOR)
independently from each stratum.
• All the sampling units drawn from each stratum together constitute a stratified sample of size
n = Σ_{i=1}^k n_i.

Difference between stratified and cluster sampling schemes

In stratified sampling, the strata are constructed such that they are
• within homogeneous and
• among heterogeneous.

In cluster sampling, the clusters are constructed such that they are
• within heterogeneous and
• among homogeneous.
[Note: We consider cluster sampling later.]
Issue in estimation in stratified sampling
Note that there are k independent samples drawn through SRS of sizes n₁, n₂, ..., n_k. So one can have k
estimators of a parameter based on the samples of sizes n₁, n₂, ..., n_k. The ultimate goal is not to have k different
estimators of the parameter but a single estimator. In this case, the issue is how to combine the
different sample information into one estimator which is good enough to provide
information about the parameter.

Estimation of population mean and its variance

Let
Y : characteristic under study
y_ij : value of the jth unit in the ith stratum, j = 1, 2, ..., n_i, i = 1, 2, ..., k

    Ȳ_i = (1/N_i) Σ_{j=1}^{N_i} y_ij : population mean of the ith stratum

    ȳ_i = (1/n_i) Σ_{j=1}^{n_i} y_ij : sample mean from the ith stratum, or stratum sample mean

    Ȳ = (1/N) Σ_{i=1}^k N_i Ȳ_i = Σ_{i=1}^k w_i Ȳ_i : population mean, where w_i = N_i/N.

Note that the population mean is defined as the weighted arithmetic mean of the stratum means in the case of
stratified sampling, where the weights are provided in terms of strata sizes.

Based on the expression Ȳ = (1/N) Σ_{i=1}^k N_i Ȳ_i, one may choose the sample mean

    ȳ = (1/n) Σ_{i=1}^k n_i ȳ_i

as a possible estimator of Ȳ.

Since the sample in each stratum is drawn by SRS,

    E(ȳ_i) = Ȳ_i,

thus

    E(ȳ) = (1/n) Σ_{i=1}^k n_i E(ȳ_i)
         = (1/n) Σ_{i=1}^k n_i Ȳ_i
         ≠ Ȳ in general (equality holds under proportional allocation, n_i/n = N_i/N),

and ȳ turns out to be a biased estimator of Ȳ. Based on this, one can modify ȳ so as to obtain an
unbiased estimator of Ȳ. Consider the stratified sample mean, defined as the weighted arithmetic mean of the
strata sample means with strata sizes as weights:

    ȳ_st = (1/N) Σ_{i=1}^k N_i ȳ_i.

Now

    E(ȳ_st) = (1/N) Σ_{i=1}^k N_i E(ȳ_i)
            = (1/N) Σ_{i=1}^k N_i Ȳ_i
            = Ȳ.

Thus ȳ_st is an unbiased estimator of Ȳ.

Variance of ȳ_st

    Var(ȳ_st) = Σ_{i=1}^k w_i² Var(ȳ_i) + Σ_{i≠j} w_i w_j Cov(ȳ_i, ȳ_j).

Since all the samples have been drawn independently from the strata by SRSWOR,

    Cov(ȳ_i, ȳ_j) = 0, i ≠ j,
    Var(ȳ_i) = ((N_i − n_i)/(N_i n_i)) S_i²,

where

    S_i² = (1/(N_i − 1)) Σ_{j=1}^{N_i} (y_ij − Ȳ_i)².

Thus

    Var(ȳ_st) = Σ_{i=1}^k w_i² ((N_i − n_i)/(N_i n_i)) S_i²
              = Σ_{i=1}^k w_i² (1 − n_i/N_i) S_i²/n_i.

Observe that Var(ȳ_st) is small when S_i² is small. This observation suggests how to construct the
strata. If S_i² is small for all i = 1, 2, ..., k, then Var(ȳ_st) will also be small. That is why it was mentioned
earlier that the strata are to be constructed such that they are within homogeneous, i.e., S_i² is small, and
among heterogeneous.

For example, the units in geographical proximity will tend to be more alike. The consumption pattern
of households will be similar within a lower-income-group housing society and within a higher-income-group
housing society, whereas it will differ a lot between the two housing societies based on
income.

Estimate of variance
Since the samples have been drawn by SRSWOR,

    E(s_i²) = S_i²,

where

    s_i² = (1/(n_i − 1)) Σ_{j=1}^{n_i} (y_ij − ȳ_i)²,

and

    V̂(ȳ_i) = ((N_i − n_i)/(N_i n_i)) s_i²,

so

    V̂(ȳ_st) = Σ_{i=1}^k w_i² V̂(ȳ_i) = Σ_{i=1}^k w_i² ((N_i − n_i)/(N_i n_i)) s_i².

Note: If SRSWR is used instead of SRSWOR for drawing the samples from the strata, then appropriate
changes can be made at the required steps.

In this case,

    ȳ_st = Σ_{i=1}^k w_i ȳ_i,
    E(ȳ_st) = Ȳ,
    Var(ȳ_st) = Σ_{i=1}^k w_i² ((N_i − 1)/(N_i n_i)) S_i² = Σ_{i=1}^k w_i² σ_i²/n_i,
    V̂(ȳ_st) = Σ_{i=1}^k w_i² s_i²/n_i,

where σ_i² = (1/N_i) Σ_{j=1}^{N_i} (y_ij − Ȳ_i)² is the ith stratum population variance, and s_i² is as defined
above (recall that E(s_i²) = σ_i² under SRSWR).

Advantages of stratified sampling

1. Data of known precision may be required for certain parts of the population.
This can be accomplished with a more careful investigation of a few strata.
Example: In order to know the direct impact of a hike in petrol prices, the population can be
divided into strata like lower income group, middle income group and higher income group.
Obviously, the higher income group is more affected than the lower income group. So a more
careful investigation can be made only in the higher income group strata.
2. Sampling problems may differ in different parts of the population.
Example: To study the consumption pattern of households, the people living in houses, hotels,
hospitals, prisons, etc. are to be treated differently.
3. Administrative convenience can be exercised in stratified sampling.
Example: In taking a sample of villages from a big state, it is more administratively convenient
to consider the districts as strata so that the administrative setup at the district level may be used for
this purpose.
4. A full cross-section of the population can be obtained through stratified sampling. It may be possible
in SRS that some large part of the population remains unrepresented. Stratified sampling
enables one to draw a sample representing different segments of the population to any desired
extent. The desired degree of representation of some specified parts of the population is also
possible.
5. Substantial gain in efficiency is achieved if the strata are formed intelligently.
6. In the case of a skewed population, the use of stratification is of importance since a larger weight may
have to be given to the few extremely large units for reducing the sampling variability.
7. If the population is large, then it is convenient to sample separately from the strata rather than from
the entire population.
8. The population mean or population total can be estimated with higher precision by suitably
assigning the weights to the estimates obtained from each stratum.

Allocation problem and choice of sample sizes in different strata


Question: How to choose the sample sizes n1 , n2 ,..., nk so that the available resources are used in an
effective way?
There are two aspects of choosing the sample sizes:
(i) Minimize the cost of survey for a specified precision.
(ii) Maximize the precision for a given cost.

Note: The sample size can not be determined by minimizing both the cost and variability
simultaneously. The cost function is directly proportional to the sample size whereas variability is
inversely proportional to the sample size.

1. Equal allocation
Choose the sample size $n_i$ to be the same for all strata, i.e., draw a sample of equal size from each stratum. If $n$ is the total sample size and $k$ the number of strata, then
$$n_i = \frac{n}{k} \quad\text{for all } i=1,2,\ldots,k .$$

2. Proportional allocation
For fixed $n$, select $n_i$ proportional to the stratum size $N_i$, i.e.,
$$n_i \propto N_i \quad\text{or}\quad n_i = CN_i,$$
where $C$ is the constant of proportionality. Then
$$\sum_{i=1}^{k} n_i = C\sum_{i=1}^{k} N_i \ \Rightarrow\ n = CN \ \Rightarrow\ C = \frac{n}{N}.$$
Thus $n_i = \left(\frac{n}{N}\right)N_i$. Such an allocation arises from considerations like operational convenience.

3. Neyman or optimum allocation

This allocation takes both the stratum sizes and the variability into account:
$$n_i \propto N_iS_i \quad\text{or}\quad n_i = C^{*}N_iS_i,$$
where $C^{*}$ is the constant of proportionality. Then
$$\sum_{i=1}^{k} n_i = C^{*}\sum_{i=1}^{k} N_iS_i \ \Rightarrow\ n = C^{*}\sum_{i=1}^{k} N_iS_i \ \Rightarrow\ C^{*} = \frac{n}{\sum_{i=1}^{k} N_iS_i}.$$
Thus
$$n_i = \frac{nN_iS_i}{\sum_{i=1}^{k} N_iS_i}.$$
This allocation arises when $Var(\bar{y}_{st})$ is minimized subject to the constraint $\sum_{i=1}^{k} n_i = n$ (prespecified). The knowledge of $S_i\ (i=1,2,\ldots,k)$ is needed to obtain the $n_i$.
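The three allocation rules can be compared side by side in a short Python sketch. The stratum sizes and standard deviations below are invented for illustration, and the rounding of allocations to integers is glossed over:

```python
# Hypothetical strata: sizes N_i and within-stratum standard deviations S_i.
N = [400, 300, 300]
S = [2.0, 4.0, 6.0]
n = 60  # total sample size

k = len(N)
equal = [n / k for _ in N]                              # n_i = n/k
prop = [n * Ni / sum(N) for Ni in N]                    # n_i proportional to N_i
neyman_wts = [Ni * Si for Ni, Si in zip(N, S)]
neyman = [n * w / sum(neyman_wts) for w in neyman_wts]  # n_i proportional to N_i S_i
```

Note how the Neyman allocation shifts sample size toward the third stratum, which is small in size but has the largest $S_i$.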

Choice of sample size based on cost of survey and variability

The cost of a survey depends upon its nature. A simple choice of cost function is
$$C = C_0 + \sum_{i=1}^{k} C_in_i,$$
where
$C$: total cost,
$C_0$: overhead cost, e.g., setting up the office, training people etc.,
$C_i$: cost per unit in the $i$th stratum,
$\sum_{i=1}^{k} C_in_i$: total cost within the sample.

To find the $n_i$ under this cost function, consider the Lagrangian function with Lagrangian multiplier $\lambda$:
$$\phi = Var(\bar{y}_{st}) + \lambda^2(C-C_0)$$
$$= \sum_{i=1}^{k} w_i^2\left(\frac{1}{n_i}-\frac{1}{N_i}\right)S_i^2 + \lambda^2\sum_{i=1}^{k} C_in_i$$
$$= \sum_{i=1}^{k}\frac{w_i^2S_i^2}{n_i} + \lambda^2\sum_{i=1}^{k} C_in_i - \sum_{i=1}^{k}\frac{w_i^2S_i^2}{N_i}$$
$$= \sum_{i=1}^{k}\left(\frac{w_iS_i}{\sqrt{n_i}} - \lambda\sqrt{C_in_i}\right)^2 + \text{terms independent of } n_i .$$
Thus $\phi$ is minimum when
$$\frac{w_iS_i}{\sqrt{n_i}} = \lambda\sqrt{C_in_i} \quad\text{for all } i,$$
i.e.,
$$n_i = \frac{1}{\lambda}\cdot\frac{w_iS_i}{\sqrt{C_i}}. \qquad (*)$$

How to determine $\lambda$?
There are two ways to determine $\lambda$:
(i) minimize the variability for fixed cost, and
(ii) minimize the cost for given variability.
We consider both cases.

(i) Minimize variability for fixed cost
Let the cost available for sampling, $C - C_0 = C_0^{*}$, be fixed, so that
$$\sum_{i=1}^{k} C_in_i = C_0^{*}.$$
Substituting $n_i$ from $(*)$,
$$\frac{1}{\lambda}\sum_{i=1}^{k}\sqrt{C_i}\,w_iS_i = C_0^{*} \ \Rightarrow\ \lambda = \frac{\sum_{i=1}^{k}\sqrt{C_i}\,w_iS_i}{C_0^{*}}.$$
Substituting $\lambda$ in the expression $(*)$ for $n_i$, the optimum $n_i$ is obtained as
$$n_i^{*} = \frac{w_iS_i}{\sqrt{C_i}}\left(\frac{C_0^{*}}{\sum_{i=1}^{k}\sqrt{C_i}\,w_iS_i}\right).$$
The required sample size to estimate $\bar{Y}$ such that the variance is minimum for the given cost $C - C_0 = C_0^{*}$ is
$$n = \sum_{i=1}^{k} n_i^{*}.$$
(ii) Minimize cost for given variability

Let $V_0$ be the prespecified variance. Now determine the $n_i$ such that
$$\sum_{i=1}^{k}\left(\frac{1}{n_i}-\frac{1}{N_i}\right)w_i^2S_i^2 = V_0,$$
i.e.,
$$\sum_{i=1}^{k}\frac{w_i^2S_i^2}{n_i} = V_0 + \sum_{i=1}^{k}\frac{w_i^2S_i^2}{N_i}.$$
Substituting $n_i = \dfrac{1}{\lambda}\dfrac{w_iS_i}{\sqrt{C_i}}$ from equation $(*)$,
$$\lambda\sum_{i=1}^{k}\sqrt{C_i}\,w_iS_i = V_0 + \sum_{i=1}^{k}\frac{w_i^2S_i^2}{N_i} \ \Rightarrow\ \lambda = \frac{V_0 + \sum_{i=1}^{k}\dfrac{w_i^2S_i^2}{N_i}}{\sum_{i=1}^{k} w_iS_i\sqrt{C_i}}.$$
Thus the optimum $n_i$ is
$$n_i = \frac{w_iS_i}{\sqrt{C_i}}\cdot\frac{\sum_{j=1}^{k} w_jS_j\sqrt{C_j}}{V_0 + \sum_{j=1}^{k}\dfrac{w_j^2S_j^2}{N_j}}.$$
So the required sample size to estimate $\bar{Y}$ such that the cost $C$ is minimum for a prespecified variance $V_0$ is $n = \sum_{i=1}^{k} n_i$.

Sample size under proportional allocation

(i) If the cost $C - C_0 = C_0^{*}$ is fixed, then $C_0^{*} = \sum_{i=1}^{k} C_in_i$. Under proportional allocation, $n_i = \frac{n}{N}N_i = nw_i$, so
$$C_0^{*} = n\sum_{i=1}^{k} w_iC_i \ \Rightarrow\ n = \frac{C_0^{*}}{\sum_{i=1}^{k} w_iC_i}.$$
Thus
$$n_i = \frac{C_0^{*}\,w_i}{\sum_{i=1}^{k} w_iC_i}.$$

(ii) If the variance $V_0$ is fixed, then
$$\sum_{i=1}^{k}\left(\frac{1}{n_i}-\frac{1}{N_i}\right)w_i^2S_i^2 = V_0$$
$$\Rightarrow\ \sum_{i=1}^{k}\frac{w_i^2S_i^2}{n_i} = V_0 + \sum_{i=1}^{k}\frac{w_i^2S_i^2}{N_i}.$$
Using $n_i = nw_i$,
$$\sum_{i=1}^{k}\frac{w_iS_i^2}{n} = V_0 + \sum_{i=1}^{k}\frac{w_i^2S_i^2}{N_i}
\ \Rightarrow\ n = \frac{\sum_{i=1}^{k} w_iS_i^2}{V_0 + \sum_{i=1}^{k}\dfrac{w_i^2S_i^2}{N_i}},$$
so
$$n_i = w_i\,\frac{\sum_{i=1}^{k} w_iS_i^2}{V_0 + \sum_{i=1}^{k}\dfrac{w_i^2S_i^2}{N_i}}.$$
This is known as Bowley's allocation.

Variances under different allocations

Now we derive the variance of $\bar{y}_{st}$ under proportional and optimum allocations.

(i) Proportional allocation

Under proportional allocation, $n_i = \frac{n}{N}N_i$ and
$$Var(\bar{y}_{st}) = \sum_{i=1}^{k}\left(\frac{N_i-n_i}{N_in_i}\right)w_i^2S_i^2,$$
so
$$Var_{prop}(\bar{y}_{st}) = \sum_{i=1}^{k}\frac{N_i-\frac{n}{N}N_i}{N_i\cdot\frac{n}{N}N_i}\left(\frac{N_i}{N}\right)^2S_i^2 = \frac{N-n}{Nn}\sum_{i=1}^{k}\frac{N_iS_i^2}{N} = \frac{N-n}{Nn}\sum_{i=1}^{k} w_iS_i^2 .$$
(ii) Optimum allocation

With
$$n_i = \frac{nN_iS_i}{\sum_{i=1}^{k} N_iS_i},$$
$$Var_{opt}(\bar{y}_{st}) = \sum_{i=1}^{k}\left(\frac{1}{n_i}-\frac{1}{N_i}\right)w_i^2S_i^2
= \sum_{i=1}^{k}\frac{w_i^2S_i^2}{n_i} - \sum_{i=1}^{k}\frac{w_i^2S_i^2}{N_i}$$
$$= \sum_{i=1}^{k}\left[w_i^2S_i^2\cdot\frac{\sum_{j=1}^{k} N_jS_j}{nN_iS_i}\right] - \sum_{i=1}^{k}\frac{w_i^2S_i^2}{N_i}
= \frac{1}{n}\sum_{i=1}^{k}\left[\frac{N_iS_i}{N^2}\sum_{j=1}^{k} N_jS_j\right] - \sum_{i=1}^{k}\frac{w_i^2S_i^2}{N_i}$$
$$= \frac{1}{n}\left(\sum_{i=1}^{k}\frac{N_iS_i}{N}\right)^2 - \sum_{i=1}^{k}\frac{w_i^2S_i^2}{N_i}
= \frac{1}{n}\left(\sum_{i=1}^{k} w_iS_i\right)^2 - \frac{1}{N}\sum_{i=1}^{k} w_iS_i^2 .$$

Comparison of the variance of the sample mean under SRS with the stratified mean under proportional and optimum allocation:

(a) Proportional allocation:
$$Var_{SRS}(\bar{y}) = \frac{N-n}{Nn}S^2, \qquad Var_{prop}(\bar{y}_{st}) = \frac{N-n}{Nn}\sum_{i=1}^{k}\frac{N_iS_i^2}{N}.$$
In order to compare $Var_{SRS}(\bar{y})$ and $Var_{prop}(\bar{y}_{st})$, first we express $S^2$ as a function of the $S_i^2$. Consider
$$(N-1)S^2 = \sum_{i=1}^{k}\sum_{j=1}^{N_i}(Y_{ij}-\bar{Y})^2 = \sum_{i=1}^{k}\sum_{j=1}^{N_i}\left[(Y_{ij}-\bar{Y}_i)+(\bar{Y}_i-\bar{Y})\right]^2$$
$$= \sum_{i=1}^{k}\sum_{j=1}^{N_i}(Y_{ij}-\bar{Y}_i)^2 + \sum_{i=1}^{k} N_i(\bar{Y}_i-\bar{Y})^2 = \sum_{i=1}^{k}(N_i-1)S_i^2 + \sum_{i=1}^{k} N_i(\bar{Y}_i-\bar{Y})^2,$$
so
$$\frac{N-1}{N}S^2 = \sum_{i=1}^{k}\frac{N_i-1}{N}S_i^2 + \sum_{i=1}^{k}\frac{N_i}{N}(\bar{Y}_i-\bar{Y})^2 .$$
For simplification, we assume that $N_i$ is large enough to permit the approximations
$$\frac{N_i-1}{N_i}\approx 1 \quad\text{and}\quad \frac{N-1}{N}\approx 1 .$$
Thus
$$S^2 = \sum_{i=1}^{k}\frac{N_i}{N}S_i^2 + \sum_{i=1}^{k}\frac{N_i}{N}(\bar{Y}_i-\bar{Y})^2,$$
and multiplying through by $\frac{N-n}{Nn}$,
$$Var_{SRS}(\bar{y}) = Var_{prop}(\bar{y}_{st}) + \frac{N-n}{Nn}\sum_{i=1}^{k} w_i(\bar{Y}_i-\bar{Y})^2 .$$
Since
$$\sum_{i=1}^{k} w_i(\bar{Y}_i-\bar{Y})^2 \ge 0,$$
it follows that
$$Var_{prop}(\bar{y}_{st}) \le Var_{SRS}(\bar{y}).$$
A larger gain is achieved when the $\bar{Y}_i$ differ more from $\bar{Y}$.

(b) Optimum allocation:
$$Var_{opt}(\bar{y}_{st}) = \frac{1}{n}\left(\sum_{i=1}^{k} w_iS_i\right)^2 - \frac{1}{N}\sum_{i=1}^{k} w_iS_i^2 .$$
Consider
$$Var_{prop}(\bar{y}_{st}) - Var_{opt}(\bar{y}_{st}) = \left(\frac{N-n}{Nn}\right)\sum_{i=1}^{k} w_iS_i^2 - \left[\frac{1}{n}\left(\sum_{i=1}^{k} w_iS_i\right)^2 - \frac{1}{N}\sum_{i=1}^{k} w_iS_i^2\right]$$
$$= \frac{1}{n}\left[\sum_{i=1}^{k} w_iS_i^2 - \left(\sum_{i=1}^{k} w_iS_i\right)^2\right] = \frac{1}{n}\sum_{i=1}^{k} w_i(S_i-\bar{S})^2,$$
where
$$\bar{S} = \sum_{i=1}^{k} w_iS_i .$$
Hence
$$Var_{prop}(\bar{y}_{st}) - Var_{opt}(\bar{y}_{st}) \ge 0 .$$
A larger gain is achieved when the $S_i$ differ more from $\bar{S}$.

Combining (a) and (b), we have
$$Var_{opt}(\bar{y}_{st}) \le Var_{prop}(\bar{y}_{st}) \le Var_{SRS}(\bar{y}).$$
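This ordering can be checked numerically on a small artificial population. The Python sketch below (all values invented) uses the large-$N_i$ decomposition of $S^2$ derived above:

```python
# Hypothetical strata with weights w_i, std devs S_i and means Ybar_i; overall S^2 via
# S^2 ≈ sum w_i S_i^2 + sum w_i (Ybar_i - Ybar)^2  (large-N_i approximation).
w = [0.5, 0.3, 0.2]
S = [1.0, 3.0, 5.0]
Ybar_i = [10.0, 20.0, 40.0]
N, n = 10000, 100

Ybar = sum(wi * yi for wi, yi in zip(w, Ybar_i))
S2 = (sum(wi * Si**2 for wi, Si in zip(w, S))
      + sum(wi * (yi - Ybar)**2 for wi, yi in zip(w, Ybar_i)))

f = (N - n) / (N * n)
var_srs = f * S2
var_prop = f * sum(wi * Si**2 for wi, Si in zip(w, S))
Sbar = sum(wi * Si for wi, Si in zip(w, S))
var_opt = Sbar**2 / n - sum(wi * Si**2 for wi, Si in zip(w, S)) / N

assert var_opt <= var_prop <= var_srs  # the ordering derived above
```

With these numbers the SRS variance is dominated by the between-stratum term, so stratification pays off heavily.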

Estimate of variance and confidence intervals

Under SRSWOR, an unbiased estimate of $S_i^2$ for the $i$th stratum $(i=1,2,\ldots,k)$ is
$$s_i^2 = \frac{1}{n_i-1}\sum_{j=1}^{n_i}(y_{ij}-\bar{y}_i)^2 .$$
In stratified sampling,
$$Var(\bar{y}_{st}) = \sum_{i=1}^{k} w_i^2\,\frac{N_i-n_i}{N_in_i}S_i^2 .$$
So an unbiased estimate of $Var(\bar{y}_{st})$ is
$$\widehat{Var}(\bar{y}_{st}) = \sum_{i=1}^{k} w_i^2\,\frac{N_i-n_i}{N_in_i}s_i^2 = \sum_{i=1}^{k}\frac{w_i^2s_i^2}{n_i} - \sum_{i=1}^{k}\frac{w_i^2s_i^2}{N_i} = \sum_{i=1}^{k}\frac{w_i^2s_i^2}{n_i} - \frac{1}{N}\sum_{i=1}^{k} w_is_i^2 .$$
The second term represents the reduction due to the finite population correction.
The confidence limits of $\bar{Y}$ can be obtained as
$$\bar{y}_{st} \pm t\sqrt{\widehat{Var}(\bar{y}_{st})},$$
assuming $\bar{y}_{st}$ is normally distributed and $\widehat{Var}(\bar{y}_{st})$ is well determined, so that $t$ can be read from the normal distribution tables. If only a few degrees of freedom are provided by each stratum, then the $t$ values are obtained from the table of Student's $t$-distribution.

The distribution of $\widehat{Var}(\bar{y}_{st})$ is generally complex. An approximate method of assigning an effective number of degrees of freedom $n_e$ to $\widehat{Var}(\bar{y}_{st})$ is
$$n_e = \frac{\left(\sum_{i=1}^{k} g_is_i^2\right)^2}{\sum_{i=1}^{k}\dfrac{g_i^2s_i^4}{n_i-1}}, \qquad g_i = \frac{N_i(N_i-n_i)}{n_i},$$
with
$$\min_i(n_i-1) \le n_e \le \sum_{i=1}^{k}(n_i-1),$$
assuming the $y_{ij}$ are normal.
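The effective-degrees-of-freedom formula translates directly into code. A minimal Python sketch, with hypothetical per-stratum sample variances, sample sizes and stratum sizes:

```python
# Hypothetical per-stratum sample variances s_i^2, sample sizes n_i, stratum sizes N_i.
s2 = [1.2, 4.5, 2.8]
n_i = [10, 12, 8]
N_i = [100, 150, 80]

g = [Ni * (Ni - ni) / ni for Ni, ni in zip(N_i, n_i)]
num = sum(gi * s2i for gi, s2i in zip(g, s2)) ** 2
den = sum(gi**2 * s2i**2 / (ni - 1) for gi, s2i, ni in zip(g, s2, n_i))
n_e = num / den  # effective degrees of freedom for Var-hat(y_st)

# Sanity bounds stated in the text: min(n_i - 1) <= n_e <= sum(n_i - 1)
assert min(ni - 1 for ni in n_i) <= n_e <= sum(ni - 1 for ni in n_i)
```

In practice $n_e$ would then be used to pick the $t$ value for the confidence limits above.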

Modification of optimum allocation

Sometimes in optimum allocation the size of a subsample exceeds the stratum size. In such a case, replace that $n_i$ by $N_i$ and recompute the remaining $n_i$'s by the revised allocation. For example, if $n_1 > N_1$, then take the revised $n_i$'s as
$$n_1 = N_1, \qquad n_i = \frac{(n-N_1)\,w_iS_i}{\sum_{i=2}^{k} w_iS_i},\ \ i=2,3,\ldots,k,$$
provided $n_i \le N_i$ for all $i=2,3,\ldots,k$. Suppose in the revised allocation $n_2 > N_2$; then the re-revised allocation would be
$$n_1 = N_1, \quad n_2 = N_2, \qquad n_i = \frac{(n-N_1-N_2)\,w_iS_i}{\sum_{i=3}^{k} w_iS_i},\ \ i=3,4,\ldots,k,$$
provided $n_i \le N_i$ for all $i=3,4,\ldots,k$. We continue this process until every $n_i \le N_i$. In such cases, the formula for the minimum variance of $\bar{y}_{st}$ needs to be modified as
$$\min Var(\bar{y}_{st}) = \frac{\left(\sum^{*} w_iS_i\right)^2}{n^{*}} - \frac{\sum^{*} w_iS_i^2}{N},$$
where $\sum^{*}$ denotes summation over the strata in which $n_i < N_i$ and $n^{*}$ is the revised total sample size in those strata.
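The capping-and-reallocating procedure above can be written as a small loop. A Python sketch with invented stratum sizes and standard deviations (a very variable but tiny first stratum forces one round of capping; termination for pathological inputs such as $n > \sum N_i$ is not handled):

```python
# Iteratively cap Neyman allocations at the stratum sizes and re-allocate the remainder.
def capped_neyman(n, N, S):
    """N[i], S[i]: stratum sizes and std devs; returns allocations with n_i <= N_i."""
    k = len(N)
    fixed = {}  # strata pinned at n_i = N_i
    while True:
        free = [i for i in range(k) if i not in fixed]
        remaining = n - sum(fixed.values())
        total = sum(N[i] * S[i] for i in free)
        alloc = {i: remaining * N[i] * S[i] / total for i in free}
        over = [i for i in free if alloc[i] > N[i]]
        if not over:
            alloc.update(fixed)
            return [alloc[i] for i in range(k)]
        for i in over:
            fixed[i] = N[i]  # pin the offending stratum at its full size

n_i = capped_neyman(100, [20, 500, 480], [50.0, 1.0, 1.0])
```

Here the Neyman rule would assign more than 20 units to the first stratum, so it is pinned at $n_1 = N_1 = 20$ and the remaining 80 units are split over the other two strata.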

Stratified sampling for proportions

If the characteristic under study is qualitative in nature, then its values fall into one of two mutually exclusive complementary classes $C$ and $C'$. Ideally, only two strata would be needed, with all units divided according to whether they belong to $C$ or $C'$, but this is difficult to achieve in practice. So the strata are constructed such that the proportion in $C$ varies as much as possible among strata.
Let
$$P_i = \frac{A_i}{N_i}:\ \text{proportion of units in } C \text{ in the } i\text{th stratum},\qquad
p_i = \frac{a_i}{n_i}:\ \text{proportion of units in } C \text{ in the } i\text{th sample}.$$
An estimate of the population proportion based on stratified sampling is
$$p_{st} = \sum_{i=1}^{k}\frac{N_ip_i}{N},$$
which is based on
$$Y_{ij} = \begin{cases}1 & \text{if the } j\text{th unit of the } i\text{th stratum is in } C\\ 0 & \text{otherwise,}\end{cases}$$
so that $\bar{y}_{st} = p_{st}$. Here
$$S_i^2 = \frac{N_i}{N_i-1}P_iQ_i, \qquad Q_i = 1-P_i .$$
Also, since $Var(\bar{y}_{st}) = \sum_{i=1}^{k}\frac{N_i-n_i}{N_in_i}w_i^2S_i^2$,
$$Var(p_{st}) = \frac{1}{N^2}\sum_{i=1}^{k}\frac{N_i^2(N_i-n_i)}{N_i-1}\cdot\frac{P_iQ_i}{n_i}.$$
If the finite population correction can be ignored, then
$$Var(p_{st}) = \sum_{i=1}^{k} w_i^2\,\frac{P_iQ_i}{n_i}.$$
If proportional allocation is used for the $n_i$, then the variance of $p_{st}$ is
$$Var_{prop}(p_{st}) = \frac{N-n}{N}\cdot\frac{1}{Nn}\sum_{i=1}^{k}\frac{N_i^2P_iQ_i}{N_i-1} \approx \frac{N-n}{Nn}\sum_{i=1}^{k} w_iP_iQ_i,$$
and its estimate is
$$\widehat{Var}_{prop}(p_{st}) = \frac{N-n}{N}\sum_{i=1}^{k}\frac{w_i^2\,p_iq_i}{n_i-1}, \qquad q_i = 1-p_i .$$
The best choice of $n_i$, minimizing the variance for a fixed total sample size, is
$$n_i \propto N_iS_i = N_i\sqrt{\frac{N_i}{N_i-1}P_iQ_i} \approx N_i\sqrt{P_iQ_i},$$
i.e.,
$$n_i = n\,\frac{N_i\sqrt{P_iQ_i}}{\sum_{i=1}^{k} N_i\sqrt{P_iQ_i}}.$$
Similarly, the best choice of $n_i$ such that the variance is minimum for fixed cost $C = C_0 + \sum_{i=1}^{k} C_in_i$ is
$$n_i = n\,\frac{N_i\sqrt{\dfrac{P_iQ_i}{C_i}}}{\sum_{i=1}^{k} N_i\sqrt{\dfrac{P_iQ_i}{C_i}}}.$$
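The proportion-specific formulas can be illustrated in a short Python sketch. The stratum sizes and true proportions below are hypothetical planning values (in practice the $P_i$ would come from a pilot survey or past data):

```python
from math import sqrt

# Hypothetical strata: sizes N_i and planning proportions P_i.
N_i = [4000, 3000, 3000]
P = [0.1, 0.5, 0.9]
n = 200

N = sum(N_i)
w = [Ni / N for Ni in N_i]

# Neyman-type allocation for proportions: n_i proportional to N_i * sqrt(P_i Q_i)
score = [Ni * sqrt(Pi * (1 - Pi)) for Ni, Pi in zip(N_i, P)]
n_alloc = [n * s / sum(score) for s in score]

# Variance of p_st under this allocation, ignoring the fpc.
var_pst = sum(wi**2 * Pi * (1 - Pi) / ni
              for wi, Pi, ni in zip(w, P, n_alloc))
```

The middle stratum, where $P_i = 0.5$ makes $P_iQ_i$ largest, receives the biggest share of the sample.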

Estimation of the gain in precision due to stratification

The question of interest is: what is the advantage of stratifying a population, i.e., of dividing the population into strata instead of using SRS? This is answered by estimating the variances of the estimators of the population mean under SRS (without stratification) and under stratified sampling, and evaluating
$$\frac{\widehat{Var}_{SRS}(\bar{y}) - \widehat{Var}(\bar{y}_{st})}{\widehat{Var}(\bar{y}_{st})}.$$
Since $Var_{SRS}(\bar{y}) = \frac{N-n}{Nn}S^2$, we must ask: how can $S^2$ be estimated based on a stratified sample? As before,
$$(N-1)S^2 = \sum_{i=1}^{k}\sum_{j=1}^{N_i}(Y_{ij}-\bar{Y})^2 = \sum_{i=1}^{k}(N_i-1)S_i^2 + \sum_{i=1}^{k} N_i(\bar{Y}_i-\bar{Y})^2 = \sum_{i=1}^{k}(N_i-1)S_i^2 + N\left(\sum_{i=1}^{k} w_i\bar{Y}_i^2 - \bar{Y}^2\right).$$
In order to estimate $S^2$, we need estimates of $S_i^2$, $\bar{Y}_i^2$ and $\bar{Y}^2$.

For $S_i^2$: since $E(s_i^2) = S_i^2$, take $\hat{S}_i^2 = s_i^2$.

For $\bar{Y}_i^2$: since
$$Var(\bar{y}_i) = E(\bar{y}_i^2) - [E(\bar{y}_i)]^2 = E(\bar{y}_i^2) - \bar{Y}_i^2,$$
an unbiased estimate of $\bar{Y}_i^2$ is
$$\hat{\bar{Y}}_i^2 = \bar{y}_i^2 - \widehat{Var}(\bar{y}_i) = \bar{y}_i^2 - \frac{N_i-n_i}{N_in_i}s_i^2 .$$
For $\bar{Y}^2$: since
$$Var(\bar{y}_{st}) = E(\bar{y}_{st}^2) - [E(\bar{y}_{st})]^2 = E(\bar{y}_{st}^2) - \bar{Y}^2,$$
an estimate of $\bar{Y}^2$ is
$$\hat{\bar{Y}}^2 = \bar{y}_{st}^2 - \widehat{Var}(\bar{y}_{st}) = \bar{y}_{st}^2 - \sum_{i=1}^{k}\frac{N_i-n_i}{N_in_i}w_i^2s_i^2 .$$
Substituting these estimates, the estimate of $S^2$ is obtained from
$$(N-1)\hat{S}^2 = \sum_{i=1}^{k}(N_i-1)\hat{S}_i^2 + N\left(\sum_{i=1}^{k} w_i\hat{\bar{Y}}_i^2 - \hat{\bar{Y}}^2\right)$$
as
$$\hat{S}^2 = \frac{1}{N-1}\sum_{i=1}^{k}(N_i-1)s_i^2 + \frac{N}{N-1}\left[\sum_{i=1}^{k} w_i\left(\bar{y}_i^2 - \frac{N_i-n_i}{N_in_i}s_i^2\right) - \bar{y}_{st}^2 + \sum_{i=1}^{k}\frac{N_i-n_i}{N_in_i}w_i^2s_i^2\right]$$
$$= \frac{1}{N-1}\sum_{i=1}^{k}(N_i-1)s_i^2 + \frac{N}{N-1}\left[\sum_{i=1}^{k} w_i(\bar{y}_i-\bar{y}_{st})^2 - \sum_{i=1}^{k} w_i(1-w_i)\frac{N_i-n_i}{N_in_i}s_i^2\right].$$
Thus
$$\widehat{Var}_{SRS}(\bar{y}) = \frac{N-n}{Nn}\hat{S}^2 = \frac{N-n}{N(N-1)n}\sum_{i=1}^{k}(N_i-1)s_i^2 + \frac{N-n}{(N-1)n}\left[\sum_{i=1}^{k} w_i(\bar{y}_i-\bar{y}_{st})^2 - \sum_{i=1}^{k} w_i(1-w_i)\frac{N_i-n_i}{N_in_i}s_i^2\right]$$
and
$$\widehat{Var}(\bar{y}_{st}) = \sum_{i=1}^{k}\frac{N_i-n_i}{N_in_i}w_i^2s_i^2 .$$
Substituting these expressions in
$$\frac{\widehat{Var}_{SRS}(\bar{y}) - \widehat{Var}(\bar{y}_{st})}{\widehat{Var}(\bar{y}_{st})},$$
the gain in efficiency due to stratification can be obtained. If any other particular allocation is used, then substituting the appropriate $n_i$, such gain can be estimated.
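The estimated relative gain can be computed directly from a single stratified sample. A Python sketch with made-up samples (SRSWOR within strata assumed):

```python
# Estimate the relative gain in precision of stratified sampling over SRS
# from one stratified sample (made-up data; SRSWOR within strata).
samples = [[10, 12, 11, 13], [25, 27, 26, 28, 24]]
N_i = [50, 50]
m_i = [len(s) for s in samples]

N, n = sum(N_i), sum(m_i)
w = [Ni / N for Ni in N_i]
ybar = [sum(s) / len(s) for s in samples]
s2 = [sum((v - m) ** 2 for v in s) / (len(s) - 1) for s, m in zip(samples, ybar)]
y_st = sum(wi * m for wi, m in zip(w, ybar))

var_st = sum(wi**2 * (Ni - ni) / (Ni * ni) * s2i
             for wi, Ni, ni, s2i in zip(w, N_i, m_i, s2))

# Estimate S^2 via (N-1)S^2 = sum (N_i-1)S_i^2 + N(sum w_i Ybar_i^2 - Ybar^2),
# replacing each squared mean by its unbiased estimate as in the text.
Yi2_hat = [m**2 - (Ni - ni) / (Ni * ni) * s2i
           for m, Ni, ni, s2i in zip(ybar, N_i, m_i, s2)]
Y2_hat = y_st**2 - var_st
S2_hat = (sum((Ni - 1) * s2i for Ni, s2i in zip(N_i, s2))
          + N * (sum(wi * q for wi, q in zip(w, Yi2_hat)) - Y2_hat)) / (N - 1)
var_srs = (N - n) / (N * n) * S2_hat

gain = (var_srs - var_st) / var_st  # estimated relative gain due to stratification
```

With the stratum means far apart (11.5 versus 26), the estimated gain is large and positive, as the theory above predicts.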

Interpenetrating subsampling
Suppose a sample consists of two or more subsamples drawn according to the same sampling scheme, such that each subsample yields an estimate of the parameter. Such subsamples are called interpenetrating subsamples.

The subsamples need not necessarily be independent. The assumption of independent subsamples helps in obtaining an unbiased estimate of the variance of the composite estimator. This is helpful even when the sample design is complicated and the expression for the variance of the composite estimator is complex.

Let there be $g$ independent interpenetrating subsamples and let $t_1,t_2,\ldots,t_g$ be $g$ unbiased estimators of the parameter $\theta$, where $t_j\ (j=1,2,\ldots,g)$ is based on the $j$th interpenetrating subsample. Then an unbiased estimator of $\theta$ is given by
$$\hat{\theta} = \frac{1}{g}\sum_{j=1}^{g} t_j = \bar{t},\ \text{say}.$$
Then
$$E(\hat{\theta}) = E(\bar{t}) = \theta$$
and
$$\widehat{Var}(\hat{\theta}) = \widehat{Var}(\bar{t}) = \frac{1}{g(g-1)}\sum_{j=1}^{g}(t_j-\bar{t})^2 .$$
Note that
$$E\left[\widehat{Var}(\bar{t})\right] = \frac{1}{g(g-1)}E\left[\sum_{j=1}^{g}(t_j-\theta)^2 - g(\bar{t}-\theta)^2\right]
= \frac{1}{g(g-1)}\left[\sum_{j=1}^{g} Var(t_j) - g\,Var(\bar{t})\right]
= \frac{1}{g(g-1)}(g^2-g)\,Var(\bar{t}) = Var(\bar{t}).$$
If the distribution of each estimator $t_j$ is symmetric about $\theta$, then a confidence interval for $\theta$ can be obtained from
$$P\left[\min(t_1,t_2,\ldots,t_g) \le \theta \le \max(t_1,t_2,\ldots,t_g)\right] = 1-\left(\frac{1}{2}\right)^{g-1}.$$

Implementation of interpenetrating subsamples in stratified sampling

Consider the setup of stratified sampling. Suppose each stratum provides independent interpenetrating subsamples, so that based on each stratum there are $L$ independent interpenetrating subsamples drawn according to the same sampling scheme.

Let $\hat{Y}_{ij(tot)}$ be the unbiased estimator of the total of the $j$th stratum based on the $i$th subsample, $i=1,2,\ldots,L;\ j=1,2,\ldots,k$.

An unbiased estimator of the $j$th stratum total is given by
$$\hat{Y}_{j(tot)} = \frac{1}{L}\sum_{i=1}^{L}\hat{Y}_{ij(tot)},$$
and an unbiased estimator of its variance is given by
$$\widehat{Var}(\hat{Y}_{j(tot)}) = \frac{1}{L(L-1)}\sum_{i=1}^{L}\left(\hat{Y}_{ij(tot)}-\hat{Y}_{j(tot)}\right)^2 .$$
Thus an unbiased estimator of the population total $Y_{tot}$ is
$$\hat{Y}_{tot} = \sum_{j=1}^{k}\hat{Y}_{j(tot)} = \frac{1}{L}\sum_{i=1}^{L}\sum_{j=1}^{k}\hat{Y}_{ij(tot)},$$
and an unbiased estimator of its variance is
$$\widehat{Var}(\hat{Y}_{tot}) = \sum_{j=1}^{k}\widehat{Var}(\hat{Y}_{j(tot)}) = \frac{1}{L(L-1)}\sum_{i=1}^{L}\sum_{j=1}^{k}\left(\hat{Y}_{ij(tot)}-\hat{Y}_{j(tot)}\right)^2 .$$

Post stratification
Sometimes the stratum to which a unit belongs may be known only after the field survey. For example, the age of persons, their educational qualifications etc. cannot be known in advance. In such cases, we adopt the post-stratification procedure to increase the precision of the estimates.

In post stratification:
- draw a sample by simple random sampling from the population and carry out the survey;
- after completion of the survey, stratify the sampling units to increase the precision of the estimates.

Assume the stratum sizes $N_i$ are fairly accurately known. Let
$m_i$: number of sampling units falling in the $i$th stratum, $i=1,2,\ldots,k$, with $\sum_{i=1}^{k} m_i = n$.
Note that $m_i$ is a random variable (and that is why we are not using the symbol $n_i$ as earlier).

Assume $n$ is large enough, or the stratification is such, that the probability that some $m_i = 0$ is negligibly small. In case $m_i = 0$ for some strata, two or more strata can be combined to make the sample size non-zero before evaluating the final estimates.

A post-stratified estimator of the population mean $\bar{Y}$ is
$$\bar{y}_{post} = \frac{1}{N}\sum_{i=1}^{k} N_i\bar{y}_i .$$
Now
$$E(\bar{y}_{post}) = \frac{1}{N}E\left[\sum_{i=1}^{k} N_iE(\bar{y}_i\mid m_1,m_2,\ldots,m_k)\right] = \frac{1}{N}E\left[\sum_{i=1}^{k} N_i\bar{Y}_i\right] = \bar{Y},$$
and
$$Var(\bar{y}_{post}) = E\left[Var(\bar{y}_{post}\mid m_1,\ldots,m_k)\right] + Var\left[E(\bar{y}_{post}\mid m_1,\ldots,m_k)\right]$$
$$= E\left[\sum_{i=1}^{k} w_i^2\left(\frac{1}{m_i}-\frac{1}{N_i}\right)S_i^2\right] + Var(\bar{Y}) = \sum_{i=1}^{k} w_i^2\left[E\left(\frac{1}{m_i}\right)-\frac{1}{N_i}\right]S_i^2,$$
since $Var(\bar{Y}) = 0$.

To find $E\left(\frac{1}{m_i}\right)-\frac{1}{N_i}$, proceed as follows. Consider the estimate of a ratio based on the ratio method of estimation,
$$\hat{R} = \frac{\bar{y}}{\bar{x}} = \frac{\sum_{j=1}^{n} y_j}{\sum_{j=1}^{n} x_j}, \qquad R = \frac{\bar{Y}}{\bar{X}} = \frac{\sum_{j=1}^{N} Y_j}{\sum_{j=1}^{N} X_j},$$
for which we know that, approximately,
$$E(\hat{R}) = R + \frac{N-n}{Nn}\cdot\frac{RS_X^2 - S_{XY}}{\bar{X}^2}.$$
Let
$$x_j = \begin{cases}1 & \text{if the } j\text{th unit belongs to the } i\text{th stratum}\\ 0 & \text{otherwise,}\end{cases} \qquad y_j = 1\ \text{for all } j=1,2,\ldots,N .$$
Then $\hat{R}$, $R$ and $S_X^2$ reduce to
$$\hat{R} = \frac{n}{n_i}, \qquad R = \frac{N}{N_i}, \qquad S_X^2 = \frac{1}{N-1}\left(N_i-\frac{N_i^2}{N}\right),$$
and $S_{XY} = 0$ since the $y_j$ are constant. Using these values in the expression for $E(\hat{R})$, we have
$$E\left(\frac{n}{n_i}\right) = \frac{N}{N_i} + \frac{N(N-n)(N-N_i)}{nN_i^2(N-1)}.$$
Thus
$$E\left(\frac{1}{n_i}\right)-\frac{1}{N_i} = \frac{N-n}{nN_i} + \frac{N(N-n)(N-N_i)}{n^2N_i^2(N-1)} \approx \frac{N-n}{n(N-1)}\cdot\frac{N}{N_i}\left(1+\frac{N-N_i}{nN_i}\right).$$
Replacing $m_i$ in place of $n_i$, we obtain
$$E\left(\frac{1}{m_i}\right)-\frac{1}{N_i} \approx \frac{N-n}{n(N-1)}\cdot\frac{N}{N_i}\left(1+\frac{N-N_i}{nN_i}\right).$$
Now substitute this in the expression for $Var(\bar{y}_{post})$: using $\frac{N}{N_i}=\frac{1}{w_i}$ and $\frac{N-N_i}{nN_i}=\frac{1}{nw_i}-\frac{1}{n}$,
$$Var(\bar{y}_{post}) = \sum_{i=1}^{k} w_i^2\left[E\left(\frac{1}{m_i}\right)-\frac{1}{N_i}\right]S_i^2
\approx \frac{N-n}{n(N-1)}\sum_{i=1}^{k} w_i^2S_i^2\cdot\frac{1}{w_i}\left(1+\frac{1}{nw_i}-\frac{1}{n}\right)$$
$$= \frac{N-n}{n^2(N-1)}\sum_{i=1}^{k} w_iS_i^2\left(n-1+\frac{1}{w_i}\right) = \frac{N-n}{n^2(N-1)}\sum_{i=1}^{k}(nw_i+1-w_i)S_i^2$$
$$= \frac{N-n}{n(N-1)}\sum_{i=1}^{k} w_iS_i^2 + \frac{N-n}{n^2(N-1)}\sum_{i=1}^{k}(1-w_i)S_i^2 .$$
Assuming $N-1 \approx N$,
$$Var(\bar{y}_{post}) \approx \frac{N-n}{Nn}\sum_{i=1}^{k} w_iS_i^2 + \frac{N-n}{Nn^2}\sum_{i=1}^{k}(1-w_i)S_i^2 = Var_{prop}(\bar{y}_{st}) + \frac{N-n}{Nn^2}\sum_{i=1}^{k}(1-w_i)S_i^2 .$$
The second term is the contribution to the variance of $\bar{y}_{post}$ due to the $m_i$'s not being proportionately distributed.

If $S_i^2 = S_w^2$, say, for all $i$, then the last term is
$$\frac{N-n}{Nn^2}S_w^2\sum_{i=1}^{k}(1-w_i) = \frac{N-n}{Nn^2}S_w^2(k-1) = \frac{k-1}{n}\cdot\frac{N-n}{Nn}S_w^2 = \frac{k-1}{n}\,Var_{prop}(\bar{y}_{st}) \qquad\left(\text{since }\sum_{i=1}^{k} w_i = 1\right).$$
The increase in variance over $Var_{prop}(\bar{y}_{st})$ is small if the average sample size per stratum, $\bar{n} = n/k$, is reasonably large. Thus post-stratification with a large sample produces an estimator which is almost as precise as the estimator in stratified sampling with proportional allocation.
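A Python sketch of the post-stratified estimator and the approximate variance decomposition just derived, using made-up data (known stratum sizes, sample variances standing in for the $S_i^2$ purely for illustration):

```python
# Post-stratified estimator: an SRS is stratified after the survey (made-up data).
# Known stratum sizes N_i; m_i units happen to fall in stratum i.
N_i = [600, 400]
samples = [[10, 12, 11, 13, 14], [30, 32, 31]]   # m_1 = 5, m_2 = 3

N = sum(N_i)
n = sum(len(s) for s in samples)
y_post = sum(Ni * (sum(s) / len(s)) for Ni, s in zip(N_i, samples)) / N

# Approximate variance: V(y_post) ≈ V_prop(y_st) + (N-n)/(N n^2) * sum (1-w_i) S_i^2,
# with the S_i^2 replaced by sample variances (illustration only).
w = [Ni / N for Ni in N_i]
means = [sum(s) / len(s) for s in samples]
s2 = [sum((v - m) ** 2 for v in s) / (len(s) - 1) for s, m in zip(samples, means)]
v_prop = (N - n) / (N * n) * sum(wi * s2i for wi, s2i in zip(w, s2))
penalty = (N - n) / (N * n**2) * sum((1 - wi) * s2i for wi, s2i in zip(w, s2))
v_post = v_prop + penalty
```

The `penalty` term is the price of stratifying after the fact; it shrinks like $1/n^2$ as the sample grows, matching the conclusion above.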

Chapter 5
Ratio and Product Methods of Estimation

An important objective in any statistical estimation procedure is to obtain estimators of the parameters of interest with high precision. It is also well understood that incorporating more information in the estimation procedure yields better estimators, provided the information is valid and proper. The ratio method of estimation uses such auxiliary information to obtain an improved estimator of the population mean. In the ratio method of estimation, auxiliary information on a variable that is linearly related to the variable under study is utilized to estimate the population mean.

Let $Y$ be the variable under study and $X$ an auxiliary variable correlated with $Y$. The observations $x_i$ on $X$ and $y_i$ on $Y$ are obtained for each sampling unit. The population mean $\bar{X}$ of $X$ (or equivalently the population total $X_{tot}$) must be known. For example, the $x_i$'s may be the values of the $y_i$'s from
- some earlier completed census,
- some earlier surveys,
- some characteristic on which it is easy to obtain information, etc.

For example, if $y_i$ is the quantity of fruits produced in the $i$th plot, then $x_i$ can be the area of the $i$th plot or the production of fruit in the same plot in the previous year.

Let $(x_1,y_1),(x_2,y_2),\ldots,(x_n,y_n)$ be a random sample of size $n$ on the paired variable $(X,Y)$ drawn, preferably by SRSWOR, from a population of size $N$. The ratio estimator of the population mean $\bar{Y}$ is
$$\hat{\bar{Y}}_R = \frac{\bar{y}}{\bar{x}}\,\bar{X} = \hat{R}\bar{X},$$
assuming the population mean $\bar{X}$ is known. The ratio estimator of the population total $Y_{tot} = \sum_{i=1}^{N} Y_i$ is
$$\hat{Y}_{R(tot)} = \frac{y_{tot}}{x_{tot}}X_{tot},$$
where $X_{tot} = \sum_{i=1}^{N} X_i$ is the population total of $X$, which is assumed to be known, and $y_{tot} = \sum_{i=1}^{n} y_i$ and $x_{tot} = \sum_{i=1}^{n} x_i$ are the sample totals of $Y$ and $X$ respectively. $\hat{Y}_{R(tot)}$ can be equivalently expressed as
$$\hat{Y}_{R(tot)} = \frac{\bar{y}}{\bar{x}}X_{tot} = \hat{R}X_{tot}.$$
Looking at the structure of the ratio estimators, note that the ratio method estimates the relative change $\frac{Y_{tot}}{X_{tot}}$ that has occurred since the $x_i$ were observed. It is clear that if the values of $\frac{y_i}{x_i}$ are nearly the same for all $i=1,2,\ldots,n$, then $\frac{y_{tot}}{x_{tot}}$ (or equivalently $\frac{\bar{y}}{\bar{x}}$) varies little from sample to sample and the ratio estimator will be of high precision.
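A minimal Python sketch of the ratio estimators of the mean and the total. The paired sample, the known population mean $\bar X$ and the population size $N$ are all made up for illustration:

```python
# Ratio estimator of the population mean and total (made-up paired sample).
x = [2.0, 3.0, 5.0, 4.0]       # auxiliary variable (e.g., plot area)
y = [4.1, 6.2, 9.8, 8.1]       # study variable (e.g., yield)
X_bar, N = 3.6, 100            # assumed-known population mean of X and size N

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
R_hat = y_bar / x_bar
Y_R = R_hat * X_bar            # ratio estimate of the population mean
Y_R_tot = R_hat * X_bar * N    # ratio estimate of the population total
```

Because the $y_i/x_i$ ratios are nearly constant here (about 2), $\hat R$ would be stable across repeated samples, which is exactly the situation where the ratio method pays off.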

Bias and mean squared error of the ratio estimator:

Assume that the random sample $(x_i,y_i),\ i=1,2,\ldots,n$ is drawn by SRSWOR and the population mean $\bar{X}$ is known. Then
$$E(\hat{\bar{Y}}_R) = \frac{1}{\binom{N}{n}}\sum_{\text{all samples}}\frac{\bar{y}}{\bar{x}}\,\bar{X} \neq \bar{Y}\ \text{(in general)}.$$
Moreover, it is difficult to find the exact expressions for $E\left(\frac{\bar{y}}{\bar{x}}\right)$ and $E\left(\frac{\bar{y}^2}{\bar{x}^2}\right)$, so we approximate them and proceed as follows. Let
$$\varepsilon_0 = \frac{\bar{y}-\bar{Y}}{\bar{Y}} \ \Rightarrow\ \bar{y} = (1+\varepsilon_0)\bar{Y}, \qquad \varepsilon_1 = \frac{\bar{x}-\bar{X}}{\bar{X}} \ \Rightarrow\ \bar{x} = (1+\varepsilon_1)\bar{X}.$$
Since SRSWOR is being followed,
$$E(\varepsilon_0) = 0, \qquad E(\varepsilon_1) = 0,$$
$$E(\varepsilon_0^2) = \frac{1}{\bar{Y}^2}E(\bar{y}-\bar{Y})^2 = \frac{1}{\bar{Y}^2}\cdot\frac{N-n}{Nn}S_Y^2 = \frac{f}{n}\cdot\frac{S_Y^2}{\bar{Y}^2} = \frac{f}{n}C_Y^2,$$
where $f = \frac{N-n}{N}$, $S_Y^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i-\bar{Y})^2$ and $C_Y = \frac{S_Y}{\bar{Y}}$ is the coefficient of variation related to $Y$. Similarly,
$$E(\varepsilon_1^2) = \frac{f}{n}C_X^2,$$
$$E(\varepsilon_0\varepsilon_1) = \frac{1}{\bar{X}\bar{Y}}E\left[(\bar{x}-\bar{X})(\bar{y}-\bar{Y})\right] = \frac{1}{\bar{X}\bar{Y}}\cdot\frac{f}{n}S_{XY} = \frac{1}{\bar{X}\bar{Y}}\cdot\frac{f}{n}\rho S_XS_Y = \frac{f}{n}\rho\,\frac{S_X}{\bar{X}}\frac{S_Y}{\bar{Y}} = \frac{f}{n}\rho C_XC_Y,$$
where $S_{XY} = \frac{1}{N-1}\sum_{i=1}^{N}(X_i-\bar{X})(Y_i-\bar{Y})$, $C_X = \frac{S_X}{\bar{X}}$ is the coefficient of variation related to $X$, and $\rho$ is the correlation coefficient between $X$ and $Y$.

Writing $\hat{\bar{Y}}_R$ in terms of the $\varepsilon$'s, we get
$$\hat{\bar{Y}}_R = \frac{\bar{y}}{\bar{x}}\,\bar{X} = \frac{(1+\varepsilon_0)\bar{Y}}{(1+\varepsilon_1)\bar{X}}\,\bar{X} = (1+\varepsilon_0)(1+\varepsilon_1)^{-1}\bar{Y}.$$
Assuming $|\varepsilon_1| < 1$, the term $(1+\varepsilon_1)^{-1}$ may be expanded as an infinite series which is convergent. This assumption means that $\left|\frac{\bar{x}-\bar{X}}{\bar{X}}\right| < 1$, i.e., every possible estimate $\bar{x}$ of the population mean $\bar{X}$ lies between $0$ and $2\bar{X}$. This is likely to hold true if the variation in $\bar{x}$ is not large. To ensure that the variation in $\bar{x}$ is small, assume that the sample size $n$ is fairly large. With this assumption,
$$\hat{\bar{Y}}_R = \bar{Y}(1+\varepsilon_0)(1-\varepsilon_1+\varepsilon_1^2-\ldots) = \bar{Y}(1+\varepsilon_0-\varepsilon_1+\varepsilon_1^2-\varepsilon_0\varepsilon_1+\ldots).$$
So the estimation error of $\hat{\bar{Y}}_R$ is
$$\hat{\bar{Y}}_R - \bar{Y} = \bar{Y}(\varepsilon_0-\varepsilon_1+\varepsilon_1^2-\varepsilon_0\varepsilon_1+\ldots).$$
When the sample size is large, $\varepsilon_0$ and $\varepsilon_1$ are likely to be small quantities, and the terms involving second and higher powers of $\varepsilon_0$ and $\varepsilon_1$ would be negligibly small. In such a case
$$\hat{\bar{Y}}_R - \bar{Y} \approx \bar{Y}(\varepsilon_0-\varepsilon_1), \qquad E(\hat{\bar{Y}}_R-\bar{Y}) = 0,$$
so the ratio estimator is an unbiased estimator of the population mean to the first order of approximation.

If we assume that only the terms of $\varepsilon_0$ and $\varepsilon_1$ involving powers higher than two are negligibly small (which is more realistic than assuming powers higher than one are negligibly small), then
$$\hat{\bar{Y}}_R - \bar{Y} \approx \bar{Y}(\varepsilon_0-\varepsilon_1+\varepsilon_1^2-\varepsilon_0\varepsilon_1)$$
and
$$Bias(\hat{\bar{Y}}_R) = E(\hat{\bar{Y}}_R-\bar{Y}) = \bar{Y}\left[0-0+\frac{f}{n}C_X^2-\frac{f}{n}\rho C_XC_Y\right] = \frac{f}{n}\bar{Y}C_X(C_X-\rho C_Y),$$
up to the second order of approximation. The bias generally decreases as the sample size grows large.
The bias of $\hat{\bar{Y}}_R$ is zero, i.e., $Bias(\hat{\bar{Y}}_R) = 0$,
$$\text{if } E(\varepsilon_1^2-\varepsilon_0\varepsilon_1) = 0,$$
$$\text{i.e., if } \frac{Var(\bar{x})}{\bar{X}^2} - \frac{Cov(\bar{x},\bar{y})}{\bar{X}\bar{Y}} = 0,$$
$$\text{i.e., if } \frac{1}{\bar{X}^2}\left[Var(\bar{x}) - \frac{\bar{X}}{\bar{Y}}\,Cov(\bar{x},\bar{y})\right] = 0,$$
$$\text{i.e., if } Var(\bar{x}) - \frac{1}{R}\,Cov(\bar{x},\bar{y}) = 0 \quad(\text{assuming } \bar{X}\neq 0),$$
$$\text{i.e., if } R = \frac{\bar{Y}}{\bar{X}} = \frac{Cov(\bar{x},\bar{y})}{Var(\bar{x})},$$
which is satisfied when the regression line of $Y$ on $X$ passes through the origin.

Now, to find the mean squared error, consider
$$MSE(\hat{\bar{Y}}_R) = E(\hat{\bar{Y}}_R-\bar{Y})^2 = E\left[\bar{Y}^2(\varepsilon_0-\varepsilon_1+\varepsilon_1^2-\varepsilon_0\varepsilon_1+\ldots)^2\right] \approx E\left[\bar{Y}^2(\varepsilon_0^2+\varepsilon_1^2-2\varepsilon_0\varepsilon_1)\right].$$
Under the assumption $|\varepsilon_1| < 1$, and neglecting terms of $\varepsilon_0$ and $\varepsilon_1$ with powers higher than two,
$$MSE(\hat{\bar{Y}}_R) = \bar{Y}^2\left[\frac{f}{n}C_X^2+\frac{f}{n}C_Y^2-\frac{2f}{n}\rho C_XC_Y\right] = \frac{f\bar{Y}^2}{n}\left(C_X^2+C_Y^2-2\rho C_XC_Y\right),$$
up to the second order of approximation.

Efficiency of the ratio estimator in comparison to SRSWOR

The ratio estimator is a better estimator of $\bar{Y}$ than the sample mean based on SRSWOR if
$$MSE(\hat{\bar{Y}}_R) < Var_{SRS}(\bar{y}),$$
i.e., if
$$\frac{f}{n}\bar{Y}^2(C_X^2+C_Y^2-2\rho C_XC_Y) < \frac{f}{n}\bar{Y}^2C_Y^2,$$
i.e., if
$$C_X^2-2\rho C_XC_Y < 0, \quad\text{i.e., if}\quad \rho > \frac{1}{2}\,\frac{C_X}{C_Y}.$$
Thus the ratio estimator is more efficient than the sample mean based on SRSWOR if
$$\rho > \frac{1}{2}\,\frac{C_X}{C_Y}\ \text{ when } R > 0 \qquad\text{and}\qquad \rho < -\frac{1}{2}\,\frac{C_X}{C_Y}\ \text{ when } R < 0.$$
It is clear from this expression that the success of the ratio estimator depends on how closely the auxiliary information is related to the variable under study.

Upper limit of the bias of the ratio estimator:

Consider
$$Cov(\hat{R},\bar{x}) = E(\hat{R}\bar{x}) - E(\hat{R})E(\bar{x}) = E\left(\frac{\bar{y}}{\bar{x}}\,\bar{x}\right) - E(\hat{R})E(\bar{x}) = \bar{Y} - E(\hat{R})\bar{X}.$$
Thus
$$E(\hat{R}) = \frac{\bar{Y}}{\bar{X}} - \frac{Cov(\hat{R},\bar{x})}{\bar{X}} = R - \frac{Cov(\hat{R},\bar{x})}{\bar{X}},$$
$$Bias(\hat{R}) = E(\hat{R}) - R = -\frac{Cov(\hat{R},\bar{x})}{\bar{X}} = -\frac{\rho_{\hat{R},\bar{x}}\,\sigma_{\hat{R}}\,\sigma_{\bar{x}}}{\bar{X}},$$
where $\rho_{\hat{R},\bar{x}}$ is the correlation between $\hat{R}$ and $\bar{x}$, and $\sigma_{\hat{R}}$ and $\sigma_{\bar{x}}$ are the standard errors of $\hat{R}$ and $\bar{x}$ respectively. Thus
$$\left|Bias(\hat{R})\right| = \frac{\left|\rho_{\hat{R},\bar{x}}\right|\sigma_{\hat{R}}\,\sigma_{\bar{x}}}{\bar{X}} \le \frac{\sigma_{\hat{R}}\,\sigma_{\bar{x}}}{\bar{X}} \qquad (\text{since } |\rho_{\hat{R},\bar{x}}|\le 1),$$
assuming $\bar{X} > 0$. Thus
$$\frac{\left|Bias(\hat{R})\right|}{\sigma_{\hat{R}}} \le \frac{\sigma_{\bar{x}}}{\bar{X}} = C_{\bar{x}},$$
where $C_{\bar{x}}$ is the coefficient of variation of $\bar{x}$. If $C_{\bar{x}} < 0.1$, then the bias in $\hat{R}$ may be safely regarded as negligible in relation to the standard error of $\hat{R}$.

Alternative form of $MSE(\hat{\bar{Y}}_R)$

Consider
$$\sum_{i=1}^{N}(Y_i-RX_i)^2 = \sum_{i=1}^{N}\left[(Y_i-\bar{Y})-R(X_i-\bar{X})\right]^2 \qquad (\text{using } \bar{Y} = R\bar{X})$$
$$= \sum_{i=1}^{N}(Y_i-\bar{Y})^2 + R^2\sum_{i=1}^{N}(X_i-\bar{X})^2 - 2R\sum_{i=1}^{N}(X_i-\bar{X})(Y_i-\bar{Y}),$$
so
$$\frac{1}{N-1}\sum_{i=1}^{N}(Y_i-RX_i)^2 = S_Y^2 + R^2S_X^2 - 2RS_{XY}.$$
The MSE of $\hat{\bar{Y}}_R$, already derived, can now be expressed as
$$MSE(\hat{\bar{Y}}_R) = \frac{f\bar{Y}^2}{n}\left(C_Y^2+C_X^2-2\rho C_XC_Y\right) = \frac{f\bar{Y}^2}{n}\left(\frac{S_Y^2}{\bar{Y}^2}+\frac{S_X^2}{\bar{X}^2}-\frac{2S_{XY}}{\bar{X}\bar{Y}}\right)$$
$$= \frac{f}{n}\left(S_Y^2+R^2S_X^2-2RS_{XY}\right) = \frac{f}{n(N-1)}\sum_{i=1}^{N}(Y_i-RX_i)^2 = \frac{N-n}{nN(N-1)}\sum_{i=1}^{N}(Y_i-RX_i)^2 .$$
Estimate of $MSE(\hat{\bar{Y}}_R)$

Let $U_i = Y_i - RX_i,\ i=1,2,\ldots,N$. Then the MSE of $\hat{\bar{Y}}_R$ can be expressed as
$$MSE(\hat{\bar{Y}}_R) = \frac{f}{n}\cdot\frac{1}{N-1}\sum_{i=1}^{N}(U_i-\bar{U})^2 = \frac{f}{n}S_U^2,$$
where
$$S_U^2 = \frac{1}{N-1}\sum_{i=1}^{N}(U_i-\bar{U})^2 .$$
Based on this, a natural estimator of $MSE(\hat{\bar{Y}}_R)$ is
$$\widehat{MSE}(\hat{\bar{Y}}_R) = \frac{f}{n}s_u^2,$$
where
$$s_u^2 = \frac{1}{n-1}\sum_{i=1}^{n}(u_i-\bar{u})^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left[(y_i-\bar{y})-\hat{R}(x_i-\bar{x})\right]^2 = s_y^2+\hat{R}^2s_x^2-2\hat{R}s_{xy}, \qquad \hat{R} = \frac{\bar{y}}{\bar{x}}.$$
Based on the expression
$$MSE(\hat{\bar{Y}}_R) = \frac{f}{n(N-1)}\sum_{i=1}^{N}(Y_i-RX_i)^2,$$
an alternative estimate of $MSE(\hat{\bar{Y}}_R)$ is
$$\widehat{MSE}(\hat{\bar{Y}}_R) = \frac{f}{n(n-1)}\sum_{i=1}^{n}(y_i-\hat{R}x_i)^2 \approx \frac{f}{n}\left(s_y^2+\hat{R}^2s_x^2-2\hat{R}s_{xy}\right).$$
Confidence interval of the ratio estimator
If the sample is large enough for the normal approximation to apply, then the $100(1-\alpha)\%$ confidence intervals of $\bar{Y}$ and $R$ are
$$\left[\hat{\bar{Y}}_R - Z_{\alpha/2}\sqrt{\widehat{Var}(\hat{\bar{Y}}_R)},\ \hat{\bar{Y}}_R + Z_{\alpha/2}\sqrt{\widehat{Var}(\hat{\bar{Y}}_R)}\right]$$
and
$$\left[\hat{R} - Z_{\alpha/2}\sqrt{\widehat{Var}(\hat{R})},\ \hat{R} + Z_{\alpha/2}\sqrt{\widehat{Var}(\hat{R})}\right]$$
respectively, where $Z_{\alpha/2}$ is the normal deviate to be chosen for the given value of the confidence coefficient $(1-\alpha)$.

If $(\bar{x},\bar{y})$ follows a bivariate normal distribution, then $(\bar{y}-R\bar{x})$ is normally distributed. If SRS is followed for drawing the sample, then, assuming $R$ is known, the statistic
$$\frac{\bar{y}-R\bar{x}}{\sqrt{\dfrac{N-n}{Nn}\left(s_y^2+R^2s_x^2-2Rs_{xy}\right)}}$$
is approximately $N(0,1)$. This can also be used for finding confidence limits; see Cochran (1977, Chapter 6, page 156) for more details.
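The MSE estimate and the large-sample confidence interval can be computed together. A Python sketch with made-up data ($\bar X$ and $N$ assumed known, $Z_{0.025}\approx 1.96$):

```python
from math import sqrt

# Estimated MSE and 95% normal-approximation CI for the ratio estimator (made-up data).
x = [2.0, 3.0, 5.0, 4.0, 6.0]
y = [4.0, 6.1, 9.9, 8.2, 12.1]
X_bar, N = 4.2, 200

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
R_hat = y_bar / x_bar
Y_R = R_hat * X_bar

sy2 = sum((v - y_bar) ** 2 for v in y) / (n - 1)
sx2 = sum((v - x_bar) ** 2 for v in x) / (n - 1)
sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)

f = (N - n) / N
mse_hat = f / n * (sy2 + R_hat**2 * sx2 - 2 * R_hat * sxy)   # f/n * s_u^2
ci = (Y_R - 1.96 * sqrt(mse_hat), Y_R + 1.96 * sqrt(mse_hat))
```

Note that $s_y^2 + \hat R^2 s_x^2 - 2\hat R s_{xy}$ is a sum of squared residuals divided by $n-1$, so `mse_hat` is always non-negative.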

Conditions under which the ratio estimator is optimum

The ratio estimator $\hat{\bar{Y}}_R$ is the best linear unbiased estimator of $\bar{Y}$ when
(i) the relationship between $y_i$ and $x_i$ is linear and passes through the origin, i.e.,
$$y_i = \beta x_i + e_i,$$
where the $e_i$'s are independent with $E(e_i\mid x_i) = 0$ and $\beta$ is the slope parameter, and
(ii) the variance about this line is proportional to $x_i$, i.e.,
$$Var(y_i\mid x_i) = E(e_i^2) = Cx_i,$$
where $C$ is a constant.

Proof: Consider a linear estimator of $\beta$ of the form $\hat{\beta} = \sum_{i=1}^{n}\lambda_iy_i$, where $y_i = \beta x_i + e_i$. If the $n$ sample values of the $x_i$ are kept fixed, then in repeated sampling
$$E(\hat{\beta}) = \beta\sum_{i=1}^{n}\lambda_ix_i \qquad\text{and}\qquad Var(\hat{\beta}) = \sum_{i=1}^{n}\lambda_i^2\,Var(y_i\mid x_i) = C\sum_{i=1}^{n}\lambda_i^2x_i .$$
So $E(\hat{\beta}) = \beta$ when $\sum_{i=1}^{n}\lambda_ix_i = 1$. Consider the minimization of $Var(\hat{\beta})$ subject to the unbiasedness condition $\sum_{i=1}^{n}\lambda_ix_i = 1$ using the Lagrangian function
$$\phi = C\sum_{i=1}^{n}\lambda_i^2x_i - 2\mu\left(\sum_{i=1}^{n}\lambda_ix_i - 1\right).$$
Now
$$\frac{\partial\phi}{\partial\lambda_i} = 0 \ \Rightarrow\ C\lambda_ix_i = \mu x_i,\ i=1,2,\ldots,n \ \Rightarrow\ \lambda_i = \frac{\mu}{C} = \lambda,\ \text{say, a constant},$$
and
$$\frac{\partial\phi}{\partial\mu} = 0 \ \Rightarrow\ \sum_{i=1}^{n}\lambda_ix_i = 1 \ \Rightarrow\ \lambda\sum_{i=1}^{n}x_i = 1 \ \Rightarrow\ \lambda = \frac{1}{n\bar{x}}.$$
So $\lambda_i = \frac{1}{n\bar{x}}$ and
$$\hat{\beta} = \frac{\sum_{i=1}^{n}y_i}{n\bar{x}} = \frac{\bar{y}}{\bar{x}}.$$
Thus $\hat{\beta} = \bar{y}/\bar{x}$ is not only unbiased but the best in the class of linear unbiased estimators, and hence $\hat{\bar{Y}}_R = \hat{\beta}\bar{X}$ is the best linear unbiased estimator of $\bar{Y} = \beta\bar{X}$.
Alternative approach:
This result can alternatively be derived as follows. The ratio estimator $\hat{R} = \frac{\bar{y}}{\bar{x}}$ is the best linear unbiased estimator of $R = \frac{\bar{Y}}{\bar{X}}$ if the following two conditions hold:
(i) for fixed $x$, $E(y) = \beta x$, i.e., the line of regression of $y$ on $x$ is a straight line passing through the origin;
(ii) for fixed $x$, $Var(y) \propto x$, i.e., $Var(y) = \lambda x$, where $\lambda$ is a constant of proportionality.

Proof: Let $y = (y_1,y_2,\ldots,y_n)'$ and $x = (x_1,x_2,\ldots,x_n)'$ be the two vectors of observations on the $y$'s and $x$'s. Hence for any fixed $x$,
$$E(y) = \beta x, \qquad Var(y) = \Omega = \lambda\,\mathrm{diag}(x_1,x_2,\ldots,x_n),$$
where $\mathrm{diag}(x_1,x_2,\ldots,x_n)$ is the diagonal matrix with $x_1,x_2,\ldots,x_n$ as the diagonal elements. The best linear unbiased estimator of $\beta$ is obtained by minimizing
$$S^2 = (y-\beta x)'\,\Omega^{-1}(y-\beta x) = \sum_{i=1}^{n}\frac{(y_i-\beta x_i)^2}{\lambda x_i}.$$
Solving
$$\frac{\partial S^2}{\partial\beta} = 0 \ \Rightarrow\ \sum_{i=1}^{n}(y_i-\hat{\beta}x_i) = 0 \ \Rightarrow\ \hat{\beta} = \frac{\bar{y}}{\bar{x}} = \hat{R}.$$
Thus $\hat{R}$ is the best linear unbiased estimator of $R$. Consequently, $\hat{R}\bar{X} = \hat{\bar{Y}}_R$ is the best linear unbiased estimator of $\bar{Y}$.
Ratio estimator in stratified sampling
Suppose a population of size N is divided into k strata. The objective is to estimate the population mean
Y using ratio method of estimation.

In such situation, a random sample of size ni is being drawn from ith strata of size N i on variable under
study Y and auxiliary variable X using SRSWOR.
Let
yij : jth observation on Y from ith strata

xij : jth observation on X from ith strata i =1, 2,…,k; j  1, 2,..., ni .

An estimator of Y based on the philosophy of stratified sampling can be devised in following two
possible ways:

1. Separate ratio estimator


- First employ the ratio method of estimation separately in each stratum and obtain the ratio estimator Ŷ_Ri, i = 1, 2,…, k, assuming the stratum mean X̄_i to be known.

- Then combine all the estimates using a weighted arithmetic mean.

This gives the separate ratio estimator as

Ŷ_Rs = Σ_{i=1}^{k} (N_i/N) Ŷ_Ri
     = Σ_{i=1}^{k} w_i Ŷ_Ri
     = Σ_{i=1}^{k} w_i (ȳ_i/x̄_i) X̄_i

where
ȳ_i = (1/n_i) Σ_{j=1}^{n_i} y_ij : sample mean of Y from the ith stratum,
x̄_i = (1/n_i) Σ_{j=1}^{n_i} x_ij : sample mean of X from the ith stratum,
X̄_i = (1/N_i) Σ_{j=1}^{N_i} X_ij : mean of all the units in the ith stratum.
No assumption is made that the true ratio remains constant from stratum to stratum; the estimator requires knowledge of each stratum mean X̄_i.

2. Combined ratio estimator:


- Find first the stratum mean of Y ' s and X ' s as
k
yst   wi yi
i 1
k
xst   wi xi .
i 1

- Then define the combined ratio estimator as

Ŷ_Rc = (ȳ_st/x̄_st) X̄

where X̄ is the population mean of X based on all the N = Σ_{i=1}^{k} N_i units. It does not depend on information on each stratum mean X̄_i, but only on X̄.
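A minimal sketch (hypothetical stratum data) computing the two estimators just defined:

```python
# Each stratum: size N_i, SRSWOR sample on x and y, known stratum mean Xbar_i.
# All numbers are made up for illustration.
strata = [
    {"N": 50, "x": [2.0, 4.0, 6.0], "y": [5.0, 9.0, 13.0], "Xbar": 4.5},
    {"N": 30, "x": [10.0, 12.0],    "y": [19.0, 25.0],     "Xbar": 11.0},
]
N = sum(s["N"] for s in strata)
Xbar = sum(s["N"] * s["Xbar"] for s in strata) / N   # overall population mean of X

def mean(v):
    return sum(v) / len(v)

# Separate ratio estimator: sum_i w_i * (ybar_i / xbar_i) * Xbar_i
Y_Rs = sum((s["N"] / N) * mean(s["y"]) / mean(s["x"]) * s["Xbar"] for s in strata)

# Combined ratio estimator: (ybar_st / xbar_st) * Xbar
ybar_st = sum((s["N"] / N) * mean(s["y"]) for s in strata)
xbar_st = sum((s["N"] / N) * mean(s["x"]) for s in strata)
Y_Rc = ybar_st / xbar_st * Xbar
```

The separate estimator needs every X̄_i, while the combined one uses only the overall mean X̄, exactly as stated above.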

Properties of separate ratio estimator:

Note that there is an analogy between Ȳ = Σ_{i=1}^{k} w_i Ȳ_i and Ŷ_Rs = Σ_{i=1}^{k} w_i Ŷ_Ri.

We already have derived the bias of Ŷ_R = (ȳ/x̄)X̄ as

E(Ŷ_R) = Ȳ + (f/n) Ȳ (C_x² − ρC_xC_y).

So for Ŷ_Ri we can write

E(Ŷ_Ri) = Ȳ_i + (f_i/n_i) Ȳ_i (C_ix² − ρ_i C_ix C_iy)

where
Ȳ_i = (1/N_i) Σ_{j=1}^{N_i} Y_ij,  X̄_i = (1/N_i) Σ_{j=1}^{N_i} X_ij,
f_i = (N_i − n_i)/N_i,  C_iy² = S_iy²/Ȳ_i²,  C_ix² = S_ix²/X̄_i²,
S_iy² = (1/(N_i − 1)) Σ_{j=1}^{N_i} (Y_ij − Ȳ_i)²,  S_ix² = (1/(N_i − 1)) Σ_{j=1}^{N_i} (X_ij − X̄_i)²,
ρ_i : correlation coefficient between the observations on X and Y in the ith stratum,
C_ix : coefficient of variation of the X values in the ith stratum.
Thus

E(Ŷ_Rs) = Σ_{i=1}^{k} w_i E(Ŷ_Ri)
        = Σ_{i=1}^{k} w_i [Ȳ_i + (f_i/n_i) Ȳ_i (C_ix² − ρ_i C_ix C_iy)]
        = Ȳ + Σ_{i=1}^{k} (w_i Ȳ_i f_i/n_i)(C_ix² − ρ_i C_ix C_iy)

Bias(Ŷ_Rs) = E(Ŷ_Rs) − Ȳ
           = Σ_{i=1}^{k} (w_i Ȳ_i f_i/n_i) C_ix (C_ix − ρ_i C_iy).

Assume the finite population corrections to be approximately 1, n_i = n/k, and C_ix, C_iy and ρ_i to be the same in every stratum, equal to C_x, C_y and ρ respectively. Then

Bias(Ŷ_Rs) ≈ (kȲ/n) C_x (C_x − ρC_y).

Thus the bias is negligible when the sample size within each stratum is sufficiently large, and Ŷ_Rs is unbiased when C_ix = ρ_i C_iy.
Now we derive the MSE of Ŷ_Rs. We already have derived the MSE of Ŷ_R earlier as

MSE(Ŷ_R) = (f/n) Ȳ² (C_x² + C_y² − 2ρC_xC_y)
         = (f/(n(N − 1))) Σ_{i=1}^{N} (Y_i − RX_i)²

where R = Ȳ/X̄. Thus for the ith stratum

MSE(Ŷ_Ri) = (f_i/n_i) Ȳ_i² (C_ix² + C_iy² − 2ρ_i C_ix C_iy)
          = (f_i/(n_i(N_i − 1))) Σ_{j=1}^{N_i} (Y_ij − R_i X_ij)²

and so
MSE(Ŷ_Rs) = Σ_{i=1}^{k} w_i² MSE(Ŷ_Ri)
          = Σ_{i=1}^{k} (w_i² f_i/n_i) Ȳ_i² (C_ix² + C_iy² − 2ρ_i C_ix C_iy)
          = Σ_{i=1}^{k} [w_i² f_i/(n_i(N_i − 1))] Σ_{j=1}^{N_i} (Y_ij − R_i X_ij)².

An estimate of MSE(Ŷ_Rs) can be found by substituting the unbiased estimators s_ix², s_iy² and s_ixy of S_ix², S_iy² and S_ixy respectively for the ith stratum, and estimating R_i = Ȳ_i/X̄_i by r_i = ȳ_i/x̄_i:

MSÊ(Ŷ_Rs) = Σ_{i=1}^{k} (w_i² f_i/n_i)(s_iy² + r_i² s_ix² − 2r_i s_ixy).

Also

MSÊ(Ŷ_Rs) = Σ_{i=1}^{k} [w_i² f_i/(n_i(n_i − 1))] Σ_{j=1}^{n_i} (y_ij − r_i x_ij)².

Properties of combined ratio estimator:

Here

Ŷ_Rc = (ȳ_st/x̄_st) X̄ = (Σ_{i=1}^{k} w_i ȳ_i / Σ_{i=1}^{k} w_i x̄_i) X̄ = R̂_c X̄.

It is difficult to find the exact expressions of the bias and the mean squared error of Ŷ_Rc, so we find their approximate expressions. Define

ε₁ = (ȳ_st − Ȳ)/Ȳ
ε₂ = (x̄_st − X̄)/X̄

so that E(ε₁) = 0 and E(ε₂) = 0.
Further,

E(ε₁²) = Σ_{i=1}^{k} [(N_i − n_i)/(N_i n_i)] w_i² S_iy²/Ȳ² = Σ_{i=1}^{k} (f_i/n_i) w_i² S_iy²/Ȳ²
E(ε₂²) = Σ_{i=1}^{k} (f_i/n_i) w_i² S_ix²/X̄²
E(ε₁ε₂) = Σ_{i=1}^{k} (f_i/n_i) w_i² S_ixy/(X̄Ȳ).

Thus, assuming |ε₂| < 1,

Ŷ_Rc = [(1 + ε₁)Ȳ / ((1 + ε₂)X̄)] X̄
     = Ȳ(1 + ε₁)(1 − ε₂ + ε₂² − …)
     = Ȳ(1 + ε₁ − ε₂ − ε₁ε₂ + ε₂² + …).

Retaining the terms up to order two, for the same reason as in the case of Ŷ_R,

Ŷ_Rc ≈ Ȳ(1 + ε₁ − ε₂ − ε₁ε₂ + ε₂²)
Ŷ_Rc − Ȳ ≈ Ȳ(ε₁ − ε₂ − ε₁ε₂ + ε₂²).

The approximate bias of Ŷ_Rc up to the second order of approximation is

Bias(Ŷ_Rc) = E(Ŷ_Rc − Ȳ)
           = Ȳ E(ε₁ − ε₂ − ε₁ε₂ + ε₂²)
           = Ȳ [0 − 0 − E(ε₁ε₂) + E(ε₂²)]
           = Ȳ Σ_{i=1}^{k} (f_i/n_i) w_i² [S_ix²/X̄² − S_ixy/(X̄Ȳ)]
           = Ȳ Σ_{i=1}^{k} (f_i/n_i) w_i² [S_ix²/X̄² − ρ_i S_ix S_iy/(X̄Ȳ)]
           = (Ȳ/X̄) Σ_{i=1}^{k} (f_i/n_i) w_i² S_ix [S_ix/X̄ − ρ_i S_iy/Ȳ]
           = R Σ_{i=1}^{k} (f_i/n_i) w_i² S_ix (C_ix − ρ_i C_iy)

where R = Ȳ/X̄, ρ_i is the correlation coefficient between the observations on Y and X in the ith stratum, and C_ix and C_iy are the coefficients of variation of X and Y respectively in the ith stratum.

The mean squared error up to the second order of approximation is
MSE(Ŷ_Rc) = E(Ŷ_Rc − Ȳ)²
          ≈ Ȳ² E(ε₁ − ε₂)²
          = Ȳ² E(ε₁² + ε₂² − 2ε₁ε₂)
          = Ȳ² Σ_{i=1}^{k} (f_i/n_i) w_i² [S_iy²/Ȳ² + S_ix²/X̄² − 2S_ixy/(X̄Ȳ)]
          = Ȳ² Σ_{i=1}^{k} (f_i/n_i) w_i² [S_iy²/Ȳ² + S_ix²/X̄² − 2ρ_i S_ix S_iy/(X̄Ȳ)]
          = Σ_{i=1}^{k} (f_i/n_i) w_i² (S_iy² + R² S_ix² − 2ρ_i R S_ix S_iy).

An estimate of MSE(Ŷ_Rc) can be obtained by replacing S_ix², S_iy² and S_ixy by their unbiased estimators s_ix², s_iy² and s_ixy respectively, and R = Ȳ/X̄ by R̂_c = ȳ_st/x̄_st. Thus the following estimate is obtained (X̄ being known):

MSÊ(Ŷ_Rc) = Σ_{i=1}^{k} (w_i² f_i/n_i)(s_iy² + R̂_c² s_ix² − 2R̂_c s_ixy).

Comparison of combined and separate ratio estimators

An obvious question is which of the estimators Ŷ_Rs or Ŷ_Rc is better, so we compare their MSEs. Note that the only difference in the terms of these MSEs is due to the form of the ratio: it is

R_i = Ȳ_i/X̄_i in MSE(Ŷ_Rs),
R = Ȳ/X̄ in MSE(Ŷ_Rc).

Thus

Δ = MSE(Ŷ_Rc) − MSE(Ŷ_Rs)
  = Σ_{i=1}^{k} (w_i² f_i/n_i) [(R² − R_i²) S_ix² − 2(R − R_i) ρ_i S_ix S_iy]
  = Σ_{i=1}^{k} (w_i² f_i/n_i) [(R − R_i)² S_ix² + 2(R − R_i)(R_i S_ix² − ρ_i S_ix S_iy)].

The difference Δ depends on
(i) the magnitude of the difference between the strata ratios R_i and the population ratio as a whole, R;
(ii) the value of (R_i S_ix² − ρ_i S_ix S_iy), which is usually small and vanishes when the regression line of y on x is linear and passes through the origin within each stratum. In such a case

MSE(Ŷ_Rc) ≥ MSE(Ŷ_Rs)

but

|Bias(Ŷ_Rc)| < |Bias(Ŷ_Rs)|.

So unless R_i varies considerably, the use of Ŷ_Rc would provide an estimate of Ȳ with negligible bias and precision as good as Ŷ_Rs.

- If R_i ≠ R, Ŷ_Rs can be more precise but its bias may be large.
- If R_i ≈ R, Ŷ_Rc can be as precise as Ŷ_Rs but its bias will be small. It also does not require knowledge of X̄₁, X̄₂,…, X̄_k.

Ratio estimators with reduced bias:

Ratio-type estimators that are unbiased or have smaller bias than R̂, Ŷ_R or Ŷ_Rc are useful in sample surveys. There are several approaches to derive such estimators. We consider here two such approaches:

1. Unbiased ratio-type estimators:

Under SRS, the ratio estimator has the form (ȳ/x̄)X̄ for estimating the population mean Ȳ. As an alternative to this, we consider the following estimator of the population mean:

Ŷ_R0 = (1/n) Σ_{i=1}^{n} (y_i/x_i) X̄.
Let R_i = Y_i/X_i, i = 1, 2,…, N, for the population units and r_i = y_i/x_i for the sampled units. Then

Ŷ_R0 = r̄ X̄, where r̄ = (1/n) Σ_{i=1}^{n} r_i,

and

Bias(Ŷ_R0) = E(Ŷ_R0) − Ȳ = E(r̄) X̄ − Ȳ.

Since under SRSWOR each population unit is equally likely to occupy any sample position,

E(r̄) = (1/n) Σ_{i=1}^{n} [(1/N) Σ_{j=1}^{N} R_j] = (1/N) Σ_{i=1}^{N} R_i = R̄.

So Bias(Ŷ_R0) = R̄X̄ − Ȳ.

Using the result that under SRSWOR, Cov(x̄, ȳ) = [(N − n)/(Nn)] S_XY, it also follows that

Cov(r̄, x̄) = [(N − n)/(Nn)] (1/(N − 1)) Σ_{i=1}^{N} (R_i − R̄)(X_i − X̄)
           = [(N − n)/(Nn)] (1/(N − 1)) (Σ_{i=1}^{N} R_i X_i − N R̄ X̄)
           = [(N − n)/(Nn)] (1/(N − 1)) (N Ȳ − N R̄ X̄)      [since R_i X_i = Y_i]
           = −[(N − n)/(Nn)] [N/(N − 1)] Bias(Ŷ_R0).

Thus

Bias(Ŷ_R0) = −[n(N − 1)/(N − n)] Cov(r̄, x̄)
           = −[n(N − 1)/(N − n)] [(N − n)/(Nn)] S_RX
           = −[(N − 1)/N] S_RX

where S_RX = (1/(N − 1)) Σ_{i=1}^{N} (R_i − R̄)(X_i − X̄).

The following result helps in obtaining an unbiased estimator of the population mean. Under the SRSWOR set up,

E(s_xy) = S_xy

where s_xy = (1/(n − 1)) Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ) and S_xy = (1/(N − 1)) Σ_{i=1}^{N} (X_i − X̄)(Y_i − Ȳ).

So an unbiased estimator of the bias Bias(Ŷ_R0) = −[(N − 1)/N] S_RX is obtained as follows:

Biaŝ(Ŷ_R0) = −[(N − 1)/N] s_rx
           = −[(N − 1)/N] (1/(n − 1)) Σ_{i=1}^{n} (r_i − r̄)(x_i − x̄)
           = −[(N − 1)/N] (1/(n − 1)) (Σ_{i=1}^{n} r_i x_i − n r̄ x̄)
           = −[(N − 1)/N] (1/(n − 1)) (n ȳ − n r̄ x̄)      [since r_i x_i = y_i]
           = −[n(N − 1)/(N(n − 1))] (ȳ − r̄ x̄).

Thus

E[Ŷ_R0 − Biaŝ(Ŷ_R0)] = Ȳ

or

E[Ŷ_R0 + (n(N − 1)/(N(n − 1))) (ȳ − r̄ x̄)] = Ȳ.

Thus

Ŷ_HR = r̄ X̄ + [n(N − 1)/(N(n − 1))] (ȳ − r̄ x̄),

known as the Hartley–Ross estimator, is an unbiased estimator of the population mean.
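Because the population is finite, the unbiasedness can be verified exactly by enumerating every SRSWOR sample from a small artificial population (all numbers below are made up):

```python
# Exact check: the average of the Hartley-Ross estimator
#   Y_HR = rbar*Xbar + n(N-1)/(N(n-1)) * (ybar - rbar*xbar)
# over all C(N, n) SRSWOR samples equals the population mean Ybar.
from itertools import combinations

X = [2.0, 5.0, 10.0, 4.0, 7.0, 12.0]     # hypothetical auxiliary values
Y = [3.0, 8.0, 15.0, 9.0, 10.0, 20.0]    # hypothetical study values
N, n = len(X), 3
Xbar = sum(X) / N
Ybar = sum(Y) / N

estimates = []
for idx in combinations(range(N), n):
    x = [X[i] for i in idx]
    y = [Y[i] for i in idx]
    rbar = sum(yi / xi for xi, yi in zip(x, y)) / n
    xbar, ybar = sum(x) / n, sum(y) / n
    y_hr = rbar * Xbar + n * (N - 1) / (N * (n - 1)) * (ybar - rbar * xbar)
    estimates.append(y_hr)

# Mean over all equally likely samples is exactly the population mean.
assert abs(sum(estimates) / len(estimates) - Ybar) < 1e-10
```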

2. Jackknife method for obtaining a ratio estimate with lower bias
The jackknife method is used to get rid of the term of order 1/n from the bias of an estimator. Suppose E(R̂) can be expanded, after ignoring the finite population correction, as

E(R̂) = R + a₁/n + a₂/n² + …

Let n = mg and let the sample be divided at random into g groups, each of size m. Then

g E(R̂) = gR + g a₁/(gm) + g a₂/(g²m²) + … = gR + a₁/m + a₂/(gm²) + …

Let R̂_i* = Σ* y_j / Σ* x_j, where Σ* denotes summation over all values of the sample except the ith group. So R̂_i* is based on a simple random sample of size m(g − 1), so we can express

E(R̂_i*) = R + a₁/(m(g − 1)) + a₂/(m²(g − 1)²) + …

or

(g − 1) E(R̂_i*) = (g − 1)R + a₁/m + a₂/(m²(g − 1)) + …

Thus

E[gR̂ − (g − 1)R̂_i*] = R + (a₂/m²)[1/g − 1/(g − 1)] + … = R − a₂/(g(g − 1)m²) + …

or

E[gR̂ − (g − 1)R̂_i*] = R − (a₂/n²)[g/(g − 1)] + …

Hence the bias of gR̂ − (g − 1)R̂_i* is of order 1/n². Now g estimates of this form can be obtained, one for each group. The jackknife or Quenouille's estimator is the average of these g estimators:

R̂_Q = gR̂ − (g − 1) (Σ_{i=1}^{g} R̂_i*)/g.
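A sketch of Quenouille's estimator on illustrative data (for simplicity the groups are taken as consecutive blocks rather than being formed at random, which does not affect the algebra):

```python
# Hypothetical sample of n = g*m pairs with g = 3 groups of size m = 2.
x = [2.0, 5.0, 10.0, 4.0, 7.0, 12.0]
y = [3.0, 8.0, 15.0, 9.0, 10.0, 20.0]
g, m = 3, 2
n = g * m

R_hat = sum(y) / sum(x)                 # ordinary ratio estimate

R_star = []                             # R_i*: ratio omitting the i-th group
for i in range(g):
    keep = [j for j in range(n) if not (i * m <= j < (i + 1) * m)]
    R_star.append(sum(y[j] for j in keep) / sum(x[j] for j in keep))

# Quenouille's jackknife estimator: R_Q = g*R_hat - (g-1)*mean(R_i*)
R_Q = g * R_hat - (g - 1) * sum(R_star) / g
```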

A large sample variance of the Hartley–Ross estimator Ŷ_HR is obtained as follows. We assume n and N are large enough so that n(N − 1)/(N(n − 1)) ≈ 1 and r̄ ≈ R. Then

Ŷ_HR ≈ ȳ + R(X̄ − x̄).

Hence, the large sample variance of Ŷ_HR is given by

Var(Ŷ_HR) = (f/n)(S_y² + R² S_x² − 2R S_xy).

Product method of estimation:

The ratio estimator is more efficient than the sample mean under SRSWOR if ρ > C_x/(2C_y) when R > 0, which is usually the case. This shows that if the auxiliary information is such that ρ < −C_x/(2C_y), then we cannot use the ratio method of estimation to improve the sample mean as an estimator of the population mean. So there is a need for another type of estimator which also makes use of information on the auxiliary variable x. The product estimator is an attempt in this direction.

The product estimator of the population mean Ȳ is defined as

Ŷ_p = ȳx̄/X̄.

We now derive the bias and variance of Ŷ_p. Let

ε₀ = (ȳ − Ȳ)/Ȳ,  ε₁ = (x̄ − X̄)/X̄.

(i) Bias of Ŷ_p:

We write Ŷ_p as

Ŷ_p = ȳx̄/X̄ = Ȳ(1 + ε₀)(1 + ε₁) = Ȳ(1 + ε₀ + ε₁ + ε₀ε₁).

Taking expectation, we obtain the bias of Ŷ_p as

Bias(Ŷ_p) = Cov(ȳ, x̄)/X̄ = (f/(nX̄)) S_xy,

which shows that the bias of Ŷ_p decreases as n increases. The bias of Ŷ_p can be estimated by

Biaŝ(Ŷ_p) = (f/(nX̄)) s_xy.

(ii) Variance of Ŷ_p:

Writing Ŷ_p in terms of ε₀ and ε₁, the variance of the product estimator Ŷ_p up to the second order of approximation is given by

Var(Ŷ_p) = E(Ŷ_p − Ȳ)²
         = Ȳ² E(ε₀ + ε₁ + ε₀ε₁)²
         ≈ Ȳ² E(ε₀² + ε₁² + 2ε₀ε₁).

Here terms in (ε₀, ε₁) of degree greater than two are assumed to be negligible. This gives

Var(Ŷ_p) = (f/n)(S_y² + R² S_x² + 2R S_xy).

(iii) Estimation of the variance of Ŷ_p:

The variance of Ŷ_p can be estimated by

Var̂(Ŷ_p) = (f/n)(s_y² + r² s_x² + 2r s_xy)

where r = ȳ/x̄.

(iv) Comparison with SRSWOR:

From the variances of the mean under SRSWOR and of the product estimator, we obtain

Var_SRS(ȳ) − Var(Ŷ_p) = −(f/n) R S_x (2ρS_y + R S_x),

which shows that Ŷ_p is more efficient than the simple mean ȳ for

ρ < −C_x/(2C_y) if R > 0

and for

ρ > C_x/(2C_y) if R < 0.
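A small sketch of the product estimator and its estimated variance, on hypothetical data where x and y are negatively correlated (the setting where this estimator pays off):

```python
# Hypothetical SRSWOR sample from a population of N = 100 units with
# known Xbar; y decreases as x grows (negative correlation).
N = 100
Xbar = 8.0
x = [4.0, 6.0, 9.0, 11.0, 10.0]
y = [12.0, 10.0, 7.0, 5.0, 6.5]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
Yp = ybar * xbar / Xbar                       # product estimator

f = (N - n) / N                               # finite population correction
sx2 = sum((v - xbar) ** 2 for v in x) / (n - 1)
sy2 = sum((v - ybar) ** 2 for v in y) / (n - 1)
sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (n - 1)
r = ybar / xbar
var_hat = f / n * (sy2 + r ** 2 * sx2 + 2 * r * sxy)
```

Note the plus sign in front of 2·r·s_xy: the strongly negative s_xy is what makes the product estimator's variance small here.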
Multivariate Ratio Estimator
Let y be the study variable and X₁, X₂,…, X_p be p auxiliary variables assumed to be correlated with y. Further, it is assumed that X₁, X₂,…, X_p are independent. Let Ȳ, X̄₁, X̄₂,…, X̄_p be the population means of the variables y, X₁, X₂,…, X_p. We assume that a SRSWOR of size n is selected from the population of N units. The following notations will be used:


S_i² = the population mean sum of squares for the variate X_i,
s_i² = the sample mean sum of squares for the variate X_i,
S₀² = the population mean sum of squares for the study variable y,
s₀² = the sample mean sum of squares for the study variable y,
C_i = S_i/X̄_i : coefficient of variation of the variate X_i,
C₀ = S₀/Ȳ : coefficient of variation of the variate y,
ρ_i = S_iy/(S_i S₀) : coefficient of correlation between y and X_i,
Ŷ_Ri = (ȳ/x̄_i) X̄_i : ratio estimator of Ȳ based on X_i,

where i = 1, 2,…, p. Then the multivariate ratio estimator of Ȳ is given as

Ŷ_MR = Σ_{i=1}^{p} w_i Ŷ_Ri,  with Σ_{i=1}^{p} w_i = 1,
     = ȳ Σ_{i=1}^{p} w_i X̄_i/x̄_i.

(i) Bias of the multivariate ratio estimator:

The bias of Ŷ_Ri is

Bias(Ŷ_Ri) = (f/n) Ȳ (C_i² − ρ_i C_i C₀).

The bias of Ŷ_MR is obtained as

Bias(Ŷ_MR) = (Ȳf/n) Σ_{i=1}^{p} w_i (C_i² − ρ_i C_i C₀)
           = (Ȳf/n) Σ_{i=1}^{p} w_i C_i (C_i − ρ_i C₀).

(ii) Variance of the multivariate ratio estimator:

The variance of Ŷ_Ri is given by

Var(Ŷ_Ri) = (f/n) Ȳ² (C₀² + C_i² − 2ρ_i C₀ C_i).

The variance of Ŷ_MR is obtained as

Var(Ŷ_MR) = (f/n) Ȳ² Σ_{i=1}^{p} w_i² (C₀² + C_i² − 2ρ_i C₀ C_i).
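A sketch of the multivariate ratio estimator with p = 2 hypothetical auxiliary variables and arbitrary weights summing to one:

```python
# Hypothetical SRSWOR sample: y plus two auxiliary variables, each with a
# known population mean Xbar_i and a chosen weight w_i (sum of w_i = 1).
y = [10.0, 14.0, 9.0, 13.0]
aux = {
    "x1": {"sample": [5.0, 7.0, 4.0, 6.0], "Xbar": 6.0, "w": 0.6},
    "x2": {"sample": [2.0, 3.0, 2.0, 3.0], "Xbar": 2.4, "w": 0.4},
}
assert abs(sum(a["w"] for a in aux.values()) - 1.0) < 1e-12

n = len(y)
ybar = sum(y) / n
# Y_MR = ybar * sum_i w_i * Xbar_i / xbar_i
Y_MR = ybar * sum(a["w"] * a["Xbar"] / (sum(a["sample"]) / n) for a in aux.values())
```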
Chapter 6
Regression Method of Estimation
The ratio method of estimation uses auxiliary information that is correlated with the study variable to improve the precision; it results in improved estimators when the regression of y on x is linear and passes through the origin. When the regression of y on x is linear, it is not necessary that the line should always pass through the origin. Under such conditions, it is more appropriate to use the regression-type estimators.

In the ratio method, the conventional estimator, the sample mean ȳ, was improved by multiplying it by the factor X̄/x̄, where X̄ is the known population mean of the auxiliary variable and x̄ is its unbiased estimator. Now we consider another idea, based on a difference.

Consider the statistic (x̄ − X̄), for which E(x̄ − X̄) = 0. Consider an improved estimator of Ȳ as

Ŷ* = ȳ + μ(x̄ − X̄),

which is an unbiased estimator of Ȳ for any constant μ. Now find μ such that Var(Ŷ*) is minimum:

Var(Ŷ*) = Var(ȳ) + μ² Var(x̄) + 2μ Cov(x̄, ȳ)

∂Var(Ŷ*)/∂μ = 0
⇒ μ = −Cov(x̄, ȳ)/Var(x̄)
    = −{[(N − n)/(Nn)] S_XY} / {[(N − n)/(Nn)] S_X²}
    = −S_XY/S_X²

where S_XY = (1/(N − 1)) Σ_{i=1}^{N} (X_i − X̄)(Y_i − Ȳ) and S_X² = (1/(N − 1)) Σ_{i=1}^{N} (X_i − X̄)².

Note that the regression coefficient β in a linear regression model y = βx + e of y on x, obtained by minimizing Σ_{i=1}^{n} e_i² based on the n data points (x_i, y_i), i = 1, 2,…, n, is β = Cov(x, y)/Var(x) = S_xy/S_x². Thus the optimum value of μ is the same as the regression coefficient of y on x with a negative sign, i.e., μ = −β.

So the estimator Ŷ* with the optimum value of μ is

Ŷ_reg = ȳ + β(X̄ − x̄),

which is the regression estimator of Ȳ, and the procedure of estimation is called the regression method of estimation.

The variance of Ŷ_reg is

Var(Ŷ_reg) = Var(ȳ)[1 − ρ²(x̄, ȳ)]

where ρ(x̄, ȳ) is the correlation coefficient between x̄ and ȳ. So Ŷ_reg is efficient if x and y are highly correlated. The estimator Ŷ_reg is more efficient than ȳ if ρ(x̄, ȳ) ≠ 0, which generally holds.
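A minimal sketch of the regression estimator with β̂ computed from a hypothetical SRSWOR sample (X̄ is assumed known):

```python
# Hypothetical data: known population mean of x, and a small sample.
Xbar = 5.5
x = [4.0, 5.0, 6.0, 7.0, 8.0]
y = [9.0, 11.0, 12.0, 14.0, 15.0]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (n - 1)
sx2 = sum((a - xbar) ** 2 for a in x) / (n - 1)

beta_hat = sxy / sx2                      # sample regression coefficient
Y_reg = ybar + beta_hat * (Xbar - xbar)   # regression estimator of Ybar
```

Since x̄ = 6 exceeds X̄ = 5.5 and the slope is positive, the estimator adjusts ȳ downward, as one would expect.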

Regression estimates with preassigned β:

If the value of β is known, say β₀, then the regression estimator is

Ŷ_reg = ȳ + β₀(X̄ − x̄).

Bias of Ŷ_reg:

Assuming that the random sample (x_i, y_i), i = 1, 2,…, n is drawn by SRSWOR,

E(Ŷ_reg) = E(ȳ) + β₀[X̄ − E(x̄)] = Ȳ + β₀(X̄ − X̄) = Ȳ.

Thus Ŷ_reg is an unbiased estimator of Ȳ when β is known.

Variance of Ŷ_reg:

Var(Ŷ_reg) = E[Ŷ_reg − E(Ŷ_reg)]²
           = E[ȳ + β₀(X̄ − x̄) − Ȳ]²
           = E[(ȳ − Ȳ) − β₀(x̄ − X̄)]²
           = E(ȳ − Ȳ)² + β₀² E(x̄ − X̄)² − 2β₀ E(x̄ − X̄)(ȳ − Ȳ)
           = Var(ȳ) + β₀² Var(x̄) − 2β₀ Cov(x̄, ȳ)
           = (f/n)(S_Y² + β₀² S_X² − 2β₀ S_XY)
           = (f/n)(S_Y² + β₀² S_X² − 2β₀ ρ S_X S_Y)

where

f = (N − n)/N,
S_X² = (1/(N − 1)) Σ_{i=1}^{N} (X_i − X̄)²,
S_Y² = (1/(N − 1)) Σ_{i=1}^{N} (Y_i − Ȳ)²,
ρ : correlation coefficient between X and Y.

Comparing Var(Ŷ_reg) with Var(ȳ), we note that

Var(Ŷ_reg) < Var(ȳ)

if β₀² S_X² − 2β₀ S_XY < 0, i.e., if

β₀(β₀ − 2S_XY/S_X²) < 0,

which is possible when either β₀ > 0 and β₀ < 2S_XY/S_X², or β₀ < 0 and β₀ > 2S_XY/S_X².
Optimal value of β:

Choose β such that Var(Ŷ_reg) is minimum. So

∂Var(Ŷ_reg)/∂β = ∂/∂β [(f/n)(S_Y² + β² S_X² − 2βρ S_X S_Y)] = 0
⇒ β = ρ S_Y/S_X = S_XY/S_X².

The minimum value of the variance of Ŷ_reg, with the optimum value β_opt = ρ S_Y/S_X, is

Var_min(Ŷ_reg) = (f/n)[S_Y² + ρ²(S_Y²/S_X²) S_X² − 2ρ(S_Y/S_X) ρ S_X S_Y]
              = (f/n) S_Y² (1 − ρ²).

Since −1 ≤ ρ ≤ 1, it follows that

Var_min(Ŷ_reg) ≤ Var_SRS(ȳ),

which always holds true.

Departure from the optimum β:

If β₀ is the preassigned value of the regression coefficient, then

Var(Ŷ_reg) = (f/n)(S_Y² + β₀² S_X² − 2β₀ ρ S_X S_Y)
           = (f/n)(S_Y² + β₀² S_X² − 2β₀ ρ S_X S_Y − ρ² S_Y² + ρ² S_Y²)
           = (f/n)[(1 − ρ²) S_Y² + β₀² S_X² − 2β₀ β_opt S_X² + β_opt² S_X²]
           = (f/n)[(1 − ρ²) S_Y² + (β₀ − β_opt)² S_X²]

where β_opt = ρ S_Y/S_X.
Estimate of variance:

An unbiased sample estimate of Var(Ŷ_reg) is

Var̂(Ŷ_reg) = [f/(n(n − 1))] Σ_{i=1}^{n} [(y_i − ȳ) − β₀(x_i − x̄)]²
           = (f/n)(s_y² + β₀² s_x² − 2β₀ s_xy).

Regression estimates when β is computed from the sample:

Suppose a random sample of size n, (x_i, y_i), i = 1, 2,…, n, is drawn by SRSWOR. When β is unknown, it is estimated as

β̂ = s_xy/s_x²

and then the regression estimator of Ȳ is

Ŷ_reg = ȳ + β̂(X̄ − x̄).

It is difficult to find the exact expressions of E(Ŷ_reg) and Var(Ŷ_reg), so we approximate them using the same methodology as in the case of the ratio method of estimation. Let

ε₀ = (ȳ − Ȳ)/Ȳ  ⇒ ȳ = Ȳ(1 + ε₀)
ε₁ = (x̄ − X̄)/X̄  ⇒ x̄ = X̄(1 + ε₁)
ε₂ = (s_xy − S_XY)/S_XY  ⇒ s_xy = S_XY(1 + ε₂)
ε₃ = (s_x² − S_X²)/S_X²  ⇒ s_x² = S_X²(1 + ε₃).

Then

E(ε₀) = 0, E(ε₁) = 0, E(ε₂) = 0, E(ε₃) = 0,
E(ε₀²) = (f/n) C_Y²,
E(ε₁²) = (f/n) C_X²,
E(ε₀ε₁) = (f/n) ρ C_X C_Y,

and

Ŷ_reg = ȳ + (s_xy/s_x²)(X̄ − x̄)
      = Ȳ(1 + ε₀) − [S_XY(1 + ε₂)/(S_X²(1 + ε₃))] ε₁ X̄.

The estimation error of Ŷ_reg is

Ŷ_reg − Ȳ = Ȳε₀ − βX̄ ε₁(1 + ε₂)(1 + ε₃)⁻¹

where β = S_XY/S_X² is the population regression coefficient.

Assuming |ε₃| < 1,

Ŷ_reg − Ȳ = Ȳε₀ − βX̄ ε₁(1 + ε₂)(1 − ε₃ + ε₃² − …).

Retaining the terms up to the second power of the ε's and ignoring the others, we have

Ŷ_reg − Ȳ ≈ Ȳε₀ − βX̄(ε₁ + ε₁ε₂ − ε₁ε₃).

Bias of Ŷ_reg:

Now the bias of Ŷ_reg up to the second order of approximation is

E(Ŷ_reg − Ȳ) ≈ −βX̄ [E(ε₁ε₂) − E(ε₁ε₃)]
            = −(βX̄f/n) [μ₂₁/(X̄ S_XY) − μ₃₀/(X̄ S_X²)]

where f = (N − n)/N and the (r, s)th cross-product moment is

μ_rs = E[(x̄ − X̄)^r (ȳ − Ȳ)^s],

so that

μ₂₁ = E[(x̄ − X̄)²(ȳ − Ȳ)],  μ₃₀ = E[(x̄ − X̄)³].

So

Bias(Ŷ_reg) = −(f/n) β (μ₂₁/S_XY − μ₃₀/S_X²).

Also,

Bias(Ŷ_reg) = E(ȳ) + E[β̂(X̄ − x̄)] − Ȳ
            = Ȳ + X̄E(β̂) − E(β̂x̄) − Ȳ
            = X̄E(β̂) − [Cov(β̂, x̄) + E(β̂)E(x̄)]
            = −Cov(β̂, x̄)      [since E(x̄) = X̄]

i.e., Bias(Ŷ_reg) = E(Ŷ_reg) − Ȳ = −Cov(β̂, x̄).
MSE of Ŷ_reg:

To obtain the MSE of Ŷ_reg, consider

E(Ŷ_reg − Ȳ)² = E[Ȳε₀ − βX̄(ε₁ + ε₁ε₂ − ε₁ε₃)]².

Retaining the terms in the ε's up to the second power and ignoring the others, we have

E(Ŷ_reg − Ȳ)² ≈ E[Ȳ²ε₀² + β²X̄²ε₁² − 2βX̄Ȳ ε₀ε₁]
             = Ȳ² E(ε₀²) + β²X̄² E(ε₁²) − 2βX̄Ȳ E(ε₀ε₁)
             = (f/n)[Ȳ²(S_Y²/Ȳ²) + β²X̄²(S_X²/X̄²) − 2βX̄Ȳ ρ S_X S_Y/(X̄Ȳ)]

so

MSE(Ŷ_reg) = (f/n)(S_Y² + β² S_X² − 2βρ S_X S_Y).

Since β = S_XY/S_X² = ρ S_Y/S_X, substituting it in MSE(Ŷ_reg) gives

MSE(Ŷ_reg) = (f/n) S_Y² (1 − ρ²).
So, up to the second order of approximation, the regression estimator is better than the conventional sample mean estimator under SRSWOR. This is because the regression estimator uses some extra information. Such extra information, however, also requires some extra cost, so the comparison overstates the superiority in some sense; the regression estimator and the SRS estimator should be compared with the cost aspect also taken into consideration.
Comparison of Ŷ_reg with the ratio estimator and the SRS sample mean:

MSE(Ŷ_reg) = (f/n) S_Y² (1 − ρ²)
MSE(Ŷ_R) = (f/n)(S_Y² + R² S_X² − 2ρR S_X S_Y)
Var_SRS(ȳ) = (f/n) S_Y².

(i) As MSE(Ŷ_reg) = Var_SRS(ȳ)(1 − ρ²) and ρ² ≤ 1, Ŷ_reg is always superior to ȳ.

(ii) Ŷ_reg is better than Ŷ_R if MSE(Ŷ_reg) ≤ MSE(Ŷ_R),

or if (f/n) S_Y² (1 − ρ²) ≤ (f/n)(S_Y² + R² S_X² − 2ρR S_X S_Y)

or if (R S_X − ρ S_Y)² ≥ 0,

which always holds true.

So the regression estimator is always superior to the ratio estimator, up to the second order of approximation.
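The three second-order formulas can be compared numerically for any made-up parameter values; the difference MSE(Ŷ_R) − MSE(Ŷ_reg) equals (f/n)(R S_X − ρ S_Y)², so the ordering holds for every choice:

```python
# All parameter values below are hypothetical, chosen only for illustration.
f_over_n = 0.9 / 20          # f/n with f = (N-n)/N
Sy, Sx = 4.0, 3.0            # population standard deviations
rho = 0.8                    # correlation between X and Y
R = 1.2                      # population ratio Ybar/Xbar

mse_reg = f_over_n * Sy**2 * (1 - rho**2)
mse_ratio = f_over_n * (Sy**2 + R**2 * Sx**2 - 2 * rho * R * Sx * Sy)
var_srs = f_over_n * Sy**2

# MSE(Y_R) - MSE(Y_reg) = (f/n)*(R*Sx - rho*Sy)^2 >= 0:
gap = f_over_n * (R * Sx - rho * Sy) ** 2
assert abs((mse_ratio - mse_reg) - gap) < 1e-12
assert mse_reg <= var_srs
```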

Regression estimates in stratified sampling

Under the set up of stratified sampling, let the population of N sampling units be divided into k strata. The strata sizes are N₁, N₂,…, N_k such that Σ_{i=1}^{k} N_i = N. A sample of size n_i on (x_ij, y_ij), j = 1, 2,…, n_i, is drawn from the ith stratum (i = 1, 2,…, k) by SRSWOR, where x_ij and y_ij denote the jth unit from the ith stratum on the auxiliary and study variables, respectively.

In order to estimate the population mean, there are two approaches.
1. Separate regression estimator

- Apply the regression estimator

Ŷ_reg = ȳ + β₀(X̄ − x̄)

in each stratum separately, i.e., the regression estimate in the ith stratum is

Ŷ_reg(i) = ȳ_i + β_i(X̄_i − x̄_i).

- Find the stratified mean as the weighted mean of the Ŷ_reg(i), i = 1, 2,…, k, as

Ŷ_sreg = Σ_{i=1}^{k} (N_i/N) Ŷ_reg(i) = Σ_{i=1}^{k} w_i [ȳ_i + β_i(X̄_i − x̄_i)]

where β_i = S_ixy/S_ix² and w_i = N_i/N.

In this approach, the regression estimator is obtained separately in each stratum and the estimates are then combined using the philosophy of stratified sampling. So Ŷ_sreg is termed the separate regression estimator.

2. Combined regression estimator

Another strategy is to replace x̄ and ȳ in Ŷ_reg by the respective stratified means. Replacing x̄ by x̄_st = Σ_{i=1}^{k} w_i x̄_i and ȳ by ȳ_st = Σ_{i=1}^{k} w_i ȳ_i, we have

Ŷ_creg = ȳ_st + β(X̄ − x̄_st).

In this case, all the sample information is combined first and then implemented in the regression estimator, so Ŷ_creg is termed the combined regression estimator.

Properties of separate and combined regression estimators

In order to derive the mean and variance of Ŷ_sreg and Ŷ_creg, there are two cases:
- when β is preassigned as β₀;
- when β is estimated from the sample.
We consider here the case where β is preassigned as β₀. The other case, when β is estimated as β̂ = s_xy/s_x², can be dealt with using the same approach based on defining various ε's and using the approximation theory, as in the case of Ŷ_reg.

1. Separate regression estimator

Assume β_i is known in each stratum, say as β₀ᵢ. Then

Ŷ_sreg = Σ_{i=1}^{k} w_i [ȳ_i + β₀ᵢ(X̄_i − x̄_i)]

E(Ŷ_sreg) = Σ_{i=1}^{k} w_i [E(ȳ_i) + β₀ᵢ(X̄_i − E(x̄_i))]
          = Σ_{i=1}^{k} w_i [Ȳ_i + β₀ᵢ(X̄_i − X̄_i)]
          = Ȳ,

so Ŷ_sreg is an unbiased estimator of Ȳ. Further,

Var(Ŷ_sreg) = E[Ŷ_sreg − E(Ŷ_sreg)]²
            = E[Σ_{i=1}^{k} w_i {(ȳ_i − Ȳ_i) − β₀ᵢ(x̄_i − X̄_i)}]²
            = Σ_{i=1}^{k} w_i² [Var(ȳ_i) + β₀ᵢ² Var(x̄_i) − 2β₀ᵢ Cov(x̄_i, ȳ_i)]
            = Σ_{i=1}^{k} (w_i² f_i/n_i)(S_iY² + β₀ᵢ² S_iX² − 2β₀ᵢ S_iXY)

where f_i = (N_i − n_i)/N_i. Var(Ŷ_sreg) is minimum when β₀ᵢ = S_iXY/S_iX², and substituting this β₀ᵢ we have

Var_min(Ŷ_sreg) = Σ_{i=1}^{k} (w_i² f_i/n_i)(S_iY² − β₀ᵢ² S_iX²).

Since SRSWOR is followed in drawing the samples from each stratum,
E(s_ix²) = S_iX², E(s_iy²) = S_iY², E(s_ixy) = S_iXY.

Thus an unbiased estimator of the variance can be obtained by replacing S_iX², S_iY² and S_iXY by their respective unbiased estimators s_ix², s_iy² and s_ixy:

Var̂(Ŷ_sreg) = Σ_{i=1}^{k} (w_i² f_i/n_i)(s_iy² + β₀ᵢ² s_ix² − 2β₀ᵢ s_ixy)

and

Var̂_min(Ŷ_sreg) = Σ_{i=1}^{k} (w_i² f_i/n_i)(s_iy² − β₀ᵢ² s_ix²).

2. Combined regression estimator:

Assume β is known, say as β₀. Then

Ŷ_creg = ȳ_st + β₀(X̄ − x̄_st) = Σ_{i=1}^{k} w_i ȳ_i + β₀(X̄ − Σ_{i=1}^{k} w_i x̄_i)

E(Ŷ_creg) = Σ_{i=1}^{k} w_i E(ȳ_i) + β₀[X̄ − Σ_{i=1}^{k} w_i E(x̄_i)]
          = Σ_{i=1}^{k} w_i Ȳ_i + β₀[X̄ − Σ_{i=1}^{k} w_i X̄_i]
          = Ȳ + β₀(X̄ − X̄)
          = Ȳ.

Thus Ŷ_creg is an unbiased estimator of Ȳ. Further,

Var(Ŷ_creg) = E[Ŷ_creg − E(Ŷ_creg)]²
            = E[Σ_{i=1}^{k} w_i(ȳ_i − Ȳ_i) − β₀ Σ_{i=1}^{k} w_i(x̄_i − X̄_i)]²
            = Σ_{i=1}^{k} w_i² Var(ȳ_i) + β₀² Σ_{i=1}^{k} w_i² Var(x̄_i) − 2β₀ Σ_{i=1}^{k} w_i² Cov(x̄_i, ȳ_i)
            = Σ_{i=1}^{k} (w_i² f_i/n_i)(S_iY² + β₀² S_iX² − 2β₀ S_iXY).

Var(Ŷ_creg) is minimum when

β₀ = Cov(x̄_st, ȳ_st)/Var(x̄_st) = [Σ_{i=1}^{k} (w_i² f_i/n_i) S_iXY] / [Σ_{i=1}^{k} (w_i² f_i/n_i) S_iX²]

and the minimum variance is given by

Var_min(Ŷ_creg) = Σ_{i=1}^{k} (w_i² f_i/n_i)(S_iY² − β₀² S_iX²).

Since SRSWOR is followed to draw the sample from each stratum, unbiased estimates of these variances are obtained, as before, by replacing the population variances and covariances by their sample counterparts:

Var̂(Ŷ_creg) = Σ_{i=1}^{k} (w_i² f_i/n_i)(s_iy² + β₀² s_ix² − 2β₀ s_ixy)

and

Var̂_min(Ŷ_creg) = Σ_{i=1}^{k} (w_i² f_i/n_i)(s_iy² − β₀² s_ix²).
Comparison of Ŷ_sreg and Ŷ_creg:

Note that

Var_min(Ŷ_creg) − Var_min(Ŷ_sreg) = Σ_{i=1}^{k} (β₀ᵢ² − β₀²)(w_i² f_i/n_i) S_iX²
                                  = Σ_{i=1}^{k} (β₀ᵢ − β₀)² (w_i² f_i/n_i) S_iX²
                                  ≥ 0.

So if the regression line of y on x is approximately linear and the regression coefficients do not vary much among the strata, the combined regression estimator loses little; otherwise the separate regression estimator is more efficient than the combined regression estimator.
Chapter 7
Varying Probability Sampling

The simple random sampling scheme provides a random sample where every unit in the population has
equal probability of selection. Under certain circumstances, more efficient estimators are obtained by
assigning unequal probabilities of selection to the units in the population. This type of sampling is
known as varying probability sampling scheme.

If Y is the variable under study and X is an auxiliary variable related to Y, then in the most commonly
used varying probability scheme, the units are selected with probability proportional to the value of X,
called as size. This is termed as probability proportional to a given measure of size (pps) sampling. If
the sampling units vary considerably in size, then SRS does not take into account the possible
importance of the larger units in the population. A large unit, i.e., a unit with a large value of Y, contributes
more to the population total than the units with smaller values, so it is natural to expect that a selection
scheme which assigns more probability of inclusion in a sample to the larger units than to the smaller
units would provide more efficient estimators than the estimators which provide equal probability to all
the units. This is accomplished through pps sampling.

Note that the “size” considered is the value of auxiliary variable X and not the value of study variable Y.
For example, in an agricultural survey, the yield depends on the area under cultivation. Bigger areas are likely to have larger yields and will contribute more towards the population total, so the value of the area can be considered as the size of the auxiliary variable. Also, the cultivated area for a previous period can be taken as the size while estimating the yield of a crop. Similarly, in an industrial survey,
the number of workers in a factory can be considered as the measure of size when studying the industrial
output from the respective factory.

Difference between the methods of SRS and varying probability scheme:


In SRS, the probability of drawing a specified unit at any given draw is the same. In varying probability
scheme, the probability of drawing a specified unit differs from draw to draw.
It appears in pps sampling that such procedure would give biased estimators as the larger units are over-
represented and the smaller units are under-represented in the sample. This will happen in case of
sample mean as an estimator of population mean where all the units are given equal weight. Instead of
giving equal weights to all the units, if the sample observations are suitably weighted at the estimation

stage by taking the probabilities of selection into account, then it is possible to obtain unbiased
estimators.

In pps sampling, there are two possibilities to draw the sample, i.e., with replacement and without
replacement.

Selection of units with replacement:


The probability of selection of a unit will not change and the probability of selecting a specified unit is
same at any stage. There is no redistribution of the probabilities after a draw.

Selection of units without replacement:


The probability of selection of a unit will change at any stage and the probabilities are redistributed after
each draw.

PPS without replacement (WOR) is more complex than PPS with replacement (WR). We consider both
the cases separately.

PPS sampling with replacement (WR):


First we discuss the two methods to draw a sample with PPS and WR.

1. Cumulative total method:


The procedure of selecting a simple random sample of size n consists of
- associating the natural numbers 1 to N with the units in the population, and
- then selecting those n units whose serial numbers correspond to a set of n numbers, each less than or equal to N, drawn from a random number table.

In selection of a sample with varying probabilities, the procedure is to associate with each unit a set of
consecutive natural numbers, the size of the set being proportional to the desired probability.

If X₁, X₂,…, X_N are the positive integers proportional to the probabilities assigned to the N units in the population, then a possible way is to associate with the units the cumulative totals of their sizes. The units are then selected based on the values of the cumulative totals. This is illustrated in the following table:

Units      Size      Cumulative size
1          X₁        T₁ = X₁
2          X₂        T₂ = X₁ + X₂
⋮          ⋮         ⋮
i − 1      X_{i−1}   T_{i−1} = ∑_{j=1}^{i−1} X_j
i          X_i       T_i = ∑_{j=1}^{i} X_j
⋮          ⋮         ⋮
N          X_N       T_N = ∑_{j=1}^{N} X_j

• Select a random number R between 1 and T_N by using a random number table.
• If T_{i−1} < R ≤ T_i, then the ith unit is selected, with probability X_i/T_N, i = 1, 2,…, N.
• Repeat the procedure n times to get a sample of size n.

In this case, the probability of selection of the ith unit is

P_i = (T_i − T_{i−1})/T_N = X_i/T_N
⇒ P_i ∝ X_i.

Note that TN is the population total which remains constant.

Drawback : This procedure involves writing down the successive cumulative totals. This is time
consuming and tedious if the number of units in the population is large.

This problem is overcome in the Lahiri’s method.

Lahiri’s method:
Let M = Max X i , i.e., maximum of the sizes of N units in the population or some convenient
i =1,2,..., N

number greater than M .


The sampling procedure has following steps:
1. Select a pair of random number (i, j) such that 1 ≤ i ≤ N , 1 ≤ j ≤ M .
2. If j ≤ X i , then ith unit is selected otherwise rejected and another pair of random number is
chosen.
3. To get a sample of size n , this procedure is repeated till n units are selected.
Now we see how this method ensures that the probabilities of selection of units are varying and are
proportional to size.

The probability of selection of the ith unit at a trial depends on two possible outcomes:
– either it is selected at the first draw,
– or it is selected in a subsequent draw preceded by ineffective draws.

P(selecting unit i at a draw) = P(1 ≤ i ≤ N) P(j ≤ X_i | i) = (1/N)(X_i/M) = P_i*, say.

Probability that no unit is selected at a trial = (1/N) ∑_{i=1}^{N} (1 − X_i/M)
= (1/N)(N − NX̄/M)
= 1 − X̄/M = Q, say.

Probability that unit i is selected at a given draw (all previous draws resulting in non-selection)
= P_i* + QP_i* + Q²P_i* + …
= P_i*/(1 − Q)
= (X_i/(NM))/(X̄/M) = X_i/(NX̄) = X_i/X_total ∝ X_i.

Thus the probability of selection of unit i is proportional to the size X i . So this method generates a pps
sample.

Advantage:
1. It does not require writing down all cumulative totals for each unit.
2. Sizes of all the units need not be known before hand. We need only some number greater than the
maximum size and the sizes of those units which are selected by the choice of the first set of
random numbers 1 to N for drawing sample under this scheme.

Disadvantage: It results in the wastage of time and efforts if units get rejected.
X
The probability of rejection = 1 − .
M
M
The expected numbers of draws required to draw one unit = .
X
This number is large if M is much larger than X .

Example: Consider the following data on 10 factories, giving the number of workers (the size variable $X$, in thousands) and the industrial production ($Y$, in metric tons). We illustrate the selection of units using the cumulative total method.

Factory no.   Number of workers $X$ (in thousands)   Industrial production $Y$ (in metric tons)   Cumulative total of sizes
1             2                                      30                                           $T_1 = 2$
2             5                                      60                                           $T_2 = 2 + 5 = 7$
3             10                                     12                                           $T_3 = 7 + 10 = 17$
4             4                                      6                                            $T_4 = 17 + 4 = 21$
5             7                                      8                                            $T_5 = 21 + 7 = 28$
6             12                                     13                                           $T_6 = 28 + 12 = 40$
7             3                                      4                                            $T_7 = 40 + 3 = 43$
8             14                                     17                                           $T_8 = 43 + 14 = 57$
9             11                                     13                                           $T_9 = 57 + 11 = 68$
10            6                                      8                                            $T_{10} = 68 + 6 = 74$

Selection of sample using the cumulative total method:

1. First draw:
- Draw a random number between 1 and 74.
- Suppose it is 23.
- Since $T_4 < 23 \leq T_5$, unit 5 is selected and $y_5 = 8$ enters the sample.

2. Second draw:
- Draw a random number between 1 and 74.
- Suppose it is 38.
- Since $T_5 < 38 \leq T_6$, unit 6 is selected and $y_6 = 13$ enters the sample.

- And so on.
- This procedure is repeated till the sample of the required size is obtained.
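The selection steps above can be sketched in Python (an illustrative snippet, not part of the original notes; the function name is ours):

```python
import random

# Sizes X_i (number of workers, in thousands) for the 10 factories
sizes = [2, 5, 10, 4, 7, 12, 3, 14, 11, 6]

def select_unit(sizes, r):
    """Return the 1-based label i of the unit with T_{i-1} < r <= T_i."""
    total = 0
    for i, x in enumerate(sizes, start=1):
        total += x  # running cumulative total T_i
        if r <= total:
            return i
    raise ValueError("r exceeds T_N")

# A full draw: pick R uniformly from {1, ..., T_N = 74}
r = random.randint(1, sum(sizes))
print(select_unit(sizes, 23))  # 5, since T_4 = 21 < 23 <= T_5 = 28
print(select_unit(sizes, r))   # a pps-selected unit
```

Repeating the draw `n` times gives a pps with-replacement sample of size `n`.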

Selection of sample using Lahiri’s method:
In this case
$$M = \max_{i=1,2,\ldots,10} X_i = 14,$$
so we need to select pairs of random numbers $(i, j)$ such that $1 \leq i \leq 10$, $1 \leq j \leq 14$.

The following table shows the sample obtained by Lahiri’s scheme:

Random no. $i$ $(1 \leq i \leq 10)$   Random no. $j$ $(1 \leq j \leq 14)$   Observation               Selection of unit
3                                     7                                     $j = 7 \leq X_3 = 10$     trial accepted ($y_3$)
6                                     13                                    $j = 13 > X_6 = 12$       trial rejected
4                                     7                                     $j = 7 > X_4 = 4$         trial rejected
2                                     9                                     $j = 9 > X_2 = 5$         trial rejected
9                                     2                                     $j = 2 \leq X_9 = 11$     trial accepted ($y_9$)

and so on. Here $(y_3, y_9)$ are selected into the sample.
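Lahiri’s accept–reject scheme can be sketched in Python (an illustrative implementation, not from the notes; function names are ours):

```python
import random

def lahiri_draw(sizes, m=None, rng=random):
    """Select one unit (1-based label) with probability proportional
    to size, by Lahiri's rejection method; m must be >= max(sizes)."""
    m = max(sizes) if m is None else m
    while True:
        i = rng.randint(1, len(sizes))  # candidate unit
        j = rng.randint(1, m)           # acceptance check
        if j <= sizes[i - 1]:           # accept when j <= X_i
            return i

sizes = [2, 5, 10, 4, 7, 12, 3, 14, 11, 6]
sample = [lahiri_draw(sizes) for _ in range(2)]
print(sample)  # e.g. a pair of accepted units such as [3, 9]
```

Note that no cumulative totals are needed; only `max(sizes)` and the sizes of the candidate units are ever consulted.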

Varying probability scheme with replacement: Estimation of population mean

Let
$Y_i$: value of the study variable for the $i$th unit of the population, $i = 1, 2, \ldots, N$;
$X_i$: known value of the auxiliary variable (size) for the $i$th unit of the population;
$P_i$: probability of selection of the $i$th unit of the population at any given draw, proportional to the size $X_i$.

Consider the varying probability scheme with replacement for a sample of size $n$. Let $y_r$ be the value of the $r$th observation on the study variable in the sample and $p_r$ be its initial probability of selection. Define
$$z_r = \frac{y_r}{N p_r}, \quad r = 1, 2, \ldots, n,$$
and
$$\bar{z} = \frac{1}{n}\sum_{r=1}^{n} z_r.$$
Then $\bar{z}$ is an unbiased estimator of the population mean $\bar{Y}$, its variance is
$$Var(\bar{z}) = \frac{\sigma_z^2}{n}, \quad \text{where } \sigma_z^2 = \sum_{i=1}^{N} P_i\left(\frac{Y_i}{N P_i} - \bar{Y}\right)^2,$$
and an unbiased estimator of the variance of $\bar{z}$ is
$$\frac{s_z^2}{n} = \frac{1}{n}\cdot\frac{1}{n-1}\sum_{r=1}^{n}(z_r - \bar{z})^2.$$
Proof:
Note that $z_r$ can take any one of the $N$ values $Z_1, Z_2, \ldots, Z_N$, where $Z_i = Y_i/(N P_i)$, with corresponding initial probabilities $P_1, P_2, \ldots, P_N$. So
$$E(z_r) = \sum_{i=1}^{N} Z_i P_i = \sum_{i=1}^{N}\frac{Y_i}{N P_i} P_i = \bar{Y}.$$
Thus
$$E(\bar{z}) = \frac{1}{n}\sum_{r=1}^{n} E(z_r) = \frac{1}{n}\sum_{r=1}^{n}\bar{Y} = \bar{Y},$$
so $\bar{z}$ is an unbiased estimator of the population mean $\bar{Y}$.

The variance of $\bar{z}$ is
$$Var(\bar{z}) = \frac{1}{n^2} Var\left(\sum_{r=1}^{n} z_r\right) = \frac{1}{n^2}\sum_{r=1}^{n} Var(z_r),$$
since the $z_r$'s are independent in the WR case. Now
$$Var(z_r) = E[z_r - E(z_r)]^2 = E[z_r - \bar{Y}]^2 = \sum_{i=1}^{N}(Z_i - \bar{Y})^2 P_i = \sum_{i=1}^{N} P_i\left(\frac{Y_i}{N P_i} - \bar{Y}\right)^2 = \sigma_z^2 \text{ (say)}.$$
Thus
$$Var(\bar{z}) = \frac{1}{n^2}\sum_{r=1}^{n}\sigma_z^2 = \frac{\sigma_z^2}{n}.$$

To show that $s_z^2/n$ is an unbiased estimator of the variance of $\bar{z}$, consider
$$(n-1)E(s_z^2) = E\left[\sum_{r=1}^{n}(z_r - \bar{z})^2\right] = E\left[\sum_{r=1}^{n} z_r^2 - n\bar{z}^2\right] = \sum_{r=1}^{n} E(z_r^2) - n E(\bar{z}^2)$$
$$= \sum_{r=1}^{n}\left[Var(z_r) + \{E(z_r)\}^2\right] - n\left[Var(\bar{z}) + \{E(\bar{z})\}^2\right] = n(\sigma_z^2 + \bar{Y}^2) - n\left(\frac{\sigma_z^2}{n} + \bar{Y}^2\right) = (n-1)\sigma_z^2.$$
Hence $E(s_z^2) = \sigma_z^2$, so
$$E\left(\frac{s_z^2}{n}\right) = \frac{\sigma_z^2}{n} = Var(\bar{z})$$
and
$$\widehat{Var}(\bar{z}) = \frac{s_z^2}{n} = \frac{1}{n(n-1)}\left[\sum_{r=1}^{n}\left(\frac{y_r}{N p_r}\right)^2 - n\bar{z}^2\right].$$

Note: If $P_i = 1/N$, then $\bar{z} = \bar{y}$ and
$$Var(\bar{z}) = \frac{1}{n}\sum_{i=1}^{N}\frac{1}{N}\left(\frac{Y_i}{N\cdot\frac{1}{N}} - \bar{Y}\right)^2 = \frac{\sigma_y^2}{n},$$
which is the same as in the case of SRSWR.
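The estimator and its variance estimate can be sketched in Python (illustrative only, not from the notes; the sample below reuses the factory data, assuming units 5 and 6 were drawn with $p_r = X_r/74$):

```python
def ppswr_mean_estimate(y, p, N):
    """pps with-replacement estimate of the population mean:
    z_r = y_r / (N p_r), zbar = mean of the z_r, together with the
    unbiased variance estimate s_z^2 / n."""
    n = len(y)
    z = [yr / (N * pr) for yr, pr in zip(y, p)]
    zbar = sum(z) / n
    s2 = sum((zr - zbar) ** 2 for zr in z) / (n - 1)
    return zbar, s2 / n

# Factory data: suppose units 5 and 6 were drawn (y = 8, 13),
# with selection probabilities p_r = X_r / 74
zbar, var_hat = ppswr_mean_estimate([8, 13], [7 / 74, 12 / 74], N=10)
print(zbar, var_hat)
```

Units with small $p_r$ are up-weighted by $1/p_r$, which is exactly what makes the estimator unbiased under unequal selection probabilities.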

Estimation of population total:
An estimator of the population total is
$$\hat{Y}_{tot} = \frac{1}{n}\sum_{r=1}^{n}\frac{y_r}{p_r} = N\bar{z}.$$
Taking expectation, and noting that $\frac{y_r}{p_r}$ takes the value $\frac{Y_i}{P_i}$ with probability $P_i$, we get
$$E(\hat{Y}_{tot}) = \frac{1}{n}\sum_{r=1}^{n}\left[\frac{Y_1}{P_1}P_1 + \frac{Y_2}{P_2}P_2 + \cdots + \frac{Y_N}{P_N}P_N\right] = \frac{1}{n}\sum_{r=1}^{n}\sum_{i=1}^{N} Y_i = \frac{1}{n}\sum_{r=1}^{n} Y_{tot} = Y_{tot}.$$
Thus $\hat{Y}_{tot}$ is an unbiased estimator of the population total. Its variance is
$$Var(\hat{Y}_{tot}) = N^2 Var(\bar{z}) = \frac{N^2}{n}\sum_{i=1}^{N}\frac{1}{N^2} P_i\left(\frac{Y_i}{P_i} - Y_{tot}\right)^2 = \frac{1}{n}\sum_{i=1}^{N} P_i\left(\frac{Y_i}{P_i} - Y_{tot}\right)^2 = \frac{1}{n}\left[\sum_{i=1}^{N}\frac{Y_i^2}{P_i} - Y_{tot}^2\right].$$
An estimator of this variance is
$$\widehat{Var}(\hat{Y}_{tot}) = N^2\frac{s_z^2}{n}.$$
Varying probability scheme without replacement

In the varying probability scheme without replacement, when the initial probabilities of selection are unequal, the probability of drawing a specified unit of the population changes with the draw. Generally, sampling WOR provides a more efficient estimator than sampling WR, but the estimators of the population mean and its variance are more complicated. So this scheme is not commonly used in practice, especially in large-scale sample surveys with small sampling fractions.

Let
$U_i$: the $i$th unit;
$P_i$: probability of selection of $U_i$ at the first draw, $i = 1, 2, \ldots, N$, with $\sum_{i=1}^{N} P_i = 1$;
$P_i(r)$: probability of selecting $U_i$ at the $r$th draw, so that $P_i(1) = P_i$.

Consider $P_i(2)$, the probability of selection of $U_i$ at the 2nd draw. Such an event can occur in the following possible ways: $U_i$ is selected at the 2nd draw when, at the 1st draw, the selected unit is $U_1$, or $U_2$, ..., or $U_{i-1}$, or $U_{i+1}$, ..., or $U_N$. So $P_i(2)$ can be expressed as
$$P_i(2) = P_1\frac{P_i}{1-P_1} + P_2\frac{P_i}{1-P_2} + \cdots + P_{i-1}\frac{P_i}{1-P_{i-1}} + P_{i+1}\frac{P_i}{1-P_{i+1}} + \cdots + P_N\frac{P_i}{1-P_N}$$
$$= \sum_{j(\neq i)=1}^{N} P_j\frac{P_i}{1-P_j} = \sum_{j=1}^{N} P_j\frac{P_i}{1-P_j} - P_i\frac{P_i}{1-P_i} = P_i\left[\sum_{j=1}^{N}\frac{P_j}{1-P_j} - \frac{P_i}{1-P_i}\right].$$
In general $P_i(2) \neq P_i(1)$, unless $P_i = 1/N$ for all $i$.

$P_i(2)$ will, in general, be different for each $i = 1, 2, \ldots, N$. So $E\left(\frac{y_r}{N p_r}\right)$ will change with successive draws, which makes the varying probability scheme WOR more complex. Only $\frac{y_1}{N p_1}$ provides an unbiased estimator of $\bar{Y}$; in general, $\frac{y_r}{N p_r}$ $(r \neq 1)$ will not provide an unbiased estimator of $\bar{Y}$.
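The formula for $P_i(2)$ can be checked numerically (an illustrative sketch, not part of the notes; since the $P_i(2)$ form a probability distribution over the units at the second draw, they must sum to 1):

```python
def second_draw_probs(P):
    """P_i(2) = P_i * (sum_j P_j/(1-P_j) - P_i/(1-P_i)) for pps WOR."""
    s = sum(p / (1 - p) for p in P)
    return [p * (s - p / (1 - p)) for p in P]

P = [0.1, 0.2, 0.3, 0.4]
P2 = second_draw_probs(P)
print(P2, sum(P2))  # the P_i(2) differ from the P_i, yet sum to 1
```

Only in the equal-probability case $P_i = 1/N$ does $P_i(2)$ coincide with $P_i(1)$.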

Ordered estimates
To overcome the difficulty of the expectation changing with each draw, we associate with each draw a new variate whose expectation equals the population value of the variate under study. Such estimators take into account the order of the draws and are called ordered estimates; the values obtained at the previous draws enter into the estimator for the current draw.

We consider the ordered estimators proposed by Des Raj, first for the case of two draws and then for the general case.

Des Raj ordered estimator

Case 1: Two draws
Let $y_1$ and $y_2$ denote the values of the units $U_{i(1)}$ and $U_{i(2)}$ drawn at the first and second draws, respectively. Note that any one of the $N$ units can be the first unit or the second unit, so we use the notation $U_{i(1)}$ and $U_{i(2)}$ instead of $U_1$ and $U_2$; also note that $y_1$ and $y_2$ are not the values of the first two units of the population. Further, let $p_1$ and $p_2$ denote the initial probabilities of selection of $U_{i(1)}$ and $U_{i(2)}$, respectively.

Consider the estimators
$$z_1 = \frac{y_1}{N p_1},$$
$$z_2 = \frac{1}{N}\left[y_1 + \frac{y_2}{p_2/(1-p_1)}\right] = \frac{1}{N}\left[y_1 + y_2\frac{1-p_1}{p_2}\right],$$
$$\bar{z} = \frac{z_1 + z_2}{2}.$$
Note that $\frac{p_2}{1-p_1}$ is the conditional probability $P(U_{i(2)} \mid U_{i(1)})$ of selecting $U_{i(2)}$ at the second draw given that $U_{i(1)}$ was selected first.
Estimation of population mean:
First we show that $\bar{z}$ is an unbiased estimator of $\bar{Y}$, i.e., $E(\bar{z}) = \bar{Y}$. Note that $\sum_{i=1}^{N} P_i = 1$.

Consider
$$E(z_1) = \frac{1}{N} E\left(\frac{y_1}{p_1}\right) = \frac{1}{N}\left[\frac{Y_1}{P_1}P_1 + \frac{Y_2}{P_2}P_2 + \cdots + \frac{Y_N}{P_N}P_N\right] = \bar{Y},$$
since $\frac{y_1}{p_1}$ can take any one of the $N$ values $\frac{Y_1}{P_1}, \frac{Y_2}{P_2}, \ldots, \frac{Y_N}{P_N}$ with probabilities $P_1, P_2, \ldots, P_N$. Next,
$$E(z_2) = \frac{1}{N} E\left[y_1 + y_2\frac{1-p_1}{p_2}\right] = \frac{1}{N}\left\{E(y_1) + E_1\left[E_2\left(y_2\frac{1-p_1}{p_2}\,\Big|\, U_{i(1)}\right)\right]\right\},$$
using $E(Y) = E_X[E_Y(Y \mid X)]$, where $E_2$ is the conditional expectation after fixing the unit $U_{i(1)}$ selected at the first draw.

Given $U_{i(1)}$ with initial selection probability $p_1$, the ratio $\frac{y_2}{p_2}$ can take any one of the $N-1$ values $\frac{Y_j}{P_j}$ (all except the value of the unit selected at the first draw) with probability $\frac{P_j}{1-p_1}$. So
$$E_2\left(y_2\frac{1-p_1}{p_2}\,\Big|\, U_{i(1)}\right) = (1-p_1)\sum_j{}^{*}\frac{Y_j}{P_j}\cdot\frac{P_j}{1-p_1} = \sum_j{}^{*} Y_j = Y_{tot} - y_1,$$
where the summation $\sum^{*}$ is taken over all the values of $Y$ except the value $y_1$ selected at the first draw. Substituting this into $E(z_2)$:
$$E(z_2) = \frac{1}{N}\left[E(y_1) + E_1(Y_{tot} - y_1)\right] = \frac{1}{N} E(Y_{tot}) = \frac{Y_{tot}}{N} = \bar{Y}.$$
Thus
$$E(\bar{z}) = \frac{E(z_1) + E(z_2)}{2} = \frac{\bar{Y} + \bar{Y}}{2} = \bar{Y}.$$
Variance:
The variance of $\bar{z}$ for the case of two draws is
$$Var(\bar{z}) = \left(1 - \frac{1}{2}\sum_{i=1}^{N} P_i^2\right)\frac{1}{2N^2}\sum_{i=1}^{N} P_i\left(\frac{Y_i}{P_i} - Y_{tot}\right)^2 - \frac{1}{4N^2}\sum_{i=1}^{N} P_i^2\left(\frac{Y_i}{P_i} - Y_{tot}\right)^2.$$

Proof: We repeatedly use the property
$$\sum_{i\neq j=1}^{N} a_i b_j = \sum_{i=1}^{N} a_i\left(\sum_{j=1}^{N} b_j - b_i\right).$$
Writing
$$\bar{z} = \frac{1}{2N}\left[\frac{y_1}{p_1} + y_1 + y_2\frac{1-p_1}{p_2}\right] = \frac{1}{2N}\left[\frac{y_1(1+p_1)}{p_1} + \frac{y_2(1-p_1)}{p_2}\right],$$
where the first term depends only on the first draw and the second term depends on both draws, we have
$$Var(\bar{z}) = E(\bar{z}^2) - [E(\bar{z})]^2 = \frac{1}{4N^2}\sum_{i\neq j=1}^{N}\left[\frac{Y_i(1+P_i)}{P_i} + \frac{Y_j(1-P_i)}{P_j}\right]^2\frac{P_i P_j}{1-P_i} - \bar{Y}^2,$$
since the ordered pair $(U_i, U_j)$ occurs with probability $\frac{P_i P_j}{1-P_i}$. Expanding the square,
$$Var(\bar{z}) = \frac{1}{4N^2}\sum_{i\neq j=1}^{N}\left[\frac{Y_i^2(1+P_i)^2}{P_i}\frac{P_j}{1-P_i} + \frac{Y_j^2(1-P_i)P_i}{P_j} + 2Y_iY_j(1+P_i)\right] - \bar{Y}^2.$$
Applying the property above to each of the three terms and collecting terms (the algebra is lengthy but straightforward), this reduces to
$$Var(\bar{z}) = \left(1 - \frac{1}{2}\sum_{i=1}^{N} P_i^2\right)\frac{1}{2N^2}\sum_{i=1}^{N} P_i\left(\frac{Y_i}{P_i} - Y_{tot}\right)^2 - \frac{1}{4N^2}\sum_{i=1}^{N} P_i^2\left(\frac{Y_i}{P_i} - Y_{tot}\right)^2$$
$$= \underbrace{\frac{1}{2}\sum_{i=1}^{N} P_i\left(\frac{Y_i}{N P_i} - \bar{Y}\right)^2}_{\text{variance of the WR case for } n=2} \;-\; \underbrace{\frac{1}{4N^2}\left[\sum_{i=1}^{N} P_i^2\sum_{i=1}^{N} P_i\left(\frac{Y_i}{P_i} - Y_{tot}\right)^2 + \sum_{i=1}^{N} P_i^2\left(\frac{Y_i}{P_i} - Y_{tot}\right)^2\right]}_{\text{reduction of the variance relative to WR sampling with varying probability}}.$$
Estimation of $Var(\bar{z})$:
$$Var(\bar{z}) = E(\bar{z}^2) - [E(\bar{z})]^2 = E(\bar{z}^2) - \bar{Y}^2.$$
Since
$$E(z_1 z_2) = E\left[z_1 E(z_2 \mid U_{i(1)})\right] = E\left[z_1\bar{Y}\right] = \bar{Y} E(z_1) = \bar{Y}^2,$$
we have
$$E\left[\bar{z}^2 - z_1 z_2\right] = E(\bar{z}^2) - \bar{Y}^2 = Var(\bar{z}),$$
so $\widehat{Var}(\bar{z}) = \bar{z}^2 - z_1 z_2$ is an unbiased estimator of $Var(\bar{z})$.

Alternative form:
$$\widehat{Var}(\bar{z}) = \bar{z}^2 - z_1 z_2 = \left(\frac{z_1 + z_2}{2}\right)^2 - z_1 z_2 = \frac{(z_1 - z_2)^2}{4}$$
$$= \frac{1}{4}\left[\frac{y_1}{N p_1} - \frac{y_1}{N} - \frac{y_2}{N}\frac{1-p_1}{p_2}\right]^2 = \frac{1}{4N^2}\left[\frac{y_1(1-p_1)}{p_1} - \frac{y_2(1-p_1)}{p_2}\right]^2 = \frac{(1-p_1)^2}{4N^2}\left(\frac{y_1}{p_1} - \frac{y_2}{p_2}\right)^2.$$
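A small numerical check of the two-draw estimator (an illustrative sketch with a made-up population of three units; function names are ours). Enumerating all ordered pairs with their selection probabilities confirms the unbiasedness proved above:

```python
from itertools import permutations

def desraj_mean(y1, p1, y2, p2, N):
    """Des Raj ordered estimator for two draws: zbar = (z1 + z2)/2."""
    z1 = y1 / (N * p1)
    z2 = (y1 + y2 * (1 - p1) / p2) / N
    return (z1 + z2) / 2

def desraj_var_hat(y1, p1, y2, p2, N):
    """Unbiased variance estimate (1-p1)^2/(4N^2) (y1/p1 - y2/p2)^2."""
    return (1 - p1) ** 2 / (4 * N ** 2) * (y1 / p1 - y2 / p2) ** 2

# Made-up population of N = 3 units
Y = [30, 60, 12]
P = [0.2, 0.3, 0.5]
N = len(Y)

# Exact expectation over all ordered WOR pairs:
# P(i first, j second) = P_i * P_j / (1 - P_i)
expected = sum(P[i] * P[j] / (1 - P[i]) * desraj_mean(Y[i], P[i], Y[j], P[j], N)
               for i, j in permutations(range(N), 2))
print(expected, sum(Y) / N)  # both equal the population mean 34 (up to rounding)
```

The pair probabilities sum to 1, and the weighted average of $\bar{z}$ over all ordered pairs recovers $\bar{Y}$ exactly, as the unbiasedness proof requires.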

Case 2: General case

Let $(U_{i(1)}, U_{i(2)}, \ldots, U_{i(r)}, \ldots, U_{i(n)})$ be the units selected in the order in which they are drawn in $n$ draws, where $U_{i(r)}$ denotes that the $i$th unit was drawn at the $r$th draw. Let $(y_1, y_2, \ldots, y_r, \ldots, y_n)$ and $(p_1, p_2, \ldots, p_r, \ldots, p_n)$ be the corresponding values of the study variable and the initial probabilities of selection, respectively.

Define
$$z_1 = \frac{y_1}{N p_1},$$
$$z_r = \frac{1}{N}\left[y_1 + y_2 + \cdots + y_{r-1} + \frac{y_r}{p_r}(1 - p_1 - \cdots - p_{r-1})\right], \quad r = 2, 3, \ldots, n,$$
and consider
$$\bar{z} = \frac{1}{n}\sum_{r=1}^{n} z_r$$
as an estimator of the population mean $\bar{Y}$.

We have already shown in Case 1 that $E(z_1) = \bar{Y}$. Now consider $E(z_r)$, $r = 2, 3, \ldots, n$. We can write
$$E(z_r) = E_1\left\{E_2\left[z_r \mid U_{i(1)}, U_{i(2)}, \ldots, U_{i(r-1)}\right]\right\},$$
where $E_2$ is the conditional expectation after fixing the units $U_{i(1)}, U_{i(2)}, \ldots, U_{i(r-1)}$ drawn in the first $(r-1)$ draws. Consider
$$E\left[\frac{y_r}{p_r}(1 - p_1 - \cdots - p_{r-1})\right] = E_1\left\{(1 - p_1 - \cdots - p_{r-1})\, E_2\left[\frac{y_r}{p_r}\,\Big|\, U_{i(1)}, \ldots, U_{i(r-1)}\right]\right\}.$$
Conditionally on the first $(r-1)$ draws, $\frac{y_r}{p_r}$ can take any one of the $N - (r-1)$ values $\frac{Y_j}{P_j}$ with probability $\frac{P_j}{1 - p_1 - \cdots - p_{r-1}}$, so
$$E\left[\frac{y_r}{p_r}(1 - p_1 - \cdots - p_{r-1})\right] = E_1\left[(1 - p_1 - \cdots - p_{r-1})\sum_j{}^{*}\frac{Y_j}{P_j}\cdot\frac{P_j}{1 - p_1 - \cdots - p_{r-1}}\right] = E_1\left[\sum_j{}^{*} Y_j\right],$$
where $\sum^{*}$ denotes summation over all the units except those selected in the first $(r-1)$ draws, i.e., except the values $y_1, y_2, \ldots, y_{r-1}$.

Thus
$$E(z_r) = \frac{1}{N} E_1\left[y_1 + y_2 + \cdots + y_{r-1} + \sum_j{}^{*} Y_j\right] = \frac{1}{N} E_1\left[Y_{tot}\right] = \frac{Y_{tot}}{N} = \bar{Y} \quad \text{for all } r = 1, 2, \ldots, n.$$
Then
$$E(\bar{z}) = \frac{1}{n}\sum_{r=1}^{n} E(z_r) = \frac{1}{n}\sum_{r=1}^{n}\bar{Y} = \bar{Y},$$
so $\bar{z}$ is an unbiased estimator of the population mean $\bar{Y}$. The expression for the variance of $\bar{z}$ in the general case is complicated, but its estimator is simple.
Estimate of variance:
$$Var(\bar{z}) = E(\bar{z}^2) - \bar{Y}^2.$$
For $r < s$,
$$E(z_r z_s) = E\left[z_r E(z_s \mid U_{i(1)}, U_{i(2)}, \ldots, U_{i(s-1)})\right] = E\left[z_r\bar{Y}\right] = \bar{Y} E(z_r) = \bar{Y}^2,$$
because, conditionally on the first $(s-1)$ draws, $z_r$ is already fixed; similarly, for $s < r$,
$$E(z_r z_s) = E\left[z_s E(z_r \mid U_{i(1)}, \ldots, U_{i(r-1)})\right] = \bar{Y} E(z_s) = \bar{Y}^2.$$
Hence
$$E\left[\frac{1}{n(n-1)}\sum_{r(\neq s)=1}^{n}\sum_{s=1}^{n} z_r z_s\right] = \frac{1}{n(n-1)}\cdot n(n-1)\bar{Y}^2 = \bar{Y}^2.$$
Substituting this unbiased estimator of $\bar{Y}^2$ into $Var(\bar{z}) = E(\bar{z}^2) - \bar{Y}^2$ gives
$$\widehat{Var}(\bar{z}) = \bar{z}^2 - \frac{1}{n(n-1)}\sum_{r(\neq s)=1}^{n}\sum_{s=1}^{n} z_r z_s.$$
Using
$$\left(\sum_{r=1}^{n} z_r\right)^2 = \sum_{r=1}^{n} z_r^2 + \sum_{r(\neq s)=1}^{n}\sum_{s=1}^{n} z_r z_s \;\Rightarrow\; \sum_{r(\neq s)=1}^{n}\sum_{s=1}^{n} z_r z_s = n^2\bar{z}^2 - \sum_{r=1}^{n} z_r^2,$$
the expression for $\widehat{Var}(\bar{z})$ can be simplified as
$$\widehat{Var}(\bar{z}) = \bar{z}^2 - \frac{1}{n(n-1)}\left[n^2\bar{z}^2 - \sum_{r=1}^{n} z_r^2\right] = \frac{1}{n(n-1)}\left[\sum_{r=1}^{n} z_r^2 - n\bar{z}^2\right] = \frac{1}{n(n-1)}\sum_{r=1}^{n}(z_r - \bar{z})^2.$$
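The general estimator and its variance estimate can be sketched in Python (illustrative, not from the notes; for $n = 2$ it reproduces the two-draw formulas of Case 1):

```python
def des_raj(y, p, N):
    """Des Raj ordered estimator for a pps WOR sample drawn in the
    order y[0], y[1], ..., with initial probabilities p, together with
    the unbiased variance estimate sum (z_r - zbar)^2 / (n(n-1))."""
    n = len(y)
    z = [y[0] / (N * p[0])]
    for r in range(1, n):
        head = sum(y[:r])      # y_1 + ... + y_{r-1}
        left = 1 - sum(p[:r])  # 1 - p_1 - ... - p_{r-1}
        z.append((head + y[r] / p[r] * left) / N)
    zbar = sum(z) / n
    return zbar, sum((zr - zbar) ** 2 for zr in z) / (n * (n - 1))

# Factory data: units 3 then 9 drawn, y = (12, 13), p_r = X_r / 74
zbar, v = des_raj([12, 13], [10 / 74, 11 / 74], N=10)
print(zbar, v)
```

For $n = 2$ the variance estimate collapses to $(z_1 - z_2)^2/4$, i.e., the alternative form derived in Case 1.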

Unordered estimator:
In an ordered estimator, the order in which the units are drawn is taken into account. Corresponding to any ordered estimator, there exists an unordered estimator which does not depend on the order in which the units are drawn and which has variance not larger than that of the ordered estimator.

In sampling WOR from a population of size $N$, there are $\binom{N}{n}$ unordered samples of size $n$, and corresponding to any unordered sample of $n$ units there are $n!$ ordered samples.

For example, for $n = 2$, if the units are $u_1$ and $u_2$, then
- there are $2! = 2$ ordered samples: $(u_1, u_2)$ and $(u_2, u_1)$;
- there is one unordered sample $(u_1, u_2)$.
Moreover,
$$P(\text{unordered sample } (u_1, u_2)) = P(\text{ordered sample } (u_1, u_2)) + P(\text{ordered sample } (u_2, u_1)).$$

For $n = 3$, there are three units $u_1, u_2, u_3$ and
- there are $3! = 6$ ordered samples: $(u_1,u_2,u_3)$, $(u_1,u_3,u_2)$, $(u_2,u_1,u_3)$, $(u_2,u_3,u_1)$, $(u_3,u_1,u_2)$, $(u_3,u_2,u_1)$;
- there is one unordered sample $(u_1, u_2, u_3)$.
Moreover, the probability of the unordered sample is the sum of the probabilities of the ordered samples, i.e.,
$$P(u_1,u_2,u_3) + P(u_1,u_3,u_2) + P(u_2,u_1,u_3) + P(u_2,u_3,u_1) + P(u_3,u_1,u_2) + P(u_3,u_2,u_1).$$

Let $z_{si}$, $s = 1, 2, \ldots, \binom{N}{n}$, $i = 1, 2, \ldots, M\,(= n!)$, be an estimator of a population parameter $\theta$ based on the ordered sample $s_i$ (the $i$th ordering of the $s$th unordered sample). Consider a scheme of selection in which the probability of selecting the ordered sample $s_i$ is $p_{si}$. The probability of the corresponding unordered sample is the sum of these probabilities, i.e.,
$$p_s = \sum_{i=1}^{M} p_{si}.$$

For a population of size $N$ with units denoted $1, 2, \ldots, N$, the samples of size $n$ are $n$-tuples, and the sample space consists of $N(N-1)\cdots(N-n+1)$ ordered sample points. When all ordered samples are equally likely,
$$p_{si}^{o} = P[\text{selection of any ordered sample}] = \frac{1}{N(N-1)\cdots(N-n+1)},$$
$$p_{s}^{u} = P[\text{selection of any unordered sample}] = n!\, P\left[\begin{array}{c}\text{selection of any}\\ \text{ordered sample}\end{array}\right] = \frac{n!}{N(N-1)\cdots(N-n+1)},$$
so, with $M = n!$,
$$p_s = \sum_{i=1}^{M} p_{si}^{o} = \frac{n!(N-n)!}{N!} = \frac{1}{\binom{N}{n}}.$$

Theorem: If
$$\hat{\theta}_0 = z_{si}, \quad s = 1, 2, \ldots, \binom{N}{n}, \; i = 1, 2, \ldots, M\,(= n!),$$
and
$$\hat{\theta}_u = \sum_{i=1}^{M} z_{si}\, p'_{si}$$
are the ordered and unordered estimators of $\theta$, then
(i) $E(\hat{\theta}_u) = E(\hat{\theta}_0)$,
(ii) $Var(\hat{\theta}_u) \leq Var(\hat{\theta}_0)$,
where $z_{si}$ is a function of the $s_i$th ordered sample (hence a random variable), $p_{si}$ is the probability of selection of the $s_i$th ordered sample, and $p'_{si} = \dfrac{p_{si}}{p_s}$.

Proof: The total number of ordered samples is $n!\binom{N}{n}$.

(i)
$$E(\hat{\theta}_0) = \sum_{s=1}^{\binom{N}{n}}\sum_{i=1}^{M} z_{si}\, p_{si},$$
$$E(\hat{\theta}_u) = \sum_{s=1}^{\binom{N}{n}}\left[\sum_{i=1}^{M} z_{si}\, p'_{si}\right] p_s = \sum_s\left[\sum_i z_{si}\frac{p_{si}}{p_s}\right] p_s = \sum_s\sum_i z_{si}\, p_{si} = E(\hat{\theta}_0).$$

(ii) Since $\hat{\theta}_0 = z_{si}$, we have $\hat{\theta}_0^2 = z_{si}^2$ with probability $p_{si}$; similarly, $\hat{\theta}_u = \sum_{i=1}^{M} z_{si}\, p'_{si}$, so $\hat{\theta}_u^2 = \left(\sum_{i=1}^{M} z_{si}\, p'_{si}\right)^2$ with probability $p_s$. Consider
$$Var(\hat{\theta}_0) = E(\hat{\theta}_0^2) - \left[E(\hat{\theta}_0)\right]^2 = \sum_s\sum_i z_{si}^2\, p_{si} - \left[E(\hat{\theta}_0)\right]^2,$$
$$Var(\hat{\theta}_u) = E(\hat{\theta}_u^2) - \left[E(\hat{\theta}_u)\right]^2 = \sum_s\left(\sum_i z_{si}\, p'_{si}\right)^2 p_s - \left[E(\hat{\theta}_0)\right]^2.$$
Then, using $p_{si} = p'_{si}\, p_s$ and $\sum_i p'_{si} = 1$,
$$Var(\hat{\theta}_0) - Var(\hat{\theta}_u) = \sum_s\sum_i z_{si}^2\, p_{si} - \sum_s\left(\sum_i z_{si}\, p'_{si}\right)^2 p_s = \sum_s\left[\sum_i z_{si}^2\, p'_{si} - \left(\sum_i z_{si}\, p'_{si}\right)^2\right] p_s$$
$$= \sum_s\sum_i\left(z_{si} - \sum_i z_{si}\, p'_{si}\right)^2 p_{si} \geq 0,$$
so $Var(\hat{\theta}_u) \leq Var(\hat{\theta}_0)$.

Estimate of $Var(\hat{\theta}_u)$:
Since
$$Var(\hat{\theta}_0) - Var(\hat{\theta}_u) = \sum_s\sum_i\left(z_{si} - \sum_i z_{si}\, p'_{si}\right)^2 p_{si},$$
an estimator of $Var(\hat{\theta}_u)$ can be obtained from the observed unordered sample $s$ as
$$\widehat{Var}(\hat{\theta}_u) = \sum_i p'_{si}\, \widehat{Var}(\hat{\theta}_0)_{si} - \sum_i p'_{si}\left(z_{si} - \sum_i z_{si}\, p'_{si}\right)^2,$$
where $\widehat{Var}(\hat{\theta}_0)_{si}$ denotes the variance estimator of the ordered estimator computed from the $i$th ordering.

Based on this result, we now use the ordered estimators to construct an unordered estimator. It follows from the theorem that the unordered estimator will be more efficient than the corresponding ordered estimators.
Murthy’s unordered estimator corresponding to Des Raj’s ordered estimator for sample size 2

Suppose $y_i$ and $y_j$ are the values of the units $U_i$ and $U_j$ selected at the first and second draws, respectively, in a varying probability WOR sample of size 2, and let $p_i$ and $p_j$ be the corresponding initial probabilities of selection. We have two ordered samples,
$$s_1^* = (y_i, y_j) \text{ with } (U_i, U_j), \qquad s_2^* = (y_j, y_i) \text{ with } (U_j, U_i),$$
and the corresponding ordered estimates are
$$z(s_1^*) = \frac{1}{2N}\left[(1+p_i)\frac{y_i}{p_i} + (1-p_i)\frac{y_j}{p_j}\right],$$
which is the Des Raj estimator $\frac{1}{2N}\left[y_i + \frac{y_i}{p_i} + \frac{y_j(1-p_i)}{p_j}\right]$ rearranged, and
$$z(s_2^*) = \frac{1}{2N}\left[(1+p_j)\frac{y_j}{p_j} + (1-p_j)\frac{y_i}{p_i}\right],$$
which similarly is the Des Raj estimator $\frac{1}{2N}\left[y_j + \frac{y_j}{p_j} + \frac{y_i(1-p_j)}{p_i}\right]$.

The probabilities corresponding to $z(s_1^*)$ and $z(s_2^*)$ are
$$p(s_1^*) = \frac{p_i p_j}{1-p_i}, \qquad p(s_2^*) = \frac{p_j p_i}{1-p_j},$$
$$p(s) = p(s_1^*) + p(s_2^*) = \frac{p_i p_j(2 - p_i - p_j)}{(1-p_i)(1-p_j)},$$
so that
$$p'(s_1^*) = \frac{p(s_1^*)}{p(s)} = \frac{1-p_j}{2-p_i-p_j}, \qquad p'(s_2^*) = \frac{p(s_2^*)}{p(s)} = \frac{1-p_i}{2-p_i-p_j}.$$

Murthy’s unordered estimate $z(u)$ corresponding to the Des Raj ordered estimate is
$$z(u) = z(s_1^*)\, p'(s_1^*) + z(s_2^*)\, p'(s_2^*) = \frac{z(s_1^*)\, p(s_1^*) + z(s_2^*)\, p(s_2^*)}{p(s_1^*) + p(s_2^*)}$$
$$= \frac{\frac{1}{2N}\left[\left\{(1+p_i)\frac{y_i}{p_i} + (1-p_i)\frac{y_j}{p_j}\right\}(1-p_j) + \left\{(1+p_j)\frac{y_j}{p_j} + (1-p_j)\frac{y_i}{p_i}\right\}(1-p_i)\right]}{(1-p_j) + (1-p_i)}$$
$$= \frac{\frac{1}{2N}\left[\frac{y_i}{p_i}(1-p_j)\left\{(1+p_i)+(1-p_i)\right\} + \frac{y_j}{p_j}(1-p_i)\left\{(1-p_j)+(1+p_j)\right\}\right]}{2-p_i-p_j}$$
$$= \frac{(1-p_j)\frac{y_i}{p_i} + (1-p_i)\frac{y_j}{p_j}}{N(2-p_i-p_j)}.$$
Unbiasedness:
Note that $(y_i, p_i)$ can take any one of the values $(Y_1, P_1), (Y_2, P_2), \ldots, (Y_N, P_N)$, and then $(y_j, p_j)$ can take any one of the remaining values, i.e., all the values except the one taken at the first draw. The unordered sample $\{U_i, U_j\}$ occurs with probability $\frac{P_i P_j}{1-P_i} + \frac{P_j P_i}{1-P_j}$. Now
$$E[z(u)] = \sum_{i<j}\frac{(1-P_j)\frac{Y_i}{P_i} + (1-P_i)\frac{Y_j}{P_j}}{N(2-P_i-P_j)}\left[\frac{P_i P_j}{1-P_i} + \frac{P_j P_i}{1-P_j}\right] = \frac{1}{2N}\sum_{i\neq j}\frac{\left[(1-P_j)\frac{Y_i}{P_i} + (1-P_i)\frac{Y_j}{P_j}\right]\left[\frac{P_i P_j}{1-P_i} + \frac{P_j P_i}{1-P_j}\right]}{2-P_i-P_j}.$$
Since
$$\frac{P_i P_j}{1-P_i} + \frac{P_j P_i}{1-P_j} = \frac{P_i P_j(2-P_i-P_j)}{(1-P_i)(1-P_j)},$$
this gives
$$E[z(u)] = \frac{1}{2N}\sum_{i\neq j}\left[(1-P_j)\frac{Y_i}{P_i} + (1-P_i)\frac{Y_j}{P_j}\right]\frac{P_i P_j}{(1-P_i)(1-P_j)} = \frac{1}{2N}\sum_{i\neq j}\left[\frac{Y_i P_j}{1-P_i} + \frac{Y_j P_i}{1-P_j}\right].$$
Using the result $\sum_{i\neq j=1}^{N} a_i b_j = \sum_{i=1}^{N} a_i\left(\sum_{j=1}^{N} b_j - b_i\right)$, we have
$$E[z(u)] = \frac{1}{2N}\left[\sum_{i=1}^{N}\frac{Y_i}{1-P_i}\left(\sum_{j=1}^{N} P_j - P_i\right) + \sum_{j=1}^{N}\frac{Y_j}{1-P_j}\left(\sum_{i=1}^{N} P_i - P_j\right)\right]$$
$$= \frac{1}{2N}\left[\sum_{i=1}^{N}\frac{Y_i(1-P_i)}{1-P_i} + \sum_{j=1}^{N}\frac{Y_j(1-P_j)}{1-P_j}\right] = \frac{1}{2N}\left[\sum_{i=1}^{N} Y_i + \sum_{j=1}^{N} Y_j\right] = \frac{\bar{Y} + \bar{Y}}{2} = \bar{Y}.$$
Variance: The variance of $z(u)$ can be found as
$$Var[z(u)] = \frac{1}{2}\sum_{i\neq j=1}^{N}\frac{(1-P_i-P_j)(1-P_i)(1-P_j)}{N^2(2-P_i-P_j)^2}\left(\frac{Y_i}{P_i} - \frac{Y_j}{P_j}\right)^2\frac{P_i P_j(2-P_i-P_j)}{(1-P_i)(1-P_j)}$$
$$= \frac{1}{2}\sum_{i\neq j=1}^{N}\frac{P_i P_j(1-P_i-P_j)}{N^2(2-P_i-P_j)}\left(\frac{Y_i}{P_i} - \frac{Y_j}{P_j}\right)^2.$$

Using the theorem that $Var(\hat{\theta}_u) \leq Var(\hat{\theta}_0)$, we get
$$Var[z(u)] \leq Var[z(s_1^*)] \quad \text{and} \quad Var[z(u)] \leq Var[z(s_2^*)].$$

Unbiased estimator of $Var[z(u)]$:
An unbiased estimator of $Var[z(u)]$ is
$$\widehat{Var}[z(u)] = \frac{(1-p_i-p_j)(1-p_i)(1-p_j)}{N^2(2-p_i-p_j)^2}\left(\frac{y_i}{p_i} - \frac{y_j}{p_j}\right)^2.$$
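Murthy’s estimator and its variance estimate can be sketched in Python (illustrative, not from the notes; note that the estimate is symmetric in the two units, as an unordered estimator must be):

```python
def murthy_two_draw(yi, pi, yj, pj, N):
    """Murthy's unordered estimator for n = 2 and its unbiased
    variance estimate, as given above."""
    est = ((1 - pj) * yi / pi + (1 - pi) * yj / pj) / (N * (2 - pi - pj))
    v = ((1 - pi - pj) * (1 - pi) * (1 - pj)
         / (N ** 2 * (2 - pi - pj) ** 2)
         * (yi / pi - yj / pj) ** 2)
    return est, v

# Factory data: units 5 and 6, y = (8, 13), p = X / 74
print(murthy_two_draw(8, 7 / 74, 13, 12 / 74, N=10))
```

Swapping the roles of the two units leaves both outputs unchanged, unlike the Des Raj ordered estimator.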

Horvitz–Thompson (HT) estimator

The unordered estimates have limited applicability as they lack simplicity, and the expressions for the estimators and their variances become unmanageable when the sample size is even moderately large. The HT estimator is simpler than the other estimators. Let $N$ be the population size and $y_i$ $(i = 1, 2, \ldots, N)$ be the values of the characteristic under study, and suppose a sample of size $n$ is drawn WOR using arbitrary probabilities of selection at each draw.

Thus, prior to each succeeding draw, a new probability distribution is defined for the units available at that draw. The probability distribution at each draw may or may not depend upon the initial probabilities at the first draw.

Define random variables $\alpha_i$ $(i = 1, 2, \ldots, N)$ as
$$\alpha_i = \begin{cases} 1 & \text{if } Y_i \text{ is included in a sample } s \text{ of size } n, \\ 0 & \text{otherwise.} \end{cases}$$
Let
$$z_i = \frac{n y_i}{N E(\alpha_i)}, \quad i = 1, \ldots, N, \text{ assuming } E(\alpha_i) > 0 \text{ for all } i,$$
where
$$E(\alpha_i) = 1\cdot P(Y_i \in s) + 0\cdot P(Y_i \notin s) = \pi_i$$
is the probability of including unit $i$ in the sample, called the inclusion probability.

The HT estimator of $\bar{Y}$ based on $y_1, y_2, \ldots, y_n$ is
$$\hat{\bar{Y}}_{HT} = \bar{z}_n = \frac{1}{n}\sum_{i=1}^{n} z_i = \frac{1}{n}\sum_{i=1}^{N}\alpha_i z_i.$$
Unbiasedness
$$E(\hat{\bar{Y}}_{HT}) = \frac{1}{n}\sum_{i=1}^{N} E(z_i\alpha_i) = \frac{1}{n}\sum_{i=1}^{N} z_i E(\alpha_i) = \frac{1}{n}\sum_{i=1}^{N}\frac{n y_i}{N E(\alpha_i)} E(\alpha_i) = \frac{1}{n}\sum_{i=1}^{N}\frac{n y_i}{N} = \bar{Y},$$
which shows that the HT estimator is an unbiased estimator of the population mean.
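In terms of the sampled values, $\hat{\bar{Y}}_{HT} = \frac{1}{N}\sum_{i=1}^{n} y_i/\pi_i$, which is essentially one line of code (an illustrative sketch; `incl` holds the inclusion probabilities of the sampled units):

```python
def ht_mean(y, incl, N):
    """Horvitz-Thompson estimate of the population mean:
    (1/N) * sum over the sample of y_i / pi_i."""
    return sum(yi / p for yi, p in zip(y, incl)) / N

# Under SRSWOR every unit has pi_i = n/N, so the HT estimator
# reduces to the ordinary sample mean:
print(ht_mean([8, 13], [2 / 10, 2 / 10], N=10))  # 10.5
```

Each sampled value is inflated by $1/\pi_i$, so rarely included units carry more weight, compensating for their low chance of appearing in the sample.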

Variance
$$Var(\hat{\bar{Y}}_{HT}) = Var(\bar{z}_n) = E(\bar{z}_n^2) - \left[E(\bar{z}_n)\right]^2 = E(\bar{z}_n^2) - \bar{Y}^2.$$
Consider
$$E(\bar{z}_n^2) = \frac{1}{n^2} E\left[\sum_{i=1}^{N}\alpha_i z_i\right]^2 = \frac{1}{n^2}\left[\sum_{i=1}^{N} z_i^2 E(\alpha_i^2) + \sum_{i(\neq j)=1}^{N}\sum_{j=1}^{N} z_i z_j E(\alpha_i\alpha_j)\right].$$
If $S = \{s\}$ is the set of all possible samples and $\pi_i$ is the probability of inclusion of the $i$th unit in the sample, then
$$E(\alpha_i) = 1\cdot P(y_i \in s) + 0\cdot P(y_i \notin s) = \pi_i, \qquad E(\alpha_i^2) = 1^2\cdot P(y_i \in s) + 0^2\cdot P(y_i \notin s) = \pi_i,$$
so $E(\alpha_i^2) = E(\alpha_i) = \pi_i$, and
$$E(\bar{z}_n^2) = \frac{1}{n^2}\left[\sum_{i=1}^{N}\pi_i z_i^2 + \sum_{i(\neq j)=1}^{N}\sum_{j=1}^{N}\pi_{ij} z_i z_j\right],$$
where $\pi_{ij} = E(\alpha_i\alpha_j)$ is the probability of inclusion of both the $i$th and $j$th units in the sample, called the second order inclusion probability.

Now
$$\bar{Y}^2 = \left[E(\bar{z}_n)\right]^2 = \frac{1}{n^2}\left[\sum_{i=1}^{N} z_i^2\left[E(\alpha_i)\right]^2 + \sum_{i(\neq j)=1}^{N}\sum_{j=1}^{N} z_i z_j E(\alpha_i) E(\alpha_j)\right] = \frac{1}{n^2}\left[\sum_{i=1}^{N}\pi_i^2 z_i^2 + \sum_{i(\neq j)=1}^{N}\sum_{j=1}^{N}\pi_i\pi_j z_i z_j\right].$$
Thus
$$Var(\hat{\bar{Y}}_{HT}) = \frac{1}{n^2}\left[\sum_{i=1}^{N}\pi_i(1-\pi_i) z_i^2 + \sum_{i(\neq j)=1}^{N}\sum_{j=1}^{N}(\pi_{ij} - \pi_i\pi_j) z_i z_j\right].$$
Substituting $z_i = \dfrac{n y_i}{N\pi_i}$,
$$Var(\hat{\bar{Y}}_{HT}) = \frac{1}{n^2}\left[\sum_{i=1}^{N}\pi_i(1-\pi_i)\frac{n^2 y_i^2}{N^2\pi_i^2} + \sum_{i(\neq j)=1}^{N}\sum_{j=1}^{N}(\pi_{ij} - \pi_i\pi_j)\frac{n^2 y_i y_j}{N^2\pi_i\pi_j}\right]$$
$$= \frac{1}{N^2}\left[\sum_{i=1}^{N}\left(\frac{1-\pi_i}{\pi_i}\right) y_i^2 + \sum_{i(\neq j)=1}^{N}\sum_{j=1}^{N}\left(\frac{\pi_{ij} - \pi_i\pi_j}{\pi_i\pi_j}\right) y_i y_j\right].$$
Estimate of variance
$$\hat{V}_1 = \widehat{Var}(\hat{\bar{Y}}_{HT}) = \frac{1}{N^2}\left[\sum_{i=1}^{n}\frac{(1-\pi_i)}{\pi_i^2} y_i^2 + \sum_{i(\neq j)=1}^{n}\sum_{j=1}^{n}\frac{\pi_{ij} - \pi_i\pi_j}{\pi_{ij}}\cdot\frac{y_i y_j}{\pi_i\pi_j}\right].$$
This is an unbiased estimator of the variance.

Drawback: It does not reduce to zero when all the ratios $\frac{y_i}{\pi_i}$ are the same, i.e., when $y_i \propto \pi_i$. Consequently, it may assume negative values for some samples. A more elegant form of the variance of $\hat{\bar{Y}}_{HT}$ has been obtained by Yates and Grundy.
Yates and Grundy form of variance

Since exactly $n$ of the $\alpha_i$ take the value 1 and $(N-n)$ take the value 0,
$$\sum_{i=1}^{N}\alpha_i = n.$$
Taking expectation on both sides,
$$\sum_{i=1}^{N} E(\alpha_i) = n.$$
Also
$$E\left[\sum_{i=1}^{N}\alpha_i\right]^2 = \sum_{i=1}^{N} E(\alpha_i^2) + \sum_{i(\neq j)=1}^{N}\sum_{j=1}^{N} E(\alpha_i\alpha_j),$$
so, using $E(\alpha_i^2) = E(\alpha_i)$,
$$n^2 = n + \sum_{i(\neq j)=1}^{N}\sum_{j=1}^{N} E(\alpha_i\alpha_j) \;\Rightarrow\; \sum_{i(\neq j)=1}^{N}\sum_{j=1}^{N} E(\alpha_i\alpha_j) = n(n-1).$$
Further,
$$E(\alpha_i\alpha_j) = P(\alpha_i = 1, \alpha_j = 1) = P(\alpha_i = 1)\, P(\alpha_j = 1 \mid \alpha_i = 1) = E(\alpha_i)\, E(\alpha_j \mid \alpha_i = 1).$$
Therefore, since $\sum_{j(\neq i)} E(\alpha_j \mid \alpha_i = 1) = n - 1$ (given that unit $i$ is in the sample, the remaining units account for $n-1$ inclusions) and $\sum_{j(\neq i)} E(\alpha_j) = n - E(\alpha_i)$,
$$\sum_{j(\neq i)=1}^{N}\left[E(\alpha_i\alpha_j) - E(\alpha_i)E(\alpha_j)\right] = E(\alpha_i)\sum_{j(\neq i)=1}^{N}\left[E(\alpha_j \mid \alpha_i = 1) - E(\alpha_j)\right]$$
$$= E(\alpha_i)\left[(n-1) - (n - E(\alpha_i))\right] = -E(\alpha_i)\left[1 - E(\alpha_i)\right] = -\pi_i(1-\pi_i). \quad (1)$$
Similarly,
$$\sum_{i(\neq j)=1}^{N}\left[E(\alpha_i\alpha_j) - E(\alpha_i)E(\alpha_j)\right] = -\pi_j(1-\pi_j). \quad (2)$$

We had earlier derived the variance of the HT estimator as
$$Var(\hat{\bar{Y}}_{HT}) = \frac{1}{n^2}\left[\sum_{i=1}^{N}\pi_i(1-\pi_i) z_i^2 + \sum_{i(\neq j)=1}^{N}\sum_{j=1}^{N}(\pi_{ij} - \pi_i\pi_j) z_i z_j\right].$$
Writing the first term symmetrically in $i$ and $j$ and using (1) and (2), we get
$$Var(\hat{\bar{Y}}_{HT}) = \frac{1}{2n^2}\left[-\sum_{i=1}^{N}\sum_{j(\neq i)=1}^{N}(\pi_{ij} - \pi_i\pi_j) z_i^2 - \sum_{j=1}^{N}\sum_{i(\neq j)=1}^{N}(\pi_{ij} - \pi_i\pi_j) z_j^2 - 2\sum_{i(\neq j)=1}^{N}\sum_{j=1}^{N}(\pi_i\pi_j - \pi_{ij}) z_i z_j\right]$$
$$= \frac{1}{2n^2}\sum_{i(\neq j)=1}^{N}\sum_{j=1}^{N}(\pi_i\pi_j - \pi_{ij})\left(z_i^2 + z_j^2 - 2 z_i z_j\right) = \frac{1}{2n^2}\sum_{i(\neq j)=1}^{N}\sum_{j=1}^{N}(\pi_i\pi_j - \pi_{ij})(z_i - z_j)^2.$$

The expressions for $\pi_i$ and $\pi_{ij}$ can be written down for any given sample size. For example, for $n = 2$, assume that at the second draw the probability of selecting a unit from those available is proportional to the probability of selecting it at the first draw. Then
$$E(\alpha_i) = P(\text{selecting } Y_i \text{ in a sample of two}) = P_{i1} + P_{i2},$$
where $P_{ir}$ is the probability of selecting $Y_i$ at the $r$th draw $(r = 1, 2)$. If $P_i$ is the probability of selecting the $i$th unit at the first draw $(i = 1, 2, \ldots, N)$, then, as derived earlier,
$$P_{i1} = P_i,$$
$$P_{i2} = P\left(\begin{array}{c}y_i \text{ is not selected}\\ \text{at the 1st draw}\end{array}\right) P\left(\begin{array}{c}y_i \text{ is selected at the 2nd draw} \mid\\ y_i \text{ is not selected at the 1st draw}\end{array}\right) = \sum_{j(\neq i)=1}^{N} P_j\frac{P_i}{1-P_j} = P_i\left[\sum_{j=1}^{N}\frac{P_j}{1-P_j} - \frac{P_i}{1-P_i}\right].$$
So
$$\pi_i = E(\alpha_i) = P_{i1} + P_{i2} = P_i\left[1 + \sum_{j=1}^{N}\frac{P_j}{1-P_j} - \frac{P_i}{1-P_i}\right].$$
Again,
$$\pi_{ij} = E(\alpha_i\alpha_j) = P(\text{both } y_i \text{ and } y_j \text{ are included in a sample of size two}) = P_{i1} P_{j2|i} + P_{j1} P_{i2|j}$$
$$= P_i\frac{P_j}{1-P_i} + P_j\frac{P_i}{1-P_j} = P_i P_j\left[\frac{1}{1-P_i} + \frac{1}{1-P_j}\right].$$
Estimate of variance
The Yates–Grundy estimate of the variance is
$$\widehat{Var}(\hat{\bar{Y}}_{HT}) = \frac{1}{2n^2}\sum_{i(\neq j)=1}^{n}\sum_{j=1}^{n}\frac{\pi_i\pi_j - \pi_{ij}}{\pi_{ij}}(z_i - z_j)^2.$$
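The Yates–Grundy estimate can be sketched as follows (illustrative, not from the notes; `pij` is the matrix of second order inclusion probabilities for the sampled units, here filled with made-up values):

```python
def yates_grundy_var(y, pi, pij, N):
    """Yates-Grundy variance estimate for the HT mean, with
    z_i = n y_i / (N pi_i)."""
    n = len(y)
    z = [n * yi / (N * p) for yi, p in zip(y, pi)]
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                total += (pi[i] * pi[j] - pij[i][j]) / pij[i][j] * (z[i] - z[j]) ** 2
    return total / (2 * n ** 2)

# When y_i is proportional to pi_i, all z_i are equal, so the
# estimate is exactly zero -- unlike the earlier estimator V1
print(yates_grundy_var([2, 4], [0.2, 0.4], [[1, 0.1], [0.1, 1]], N=5))  # 0.0
```

The $(z_i - z_j)^2$ form makes the desirable behaviour explicit: the estimate vanishes when the $y_i/\pi_i$ ratios are constant, and it is nonnegative whenever $\pi_i\pi_j \geq \pi_{ij}$ for all pairs.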

Midzuno system of sampling:
Under this system of selection of probabilities, the unit in the first draw is selected with unequal
probabilities of selection (i.e., pps) and remaining all the units are selected with SRSWOR at all
subsequent draws.

Under this system,

$$
E(\alpha_i) = \pi_i = P(\text{unit } i\ (U_i) \text{ is included in the sample})
$$
$$
= P(U_i \text{ is included in the 1st draw}) + P(U_i \text{ is included in any other draw})
$$
$$
= P_i + P(U_i \text{ is not selected at the first draw and is selected at one of the subsequent } (n-1) \text{ draws})
$$
$$
= P_i + (1-P_i)\frac{n-1}{N-1} = \frac{N-n}{N-1}P_i + \frac{n-1}{N-1}.
$$
Similarly,

$$E(\alpha_i\alpha_j) = \pi_{ij} = \text{Probability that both } U_i \text{ and } U_j \text{ are in the sample}$$

= (Probability that $U_i$ is selected at the first draw and $U_j$ is selected at one of the subsequent $(n-1)$ draws)
+ (Probability that $U_j$ is selected at the first draw and $U_i$ is selected at one of the subsequent $(n-1)$ draws)
+ (Probability that neither $U_i$ nor $U_j$ is selected at the first draw but both of them are selected during the subsequent $(n-1)$ draws)

$$
= P_i\,\frac{n-1}{N-1} + P_j\,\frac{n-1}{N-1} + (1-P_i-P_j)\,\frac{(n-1)(n-2)}{(N-1)(N-2)}
= \frac{n-1}{N-1}\left[\frac{N-n}{N-2}(P_i+P_j) + \frac{n-2}{N-2}\right].
$$

Similarly,

$$
E(\alpha_i\alpha_j\alpha_k) = \pi_{ijk} = \text{Probability of including } U_i,\ U_j \text{ and } U_k \text{ in the sample}
= \frac{(n-1)(n-2)}{(N-1)(N-2)}\left[\frac{N-n}{N-3}(P_i+P_j+P_k) + \frac{n-3}{N-3}\right].
$$
By an extension of this argument, if $U_i, U_j, \ldots, U_r$ are $r$ units in the sample of size $n\ (r < n)$, the probability of including these $r$ units in the sample is

$$
E(\alpha_i\alpha_j\cdots\alpha_r) = \pi_{ij\ldots r}
= \frac{(n-1)(n-2)\cdots(n-r+1)}{(N-1)(N-2)\cdots(N-r+1)}\left[\frac{N-n}{N-r}(P_i+P_j+\cdots+P_r) + \frac{n-r}{N-r}\right].
$$

Similarly, if $U_1, U_2, \ldots, U_q$ are the $n$ units of the sample, the probability of including these units in the sample is

$$
E(\alpha_i\alpha_j\cdots\alpha_q) = \pi_{ij\ldots q}
= \frac{(n-1)(n-2)\cdots 1}{(N-1)(N-2)\cdots(N-n+1)}\,(P_i+P_j+\cdots+P_q)
= \frac{P_i+P_j+\cdots+P_q}{\binom{N-1}{n-1}}
$$

which is obtained by substituting $r = n$.

Thus if Pi ' s are proportional to some measure of size of units in the population then the probability of
selecting a specified sample is proportional to the total measure of the size of units included in the
sample.
Substituting these $\pi_i$, $\pi_{ij}$, $\pi_{ijk}$, etc. in the HT estimator, we can obtain estimators of the population mean and variance. In particular, an unbiased estimator of the variance of the HT estimator is given by

$$
\widehat{Var}(\hat{Y}_{HT}) = \frac{1}{2n^2}\sum_{i(\neq j)=1}^{n}\sum_{j=1}^{n}\frac{\pi_i\pi_j-\pi_{ij}}{\pi_{ij}}\,(z_i-z_j)^2
$$

where

$$
\pi_i\pi_j - \pi_{ij} = \frac{N-n}{(N-1)^2}\left[(N-n)P_iP_j + \frac{n-1}{N-2}(1-P_i-P_j)\right].
$$

The main advantage of this method of sampling is that it is possible to compute a set of revised
probabilities of selection such that the inclusion probabilities resulting from the revised probabilities are
proportional to the initial probabilities of selection. It is desirable to do so since the initial probabilities
can be chosen proportional to some measure of size.
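Because the Midzuno inclusion probability is linear in $P_i$, it is a one-line computation. The sketch below (with assumed, illustrative $P_i$) verifies that the $\pi_i$ always sum to $n$, which follows because the $P_i$ sum to one:

```python
def midzuno_pi(P, n):
    """First-order inclusion probability under Midzuno's scheme:
    pi_i = (N - n)/(N - 1) * P_i + (n - 1)/(N - 1)."""
    N = len(P)
    return [(N - n) / (N - 1) * p + (n - 1) / (N - 1) for p in P]

P = [0.1, 0.2, 0.3, 0.4]      # assumed initial (PPS) selection probabilities
pi = midzuno_pi(P, n=2)
```

For these values, `pi[0]` works out to $(2/3)(0.1) + 1/3 = 0.4$, and `sum(pi)` equals the sample size 2.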

Chapter 8
Double Sampling (Two Phase Sampling)

The ratio and regression methods of estimation require the knowledge of population mean of auxiliary
variable ( X ) to estimate the population mean of study variable (Y ). If information on the auxiliary
variable is not available, then there are two options – one option is to collect a sample only on study
variable and use sample mean as an estimator of population mean.

An alternative solution is to use a part of the budget for collecting information on auxiliary variable to
collect a large preliminary sample in which xi alone is measured. The purpose of this sampling is to

furnish a good estimate of X . This method is appropriate when the information about xi is on file

cards that have not been tabulated. After collecting a large preliminary sample of size n ' units from
the population, select a smaller sample of size n from it and collect the information on y . These two

estimates are then used to obtain an estimator of population mean Y . This procedure of selecting a
large sample for collecting information on auxiliary variable x and then selecting a sub-sample from
it for collecting the information on the study variable y is called double sampling or two phase
sampling. It is useful when it is considerably cheaper and quicker to collect data on x than y and
there is high correlation between x and y.

In this sampling, the randomization is done twice. First a random sample of size n ' is drawn from a
population of size N and then again a random sample of size n is drawn from the first sample of size
n'.

So the sample mean in this sampling is a function of the two phases of sampling. If SRSWOR is utilized to draw the samples at both phases, then
- the number of possible samples at the first phase, when a sample of size $n'$ is drawn from a population of size $N$, is $\binom{N}{n'} = M_0$, say;
- the number of possible samples at the second phase, when a sample of size $n$ is drawn from the first-phase sample of size $n'$, is $\binom{n'}{n} = M_1$, say.
Population of X (N units)
  → Sample (large): n' units — M_0 possible samples
    → Subsample (small): n units — M_1 possible samples

Then the sample mean is a function of two variables. Let $\theta$ be the statistic calculated at the second phase, taking values $\theta_{ij},\ i = 1, 2, \ldots, M_0,\ j = 1, 2, \ldots, M_1$, with $P_{ij}$ the probability that the $i$th sample is chosen at the first phase and the $j$th sample at the second phase. Then

$$E(\theta) = E_1\big[E_2(\theta)\big]$$

where $E_2$ denotes the expectation over the second phase and $E_1$ the expectation over the first phase. Thus

$$
E(\theta) = \sum_{i=1}^{M_0}\sum_{j=1}^{M_1}P_{ij}\,\theta_{ij}
= \sum_{i=1}^{M_0}\sum_{j=1}^{M_1}P_iP_{j|i}\,\theta_{ij} \qquad (\text{using } P(A\cap B) = P(A)P(B\mid A))
$$
$$
= \underbrace{\sum_{i=1}^{M_0}P_i}_{\text{1st stage}}\ \underbrace{\sum_{j=1}^{M_1}P_{j|i}\,\theta_{ij}}_{\text{2nd stage}} .
$$
Variance of $\theta$

$$
Var(\theta) = E\big[\theta - E(\theta)\big]^2
= E\big[(\theta - E_2(\theta)) + (E_2(\theta) - E(\theta))\big]^2
$$
$$
= E\big[\theta - E_2(\theta)\big]^2 + E\big[E_2(\theta) - E(\theta)\big]^2 + 0
$$
$$
= E_1E_2\big[\theta - E_2(\theta)\big]^2 + E_1E_2\big[\underbrace{E_2(\theta) - E(\theta)}_{\text{constant for } E_2}\big]^2
$$
$$
= E_1\big[V_2(\theta)\big] + E_1\big[E_2(\theta) - E_1(E_2(\theta))\big]^2
= E_1\big[V_2(\theta)\big] + V_1\big[E_2(\theta)\big].
$$

Note: Two-phase sampling can be extended to more than two phases, depending upon the need and objectives of the survey. The various expectations can also be extended along similar lines.
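The decomposition $Var(\theta) = E_1[V_2(\theta)] + V_1[E_2(\theta)]$ can be verified exactly on a toy population by enumerating every two-phase SRSWOR sample (a sketch with assumed values; each of the equally likely (first sample, subsample) pairs is listed explicitly):

```python
# Exact check of Var(theta) = E1[V2(theta)] + V1[E2(theta)] for the
# subsample mean, on a toy population of N = 4 units.
from itertools import combinations

Y = [2.0, 4.0, 6.0, 10.0]   # toy population values (assumed)
n1, n = 3, 2                # first-phase size n', second-phase size n

means, v_within, m_between = [], [], []
for first in combinations(Y, n1):                        # all first-phase samples
    subs = [sum(s) / n for s in combinations(first, n)]  # all second-phase means
    mu2 = sum(subs) / len(subs)                          # E2(theta | first phase)
    m_between.append(mu2)
    v_within.append(sum((s - mu2) ** 2 for s in subs) / len(subs))  # V2(theta)
    means.extend(subs)

grand = sum(means) / len(means)                          # E(theta) = population mean
total_var = sum((m - grand) ** 2 for m in means) / len(means)
E1V2 = sum(v_within) / len(v_within)
V1E2 = sum((m - grand) ** 2 for m in m_between) / len(m_between)
```

Here `total_var` matches `E1V2 + V1E2` to machine precision, and `grand` equals the population mean, illustrating the unbiasedness claim as well.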

Double sampling in ratio method of estimation


If the population mean $\bar{X}$ is not known, then the double sampling technique is applied. Take a large initial sample of size $n'$ by SRSWOR to estimate the population mean $\bar{X}$ as

$$\hat{\bar{X}} = \bar{x}' = \frac{1}{n'}\sum_{i=1}^{n'}x_i .$$

Then a second sample is a subsample of size $n$ selected from the initial sample by SRSWOR. Let $\bar{y}$ and $\bar{x}$ be the means of $y$ and $x$ based on the subsample. Then $E(\bar{x}') = \bar{X}$, $E(\bar{x}) = \bar{X}$, $E(\bar{y}) = \bar{Y}$.

The ratio estimator under double sampling now becomes

$$\hat{\bar{Y}}_{Rd} = \frac{\bar{y}}{\bar{x}}\,\bar{x}' .$$

The exact expressions for the bias and mean squared error of YˆRd are difficult to derive. So we find
their approximate expressions using the same approach mentioned while describing the ratio method
of estimation.
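Computing $\hat{\bar{Y}}_{Rd}$ from the two phases is mechanical; the sketch below uses made-up survey values (not from the notes) in which $y$ is observed only on the subsample:

```python
# Double-sampling ratio estimate from illustrative two-phase data.
x_first = [12, 15, 9, 14, 10, 12]   # x on the large first-phase sample (n' = 6)
x_sub   = [12, 9, 14]               # x on the subsample (n = 3)
y_sub   = [30, 24, 33]              # y observed only on the subsample

xbar_dash = sum(x_first) / len(x_first)   # x-bar', estimates X-bar
xbar = sum(x_sub) / len(x_sub)
ybar = sum(y_sub) / len(y_sub)
Y_Rd = ybar / xbar * xbar_dash            # ratio estimator under double sampling
```

Here `ybar / xbar` estimates the ratio $R = \bar{Y}/\bar{X}$, and multiplying by the cheaper, more precise `xbar_dash` replaces the unknown $\bar{X}$.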

Let

$$\varepsilon_0 = \frac{\bar{y}-\bar{Y}}{\bar{Y}}, \qquad \varepsilon_1 = \frac{\bar{x}-\bar{X}}{\bar{X}}, \qquad \varepsilon_2 = \frac{\bar{x}'-\bar{X}}{\bar{X}}, \qquad E(\varepsilon_0) = E(\varepsilon_1) = E(\varepsilon_2) = 0 .$$

$$E(\varepsilon_1^2) = \left(\frac{1}{n}-\frac{1}{N}\right)C_x^2$$

$$
E(\varepsilon_1\varepsilon_2) = \frac{1}{\bar{X}^2}E\big[(\bar{x}-\bar{X})(\bar{x}'-\bar{X})\big]
= \frac{1}{\bar{X}^2}E_1\big[E_2\{(\bar{x}-\bar{X})(\bar{x}'-\bar{X})\mid n'\}\big]
= \frac{1}{\bar{X}^2}E_1\big[(\bar{x}'-\bar{X})^2\big]
$$
$$
= \left(\frac{1}{n'}-\frac{1}{N}\right)\frac{S_x^2}{\bar{X}^2}
= \left(\frac{1}{n'}-\frac{1}{N}\right)C_x^2
= E(\varepsilon_2^2).
$$

$$
E(\varepsilon_0\varepsilon_2) = \frac{1}{\bar{X}\bar{Y}}Cov(\bar{y},\bar{x}')
= \frac{1}{\bar{X}\bar{Y}}Cov\big[E(\bar{y}\mid n'),\,E(\bar{x}'\mid n')\big] + \frac{1}{\bar{X}\bar{Y}}E\big[Cov\{(\bar{y},\bar{x}')\mid n'\}\big]
= \frac{1}{\bar{X}\bar{Y}}Cov(\bar{y}',\bar{x}')
$$
$$
= \left(\frac{1}{n'}-\frac{1}{N}\right)\frac{S_{xy}}{\bar{X}\bar{Y}}
= \left(\frac{1}{n'}-\frac{1}{N}\right)\frac{\rho S_xS_y}{\bar{X}\bar{Y}}\bar{X}\bar{Y}\frac{1}{\bar{X}\bar{Y}}
= \left(\frac{1}{n'}-\frac{1}{N}\right)\rho\,C_xC_y
$$

where $\bar{y}'$ is the sample mean of the $y$'s based on the sample of size $n'$.

$$
E(\varepsilon_0\varepsilon_1) = \frac{1}{\bar{X}\bar{Y}}Cov(\bar{y},\bar{x})
= \left(\frac{1}{n}-\frac{1}{N}\right)\frac{S_{xy}}{\bar{X}\bar{Y}}
= \left(\frac{1}{n}-\frac{1}{N}\right)\rho\,C_xC_y .
$$

$$
E(\varepsilon_0^2) = \frac{1}{\bar{Y}^2}Var(\bar{y})
= \frac{1}{\bar{Y}^2}\Big[V_1\big\{E_2(\bar{y}\mid n')\big\} + E_1\big\{V_2(\bar{y}\mid n')\big\}\Big]
= \frac{1}{\bar{Y}^2}\left[V_1(\bar{y}') + E_1\left\{\left(\frac{1}{n}-\frac{1}{n'}\right)s_y'^2\right\}\right]
$$
$$
= \frac{1}{\bar{Y}^2}\left[\left(\frac{1}{n'}-\frac{1}{N}\right)S_y^2 + \left(\frac{1}{n}-\frac{1}{n'}\right)S_y^2\right]
= \left(\frac{1}{n}-\frac{1}{N}\right)\frac{S_y^2}{\bar{Y}^2}
= \left(\frac{1}{n}-\frac{1}{N}\right)C_y^2
$$

where $s_y'^2$ is the mean sum of squares of $y$ based on the initial sample of size $n'$.

Also,

$$
E(\varepsilon_1\varepsilon_2) = \frac{1}{\bar{X}^2}Cov(\bar{x},\bar{x}')
= \frac{1}{\bar{X}^2}\Big[Cov\big\{E_2(\bar{x}\mid n'),\,\bar{x}'\big\} + \underbrace{E_1\big\{Cov_2(\bar{x},\bar{x}'\mid n')\big\}}_{=\,0}\Big]
= \frac{1}{\bar{X}^2}Var(\bar{x}')
$$

where $Var(\bar{x}')$ is the variance of the mean of $x$ based on the initial sample of size $n'$; this agrees with the value of $E(\varepsilon_1\varepsilon_2)$ obtained above.

Estimation error of $\hat{\bar{Y}}_{Rd}$

Write $\hat{\bar{Y}}_{Rd}$ as

$$
\hat{\bar{Y}}_{Rd} = \frac{\bar{Y}(1+\varepsilon_0)(1+\varepsilon_2)\bar{X}}{(1+\varepsilon_1)\bar{X}}
= \bar{Y}(1+\varepsilon_0)(1+\varepsilon_2)(1+\varepsilon_1)^{-1}
= \bar{Y}(1+\varepsilon_0)(1+\varepsilon_2)(1-\varepsilon_1+\varepsilon_1^2-\cdots)
$$
$$
= \bar{Y}\big(1+\varepsilon_0+\varepsilon_2+\varepsilon_0\varepsilon_2-\varepsilon_1-\varepsilon_0\varepsilon_1-\varepsilon_1\varepsilon_2+\varepsilon_1^2\big)
$$

up to terms of order two; terms of degree greater than two are assumed to be negligible.

Bias of $\hat{\bar{Y}}_{Rd}$:

$$
E(\hat{\bar{Y}}_{Rd}) = \bar{Y}\big[1+0+0+E(\varepsilon_0\varepsilon_2)-0-E(\varepsilon_0\varepsilon_1)-E(\varepsilon_1\varepsilon_2)+E(\varepsilon_1^2)\big]
$$
$$
Bias(\hat{\bar{Y}}_{Rd}) = E(\hat{\bar{Y}}_{Rd}) - \bar{Y}
= \bar{Y}\big[E(\varepsilon_0\varepsilon_2)-E(\varepsilon_0\varepsilon_1)-E(\varepsilon_1\varepsilon_2)+E(\varepsilon_1^2)\big]
$$
$$
= \bar{Y}\left[\left(\frac{1}{n'}-\frac{1}{N}\right)\rho C_xC_y - \left(\frac{1}{n}-\frac{1}{N}\right)\rho C_xC_y - \left(\frac{1}{n'}-\frac{1}{N}\right)C_x^2 + \left(\frac{1}{n}-\frac{1}{N}\right)C_x^2\right]
$$
$$
= \bar{Y}\left(\frac{1}{n}-\frac{1}{n'}\right)\big(C_x^2 - \rho C_xC_y\big)
= \bar{Y}\left(\frac{1}{n}-\frac{1}{n'}\right)C_x\big(C_x - \rho C_y\big).
$$

The bias is negligible if $n$ is large, and the relative bias vanishes if $C_x^2 = \rho C_xC_y$, i.e., if the regression line of $y$ on $x$ passes through the origin.
MSE of $\hat{\bar{Y}}_{Rd}$:

$$
MSE(\hat{\bar{Y}}_{Rd}) = E(\hat{\bar{Y}}_{Rd}-\bar{Y})^2
= \bar{Y}^2E(\varepsilon_0+\varepsilon_2-\varepsilon_1)^2 \qquad (\text{retaining the terms up to order two})
$$
$$
= \bar{Y}^2E\big(\varepsilon_0^2+\varepsilon_1^2+\varepsilon_2^2+2\varepsilon_0\varepsilon_2-2\varepsilon_0\varepsilon_1-2\varepsilon_1\varepsilon_2\big)
$$
$$
= \bar{Y}^2\left[\left(\frac{1}{n}-\frac{1}{N}\right)C_y^2 + \left(\frac{1}{n}-\frac{1}{N}\right)C_x^2 + \left(\frac{1}{n'}-\frac{1}{N}\right)C_x^2 + 2\left(\frac{1}{n'}-\frac{1}{N}\right)\rho C_xC_y - 2\left(\frac{1}{n}-\frac{1}{N}\right)\rho C_xC_y - 2\left(\frac{1}{n'}-\frac{1}{N}\right)C_x^2\right]
$$
$$
= \bar{Y}^2\left(\frac{1}{n}-\frac{1}{N}\right)\big(C_x^2+C_y^2-2\rho C_xC_y\big) + \bar{Y}^2\left(\frac{1}{n'}-\frac{1}{N}\right)C_x\big(2\rho C_y-C_x\big)
$$
$$
= MSE(\text{ratio estimator}) + \bar{Y}^2\left(\frac{1}{n'}-\frac{1}{N}\right)\big(2\rho C_xC_y - C_x^2\big).
$$

The second term is the contribution of the second phase of sampling. This method is preferred over estimating $\bar{Y}$ by the sample mean alone (no auxiliary information) if

$$2\rho C_xC_y - C_x^2 > 0 \qquad \text{or} \qquad \frac{1}{2}\,\frac{C_x}{C_y} < \rho .$$
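Collecting the $1/n$ and $1/n'$ terms, the same MSE can be grouped (algebraically equivalently) as $\bar{Y}^2\big[(\tfrac{1}{n}-\tfrac{1}{N})C_y^2 + (\tfrac{1}{n}-\tfrac{1}{n'})(C_x^2-2\rho C_xC_y)\big]$, which makes the comparison with the plain sample mean immediate. A sketch with assumed inputs:

```python
def mse_ratio_double(Ybar, Cy, Cx, rho, n, n_dash, N):
    """Second-order MSE of the double-sampling ratio estimator, grouped as
    Ybar^2 [ (1/n - 1/N) Cy^2 + (1/n - 1/n') (Cx^2 - 2 rho Cx Cy) ]."""
    return Ybar ** 2 * ((1 / n - 1 / N) * Cy ** 2
                        + (1 / n - 1 / n_dash) * (Cx ** 2 - 2 * rho * Cx * Cy))

def var_srs_mean(Ybar, Cy, n, N):
    """Variance of the plain sample mean: Ybar^2 (1/n - 1/N) Cy^2."""
    return Ybar ** 2 * (1 / n - 1 / N) * Cy ** 2

# rho = 0.8 exceeds Cx/(2 Cy) = 0.5, so double sampling should win here
mse = mse_ratio_double(Ybar=50, Cy=0.5, Cx=0.5, rho=0.8, n=30, n_dash=120, N=10_000)
var = var_srs_mean(Ybar=50, Cy=0.5, n=30, N=10_000)
```

Since $n < n'$, the second term is negative exactly when $2\rho C_xC_y > C_x^2$, i.e., when $\rho > C_x/(2C_y)$.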

Choice of n and n '


Write

$$MSE(\hat{\bar{Y}}_{Rd}) = \frac{V}{n} + \frac{V'}{n'}$$

where $V$ and $V'$ collect all the terms attached to $n$ and $n'$, respectively.

The cost function is $C_0 = nC + n'C'$, where $C$ and $C'$ are the costs per unit for selecting the samples of sizes $n$ and $n'$, respectively.

Now we find the optimum sample sizes $n$ and $n'$ for fixed cost $C_0$. The Lagrangian function is

$$\phi = \frac{V}{n} + \frac{V'}{n'} + \lambda(nC + n'C' - C_0)$$

$$\frac{\partial\phi}{\partial n} = 0 \Rightarrow \lambda C = \frac{V}{n^2}, \qquad \frac{\partial\phi}{\partial n'} = 0 \Rightarrow \lambda C' = \frac{V'}{n'^2}.$$

Thus $\lambda Cn^2 = V$, or

$$n = \sqrt{\frac{V}{\lambda C}} \qquad \text{or} \qquad \sqrt{\lambda}\,nC = \sqrt{VC}.$$

Similarly, $\sqrt{\lambda}\,n'C' = \sqrt{V'C'}$. Thus

$$\sqrt{\lambda} = \frac{\sqrt{VC}+\sqrt{V'C'}}{C_0}$$

and so

$$\text{Optimum } n = \frac{C_0}{\sqrt{VC}+\sqrt{V'C'}}\sqrt{\frac{V}{C}} = n_{opt},\ \text{say}$$
$$\text{Optimum } n' = \frac{C_0}{\sqrt{VC}+\sqrt{V'C'}}\sqrt{\frac{V'}{C'}} = n'_{opt},\ \text{say}$$
$$Var_{opt}(\hat{\bar{Y}}_{Rd}) = \frac{V}{n_{opt}} + \frac{V'}{n'_{opt}} = \frac{\big(\sqrt{VC}+\sqrt{V'C'}\big)^2}{C_0}.$$

Comparison with SRS


If $x$ is ignored and all resources are used to estimate $\bar{Y}$ by $\bar{y}$, then the required sample size is $C_0/C$ and

$$Var(\bar{y}) = \frac{S_y^2}{C_0/C} = \frac{CS_y^2}{C_0}.$$

$$\text{Relative efficiency} = \frac{Var(\bar{y})}{Var_{opt}(\hat{\bar{Y}}_{Rd})} = \frac{CS_y^2}{\big(\sqrt{VC}+\sqrt{V'C'}\big)^2}.$$
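The optimal allocation above is easy to compute once $V$, $V'$, $C$, $C'$ and the budget are fixed. The sketch below uses assumed inputs; `optimal_sizes` is a hypothetical helper, not from the notes:

```python
import math

def optimal_sizes(V, V_dash, C, C_dash, C0):
    """n and n' minimising V/n + V'/n' subject to n*C + n'*C' = C0:
    n = k*sqrt(V/C), n' = k*sqrt(V'/C') with k = C0/(sqrt(VC)+sqrt(V'C'))."""
    k = C0 / (math.sqrt(V * C) + math.sqrt(V_dash * C_dash))
    return k * math.sqrt(V / C), k * math.sqrt(V_dash / C_dash)

n_opt, n1_opt = optimal_sizes(V=4.0, V_dash=1.0, C=4.0, C_dash=1.0, C0=100.0)
```

In practice the returned sizes are rounded to integers, which perturbs the budget constraint slightly; the continuous solution shown here satisfies it exactly.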

Double sampling in regression method of estimation


When the population mean of the auxiliary variable, $\bar{X}$, is not known, then double sampling is used as follows:
- A large sample of size $n'$ is taken from the population by SRSWOR, from which the population mean $\bar{X}$ is estimated as $\bar{x}'$, i.e., $\hat{\bar{X}} = \bar{x}'$.
- Then a subsample of size $n$ is chosen from the larger sample, and both the variables $x$ and $y$ are measured on it, taking $\bar{x}'$ in place of $\bar{X}$ and treating it as if it were known.

Then $E(\bar{x}') = \bar{X}$, $E(\bar{x}) = \bar{X}$, $E(\bar{y}) = \bar{Y}$. The regression estimate of $\bar{Y}$ in this case is given by

$$\hat{\bar{Y}}_{regd} = \bar{y} + \hat{\beta}(\bar{x}'-\bar{x})$$

where

$$\hat{\beta} = \frac{s_{xy}}{s_x^2} = \frac{\displaystyle\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\displaystyle\sum_{i=1}^{n}(x_i-\bar{x})^2}$$

is an estimator of $\beta = \dfrac{S_{xy}}{S_x^2}$ based on the sample of size $n$.
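The estimator is a plain regression adjustment of the subsample mean toward the first-phase mean. A sketch with illustrative numbers (not from the notes):

```python
# Double-sampling regression estimate from illustrative two-phase data.
x_first = [8.0, 11.0, 9.0, 12.0, 11.0]   # first-phase x (n' = 5)
x_sub   = [8.0, 12.0, 10.0]              # subsample x (n = 3)
y_sub   = [17.0, 25.0, 21.0]             # subsample y

n = len(x_sub)
xbar_dash = sum(x_first) / len(x_first)            # estimates X-bar
xbar, ybar = sum(x_sub) / n, sum(y_sub) / n
sxy = sum((xi - xbar) * (yi - ybar)
          for xi, yi in zip(x_sub, y_sub)) / (n - 1)
sx2 = sum((xi - xbar) ** 2 for xi in x_sub) / (n - 1)
beta_hat = sxy / sx2                               # slope from the subsample
Y_regd = ybar + beta_hat * (xbar_dash - xbar)      # regression estimate of Y-bar
```

The adjustment `beta_hat * (xbar_dash - xbar)` corrects $\bar{y}$ for the discrepancy between the subsample mean of $x$ and the better first-phase estimate of $\bar{X}$.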
It is difficult to find the exact properties, like bias and mean squared error, of $\hat{\bar{Y}}_{regd}$, so we derive approximate expressions.

Let

$$
\varepsilon_1 = \frac{\bar{x}-\bar{X}}{\bar{X}} \Rightarrow \bar{x} = (1+\varepsilon_1)\bar{X}, \qquad
\varepsilon_2 = \frac{\bar{x}'-\bar{X}}{\bar{X}} \Rightarrow \bar{x}' = (1+\varepsilon_2)\bar{X},
$$
$$
\varepsilon_3 = \frac{s_{xy}-S_{xy}}{S_{xy}} \Rightarrow s_{xy} = (1+\varepsilon_3)S_{xy}, \qquad
\varepsilon_4 = \frac{s_x^2-S_x^2}{S_x^2} \Rightarrow s_x^2 = (1+\varepsilon_4)S_x^2,
$$
$$
E(\varepsilon_1) = E(\varepsilon_2) = E(\varepsilon_3) = E(\varepsilon_4) = 0 .
$$

Define

$$\mu_{21} = E\big[(\bar{x}-\bar{X})^2(\bar{y}-\bar{Y})\big], \qquad \mu_{30} = E\big[(\bar{x}-\bar{X})^3\big].$$

Estimation error:
Then

$$
\hat{\bar{Y}}_{regd} = \bar{y} + \hat{\beta}(\bar{x}'-\bar{x})
= \bar{y} + \frac{S_{xy}(1+\varepsilon_3)}{S_x^2(1+\varepsilon_4)}(\varepsilon_2-\varepsilon_1)\bar{X}
= \bar{y} + \bar{X}\beta(1+\varepsilon_3)(\varepsilon_2-\varepsilon_1)(1+\varepsilon_4)^{-1}
$$
$$
= \bar{y} + \bar{X}\beta(1+\varepsilon_3)(\varepsilon_2-\varepsilon_1)(1-\varepsilon_4+\varepsilon_4^2-\cdots).
$$

Retaining powers of the $\varepsilon$'s up to order two, assuming $|\varepsilon_4| < 1$ (using the same concept as detailed in the case of the ratio method of estimation),

$$\hat{\bar{Y}}_{regd} = \bar{y} + \bar{X}\beta\big(\varepsilon_2+\varepsilon_2\varepsilon_3-\varepsilon_2\varepsilon_4-\varepsilon_1-\varepsilon_1\varepsilon_3+\varepsilon_1\varepsilon_4\big).$$
Bias:
The bias of $\hat{\bar{Y}}_{regd}$ up to the second order of approximation is

$$
E(\hat{\bar{Y}}_{regd}) = \bar{Y} + \bar{X}\beta\big[E(\varepsilon_2\varepsilon_3) - E(\varepsilon_2\varepsilon_4) - E(\varepsilon_1\varepsilon_3) + E(\varepsilon_1\varepsilon_4)\big]
$$
$$
Bias(\hat{\bar{Y}}_{regd}) = E(\hat{\bar{Y}}_{regd}) - \bar{Y}
$$
$$
= \bar{X}\beta\left[\left(\frac{1}{n'}-\frac{1}{N}\right)\frac{\mu_{21}}{\bar{X}S_{xy}} - \left(\frac{1}{n'}-\frac{1}{N}\right)\frac{\mu_{30}}{\bar{X}S_x^2} - \left(\frac{1}{n}-\frac{1}{N}\right)\frac{\mu_{21}}{\bar{X}S_{xy}} + \left(\frac{1}{n}-\frac{1}{N}\right)\frac{\mu_{30}}{\bar{X}S_x^2}\right]
$$
$$
= \beta\left(\frac{1}{n}-\frac{1}{n'}\right)\left(\frac{\mu_{30}}{S_x^2} - \frac{\mu_{21}}{S_{xy}}\right).
$$

Mean squared error:

$$
MSE(\hat{\bar{Y}}_{regd}) = E(\hat{\bar{Y}}_{regd}-\bar{Y})^2
= E\big[\bar{y}+\hat{\beta}(\bar{x}'-\bar{x})-\bar{Y}\big]^2
= E\big[(\bar{y}-\bar{Y}) + \bar{X}\beta(1+\varepsilon_3)(\varepsilon_2-\varepsilon_1)(1-\varepsilon_4+\varepsilon_4^2-\cdots)\big]^2 .
$$

Retaining powers of the $\varepsilon$'s up to order two, the mean squared error up to the second order of approximation is

$$
MSE(\hat{\bar{Y}}_{regd}) = E\big[(\bar{y}-\bar{Y}) + \bar{X}\beta(\varepsilon_2-\varepsilon_1)\big]^2
= E(\bar{y}-\bar{Y})^2 + \bar{X}^2\beta^2E\big(\varepsilon_1^2+\varepsilon_2^2-2\varepsilon_1\varepsilon_2\big) + 2\bar{X}\beta E\big[(\bar{y}-\bar{Y})(\varepsilon_2-\varepsilon_1)\big]
$$
$$
= Var(\bar{y}) + \beta^2\left[\left(\frac{1}{n}-\frac{1}{N}\right)S_x^2 + \left(\frac{1}{n'}-\frac{1}{N}\right)S_x^2 - 2\left(\frac{1}{n'}-\frac{1}{N}\right)S_x^2\right]
+ 2\beta\left[\left(\frac{1}{n'}-\frac{1}{N}\right)S_{xy} - \left(\frac{1}{n}-\frac{1}{N}\right)S_{xy}\right]
$$
$$
= Var(\bar{y}) + \beta^2\left(\frac{1}{n}-\frac{1}{n'}\right)S_x^2 - 2\beta\left(\frac{1}{n}-\frac{1}{n'}\right)S_{xy}
= Var(\bar{y}) - \left(\frac{1}{n}-\frac{1}{n'}\right)\big(2\beta S_{xy}-\beta^2S_x^2\big)
$$
$$
= \left(\frac{1}{n}-\frac{1}{N}\right)S_y^2 - \left(\frac{1}{n}-\frac{1}{n'}\right)\frac{S_{xy}^2}{S_x^2}
\qquad \left(\text{using } \beta = \frac{S_{xy}}{S_x^2}\right)
$$
$$
= \left(\frac{1}{n}-\frac{1}{N}\right)S_y^2 - \left(\frac{1}{n}-\frac{1}{n'}\right)\rho^2S_y^2
\qquad (\text{using } S_{xy} = \rho S_xS_y)
$$
$$
= \frac{(1-\rho^2)S_y^2}{n} + \frac{\rho^2S_y^2}{n'} \qquad (\text{ignoring the finite population correction}).
$$
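The final large-population form is convenient for planning. A minimal sketch (assumed inputs) showing that any positive correlation shifts part of the variance onto the cheaper first phase:

```python
def mse_reg_double(Sy2, rho, n, n_dash):
    """Large-population MSE of the double-sampling regression estimator:
    (1 - rho^2) Sy^2 / n + rho^2 Sy^2 / n'."""
    return (1 - rho ** 2) * Sy2 / n + rho ** 2 * Sy2 / n_dash

no_aux  = mse_reg_double(Sy2=4.0, rho=0.0, n=20, n_dash=100)  # reduces to Sy^2/n
with_x  = mse_reg_double(Sy2=4.0, rho=0.9, n=20, n_dash=100)
```

With $\rho = 0$ the formula collapses to $S_y^2/n$ (no gain from $x$); with $n < n'$, any $\rho \neq 0$ strictly reduces the MSE.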

Clearly, $\hat{\bar{Y}}_{regd}$ is more efficient than the sample mean under SRS, i.e., when no auxiliary variable is used.

Now we address the issue of whether the reduction in variability is worth the extra expenditure required to observe the auxiliary variable.

Let the total cost of the survey be

$$C_0 = C_1n + C_2n'$$

where $C_1$ and $C_2$ are the costs per unit of observing the study variable $y$ and the auxiliary variable $x$, respectively.

Now minimize $MSE(\hat{\bar{Y}}_{regd})$ for fixed cost $C_0$ using the Lagrangian function with Lagrangian multiplier $\lambda$:

$$\phi = \frac{(1-\rho^2)S_y^2}{n} + \frac{\rho^2S_y^2}{n'} + \lambda(C_1n + C_2n' - C_0)$$

$$\frac{\partial\phi}{\partial n} = 0 \Rightarrow \frac{1}{n^2}(1-\rho^2)S_y^2 = \lambda C_1, \qquad
\frac{\partial\phi}{\partial n'} = 0 \Rightarrow \frac{1}{n'^2}\rho^2S_y^2 = \lambda C_2 .$$

Thus

$$\sqrt{\lambda}\,n = S_y\sqrt{\frac{1-\rho^2}{C_1}} \qquad \text{and} \qquad \sqrt{\lambda}\,n' = \frac{\rho S_y}{\sqrt{C_2}} .$$
Substituting these values in the cost function, we have

$$
C_0 = C_1n + C_2n' = \frac{1}{\sqrt{\lambda}}\left[S_y\sqrt{C_1(1-\rho^2)} + \rho S_y\sqrt{C_2}\right]
$$

or

$$\sqrt{\lambda} = \frac{1}{C_0}\left[S_y\sqrt{C_1(1-\rho^2)} + \rho S_y\sqrt{C_2}\right].$$

Thus the optimum values of $n$ and $n'$ are

$$
n_{opt} = \frac{C_0S_y\sqrt{1-\rho^2}}{\sqrt{C_1}\left[S_y\sqrt{C_1(1-\rho^2)} + \rho S_y\sqrt{C_2}\right]}, \qquad
n'_{opt} = \frac{\rho S_yC_0}{\sqrt{C_2}\left[S_y\sqrt{C_1(1-\rho^2)} + \rho S_y\sqrt{C_2}\right]} .
$$

The optimum mean squared error of $\hat{\bar{Y}}_{regd}$ is obtained by substituting $n = n_{opt}$ and $n' = n'_{opt}$ as

$$
MSE(\hat{\bar{Y}}_{regd})_{opt}
= \frac{(1-\rho^2)S_y^2}{n_{opt}} + \frac{\rho^2S_y^2}{n'_{opt}}
= \frac{S_y^2}{C_0}\left[\sqrt{C_1(1-\rho^2)} + \rho\sqrt{C_2}\right]^2 .
$$
The optimum variance of $\bar{y}$ under SRS, where no auxiliary information is used, is

$$Var(\bar{y}_{SRS})_{opt} = \frac{C_1S_y^2}{C_0}$$

which is obtained by substituting $\rho = 0$, $C_2 = 0$ in $MSE(\hat{\bar{Y}}_{regd})_{opt}$. The relative efficiency is

$$
RE = \frac{Var(\bar{y}_{SRS})_{opt}}{MSE(\hat{\bar{Y}}_{regd})_{opt}}
= \frac{C_1S_y^2}{S_y^2\left[\sqrt{C_1(1-\rho^2)} + \rho\sqrt{C_2}\right]^2}
= \frac{1}{\left[\sqrt{1-\rho^2} + \rho\sqrt{\dfrac{C_2}{C_1}}\right]^2} .
$$

Thus double sampling with the regression estimator leads to a gain in precision ($RE > 1$) if

$$\frac{C_1}{C_2} > \frac{\rho^2}{\left(1-\sqrt{1-\rho^2}\right)^2} .$$

Double sampling for probability proportional to size estimation:


Suppose it is desired to select the sample with probability proportional to auxiliary variable x but
information on x is not available. Then, in this situation, the double sampling can be used. An initial
sample of size n ' is selected with SRSWOR from a population of size N , and information on x is
collected for this sample. Then a second sample of size n is selected with replacement and with
probability proportional to x from the initial sample of size n ' . Let x ' denote the mean of x for the
initial sample of size $n'$, and let $\bar{x}$ and $\bar{y}$ denote the means of $x$ and $y$, respectively, for the second sample of size $n$. Then we have the following theorem.

Theorem:
(1) An unbiased estimator of the population mean $\bar{Y}$ is given by

$$\hat{\bar{Y}} = \frac{x'_{tot}}{n'n}\sum_{i=1}^{n}\frac{y_i}{x_i}$$

where $x'_{tot}$ denotes the total of $x$ in the first sample.

(2) Its variance is

$$
Var(\hat{\bar{Y}}) = \left(\frac{1}{n'}-\frac{1}{N}\right)S_y^2
+ \frac{n'-1}{nn'N(N-1)}\sum_{i=1}^{N}\frac{x_i}{X_{tot}}\left(\frac{X_{tot}\,y_i}{x_i}-Y_{tot}\right)^2
$$

where $X_{tot}$ and $Y_{tot}$ denote the totals of $x$ and $y$, respectively, in the population.

(3) An unbiased estimator of the variance of $\hat{\bar{Y}}$ is given by

$$
\widehat{Var}(\hat{\bar{Y}}) = \left(\frac{1}{n'}-\frac{1}{N}\right)\frac{1}{n(n'-1)}\left[x'_{tot}\sum_{i=1}^{n}\frac{y_i^2}{x_i} - \frac{x_{tot}'^2(A-B)}{n'(n-1)}\right]
+ \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\frac{x'_{tot}\,y_i}{n'x_i}-\hat{\bar{Y}}\right)^2
$$

where

$$A = \left(\sum_{i=1}^{n}\frac{y_i}{x_i}\right)^2 \qquad \text{and} \qquad B = \sum_{i=1}^{n}\frac{y_i^2}{x_i^2}.$$

Proof. Before deriving the results, we first recall the following result, proved for the varying probability scheme.

Result: In sampling with a varying probability scheme, for drawing a sample of size $n$ from a population of size $N$ with replacement:

(i) $\bar{z} = \frac{1}{n}\sum_{i=1}^{n}z_i$ is an unbiased estimator of the population mean $\bar{Y}$, where $z_i = \frac{y_i}{Np_i}$ and $p_i$ is the probability of selection of the $i$th unit. Note that $y_i$ and $p_i$ can take any one of the $N$ values $Y_1, Y_2, \ldots, Y_N$ with initial probabilities $P_1, P_2, \ldots, P_N$, respectively.

(ii) $Var(\bar{z}) = \dfrac{1}{nN^2}\left[\displaystyle\sum_{i=1}^{N}\frac{Y_i^2}{P_i} - N^2\bar{Y}^2\right] = \dfrac{1}{nN^2}\displaystyle\sum_{i=1}^{N}P_i\left(\frac{Y_i}{P_i}-N\bar{Y}\right)^2 .$

(iii) An unbiased estimator of the variance of $\bar{z}$ is

$$\widehat{Var}(\bar{z}) = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\frac{y_i}{Np_i}-\bar{z}\right)^2 .$$

Let $E_2$ denote the conditional expectation of $\hat{\bar{Y}}$ when the first sample is fixed. The second sample is selected with probability proportional to $x$; hence, using result (i) with $p_i = \dfrac{x_i}{x'_{tot}}$, we find that

$$
E_2(\hat{\bar{Y}}) = E_2\left[\frac{x'_{tot}}{n'n}\sum_{i=1}^{n}\frac{y_i}{x_i}\right] = \bar{y}'
$$

where $\bar{y}'$ is the mean of $y$ for the first sample. Hence

$$E(\hat{\bar{Y}}) = E_1\big[E_2(\hat{\bar{Y}}\mid n')\big] = E_1(\bar{y}') = \bar{Y},$$
which proves part (1) of the theorem. Further,

$$
Var(\hat{\bar{Y}}) = V_1\big[E_2(\hat{\bar{Y}}\mid n')\big] + E_1\big[V_2(\hat{\bar{Y}}\mid n')\big]
= V_1(\bar{y}') + E_1\big[V_2(\hat{\bar{Y}}\mid n')\big]
= \left(\frac{1}{n'}-\frac{1}{N}\right)S_y^2 + E_1\big[V_2(\hat{\bar{Y}}\mid n')\big].
$$

Now, using result (ii), we get

$$
V_2(\hat{\bar{Y}}\mid n') = \frac{1}{nn'^2}\sum_{i=1}^{n'}\frac{x_i}{x'_{tot}}\left(\frac{x'_{tot}\,y_i}{x_i}-y'_{tot}\right)^2
= \frac{1}{nn'^2}\sum_{i=1}^{n'}\sum_{j>i}^{n'}x_ix_j\left(\frac{y_i}{x_i}-\frac{y_j}{x_j}\right)^2,
$$

and hence

$$
E_1\big[V_2(\hat{\bar{Y}}\mid n')\big] = \frac{n'-1}{nn'N(N-1)}\sum_{i=1}^{N}\sum_{j>i}^{N}x_ix_j\left(\frac{y_i}{x_i}-\frac{y_j}{x_j}\right)^2,
$$

using the fact that the probability of a specified pair of units being selected in the first sample is $\dfrac{n'(n'-1)}{N(N-1)}$. So we can express

$$
E_1\big[V_2(\hat{\bar{Y}}\mid n')\big] = \frac{n'-1}{nn'N(N-1)}\sum_{i=1}^{N}\frac{x_i}{X_{tot}}\left(\frac{X_{tot}\,y_i}{x_i}-Y_{tot}\right)^2 .
$$

Substituting this in $Var(\hat{\bar{Y}})$, we get

$$
Var(\hat{\bar{Y}}) = \left(\frac{1}{n'}-\frac{1}{N}\right)S_y^2
+ \frac{n'-1}{nn'N(N-1)}\sum_{i=1}^{N}\frac{x_i}{X_{tot}}\left(\frac{X_{tot}\,y_i}{x_i}-Y_{tot}\right)^2 .
$$

This proves part (2) of the theorem.

We now consider the estimation of $Var(\hat{\bar{Y}})$. Given the first sample, we obtain

$$E_2\left[\frac{1}{n}\sum_{i=1}^{n}\frac{y_i^2}{p_i}\right] = \sum_{i=1}^{n'}y_i^2, \qquad \text{where } p_i = \frac{x_i}{x'_{tot}}.$$

Also, given the first sample,

$$E_2\left[\frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\frac{y_i}{n'p_i}-\hat{\bar{Y}}\right)^2\right] = V_2(\hat{\bar{Y}}) = E_2(\hat{\bar{Y}}^2) - \bar{y}'^2 .$$

Hence

$$E_2\left[\hat{\bar{Y}}^2 - \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\frac{y_i}{n'p_i}-\hat{\bar{Y}}\right)^2\right] = \bar{y}'^2 .$$

Substituting $\hat{\bar{Y}} = \dfrac{x'_{tot}}{n'n}\displaystyle\sum_{i=1}^{n}\frac{y_i}{x_i}$ and $p_i = \dfrac{x_i}{x'_{tot}}$, the expression becomes

$$E_2\left[\frac{x_{tot}'^2}{n'^2n(n-1)}\left\{\left(\sum_{i=1}^{n}\frac{y_i}{x_i}\right)^2 - \sum_{i=1}^{n}\frac{y_i^2}{x_i^2}\right\}\right] = \bar{y}'^2 .$$

Using

$$E_2\left[\frac{1}{n}\sum_{i=1}^{n}\frac{y_i^2}{p_i}\right] = \sum_{i=1}^{n'}y_i^2,$$

we get
$$
E_2\left[\frac{1}{n(n'-1)}\left\{x'_{tot}\sum_{i=1}^{n}\frac{y_i^2}{x_i} - \frac{x_{tot}'^2(A-B)}{n'(n-1)}\right\}\right]
= \frac{1}{n'-1}\left[\sum_{i=1}^{n'}y_i^2 - n'\bar{y}'^2\right] = s_y'^2
$$

where $A = \left(\displaystyle\sum_{i=1}^{n}\frac{y_i}{x_i}\right)^2$, $B = \displaystyle\sum_{i=1}^{n}\frac{y_i^2}{x_i^2}$, and $s_y'^2$ is the mean sum of squares of $y$ for the first sample. Thus, we obtain

$$
E_1E_2\left[\frac{1}{n(n'-1)}\left\{x'_{tot}\sum_{i=1}^{n}\frac{y_i^2}{x_i} - \frac{x_{tot}'^2(A-B)}{n'(n-1)}\right\}\right] = E_1(s_y'^2) = S_y^2 \qquad (1)
$$

which gives an unbiased estimator of $S_y^2$. Next, since we have


$$
E_1\big[V_2(\hat{\bar{Y}}\mid n')\big] = \frac{n'-1}{nn'N(N-1)}\sum_{i=1}^{N}\frac{x_i}{X_{tot}}\left(\frac{X_{tot}\,y_i}{x_i}-Y_{tot}\right)^2,
$$

and since

$$
E_2\left[\frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\frac{x'_{tot}\,y_i}{n'x_i}-\hat{\bar{Y}}\right)^2\right] = V_2(\hat{\bar{Y}}\mid n'),
$$

we obtain

$$
E_1E_2\left[\frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\frac{x'_{tot}\,y_i}{n'x_i}-\hat{\bar{Y}}\right)^2\right]
= \frac{n'-1}{nn'N(N-1)}\sum_{i=1}^{N}\frac{x_i}{X_{tot}}\left(\frac{X_{tot}\,y_i}{x_i}-Y_{tot}\right)^2 \qquad (2)
$$

which gives an unbiased estimator of $E_1\big[V_2(\hat{\bar{Y}}\mid n')\big]$.

Using (1) and (2), an unbiased estimator of the variance of $\hat{\bar{Y}}$ is obtained as

$$
\widehat{Var}(\hat{\bar{Y}}) = \left(\frac{1}{n'}-\frac{1}{N}\right)\frac{1}{n(n'-1)}\left[x'_{tot}\sum_{i=1}^{n}\frac{y_i^2}{x_i} - \frac{x_{tot}'^2(A-B)}{n'(n-1)}\right]
+ \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\frac{x'_{tot}\,y_i}{n'x_i}-\hat{\bar{Y}}\right)^2 .
$$

Thus, the theorem is proved.
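The point estimator in part (1) is a one-liner once the two phases are in hand; the sketch below uses made-up values (not from the notes), with $y$ recorded only for the PPS-with-replacement subsample:

```python
# Double-sampling PPS estimate Y-hat = (x'_tot / (n' n)) * sum(y_i / x_i).
x_first = [2.0, 3.0, 5.0, 10.0]     # x for the first-phase SRSWOR sample (n' = 4)
pairs = [(5.0, 11.0), (2.0, 3.9)]   # (x_i, y_i) for the PPS subsample (n = 2)

x_tot = sum(x_first)                # x'_tot, total of x in the first sample
n_dash, n = len(x_first), len(pairs)
Y_hat = x_tot / (n_dash * n) * sum(y / x for x, y in pairs)
```

Each ratio $y_i/x_i$ inverse-weights the PPS draw, and the factor $x'_{tot}/n'$ rescales from the first-phase total to a per-unit mean.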
Chapter 9
Cluster Sampling

It is one of the basic assumptions in any sampling procedure that the population can be divided into a finite
number of distinct and identifiable units, called sampling units. The smallest units into which the
population can be divided are called elements of the population. The groups of such elements are called
clusters.

In many practical situations and many types of populations, a list of elements is not available and so the
use of an element as a sampling unit is not feasible. The method of cluster sampling or area sampling can
be used in such situations.

In cluster sampling
- divide the whole population into clusters according to some well defined rule.
- Treat the clusters as sampling units.
- Choose a sample of clusters according to some procedure.
- Carry out a complete enumeration of the selected clusters, i.e., collect information on all the
sampling units available in selected clusters.

Area sampling
When the entire area containing the population is subdivided into smaller area segments, and each element in the population is associated with one and only one such area segment, the procedure is called area sampling.

Examples:
 In a city, a list of all the individual persons staying in the houses may be difficult to obtain, or may not even be available, but a list of all the houses in the city may be available. So every individual person is treated as a sampling unit and every house as a cluster.
 The list of all the agricultural farms in a village or a district may not be easily available, but the list of villages or districts is generally available. In this case, every farm is a sampling unit and every village or district is a cluster.

Moreover, it is easier, faster, cheaper and convenient to collect information on clusters rather than on
sampling units.

In both the examples, draw a sample of clusters from houses/villages and then collect the observations on
all the sampling units available in the selected clusters.

Conditions under which the cluster sampling is used:


Cluster sampling is preferred when
(i) No reliable listing of elements is available and it is expensive to prepare it.
(ii) Even if the list of elements is available, the location or identification of the units may be
difficult.
(iii) A necessary condition for the validity of this procedure is that every unit of the population
under study must correspond to one and only one unit of the cluster so that the total number of
sampling units in the frame may cover all the units of the population under study without any
omission or duplication. When this condition is not satisfied, bias is introduced.

Open segment and closed segment:


It is not necessary that all the elements associated with an area segment need be located physically within
its boundaries. For example, in the study of farms, the different fields of the same farm need not lie within
the same area segment. Such a segment is called an open segment.

In a closed segment, the sum of the characteristic under study, i.e., area, livestock etc. for all the elements
associated with the segment will account for all the area, livestock etc. within the segment.

Construction of clusters:
The clusters are constructed such that the sampling units are heterogeneous within the clusters and
homogeneous among the clusters. The reason for this will become clear later. This is opposite to the
construction of the strata in the stratified sampling.

There are two options to construct the clusters – equal size and unequal size. We discuss the estimation of
population means and its variance in both the cases.

Case of equal clusters
 Suppose the population is divided into $N$ clusters and each cluster is of size $M$.
 Select a sample of $n$ clusters from the $N$ clusters by the method of SRS, generally WOR.
So
total population size $= NM$
total sample size $= nM$.

Let
$y_{ij}$ : value of the characteristic under study for the $j$th element $(j = 1, 2, \ldots, M)$ in the $i$th cluster $(i = 1, 2, \ldots, N)$, and

$$\bar{y}_i = \frac{1}{M}\sum_{j=1}^{M}y_{ij} = \text{mean per element of the } i\text{th cluster}.$$

Population (NM units): N clusters of M units each.
Sample: n clusters of M units each, selected from the N clusters.
Estimation of population mean:
First select $n$ clusters from the $N$ clusters by SRSWOR. Based on the $n$ selected clusters, find the mean of each cluster separately, based on all the units in that cluster. So we have the cluster means $\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_n$. Consider the mean of all such cluster means as an estimator of the population mean:

$$\bar{y}_{cl} = \frac{1}{n}\sum_{i=1}^{n}\bar{y}_i .$$

Bias:

$$E(\bar{y}_{cl}) = \frac{1}{n}\sum_{i=1}^{n}E(\bar{y}_i) = \frac{1}{n}\sum_{i=1}^{n}\bar{Y} = \bar{Y} \qquad (\text{since SRS is used}).$$

Thus $\bar{y}_{cl}$ is an unbiased estimator of $\bar{Y}$.

Variance:
The variance of $\bar{y}_{cl}$ can be derived along the same lines as the variance of the sample mean in SRSWOR. The only difference is that in SRSWOR the sampling units are $y_1, y_2, \ldots, y_n$, whereas in the case of $\bar{y}_{cl}$ the sampling units are the cluster means $\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_n$.

$$\left(\text{Note that in case of SRSWOR, } Var(\bar{y}) = \frac{N-n}{Nn}S^2 \text{ and } \widehat{Var}(\bar{y}) = \frac{N-n}{Nn}s^2 .\right)$$

$$Var(\bar{y}_{cl}) = E(\bar{y}_{cl}-\bar{Y})^2 = \frac{N-n}{Nn}S_b^2$$

where

$$S_b^2 = \frac{1}{N-1}\sum_{i=1}^{N}(\bar{y}_i-\bar{Y})^2$$

is the mean sum of squares between the cluster means in the population.

Estimate of variance:
Using again the philosophy of the estimate of variance in case of SRSWOR, we can find

$$\widehat{Var}(\bar{y}_{cl}) = \frac{N-n}{Nn}s_b^2$$

where

$$s_b^2 = \frac{1}{n-1}\sum_{i=1}^{n}(\bar{y}_i-\bar{y}_{cl})^2$$

is the mean sum of squares between cluster means in the sample.

Comparison with SRS:
If an equivalent sample of $nM$ units were to be selected from the population of $NM$ units by SRSWOR, the variance of the mean per element would be

$$Var(\bar{y}_{nM}) = \frac{NM-nM}{NM\cdot nM}S^2 = \frac{f}{n}\cdot\frac{S^2}{M}$$

where

$$f = \frac{N-n}{N} \qquad \text{and} \qquad S^2 = \frac{1}{NM-1}\sum_{i=1}^{N}\sum_{j=1}^{M}(y_{ij}-\bar{Y})^2 .$$

Also

$$Var(\bar{y}_{cl}) = \frac{N-n}{Nn}S_b^2 = \frac{f}{n}S_b^2 .$$
Consider

$$
(NM-1)S^2 = \sum_{i=1}^{N}\sum_{j=1}^{M}(y_{ij}-\bar{Y})^2
= \sum_{i=1}^{N}\sum_{j=1}^{M}\big[(y_{ij}-\bar{y}_i)+(\bar{y}_i-\bar{Y})\big]^2
$$
$$
= \sum_{i=1}^{N}\sum_{j=1}^{M}(y_{ij}-\bar{y}_i)^2 + \sum_{i=1}^{N}\sum_{j=1}^{M}(\bar{y}_i-\bar{Y})^2
= N(M-1)S_w^2 + M(N-1)S_b^2
$$

where

$$S_w^2 = \frac{1}{N}\sum_{i=1}^{N}S_i^2$$

is the mean sum of squares within clusters in the population, and

$$S_i^2 = \frac{1}{M-1}\sum_{j=1}^{M}(y_{ij}-\bar{y}_i)^2$$

is the mean sum of squares for the $i$th cluster.

The efficiency of cluster sampling over SRSWOR is

$$
E = \frac{Var(\bar{y}_{nM})}{Var(\bar{y}_{cl})} = \frac{S^2}{MS_b^2}
= \frac{1}{NM-1}\left[\frac{N(M-1)}{M}\cdot\frac{S_w^2}{S_b^2} + (N-1)\right].
$$
Thus the relative efficiency increases when $S_w^2$ is large and $S_b^2$ is small. So cluster sampling will be efficient if the clusters are formed so that the variation between the cluster means is as small as possible while the variation within the clusters is as large as possible.

Efficiency in terms of intra class correlation


The intraclass correlation between the elements within a cluster is given by

$$
\rho = \frac{E\big[(y_{ij}-\bar{Y})(y_{ik}-\bar{Y})\big]}{E\big[(y_{ij}-\bar{Y})^2\big]}, \qquad -\frac{1}{M-1}\le\rho\le 1,
$$
$$
= \frac{\dfrac{1}{MN(M-1)}\displaystyle\sum_{i=1}^{N}\sum_{j=1}^{M}\sum_{k(\neq j)=1}^{M}(y_{ij}-\bar{Y})(y_{ik}-\bar{Y})}{\dfrac{1}{MN}\displaystyle\sum_{i=1}^{N}\sum_{j=1}^{M}(y_{ij}-\bar{Y})^2}
= \frac{\displaystyle\sum_{i=1}^{N}\sum_{j=1}^{M}\sum_{k(\neq j)=1}^{M}(y_{ij}-\bar{Y})(y_{ik}-\bar{Y})}{(MN-1)(M-1)S^2},
$$

using $\dfrac{1}{MN}\displaystyle\sum_{i=1}^{N}\sum_{j=1}^{M}(y_{ij}-\bar{Y})^2 = \dfrac{MN-1}{MN}S^2$.
Consider

$$
\sum_{i=1}^{N}(\bar{y}_i-\bar{Y})^2 = \sum_{i=1}^{N}\left[\frac{1}{M}\sum_{j=1}^{M}(y_{ij}-\bar{Y})\right]^2
= \frac{1}{M^2}\sum_{i=1}^{N}\left[\sum_{j=1}^{M}(y_{ij}-\bar{Y})^2 + \sum_{j=1}^{M}\sum_{k(\neq j)=1}^{M}(y_{ij}-\bar{Y})(y_{ik}-\bar{Y})\right]
$$

so that

$$
M^2\sum_{i=1}^{N}(\bar{y}_i-\bar{Y})^2 = \sum_{i=1}^{N}\sum_{j=1}^{M}(y_{ij}-\bar{Y})^2 + \sum_{i=1}^{N}\sum_{j=1}^{M}\sum_{k(\neq j)=1}^{M}(y_{ij}-\bar{Y})(y_{ik}-\bar{Y})
$$

or

$$M^2(N-1)S_b^2 = (NM-1)S^2 + \rho(MN-1)(M-1)S^2$$

or

$$S_b^2 = \frac{MN-1}{M^2(N-1)}\big[1+\rho(M-1)\big]S^2 .$$
The variance of $\bar{y}_{cl}$ now becomes

$$
Var(\bar{y}_{cl}) = \frac{N-n}{Nn}S_b^2
= \frac{N-n}{Nn}\cdot\frac{MN-1}{M^2(N-1)}\big[1+\rho(M-1)\big]S^2 .
$$

For large $N$, $\dfrac{MN-1}{MN}\approx 1$, $N-1\approx N$, $\dfrac{N-n}{N}\approx 1$, and so

$$Var(\bar{y}_{cl}) \approx \frac{S^2}{nM}\big[1+\rho(M-1)\big].$$

The variance of the sample mean under SRSWOR for large $N$ is

$$Var(\bar{y}_{nM}) \approx \frac{S^2}{nM}.$$

The relative efficiency for large $N$ is now given by

$$
E = \frac{Var(\bar{y}_{nM})}{Var(\bar{y}_{cl})} = \frac{1}{1+\rho(M-1)}, \qquad -\frac{1}{M-1}\le\rho\le 1 .
$$
 If M  1 then E  1, i.e., SRS and cluster sampling are equally efficient. Each cluster will consist
of one unit, i.e., SRS.
 If M  1, then cluster sampling is more efficient when
E 1
or ( M  1)   0
or   0.
 If   0, then E  1 , i.e., there is no error which means that the units in each cluster are arranged
randomly. So sample is heterogeneous.
 In practice,  is usually positive and  decreases as M increases but the rate of decrease in 
is much lower in comparison to the rate of increase in M . The situation that   0 is possible
when the nearby units are grouped together to form cluster and which are completely enumerated.
 There are situations when   0.
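The design-effect style formula $E = 1/[1+(M-1)\rho]$ makes these bullet points concrete; a minimal sketch:

```python
def cluster_efficiency(M, rho):
    """Large-N relative efficiency of cluster sampling vs SRSWOR:
    E = 1 / (1 + (M - 1) * rho). Valid for -1/(M-1) <= rho <= 1."""
    return 1.0 / (1.0 + (M - 1) * rho)

e_single = cluster_efficiency(M=1, rho=0.3)    # single-unit clusters: same as SRS
e_pos    = cluster_efficiency(M=5, rho=0.25)   # positive rho: cluster sampling loses
e_neg    = cluster_efficiency(M=5, rho=-0.1)   # negative rho: cluster sampling gains
```

For example, with $M = 5$ and $\rho = 0.25$, cluster sampling has only half the efficiency of an SRSWOR of the same number of elements.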
Estimation of relative efficiency:
The relative efficiency of cluster sampling relative to an equivalent SRSWOR is

$$E = \frac{S^2}{MS_b^2}.$$

An estimator of $E$ can be obtained by substituting estimates of $S^2$ and $S_b^2$.

Since $\bar{y}_{cl} = \frac{1}{n}\sum_{i=1}^{n}\bar{y}_i$ is the mean of $n$ cluster means $\bar{y}_i$ drawn by SRSWOR from the population of $N$ means $\bar{y}_i,\ i = 1, 2, \ldots, N$, it follows from the theory of SRSWOR that

$$
E(s_b^2) = E\left[\frac{1}{n-1}\sum_{i=1}^{n}(\bar{y}_i-\bar{y}_{cl})^2\right]
= \frac{1}{N-1}\sum_{i=1}^{N}(\bar{y}_i-\bar{Y})^2 = S_b^2 .
$$

Thus $s_b^2$ is an unbiased estimator of $S_b^2$.

Since $s_w^2 = \frac{1}{n}\sum_{i=1}^{n}S_i^2$ is the mean of the $n$ mean sums of squares $S_i^2$ drawn from the population of $N$ mean sums of squares $S_i^2,\ i = 1, 2, \ldots, N$, it follows from the theory of SRSWOR that

$$
E(s_w^2) = E\left[\frac{1}{n}\sum_{i=1}^{n}S_i^2\right] = \frac{1}{n}\sum_{i=1}^{n}E(S_i^2)
= \frac{1}{N}\sum_{i=1}^{N}S_i^2 = S_w^2 .
$$

Thus $s_w^2$ is an unbiased estimator of $S_w^2$.


Consider

$$S^2 = \frac{1}{MN-1}\sum_{i=1}^{N}\sum_{j=1}^{M}(y_{ij}-\bar{Y})^2$$

or

$$
(MN-1)S^2 = \sum_{i=1}^{N}\sum_{j=1}^{M}\big[(y_{ij}-\bar{y}_i)+(\bar{y}_i-\bar{Y})\big]^2
= \sum_{i=1}^{N}(M-1)S_i^2 + M(N-1)S_b^2
= N(M-1)S_w^2 + M(N-1)S_b^2 .
$$

An unbiased estimator of $S^2$ can thus be obtained as

$$\hat{S}^2 = \frac{1}{MN-1}\big[N(M-1)s_w^2 + M(N-1)s_b^2\big].$$

So

$$
\widehat{Var}(\bar{y}_{cl}) = \frac{N-n}{Nn}s_b^2, \qquad
\widehat{Var}(\bar{y}_{nM}) = \frac{N-n}{Nn}\cdot\frac{\hat{S}^2}{M}
$$

where $s_b^2 = \frac{1}{n-1}\sum_{i=1}^{n}(\bar{y}_i-\bar{y}_{cl})^2$. An estimate of the efficiency $E = \dfrac{S^2}{MS_b^2}$ is

$$\hat{E} = \frac{N(M-1)s_w^2 + M(N-1)s_b^2}{M(NM-1)s_b^2}.$$

If $N$ is large, so that $M(N-1)\approx MN$ and $MN-1\approx MN$, then

$$E \approx \frac{1}{M} + \frac{M-1}{M}\cdot\frac{S_w^2}{MS_b^2}$$

and its estimate is

$$\hat{E} \approx \frac{1}{M} + \frac{M-1}{M}\cdot\frac{s_w^2}{Ms_b^2}.$$

Estimation of a proportion in the case of equal clusters

Now we consider the problem of estimating the proportion of units in the population that possess a specified attribute, on the basis of a sample of clusters. Let this proportion be P.

Suppose that a sample of n clusters is drawn from N clusters by SRSWOR. Defining y_ij = 1 if the j-th unit in the i-th cluster belongs to the specified category (i.e., possesses the given attribute) and y_ij = 0 otherwise, we find that

    ȳ_i = P_i,
    Ȳ = (1/N) Σ_{i=1}^{N} P_i = P,
    S_i² = M P_i Q_i / (M-1),
    S_w² = ( M Σ_{i=1}^{N} P_i Q_i ) / ( N(M-1) ),
    S² = NM P Q / (NM-1),
    S_b² = (1/(N-1)) Σ_{i=1}^{N} (P_i - P)²
         = (1/(N-1)) [ Σ_{i=1}^{N} P_i² - N P² ]
         = (1/(N-1)) [ Σ_{i=1}^{N} P_i - Σ_{i=1}^{N} P_i(1-P_i) - N P² ]
         = (1/(N-1)) [ N P Q - Σ_{i=1}^{N} P_i Q_i ],

where P_i is the proportion of elements in the i-th cluster belonging to the specified category, Q_i = 1 - P_i, i = 1, 2, ..., N, and Q = 1 - P. Then, using the result that ȳ_cl is an unbiased estimator of Ȳ, we find that

    P̂_cl = (1/n) Σ_{i=1}^{n} P_i

is an unbiased estimator of P and

    Var(P̂_cl) = ( (N-n)/(Nn(N-1)) ) [ N P Q - Σ_{i=1}^{N} P_i Q_i ].

This variance of P̂_cl can be expressed as

    Var(P̂_cl) = ((N-n)/(N-1)) (PQ/(nM)) [ 1 + (M-1)ρ ],

where the value of ρ can be obtained from

    ρ = [ M(N-1) S_b² - N S_w² ] / [ (MN-1) S² ].

Substituting S_b², S_w² and S² in ρ, we obtain

    ρ = 1 - ( M/(M-1) ) · ( Σ_{i=1}^{N} P_i Q_i ) / ( N P Q ).

The variance of P̂_cl can be estimated unbiasedly by

    Var̂(P̂_cl) = ((N-n)/(Nn)) s_b²
               = ( (N-n)/(Nn(n-1)) ) Σ_{i=1}^{n} (P_i - P̂_cl)²
               = ( (N-n)/(Nn(n-1)) ) [ n P̂_cl Q̂_cl - Σ_{i=1}^{n} P_i Q_i ],

where Q̂_cl = 1 - P̂_cl. The efficiency of cluster sampling relative to SRSWOR is given by

    E = ( M(N-1)/(MN-1) ) · 1/( 1 + (M-1)ρ )
      = ( (N-1)/(NM-1) ) · N P Q / [ N P Q - Σ_{i=1}^{N} P_i Q_i ].

If N is large, then

    E ≈ (1/M) · N P Q / [ N P Q - Σ_{i=1}^{N} P_i Q_i ].
An estimator of the total number of elements belonging to the specified category is obtained by multiplying P̂_cl by NM, i.e., NM P̂_cl. The expressions for its variance and the variance estimator are obtained by multiplying the corresponding expressions for P̂_cl by N²M².
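The estimator P̂_cl and its unbiased variance estimate can be sketched numerically as follows (the helper name is hypothetical; the input is the vector of observed cluster proportions P_i):

```python
import numpy as np

def cluster_proportion_estimate(P_sample, N):
    """Unbiased estimate of a population proportion P from n fully
    enumerated equal-sized clusters drawn by SRSWOR, together with the
    unbiased variance estimate
    (N-n)/(Nn(n-1)) * [ n*Phat*Qhat - sum_i P_i Q_i ].
    """
    P_sample = np.asarray(P_sample, dtype=float)
    n = P_sample.size
    P_hat = P_sample.mean()
    Q_hat = 1.0 - P_hat
    var_hat = (N - n) / (N * n * (n - 1)) * (
        n * P_hat * Q_hat - np.sum(P_sample * (1.0 - P_sample)))
    return P_hat, var_hat
```

Since n P̂ Q̂ - Σ P_i Q_i = Σ (P_i - P̂)², the variance estimate is never negative.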

Case of unequal clusters:

In practice, clusters of equal size are available only when planned. For example, in a screw manufacturing company, the packets of screws can be prepared so that every packet contains the same number of screws. In real applications, it is hard to get clusters of equal size: villages with equal areas are difficult to find, districts with the same number of persons are difficult to find, and the number of members may differ from one household to another in a given area.

Let there be N clusters and let M_i be the size of the i-th cluster. Define

    M_0 = Σ_{i=1}^{N} M_i,
    M̄ = (1/N) Σ_{i=1}^{N} M_i,
    ȳ_i = (1/M_i) Σ_{j=1}^{M_i} y_ij : mean of the i-th cluster,
    Ȳ = (1/M_0) Σ_{i=1}^{N} Σ_{j=1}^{M_i} y_ij = Σ_{i=1}^{N} (M_i/M_0) ȳ_i = (1/N) Σ_{i=1}^{N} (M_i/M̄) ȳ_i.

Suppose that n clusters are selected with SRSWOR and all the elements in these selected clusters are surveyed. Assume that the M_i (i = 1, 2, ..., N) are known.

A pictorial scheme: the population consists of N clusters of sizes M_1, M_2, ..., M_N; a sample of n of these clusters is drawn, and every unit in a selected cluster is enumerated.

Based on this scheme, several estimators can be obtained to estimate the population mean. We consider four types of such estimators.
1. Mean of cluster means:
Consider the simple arithmetic mean of the cluster means,

    ȳ_c = (1/n) Σ_{i=1}^{n} ȳ_i,
    E(ȳ_c) = (1/N) Σ_{i=1}^{N} ȳ_i ≠ Ȳ   (where Ȳ = Σ_{i=1}^{N} (M_i/M_0) ȳ_i).

The bias of ȳ_c is

    Bias(ȳ_c) = E(ȳ_c) - Ȳ
              = (1/N) Σ_{i=1}^{N} ȳ_i - Σ_{i=1}^{N} (M_i/M_0) ȳ_i
              = -(1/M_0) [ Σ_{i=1}^{N} M_i ȳ_i - (1/N) ( Σ_{i=1}^{N} M_i )( Σ_{i=1}^{N} ȳ_i ) ]
              = -(1/M_0) Σ_{i=1}^{N} (M_i - M̄)(ȳ_i - Ȳ)
              = -( (N-1)/M_0 ) S_my.

So Bias(ȳ_c) = 0 if M_i and ȳ_i are uncorrelated.

The mean squared error is

    MSE(ȳ_c) = Var(ȳ_c) + [ Bias(ȳ_c) ]²
             = ((N-n)/(Nn)) S_b² + ( (N-1)/M_0 )² S_my²,

where

    S_b² = (1/(N-1)) Σ_{i=1}^{N} (ȳ_i - Ȳ)²,
    S_my = (1/(N-1)) Σ_{i=1}^{N} (M_i - M̄)(ȳ_i - Ȳ).
An estimate of Var(ȳ_c) is

    Var̂(ȳ_c) = ((N-n)/(Nn)) s_b²,  where s_b² = (1/(n-1)) Σ_{i=1}^{n} (ȳ_i - ȳ_c)².

2. Weighted mean of cluster means

Consider the arithmetic mean based on the cluster totals,

    ȳ_c* = (1/(nM̄)) Σ_{i=1}^{n} M_i ȳ_i,
    E(ȳ_c*) = (1/(nM̄)) Σ_{i=1}^{n} E(M_i ȳ_i) = (1/(NM̄)) Σ_{i=1}^{N} M_i ȳ_i = (1/M_0) Σ_{i=1}^{N} Σ_{j=1}^{M_i} y_ij = Ȳ.

Thus ȳ_c* is an unbiased estimator of Ȳ. The variance of ȳ_c* and its estimate are given by

    Var(ȳ_c*) = Var( (1/n) Σ_{i=1}^{n} (M_i/M̄) ȳ_i ) = ((N-n)/(Nn)) S_b*²,
    Var̂(ȳ_c*) = ((N-n)/(Nn)) s_b*²,

where

    S_b*² = (1/(N-1)) Σ_{i=1}^{N} ( (M_i/M̄) ȳ_i - Ȳ )²,
    s_b*² = (1/(n-1)) Σ_{i=1}^{n} ( (M_i/M̄) ȳ_i - ȳ_c* )²,
    E(s_b*²) = S_b*².

Note that the expressions for the variance of ȳ_c* and its estimate can be derived directly from the theory of SRSWOR as follows. Let z_i = (M_i/M̄) ȳ_i; then ȳ_c* = (1/n) Σ_{i=1}^{n} z_i = z̄. Since SRSWOR is followed,

    Var(ȳ_c*) = Var(z̄) = ((N-n)/(Nn)) (1/(N-1)) Σ_{i=1}^{N} (z_i - Ȳ)²
              = ((N-n)/(Nn)) (1/(N-1)) Σ_{i=1}^{N} ( (M_i/M̄) ȳ_i - Ȳ )²
              = ((N-n)/(Nn)) S_b*².

Since

    E(s_b*²) = E[ (1/(n-1)) Σ_{i=1}^{n} (z_i - z̄)² ]
             = E[ (1/(n-1)) Σ_{i=1}^{n} ( (M_i/M̄) ȳ_i - ȳ_c* )² ]
             = (1/(N-1)) Σ_{i=1}^{N} ( (M_i/M̄) ȳ_i - Ȳ )²
             = S_b*²,

an unbiased estimator of the variance follows immediately.
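The unbiased estimator ȳ_c* and its variance estimate can be sketched as below; the function name is illustrative, and N and M̄ are assumed known from the frame:

```python
import numpy as np

def weighted_cluster_mean(clusters, N, Mbar):
    """ybar_c* = (1/(n*Mbar)) * sum_i M_i * ybar_i for unequal clusters
    selected by SRSWOR and fully enumerated.  Returns the estimate and
    the unbiased variance estimate (N-n)/(Nn) * s_b*^2.

    clusters : list of 1-D arrays, one per sampled cluster
    N, Mbar  : number of clusters and mean cluster size in the population
    """
    n = len(clusters)
    # z_i = (M_i / Mbar) * ybar_i, so the estimator is simply mean(z)
    z = np.array([len(c) * np.mean(c) / Mbar for c in clusters])
    ybar_star = z.mean()
    var_hat = (N - n) / (N * n) * z.var(ddof=1)
    return ybar_star, var_hat
```

The reduction to a plain SRSWOR mean of the z_i is exactly the device used in the derivation above.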

3. Estimator based on ratio method of estimation

Consider the weighted mean of the cluster means,

    ȳ_c** = ( Σ_{i=1}^{n} M_i ȳ_i ) / ( Σ_{i=1}^{n} M_i ).

It is easy to see that this estimator is a biased estimator of the population mean. Before deriving its bias and mean squared error, we note that this estimator can be obtained through the philosophy of the ratio method of estimation. To see this, define the study variable U_i and the auxiliary variable V_i as

    U_i = M_i ȳ_i / M̄,
    V_i = M_i / M̄,  i = 1, 2, ..., N,
    V̄ = (1/N) Σ_{i=1}^{N} V_i = (1/N) Σ_{i=1}^{N} M_i / M̄ = 1,
    ū = (1/n) Σ_{i=1}^{n} u_i,  v̄ = (1/n) Σ_{i=1}^{n} v_i.

The ratio estimator based on ū and v̄ is

    Ŷ_R = (ū/v̄) V̄ = ( Σ_{i=1}^{n} u_i ) / ( Σ_{i=1}^{n} v_i )
        = ( Σ_{i=1}^{n} M_i ȳ_i / M̄ ) / ( Σ_{i=1}^{n} M_i / M̄ )
        = ( Σ_{i=1}^{n} M_i ȳ_i ) / ( Σ_{i=1}^{n} M_i )
        = ȳ_c**.

Since the ratio estimator is biased, ȳ_c** is also a biased estimator. The approximate bias and mean squared error of ȳ_c** can be derived directly from the bias and MSE of the ratio estimator. Using the results from the ratio method of estimation, the bias up to second order of approximation is

    Bias(ȳ_c**) = ((N-n)/(Nn)) Ū [ S_v²/V̄² - S_uv/(ŪV̄) ]
                = ((N-n)/(Nn)) [ Ū S_v² - S_uv ]   (since V̄ = 1),

where

    Ū = (1/N) Σ_{i=1}^{N} U_i = (1/(NM̄)) Σ_{i=1}^{N} M_i ȳ_i,
    S_v² = (1/(N-1)) Σ_{i=1}^{N} (V_i - V̄)² = (1/(N-1)) Σ_{i=1}^{N} (M_i/M̄ - 1)²,
    S_uv = (1/(N-1)) Σ_{i=1}^{N} (U_i - Ū)(V_i - V̄)
         = (1/(N-1)) Σ_{i=1}^{N} ( M_i ȳ_i/M̄ - (1/(NM̄)) Σ_{j=1}^{N} M_j ȳ_j )( M_i/M̄ - 1 ),
    R_uv = Ū/V̄ = Ū = (1/(NM̄)) Σ_{i=1}^{N} M_i ȳ_i.

The MSE of ȳ_c** up to second order of approximation is

    MSE(ȳ_c**) = ((N-n)/(Nn)) [ S_u² + R_uv² S_v² - 2 R_uv S_uv ],

where

    S_u² = (1/(N-1)) Σ_{i=1}^{N} ( M_i ȳ_i/M̄ - (1/(NM̄)) Σ_{j=1}^{N} M_j ȳ_j )².

Alternatively,

    MSE(ȳ_c**) = ((N-n)/(Nn)) (1/(N-1)) Σ_{i=1}^{N} (U_i - R_uv V_i)²
               = ((N-n)/(Nn)) (1/(N-1)) Σ_{i=1}^{N} (M_i/M̄)² ( ȳ_i - (1/(NM̄)) Σ_{j=1}^{N} M_j ȳ_j )².

An estimator of the MSE can be obtained as

    MSÊ(ȳ_c**) = ((N-n)/(Nn)) (1/(n-1)) Σ_{i=1}^{n} (M_i/M̄)² (ȳ_i - ȳ_c**)².

The estimator ȳ_c** is biased but consistent.

4. Estimator based on unbiased ratio type estimation

Since ȳ_c = (1/n) Σ_{i=1}^{n} ȳ_i (where ȳ_i = (1/M_i) Σ_{j=1}^{M_i} y_ij) is a biased estimator of the population mean with

    Bias(ȳ_c) = -( (N-1)/M_0 ) S_my = -( (N-1)/(NM̄) ) S_my,

and since SRSWOR is used,

    s_my = (1/(n-1)) Σ_{i=1}^{n} (M_i - m̄)(ȳ_i - ȳ_c),  m̄ = (1/n) Σ_{i=1}^{n} M_i,

is an unbiased estimator of

    S_my = (1/(N-1)) Σ_{i=1}^{N} (M_i - M̄)(ȳ_i - Ȳ),

i.e., E(s_my) = S_my. So it follows that

    E(ȳ_c) = Ȳ - ( (N-1)/(NM̄) ) E(s_my),
    or  E[ ȳ_c + ( (N-1)/(NM̄) ) s_my ] = Ȳ.

So

    ȳ_c** = ȳ_c + ( (N-1)/(NM̄) ) s_my

is an unbiased estimator of the population mean Ȳ.

This estimator is based on the unbiased ratio type estimator. It can be obtained by replacing the study variable (earlier y_i) by M_i ȳ_i / M̄ and the auxiliary variable (earlier x_i) by M_i / M̄. The exact variance of this estimator is complicated and does not reduce to a simple form. The approximate variance up to first order of approximation is

    Var(ȳ_c**) = (1/(n(N-1))) Σ_{i=1}^{N} [ (M_i/M̄)(ȳ_i - Ȳ) - ( (1/(NM̄)) Σ_{j=1}^{N} ȳ_j )(M_i - M̄) ]².

A consistent estimate of this variance is

    Var̂(ȳ_c**) = (1/(n(n-1))) Σ_{i=1}^{n} [ (M_i/M̄)(ȳ_i - ȳ_c) - ( (1/(nM̄)) Σ_{j=1}^{n} ȳ_j )( M_i - (1/n) Σ_{j=1}^{n} M_j ) ]².

The variance of this unbiased ratio type estimator will be smaller than that of the estimator based on the ratio method of estimation (estimator 3) provided the regression coefficient of M_i ȳ_i / M̄ on M_i / M̄ is nearer to (1/N) Σ_{i=1}^{N} ȳ_i than to (1/M_0) Σ_{i=1}^{N} M_i ȳ_i.

Comparison between SRS and cluster sampling:

In the case of unequal clusters, Σ_{i=1}^{n} M_i is a random variable with

    E( Σ_{i=1}^{n} M_i ) = nM̄.

Now if a sample of size nM̄ is drawn from a population of size NM̄, then the variance of the corresponding sample mean based on SRSWOR is

    Var(ȳ_SRS) = ( (NM̄ - nM̄)/(NM̄ · nM̄) ) S² = ((N-n)/(Nn)) (S²/M̄).

This variance can be compared with that of any of the four proposed estimators. For example, in the case of

    ȳ_c* = (1/(nM̄)) Σ_{i=1}^{n} M_i ȳ_i,
    Var(ȳ_c*) = ((N-n)/(Nn)) S_b*² = ((N-n)/(Nn)) (1/(N-1)) Σ_{i=1}^{N} ( (M_i/M̄) ȳ_i - Ȳ )².

The relative efficiency of ȳ_c* relative to the SRS based sample mean is

    E = Var(ȳ_SRS)/Var(ȳ_c*) = S²/(M̄ S_b*²).

For Var(ȳ_c*) < Var(ȳ_SRS), the variance between the clusters (S_b*²) should be small. So the clusters should be formed in such a way that the variation between them is as small as possible.
Sampling with replacement and unequal probabilities (PPSWR)
In many practical situations, the cluster total for the study variable is likely to be positively correlated with the number of units in the cluster. In this situation, it is advantageous to select the clusters with probability proportional to the number of units in the cluster instead of with equal probability, or to stratify the clusters according to their sizes and then to draw a SRSWOR of clusters from each stratum. We consider here the case where clusters are selected with probability proportional to the number of units in the cluster and with replacement.

Suppose that n clusters are selected with ppswr, the size measure being the number of units in the cluster. Here P_i is the probability of selection assigned to the i-th cluster, given by

    P_i = M_i/M_0 = M_i/(NM̄),  i = 1, 2, ..., N.

Consider the following estimator of the population mean:

    Ŷ_c = (1/n) Σ_{i=1}^{n} ȳ_i.

This estimator can be expressed as

    Ŷ_c = (1/n) Σ_{i=1}^{N} α_i ȳ_i,

where α_i denotes the number of times the i-th cluster occurs in the sample. The random variables α_1, α_2, ..., α_N follow a multinomial probability distribution with

    E(α_i) = nP_i,  Var(α_i) = nP_i(1 - P_i),  Cov(α_i, α_j) = -nP_iP_j, i ≠ j.

Hence,

    E(Ŷ_c) = (1/n) Σ_{i=1}^{N} E(α_i) ȳ_i = Σ_{i=1}^{N} P_i ȳ_i
           = Σ_{i=1}^{N} (M_i/(NM̄)) ȳ_i = (1/(NM̄)) Σ_{i=1}^{N} Σ_{j=1}^{M_i} y_ij = Ȳ.

Thus Ŷ_c is an unbiased estimator of Ȳ.

We now derive the variance of Ŷ_c. From Ŷ_c = (1/n) Σ_{i=1}^{N} α_i ȳ_i,

    Var(Ŷ_c) = (1/n²) [ Σ_{i=1}^{N} Var(α_i) ȳ_i² + ΣΣ_{i≠j} Cov(α_i, α_j) ȳ_i ȳ_j ]
             = (1/n²) [ n Σ_{i=1}^{N} P_i(1-P_i) ȳ_i² - n ΣΣ_{i≠j} P_iP_j ȳ_i ȳ_j ]
             = (1/n) [ Σ_{i=1}^{N} P_i ȳ_i² - ( Σ_{i=1}^{N} P_i ȳ_i )² ]
             = (1/n) Σ_{i=1}^{N} P_i ( ȳ_i - Ȳ )²
             = (1/(nNM̄)) Σ_{i=1}^{N} M_i ( ȳ_i - Ȳ )².

An unbiased estimator of the variance of Ŷ_c is

    Var̂(Ŷ_c) = (1/(n(n-1))) Σ_{i=1}^{n} ( ȳ_i - Ŷ_c )²,

which can be seen to satisfy the unbiasedness property as follows. Consider

    E[ (1/(n(n-1))) Σ_{i=1}^{n} ( ȳ_i - Ŷ_c )² ] = (1/(n(n-1))) E[ Σ_{i=1}^{n} ȳ_i² - n Ŷ_c² ]
     = (1/(n(n-1))) [ E( Σ_{i=1}^{N} α_i ȳ_i² ) - n Var(Ŷ_c) - n Ȳ² ],

where E(α_i) = nP_i, Var(α_i) = nP_i(1-P_i), Cov(α_i, α_j) = -nP_iP_j, i ≠ j. Thus

    E[ (1/(n(n-1))) Σ_{i=1}^{n} ( ȳ_i - Ŷ_c )² ]
     = (1/(n(n-1))) [ n Σ_{i=1}^{N} P_i ȳ_i² - Σ_{i=1}^{N} P_i ( ȳ_i - Ȳ )² - n Ȳ² ]
     = (1/(n-1)) [ Σ_{i=1}^{N} P_i ( ȳ_i² - Ȳ² ) - (1/n) Σ_{i=1}^{N} P_i ( ȳ_i - Ȳ )² ]
     = (1/(n-1)) [ Σ_{i=1}^{N} P_i ( ȳ_i - Ȳ )² - (1/n) Σ_{i=1}^{N} P_i ( ȳ_i - Ȳ )² ]
     = (1/n) Σ_{i=1}^{N} P_i ( ȳ_i - Ȳ )²
     = Var(Ŷ_c).
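The whole ppswr scheme, selection with probability proportional to cluster size, the estimator Ŷ_c, and its unbiased variance estimate, can be sketched as follows. The function name and the use of NumPy's `Generator.choice` for the with-replacement draw are my own choices:

```python
import numpy as np

def ppswr_mean_estimate(population_clusters, n, rng=None):
    """Select n clusters with probability proportional to size, with
    replacement; estimate the population mean per element by the simple
    mean of the selected cluster means (unbiased under ppswr), and
    return the unbiased variance estimate sum (ybar_i - Yhat)^2 / (n(n-1)).

    population_clusters : list of 1-D arrays (all N clusters)
    """
    rng = np.random.default_rng(rng)
    sizes = np.array([len(c) for c in population_clusters], dtype=float)
    p = sizes / sizes.sum()                         # P_i = M_i / M_0
    idx = rng.choice(len(population_clusters), size=n, replace=True, p=p)
    means = np.array([np.mean(population_clusters[i]) for i in idx])
    y_hat = means.mean()
    var_hat = np.sum((means - y_hat) ** 2) / (n * (n - 1))
    return y_hat, var_hat
```

Because sampling is with replacement, the same cluster may enter the sample more than once; each occurrence contributes its cluster mean again, exactly as the α_i formulation requires.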

Chapter 10
Two Stage Sampling (Subsampling)

In cluster sampling, all the elements in the selected clusters are surveyed. Moreover, the efficiency in
cluster sampling depends on size of the cluster. As the size increases, the efficiency decreases. It
suggests that higher precision can be attained by distributing a given number of elements over a large
number of clusters and then by taking a small number of clusters and enumerating all elements within
them. This is achieved in subsampling.

In subsampling
- divide the population into clusters,
- select a sample of clusters [first stage],
- from each of the selected clusters, select a sample of a specified number of elements [second stage].

The clusters which form the units of sampling at the first stage are called the first stage units, and the units or groups of units within clusters which form the units of sampling at the second stage are called the second stage units or subunits.

The procedure is generalized to three or more stages and is then termed as multistage sampling.

For example, in a crop survey


- villages are the first stage units,
- fields within the villages are the second stage units and
- plots within the fields are the third stage units.

In another example, to obtain a sample of fishes from a commercial fishery


- first take a sample of boats and
- then take a sample of fishes from each selected boat.

Two stage sampling with equal first stage units:

Assume that
- the population consists of NM elements,
- the NM elements are grouped into N first stage units of M second stage units each (i.e., N clusters, each cluster of size M),
- a sample of n first stage units is selected (i.e., choose n clusters),
- a sample of m second stage units is selected from each selected first stage unit (i.e., choose m units from each cluster),
- units at each stage are selected with SRSWOR.

Cluster sampling is a special case of two stage sampling in the sense that, from a population of N clusters of equal size m = M, a sample of n clusters is chosen. If further M = m = 1, we get SRSWOR. If n = N, we have the case of stratified sampling.

y_ij : value of the characteristic under study for the j-th second stage unit of the i-th first stage unit; i = 1, 2, ..., N; j = 1, 2, ..., M.

    Ȳ_i = (1/M) Σ_{j=1}^{M} y_ij : mean per second stage unit of the i-th first stage unit in the population.
    Ȳ = (1/(MN)) Σ_{i=1}^{N} Σ_{j=1}^{M} y_ij = (1/N) Σ_{i=1}^{N} Ȳ_i = Ȳ_MN : mean per second stage unit in the population.
    ȳ_i = (1/m) Σ_{j=1}^{m} y_ij : mean per second stage unit in the i-th first stage unit in the sample.
    ȳ = (1/(mn)) Σ_{i=1}^{n} Σ_{j=1}^{m} y_ij = (1/n) Σ_{i=1}^{n} ȳ_i = ȳ_mn : mean per second stage unit in the sample.

Advantages:
The principal advantage of two stage sampling is that it is more flexible than one stage sampling. It reduces to one stage sampling when m = M, but unless this is the best choice of m, we have the opportunity of taking some smaller value that appears more efficient. As usual, this choice reduces to a balance between statistical precision and cost. When the units of the first stage agree very closely, considerations of precision suggest a small value of m. On the other hand, it is sometimes almost as cheap to measure the whole of a unit as to measure a sample from it, for example, when the unit is a household and a single respondent can give data as accurate as that from all the members of the household.

A pictorial scheme of the two stage sampling design: the population of MN units is grouped into N clusters of M units each; at the first stage, n of the N clusters are selected; at the second stage, m units are subsampled from each of the n selected clusters, giving mn units in all.

Note: The expectations under a two stage sampling scheme depend on the stages. For example, the expectation at the second stage is conditional on the first stage, in the sense that a second stage unit can enter the sample only if its first stage unit was selected at the first stage.

To calculate the average,
- first average the estimator over all the second stage selections that can be drawn from a fixed set of n units that the plan selects,
- then average over all the possible selections of n units by the plan.

In the case of two stage sampling,

    E(θ̂) = E_1[ E_2(θ̂) ],

where E_1 denotes the average over all first stage samples and E_2 the average over all possible second stage selections from a fixed set of first stage units. In the case of three stage sampling,

    E(θ̂) = E_1[ E_2{ E_3(θ̂) } ].

To calculate the variance, we proceed as follows. In the case of two stage sampling,

    Var(θ̂) = E(θ̂ - θ)² = E_1 E_2 (θ̂ - θ)².

Consider

    E_2(θ̂ - θ)² = E_2(θ̂²) - 2θ E_2(θ̂) + θ²
                = [E_2(θ̂)]² + V_2(θ̂) - 2θ E_2(θ̂) + θ².

Now average over the first stage selection:

    E_1 E_2 (θ̂ - θ)² = E_1[E_2(θ̂)]² + E_1[V_2(θ̂)] - 2θ E_1 E_2(θ̂) + θ²
                     = E_1[ E_2(θ̂) - θ ]² + E_1[ V_2(θ̂) ],

so that

    Var(θ̂) = V_1[ E_2(θ̂) ] + E_1[ V_2(θ̂) ].

In the case of three stage sampling,

    Var(θ̂) = V_1[ E_2{E_3(θ̂)} ] + E_1[ V_2{E_3(θ̂)} ] + E_1[ E_2{V_3(θ̂)} ].
Estimation of population mean:
Consider ȳ = ȳ_mn as an estimator of the population mean Ȳ.

Bias:
Consider

    E(ȳ) = E_1[ E_2(ȳ | i) ]
         = E_1[ (1/n) Σ_{i=1}^{n} E_2(ȳ_i | i) ]   (as the second stage is conditional on the first stage)
         = E_1[ (1/n) Σ_{i=1}^{n} Ȳ_i ]   (as ȳ_i is unbiased for Ȳ_i due to SRSWOR)
         = (1/N) Σ_{i=1}^{N} Ȳ_i
         = Ȳ.

Thus ȳ_mn is an unbiased estimator of the population mean.

Variance

    Var(ȳ) = E_1[ V_2(ȳ | i) ] + V_1[ E_2(ȳ | i) ]
           = E_1[ (1/n²) Σ_{i=1}^{n} V_2(ȳ_i | i) ] + V_1[ (1/n) Σ_{i=1}^{n} Ȳ_i ]
           = E_1[ (1/n²) Σ_{i=1}^{n} (1/m - 1/M) S_i² ] + V_1(ȳ_c)   (where ȳ_c is based on cluster means as in cluster sampling)
           = (1/n²) · n (1/m - 1/M) S_w² + ((N-n)/(Nn)) S_b²
           = (1/n)(1/m - 1/M) S_w² + (1/n - 1/N) S_b²,

where

    S_w² = (1/N) Σ_{i=1}^{N} S_i² = (1/(N(M-1))) Σ_{i=1}^{N} Σ_{j=1}^{M} ( y_ij - Ȳ_i )²,
    S_b² = (1/(N-1)) Σ_{i=1}^{N} ( Ȳ_i - Ȳ )².
Estimate of variance
An unbiased estimator of the variance of ȳ can be obtained by replacing S_b² and S_w² by their unbiased estimators in the expression for the variance of ȳ. Consider, as an estimator of

    S_w² = (1/N) Σ_{i=1}^{N} S_i²,  where S_i² = (1/(M-1)) Σ_{j=1}^{M} ( y_ij - Ȳ_i )²,

the statistic

    s_w² = (1/n) Σ_{i=1}^{n} s_i²,  where s_i² = (1/(m-1)) Σ_{j=1}^{m} ( y_ij - ȳ_i )².

So

    E(s_w²) = E_1[ E_2(s_w² | i) ]
            = E_1[ (1/n) Σ_{i=1}^{n} E_2(s_i² | i) ]
            = E_1[ (1/n) Σ_{i=1}^{n} S_i² ]   (as SRSWOR is used)
            = (1/n) Σ_{i=1}^{n} E_1(S_i²)
            = (1/n) Σ_{i=1}^{n} [ (1/N) Σ_{i=1}^{N} S_i² ]
            = (1/N) Σ_{i=1}^{N} S_i²
            = S_w²,

so s_w² is an unbiased estimator of S_w².

Consider

    s_b² = (1/(n-1)) Σ_{i=1}^{n} ( ȳ_i - ȳ )²

as an estimator of

    S_b² = (1/(N-1)) Σ_{i=1}^{N} ( Ȳ_i - Ȳ )².

So

    (n-1) E(s_b²) = E[ Σ_{i=1}^{n} ȳ_i² - n ȳ² ]
     = E_1[ Σ_{i=1}^{n} E_2( ȳ_i² | i ) ] - n[ Var(ȳ) + {E(ȳ)}² ]
     = E_1[ Σ_{i=1}^{n} { V_2(ȳ_i | i) + Ȳ_i² } ] - n[ (1/n - 1/N) S_b² + (1/n)(1/m - 1/M) S_w² + Ȳ² ]
     = E_1[ Σ_{i=1}^{n} { (1/m - 1/M) S_i² + Ȳ_i² } ] - n(1/n - 1/N) S_b² - (1/m - 1/M) S_w² - n Ȳ²
     = n[ (1/m - 1/M) (1/N) Σ_{i=1}^{N} S_i² + (1/N) Σ_{i=1}^{N} Ȳ_i² ] - n(1/n - 1/N) S_b² - (1/m - 1/M) S_w² - n Ȳ²
     = (n-1)(1/m - 1/M) S_w² + (n/N) [ Σ_{i=1}^{N} Ȳ_i² - N Ȳ² ] - n(1/n - 1/N) S_b²
     = (n-1)(1/m - 1/M) S_w² + (n/N)(N-1) S_b² - n(1/n - 1/N) S_b²
     = (n-1)(1/m - 1/M) S_w² + (n-1) S_b².

Therefore

    E(s_b²) = (1/m - 1/M) S_w² + S_b²,
    or  E[ s_b² - (1/m - 1/M) s_w² ] = S_b².

Thus

    Var̂(ȳ) = (1/n)(1/m - 1/M) Ŝ_w² + (1/n - 1/N) Ŝ_b²
            = (1/n)(1/m - 1/M) s_w² + (1/n - 1/N) [ s_b² - (1/m - 1/M) s_w² ]
            = (1/N)(1/m - 1/M) s_w² + (1/n - 1/N) s_b².

Allocation of sample to the two stages: Equal first stage units:

The variance of the sample mean in the case of two stage sampling is

    Var(ȳ) = (1/n)(1/m - 1/M) S_w² + (1/n - 1/N) S_b².

It depends on S_b², S_w², n and m. So the cost of the survey of units in the two stage sample depends on n and m.

Case 1. When cost is fixed

We find the values of n and m so that the variance is minimum for a given cost.

(I) When the cost function is C = kmn

Let the cost of the survey be proportional to the sample size,

    C = knm,

where C is the total cost and k is a constant. When the cost is fixed as C = C_0, substituting m = C_0/(kn) in Var(ȳ) gives

    Var(ȳ) = (1/n) ( S_b² - S_w²/M ) + ( -S_b²/N + k S_w²/C_0 ).

This variance is a monotonically decreasing function of n if ( S_b² - S_w²/M ) > 0. The variance is minimum when n assumes its maximum value, i.e.,

    n̂ = C_0/k, corresponding to m = 1.

If ( S_b² - S_w²/M ) < 0 (i.e., the intraclass correlation is negative for large N), then the variance is a monotonically increasing function of n. It reaches its minimum when n assumes its minimum value, i.e.,

    n̂ = C_0/(kM)   (i.e., m = M, no subsampling).

(II) When the cost function is C = k_1 n + k_2 mn

Let the cost be fixed as C_0 = k_1 n + k_2 mn, where k_1 and k_2 are positive constants denoting the costs per unit of observation in the first and second stages, respectively. Minimize the variance of the sample mean under two stage sampling with respect to m subject to the restriction C_0 = k_1 n + k_2 mn. We have

    C_0 [ Var(ȳ) + S_b²/N ] = k_1 ( S_b² - S_w²/M ) + k_2 S_w² + m k_2 ( S_b² - S_w²/M ) + k_1 S_w²/m.

When ( S_b² - S_w²/M ) > 0, then

    C_0 [ Var(ȳ) + S_b²/N ] = [ √{ k_1 ( S_b² - S_w²/M ) } + √{ k_2 S_w² } ]² + [ √{ m k_2 ( S_b² - S_w²/M ) } - √{ k_1 S_w²/m } ]²,

which is minimum when the second term on the right hand side is zero. So we obtain

    m̂ = √[ (k_1/k_2) · S_w² / ( S_b² - S_w²/M ) ].

The optimum n follows from C_0 = k_1 n + k_2 mn as

    n̂ = C_0 / ( k_1 + k_2 m̂ ).

When ( S_b² - S_w²/M ) < 0, then

    C_0 [ Var(ȳ) + S_b²/N ] = k_1 ( S_b² - S_w²/M ) + k_2 S_w² + m k_2 ( S_b² - S_w²/M ) + k_1 S_w²/m

is minimum if m is the greatest attainable integer. Hence in this case, when

    C_0 ≥ k_1 + k_2 M :  m̂ = M and n̂ = C_0 / ( k_1 + k_2 M );
    C_0 < k_1 + k_2 M :  m̂ = ( C_0 - k_1 )/k_2 and n̂ = 1.
If N is large, then S_w² ≈ S²(1 - ρ) and

    S_b² - S_w²/M ≈ ρ S²,

so that

    m̂ = √[ (k_1/k_2) ( 1/ρ - 1 ) ].
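The optimum allocation under the cost function C_0 = k_1 n + k_2 mn can be sketched as below. The clipping of m̂ to the interval [1, M] is a practical addition of mine, not part of the derivation:

```python
import math

def optimal_m_n(C0, k1, k2, Sw2, Sb2, M):
    """Optimum subsample size m and number of clusters n under the cost
    constraint C0 = k1*n + k2*m*n, for the case Sb^2 - Sw^2/M > 0:
        m_hat = sqrt( (k1/k2) * Sw^2 / (Sb^2 - Sw^2/M) ),
        n_hat = C0 / (k1 + k2*m_hat).
    """
    d = Sb2 - Sw2 / M
    if d <= 0:
        raise ValueError("formula applies only when Sb^2 - Sw^2/M > 0")
    m_hat = math.sqrt((k1 / k2) * Sw2 / d)
    m_hat = max(1.0, min(m_hat, M))   # practical bound: 1 <= m <= M
    n_hat = C0 / (k1 + k2 * m_hat)
    return m_hat, n_hat
```

In practice m̂ and n̂ would also be rounded to integers, with the cost constraint rechecked afterwards.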

Case 2: When variance is fixed

Now we find the sample sizes when the variance is fixed, say at V_0:

    V_0 = (1/n)(1/m - 1/M) S_w² + (1/n - 1/N) S_b²
    ⇒  n = [ S_b² + (1/m - 1/M) S_w² ] / ( V_0 + S_b²/N ).

So

    C = kmn = k [ m ( S_b² - S_w²/M ) + S_w² ] / ( V_0 + S_b²/N ).

If ( S_b² - S_w²/M ) > 0, C attains its minimum when m assumes the smallest integral value, i.e., 1. If ( S_b² - S_w²/M ) < 0, C attains its minimum when m̂ = M.

Comparison of two stage sampling with one stage sampling

One stage sampling procedures are comparable with two stage sampling procedures when either
(i) mn elements are sampled in one single stage, or
(ii) mn/M first stage units are sampled as clusters without subsampling.
We consider both cases.

Case 1: Sampling mn elements in one single stage
The variance of the sample mean based on
- mn elements selected by SRSWOR (one stage) is given by

    V(ȳ_SRS) = ( 1/(mn) - 1/(MN) ) S²;

- two stage sampling is given by

    V(ȳ_TS) = (1/n)(1/m - 1/M) S_w² + (1/n - 1/N) S_b².

The intraclass correlation coefficient is

    ρ = [ ((N-1)/N) S_b² - S_w²/M ] / [ ((NM-1)/(NM)) S² ]
      = [ M(N-1) S_b² - N S_w² ] / [ (MN-1) S² ],   -1/(M-1) ≤ ρ ≤ 1,                    (1)

and we use the identity

    Σ_{i=1}^{N} Σ_{j=1}^{M} (y_ij - Ȳ)² = Σ_{i=1}^{N} Σ_{j=1}^{M} (y_ij - Ȳ_i)² + Σ_{i=1}^{N} Σ_{j=1}^{M} (Ȳ_i - Ȳ)²,
    i.e., (NM-1) S² = (N-1) M S_b² + N(M-1) S_w²,                                        (2)

where Ȳ = (1/(MN)) Σ_{i=1}^{N} Σ_{j=1}^{M} y_ij and Ȳ_i = (1/M) Σ_{j=1}^{M} y_ij.

Now we find S_b² and S_w² from (1) and (2) in terms of S². From (1), we have

    S_w² = ((N-1)/N) M S_b² - ((MN-1)/N) ρ S².                                           (3)

Substituting this in (2) gives

    (NM-1) S² = (N-1) M S_b² + N(M-1) [ ((N-1)/N) M S_b² - ((MN-1)/N) ρ S² ]
              = (N-1) M S_b² [ 1 + (M-1) ] - (M-1)(MN-1) ρ S²,
    ⇒ (MN-1) S² [ 1 + (M-1)ρ ] = M²(N-1) S_b²
    ⇒ S_b² = ( (MN-1) S² / ( M²(N-1) ) ) [ 1 + (M-1)ρ ].

Substituting this in (3) gives

    N(M-1) S_w² = (NM-1) S² - (N-1) M S_b²
                = (NM-1) S² [ 1 - (1/M)( 1 + (M-1)ρ ) ]
                = (NM-1) S² ( (M-1)/M ) (1 - ρ)
    ⇒ S_w² = ( (MN-1)/(MN) ) S² (1 - ρ).

Substituting S_b² and S_w² in Var(ȳ_TS),

    V(ȳ_TS) = ( (MN-1)/(MN) ) ( S²/(mn) ) [ (m/M) ((N-n)/(N-1)) { 1 + (M-1)ρ } + ((M-m)/M)(1 - ρ) ].

When the subsampling rate m/M is small, MN-1 ≈ MN and M-1 ≈ M; then

    V(ȳ_SRS) ≈ S²/(mn),
    V(ȳ_TS) ≈ ( S²/(mn) ) [ 1 + ρ( m ((N-n)/(N-1)) - 1 ) ].

The relative efficiency of two stage sampling in relation to one stage sampling by SRSWOR is

    RE = Var(ȳ_TS)/Var(ȳ_SRS) = 1 + ρ( m ((N-n)/(N-1)) - 1 ).

If N-1 ≈ N and the finite population correction is ignorable, then (N-n)/(N-1) ≈ 1, and

    RE = 1 + ρ(m - 1).

Case 2: Comparison with cluster sampling

Suppose a random sample of mn/M clusters, without further subsampling, is selected. The variance of the sample mean of the equivalent mn/M clusters is

    Var(ȳ_cl) = ( M/(mn) - 1/N ) S_b².

The variance of the sample mean under two stage sampling is

    Var(ȳ_TS) = (1/n)(1/m - 1/M) S_w² + (1/n - 1/N) S_b².

So Var(ȳ_cl) exceeds Var(ȳ_TS) by

    (1/n) ( M/m - 1 ) ( S_b² - S_w²/M ),

which is approximately

    (1/n) ( M/m - 1 ) ρ S²

for large N, since

    S_b² = ( (MN-1)/(M²(N-1)) ) S² [ 1 + (M-1)ρ ],  S_w² = ( (MN-1)/(MN) ) S² (1 - ρ)

give S_b² - S_w²/M ≈ ρ S². So the smaller the ratio m/M, the larger the reduction in the variance of a two stage sample over a cluster sample. When ( S_b² - S_w²/M ) < 0, subsampling will lead to a loss in precision.

Two stage sampling with unequal first stage units:

Consider two stage sampling when the first stage units are of unequal size and SRSWOR is employed at each stage. Let

    y_ij : value of the j-th second stage unit of the i-th first stage unit,
    M_i : number of second stage units in the i-th first stage unit (i = 1, 2, ..., N),
    M_0 = Σ_{i=1}^{N} M_i : total number of second stage units in the population,
    m_i : number of second stage units to be selected from the i-th first stage unit, if it is in the sample,
    m_0 = Σ_{i=1}^{n} m_i : total number of second stage units in the sample,
    ȳ_i(m_i) = (1/m_i) Σ_{j=1}^{m_i} y_ij,
    Ȳ_i = (1/M_i) Σ_{j=1}^{M_i} y_ij,
    Ȳ_N = (1/N) Σ_{i=1}^{N} Ȳ_i,
    Ȳ = (1/(NM̄)) Σ_{i=1}^{N} Σ_{j=1}^{M_i} y_ij = (1/N) Σ_{i=1}^{N} (M_i/M̄) Ȳ_i = (1/N) Σ_{i=1}^{N} u_i Ȳ_i,
    u_i = M_i/M̄,  M̄ = (1/N) Σ_{i=1}^{N} M_i.
A pictorial scheme of two stage sampling with unequal first stage units: the population consists of N clusters of sizes M_1, M_2, ..., M_N; at the first stage n clusters are selected; at the second stage m_i units are subsampled from the i-th selected cluster.
Now we consider different estimators for the estimation of the population mean.

1. Estimator based on the first stage unit means in the sample:

    Ŷ̄ = ȳ_S2 = (1/n) Σ_{i=1}^{n} ȳ_i(m_i).

Bias:

    E(ȳ_S2) = E_1[ (1/n) Σ_{i=1}^{n} E_2( ȳ_i(m_i) | i ) ]
            = E_1[ (1/n) Σ_{i=1}^{n} Ȳ_i ]   [since a sample of size m_i is selected out of M_i units by SRSWOR]
            = (1/N) Σ_{i=1}^{N} Ȳ_i
            = Ȳ_N ≠ Ȳ.

So ȳ_S2 is a biased estimator of Ȳ and its bias is given by

    Bias(ȳ_S2) = E(ȳ_S2) - Ȳ
               = (1/N) Σ_{i=1}^{N} Ȳ_i - (1/(NM̄)) Σ_{i=1}^{N} M_i Ȳ_i
               = -(1/(NM̄)) [ Σ_{i=1}^{N} M_i Ȳ_i - (1/N) ( Σ_{i=1}^{N} Ȳ_i )( Σ_{i=1}^{N} M_i ) ]
               = -(1/(NM̄)) Σ_{i=1}^{N} ( M_i - M̄ )( Ȳ_i - Ȳ_N ).

This bias can be estimated by

    Biaŝ(ȳ_S2) = -( (N-1)/(NM̄(n-1)) ) Σ_{i=1}^{n} ( M_i - m̄ )( ȳ_i(m_i) - ȳ_S2 ),  m̄ = (1/n) Σ_{i=1}^{n} M_i,

which can be seen as follows:

    E[ Biaŝ(ȳ_S2) ] = -( (N-1)/(NM̄) ) E_1[ (1/(n-1)) Σ_{i=1}^{n} E_2{ ( M_i - m̄ )( ȳ_i(m_i) - ȳ_S2 ) | n } ]
                    = -( (N-1)/(NM̄) ) E[ (1/(n-1)) Σ_{i=1}^{n} ( M_i - m̄ )( Ȳ_i - ȳ_n ) ]
                    = -(1/(NM̄)) Σ_{i=1}^{N} ( M_i - M̄ )( Ȳ_i - Ȳ_N )
                    = Ȳ_N - Ȳ,

where ȳ_n = (1/n) Σ_{i=1}^{n} Ȳ_i.
An unbiased estimator of the population mean Ȳ is thus obtained as

    ȳ_S2 + ( (N-1)/(NM̄(n-1)) ) Σ_{i=1}^{n} ( M_i - m̄ )( ȳ_i(m_i) - ȳ_S2 ).

Note that the bias arises due to the inequality of the sizes of the first stage units, so that the probability of selection of the second stage units varies from one first stage unit to another.

Variance:

    Var(ȳ_S2) = V_1[ E_2( ȳ_S2 | n ) ] + E_1[ V_2( ȳ_S2 | n ) ]
              = Var( (1/n) Σ_{i=1}^{n} Ȳ_i ) + E[ (1/n²) Σ_{i=1}^{n} Var( ȳ_i(m_i) | i ) ]
              = (1/n - 1/N) S_b² + (1/(Nn)) Σ_{i=1}^{N} ( 1/m_i - 1/M_i ) S_i²,

where

    S_b² = (1/(N-1)) Σ_{i=1}^{N} ( Ȳ_i - Ȳ_N )²,
    S_i² = (1/(M_i - 1)) Σ_{j=1}^{M_i} ( y_ij - Ȳ_i )².

The MSE can be obtained as

    MSE(ȳ_S2) = Var(ȳ_S2) + [ Bias(ȳ_S2) ]².

Estimation of variance:
Consider the mean square between the cluster means in the sample,

    s_b² = (1/(n-1)) Σ_{i=1}^{n} ( ȳ_i(m_i) - ȳ_S2 )².

It can be shown that

    E(s_b²) = S_b² + (1/N) Σ_{i=1}^{N} ( 1/m_i - 1/M_i ) S_i².

Also, with

    s_i² = (1/(m_i - 1)) Σ_{j=1}^{m_i} ( y_ij - ȳ_i(m_i) )²,  E( s_i² | i ) = S_i² = (1/(M_i - 1)) Σ_{j=1}^{M_i} ( y_ij - Ȳ_i )²,

we have

    E[ (1/n) Σ_{i=1}^{n} ( 1/m_i - 1/M_i ) s_i² ] = (1/N) Σ_{i=1}^{N} ( 1/m_i - 1/M_i ) S_i².

Thus

    E(s_b²) = S_b² + E[ (1/n) Σ_{i=1}^{n} ( 1/m_i - 1/M_i ) s_i² ],

and an unbiased estimator of S_b² is

    Ŝ_b² = s_b² - (1/n) Σ_{i=1}^{n} ( 1/m_i - 1/M_i ) s_i².

So an estimator of the variance can be obtained by replacing S_b² and S_i² by their unbiased estimators as

    Var̂(ȳ_S2) = (1/n - 1/N) Ŝ_b² + (1/(Nn)) Σ_{i=1}^{n} ( 1/m_i - 1/M_i ) Ŝ_i²,  with Ŝ_i² = s_i².

2. Estimator based on first stage unit totals:

    Ŷ̄ = ȳ_S2* = (1/n) Σ_{i=1}^{n} M_i ȳ_i(m_i) / M̄ = (1/n) Σ_{i=1}^{n} u_i ȳ_i(m_i),

where u_i = M_i/M̄.

Bias

    E(ȳ_S2*) = E_1[ (1/n) Σ_{i=1}^{n} u_i E_2( ȳ_i(m_i) | i ) ]
             = E_1[ (1/n) Σ_{i=1}^{n} u_i Ȳ_i ]
             = (1/N) Σ_{i=1}^{N} u_i Ȳ_i
             = Ȳ.

Thus ȳ_S2* is an unbiased estimator of Ȳ.

Variance:

    Var(ȳ_S2*) = V_1[ E_2( ȳ_S2* | n ) ] + E_1[ V_2( ȳ_S2* | n ) ]
               = Var( (1/n) Σ_{i=1}^{n} u_i Ȳ_i ) + E[ (1/n²) Σ_{i=1}^{n} u_i² Var( ȳ_i(m_i) | i ) ]
               = (1/n - 1/N) S_b*² + (1/(nN)) Σ_{i=1}^{N} u_i² ( 1/m_i - 1/M_i ) S_i²,

where

    S_i² = (1/(M_i - 1)) Σ_{j=1}^{M_i} ( y_ij - Ȳ_i )²,
    S_b*² = (1/(N-1)) Σ_{i=1}^{N} ( u_i Ȳ_i - Ȳ )².

3. Estimator based on ratio estimator:

    Ŷ̄ = ȳ_S2** = ( Σ_{i=1}^{n} M_i ȳ_i(m_i) ) / ( Σ_{i=1}^{n} M_i )
              = ( Σ_{i=1}^{n} u_i ȳ_i(m_i) ) / ( Σ_{i=1}^{n} u_i )
              = ȳ_S2* / ū_n,

where u_i = M_i/M̄ and ū_n = (1/n) Σ_{i=1}^{n} u_i.

This estimator can be seen as arising from the ratio method of estimation as follows. Let

    y_i* = u_i ȳ_i(m_i),
    x_i* = M_i/M̄,  i = 1, 2, ..., N,

be the values of the study variable and the auxiliary variable in reference to the ratio method of estimation. Then

    ȳ* = (1/n) Σ_{i=1}^{n} y_i* = ȳ_S2*,
    x̄* = (1/n) Σ_{i=1}^{n} x_i* = ū_n,
    X̄* = (1/N) Σ_{i=1}^{N} x_i* = 1.
The corresponding ratio estimator of \bar{Y} is

\hat{\bar{Y}}_R = \frac{\bar{y}^{*}}{\bar{x}^{*}} \bar{X}^{*} = \frac{\bar{y}_{S2}^{*}}{\bar{u}_n} \cdot 1 = \bar{y}_{S2}^{**} .
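As a quick numerical sanity check (all values made up), the weighted-mean form and the ratio form of \bar{y}_{S2}^{**} coincide, since \bar{M} cancels:

```python
# Made-up sampled clusters: sizes M_i and subsample means ybar_{i(m_i)}.
M = [4.0, 2.0, 6.0]
ybar = [2.5, 11.0, 6.0]
Mbar = 4.0                      # assumed population mean cluster size
n = len(M)

# Weighted-mean form: sum(M_i * ybar_i) / sum(M_i)
direct = sum(Mi * yi for Mi, yi in zip(M, ybar)) / sum(M)

# Ratio form: ybar*_S2 / ubar_n with u_i = M_i / Mbar
u = [Mi / Mbar for Mi in M]
ystar = sum(ui * yi for ui, yi in zip(u, ybar)) / n
ubar = sum(u) / n
print(direct, ystar / ubar)  # identical
```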

So the bias and mean squared error of \bar{y}_{S2}^{**} can be obtained directly from the results for the ratio estimator. Recall that in the ratio method of estimation, the bias and MSE of the ratio estimator up to the second order of approximation are

Bias(\hat{\bar{Y}}_R) = \frac{N - n}{Nn} \, \bar{Y} \left( C_x^2 - \rho C_x C_y \right) = \bar{Y} \left[ \frac{Var(\bar{x})}{\bar{X}^2} - \frac{Cov(\bar{x}, \bar{y})}{\bar{X} \bar{Y}} \right] ,
MSE(\hat{\bar{Y}}_R) = Var(\bar{y}) + R^2 Var(\bar{x}) - 2 R \, Cov(\bar{x}, \bar{y}) ,

where R = \frac{\bar{Y}}{\bar{X}} .

Bias:
The bias of \bar{y}_{S2}^{**} up to the second order of approximation is

Bias(\bar{y}_{S2}^{**}) = \bar{Y} \left[ \frac{Var(\bar{x}_{S2}^{*})}{\bar{X}^2} - \frac{Cov(\bar{x}_{S2}^{*}, \bar{y}_{S2}^{*})}{\bar{X} \bar{Y}} \right] ,

where \bar{x}_{S2}^{*} is the mean of the auxiliary variable defined analogously to \bar{y}_{S2}^{*}, i.e., \bar{x}_{S2}^{*} = \frac{1}{n} \sum_{i=1}^{n} u_i \bar{x}_{i(m_i)} .

Now we find Cov(\bar{x}_{S2}^{*}, \bar{y}_{S2}^{*}):

Cov(\bar{x}_{S2}^{*}, \bar{y}_{S2}^{*}) = Cov\left[ \frac{1}{n} \sum_{i=1}^{n} u_i E_2(\bar{x}_{i(m_i)} \mid i), \; \frac{1}{n} \sum_{i=1}^{n} u_i E_2(\bar{y}_{i(m_i)} \mid i) \right] + E\left[ \frac{1}{n^2} \sum_{i=1}^{n} u_i^2 \, Cov(\bar{x}_{i(m_i)}, \bar{y}_{i(m_i)} \mid i) \right]
= Cov\left( \frac{1}{n} \sum_{i=1}^{n} u_i \bar{X}_i, \; \frac{1}{n} \sum_{i=1}^{n} u_i \bar{Y}_i \right) + E\left[ \frac{1}{n^2} \sum_{i=1}^{n} u_i^2 \left( \frac{1}{m_i} - \frac{1}{M_i} \right) S_{ixy} \right]
= \left( \frac{1}{n} - \frac{1}{N} \right) S_{bxy}^{*} + \frac{1}{nN} \sum_{i=1}^{N} u_i^2 \left( \frac{1}{m_i} - \frac{1}{M_i} \right) S_{ixy} ,

where

S_{bxy}^{*} = \frac{1}{N-1} \sum_{i=1}^{N} \left( u_i \bar{X}_i - \bar{X} \right) \left( u_i \bar{Y}_i - \bar{Y} \right) ,
S_{ixy} = \frac{1}{M_i - 1} \sum_{j=1}^{M_i} \left( x_{ij} - \bar{X}_i \right) \left( y_{ij} - \bar{Y}_i \right) .
Similarly, Var(\bar{x}_{S2}^{*}) can be obtained by replacing y with x in Cov(\bar{x}_{S2}^{*}, \bar{y}_{S2}^{*}):

Var(\bar{x}_{S2}^{*}) = \left( \frac{1}{n} - \frac{1}{N} \right) S_{bx}^{*2} + \frac{1}{nN} \sum_{i=1}^{N} u_i^2 \left( \frac{1}{m_i} - \frac{1}{M_i} \right) S_{ix}^2 ,

where

S_{bx}^{*2} = \frac{1}{N-1} \sum_{i=1}^{N} \left( u_i \bar{X}_i - \bar{X} \right)^2 ,
S_{ix}^2 = \frac{1}{M_i - 1} \sum_{j=1}^{M_i} \left( x_{ij} - \bar{X}_i \right)^2 .

Substituting Cov(\bar{x}_{S2}^{*}, \bar{y}_{S2}^{*}) and Var(\bar{x}_{S2}^{*}) in Bias(\bar{y}_{S2}^{**}), we obtain the approximate bias as

Bias(\bar{y}_{S2}^{**}) = \bar{Y} \left[ \left( \frac{1}{n} - \frac{1}{N} \right) \left( \frac{S_{bx}^{*2}}{\bar{X}^2} - \frac{S_{bxy}^{*}}{\bar{X} \bar{Y}} \right) + \frac{1}{nN} \sum_{i=1}^{N} u_i^2 \left( \frac{1}{m_i} - \frac{1}{M_i} \right) \left( \frac{S_{ix}^2}{\bar{X}^2} - \frac{S_{ixy}}{\bar{X} \bar{Y}} \right) \right] .

Mean squared error:

MSE(\bar{y}_{S2}^{**}) = Var(\bar{y}_{S2}^{*}) - 2 R^{*} \, Cov(\bar{x}_{S2}^{*}, \bar{y}_{S2}^{*}) + R^{*2} Var(\bar{x}_{S2}^{*}) ,

where

Var(\bar{y}_{S2}^{*}) = \left( \frac{1}{n} - \frac{1}{N} \right) S_{by}^{*2} + \frac{1}{nN} \sum_{i=1}^{N} u_i^2 \left( \frac{1}{m_i} - \frac{1}{M_i} \right) S_{iy}^2 ,
Var(\bar{x}_{S2}^{*}) = \left( \frac{1}{n} - \frac{1}{N} \right) S_{bx}^{*2} + \frac{1}{nN} \sum_{i=1}^{N} u_i^2 \left( \frac{1}{m_i} - \frac{1}{M_i} \right) S_{ix}^2 ,
Cov(\bar{x}_{S2}^{*}, \bar{y}_{S2}^{*}) = \left( \frac{1}{n} - \frac{1}{N} \right) S_{bxy}^{*} + \frac{1}{nN} \sum_{i=1}^{N} u_i^2 \left( \frac{1}{m_i} - \frac{1}{M_i} \right) S_{ixy} ,

with

S_{by}^{*2} = \frac{1}{N-1} \sum_{i=1}^{N} \left( u_i \bar{Y}_i - \bar{Y} \right)^2 ,
S_{iy}^2 = \frac{1}{M_i - 1} \sum_{j=1}^{M_i} \left( y_{ij} - \bar{Y}_i \right)^2 ,
R^{*} = \frac{\bar{Y}}{\bar{X}^{*}} = \bar{Y} \quad (\text{since } \bar{X}^{*} = 1).
Thus

MSE(\bar{y}_{S2}^{**}) = \left( \frac{1}{n} - \frac{1}{N} \right) \left( S_{by}^{*2} - 2 R^{*} S_{bxy}^{*} + R^{*2} S_{bx}^{*2} \right) + \frac{1}{nN} \sum_{i=1}^{N} u_i^2 \left( \frac{1}{m_i} - \frac{1}{M_i} \right) \left( S_{iy}^2 - 2 R^{*} S_{ixy} + R^{*2} S_{ix}^2 \right) .

Also, since \bar{Y} = R^{*} \bar{X}, the between-cluster term simplifies, giving

MSE(\bar{y}_{S2}^{**}) = \left( \frac{1}{n} - \frac{1}{N} \right) \frac{1}{N-1} \sum_{i=1}^{N} u_i^2 \left( \bar{Y}_i - R^{*} \bar{X}_i \right)^2 + \frac{1}{nN} \sum_{i=1}^{N} u_i^2 \left( \frac{1}{m_i} - \frac{1}{M_i} \right) \left( S_{iy}^2 - 2 R^{*} S_{ixy} + R^{*2} S_{ix}^2 \right) .

Estimate of variance:
Consider

s_{bxy}^{*} = \frac{1}{n-1} \sum_{i=1}^{n} \left( u_i \bar{y}_{i(m_i)} - \bar{y}_{S2}^{*} \right) \left( u_i \bar{x}_{i(m_i)} - \bar{x}_{S2}^{*} \right) ,
s_{ixy} = \frac{1}{m_i - 1} \sum_{j=1}^{m_i} \left( x_{ij} - \bar{x}_{i(m_i)} \right) \left( y_{ij} - \bar{y}_{i(m_i)} \right) .

It can be shown that

E(s_{bxy}^{*}) = S_{bxy}^{*} + \frac{1}{N} \sum_{i=1}^{N} u_i^2 \left( \frac{1}{m_i} - \frac{1}{M_i} \right) S_{ixy} ,
E(s_{ixy}) = S_{ixy} .

So

E\left[ \frac{1}{n} \sum_{i=1}^{n} u_i^2 \left( \frac{1}{m_i} - \frac{1}{M_i} \right) s_{ixy} \right] = \frac{1}{N} \sum_{i=1}^{N} u_i^2 \left( \frac{1}{m_i} - \frac{1}{M_i} \right) S_{ixy} .
Thus unbiased estimators are

\hat{S}_{bxy}^{*} = s_{bxy}^{*} - \frac{1}{n} \sum_{i=1}^{n} u_i^2 \left( \frac{1}{m_i} - \frac{1}{M_i} \right) s_{ixy} ,
\hat{S}_{bx}^{*2} = s_{bx}^{*2} - \frac{1}{n} \sum_{i=1}^{n} u_i^2 \left( \frac{1}{m_i} - \frac{1}{M_i} \right) s_{ix}^2 ,
\hat{S}_{by}^{*2} = s_{by}^{*2} - \frac{1}{n} \sum_{i=1}^{n} u_i^2 \left( \frac{1}{m_i} - \frac{1}{M_i} \right) s_{iy}^2 ,

where s_{bx}^{*2}, s_{by}^{*2}, s_{ix}^2 and s_{iy}^2 are defined analogously to s_{bxy}^{*} and s_{ixy}. Also,

E\left[ \frac{1}{n} \sum_{i=1}^{n} u_i^2 \left( \frac{1}{m_i} - \frac{1}{M_i} \right) s_{ix}^2 \right] = \frac{1}{N} \sum_{i=1}^{N} u_i^2 \left( \frac{1}{m_i} - \frac{1}{M_i} \right) S_{ix}^2 ,
E\left[ \frac{1}{n} \sum_{i=1}^{n} u_i^2 \left( \frac{1}{m_i} - \frac{1}{M_i} \right) s_{iy}^2 \right] = \frac{1}{N} \sum_{i=1}^{N} u_i^2 \left( \frac{1}{m_i} - \frac{1}{M_i} \right) S_{iy}^2 .

A consistent estimator of the MSE of \bar{y}_{S2}^{**} can be obtained by substituting the estimators of the respective statistics in MSE(\bar{y}_{S2}^{**}) as

\widehat{MSE}(\bar{y}_{S2}^{**}) = \left( \frac{1}{n} - \frac{1}{N} \right) \left( s_{by}^{*2} - 2 r^{*} s_{bxy}^{*} + r^{*2} s_{bx}^{*2} \right) + \frac{1}{Nn} \sum_{i=1}^{n} u_i^2 \left( \frac{1}{m_i} - \frac{1}{M_i} \right) \left( s_{iy}^2 - 2 r^{*} s_{ixy} + r^{*2} s_{ix}^2 \right)
= \left( \frac{1}{n} - \frac{1}{N} \right) \frac{1}{n-1} \sum_{i=1}^{n} u_i^2 \left( \bar{y}_{i(m_i)} - r^{*} \bar{x}_{i(m_i)} \right)^2 + \frac{1}{Nn} \sum_{i=1}^{n} u_i^2 \left( \frac{1}{m_i} - \frac{1}{M_i} \right) \left( s_{iy}^2 - 2 r^{*} s_{ixy} + r^{*2} s_{ix}^2 \right) ,

where r^{*} = \frac{\bar{y}_{S2}^{*}}{\bar{x}_{S2}^{*}} .

