43 Survey Sampling
Chapter 1
Introduction
Statistical tools can be applied to a data set to draw statistical inferences, which are in turn used for various purposes. For example, governments use such data for policy formulation for the welfare of the people, marketing companies use data from consumer surveys to improve their products and to provide better services to customers, etc. Such data is obtained through sample surveys. Sample surveys are conducted throughout the world by governmental as well as non-governmental agencies. For example, the “National Sample Survey Organization (NSSO)” conducts surveys in India, “Statistics Canada” conducts surveys in Canada, and agencies of the United Nations like the “World Health Organization (WHO)” and the “Food and Agriculture Organization (FAO)” conduct surveys in different countries.
Sampling theory provides the tools and techniques for data collection, keeping in mind the objectives to be fulfilled and the nature of the population.
Sample surveys collect information on a fraction of the total population, whereas a census collects information on the whole population. Some surveys, e.g., economic surveys, agricultural surveys, etc., are conducted regularly. Others are need-based and are conducted when some need arises, e.g., consumer satisfaction surveys at a newly opened shopping mall to assess the satisfaction level with the amenities provided in the mall.
Sampling unit:
An element or a group of elements on which the observations can be taken is called a sampling unit.
The objective of the survey helps in determining the definition of sampling unit.
For example, if the objective is to determine the total income of all the persons in the household, then
the sampling unit is household. If the objective is to determine the income of any particular person in
the household, then the sampling unit is the income of the particular person in the household. So the definition of the sampling unit depends on and varies with the objective of the survey. Similarly, in another
example, if the objective is to study the blood sugar level, then the sampling unit is the value of blood
sugar level of a person. On the other hand, if the objective is to study the health conditions, then the
sampling unit is the person on whom the readings on the blood sugar level, blood pressure and other
factors will be obtained. These values will together classify the person as healthy or unhealthy.
Population:
Collection of all the sampling units in a given region at a particular point of time or a particular period
is called the population. For example, if the medical facilities in a hospital are to be surveyed through
the patients, then the total number of patients registered in the hospital during the time period of survey
will the population. Similarly, if the production of wheat in a district is to be studied, then all the fields
cultivating wheat in that district will be constitute the population. The total number of sampling units in
the population is the population size, denoted generally by N. The population size can be finite or
infinite (N is large).
Census:
The complete count of the population is called a census. The observations on all the sampling units in the population are collected in a census. For example, in India, the census is conducted every tenth year, in which observations on all the persons staying in India are collected.
Sample:
One or more sampling units are selected from the population according to some specified procedure. Such a collection of units, consisting of only a portion of the population units, is called a sample.
In the context of sample surveys, a collection of units like households, people, cities, countries etc. is
called a finite population.
A census is a 100% sample and it is a complete count of the population.
Representative sample:
When all the salient features of the population are present in the sample, then it is called a representative sample. Ideally, every sample should be a representative sample.
For example, if a population has 30% males and 70% females, then we also expect the sample to have
nearly 30% males and 70% females.
In another example, if we take out a handful of wheat from a 100 kg bag of wheat, we expect the same quality of wheat in the hand as inside the bag. Similarly, it is expected that a drop of blood will give the same information as all the blood in the body.
Sampling frame:
The list of all the units of the population to be surveyed constitutes the sampling frame. All the
sampling units in the sampling frame have identification particulars. For example, all the students in a
particular university listed along with their roll numbers constitute the sampling frame. Similarly, the
list of households with the name of head of family or house address constitutes the sampling frame. In
another example, the residents of a city area may be listed in more than one frame - as per automobile
registration as well as the listing in the telephone directory.
2. Non-random sample or purposive sample:
The selection of units in the sample from the population is not governed by probability laws. For example, the units may be selected on the basis of the personal judgment of the surveyor. The persons volunteering to take some medical test or to drink a new type of coffee also constitute a sample selected on a non-random basis.
Another type of sampling is Quota Sampling. The survey in this case is continued until a
predetermined number of units with the characteristic under study are picked up.
For example, in order to conduct an experiment for rare type of disease, the survey is continued till
the required number of patients with the disease are collected.
2. Organization of work:
It is easier to manage the organization of collection of smaller number of units than all the units
in a census. For example, in order to draw a representative sample from a state, it is easier to
manage to draw small samples from every city than drawing the sample from the whole state at
a time. This ultimately results in more accuracy in the statistical inferences because better
organization provides better data and in turn, improved statistical inferences are obtained.
3. Greater accuracy:
The persons involved in the collection of data are trained personnel. They can collect the data more accurately if they have to cover a smaller number of units rather than a large number of units.
5. Feasibility:
Conducting the experiment on smaller number of units, particularly when the units are
destroyed, is more feasible. For example, in determining the life of bulbs, it is more feasible to
fuse a minimum number of bulbs. Similarly, in any medical experiment, it is more feasible to use fewer animals.
Type of surveys:
There are various types of surveys which are conducted on the basis of the objectives to be fulfilled.
1. Demographic surveys:
These surveys are conducted to collect the demographic data, e.g., household surveys, family size,
number of males in families, etc. Such surveys are useful in the policy formulation for any city, state or
country for the welfare of the people.
2. Educational surveys:
These surveys are conducted to collect the educational data, e.g., how many children go to school, how
many persons are graduate, etc. Such surveys are conducted to examine the educational programs in
schools and colleges. Generally, schools are selected first, and then the students from each school constitute the sample.
3. Economic surveys:
These surveys are conducted to collect the economic data, e.g., data related to export and import of
goods, industrial production, consumer expenditure etc. Such data is helpful in constructing the indices
indicating the growth in a particular sector of economy or even the overall economic growth of the
country.
4. Employment surveys:
These surveys are conducted to collect the employment related data, e.g., employment rate, labour
conditions, wages, etc. in a city, state or country. Such data helps in constructing various indices to
know the employment conditions among the people.
6. Agricultural surveys:
These surveys are conducted to collect the agriculture related data to estimate, e.g., the acreage and
production of crops, livestock numbers, use of fertilizers, use of pesticides and other related topics. The government bases its planning related to food issues on such surveys.
7. Marketing surveys:
These surveys are conducted to collect the data related to marketing. They are conducted by major
companies, manufacturers or those who provide services to consumer etc. Such data is used for
knowing the satisfaction and opinion of consumers as well as in developing the sales, purchase and
promotional activities etc.
8. Election surveys:
These surveys are conducted to study the outcome of an election or a poll. For example, such polls are
conducted in democratic countries to gauge the opinion of the people about a candidate who is contesting the election.
9. Public polls and surveys:
These surveys are conducted to collect the public opinion on any particular issue. Such surveys are
generally conducted by the news media and the agencies which conduct polls and surveys on the
current topics of interest to public.
2. Population to be sampled:
Based on the objectives of the survey, decide the population from which the information can be
obtained. For example, population of farmers is to be sampled for an agricultural survey whereas the
population of patients has to be sampled for determining the medical facilities in a hospital.
3. Data to be collected:
It is important to decide which data is relevant for fulfilling the objectives of the survey and to ensure that no essential data is omitted. Sometimes too many questions are asked, and some of their outcomes are never utilized. This lowers the quality of the responses and in turn results in lower efficiency of the statistical inferences.
4. Degree of precision required:
The results of any sample survey are always subject to some uncertainty. Such uncertainty can be reduced by taking larger samples or using superior instruments, but this involves more cost and more time. So it is very important to decide on the required degree of precision in the data. This needs to be conveyed to the surveyor as well.
5. Method of measurement:
The choice of measuring instrument and the method to measure the data from the population needs to
be specified clearly. For example, the data may be collected through an interview, a questionnaire, a personal visit, or a combination of these approaches. The forms in which the data are to be recorded, so that the data can be transferred to mechanical equipment for easily creating the data summary, also need to be prepared accordingly.
6. The frame:
The sampling frame has to be clearly specified. The population is divided into sampling units such that
the units cover the whole population and every sampling unit is tagged with identification. The list of
all sampling units is called the frame. The frame must cover the whole population and the units must
not overlap each other in the sense that every element in the population must belong to one and only
one unit. For example, the sampling unit can be an individual member in the family or the whole
family.
7. Selection of sample:
The size of the sample needs to be specified for the given sampling plan. This helps in determining and
comparing the relative cost and time of different sampling plans. The method and plan adopted for
drawing a representative sample should also be detailed.
8. The Pre-test:
It is advised to try the questionnaire and field methods on a small scale. This may reveal some troubles
and problems beforehand which the surveyor may face in the field in large scale surveys.
9. Organization of the field work:
How to conduct the survey, how to handle administrative issues, providing proper training to surveyors, procedures and plans for handling non-response and missing observations, etc. are some of the issues which need to be addressed for organizing the survey work in the field. The procedure for early checking of the quality of returns should be prescribed. It should also be clarified how to handle the situation when the respondent is not available.
Variability control in sample surveys:
The variability control is an important issue in any statistical analysis. A general objective is to draw
statistical inferences with minimum variability. There are various types of sampling schemes which are
adopted in different conditions. These schemes help in controlling the variability at different stages.
Such sampling schemes can be classified in the following way.
2. Personal interview:
The surveyor is supplied with a well prepared questionnaire. The surveyor goes to the respondents and
asks the same questions mentioned in the questionnaire. The data in the questionnaire is then filled up
accordingly based on the responses from the respondents.
3. Mail enquiry:
The well prepared questionnaire is sent to the respondents through postal mail, e-mail, etc. The
respondents are requested to fill up the questionnaires and send it back. In case of postal mail, many
times the questionnaires are accompanied by a self addressed envelope with postage stamps to avoid
any non-response due to the cost of postage.
5. Registration:
The respondent is required to register the data at some designated place. For example, the number of
births and deaths along with the details provided by the family members are recorded at city municipal
office which are provided by the family members.
The methods in (1) to (5) provide primary data which means collecting the data directly from the
source. The method in (6) provides the secondary data which means getting the data from the primary
sources.
Chapter 2
Simple Random Sampling
Such a process can be implemented through programming using the discrete uniform distribution. Any number between 1 and N can be generated from this distribution, and the corresponding unit can be selected into the sample by associating an index with each sampling unit. Many statistical software packages like R, SAS, etc. have built-in functions for drawing a sample using SRSWOR or SRSWR.
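For instance, a minimal Python sketch of both schemes using the standard `random` module (the population of N = 10 units and the sample size n = 4 are hypothetical choices for illustration):

```python
import random

# Hypothetical population of N = 10 units, labelled 1..10.
population = list(range(1, 11))
n = 4

random.seed(42)  # only to make this sketch reproducible

# SRSWOR: n distinct units, every subset of size n equally likely.
srswor_sample = random.sample(population, n)

# SRSWR: each draw is an independent discrete-uniform choice over
# all N units, so a unit may appear more than once.
srswr_sample = [random.choice(population) for _ in range(n)]

print(srswor_sample)
print(srswr_sample)
```

In R, `sample(x, n)` and `sample(x, n, replace = TRUE)` play the same two roles.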
Notations:
The following notations will be used in further notes:
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \;:\; \text{sample mean}$$

$$\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i \;:\; \text{population mean}$$

$$S^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})^2 = \frac{1}{N-1}\Big(\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\Big)$$

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(Y_i - \bar{Y})^2 = \frac{1}{N}\Big(\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\Big)$$

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2 = \frac{1}{n-1}\Big(\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\Big)$$
The probability that the $i$th unit is selected in the sample at any one of the $n$ draws is

$$P(i) = P_1(i) + P_2(i) + \cdots + P_n(i) = \frac{1}{N} + \frac{1}{N} + \cdots + \frac{1}{N} \ (n \text{ times}) = \frac{n}{N}.$$
Now if u1 , u2 ,..., un are the n units selected in the sample, then the probability of their selection is
Alternative approach:
The probability of drawing a sample in SRSWOR can alternatively be found as follows:
Let $u_{i(k)}$ denote the $i$th unit drawn at the $k$th draw. Note that the $i$th unit can be any one of the $N$ units. Then $s_o = (u_{i(1)}, u_{i(2)}, \ldots, u_{i(n)})$ is an ordered sample in which the order in which the units are drawn, i.e., $u_{i(1)}$ drawn at the first draw, $u_{i(2)}$ drawn at the second draw, and so on, is also taken into account. Here $P(u_{i(k)} \mid u_{i(1)}, u_{i(2)}, \ldots, u_{i(k-1)})$ is the probability of drawing $u_{i(k)}$ at the $k$th draw given that $u_{i(1)}, u_{i(2)}, \ldots, u_{i(k-1)}$ have already been drawn in the first $(k-1)$ draws.
Such probability is obtained as

$$P(u_{i(k)} \mid u_{i(1)}, u_{i(2)}, \ldots, u_{i(k-1)}) = \frac{1}{N-k+1}.$$

So

$$P(s_o) = \prod_{k=1}^{n} \frac{1}{N-k+1} = \frac{(N-n)!}{N!}.$$

If the order in which the units were drawn is irrelevant, then the $n!$ orderings of the same set of units all give the same unordered sample $s$, so

$$P(s) = n! \, \frac{(N-n)!}{N!} = \frac{1}{\binom{N}{n}}.$$
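These two formulas can be verified with exact rational arithmetic; the following Python sketch uses illustrative sizes N = 7 and n = 3:

```python
from fractions import Fraction
from math import comb, factorial, prod

N, n = 7, 3  # illustrative sizes

# Ordered sample: product over the n draws of 1/(N - k + 1).
p_ordered = prod(Fraction(1, N - k + 1) for k in range(1, n + 1))
assert p_ordered == Fraction(factorial(N - n), factorial(N))

# Unordered sample: n! orderings collapse to one sample of n units.
p_unordered = factorial(n) * p_ordered
assert p_unordered == Fraction(1, comb(N, n))
print(p_unordered)  # 1/35
```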
2. SRSWR
When $n$ units are selected with SRSWR, the total number of possible samples is $N^n$, and the probability of drawing a sample is $\dfrac{1}{N^n}$.

Alternatively, let $u_i$ be the $i$th unit selected in the sample. This unit can be selected in the sample either at the first draw, the second draw, ..., or the $n$th draw. At any stage, there are always $N$ units in the population in case of SRSWR, so the probability of selection of $u_i$ at any stage is $1/N$ for all $i = 1, 2, \ldots, n$.
Probability of drawing a unit

1. SRSWOR
Let $A_\ell$ denote the event that a particular unit $u_j$ is not selected at the $\ell$th draw.

2. SRSWR

$$P[\text{selection of } u_j \text{ at the } k\text{th draw}] = \frac{1}{N}.$$
Estimation of the population mean:

SRSWOR

Let $t_i = \sum_{j=1}^{n} y_j$ denote the total of the $i$th sample, $i = 1, 2, \ldots, \binom{N}{n}$. Then

$$E(\bar{y}) = \frac{1}{n} E\Big(\sum_{j=1}^{n} y_j\Big) = \frac{1}{n} E(t_i) = \frac{1}{n} \cdot \frac{1}{\binom{N}{n}} \sum_{i=1}^{\binom{N}{n}} t_i = \frac{1}{n} \cdot \frac{1}{\binom{N}{n}} \sum_{i=1}^{\binom{N}{n}} \sum_{j=1}^{n} y_j.$$

When $n$ units are sampled from $N$ units without replacement, each unit of the population can occur together with any $(n-1)$ of the remaining $(N-1)$ units, so each unit occurs in $\binom{N-1}{n-1}$ of the $\binom{N}{n}$ possible samples. So

$$\sum_{i=1}^{\binom{N}{n}} \sum_{j=1}^{n} y_j = \binom{N-1}{n-1} \sum_{i=1}^{N} y_i.$$

Now

$$E(\bar{y}) = \frac{(N-1)!}{(n-1)!(N-n)!} \cdot \frac{n!(N-n)!}{n\,N!} \sum_{i=1}^{N} y_i = \frac{1}{N} \sum_{i=1}^{N} y_i = \bar{Y}.$$
Thus $\bar{y}$ is an unbiased estimator of $\bar{Y}$. Alternatively, the following approach can also be adopted to show the unbiasedness property.
$$E(\bar{y}) = \frac{1}{n}\sum_{j=1}^{n} E(y_j) = \frac{1}{n}\sum_{j=1}^{n}\sum_{i=1}^{N} Y_i P_j(i) = \frac{1}{n}\sum_{j=1}^{n}\sum_{i=1}^{N} Y_i \cdot \frac{1}{N} = \frac{1}{n}\sum_{j=1}^{n} \bar{Y} = \bar{Y}.$$
SRSWR

$$E(\bar{y}) = \frac{1}{n} E\Big(\sum_{i=1}^{n} y_i\Big) = \frac{1}{n}\sum_{i=1}^{n} E(y_i) = \frac{1}{n}\sum_{i=1}^{n}\big(Y_1 P_1 + Y_2 P_2 + \cdots + Y_N P_N\big) = \frac{1}{n}\sum_{i=1}^{n} \bar{Y} = \bar{Y},$$

where $P_i = \dfrac{1}{N}$ for all $i = 1, 2, \ldots, N$ is the probability of selection of a unit. Thus $\bar{y}$ is an unbiased estimator of the population mean under SRSWR also.
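The unbiasedness under both schemes can be verified exactly by averaging the sample mean over every possible sample of a small, hypothetical population; a Python sketch using exact rational arithmetic:

```python
from fractions import Fraction
from itertools import combinations, product

Y = [3, 7, 11, 15, 24]        # hypothetical population values
N, n = len(Y), 2
Y_bar = Fraction(sum(Y), N)   # population mean

# SRSWOR: all C(N, n) unordered samples are equally likely.
wor_means = [Fraction(sum(s), n) for s in combinations(Y, n)]
assert sum(wor_means) / len(wor_means) == Y_bar

# SRSWR: all N**n ordered samples are equally likely.
wr_means = [Fraction(sum(s), n) for s in product(Y, repeat=n)]
assert sum(wr_means) / len(wr_means) == Y_bar
print(Y_bar)  # 12
```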
Variance of the estimate

Assume that each observation has variance $\sigma^2$. Then

$$V(\bar{y}) = E(\bar{y} - \bar{Y})^2 = E\Big[\frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{Y})\Big]^2 = \frac{1}{n^2}\sum_{i=1}^{n} E(y_i - \bar{Y})^2 + \frac{1}{n^2}\mathop{\sum\sum}_{i \ne j} E(y_i - \bar{Y})(y_j - \bar{Y})$$

$$= \frac{1}{n^2}\sum_{i=1}^{n}\sigma^2 + \frac{K}{n^2} = \frac{N-1}{Nn} S^2 + \frac{K}{n^2},$$

where $K = \mathop{\sum\sum}_{i \ne j} E(y_i - \bar{Y})(y_j - \bar{Y})$ and we have used $\sigma^2 = \frac{N-1}{N} S^2$. Now we find $K$ under SRSWOR and SRSWR.
SRSWOR

$$K = \mathop{\sum\sum}_{i \ne j} E(y_i - \bar{Y})(y_j - \bar{Y}).$$

Consider

$$E(y_i - \bar{Y})(y_j - \bar{Y}) = \frac{1}{N(N-1)} \mathop{\sum\sum}_{k \ne \ell} (y_k - \bar{Y})(y_\ell - \bar{Y}).$$

Since

$$\Big[\sum_{k=1}^{N} (y_k - \bar{Y})\Big]^2 = \sum_{k=1}^{N}(y_k - \bar{Y})^2 + \mathop{\sum\sum}_{k \ne \ell} (y_k - \bar{Y})(y_\ell - \bar{Y})$$

$$0 = (N-1) S^2 + \mathop{\sum\sum}_{k \ne \ell} (y_k - \bar{Y})(y_\ell - \bar{Y}),$$

we have

$$\frac{1}{N(N-1)} \mathop{\sum\sum}_{k \ne \ell} (y_k - \bar{Y})(y_\ell - \bar{Y}) = \frac{1}{N(N-1)}\big[-(N-1) S^2\big] = -\frac{S^2}{N}.$$

Thus $K = -n(n-1)\dfrac{S^2}{N}$ and, substituting this value of $K$, the variance of $\bar{y}$ under SRSWOR is

$$V(\bar{y}_{WOR}) = \frac{N-1}{Nn} S^2 - \frac{1}{n^2}\, n(n-1) \frac{S^2}{N} = \frac{N-n}{Nn} S^2.$$
SRSWR

$$K = \mathop{\sum\sum}_{i \ne j} E(y_i - \bar{Y})(y_j - \bar{Y}) = \mathop{\sum\sum}_{i \ne j} E(y_i - \bar{Y})\, E(y_j - \bar{Y}) = 0$$

because the $i$th and $j$th draws ($i \ne j$) are independent. Thus the variance of $\bar{y}$ under SRSWR is

$$V(\bar{y}_{WR}) = \frac{N-1}{Nn} S^2.$$
It is to be noted that if $N$ is infinite (large enough), then

$$V(\bar{y}) = \frac{S^2}{n}$$

in both the cases of SRSWOR and SRSWR. So the factor $\frac{N-n}{N}$ is responsible for changing the variance of $\bar{y}$ when the sample is drawn from a finite population in comparison to an infinite population. This is why $\frac{N-n}{N}$ is called a finite population correction (fpc). It may be noted that $\frac{N-n}{N} = 1 - \frac{n}{N}$, so the fpc is close to 1 if the ratio of sample size to population size, $\frac{n}{N}$, is very small or negligible. The term $\frac{n}{N}$ is called the sampling fraction. In practice, the fpc can be ignored whenever $\frac{n}{N} < 5\%$, and for many purposes even if it is as high as 10%. Ignoring the fpc will result in overestimation of the variance of $\bar{y}$.
Efficiency of $\bar{y}$ under SRSWOR over SRSWR

$$V(\bar{y}_{WOR}) = \frac{N-n}{Nn} S^2$$

$$V(\bar{y}_{WR}) = \frac{N-1}{Nn} S^2 = \frac{N-n}{Nn} S^2 + \frac{n-1}{Nn} S^2 = V(\bar{y}_{WOR}) + \text{a positive quantity.}$$

Thus $V(\bar{y}_{WOR}) \le V(\bar{y}_{WR})$, i.e., SRSWOR is more efficient than SRSWR.
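Both variance formulas, and the fact that SRSWOR never does worse, can be confirmed exactly by enumeration on a small, hypothetical population; a Python sketch:

```python
from fractions import Fraction
from itertools import combinations, product

Y = [2, 5, 9, 14]             # hypothetical population values
N, n = len(Y), 2
Y_bar = Fraction(sum(Y), N)
S2 = sum((y - Y_bar) ** 2 for y in Y) / (N - 1)

def var_of_mean(samples):
    """Exact variance of the sample mean over equally likely samples."""
    means = [Fraction(sum(s), n) for s in samples]
    return sum((m - Y_bar) ** 2 for m in means) / len(means)

v_wor = var_of_mean(list(combinations(Y, n)))
v_wr = var_of_mean(list(product(Y, repeat=n)))

assert v_wor == Fraction(N - n, N * n) * S2   # (N-n)/(Nn) S^2
assert v_wr == Fraction(N - 1, N * n) * S2    # (N-1)/(Nn) S^2
assert v_wor < v_wr
```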
Estimation of variance:

Consider

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2 = \frac{1}{n-1}\sum_{i=1}^{n}\big[(y_i - \bar{Y}) - (\bar{y} - \bar{Y})\big]^2 = \frac{1}{n-1}\Big[\sum_{i=1}^{n}(y_i - \bar{Y})^2 - n(\bar{y} - \bar{Y})^2\Big].$$

Hence

$$E(s^2) = \frac{1}{n-1}\Big[\sum_{i=1}^{n} E(y_i - \bar{Y})^2 - n E(\bar{y} - \bar{Y})^2\Big] = \frac{1}{n-1}\Big[\sum_{i=1}^{n} Var(y_i) - n\, Var(\bar{y})\Big] = \frac{1}{n-1}\big[n\sigma^2 - n\, Var(\bar{y})\big].$$
In case of SRSWOR,

$$V(\bar{y}_{WOR}) = \frac{N-n}{Nn} S^2$$

and so

$$E(s^2) = \frac{n}{n-1}\Big[\sigma^2 - \frac{N-n}{Nn} S^2\Big] = \frac{n}{n-1}\Big[\frac{N-1}{N} S^2 - \frac{N-n}{Nn} S^2\Big] = S^2.$$

In case of SRSWR,

$$V(\bar{y}_{WR}) = \frac{N-1}{Nn} S^2$$

and so

$$E(s^2) = \frac{n}{n-1}\Big[\sigma^2 - \frac{N-1}{Nn} S^2\Big] = \frac{n}{n-1}\Big[\frac{N-1}{N} S^2 - \frac{N-1}{Nn} S^2\Big] = \frac{N-1}{N} S^2 = \sigma^2.$$

Hence

$$E(s^2) = \begin{cases} S^2 & \text{in SRSWOR} \\ \sigma^2 & \text{in SRSWR.} \end{cases}$$
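This case distinction can be checked by full enumeration for a small, hypothetical population; a Python sketch:

```python
from fractions import Fraction
from itertools import combinations, product

Y = [1, 4, 6, 13]              # hypothetical population values
N, n = len(Y), 2
Y_bar = Fraction(sum(Y), N)
S2 = sum((y - Y_bar) ** 2 for y in Y) / (N - 1)
sigma2 = Fraction(N - 1, N) * S2

def s2(sample):
    """Sample variance with divisor (n - 1)."""
    m = Fraction(sum(sample), n)
    return sum((y - m) ** 2 for y in sample) / (n - 1)

wor = list(combinations(Y, n))
wr = list(product(Y, repeat=n))
assert sum(s2(s) for s in wor) / len(wor) == S2      # E(s^2) = S^2
assert sum(s2(s) for s in wr) / len(wr) == sigma2    # E(s^2) = sigma^2
```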
Standard errors

The standard error of $\bar{y}$ is defined as $\sqrt{Var(\bar{y})}$. In order to estimate the standard error, one simple option is to consider the square root of the estimate of the variance of the sample mean:

• under SRSWOR, a possible estimator is $\hat{\sigma}(\bar{y}) = \sqrt{\dfrac{N-n}{Nn}}\; s$;

• under SRSWR, a possible estimator is $\hat{\sigma}(\bar{y}) = \sqrt{\dfrac{N-1}{Nn}}\; s$.

It is to be noted that this estimator does not possess the same properties as $\widehat{Var}(\bar{y})$. The reason is that if $\hat{\theta}$ is an estimator of $\theta$, then $\sqrt{\hat{\theta}}$ is not necessarily an estimator of $\sqrt{\theta}$. In fact, $\hat{\sigma}(\bar{y})$ is a negatively biased estimator under SRSWOR.

Consider $s$ as an estimator of $S$. Let

$$s^2 = S^2 + \varepsilon \ \text{ with } \ E(\varepsilon) = 0, \quad E(\varepsilon^2) = Var(s^2).$$

Write

$$s = (S^2 + \varepsilon)^{1/2} = S\Big(1 + \frac{\varepsilon}{S^2}\Big)^{1/2} = S\Big(1 + \frac{\varepsilon}{2S^2} - \frac{\varepsilon^2}{8S^4} + \cdots\Big),$$

assuming $\varepsilon$ to be small compared to $S^2$; as $n$ becomes large, the probability of such an event approaches one. Neglecting the powers of $\varepsilon$ higher than two and taking expectation, we have
$$E(s) = S\Big[1 - \frac{Var(s^2)}{8S^4}\Big],$$

where

$$Var(s^2) = \frac{2S^4}{n-1}\Big[1 + \frac{n-1}{2n}(\beta_2 - 3)\Big] \ \text{ for large } N,$$

$$\mu_j = \frac{1}{N}\sum_{i=1}^{N}(Y_i - \bar{Y})^j, \qquad \beta_2 = \frac{\mu_4}{S^4}: \text{coefficient of kurtosis.}$$

Thus

$$E(s) = S\Big[1 - \frac{1}{4(n-1)} - \frac{\beta_2 - 3}{8n}\Big]$$

and

$$Var(s) = S^2 - \Big[S\Big(1 - \frac{Var(s^2)}{8S^4}\Big)\Big]^2 \approx \frac{Var(s^2)}{4S^2} = \frac{S^2}{2(n-1)}\Big[1 + \frac{n-1}{2n}(\beta_2 - 3)\Big].$$

Note that for a normal distribution, $\beta_2 = 3$ and we obtain

$$Var(s) = \frac{S^2}{2(n-1)}.$$

Both $Var(s)$ and $Var(s^2)$ are inflated due to nonnormality to the same extent, by the inflation factor

$$1 + \frac{n-1}{2n}(\beta_2 - 3),$$

and this does not depend on the coefficient of skewness. This is an important result to be kept in mind while determining the sample size, in which it is assumed that $S^2$ is known. If the inflation factor is ignored and the population is non-normal, then the reliability of $s^2$ may be misleading.
Alternative approach:

The results for the unbiasedness property and the variance of the sample mean can also be proved in an alternative way as follows:

(i) SRSWOR

With the $i$th unit of the population, we associate a random variable $a_i$ defined as

$$a_i = \begin{cases} 1 & \text{if the } i\text{th unit is included in the sample} \\ 0 & \text{otherwise,} \end{cases} \qquad i = 1, 2, \ldots, N.$$

Then,

$$E(a_i) = 1 \times P(i\text{th unit is included in the sample}) = \frac{n}{N}, \qquad i = 1, 2, \ldots, N,$$

$$E(a_i^2) = 1 \times P(i\text{th unit is included in the sample}) = \frac{n}{N}, \qquad i = 1, 2, \ldots, N,$$

$$E(a_i a_j) = 1 \times P(i\text{th and } j\text{th units are included in the sample}) = \frac{n(n-1)}{N(N-1)}, \qquad i \ne j = 1, 2, \ldots, N.$$

From these results, we can obtain

$$Var(a_i) = E(a_i^2) - \big(E(a_i)\big)^2 = \frac{n(N-n)}{N^2}, \qquad i = 1, 2, \ldots, N,$$

$$Cov(a_i, a_j) = E(a_i a_j) - E(a_i)E(a_j) = -\frac{n(N-n)}{N^2(N-1)}, \qquad i \ne j = 1, 2, \ldots, N.$$

We can rewrite the sample mean as

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{N} a_i y_i.$$

Then

$$E(\bar{y}) = \frac{1}{n}\sum_{i=1}^{N} E(a_i)\, y_i = \bar{Y}$$

and

$$Var(\bar{y}) = \frac{1}{n^2}\Big[\sum_{i=1}^{N} Var(a_i)\, y_i^2 + \mathop{\sum\sum}_{i \ne j} Cov(a_i, a_j)\, y_i y_j\Big].$$

Substituting the values of $Var(a_i)$ and $Cov(a_i, a_j)$ in the expression of $Var(\bar{y})$ and simplifying, we get

$$Var(\bar{y}) = \frac{N-n}{Nn} S^2.$$
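The stated moments of the indicator variables $a_i$ can be verified by enumerating all $\binom{N}{n}$ samples for small, illustrative sizes; a Python sketch:

```python
from fractions import Fraction
from itertools import combinations

N, n = 6, 3  # illustrative sizes
samples = list(combinations(range(N), n))
M = len(samples)  # C(N, n) equally likely samples

# a_i = 1 when unit i is in the sample; moments follow by counting.
E_a0 = Fraction(sum(1 for s in samples if 0 in s), M)
E_a0a1 = Fraction(sum(1 for s in samples if 0 in s and 1 in s), M)

assert E_a0 == Fraction(n, N)
assert E_a0a1 == Fraction(n * (n - 1), N * (N - 1))
assert E_a0 - E_a0 ** 2 == Fraction(n * (N - n), N ** 2)               # Var(a_i)
assert E_a0a1 - E_a0 ** 2 == Fraction(-n * (N - n), N ** 2 * (N - 1))  # Cov(a_i, a_j)
```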
To show that $E(s^2) = S^2$, consider

$$s^2 = \frac{1}{n-1}\Big[\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\Big] = \frac{1}{n-1}\Big[\sum_{i=1}^{N} a_i y_i^2 - n\bar{y}^2\Big].$$

Hence, taking expectation, we get

$$E(s^2) = \frac{1}{n-1}\Big[\sum_{i=1}^{N} E(a_i)\, y_i^2 - n\big\{Var(\bar{y}) + \bar{Y}^2\big\}\Big].$$

Substituting the values of $E(a_i)$ and $Var(\bar{y})$ in this expression and simplifying, we get $E(s^2) = S^2$.
(ii) SRSWR

Let a random variable $a_i$ associated with the $i$th unit of the population denote the number of times the $i$th unit occurs in the sample, $i = 1, 2, \ldots, N$. So $a_i$ assumes the values $0, 1, 2, \ldots, n$. The joint distribution of $(a_1, a_2, \ldots, a_N)$ is multinomial:

$$P(a_1, a_2, \ldots, a_N) = \frac{n!}{\prod_{i=1}^{N} a_i!} \cdot \frac{1}{N^n},$$

where $\sum_{i=1}^{N} a_i = n$. For this multinomial distribution, we have

$$E(a_i) = \frac{n}{N},$$

$$Var(a_i) = \frac{n(N-1)}{N^2}, \qquad i = 1, 2, \ldots, N,$$

$$Cov(a_i, a_j) = -\frac{n}{N^2}, \qquad i \ne j = 1, 2, \ldots, N.$$

We rewrite the sample mean as

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{N} a_i y_i.$$

Hence, taking the expectation of $\bar{y}$ and substituting the value of $E(a_i) = n/N$, we obtain $E(\bar{y}) = \bar{Y}$.
Further,

$$Var(\bar{y}) = \frac{1}{n^2}\Big[\sum_{i=1}^{N} Var(a_i)\, y_i^2 + \mathop{\sum\sum}_{i \ne j} Cov(a_i, a_j)\, y_i y_j\Big].$$

Substituting the values of $Var(a_i) = n(N-1)/N^2$ and $Cov(a_i, a_j) = -n/N^2$ and simplifying, we get

$$Var(\bar{y}) = \frac{N-1}{Nn} S^2.$$
To prove that $E(s^2) = \frac{N-1}{N} S^2 = \sigma^2$ in SRSWR, consider

$$(n-1)\, s^2 = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2 = \sum_{i=1}^{N} a_i y_i^2 - n\bar{y}^2,$$

so

$$(n-1)\, E(s^2) = \sum_{i=1}^{N} E(a_i)\, y_i^2 - n\big\{Var(\bar{y}) + \bar{Y}^2\big\} = \frac{n}{N}\sum_{i=1}^{N} y_i^2 - n \cdot \frac{N-1}{nN} S^2 - n\bar{Y}^2 = \frac{(n-1)(N-1)}{N} S^2$$

and hence

$$E(s^2) = \frac{N-1}{N} S^2 = \sigma^2.$$
Estimation of population total:

The population total $Y_T = N\bar{Y}$ can be estimated by

$$\hat{Y}_T = N\hat{\bar{Y}} = N\bar{y}.$$

Obviously,

$$E(\hat{Y}_T) = N E(\bar{y}) = N\bar{Y} = Y_T$$

and

$$Var(\hat{Y}_T) = N^2\, Var(\bar{y}) = \begin{cases} N^2 \dfrac{N-n}{Nn} S^2 = \dfrac{N(N-n)}{n} S^2 & \text{for SRSWOR} \\[3mm] N^2 \dfrac{N-1}{Nn} S^2 = \dfrac{N(N-1)}{n} S^2 & \text{for SRSWR.} \end{cases}$$

The estimates of the variance are

$$\widehat{Var}(\hat{Y}_T) = \begin{cases} \dfrac{N(N-n)}{n}\, s^2 & \text{for SRSWOR} \\[3mm] \dfrac{N^2}{n}\, s^2 & \text{for SRSWR.} \end{cases}$$
Confidence limits for the population mean:

It is assumed that the population is normally distributed $N(\mu, \sigma^2)$ with mean $\mu$ and variance $\sigma^2$. Then $\dfrac{\bar{y} - \bar{Y}}{\sqrt{Var(\bar{y})}}$ follows $N(0, 1)$ when $\sigma^2$ is known. If $\sigma^2$ is unknown and is estimated from the sample, then $\dfrac{\bar{y} - \bar{Y}}{\sqrt{\widehat{Var}(\bar{y})}}$ follows a $t$-distribution with $(n-1)$ degrees of freedom. When $\sigma^2$ is known, the $100(1-\alpha)\%$ confidence interval is given by

$$P\Big[-Z_{\alpha/2} \le \frac{\bar{y} - \bar{Y}}{\sqrt{Var(\bar{y})}} \le Z_{\alpha/2}\Big] = 1 - \alpha$$

or

$$P\Big[\bar{y} - Z_{\alpha/2}\sqrt{Var(\bar{y})} \ \le\ \bar{Y} \ \le\ \bar{y} + Z_{\alpha/2}\sqrt{Var(\bar{y})}\Big] = 1 - \alpha,$$

and the confidence limits are

$$\Big[\bar{y} - Z_{\alpha/2}\sqrt{Var(\bar{y})},\ \ \bar{y} + Z_{\alpha/2}\sqrt{Var(\bar{y})}\Big],$$

where $Z_{\alpha/2}$ denotes the upper $\frac{\alpha}{2}\%$ point of the $N(0, 1)$ distribution. Similarly, when $\sigma^2$ is unknown, the $100(1-\alpha)\%$ confidence interval is

$$P\Big[-t_{\alpha/2} \le \frac{\bar{y} - \bar{Y}}{\sqrt{\widehat{Var}(\bar{y})}} \le t_{\alpha/2}\Big] = 1 - \alpha$$

or

$$P\Big[\bar{y} - t_{\alpha/2}\sqrt{\widehat{Var}(\bar{y})} \ \le\ \bar{Y} \ \le\ \bar{y} + t_{\alpha/2}\sqrt{\widehat{Var}(\bar{y})}\Big] = 1 - \alpha,$$

and the confidence limits are

$$\Big[\bar{y} - t_{\alpha/2}\sqrt{\widehat{Var}(\bar{y})},\ \ \bar{y} + t_{\alpha/2}\sqrt{\widehat{Var}(\bar{y})}\Big],$$

where $t_{\alpha/2}$ denotes the upper $\frac{\alpha}{2}\%$ point of the $t$-distribution with $(n-1)$ degrees of freedom.
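As a numerical illustration (all figures hypothetical), the following Python sketch computes confidence limits under SRSWOR with the fpc included in the estimated variance, using the normal quantile:

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical SRSWOR summary: N = 500, n = 50, y_bar = 42, s^2 = 36.
N, n = 500, 50
y_bar, s2 = 42.0, 36.0
alpha = 0.05

z = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 point of N(0, 1)
var_hat = (N - n) / (N * n) * s2          # estimated Var(y_bar) with fpc
half = z * sqrt(var_hat)
lower, upper = y_bar - half, y_bar + half
print(lower, upper)
```

Strictly, with $s^2$ in place of $\sigma^2$ the $t$ quantile with $n-1$ degrees of freedom is the correct multiplier; for $n = 50$ the normal quantile is a close approximation.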
Determination of sample size:

An important constraint in determining the sample size is that information about the population standard deviation $S$ should be known for these criteria. The reason and the need for this will become clear when we derive the sample size in the next section. A question arises: how can information about $S$ be available beforehand? One possible solution is to conduct a pilot survey, collect a preliminary sample of small size, estimate $S$, and use it as the known value of $S$. Alternatively, such information can also be obtained from past data, past experience, long association of the experimenter with the experiment, prior information, etc.

Now we find the sample size under different criteria assuming that the samples have been drawn using SRSWOR. The case of SRSWR can be derived similarly.
1. Prespecified variance:

The sample size is to be determined such that the variance of $\bar{y}$ does not exceed a given value, say $V$. In this case, find $n$ such that

$$Var(\bar{y}) \le V$$

$$\text{or} \quad \frac{N-n}{Nn} S^2 \le V$$

$$\text{or} \quad \frac{1}{n} - \frac{1}{N} \le \frac{V}{S^2}$$

$$\text{or} \quad \frac{1}{n} - \frac{1}{N} \le \frac{1}{n_e}$$

$$\text{or} \quad n \ge \frac{n_e}{1 + \dfrac{n_e}{N}},$$

where $n_e = \dfrac{S^2}{V}$.

It may be noted here that $n_e$ can be known only when $S^2$ is known. This is what compels us to assume that $S$ should be known. The same reason will also be seen in the other cases. The smallest sample size needed in this case is

$$n_{smallest} = \frac{n_e}{1 + \dfrac{n_e}{N}}.$$

If $N$ is large, then the required $n$ is $n \ge n_e$ and $n_{smallest} = n_e$.
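A small Python helper implementing this criterion (the function name and the numbers are illustrative):

```python
from math import ceil

def n_for_variance(S2, V, N=None):
    """Smallest n with Var(y_bar) <= V under SRSWOR; N=None ignores the fpc."""
    ne = S2 / V  # n_e = S^2 / V
    if N is None:
        return ceil(ne)
    return ceil(ne / (1 + ne / N))

print(n_for_variance(100, 4))          # n_e = 25; large-N answer
print(n_for_variance(100, 4, N=100))   # 25 / (1 + 25/100) = 20
```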
2. Prespecified estimation error:

It may be required that the sample mean $\bar{y}$ should not differ from the population mean $\bar{Y}$ by more than a specified amount $e$ with given probability, i.e.,

$$P\big[\,|\bar{y} - \bar{Y}| \le e\,\big] = 1 - \alpha.$$

Since $\bar{y}$ follows $N\big(\bar{Y}, \frac{N-n}{Nn} S^2\big)$, assuming the normal distribution for the population, we can write

$$P\Big[\frac{|\bar{y} - \bar{Y}|}{\sqrt{Var(\bar{y})}} \le \frac{e}{\sqrt{Var(\bar{y})}}\Big] = 1 - \alpha,$$

which implies

$$Z_{\alpha/2}^2\, Var(\bar{y}) = e^2$$

$$\text{or} \quad Z_{\alpha/2}^2\, \frac{N-n}{Nn} S^2 = e^2$$

$$\text{or} \quad n = \frac{\left(\dfrac{Z_{\alpha/2}\, S}{e}\right)^2}{1 + \dfrac{1}{N}\left(\dfrac{Z_{\alpha/2}\, S}{e}\right)^2},$$

which is the required sample size. If $N$ is large, then

$$n = \left(\frac{Z_{\alpha/2}\, S}{e}\right)^2.$$
3. Prespecified width of the confidence interval:

If it is required that the width of the confidence interval of $\bar{y}$ should not exceed a prespecified value $W$, then

$$2\, Z_{\alpha/2} \sqrt{Var(\bar{y})} \le W$$

$$\text{or} \quad 2\, Z_{\alpha/2} \sqrt{\frac{N-n}{Nn}}\; S \le W$$

$$\text{or} \quad 4\, Z_{\alpha/2}^2 \Big(\frac{1}{n} - \frac{1}{N}\Big) S^2 \le W^2$$

$$\text{or} \quad \frac{1}{n} \le \frac{1}{N} + \frac{W^2}{4\, Z_{\alpha/2}^2 S^2}$$

$$\text{or} \quad n \ge \frac{\dfrac{4\, Z_{\alpha/2}^2 S^2}{W^2}}{1 + \dfrac{4\, Z_{\alpha/2}^2 S^2}{N W^2}}.$$

The minimum sample size required is

$$n_{smallest} = \frac{\dfrac{4\, Z_{\alpha/2}^2 S^2}{W^2}}{1 + \dfrac{4\, Z_{\alpha/2}^2 S^2}{N W^2}}.$$

If $N$ is large, then

$$n \ge \frac{4\, Z_{\alpha/2}^2 S^2}{W^2}$$

and the minimum sample size needed is

$$n_{smallest} = \frac{4\, Z_{\alpha/2}^2 S^2}{W^2}.$$
4. Prespecified coefficient of variation:

If it is desired that the coefficient of variation of $\bar{y}$ should not exceed a given or prespecified value, say $C_0$, then the required sample size $n$ is to be determined such that

$$CV(\bar{y}) \le C_0$$

$$\text{or} \quad \frac{\sqrt{Var(\bar{y})}}{\bar{Y}} \le C_0$$

$$\text{or} \quad \frac{\dfrac{N-n}{Nn} S^2}{\bar{Y}^2} \le C_0^2$$

$$\text{or} \quad \frac{1}{n} - \frac{1}{N} \le \frac{C_0^2}{C^2}$$

$$\text{or} \quad n \ge \frac{\dfrac{C^2}{C_0^2}}{1 + \dfrac{C^2}{N C_0^2}}$$

is the required sample size, where $C = \dfrac{S}{\bar{Y}}$ is the population coefficient of variation. The smallest sample size needed in this case is

$$n_{smallest} = \frac{\dfrac{C^2}{C_0^2}}{1 + \dfrac{C^2}{N C_0^2}}.$$

If $N$ is large, then

$$n \ge \frac{C^2}{C_0^2} \quad \text{and} \quad n_{smallest} = \frac{C^2}{C_0^2}.$$
5. Prespecified relative error:

The relative estimation error of $\bar{y}$ is defined as $\dfrac{\bar{y} - \bar{Y}}{\bar{Y}}$. If it is required that such relative estimation error should not exceed a prespecified value $R$ with probability $(1-\alpha)$, then such a requirement can be satisfied by expressing it as

$$P\Big[\frac{|\bar{y} - \bar{Y}|}{\sqrt{Var(\bar{y})}} \le \frac{R\bar{Y}}{\sqrt{Var(\bar{y})}}\Big] = 1 - \alpha.$$

Assuming the population to be normally distributed, $\bar{y}$ follows $N\big(\bar{Y}, \frac{N-n}{Nn} S^2\big)$. So it can be written that

$$\frac{R\bar{Y}}{\sqrt{Var(\bar{y})}} = Z_{\alpha/2}$$

$$\text{or} \quad Z_{\alpha/2}^2\, \frac{N-n}{Nn} S^2 = R^2 \bar{Y}^2$$

$$\text{or} \quad \frac{1}{n} - \frac{1}{N} = \frac{R^2}{C^2 Z_{\alpha/2}^2}$$

$$\text{or} \quad n = \frac{\left(\dfrac{Z_{\alpha/2}\, C}{R}\right)^2}{1 + \dfrac{1}{N}\left(\dfrac{Z_{\alpha/2}\, C}{R}\right)^2},$$

where $C = \dfrac{S}{\bar{Y}}$ is the population coefficient of variation and should be known. If $N$ is large, then

$$n = \left(\frac{Z_{\alpha/2}\, C}{R}\right)^2.$$
6. Prespecified cost:

Let an amount of money $C$ be designated for the sample survey to collect $n$ observations, $C_0$ be the overhead cost and $C_1$ be the cost of collection of one unit in the sample. Then the total cost $C$ can be expressed as

$$C = C_0 + n C_1$$

$$\text{or} \quad n = \frac{C - C_0}{C_1}$$

is the required sample size.
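The remaining criteria can be collected into small Python helpers; the function names and the numeric checks are illustrative, not part of the original notes:

```python
from math import ceil
from statistics import NormalDist

def adjusted(n0, N=None):
    """Finite-population adjustment n0 / (1 + n0/N); N=None ignores the fpc."""
    return ceil(n0 if N is None else n0 / (1 + n0 / N))

def n_for_error(S, e, N=None, alpha=0.05):
    """|y_bar - Y_bar| <= e with probability 1 - alpha."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return adjusted((z * S / e) ** 2, N)

def n_for_width(S, W, N=None, alpha=0.05):
    """Confidence-interval width at most W."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return adjusted(4 * z ** 2 * S ** 2 / W ** 2, N)

def n_for_cv(C, C0, N=None):
    """CV(y_bar) at most C0, where C = S / Y_bar."""
    return adjusted(C ** 2 / C0 ** 2, N)

def n_for_relative(C, R, N=None, alpha=0.05):
    """Relative error at most R with probability 1 - alpha."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return adjusted((z * C / R) ** 2, N)

def n_for_cost(total, overhead, per_unit):
    """Units affordable under the budget: (C - C0) / C1."""
    return int((total - overhead) // per_unit)

# A width of W corresponds to an error bound of e = W/2.
assert n_for_width(10, 4, N=1000) == n_for_error(10, 2, N=1000)
assert n_for_cv(2, 0.25) == 64
assert n_for_cost(10_000, 1_000, 45) == 200
```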
Chapter 3
Sampling For Proportions and Percentages
In many situations, the characteristic under study, on which the observations are collected, is qualitative in nature. For example, the responses of customers in many marketing surveys are based on replies like ‘yes’ or ‘no’, ‘agree’ or ‘disagree’, etc. Sometimes the respondents are asked to arrange several options in an order like first choice, second choice, etc. Sometimes the objective of the survey is to estimate the proportion or the percentage of brown-eyed persons, unemployed persons, graduates or persons favoring a proposal, etc. In such situations, the first question is how to do the sampling, and the second is how to estimate the population parameters like the population mean, population variance, etc.
Sampling procedure:
The same sampling procedures that are used for drawing a sample in case of quantitative
characteristics can also be used for drawing a sample for a qualitative characteristic. So, the sampling procedures remain the same irrespective of the nature of the characteristic under study -
either qualitative or quantitative. For example, the SRSWOR and SRSWR procedures for
drawing the samples remain the same for qualitative and quantitative characteristics. Similarly,
other sampling schemes like stratified sampling, two stage sampling, etc. also remain the same.
Consider a qualitative characteristic based on which the population can be divided into two
mutually exclusive classes, say C and C*. For example, if C is the part of population of
persons saying ‘yes’ or ‘agreeing’ with the proposal then C* is the part of population of persons
saying ‘no’ or ‘disagreeing’ with the proposal. Let $A$ be the number of units in $C$ and $(N - A)$ be the number of units in $C^*$ in a population of size $N$. Then the proportion of units in $C$ is

$$P = \frac{A}{N}$$

and the proportion of units in $C^*$ is

$$Q = \frac{N-A}{N} = 1 - P.$$
An indicator variable $Y$ can be associated with the characteristic under study: for $i = 1, 2, \ldots, N$,

$$Y_i = \begin{cases} 1 & \text{if the } i\text{th unit belongs to } C \\ 0 & \text{if the } i\text{th unit belongs to } C^*. \end{cases}$$

Then

$$\bar{Y} = \frac{\sum_{i=1}^{N} Y_i}{N} = \frac{A}{N} = P.$$
Suppose a sample of size n is drawn from a population of size N by simple random sampling .
Let a be the number of units in the sample which fall into class C and (n − a ) units fall in class
C*, then the sample proportion of units in C is
$$p = \frac{a}{n},$$

which can be written as

$$p = \frac{a}{n} = \frac{\sum_{i=1}^{n} y_i}{n} = \bar{y}.$$
Since $\sum_{i=1}^{N} Y_i^2 = A = NP$, we can write $S^2$ and $s^2$ in terms of $P$ and $Q$ as follows:

$$S^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})^2 = \frac{1}{N-1}\Big(\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\Big) = \frac{1}{N-1}(NP - NP^2) = \frac{N}{N-1}\, PQ.$$

Similarly, $\sum_{i=1}^{n} y_i^2 = a = np$ and

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2 = \frac{1}{n-1}\Big(\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\Big) = \frac{1}{n-1}(np - np^2) = \frac{n}{n-1}\, pq.$$
Note that the quantities y , Y , s 2 and S 2 have been expressed as functions of sample and
population proportions. Since the sample has been drawn by simple random sampling and the sample proportion is the same as the sample mean, the properties of the sample proportion under SRSWOR and SRSWR can be derived directly from the properties of the sample mean.
1. SRSWOR

Since the sample mean $\bar{y}$ is an unbiased estimator of the population mean $\bar{Y}$, i.e., $E(\bar{y}) = \bar{Y}$ in case of SRSWOR, so

$$E(p) = E(\bar{y}) = \bar{Y} = P$$

and $p$ is an unbiased estimator of $P$.
Var(p) = Var(\bar{y}) = \frac{N-n}{Nn}S^2
       = \frac{N-n}{Nn}\cdot\frac{N}{N-1}PQ
       = \frac{N-n}{N-1}\cdot\frac{PQ}{n}.
Similarly, using the estimate of Var(\bar{y}), the estimate of the variance can be derived as
\widehat{Var}(p) = \widehat{Var}(\bar{y}) = \frac{N-n}{Nn}s^2
                = \frac{N-n}{Nn}\cdot\frac{n}{n-1}pq
                = \frac{N-n}{N(n-1)}pq.
(ii) SRSWR
Since the sample mean \bar{y} is an unbiased estimator of the population mean \bar{Y} in case of SRSWR, the sample proportion satisfies
E(p) = E(\bar{y}) = \bar{Y} = P,
i.e., p is an unbiased estimator of P.
Using the expression for the variance of \bar{y} and its estimate in case of SRSWR, the variance of p and its estimate can be derived as follows:
Var(p) = Var(\bar{y}) = \frac{N-1}{Nn}S^2
       = \frac{N-1}{Nn}\cdot\frac{N}{N-1}PQ
       = \frac{PQ}{n},
\widehat{Var}(p) = \frac{n}{n-1}\cdot\frac{pq}{n} = \frac{pq}{n-1}.
The confidence interval of P based on the normal approximation is
\left(p - Z_{\alpha/2}\sqrt{\widehat{Var}(p)},\;\; p + Z_{\alpha/2}\sqrt{\widehat{Var}(p)}\right).
It may be noted that in this case a discrete random variable is being approximated by a continuous random variable, so a continuity correction 1/(2n) can be introduced in the confidence limits, and the limits become
\left(p - Z_{\alpha/2}\sqrt{\widehat{Var}(p)} - \frac{1}{2n},\;\; p + Z_{\alpha/2}\sqrt{\widehat{Var}(p)} + \frac{1}{2n}\right).
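As a small numerical sketch of these results (all numbers made up), one can enumerate every possible SRSWOR sample from a tiny 0/1 population and check that E(p) = P and Var(p) = ((N-n)/(N-1))·PQ/n hold exactly:

```python
from itertools import combinations

# Tiny 0/1 population: N = 6 units, A = 2 in class C, so P = 1/3 (made-up numbers).
pop = [1, 1, 0, 0, 0, 0]
N, n = len(pop), 3
P = sum(pop) / N
Q = 1 - P

# All C(6,3) = 20 SRSWOR samples are equally likely.
props = [sum(s) / n for s in combinations(pop, n)]
E_p = sum(props) / len(props)
Var_p = sum((p - E_p) ** 2 for p in props) / len(props)
theory = (N - n) / (N - 1) * P * Q / n

print(E_p, Var_p, theory)   # E(p) equals P; Var(p) matches the formula
```

The exhaustive enumeration replaces expectation over repeated sampling, so the agreement is exact rather than approximate.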
Consider a situation in which the sampling units in a population are divided into two mutually
exclusive classes. Let P and Q be the proportions of sampling units in the population belonging
to classes ‘1’ and ‘2’ respectively. Then NP and NQ are the total number of sampling units in
the population belonging to class ‘1’ and ‘2’, respectively and so NP + NQ = N. The
probability that, in a sample of n units selected out of the N units by SRS, n_1 units belong to class '1' and n_2 units belong to class '2' is governed by the hypergeometric distribution:
P(n_1) = \frac{\binom{NP}{n_1}\binom{NQ}{n_2}}{\binom{N}{n}}.
As N grows large, the hypergeometric distribution tends to the Binomial distribution, and P(n_1) is approximated by
P(n_1) = \binom{n}{n_1} p^{n_1}(1-p)^{n_2}.
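A quick numerical check of this approximation (with made-up values of N, n and P) compares the exact hypergeometric probabilities against the binomial ones:

```python
from math import comb

# Hypergeometric vs binomial for the class count n1 (made-up numbers, N >> n).
N, n, P = 1000, 10, 0.3
A = int(N * P)          # NP units in class '1'

def hyper(n1):
    # Exact SRSWOR probability of n1 units from class '1' in the sample.
    return comb(A, n1) * comb(N - A, n - n1) / comb(N, n)

def binom(n1):
    # Binomial approximation for large N.
    return comb(n, n1) * P ** n1 * (1 - P) ** (n - n1)

for n1 in (0, 3, 7):
    print(n1, round(hyper(n1), 4), round(binom(n1), 4))   # the two are close
```

For N this much larger than n, the two distributions agree to roughly two decimal places.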
Inverse sampling
In general, the SRS methodology for a qualitative characteristic assumes that the attribute under study is not a rare attribute. If the attribute is rare, then estimating the population proportion P by the sample proportion a/n is not suitable, since a rare attribute may not appear in the sample at all. Some such situations are, e.g., estimation of the frequency of a rare type of gene, the proportion of some rare type of cancer cells in a biopsy, the proportion of a rare type of blood cell, etc. In such cases, the methodology of inverse sampling can be used.
Let m denote a predetermined number of units possessing the characteristic. Sampling is continued until m such units are obtained. Therefore, the sample size n required to attain m becomes a random variable.
Estimate of population proportion
Consider the expectation of \frac{m-1}{n-1}:
E\left(\frac{m-1}{n-1}\right) = \sum_{n=m}^{m+NQ}\frac{m-1}{n-1}\,P(n)
= \sum_{n=m}^{m+NQ}\frac{m-1}{n-1}\cdot\frac{\binom{NP}{m-1}\binom{NQ}{n-m}}{\binom{N}{n-1}}\cdot\frac{NP-m+1}{N-n+1}
= P\sum_{n=m}^{m+NQ}\frac{\binom{NP-1}{m-2}\binom{NQ}{n-m}}{\binom{N-1}{n-2}}\cdot\frac{NP-m+1}{N-n+1},
which is obtained by replacing NP by (NP - 1), m by (m - 1) and n by (n - 1) in the earlier step. The remaining sum is the total probability of the waiting-time distribution with these replaced parameters and hence equals one. Thus
E\left(\frac{m-1}{n-1}\right) = P.
So \hat{P} = \frac{m-1}{n-1} is an unbiased estimator of P.
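A small simulation sketch of inverse sampling (all numbers made up) illustrates the unbiasedness of (m-1)/(n-1):

```python
import random

random.seed(0)

# Inverse sampling: draw WOR until m units with the rare attribute are seen.
N, A, m = 200, 20, 4        # population size, attribute count, stopping number
P = A / N                   # true proportion = 0.1

def inverse_sample_size():
    units = [1] * A + [0] * (N - A)
    random.shuffle(units)           # random order = drawing WOR one by one
    hits = n = 0
    for u in units:
        n += 1
        hits += u
        if hits == m:
            return n                # realised (random) sample size

estimates = [(m - 1) / (inverse_sample_size() - 1) for _ in range(20000)]
p_hat = sum(estimates) / len(estimates)
print(p_hat)   # close to P = 0.1, as (m-1)/(n-1) is unbiased for P
```

Averaging many replications approximates the expectation, so p_hat settles near P.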
Estimate of variance of P̂
Now we derive an estimate of the variance of \hat{P}. By definition,
Var(\hat{P}) = E(\hat{P}^2) - [E(\hat{P})]^2 = E(\hat{P}^2) - P^2.
Thus
\widehat{Var}(\hat{P}) = \hat{P}^2 - (\text{estimate of } P^2).
In order to obtain an estimate of P^2, consider the expectation of \frac{(m-1)(m-2)}{(n-1)(n-2)}, i.e.,
E\left[\frac{(m-1)(m-2)}{(n-1)(n-2)}\right] = \sum_{n\ge m}\frac{(m-1)(m-2)}{(n-1)(n-2)}\,P(n)
= \frac{P(NP-1)}{N-1}\sum_{n\ge m}\frac{\binom{NP-2}{m-3}\binom{NQ}{n-m}}{\binom{N-2}{n-3}}\cdot\frac{NP-m+1}{N-n+1},
where the term inside the sum is obtained by replacing NP by (NP - 2), N by (N - 2) and m by (m - 2) in the probability function of the hypergeometric distribution; the sum equals one as before. This simplifies to
E\left[\frac{(m-1)(m-2)}{(n-1)(n-2)}\right] = \frac{P(NP-1)}{N-1} = \frac{NP^2}{N-1} - \frac{P}{N-1}.
Thus an unbiased estimate of P^2 is
\widehat{P^2} = \frac{N-1}{N}\cdot\frac{(m-1)(m-2)}{(n-1)(n-2)} + \frac{\hat{P}}{N}
             = \frac{N-1}{N}\cdot\frac{(m-1)(m-2)}{(n-1)(n-2)} + \frac{1}{N}\cdot\frac{m-1}{n-1}.
Therefore
\widehat{Var}(\hat{P}) = \hat{P}^2 - \widehat{P^2}
= \left(\frac{m-1}{n-1}\right)^2 - \frac{N-1}{N}\cdot\frac{(m-1)(m-2)}{(n-1)(n-2)} - \frac{1}{N}\cdot\frac{m-1}{n-1}
= \frac{m-1}{n-1}\left[\frac{m-1}{n-1} - \frac{(N-1)(m-2)}{N(n-2)} - \frac{1}{N}\right].
For large N, the hypergeometric distribution tends to the negative Binomial distribution with probability function \binom{n-1}{m-1}P^m Q^{n-m}. So
\hat{P} = \frac{m-1}{n-1}
and
\widehat{Var}(\hat{P}) = \frac{(m-1)(n-m)}{(n-1)^2(n-2)} = \frac{\hat{P}(1-\hat{P})}{n-2}.
Estimation of proportion for more than two classes
We have assumed up to now that there are only two classes in which the population can be
divided based on a qualitative characteristic. There can be situations when the population is to
be divided into more than two classes. For example, the taste of a coffee can be divided into four categories: very strong, strong, mild and very mild. Similarly, in another example, the damage to a crop due to a storm can be classified into categories like heavily damaged, damaged, minor damage and no damage.
These types of situations can be represented by dividing the population of size N into, say, k mutually exclusive classes C_1, C_2, ..., C_k. Corresponding to these classes, let
P_i = \frac{C_i}{N}, \; i = 1, 2, ..., k,
be the proportions of units in the classes C_1, C_2, ..., C_k respectively.
Let a sample of size n be drawn such that c_1, c_2, ..., c_k units come from the classes C_1, C_2, ..., C_k respectively. Then
P(c_1, c_2, ..., c_k) = \frac{\binom{C_1}{c_1}\binom{C_2}{c_2}\cdots\binom{C_k}{c_k}}{\binom{N}{n}}.
The population proportions P_i can be estimated by
p_i = \frac{c_i}{n}, \; i = 1, 2, ..., k.
It can be easily shown that
E(p_i) = P_i, \; i = 1, 2, ..., k,
Var(p_i) = \frac{N-n}{N-1}\cdot\frac{P_i Q_i}{n},
\widehat{Var}(p_i) = \frac{N-n}{N}\cdot\frac{p_i q_i}{n-1},
and
\widehat{Var}(\hat{C}_i) = N^2\,\widehat{Var}(p_i).
The confidence intervals can be obtained based on single pi as in the case of two classes.
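The estimates above are simple to compute; the following sketch (all survey numbers made up) evaluates p_i, its estimated variance, and the estimated class total for each class:

```python
# Hypothetical survey: N = 500 units in k = 3 mutually exclusive classes,
# SRSWOR sample of n = 50 (counts are made up for illustration).
N, n = 500, 50
counts = {"agree": 21, "neutral": 9, "disagree": 20}   # sample counts c_i, sum to n

results = {}
for cls, c in counts.items():
    p = c / n                                        # p_i = c_i / n
    var_hat = (N - n) / N * p * (1 - p) / (n - 1)    # estimated Var(p_i)
    results[cls] = (p, var_hat, N * p)               # p_i, Var-hat, class-total estimate

print(results["agree"])
```

Multiplying the variance estimate by N squared would give the estimated variance of the class-total estimate, as in the last formula above.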
Chapter 4
Stratified Sampling
An important objective in any estimation problem is to obtain an estimator of a population parameter which can take care of all the salient features of the population. If the population is homogeneous with
respect to the characteristic under study, then the method of simple random sampling will yield a
homogeneous sample and sample mean will serve as a good estimator of population mean. Thus if the
population is homogeneous with respect to the characteristic under study, then the sample drawn
through simple random sampling is expected to provide a representative sample. Moreover, the
variance of sample mean not only depends on the sample size and sampling fraction but also on the
population variance. One way to increase the precision of an estimator is to use a sampling scheme which reduces the heterogeneity in the population. If the population is heterogeneous with respect to the characteristic under study, then one such sampling procedure is stratified sampling.
Example: In order to find the average height of the students in a school having classes 1 to 12, note that the heights vary a lot: the students in class 1 are of age around 6 years while the students in class 10 are of age around 16 years. So one can divide all the students into different subpopulations or strata, for example by grouping classes of students of similar ages, and then sample from each stratum.
Notations:
We use the following symbols and notations:
N : population size
k : number of strata
N_i : number of sampling units in the i-th stratum
Strata are constructed such that they are nonoverlapping and homogeneous with respect to the characteristic under study, with
\sum_{i=1}^{k} N_i = N.
Draw a sample of size n_i from the i-th (i = 1, 2, ..., k) stratum using SRS (preferably WOR), independently from each stratum.
All the sampling units drawn from the strata together constitute a stratified sample of size
n = \sum_{i=1}^{k} n_i.
In cluster sampling, in contrast, the clusters are constructed such that they are heterogeneous within and homogeneous among themselves.
[Note: We consider cluster sampling later.]
Issue in estimation in stratified sampling
Note that there are k independent samples drawn through SRS of sizes n_1, n_2, ..., n_k. So one can have k estimators of the parameter based on the samples of sizes n_1, n_2, ..., n_k. The ultimate goal is not to have k different estimators of the parameter but a single estimator. In this case, the issue is how to combine the different sample information into one estimator which is good enough to provide information about the parameter. Define
\bar{y}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij} : sample mean from the i-th stratum,
\bar{Y}_i : population mean of the i-th stratum, and
\bar{Y} = \frac{1}{N}\sum_{i=1}^{k} N_i\bar{Y}_i = \sum_{i=1}^{k} w_i\bar{Y}_i : population mean, where w_i = \frac{N_i}{N}.
Note that the population mean is defined as the weighted arithmetic mean of the stratum means, where the weights are given in terms of the strata sizes.
Based on the expression \bar{Y} = \frac{1}{N}\sum_{i=1}^{k} N_i\bar{Y}_i, one may choose the sample mean
\bar{y} = \frac{1}{n}\sum_{i=1}^{k} n_i\bar{y}_i
as a possible estimator of \bar{Y}.
However,
E(\bar{y}) = \frac{1}{n}\sum_{i=1}^{k} n_i E(\bar{y}_i) = \frac{1}{n}\sum_{i=1}^{k} n_i\bar{Y}_i \ne \frac{1}{N}\sum_{i=1}^{k} N_i\bar{Y}_i = \bar{Y} \;\; \text{(in general)},
and \bar{y} turns out to be a biased estimator of \bar{Y}. Based on this, one can modify \bar{y} so as to obtain an unbiased estimator of \bar{Y}. Consider the stratified sample mean, defined as the weighted arithmetic mean of the strata sample means with strata sizes as weights:
\bar{y}_{st} = \frac{1}{N}\sum_{i=1}^{k} N_i\bar{y}_i.
Now
E(\bar{y}_{st}) = \frac{1}{N}\sum_{i=1}^{k} N_i E(\bar{y}_i) = \frac{1}{N}\sum_{i=1}^{k} N_i\bar{Y}_i = \bar{Y}.
Variance of \bar{y}_{st}:
Var(\bar{y}_{st}) = \sum_{i=1}^{k} w_i^2\,Var(\bar{y}_i) + \sum_{i\ne j} w_i w_j\,Cov(\bar{y}_i,\bar{y}_j).
Since all the samples have been drawn independently from the strata by SRSWOR,
Cov(\bar{y}_i,\bar{y}_j) = 0 \;\; (i\ne j)
and
Var(\bar{y}_i) = \frac{N_i-n_i}{N_i n_i}S_i^2,
where
S_i^2 = \frac{1}{N_i-1}\sum_{j=1}^{N_i}(Y_{ij}-\bar{Y}_i)^2.
Thus
Var(\bar{y}_{st}) = \sum_{i=1}^{k} w_i^2\,\frac{N_i-n_i}{N_i n_i}S_i^2
                 = \sum_{i=1}^{k} w_i^2\left(1-\frac{n_i}{N_i}\right)\frac{S_i^2}{n_i}.
Observe that Var(\bar{y}_{st}) is small when S_i^2 is small. This observation suggests how to construct the strata. If S_i^2 is small for all i = 1, 2, ..., k, then Var(\bar{y}_{st}) will also be small. That is why it was mentioned earlier that the strata are to be constructed such that they are homogeneous within, i.e., S_i^2 is small, and heterogeneous among themselves.
For example, units in geographical proximity tend to be alike. The consumption pattern
in households will be similar within a lower income group housing society and within a higher income
group housing society whereas they will differ a lot between the two housing societies based on
income.
Estimate of Variance
Since the samples have been drawn by SRSWOR,
E(s_i^2) = S_i^2,
where
s_i^2 = \frac{1}{n_i-1}\sum_{j=1}^{n_i}(y_{ij}-\bar{y}_i)^2,
and
\widehat{Var}(\bar{y}_i) = \frac{N_i-n_i}{N_i n_i}s_i^2,
so
\widehat{Var}(\bar{y}_{st}) = \sum_{i=1}^{k} w_i^2\,\widehat{Var}(\bar{y}_i) = \sum_{i=1}^{k} w_i^2\,\frac{N_i-n_i}{N_i n_i}s_i^2.
Note: If SRSWR is used instead of SRSWOR for drawing the samples from stratum, then appropriate
changes can be made at required steps.
In this case,
\bar{y}_{st} = \sum_{i=1}^{k} w_i\bar{y}_i, \qquad E(\bar{y}_{st}) = \bar{Y},
Var(\bar{y}_{st}) = \sum_{i=1}^{k} w_i^2\,\frac{N_i-1}{N_i n_i}S_i^2 = \sum_{i=1}^{k} w_i^2\,\frac{\sigma_i^2}{n_i},
\widehat{Var}(\bar{y}_{st}) = \sum_{i=1}^{k}\frac{w_i^2 s_i^2}{n_i},
where
\sigma_i^2 = \frac{1}{N_i}\sum_{j=1}^{N_i}(Y_{ij}-\bar{Y}_i)^2.
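A short numerical sketch (toy data, all values made up) of the stratified estimate and its SRSWOR variance estimate:

```python
# Stratified sample (SRSWOR within each stratum); compute
# y_st = sum w_i * ybar_i and Var-hat = sum w_i^2 (N_i - n_i)/(N_i n_i) s_i^2.
strata = [
    {"N": 100, "y": [12.0, 14.0, 11.0, 13.0]},
    {"N": 300, "y": [25.0, 27.0, 26.0, 24.0, 28.0, 26.0]},
]
N = sum(s["N"] for s in strata)

y_st = 0.0
var_hat = 0.0
for s in strata:
    ni, Ni = len(s["y"]), s["N"]
    wi = Ni / N
    ybar = sum(s["y"]) / ni
    si2 = sum((v - ybar) ** 2 for v in s["y"]) / (ni - 1)   # s_i^2
    y_st += wi * ybar
    var_hat += wi ** 2 * (Ni - ni) / (Ni * ni) * si2

print(y_st, var_hat)
```

The weights w_i = N_i/N enter the estimate linearly but the variance quadratically, which is why accurate stratum sizes matter.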
7. If the population is large, then it is convenient to sample separately from the strata rather than from the entire population.
8. The population mean or population total can be estimated with higher precision by suitably
providing the weights to the estimates obtained from each stratum.
Note: The sample size can not be determined by minimizing both the cost and variability
simultaneously. The cost function is directly proportional to the sample size whereas variability is
inversely proportional to the sample size.
1. Equal allocation
Choose the sample size ni to be same for all strata.
Draw sample of equal size from each strata.
Let n be the sample size and k be the number of strata.
n_i = \frac{n}{k} \;\; \text{for all} \; i = 1, 2, ..., k.
2. Proportional allocation
For fixed k, select n_i proportional to the stratum size N_i, i.e.,
n_i \propto N_i, \;\; \text{or} \;\; n_i = C N_i,
where C is the constant of proportionality. Then
n = \sum_{i=1}^{k} n_i = C\sum_{i=1}^{k} N_i = CN, \;\; \text{so} \;\; C = \frac{n}{N}.
Thus
n_i = \frac{n}{N}N_i.
Such an allocation arises from considerations like operational convenience.
3. Neyman or optimum allocation
For fixed n, select n_i proportional to N_i S_i, i.e.,
n_i \propto N_i S_i, \;\; \text{or} \;\; n_i = C^* N_i S_i,
so that
n = C^*\sum_{i=1}^{k} N_i S_i, \;\; \text{or} \;\; C^* = \frac{n}{\sum_{i=1}^{k} N_i S_i}.
Thus
n_i = \frac{n N_i S_i}{\sum_{i=1}^{k} N_i S_i}.
This allocation arises when Var(\bar{y}_{st}) is minimized subject to the constraint
\sum_{i=1}^{k} n_i = n \;\; \text{(prespecified)}.
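The three allocations can be compared side by side; a sketch with made-up stratum sizes and (assumed known) stratum standard deviations:

```python
# Allocate n = 100 units across k = 3 strata (all numbers made up).
Ni = [400, 300, 300]          # stratum sizes
Si = [4.0, 10.0, 6.0]         # stratum standard deviations, assumed known
n = 100
k = len(Ni)
N = sum(Ni)

equal = [n / k] * k                               # 1. equal: n_i = n/k
prop = [n * Ni_ / N for Ni_ in Ni]                # 2. proportional: n_i = n N_i / N
NS = [a * b for a, b in zip(Ni, Si)]
neyman = [n * x / sum(NS) for x in NS]            # 3. Neyman: n_i = n N_i S_i / sum N_j S_j

print(equal, prop, neyman)
```

Neyman allocation shifts effort toward the stratum with the largest N_i·S_i product; in practice the n_i are then rounded to integers.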
4. Allocation based on cost
Consider the linear cost function
C = C_0 + \sum_{i=1}^{k} C_i n_i,
where
C : total cost,
C_0 : overhead cost, e.g., setting up an office, training people, etc.,
C_i : cost of sampling one unit in the i-th stratum.
To find n_i under this cost function, consider the Lagrangian function with Lagrangian multiplier \lambda as
\phi = Var(\bar{y}_{st}) + \lambda^2\left(\sum_{i=1}^{k} C_i n_i - (C - C_0)\right)
= \sum_{i=1}^{k} w_i^2 S_i^2\left(\frac{1}{n_i}-\frac{1}{N_i}\right) + \lambda^2\left(\sum_{i=1}^{k} C_i n_i - (C - C_0)\right)
= \sum_{i=1}^{k}\left(\frac{w_i S_i}{\sqrt{n_i}} - \lambda\sqrt{C_i n_i}\right)^2 + 2\lambda\sum_{i=1}^{k} w_i S_i\sqrt{C_i} + \text{terms independent of } n_i.
Thus \phi is minimum when
\frac{w_i S_i}{\sqrt{n_i}} = \lambda\sqrt{C_i n_i}, \;\; \text{i.e.,} \;\; n_i = \frac{w_i S_i}{\lambda\sqrt{C_i}}. \qquad (*)
How to determine \lambda?
There are two ways to determine \lambda:
(i) minimize the variability for fixed cost, and
(ii) minimize the cost for given variability.
We consider both the cases.
(i) Minimize variability for fixed cost
Let C - C_0 = C_0^* be fixed, so that \sum_{i=1}^{k} C_i n_i = C_0^*. Substituting n_i from (*),
\sum_{i=1}^{k} C_i\,\frac{w_i S_i}{\lambda\sqrt{C_i}} = C_0^*, \;\; \text{or} \;\; \lambda = \frac{\sum_{i=1}^{k}\sqrt{C_i}\,w_i S_i}{C_0^*}.
Thus the optimum n_i is
n_i^* = \frac{w_i S_i}{\sqrt{C_i}}\cdot\frac{C_0^*}{\sum_{i=1}^{k}\sqrt{C_i}\,w_i S_i}.
The required sample size to estimate \bar{Y} such that the variance is minimum for the given cost C - C_0 = C_0^* is
n = \sum_{i=1}^{k} n_i^*.
(ii) Minimize cost for prespecified variance
Let V_0 be the prespecified variance, so that
\sum_{i=1}^{k} w_i^2 S_i^2\left(\frac{1}{n_i}-\frac{1}{N_i}\right) = V_0,
or
\sum_{i=1}^{k}\frac{w_i^2 S_i^2}{n_i} = V_0 + \sum_{i=1}^{k}\frac{w_i^2 S_i^2}{N_i}.
Substituting n_i from equation (*),
\lambda\sum_{i=1}^{k} w_i S_i\sqrt{C_i} = V_0 + \sum_{i=1}^{k}\frac{w_i^2 S_i^2}{N_i},
or
\lambda = \frac{V_0 + \sum_{i=1}^{k}\frac{w_i^2 S_i^2}{N_i}}{\sum_{i=1}^{k} w_i S_i\sqrt{C_i}}.
Thus
n_i = \frac{w_i S_i}{\sqrt{C_i}}\cdot\frac{\sum_{j=1}^{k} w_j S_j\sqrt{C_j}}{V_0 + \sum_{j=1}^{k}\frac{w_j^2 S_j^2}{N_j}}.
So the required sample size to estimate \bar{Y} such that the cost C is minimum for a prespecified variance V_0 is
n = \sum_{i=1}^{k} n_i.
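A numerical sketch of case (i), the fixed-budget allocation n_i ∝ w_i·S_i/√C_i (all weights, standard deviations and costs made up):

```python
import math

# Optimum allocation for a fixed sampling budget C - C0.
wi = [0.4, 0.3, 0.3]          # stratum weights N_i / N
Si = [4.0, 10.0, 6.0]         # stratum standard deviations, assumed known
Ci = [1.0, 4.0, 9.0]          # cost of sampling one unit in each stratum
budget = 400.0                # C - C0

score = [w * s / math.sqrt(c) for w, s, c in zip(wi, Si, Ci)]
denom = sum(w * s * math.sqrt(c) for w, s, c in zip(wi, Si, Ci))
ni = [x * budget / denom for x in score]   # n_i* from the formula above

spent = sum(c * v for c, v in zip(Ci, ni))
print([round(v, 2) for v in ni], spent)    # the allocation exhausts the budget
```

Note how the costliest stratum receives fewer units per unit of S_i, since sqrt(C_i) appears in the denominator of the score.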
Sample size under proportional allocation for fixed cost
Under proportional allocation, n_i = \frac{n}{N}N_i = n w_i. If the cost available for sampling is C - C_0, then
C - C_0 = \sum_{i=1}^{k} C_i n_i = n\sum_{i=1}^{k} w_i C_i, \;\; \text{or} \;\; n = \frac{C - C_0}{\sum_{i=1}^{k} w_i C_i}.
Thus
n_i = \frac{(C - C_0)\,w_i}{\sum_{j=1}^{k} w_j C_j}.
Sample size under proportional allocation for prespecified variance
Let V_0 be the prespecified variance. Then
\sum_{i=1}^{k} w_i^2 S_i^2\left(\frac{1}{n_i}-\frac{1}{N_i}\right) = V_0,
or
\sum_{i=1}^{k}\frac{w_i^2 S_i^2}{n w_i} = V_0 + \sum_{i=1}^{k}\frac{w_i^2 S_i^2}{N_i} \quad (\text{using } n_i = n w_i),
or
n = \frac{\sum_{i=1}^{k} w_i S_i^2}{V_0 + \sum_{i=1}^{k}\frac{w_i^2 S_i^2}{N_i}},
and
n_i = n w_i = w_i\,\frac{\sum_{j=1}^{k} w_j S_j^2}{V_0 + \sum_{j=1}^{k}\frac{w_j^2 S_j^2}{N_j}}.
This is known as Bowley's allocation.
Now we derive the variance of \bar{y}_{st} under proportional and optimum allocations.
(i) Proportional allocation
Under proportional allocation, n_i = \frac{n}{N}N_i, and
Var_{prop}(\bar{y}_{st}) = \sum_{i=1}^{k}\left(\frac{N_i}{N}\right)^2\frac{N_i - \frac{n}{N}N_i}{N_i\cdot\frac{n}{N}N_i}S_i^2
= \frac{N-n}{Nn}\sum_{i=1}^{k}\frac{N_i}{N}S_i^2
= \frac{N-n}{Nn}\sum_{i=1}^{k} w_i S_i^2.
(ii) Optimum allocation
Under optimum allocation,
n_i = \frac{n N_i S_i}{\sum_{i=1}^{k} N_i S_i},
and
V_{opt}(\bar{y}_{st}) = \sum_{i=1}^{k} w_i^2 S_i^2\left(\frac{1}{n_i}-\frac{1}{N_i}\right)
= \sum_{i=1}^{k}\frac{w_i^2 S_i^2}{n_i} - \sum_{i=1}^{k}\frac{w_i^2 S_i^2}{N_i}
= \sum_{i=1}^{k} w_i^2 S_i^2\cdot\frac{\sum_{j=1}^{k} N_j S_j}{n N_i S_i} - \sum_{i=1}^{k}\frac{w_i^2 S_i^2}{N_i}
= \frac{1}{n}\left(\sum_{i=1}^{k} w_i S_i\right)^2 - \frac{1}{N}\sum_{i=1}^{k} w_i S_i^2.
In order to compare Var_{SRS}(\bar{y}) and Var_{prop}(\bar{y}_{st}), first we express S^2 as a function of the S_i^2. Consider
(N-1)S^2 = \sum_{i=1}^{k}\sum_{j=1}^{N_i}(Y_{ij}-\bar{Y})^2
= \sum_{i=1}^{k}\sum_{j=1}^{N_i}\left[(Y_{ij}-\bar{Y}_i)+(\bar{Y}_i-\bar{Y})\right]^2
= \sum_{i=1}^{k}\sum_{j=1}^{N_i}(Y_{ij}-\bar{Y}_i)^2 + \sum_{i=1}^{k} N_i(\bar{Y}_i-\bar{Y})^2
= \sum_{i=1}^{k}(N_i-1)S_i^2 + \sum_{i=1}^{k} N_i(\bar{Y}_i-\bar{Y})^2,
so that
S^2 = \sum_{i=1}^{k}\frac{N_i-1}{N-1}S_i^2 + \sum_{i=1}^{k}\frac{N_i}{N-1}(\bar{Y}_i-\bar{Y})^2.
For simplicity, assume that the N_i are large enough so that
\frac{N_i-1}{N_i}\approx 1 \;\;\text{and}\;\; \frac{N-1}{N}\approx 1.
Thus
S^2 \approx \sum_{i=1}^{k}\frac{N_i}{N}S_i^2 + \sum_{i=1}^{k}\frac{N_i}{N}(\bar{Y}_i-\bar{Y})^2,
or, multiplying both sides by \frac{N-n}{Nn},
\frac{N-n}{Nn}S^2 = \frac{N-n}{Nn}\sum_{i=1}^{k} w_i S_i^2 + \frac{N-n}{Nn}\sum_{i=1}^{k} w_i(\bar{Y}_i-\bar{Y})^2,
i.e.,
Var_{SRS}(\bar{y}) = Var_{prop}(\bar{y}_{st}) + \frac{N-n}{Nn}\sum_{i=1}^{k} w_i(\bar{Y}_i-\bar{Y})^2.
Since
\sum_{i=1}^{k} w_i(\bar{Y}_i-\bar{Y})^2 \ge 0,
it follows that Var_{SRS}(\bar{y}) \ge Var_{prop}(\bar{y}_{st}), i.e., stratified sampling with proportional allocation is more efficient than SRS.
Next, compare proportional and optimum allocations:
Var_{prop}(\bar{y}_{st}) - V_{opt}(\bar{y}_{st}) = \left[\frac{N-n}{Nn}\sum_{i=1}^{k} w_i S_i^2\right] - \left[\frac{1}{n}\left(\sum_{i=1}^{k} w_i S_i\right)^2 - \frac{1}{N}\sum_{i=1}^{k} w_i S_i^2\right]
= \frac{1}{n}\left[\sum_{i=1}^{k} w_i S_i^2 - \left(\sum_{i=1}^{k} w_i S_i\right)^2\right]
= \frac{1}{n}\sum_{i=1}^{k} w_i(S_i-\bar{S})^2 \ge 0,
where
\bar{S} = \sum_{i=1}^{k} w_i S_i.
Hence optimum allocation is at least as efficient as proportional allocation.
In stratified sampling,
Var(\bar{y}_{st}) = \sum_{i=1}^{k} w_i^2\,\frac{N_i-n_i}{N_i n_i}S_i^2 = \sum_{i=1}^{k}\frac{w_i^2 S_i^2}{n_i} - \sum_{i=1}^{k}\frac{w_i^2 S_i^2}{N_i}.
The second term represents the reduction due to the finite population correction.
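The chain Var_SRS ≥ Var_prop ≥ V_opt can be checked numerically on a full toy population with two well-separated strata (all values made up):

```python
# Compare Var(ybar) under SRSWOR, proportional, and Neyman allocation.
strata = [[2.0, 3.0, 4.0, 3.0, 2.0, 4.0],
          [10.0, 12.0, 14.0, 12.0, 11.0, 13.0]]
pop = [y for s in strata for y in s]
N, n = len(pop), 4

def var_s(vals):                     # S^2 with divisor (len - 1)
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / (len(vals) - 1)

S2 = var_s(pop)
V_srs = (N - n) / (N * n) * S2

wi = [len(s) / N for s in strata]
Si = [var_s(s) ** 0.5 for s in strata]
V_prop = (N - n) / (N * n) * sum(w * s * s for w, s in zip(wi, Si))
V_opt = (sum(w * s for w, s in zip(wi, Si)) ** 2) / n \
        - sum(w * s * s for w, s in zip(wi, Si)) / N

print(V_srs, V_prop, V_opt)   # decreasing order for this population
```

The between-stratum spread (means 3 vs 12) inflates Var_SRS, while the similar within-stratum spreads make the prop-to-opt gain comparatively small, exactly as the two difference formulas above predict.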
The confidence limits of \bar{Y} can be obtained as
\bar{y}_{st} \pm t\sqrt{\widehat{Var}(\bar{y}_{st})},
assuming \bar{y}_{st} is normally distributed and \widehat{Var}(\bar{y}_{st}) is well determined so that t can be read from normal distribution tables. If only a few degrees of freedom are provided by each stratum, then the t values are obtained from the table of Student's t-distribution.
The distribution of \widehat{Var}(\bar{y}_{st}) is generally complex. An approximate method of assigning an effective number of degrees of freedom n_e to \widehat{Var}(\bar{y}_{st}) is
n_e = \frac{\left(\sum_{i=1}^{k} g_i s_i^2\right)^2}{\sum_{i=1}^{k}\frac{g_i^2 s_i^4}{n_i-1}},
where
g_i = \frac{N_i(N_i-n_i)}{n_i}
and
\min_i(n_i-1) \le n_e \le \sum_{i=1}^{k}(n_i-1).
Modification of optimum allocation: sometimes the optimum allocation gives n_1 > N_1 for some stratum, say the first. In that case take n_1 = N_1 and allocate
n_i = \frac{(n-N_1)\,w_i S_i}{\sum_{i=2}^{k} w_i S_i}, \;\; i = 2, 3, ..., k.
In such cases, the formula for the minimum variance of \bar{y}_{st} needs to be modified as
Min\,Var(\bar{y}_{st}) = \frac{\left(\sum^{*} w_i S_i\right)^2}{n^*} - \frac{\sum^{*} w_i S_i^2}{N},
where \sum^{*} denotes the summation over the strata in which n_i \le N_i and n^* is the revised total sample size in those strata.
Estimation of a proportion in stratified sampling
The results for estimating a proportion follow from those for \bar{y}_{st} by defining
Y_{ij} = 1 if the j-th unit of the i-th stratum belongs to C, and Y_{ij} = 0 otherwise,
so that \bar{y}_{st} = p_{st} = \sum_{i=1}^{k} w_i p_i.
Here
S_i^2 = \frac{N_i}{N_i-1}P_i Q_i, \;\; \text{where} \;\; Q_i = 1 - P_i.
Also
Var(\bar{y}_{st}) = \sum_{i=1}^{k} w_i^2\,\frac{N_i-n_i}{N_i n_i}S_i^2,
so
Var(p_{st}) = \frac{1}{N^2}\sum_{i=1}^{k}\frac{N_i^2(N_i-n_i)}{N_i-1}\cdot\frac{P_i Q_i}{n_i}.
If the finite population correction can be ignored, then
Var(p_{st}) = \sum_{i=1}^{k} w_i^2\,\frac{P_i Q_i}{n_i}.
Under proportional allocation,
Var_{prop}(p_{st}) = \frac{N-n}{N}\cdot\frac{1}{Nn}\sum_{i=1}^{k}\frac{N_i^2}{N_i-1}P_i Q_i
\approx \frac{N-n}{Nn}\sum_{i=1}^{k} w_i P_i Q_i \quad (\text{taking } N_i-1 \approx N_i).
The best choice of n_i, in the sense of minimizing the variance for a fixed total sample size, is
n_i \propto N_i S_i = N_i\sqrt{\frac{N_i}{N_i-1}P_i Q_i} \approx N_i\sqrt{P_i Q_i}.
Thus
n_i = n\,\frac{N_i\sqrt{P_i Q_i}}{\sum_{i=1}^{k} N_i\sqrt{P_i Q_i}}.
Similarly, the best choice of n_i such that the variance is minimum for fixed cost C = C_0 + \sum_{i=1}^{k} C_i n_i is
n_i = n\,\frac{N_i\sqrt{P_i Q_i/C_i}}{\sum_{i=1}^{k} N_i\sqrt{P_i Q_i/C_i}}.
Estimation of the gain in precision due to stratification
To estimate the variance that simple random sampling would have had, write
(N-1)S^2 = \sum_{i=1}^{k}\sum_{j=1}^{N_i}\left[(Y_{ij}-\bar{Y}_i)+(\bar{Y}_i-\bar{Y})\right]^2
= \sum_{i=1}^{k}(N_i-1)S_i^2 + \sum_{i=1}^{k} N_i(\bar{Y}_i-\bar{Y})^2
= \sum_{i=1}^{k}(N_i-1)S_i^2 + N\left(\sum_{i=1}^{k} w_i\bar{Y}_i^2 - \bar{Y}^2\right).
Since the samples within the strata are drawn by SRSWOR,
E(s_i^2) = S_i^2, \;\; \text{so} \;\; \hat{S}_i^2 = s_i^2.
Further,
Var(\bar{y}_i) = E(\bar{y}_i^2) - [E(\bar{y}_i)]^2 = E(\bar{y}_i^2) - \bar{Y}_i^2,
so \bar{Y}_i^2 = E(\bar{y}_i^2) - Var(\bar{y}_i), and an unbiased estimate of \bar{Y}_i^2 is
\hat{\bar{Y}}_i^2 = \bar{y}_i^2 - \widehat{Var}(\bar{y}_i) = \bar{y}_i^2 - \frac{N_i-n_i}{N_i n_i}s_i^2.
Substituting these unbiased estimates into the decomposition of (N-1)S^2 gives an estimate \hat{S}^2 of S^2, and hence
\widehat{Var}_{SRS}(\bar{y}) = \frac{N-n}{Nn}\hat{S}^2,
while
\widehat{Var}(\bar{y}_{st}) = \sum_{i=1}^{k} w_i^2\,\frac{N_i-n_i}{N_i n_i}s_i^2.
Comparing these two estimates shows the gain in precision due to stratification.
Interpenetrating subsampling
Suppose a sample consists of two or more subsamples drawn according to the same sampling scheme, such that each subsample yields an estimate of the parameter. Such subsamples are called interpenetrating subsamples.
The subsamples need not necessarily be independent. The assumption of independent subsamples helps
in obtaining an unbiased estimate of the variance of the composite estimator. This is even helpful if
the sample design is complicated and the expression for variance of the composite estimator is
complex.
Let t_1, t_2, ..., t_g be unbiased estimators of a parameter \theta obtained from g independent interpenetrating subsamples. Then
\bar{t} = \frac{1}{g}\sum_{j=1}^{g} t_j
is unbiased for \theta, and
\widehat{Var}(\bar{t}) = \frac{1}{g(g-1)}\sum_{j=1}^{g}(t_j-\bar{t})^2
is an unbiased estimator of Var(\bar{t}). Note that
E\left[\widehat{Var}(\bar{t})\right] = \frac{1}{g(g-1)}E\left[\sum_{j=1}^{g} t_j^2 - g\,\bar{t}^{\,2}\right]
= \frac{1}{g(g-1)}(g^2-g)\,Var(\bar{t}) = Var(\bar{t}).
If the distribution of each estimator t_j is symmetric about \theta, then a confidence interval for \theta can be obtained from
P\left[\min(t_1, t_2, ..., t_g) \le \theta \le \max(t_1, t_2, ..., t_g)\right] = 1 - \left(\frac{1}{2}\right)^{g-1}.
In stratified sampling, let \hat{Y}_{ij(tot)} be the unbiased estimator of the total of the j-th stratum based on the i-th subsample, i = 1, 2, ..., L; j = 1, 2, ..., k. Then \hat{Y}_{j(tot)} = \frac{1}{L}\sum_{i=1}^{L}\hat{Y}_{ij(tot)} estimates the j-th stratum total, and an unbiased estimator of the variance of the estimated population total \sum_{j=1}^{k}\hat{Y}_{j(tot)} is
\frac{1}{L(L-1)}\sum_{i=1}^{L}\sum_{j=1}^{k}\left(\hat{Y}_{ij(tot)} - \hat{Y}_{j(tot)}\right)^2.
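These formulas reduce to a few lines of arithmetic; a sketch with made-up subsample estimates:

```python
# g = 4 independent interpenetrating subsamples, each giving an estimate t_j
# of the same total (made-up numbers).
t = [102.0, 98.0, 105.0, 95.0]
g = len(t)

t_bar = sum(t) / g
var_hat = sum((tj - t_bar) ** 2 for tj in t) / (g * (g - 1))  # unbiased for Var(t_bar)

# Distribution-free interval: (min t_j, max t_j) covers the target
# with probability 1 - (1/2)^(g-1) when each t_j is symmetric about it.
conf = 1 - 0.5 ** (g - 1)
print(t_bar, var_hat, conf)
```

With g = 4 the min-max interval already carries 87.5% confidence, which is why even a handful of interpenetrating subsamples is useful for rough error assessment.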
Post Stratification
Sometimes the stratum to which a unit belongs may be known only after the field survey. For example, the age of persons, their educational qualifications, etc. cannot be known in advance. In such cases, we adopt the post stratification procedure to increase the precision of the estimates.
In post stratification
draw a sample by simple random sampling from the population and carry out the survey.
after the completion of survey, stratify the sampling units to increase the precision of the
estimates.
Assume that the stratum sizes N_i are fairly accurately known. Let m_i denote the number of sampled units that fall in the i-th stratum, so that
\sum_{i=1}^{k} m_i = n.
Note that m_i is a random variable (and that is why we are not using the symbol n_i as earlier).
Assume n is large enough, or the stratification is such, that the probability that some m_i = 0 is negligibly small. In case m_i = 0 for some strata, two or more strata can be combined to make the sample size non-zero before evaluating the final estimates.
Conditioning on (m_1, m_2, ..., m_k),
E(\bar{y}_{post}) = E\left[\frac{1}{N}\sum_{i=1}^{k} N_i\,E(\bar{y}_i \mid m_1, m_2, ..., m_k)\right]
= E\left[\frac{1}{N}\sum_{i=1}^{k} N_i\bar{Y}_i\right] = \bar{Y}.
Further,
Var(\bar{y}_{post}) = E\left[Var(\bar{y}_{post}\mid m_1, ..., m_k)\right] + Var\left[E(\bar{y}_{post}\mid m_1, ..., m_k)\right]
= E\left[\sum_{i=1}^{k} w_i^2\left(\frac{1}{m_i}-\frac{1}{N_i}\right)S_i^2\right] + Var(\bar{Y})
= \sum_{i=1}^{k} w_i^2 S_i^2\,E\left(\frac{1}{m_i}-\frac{1}{N_i}\right) \qquad (Var(\bar{Y}) = 0).
To find E\left(\frac{1}{m_i}-\frac{1}{N_i}\right), proceed as follows:
Consider the estimate of a ratio based on the ratio method of estimation,
\hat{R} = \frac{\bar{y}}{\bar{x}} = \frac{\sum_{j=1}^{n} y_j}{\sum_{j=1}^{n} x_j}, \qquad R = \frac{\bar{Y}}{\bar{X}} = \frac{\sum_{j=1}^{N} Y_j}{\sum_{j=1}^{N} X_j}.
We know that, to the first order of approximation,
E(\hat{R}) \approx R + \frac{N-n}{Nn}\cdot\frac{R S_X^2 - S_{XY}}{\bar{X}^2}.
Let
x_j = 1 if the j-th unit belongs to the i-th stratum and x_j = 0 otherwise, and y_j = 1 for all j = 1, 2, ..., N.
Then, with m_i denoting the number of sample units falling in the i-th stratum,
\hat{R} = \frac{n}{m_i}, \quad R = \frac{N}{N_i}, \quad \bar{X} = \frac{N_i}{N}, \quad S_{XY} = 0, \quad S_X^2 = \frac{N_i(N-N_i)}{N(N-1)},
so that
E\left(\frac{n}{m_i}\right) \approx \frac{N}{N_i} + \frac{N(N-n)(N-N_i)}{n N_i^2(N-1)}.
Thus
E\left(\frac{1}{m_i}\right) \approx \frac{1}{n w_i} + \frac{(N-n)(1-w_i)}{n^2(N-1)w_i^2}
and
E\left(\frac{1}{m_i}-\frac{1}{N_i}\right) \approx \frac{N-n}{Nn w_i} + \frac{(N-n)(1-w_i)}{n^2(N-1)w_i^2}.
Substituting,
Var(\bar{y}_{post}) = \sum_{i=1}^{k} w_i^2 S_i^2\,E\left(\frac{1}{m_i}-\frac{1}{N_i}\right)
\approx \frac{N-n}{Nn}\sum_{i=1}^{k} w_i S_i^2 + \frac{N-n}{n^2(N-1)}\sum_{i=1}^{k}(1-w_i)S_i^2.
Assuming N-1 \approx N,
Var(\bar{y}_{post}) \approx \frac{N-n}{Nn}\sum_{i=1}^{k} w_i S_i^2 + \frac{N-n}{Nn^2}\sum_{i=1}^{k}(1-w_i)S_i^2
= Var_{prop}(\bar{y}_{st}) + \frac{N-n}{Nn^2}\sum_{i=1}^{k}(1-w_i)S_i^2.
The second term is the contribution to the variance of \bar{y}_{post} due to the m_i's not being proportionately distributed. If S_w^2 denotes the average of the S_i^2, the second term is approximately
\frac{N-n}{Nn^2}\sum_{i=1}^{k}(1-w_i)S_i^2 \approx \frac{N-n}{Nn^2}(k-1)S_w^2 \quad \left(\text{since } \sum_{i=1}^{k} w_i = 1\right)
= \frac{k-1}{n}\cdot\frac{N-n}{Nn}S_w^2 \approx \frac{k-1}{n}\,Var_{prop}(\bar{y}_{st}).
The increase in variance over Var_{prop}(\bar{y}_{st}) is therefore small if the average sample size per stratum, n/k, is reasonably large.
Thus post stratification with a large sample produces an estimator which is almost as precise as an estimator in stratified sampling with proportional allocation.
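The size of the post-stratification penalty can be sketched numerically (population sizes, weights and variances all made up):

```python
# Compare Var_prop(y_st) with the approximate
# Var(y_post) = Var_prop + (N - n)/(N n^2) * sum (1 - w_i) S_i^2.
N, n = 1000, 100
wi = [0.5, 0.3, 0.2]
Si2 = [4.0, 9.0, 16.0]

V_prop = (N - n) / (N * n) * sum(w * s2 for w, s2 in zip(wi, Si2))
penalty = (N - n) / (N * n ** 2) * sum((1 - w) * s2 for w, s2 in zip(wi, Si2))
V_post = V_prop + penalty

print(V_prop, V_post, penalty / V_prop)   # penalty is a small relative increase
```

Here the penalty is only a few percent of V_prop, consistent with the (k-1)/n rule of thumb above for n/k around 33 units per stratum.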
Chapter 5
Ratio and Product Methods of Estimation
An important objective in any statistical estimation procedure is to obtain estimators of the parameters of interest with high precision. It is also well understood that incorporating more information in the estimation procedure yields better estimators, provided the information is valid and proper. Such auxiliary information is used through the ratio method of estimation to obtain an improved estimator of the population mean. In the ratio method of estimation, auxiliary information on a variable that is linearly related to the variable under study is utilized to estimate the population mean.
Let Y be the variable under study and X an auxiliary variable which is correlated with Y. The observations x_i on X and y_i on Y are obtained for each sampling unit. The population mean \bar{X} of X (or equivalently the population total X_{tot}) must be known. For example, the x_i's may be the values of the y_i's from
- some earlier completed census,
- some earlier survey,
- some characteristic on which it is easy to obtain information, etc.
For example, if y_i is the quantity of fruit produced in the i-th plot, then x_i can be the area of the i-th plot or the production of fruit in the same plot in a previous year.
Let (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) be a random sample of size n on the paired variable (X, Y), drawn, preferably by SRSWOR, from a population of size N. The ratio estimator of the population mean \bar{Y} is
\hat{\bar{Y}}_R = \frac{\bar{y}}{\bar{x}}\bar{X} = \hat{R}\bar{X},
assuming the population mean \bar{X} is known. The ratio estimator of the population total Y_{tot} = \sum_{i=1}^{N} Y_i is
\hat{Y}_{R(tot)} = \frac{y_{tot}}{x_{tot}}X_{tot},
where X_{tot} = \sum_{i=1}^{N} X_i is the population total of X, which is assumed to be known, and y_{tot} = \sum_{i=1}^{n} y_i and x_{tot} = \sum_{i=1}^{n} x_i are the sample totals of Y and X respectively. \hat{Y}_{R(tot)} can be equivalently expressed as
\hat{Y}_{R(tot)} = \frac{\bar{y}}{\bar{x}}X_{tot} = \hat{R}X_{tot}.
Looking at the structure of the ratio estimators, note that the ratio method estimates Y_{tot} through the ratio Y_{tot}/X_{tot}. If the values of y_i/x_i are nearly the same for all i = 1, 2, ..., n, then y_{tot}/x_{tot} (or equivalently \bar{y}/\bar{x}) varies little from sample to sample and the ratio estimator will be of high precision.
Moreover, it is difficult to find the exact expressions for E\left(\frac{\bar{y}}{\bar{x}}\right) and E\left(\frac{\bar{y}^2}{\bar{x}^2}\right). So we approximate them and proceed as follows:
Let
\epsilon_0 = \frac{\bar{y}-\bar{Y}}{\bar{Y}}, \; \text{i.e.,} \; \bar{y} = (1+\epsilon_0)\bar{Y},
\epsilon_1 = \frac{\bar{x}-\bar{X}}{\bar{X}}, \; \text{i.e.,} \; \bar{x} = (1+\epsilon_1)\bar{X}.
Since SRSWOR is being followed,
E(\epsilon_0) = 0, \qquad E(\epsilon_1) = 0,
E(\epsilon_0^2) = \frac{1}{\bar{Y}^2}E(\bar{y}-\bar{Y})^2 = \frac{1}{\bar{Y}^2}\cdot\frac{N-n}{Nn}S_Y^2 = \frac{f}{n}\cdot\frac{S_Y^2}{\bar{Y}^2} = \frac{f}{n}C_Y^2,
where
f = \frac{N-n}{N}, \quad S_Y^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i-\bar{Y})^2, \quad C_Y = \frac{S_Y}{\bar{Y}}
is the coefficient of variation related to Y. Similarly,
E(\epsilon_1^2) = \frac{f}{n}C_X^2,
E(\epsilon_0\epsilon_1) = \frac{1}{\bar{X}\bar{Y}}E\left[(\bar{x}-\bar{X})(\bar{y}-\bar{Y})\right] = \frac{1}{\bar{X}\bar{Y}}\cdot\frac{N-n}{Nn}S_{XY} = \frac{f}{n}\cdot\frac{\rho S_X S_Y}{\bar{X}\bar{Y}} = \frac{f}{n}\rho C_X C_Y,
where C_X = \frac{S_X}{\bar{X}} is the coefficient of variation related to X and \rho is the correlation coefficient between X and Y.
Writing the ratio estimator in terms of \epsilon_0 and \epsilon_1,
\hat{\bar{Y}}_R = \frac{\bar{y}}{\bar{x}}\bar{X} = \frac{(1+\epsilon_0)\bar{Y}}{(1+\epsilon_1)\bar{X}}\bar{X} = (1+\epsilon_0)(1+\epsilon_1)^{-1}\bar{Y}.
Assuming |\epsilon_1| < 1, the term (1+\epsilon_1)^{-1} may be expanded as an infinite series and the expansion is convergent. Such an assumption means that \left|\frac{\bar{x}-\bar{X}}{\bar{X}}\right| < 1, i.e., every possible estimate \bar{x} of the population mean \bar{X} lies between 0 and 2\bar{X}. This is likely to hold true if the variation in \bar{x} is not large. In order to ensure that the variation in \bar{x} is small, assume that the sample size n is fairly large.
When the sample size is large, \epsilon_0 and \epsilon_1 are likely to be small quantities, and the terms involving second and higher powers of \epsilon_0 and \epsilon_1 are negligibly small. In such a case
\hat{\bar{Y}}_R \approx \bar{Y} + \bar{Y}(\epsilon_0-\epsilon_1)
and
E(\hat{\bar{Y}}_R - \bar{Y}) = 0.
So the ratio estimator is an unbiased estimator of the population mean to the first order of approximation.
If we assume that only the terms of \epsilon_0 and \epsilon_1 involving powers more than two are negligibly small (which is more realistic than assuming powers more than one are negligibly small), then
\hat{\bar{Y}}_R \approx \bar{Y}(1+\epsilon_0-\epsilon_1-\epsilon_0\epsilon_1+\epsilon_1^2)
and
Bias(\hat{\bar{Y}}_R) = E(\hat{\bar{Y}}_R-\bar{Y}) = \bar{Y}\,E(\epsilon_1^2-\epsilon_0\epsilon_1) = \frac{f}{n}\bar{Y}\left(C_X^2-\rho C_X C_Y\right).
Thus
Bias(\hat{\bar{Y}}_R) = 0
if E(\epsilon_1^2-\epsilon_0\epsilon_1) = 0,
or if \frac{Var(\bar{x})}{\bar{X}^2} - \frac{Cov(\bar{x},\bar{y})}{\bar{X}\bar{Y}} = 0,
or if \frac{\bar{Y}}{\bar{X}}Var(\bar{x}) - Cov(\bar{x},\bar{y}) = 0 \;\; (\text{assuming } \bar{X} \ne 0),
or if R\,Var(\bar{x}) - Cov(\bar{x},\bar{y}) = 0,
or if R = \frac{\bar{Y}}{\bar{X}} = \frac{Cov(\bar{x},\bar{y})}{Var(\bar{x})},
which is satisfied when the regression line of Y on X passes through the origin.
Under the assumption |\epsilon_1| < 1 and neglecting the terms of \epsilon_0 and \epsilon_1 with powers more than two,
MSE(\hat{\bar{Y}}_R) = E(\hat{\bar{Y}}_R-\bar{Y})^2 \approx \bar{Y}^2 E(\epsilon_0-\epsilon_1)^2
= \bar{Y}^2\left[E(\epsilon_0^2)+E(\epsilon_1^2)-2E(\epsilon_0\epsilon_1)\right]
= \frac{f\bar{Y}^2}{n}\left(C_Y^2+C_X^2-2\rho C_X C_Y\right),
up to the second order of approximation. Comparing with Var(\bar{y}) = \frac{f}{n}\bar{Y}^2 C_Y^2 under SRSWOR, the ratio estimator has smaller MSE than the sample mean
if C_X^2 - 2\rho C_X C_Y < 0, \; \text{i.e., if} \; \rho > \frac{1}{2}\cdot\frac{C_X}{C_Y}.
Thus the ratio estimator is more efficient than the sample mean based on SRSWOR
if \rho > \frac{1}{2}\cdot\frac{C_X}{C_Y} \;\; \text{when} \; R > 0,
and if \rho < \frac{1}{2}\cdot\frac{C_X}{C_Y} \;\; \text{when} \; R < 0.
It is clear from this expression that the success of the ratio estimator depends on how close the auxiliary information is to the variable under study.
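The efficiency claim can be checked exactly on a tiny population by enumerating every SRSWOR sample (toy numbers, with Y = 2X - 1 so that Y and X are strongly related):

```python
from itertools import combinations

# Exact MSE comparison over all SRSWOR samples of a tiny population.
X = [2.0, 4.0, 5.0, 7.0, 8.0, 10.0]
Y = [3.0, 7.0, 9.0, 13.0, 15.0, 19.0]    # Y = 2X - 1
N, n = len(X), 3
Xbar, Ybar = sum(X) / N, sum(Y) / N

samples = list(combinations(range(N), n))
mse_ratio = mse_mean = 0.0
for idx in samples:
    xb = sum(X[i] for i in idx) / n
    yb = sum(Y[i] for i in idx) / n
    mse_ratio += (yb / xb * Xbar - Ybar) ** 2   # ratio estimator error
    mse_mean += (yb - Ybar) ** 2                # sample-mean error
mse_ratio /= len(samples)
mse_mean /= len(samples)

print(mse_ratio, mse_mean)   # ratio estimator has smaller exact MSE here
```

Since the correlation here is well above C_X/(2 C_Y), the ratio estimator dominates the sample mean for every admissible sample size, in line with the condition derived above.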
An exact expression for the bias of \hat{R} can be obtained as follows. Since \bar{y} = \hat{R}\bar{x},
Cov(\hat{R},\bar{x}) = E(\hat{R}\bar{x}) - E(\hat{R})E(\bar{x}) = E(\bar{y}) - E(\hat{R})\bar{X} = \bar{Y} - E(\hat{R})\bar{X}.
Thus
E(\hat{R}) = \frac{\bar{Y}}{\bar{X}} - \frac{Cov(\hat{R},\bar{x})}{\bar{X}} = R - \frac{Cov(\hat{R},\bar{x})}{\bar{X}},
so
Bias(\hat{R}) = E(\hat{R}) - R = -\frac{Cov(\hat{R},\bar{x})}{\bar{X}} = -\frac{\rho_{\hat{R},\bar{x}}\,\sigma_{\hat{R}}\,\sigma_{\bar{x}}}{\bar{X}},
where \rho_{\hat{R},\bar{x}} is the correlation between \hat{R} and \bar{x}, and \sigma_{\hat{R}} and \sigma_{\bar{x}} are the standard errors of \hat{R} and \bar{x} respectively. Since |\rho_{\hat{R},\bar{x}}| \le 1,
|Bias(\hat{R})| \le \frac{\sigma_{\hat{R}}\,\sigma_{\bar{x}}}{\bar{X}},
assuming \bar{X} \ne 0. Thus
\frac{|Bias(\hat{R})|}{\sigma_{\hat{R}}} \le \frac{\sigma_{\bar{x}}}{\bar{X}} = C_{\bar{x}},
where C_{\bar{x}} is the coefficient of variation of \bar{x}. If C_{\bar{x}} < 0.1, then the bias in \hat{R} may be safely regarded as negligible.
For expressing MSE(\hat{\bar{Y}}_R) in another form, consider
\sum_{i=1}^{N}(Y_i-RX_i)^2 = \sum_{i=1}^{N}\left[(Y_i-\bar{Y}) - R(X_i-\bar{X})\right]^2 \quad (\text{using } \bar{Y}=R\bar{X})
= \sum_{i=1}^{N}(Y_i-\bar{Y})^2 + R^2\sum_{i=1}^{N}(X_i-\bar{X})^2 - 2R\sum_{i=1}^{N}(X_i-\bar{X})(Y_i-\bar{Y}),
so that
\frac{1}{N-1}\sum_{i=1}^{N}(Y_i-RX_i)^2 = S_Y^2 + R^2 S_X^2 - 2R S_{XY}.
The MSE of \hat{\bar{Y}}_R has already been derived, and is now expressed again as follows:
MSE(\hat{\bar{Y}}_R) = \frac{f\bar{Y}^2}{n}\left(C_Y^2+C_X^2-2\rho C_X C_Y\right)
= \frac{f\bar{Y}^2}{n}\left(\frac{S_Y^2}{\bar{Y}^2}+\frac{S_X^2}{\bar{X}^2}-2\frac{S_{XY}}{\bar{X}\bar{Y}}\right)
= \frac{f}{n}\left(S_Y^2+R^2 S_X^2-2R S_{XY}\right)
= \frac{f}{n(N-1)}\sum_{i=1}^{N}(Y_i-RX_i)^2
= \frac{N-n}{Nn(N-1)}\sum_{i=1}^{N}(Y_i-RX_i)^2.
Estimate of MSE(\hat{\bar{Y}}_R)
Let U_i = Y_i - R X_i, i = 1, 2, ..., N, so that
MSE(\hat{\bar{Y}}_R) = \frac{f}{n}\cdot\frac{1}{N-1}\sum_{i=1}^{N}(U_i-\bar{U})^2 = \frac{f}{n}S_U^2,
where
S_U^2 = \frac{1}{N-1}\sum_{i=1}^{N}(U_i-\bar{U})^2.
An estimate of S_U^2 is based on u_i = y_i - \hat{R}x_i, for which
u_i - \bar{u} = (y_i-\bar{y}) - \hat{R}(x_i-\bar{x}),
so that
s_u^2 = \frac{1}{n-1}\sum_{i=1}^{n}(u_i-\bar{u})^2 = s_y^2 + \hat{R}^2 s_x^2 - 2\hat{R}s_{xy}, \qquad \hat{R} = \frac{\bar{y}}{\bar{x}}.
Thus an estimate of MSE(\hat{\bar{Y}}_R) is
\widehat{MSE}(\hat{\bar{Y}}_R) = \frac{f}{n}\left(s_y^2 + \hat{R}^2 s_x^2 - 2\hat{R}s_{xy}\right).
Confidence interval of the ratio estimator
If the sample is large, so that the normal approximation is applicable, then the 100(1-\alpha)% confidence intervals of \bar{Y} and R are
\left(\hat{\bar{Y}}_R - Z_{\alpha/2}\sqrt{\widehat{Var}(\hat{\bar{Y}}_R)},\;\; \hat{\bar{Y}}_R + Z_{\alpha/2}\sqrt{\widehat{Var}(\hat{\bar{Y}}_R)}\right)
and
\left(\hat{R} - Z_{\alpha/2}\sqrt{\widehat{Var}(\hat{R})},\;\; \hat{R} + Z_{\alpha/2}\sqrt{\widehat{Var}(\hat{R})}\right)
respectively, where Z_{\alpha/2} is the normal deviate chosen for the given value of the confidence coefficient (1-\alpha).
Other methods can also be used for finding the confidence limits; see Cochran (1977, Chapter 6, page 156) for more details.
Assume that
(i) the relationship between y_i and x_i is linear and passes through the origin, i.e.,
y_i = \beta x_i + e_i,
where the e_i's are independent with E(e_i \mid x_i) = 0 and \beta is the slope parameter, and
(ii) Var(e_i \mid x_i) \propto x_i.
Proof: Consider a linear estimator of \beta of the form \hat{\beta} = \sum_{i=1}^{n}\lambda_i y_i, where y_i = \beta x_i + e_i. Then
E(\hat{\beta}) = \beta \;\; \text{when} \;\; \sum_{i=1}^{n}\lambda_i x_i = 1.
Consider the minimization of \sum_{i=1}^{n}\lambda_i^2\,Var(y_i \mid x_i) subject to the unbiasedness condition \sum_{i=1}^{n}\lambda_i x_i = 1, using the Lagrangian function with multiplier \mu (taking Var(y_i \mid x_i) = x_i without loss of generality):
\phi = \sum_{i=1}^{n}\lambda_i^2 x_i - 2\mu\left(\sum_{i=1}^{n}\lambda_i x_i - 1\right).
Now
\frac{\partial\phi}{\partial\lambda_i} = 0 \;\Rightarrow\; \lambda_i x_i = \mu x_i, \; i = 1, 2, ..., n,
\frac{\partial\phi}{\partial\mu} = 0 \;\Rightarrow\; \sum_{i=1}^{n}\lambda_i x_i = 1.
So \lambda_i = \mu for all i, and using \sum_{i=1}^{n}\lambda_i x_i = 1,
\mu\sum_{i=1}^{n} x_i = 1, \;\; \text{or} \;\; \mu = \frac{1}{n\bar{x}}, \;\; \lambda_i = \frac{1}{n\bar{x}},
and so
\hat{\beta} = \frac{\sum_{i=1}^{n} y_i}{n\bar{x}} = \frac{\bar{y}}{\bar{x}}.
Thus \hat{\beta} = \bar{y}/\bar{x} is the best in the class of linear and unbiased estimators of \beta under these assumptions.
Alternative approach:
This result can alternatively be derived as follows. The ratio estimator \hat{R} = \frac{\bar{y}}{\bar{x}} is the best linear unbiased estimator of R = \frac{\bar{Y}}{\bar{X}} if the following two conditions hold:
(i) For fixed x, E(y) = \beta x, i.e., the line of regression of y on x is a straight line passing through the origin.
(ii) For fixed x, Var(y \mid x) \propto x, i.e., Var(y \mid x) = \lambda x, where \lambda is a constant of proportionality.
Proof: Let y = (y_1, y_2, ..., y_n)' and x = (x_1, x_2, ..., x_n)' be the vectors of observations, and \Omega = diag(x_1, x_2, ..., x_n) the diagonal matrix with x_1, x_2, ..., x_n as the diagonal elements. Minimize the weighted sum of squares
S^2 = (y-\beta x)'\,\Omega^{-1}(y-\beta x) = \sum_{i=1}^{n}\frac{(y_i-\beta x_i)^2}{x_i}.
Solving
\frac{\partial S^2}{\partial\beta} = 0 \;\Rightarrow\; \sum_{i=1}^{n}(y_i-\hat{\beta}x_i) = 0,
or
\hat{\beta} = \frac{\bar{y}}{\bar{x}} = \hat{R}.
Thus \hat{R} is the best linear unbiased estimator of R. Consequently, \hat{R}\bar{X} = \hat{\bar{Y}}_R is the best linear unbiased estimator of \bar{Y} under these conditions.
Ratio estimator in stratified sampling
Suppose a population of size N is divided into k strata. The objective is to estimate the population mean \bar{Y} using the ratio method of estimation. In such a situation, a random sample of size n_i is drawn by SRSWOR from the i-th stratum of size N_i on the variable under study Y and the auxiliary variable X.
Let
y_{ij} : j-th observation on Y from the i-th stratum,
x_{ij} : j-th observation on X from the i-th stratum.
An estimator of \bar{Y} based on the philosophy of stratified sampling can be devised in the following two possible ways:
1. Separate ratio estimator
\hat{\bar{Y}}_{Rs} = \sum_{i=1}^{k}\frac{N_i}{N}\hat{\bar{Y}}_{Ri} = \sum_{i=1}^{k} w_i\hat{\bar{Y}}_{Ri} = \sum_{i=1}^{k} w_i\,\frac{\bar{y}_i}{\bar{x}_i}\bar{X}_i,
where
\bar{y}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij} : sample mean of Y from the i-th stratum,
\bar{x}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij} : sample mean of X from the i-th stratum,
\bar{X}_i = \frac{1}{N_i}\sum_{j=1}^{N_i} X_{ij} : population mean of X in the i-th stratum.
No assumption is made that the true ratio remains constant from stratum to stratum; this estimator requires knowledge of each \bar{X}_i.
2. Combined ratio estimator
\hat{\bar{Y}}_{Rc} = \frac{\bar{y}_{st}}{\bar{x}_{st}}\bar{X},
where
\bar{y}_{st} = \sum_{i=1}^{k} w_i\bar{y}_i, \qquad \bar{x}_{st} = \sum_{i=1}^{k} w_i\bar{x}_i,
and \bar{X} is the population mean of X based on all the N = \sum_{i=1}^{k} N_i units. It does not depend on the individual \bar{X}_i.
Note that there is an analogy between \bar{Y} = \sum_{i=1}^{k} w_i\bar{Y}_i and \hat{\bar{Y}}_{Rs} = \sum_{i=1}^{k} w_i\hat{\bar{Y}}_{Ri}.
From the earlier result, applied within the i-th stratum,
E(\hat{\bar{Y}}_{Ri}) = \bar{Y}_i + \frac{f_i}{n_i}\bar{Y}_i\left(C_{ix}^2-\rho_i C_{ix}C_{iy}\right),
where
f_i = \frac{N_i-n_i}{N_i}, \quad C_{iy}^2 = \frac{S_{iy}^2}{\bar{Y}_i^2}, \quad C_{ix}^2 = \frac{S_{ix}^2}{\bar{X}_i^2},
S_{iy}^2 = \frac{1}{N_i-1}\sum_{j=1}^{N_i}(Y_{ij}-\bar{Y}_i)^2, \quad S_{ix}^2 = \frac{1}{N_i-1}\sum_{j=1}^{N_i}(X_{ij}-\bar{X}_i)^2,
\rho_i : correlation coefficient between the observations on X and Y in the i-th stratum,
C_{ix} : coefficient of variation of the X values in the i-th stratum.
Thus
E(\hat{\bar{Y}}_{Rs}) = \sum_{i=1}^{k} w_i E(\hat{\bar{Y}}_{Ri})
= \sum_{i=1}^{k} w_i\left[\bar{Y}_i + \frac{f_i}{n_i}\bar{Y}_i\left(C_{ix}^2-\rho_i C_{ix}C_{iy}\right)\right]
= \bar{Y} + \sum_{i=1}^{k}\frac{w_i\bar{Y}_i f_i}{n_i}\left(C_{ix}^2-\rho_i C_{ix}C_{iy}\right).
Assuming the finite population correction to be approximately 1, n_i = n/k, and C_{ix}, C_{iy} and \rho_i the same for all strata, the bias of \hat{\bar{Y}}_{Rs} is of order k/n, i.e., it can build up with the number of strata.
Now we derive the MSE of \hat{\bar{Y}}_{Rs}. We have already derived the MSE of \hat{\bar{Y}}_R earlier as
MSE(\hat{\bar{Y}}_R) = \frac{f\bar{Y}^2}{n}\left(C_X^2+C_Y^2-2\rho C_X C_Y\right) = \frac{f}{n(N-1)}\sum_{i=1}^{N}(Y_i-RX_i)^2, \qquad R = \frac{\bar{Y}}{\bar{X}}.
Thus, for the i-th stratum,
MSE(\hat{\bar{Y}}_{Ri}) = \frac{f_i\bar{Y}_i^2}{n_i}\left(C_{ix}^2+C_{iy}^2-2\rho_i C_{ix}C_{iy}\right) = \frac{f_i}{n_i(N_i-1)}\sum_{j=1}^{N_i}(Y_{ij}-R_i X_{ij})^2,
where R_i = \frac{\bar{Y}_i}{\bar{X}_i}, and so
MSE(\hat{\bar{Y}}_{Rs}) = \sum_{i=1}^{k} w_i^2\,MSE(\hat{\bar{Y}}_{Ri})
= \sum_{i=1}^{k}\frac{w_i^2 f_i}{n_i}\bar{Y}_i^2\left(C_{ix}^2+C_{iy}^2-2\rho_i C_{ix}C_{iy}\right)
= \sum_{i=1}^{k}\frac{w_i^2 f_i}{n_i(N_i-1)}\sum_{j=1}^{N_i}(Y_{ij}-R_i X_{ij})^2.
An estimate of MSE(\hat{\bar{Y}}_{Rs}) can be found by substituting the unbiased estimators s_{ix}^2, s_{iy}^2 and s_{ixy} for S_{ix}^2, S_{iy}^2 and S_{ixy} respectively in the i-th stratum, and estimating R_i = \bar{Y}_i/\bar{X}_i by r_i = \bar{y}_i/\bar{x}_i:
\widehat{MSE}(\hat{\bar{Y}}_{Rs}) = \sum_{i=1}^{k}\frac{w_i^2 f_i}{n_i}\left(s_{iy}^2 + r_i^2 s_{ix}^2 - 2r_i s_{ixy}\right).
Also,
\widehat{MSE}(\hat{\bar{Y}}_{Rs}) = \sum_{i=1}^{k}\frac{w_i^2 f_i}{n_i(n_i-1)}\sum_{j=1}^{n_i}(y_{ij}-r_i x_{ij})^2.
For the combined ratio estimator, here
\hat{\bar{Y}}_{Rc} = \frac{\bar{y}_{st}}{\bar{x}_{st}}\bar{X} = \hat{R}_c\bar{X}, \qquad \hat{R}_c = \frac{\bar{y}_{st}}{\bar{x}_{st}} = \frac{\sum_{i=1}^{k} w_i\bar{y}_i}{\sum_{i=1}^{k} w_i\bar{x}_i}.
It is difficult to find the exact expression of bias and mean squared error of YˆRc , so we find their
approximate expressions.
Define
\epsilon_1 = \frac{\bar{y}_{st}-\bar{Y}}{\bar{Y}}, \qquad \epsilon_2 = \frac{\bar{x}_{st}-\bar{X}}{\bar{X}},
so that
E(\epsilon_1) = 0, \qquad E(\epsilon_2) = 0,
E(\epsilon_1^2) = \sum_{i=1}^{k}\frac{N_i-n_i}{N_i n_i}\cdot\frac{w_i^2 S_{iy}^2}{\bar{Y}^2} = \sum_{i=1}^{k}\frac{f_i w_i^2 S_{iy}^2}{n_i\bar{Y}^2},
E(\epsilon_2^2) = \sum_{i=1}^{k}\frac{f_i w_i^2 S_{ix}^2}{n_i\bar{X}^2},
E(\epsilon_1\epsilon_2) = \sum_{i=1}^{k}\frac{f_i w_i^2 S_{ixy}}{n_i\bar{X}\bar{Y}}.
Thus, assuming |\epsilon_2| < 1,
\hat{\bar{Y}}_{Rc} = \frac{(1+\epsilon_1)\bar{Y}}{(1+\epsilon_2)\bar{X}}\bar{X}
= \bar{Y}(1+\epsilon_1)(1-\epsilon_2+\epsilon_2^2-\cdots)
= \bar{Y}(1+\epsilon_1-\epsilon_2-\epsilon_1\epsilon_2+\epsilon_2^2+\cdots).
Retaining the terms up to order two, for the same reason as in the case of \hat{\bar{Y}}_R,
\hat{\bar{Y}}_{Rc} \approx \bar{Y}(1+\epsilon_1-\epsilon_2-\epsilon_1\epsilon_2+\epsilon_2^2),
\hat{\bar{Y}}_{Rc} - \bar{Y} \approx \bar{Y}(\epsilon_1-\epsilon_2-\epsilon_1\epsilon_2+\epsilon_2^2).
MSE(\hat{\bar{Y}}_{Rc}) = E(\hat{\bar{Y}}_{Rc}-\bar{Y})^2
\approx \bar{Y}^2 E(\epsilon_1-\epsilon_2)^2
= \bar{Y}^2\left[E(\epsilon_1^2)+E(\epsilon_2^2)-2E(\epsilon_1\epsilon_2)\right]
= \bar{Y}^2\sum_{i=1}^{k}\frac{f_i w_i^2}{n_i}\left(\frac{S_{ix}^2}{\bar{X}^2}+\frac{S_{iy}^2}{\bar{Y}^2}-\frac{2S_{ixy}}{\bar{X}\bar{Y}}\right)
= \sum_{i=1}^{k}\frac{f_i w_i^2}{n_i}\left(R^2 S_{ix}^2 + S_{iy}^2 - 2R\rho_i S_{ix}S_{iy}\right),
where R = \frac{\bar{Y}}{\bar{X}} and S_{ixy} = \rho_i S_{ix}S_{iy}.
An estimate of MSE(\hat{\bar{Y}}_{Rc}) can be obtained by replacing S_{ix}^2, S_{iy}^2 and S_{ixy} by their unbiased estimators s_{ix}^2, s_{iy}^2 and s_{ixy} respectively, and R = \bar{Y}/\bar{X} by \hat{R}_c = \bar{y}_{st}/\bar{x}_{st}. Thus the following estimate is obtained:
\widehat{MSE}(\hat{\bar{Y}}_{Rc}) = \sum_{i=1}^{k}\frac{w_i^2 f_i}{n_i}\left(\hat{R}_c^2 s_{ix}^2 + s_{iy}^2 - 2\hat{R}_c s_{ixy}\right),
where \bar{X} is known.
Comparison of separate and combined ratio estimators: the two MSEs derived above differ only in the ratios involved,
R_i = \frac{\bar{Y}_i}{\bar{X}_i} \; \text{in} \; MSE(\hat{\bar{Y}}_{Rs}), \qquad R = \frac{\bar{Y}}{\bar{X}} \; \text{in} \; MSE(\hat{\bar{Y}}_{Rc}).
Thus
MSE(\hat{\bar{Y}}_{Rc}) - MSE(\hat{\bar{Y}}_{Rs})
= \sum_{i=1}^{k}\frac{w_i^2 f_i}{n_i}\left[(R^2-R_i^2)S_{ix}^2 - 2(R-R_i)\rho_i S_{ix}S_{iy}\right]
= \sum_{i=1}^{k}\frac{w_i^2 f_i}{n_i}\left[(R-R_i)^2 S_{ix}^2 + 2(R-R_i)\left(R_i S_{ix}^2 - \rho_i S_{ix}S_{iy}\right)\right].
The difference depends on:
(i) the magnitude of the difference between the strata ratios R_i and the population ratio as a whole, R;
(ii) the value of \left(R_i S_{ix}^2 - \rho_i S_{ix}S_{iy}\right), which is usually small and vanishes when the regression line of y on x is linear and passes through the origin within each stratum. In such a case
MSE(\hat{\bar{Y}}_{Rc}) \ge MSE(\hat{\bar{Y}}_{Rs}).
So unless Ri varies considerably, the use of YˆRc would provide an estimate of Y with negligible bias
If Ri R, YˆRc can be as precise as YˆRs but its bias will be small. It also does not require
knowledge of X1 , X 2 ,..., X k .
Ratio-type estimators that are unbiased or have smaller bias than R̂, Ŷ_R or Ŷ_R(tot) are useful in sample surveys. There are several approaches to derive such estimators. We consider here two such approaches:
1. Unbiased ratio-type estimator:

Let R_i = Y_i/X_i, i = 1, 2, ..., N. An estimator based on the mean of these ratios in the sample is

Ŷ_R0 = (1/n) ∑_{i=1}^{n} R_i · X̄ = r̄ X̄,

where r̄ = (1/n) ∑_{i=1}^{n} R_i. Its bias is

Bias(Ŷ_R0) = E(Ŷ_R0) − Ȳ = E(r̄) X̄ − Ȳ.

Since under SRSWOR

E(r̄) = (1/n) ∑_{i=1}^{n} [(1/N) ∑_{j=1}^{N} R_j] = (1/N) ∑_{i=1}^{N} R_i = R̄,

we get

Bias(Ŷ_R0) = R̄ X̄ − Ȳ.
Using the result that under SRSWOR, Cov(x̄, ȳ) = [(N − n)/(Nn)] S_XY, it also follows that

Cov(r̄, x̄) = [(N − n)/(Nn)] · (1/(N − 1)) ∑_{i=1}^{N} (R_i − R̄)(X_i − X̄)
= [(N − n)/(Nn)] · (1/(N − 1)) (∑_{i=1}^{N} R_i X_i − N R̄ X̄)
= [(N − n)/(Nn)] · (1/(N − 1)) (∑_{i=1}^{N} (Y_i/X_i) X_i − N R̄ X̄)
= [(N − n)/(Nn)] · (1/(N − 1)) (N Ȳ − N R̄ X̄)
= −[(N − n)/(Nn)] · [N/(N − 1)] [Bias(Ŷ_R0)].

Thus

Bias(Ŷ_R0) = −[Nn(N − 1)/(N(N − n))] Cov(r̄, x̄)
= −[Nn(N − 1)/(N(N − n))] · [(N − n)/(Nn)] S_RX
= −[(N − 1)/N] S_RX,

where S_RX = (1/(N − 1)) ∑_{i=1}^{N} (R_i − R̄)(X_i − X̄).
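The bias identity just derived can be verified numerically by enumerating every SRSWOR sample of a tiny hypothetical population:

```python
# Numerical check of Bias(Ŷ_R0) = −((N−1)/N)·S_RX by enumerating all SRSWOR
# samples of size n = 2 from a hypothetical population of N = 4 units.
from itertools import combinations

X = [2.0, 4.0, 6.0, 8.0]
Y = [3.0, 5.0, 10.0, 12.0]
N, n = len(X), 2

Xbar = sum(X) / N
Ybar = sum(Y) / N
R = [y / x for x, y in zip(X, Y)]                # unit ratios R_i = Y_i / X_i
Rbar = sum(R) / N
S_RX = sum((r - Rbar) * (x - Xbar) for r, x in zip(R, X)) / (N - 1)

# E(Ŷ_R0) over all C(N, n) equally likely SRSWOR samples, with Ŷ_R0 = r̄·X̄.
samples = list(combinations(range(N), n))
E_YR0 = sum(sum(R[i] for i in s) / n * Xbar for s in samples) / len(samples)

bias = E_YR0 - Ybar                              # exact bias of Ŷ_R0
rhs = -(N - 1) / N * S_RX                        # the identity's right side
```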
The following result helps in obtaining an unbiased estimator of the population mean. Under the SRSWOR set-up,

E(s_xy) = S_xy,

where
s_xy = (1/(n − 1)) ∑_{i=1}^{n} (x_i − x̄)(y_i − ȳ),
S_xy = (1/(N − 1)) ∑_{i=1}^{N} (X_i − X̄)(Y_i − Ȳ).
2. Jackknife method for obtaining a ratio estimate with lower bias
The jackknife method is used to remove the term of order 1/n from the bias of an estimator. Suppose the sample of size n is divided at random into g groups of size m each, so that n = mg, and let R̂ denote the ratio estimate based on the entire sample.

Let R̂_i* = ∑* y_j / ∑* x_j, where ∑* denotes that the summation is over all values of the sample except the ith group. So R̂_i* is based on a simple random sample of size m(g − 1), and we can express

E(R̂_i*) = R + a₁/(m(g − 1)) + a₂/(m²(g − 1)²) + ...,

where a₁, a₂, ... are constants not depending on the sample size, or

E[(g − 1) R̂_i*] = (g − 1)R + a₁/m + a₂/(m²(g − 1)) + ...

Thus

E[g R̂ − (g − 1) R̂_i*] = R − a₂/(g(g − 1)m²) + ...,

or

E[g R̂ − (g − 1) R̂_i*] = R − (a₂/n²)(g/(g − 1)) + ...

Hence the bias of g R̂ − (g − 1) R̂_i* is of order 1/n². Now g estimates of this form can be obtained, one for each group. The jackknife or Quenouille's estimator is the average of these g estimators:

R̂_Q = g R̂ − (g − 1) (∑_{i=1}^{g} R̂_i*) / g.
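Quenouille's estimator can be sketched as follows (hypothetical data; n = mg = 6 with g = 3 groups of size m = 2):

```python
# Sketch of Quenouille's jackknife ratio estimator with g groups of size m.
x = [2.0, 3.0, 5.0, 4.0, 6.0, 8.0]
y = [3.0, 5.0, 9.0, 7.0, 11.0, 15.0]
g, m = 3, 2

R_hat = sum(y) / sum(x)                      # ratio estimate from the full sample

R_star = []                                  # R̂_i*: leave the ith group out
for i in range(g):
    keep = [j for j in range(g * m) if not (i * m <= j < (i + 1) * m)]
    R_star.append(sum(y[j] for j in keep) / sum(x[j] for j in keep))

# Quenouille's estimator: R̂_Q = g·R̂ − (g−1)·(average of the R̂_i*)
R_Q = g * R_hat - (g - 1) * sum(R_star) / g
```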
A large-sample variance of Ŷ_HR is obtained as follows. We assume that n and N are large enough so that n/(n − 1) ≈ 1 and r̄ ≈ R̄. Then Ŷ_HR behaves like r̄X̄ + (ȳ − R̄x̄), and

Var(Ŷ_HR) ≈ [(1 − f)/n] (S_Y² + R̄² S_X² − 2 R̄ S_XY),

where f = n/N. Recall that the ratio estimator improves upon the sample mean only when ρ > (1/2)(C_x/C_y), which with positively correlated x and y is usually the case. This shows that if the auxiliary information is such that ρ < (1/2)(C_x/C_y), then we cannot use the ratio method of estimation to improve the sample mean as an estimator of the population mean. So there is a need for another type of estimator which also makes use of information on an auxiliary variable x. The product estimator is an attempt in this direction.
The product estimator of the population mean Ȳ is defined as

Ŷ_P = ȳ x̄ / X̄.

Let ε₀ = (ȳ − Ȳ)/Ȳ and ε₁ = (x̄ − X̄)/X̄. We write Ŷ_P as

Ŷ_P = ȳ x̄ / X̄ = Ȳ(1 + ε₀)(1 + ε₁) = Ȳ(1 + ε₀ + ε₁ + ε₀ε₁).

Then

Bias(Ŷ_P) = (1/X̄) Cov(ȳ, x̄) = [(1 − f)/(nX̄)] S_xy,

which shows that the bias of Ŷ_P decreases as n increases. The bias of Ŷ_P can be estimated by

Bias_est(Ŷ_P) = [(1 − f)/(nX̄)] s_xy.
Writing Ŷ_P in terms of ε₀ and ε₁ and assuming terms in (ε₀, ε₁) of degree greater than two to be negligible, the variance of the product estimator up to the second order of approximation is

Var(Ŷ_P) = [(1 − f)/n] (S_y² + R² S_x² + 2 R S_xy),

which shows that Ŷ_P is more efficient than the simple mean ȳ

if ρ < −(1/2)(C_x/C_y) when R > 0,

and

if ρ > (1/2)(C_x/C_y) when R < 0.
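A minimal sketch of the product estimator on hypothetical data with negatively correlated x and y, the situation in which the product form is intended to help:

```python
# Sketch of the product estimator Ŷ_P = ȳ·x̄ / X̄ (hypothetical sample).
X_bar = 10.0                         # known population mean of x
x = [12.0, 8.0, 11.0, 9.0]           # sample on the auxiliary variable
y = [4.0, 9.0, 5.0, 8.0]             # sample on the study variable (ρ < 0 here)

x_mean = sum(x) / len(x)
y_mean = sum(y) / len(y)
Y_P = y_mean * x_mean / X_bar
```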
Multivariate Ratio Estimator

Let y be the study variable and X₁, X₂, ..., X_p be p auxiliary variables assumed to be correlated with y. Further, it is assumed that X₁, X₂, ..., X_p are independent. Let Ȳ, X̄₁, X̄₂, ..., X̄_p be the population means of the variables y, X₁, X₂, ..., X_p. We assume that an SRSWOR of size n is selected from the population.

(ii) Variance of the multivariate ratio estimator:

Var(Ŷ_MR) = [(1 − f)/n] Ȳ² ∑_{i=1}^{p} w_i² (C₀² + C_i² − 2 ρ_i C₀ C_i).
Chapter 6
Regression Method of Estimation

The ratio method of estimation uses auxiliary information that is correlated with the study variable to improve the precision, and it results in improved estimators when the regression of y on x is linear and passes through the origin. When the regression of y on x is linear, it is not necessary that the line should always pass through the origin. Under such conditions, it is more appropriate to use regression-type estimators.

In the ratio method, the conventional estimator, the sample mean ȳ, was improved by multiplying it by the factor X̄/x̄, where x̄ is an unbiased estimator of the population mean X̄ of the auxiliary variable. Now we consider another idea based on a difference.
Consider the estimator

Ŷ* = ȳ + λ(x̄ − X̄),

which is an unbiased estimator of Ȳ for any constant λ. Now find λ such that Var(Ŷ*) is minimum:

Var(Ŷ*) = Var(ȳ) + λ² Var(x̄) + 2λ Cov(x̄, ȳ),

which is minimized at λ = −Cov(x̄, ȳ)/Var(x̄) = −S_XY/S_X².

Note that the value of the regression coefficient β in a linear regression of y on x, obtained by minimizing ∑_{i=1}^{n} e_i² based on n data points (x_i, y_i), i = 1, 2, ..., n, is β̂ = Cov(x, y)/Var(x) = s_xy/s_x². Thus the optimum value of λ is the same as the regression coefficient of y on x with a negative sign, i.e., λ = −β. Substituting it gives

Ŷ_reg = ȳ + β(X̄ − x̄),

which is the regression estimator of Ȳ, and the procedure of estimation is called the regression method of estimation.
The variance of Ŷ_reg turns out to be proportional to (1 − ρ²), where ρ(x, y) is the correlation coefficient between x and y. So Ŷ_reg would be efficient if x and y are highly correlated. The estimator Ŷ_reg is more efficient than ȳ if ρ(x, y) ≠ 0, which generally holds.

First consider the case when the regression coefficient is preassigned as β₀:

Ŷ_reg = ȳ + β₀(X̄ − x̄).

Bias of Ŷ_reg:

E(Ŷ_reg) = E(ȳ) + β₀[X̄ − E(x̄)] = Ȳ + β₀(X̄ − X̄) = Ȳ.

Thus Ŷ_reg is an unbiased estimator of Ȳ when β is known.
Variance of Ŷ_reg:

Var(Ŷ_reg) = E[Ŷ_reg − E(Ŷ_reg)]²
= E[ȳ + β₀(X̄ − x̄) − Ȳ]²
= E[(ȳ − Ȳ) − β₀(x̄ − X̄)]²
= E(ȳ − Ȳ)² + β₀² E(x̄ − X̄)² − 2β₀ E[(x̄ − X̄)(ȳ − Ȳ)]
= Var(ȳ) + β₀² Var(x̄) − 2β₀ Cov(x̄, ȳ)
= (f/n)(S_Y² + β₀² S_X² − 2β₀ S_XY)
= (f/n)(S_Y² + β₀² S_X² − 2β₀ ρ S_X S_Y),

where
f = (N − n)/N,
S_X² = (1/(N − 1)) ∑_{i=1}^{N} (X_i − X̄)²,
S_Y² = (1/(N − 1)) ∑_{i=1}^{N} (Y_i − Ȳ)².
Comparing with Var(ȳ) = (f/n) S_Y², the estimator Ŷ_reg is more efficient than ȳ if

β₀² S_X² − 2β₀ S_XY ≤ 0, i.e., β₀ (β₀ − 2 S_XY/S_X²) ≤ 0,

which is possible when either

β₀ ≥ 0 and β₀ ≤ 2 S_XY/S_X²,

or

β₀ ≤ 0 and β₀ ≥ 2 S_XY/S_X².
Optimal value of β₀:

Minimizing Var(Ŷ_reg) with respect to β₀,

∂Var(Ŷ_reg)/∂β₀ = (f/n)(2β₀ S_X² − 2 S_XY) = 0
⇒ β₀ = S_XY/S_X² = ρ S_Y/S_X.

The minimum value of the variance of Ŷ_reg with the optimum value β₀(opt) = ρ S_Y/S_X is

Var_min(Ŷ_reg) = (f/n)[S_Y² + ρ² (S_Y²/S_X²) S_X² − 2 (ρ S_Y/S_X) ρ S_X S_Y]
= (f/n) S_Y² (1 − ρ²).

Since −1 ≤ ρ ≤ 1, it follows that Var_min(Ŷ_reg) ≤ Var(ȳ), so the regression estimator with the optimum β₀ is always at least as efficient as the sample mean.

Departure from β₀(opt): the loss of efficiency from using a β₀ different from β₀(opt) = ρ S_Y/S_X can be studied by writing Var(Ŷ_reg) in terms of the departure β₀ − β₀(opt).
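The regression estimator with the slope taken from the sample, together with the variance-reduction factor (1 − ρ²), can be sketched as follows (hypothetical data):

```python
# Sketch of the regression estimator ȳ + β₀(X̄ − x̄) with β₀ replaced by its
# sample analogue s_xy / s_x², plus the factor (1 − ρ²) from Var_min.
X_bar = 20.0                         # known population mean of x
x = [18.0, 22.0, 19.0, 21.0]
y = [35.0, 44.0, 38.0, 41.0]
n = len(x)

x_m = sum(x) / n
y_m = sum(y) / n
s_xx = sum((xi - x_m) ** 2 for xi in x) / (n - 1)
s_yy = sum((yi - y_m) ** 2 for yi in y) / (n - 1)
s_xy = sum((xi - x_m) * (yi - y_m) for xi, yi in zip(x, y)) / (n - 1)

beta0 = s_xy / s_xx                  # sample analogue of the optimal slope
Y_reg = y_m + beta0 * (X_bar - x_m)

rho2 = s_xy ** 2 / (s_xx * s_yy)     # squared sample correlation
reduction = 1 - rho2                 # factor multiplying S_Y² in Var_min
```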
Estimate of variance:

An estimate of Var(Ŷ_reg) is

Var_est(Ŷ_reg) = [f/(n(n − 1))] ∑_{i=1}^{n} [(y_i − ȳ) − β₀(x_i − x̄)]²
= (f/n)(s_y² + β₀² s_x² − 2β₀ s_xy).

When β is not known, it is estimated from the sample as β̂ = s_xy/s_x², giving

Ŷ_reg = ȳ + β̂(X̄ − x̄).

It is difficult to find the exact expressions of E(Ŷ_reg) and Var(Ŷ_reg) in this case, so we approximate them using large-sample approximations.
Define

ε₀ = (ȳ − Ȳ)/Ȳ, ε₁ = (x̄ − X̄)/X̄, ε₂ = (s_xy − S_XY)/S_XY, ε₃ = (s_x² − S_X²)/S_X²,

so that

E(ε₀) = 0, E(ε₁) = 0, E(ε₂) = 0, E(ε₃) = 0,

and

E(ε₀²) = (f/n) C_Y²,
E(ε₁²) = (f/n) C_X²,
E(ε₀ε₁) = (f/n) ρ C_X C_Y.

Then

Ŷ_reg = ȳ + (s_xy/s_x²)(X̄ − x̄)
= Ȳ(1 + ε₀) − [S_XY(1 + ε₂)/(S_X²(1 + ε₃))] ε₁ X̄
= Ȳ(1 + ε₀) − β X̄ ε₁ (1 + ε₂)(1 + ε₃)⁻¹,

where β = S_XY/S_X² is the population regression coefficient.

Assuming |ε₃| < 1, expanding (1 + ε₃)⁻¹, retaining the terms up to the second power of the ε's and ignoring others, we have

Ŷ_reg ≈ Ȳ(1 + ε₀) − β X̄ ε₁ (1 + ε₂ − ε₃).
Bias of Ŷ_reg:

Now the bias of Ŷ_reg up to the second order of approximation is

Bias(Ŷ_reg) = E(Ŷ_reg) − Ȳ = −β X̄ [E(ε₁ε₂) − E(ε₁ε₃)] = −(f/n) β (μ₂₁/S_XY − μ₃₀/S_X²),

where f = (N − n)/N and the (r, s)th cross-product moment is

μ_rs = E[(x − X̄)^r (y − Ȳ)^s],

so that

μ₂₁ = E[(x − X̄)²(y − Ȳ)], μ₃₀ = E[(x − X̄)³].

Also, the bias can be written exactly as

Bias(Ŷ_reg) = E(ȳ) + E[β̂(X̄ − x̄)] − Ȳ
= X̄ E(β̂) − E(β̂ x̄)
= X̄ E(β̂) − [Cov(β̂, x̄) + E(β̂) E(x̄)]
= −Cov(β̂, x̄).
MSE of Ŷ_reg:

To obtain the MSE of Ŷ_reg, consider E(Ŷ_reg − Ȳ)², retaining the terms of the ε's up to the second power and ignoring others. This gives

MSE(Ŷ_reg) ≈ (f/n) S_Y² (1 − ρ²).

Comparison of Ŷ_reg with the ratio estimate and the SRS sample mean:

MSE(Ŷ_R) − MSE(Ŷ_reg) = (f/n)(S_Y² + R²S_X² − 2RρS_XS_Y) − (f/n)S_Y²(1 − ρ²) = (f/n)(ρS_Y − RS_X)² ≥ 0,

and Var(ȳ) − MSE(Ŷ_reg) = (f/n) ρ² S_Y² ≥ 0. So the regression estimate is always superior to the ratio estimate and to the sample mean up to the second order of approximation.
Regression estimates in stratified sampling:

Under the set-up of stratified sampling, let the population of N sampling units be divided into k strata. The strata sizes are N₁, N₂, ..., N_k such that ∑_{i=1}^{k} N_i = N. A sample of size n_i on (x_ij, y_ij), j = 1, 2, ..., n_i, is drawn from the ith stratum (i = 1, 2, ..., k) by SRSWOR, where x_ij and y_ij denote the jth unit from the ith stratum on the auxiliary and the study variable, respectively.

One strategy is to apply the regression estimator

Ŷ_reg = ȳ + β₀(X̄ − x̄)

in each stratum separately, i.e., the regression estimate in the ith stratum is

Ŷ_reg(i) = ȳ_i + β_i(X̄_i − x̄_i).

Then

Ŷ_sreg = ∑_{i=1}^{k} (N_i/N) Ŷ_reg(i) = ∑_{i=1}^{k} w_i [ȳ_i + β_i(X̄_i − x̄_i)],

where β_i = S_ixy/S_ix² and w_i = N_i/N.
In this approach, the regression estimator is obtained separately in each stratum and the estimates are then combined using the philosophy of stratified sampling. So Ŷ_sreg is termed the separate regression estimator.

Another strategy is to estimate x̄ and ȳ in Ŷ_reg by the respective stratified means. Replacing x̄ by x̄_st = ∑_{i=1}^{k} w_i x̄_i and ȳ by ȳ_st = ∑_{i=1}^{k} w_i ȳ_i, we have

Ŷ_creg = ȳ_st + β(X̄ − x̄_st).

In this case, all the sample information is combined first and then implemented in the regression estimator, so Ŷ_creg is termed the combined regression estimator.
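The two estimators can be contrasted on hypothetical two-stratum data with preassigned slopes:

```python
# Sketch: separate vs combined regression estimators for two strata
# (all numbers hypothetical; slopes are preassigned, not estimated).
w = [0.5, 0.5]                       # stratum weights N_i / N
X_bar = [10.0, 30.0]                 # known stratum means of x
x_s = [[9.0, 11.5], [28.0, 31.0]]
y_s = [[20.0, 24.0], [55.0, 65.0]]
beta_s = [2.0, 2.5]                  # preassigned within-stratum slopes β₀i
beta_c = 2.2                         # preassigned common slope β₀

mean = lambda v: sum(v) / len(v)

# Separate: one regression adjustment in each stratum, then weight.
Y_sreg = sum(wi * (mean(yi) + bi * (Xi - mean(xi)))
             for wi, xi, yi, Xi, bi in zip(w, x_s, y_s, X_bar, beta_s))

# Combined: stratified means first, one regression adjustment at the end.
X_bar_pop = sum(wi * Xi for wi, Xi in zip(w, X_bar))
y_st = sum(wi * mean(yi) for wi, yi in zip(w, y_s))
x_st = sum(wi * mean(xi) for wi, xi in zip(w, x_s))
Y_creg = y_st + beta_c * (X_bar_pop - x_st)
```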
Case when β is preassigned as β₀:

We consider here the case where β is preassigned as β₀. The other case, when β is estimated as s_xy/s_x², can be dealt with using the same approach based on defining various ε's and using the approximation theory as in the case of Ŷ_reg.
Variance of Ŷ_sreg:

Var(Ŷ_sreg) = E[Ŷ_sreg − E(Ŷ_sreg)]²
= E[∑_{i=1}^{k} w_i ȳ_i + ∑_{i=1}^{k} w_i β₀i (X̄_i − x̄_i) − Ȳ]²
= E[∑_{i=1}^{k} w_i (ȳ_i − Ȳ_i) − ∑_{i=1}^{k} w_i β₀i (x̄_i − X̄_i)]²
= ∑_{i=1}^{k} w_i² E(ȳ_i − Ȳ_i)² + ∑_{i=1}^{k} w_i² β₀i² E(x̄_i − X̄_i)² − 2 ∑_{i=1}^{k} w_i² β₀i E[(x̄_i − X̄_i)(ȳ_i − Ȳ_i)]
= ∑_{i=1}^{k} w_i² [Var(ȳ_i) + β₀i² Var(x̄_i) − 2β₀i Cov(x̄_i, ȳ_i)]
= ∑_{i=1}^{k} (w_i² f_i/n_i)(S_iY² + β₀i² S_iX² − 2β₀i S_iXY).
Thus an unbiased estimator of the variance can be obtained by replacing S_iX², S_iY² and S_iXY by their respective unbiased estimators:

Var_est(Ŷ_sreg) = ∑_{i=1}^{k} (w_i² f_i/n_i)(s_iy² + β₀i² s_ix² − 2β₀i s_ixy),

and at the optimum β₀i = s_ixy/s_ix²,

Var_est,min(Ŷ_sreg) = ∑_{i=1}^{k} (w_i² f_i/n_i)(s_iy² − β₀i² s_ix²).

For the combined regression estimator with preassigned β₀,

E(Ŷ_creg) = E(ȳ_st) + β₀[X̄ − E(x̄_st)] = Ȳ + β₀(X̄ − X̄) = Ȳ.

Thus Ŷ_creg is an unbiased estimator of Ȳ.
The variance of Ŷ_creg is

Var(Ŷ_creg) = ∑_{i=1}^{k} (w_i² f_i/n_i)(S_iY² + β₀² S_iX² − 2β₀ S_iXY),

and Var(Ŷ_creg) is minimum when

β₀ = [∑_{i=1}^{k} (w_i² f_i/n_i) S_iXY] / [∑_{i=1}^{k} (w_i² f_i/n_i) S_iX²].

Comparison of Ŷ_sreg and Ŷ_creg:

Note that, writing β_i0 = S_iXY/S_iX² for the within-stratum optimum slopes,

Var(Ŷ_creg) − Var(Ŷ_sreg) = ∑_{i=1}^{k} (w_i² f_i/n_i)[(β₀² − β_i0²) S_iX² − 2(β₀ − β_i0) S_iXY]
= ∑_{i=1}^{k} (β_i0 − β₀)² (w_i² f_i/n_i) S_iX²
≥ 0.

So the separate regression estimator is more efficient than the combined one whenever the regression coefficients differ across strata. If the regression line of y on x is approximately linear and the regression coefficients do not vary much among the strata, the two estimators are almost equally efficient.
Chapter 7
Varying Probability Sampling
The simple random sampling scheme provides a random sample where every unit in the population has
equal probability of selection. Under certain circumstances, more efficient estimators are obtained by
assigning unequal probabilities of selection to the units in the population. This type of sampling is
known as varying probability sampling scheme.
If Y is the variable under study and X is an auxiliary variable related to Y, then in the most commonly
used varying probability scheme, the units are selected with probability proportional to the value of X,
called its size. This is termed probability proportional to a given measure of size (pps) sampling. If
the sampling units vary considerably in size, then SRS does not take into account the possible
importance of the larger units in the population. A large unit, i.e., a unit with a large value of Y, contributes
more to the population total than the units with smaller values, so it is natural to expect that a selection
scheme which assigns more probability of inclusion in a sample to the larger units than to the smaller
units would provide more efficient estimators than the estimators which provide equal probability to all
the units. This is accomplished through pps sampling.
Note that the “size” considered is the value of auxiliary variable X and not the value of study variable Y.
For example, in an agricultural survey, the yield depends on the area under cultivation, so bigger areas are likely to contribute more towards the population total and the value of the area can be taken as the size of the auxiliary variable. The cultivated area in a previous period can also be taken as the size while estimating the yield of a crop. Similarly, in an industrial survey,
the number of workers in a factory can be considered as the measure of size when studying the industrial
output from the respective factory.
If the probabilities of selection are taken into account at the estimation stage, then it is possible to obtain unbiased estimators.
In pps sampling, there are two possibilities to draw the sample, i.e., with replacement and without
replacement.
PPS without replacement (WOR) is more complex than PPS with replacement (WR) . We consider both
the cases separately.
In selection of a sample with varying probabilities, the procedure is to associate with each unit a set of
consecutive natural numbers, the size of the set being proportional to the desired probability.
If X₁, X₂, ..., X_N are the positive integers proportional to the probabilities assigned to the N units in the population, then a possible way is to associate with the units the cumulative totals of their sizes. The units are then selected based on the values of the cumulative totals. This is illustrated in the following table:
Unit   | Size    | Cumulative size
1      | X₁      | T₁ = X₁
2      | X₂      | T₂ = X₁ + X₂
...    | ...     | ...
i − 1  | X_{i−1} | T_{i−1} = ∑_{j=1}^{i−1} X_j
i      | X_i     | T_i = ∑_{j=1}^{i} X_j
...    | ...     | ...
N      | X_N     | T_N = ∑_{j=1}^{N} X_j

• Select a random number R between 1 and T_N by using a random number table.
• If T_{i−1} < R ≤ T_i, then the ith unit is selected with probability X_i/T_N, i = 1, 2, ..., N.
• Repeat the procedure n times to get a sample of size n.
Drawback : This procedure involves writing down the successive cumulative totals. This is time
consuming and tedious if the number of units in the population is large.
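In code, the cumulative totals need to be written down only once, and each draw can then be located by binary search. A sketch, using the sizes of the worked example later in this chapter:

```python
# Sketch of the cumulative total method: build the running totals T_i once,
# then locate each random number R with a binary search.
import bisect
import random

sizes = [2, 5, 10, 4, 7, 12, 3, 14, 11, 6]           # X_1, ..., X_N
T = []                                               # cumulative totals T_i
total = 0
for X in sizes:
    total += X
    T.append(total)

def draw_unit(rng):
    """Select one unit with probability proportional to size (1-based index)."""
    R = rng.randint(1, T[-1])            # random number between 1 and T_N
    return bisect.bisect_left(T, R) + 1  # the i with T_{i-1} < R <= T_i

rng = random.Random(0)
sample = [draw_unit(rng) for _ in range(5)]          # n = 5 draws (WR)
```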
Lahiri’s method:

Let M = max_{i=1,2,...,N} X_i, i.e., the maximum of the sizes of the N units in the population, or some convenient number greater than M. The steps of the method are:
1. Select a random number i between 1 and N.
2. Select another random number j between 1 and M.
3. If j ≤ X_i, the ith unit is selected; otherwise the pair (i, j) is rejected (an ineffective draw) and the procedure is repeated.
4. Continue until n units are selected.
The probability of selection of the ith unit at a trial depends on two possible outcomes:
– either it is selected at the first draw,
– or it is selected in a subsequent draw preceded by ineffective draws.

The probability of selecting the ith unit at a given draw is

P(i is chosen) · P(j ≤ X_i | i) = (1/N)(X_i/M) = P_i*, say.

The probability that no unit is selected at a trial is

(1/N) ∑_{i=1}^{N} (1 − X_i/M) = (1/N)(N − NX̄/M) = 1 − X̄/M = Q, say.

The probability that unit i is ultimately selected at a given draw (all previous draws resulting in the non-selection of any unit) is

P_i* + Q P_i* + Q² P_i* + ... = P_i*/(1 − Q) = [X_i/(NM)] / (X̄/M) = X_i/(N X̄) = X_i/X_total ∝ X_i.

Thus the probability of selection of unit i is proportional to its size X_i, so this method generates a pps sample.
Advantages:
1. It does not require writing down all the cumulative totals for each unit.
2. The sizes of all the units need not be known beforehand. We need only some number greater than the maximum size, together with the sizes of those units which are pointed to by the first random number (between 1 and N) at each draw.

Disadvantage: It results in wastage of time and effort if pairs get rejected.
The probability of rejection at a trial is Q = 1 − X̄/M.
The expected number of draws required to select one unit is M/X̄.
This number is large if M is much larger than X̄.
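Lahiri's method can be sketched as follows (using the sizes of the worked example below; the acceptance test j ≤ X_i is the core of the method):

```python
# Sketch of Lahiri's method: accept unit i if a second random number j from
# 1..M does not exceed X_i; otherwise the draw is ineffective and repeated.
import random

sizes = [2, 5, 10, 4, 7, 12, 3, 14, 11, 6]
N = len(sizes)
M = max(sizes)                                   # M = 14 here

def lahiri_draw(rng):
    """Repeat (i, j) pairs until j <= X_i; returns a pps selection (1-based)."""
    while True:
        i = rng.randint(1, N)
        j = rng.randint(1, M)
        if j <= sizes[i - 1]:
            return i

rng = random.Random(42)
draws = [lahiri_draw(rng) for _ in range(1000)]
# With many draws, unit frequencies should be roughly proportional to size.
freq = [draws.count(i) / len(draws) for i in range(1, N + 1)]
```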
Example: Consider the following data on the number of workers (X) in 10 factories and their output (Y). We illustrate the selection of units using the cumulative total method.

Unit | Workers X_i | Output Y_i | Cumulative total
1    | 2           | -          | T₁ = 2
2    | 5           | 60         | T₂ = 2 + 5 = 7
3    | 10          | 12         | T₃ = 7 + 10 = 17
4    | 4           | 6          | T₄ = 17 + 4 = 21
5    | 7           | 8          | T₅ = 21 + 7 = 28
6    | 12          | 13         | T₆ = 28 + 12 = 40
7    | 3           | 4          | T₇ = 40 + 3 = 43
8    | 14          | 17         | T₈ = 43 + 14 = 57
9    | 11          | 13        | T₉ = 57 + 11 = 68
10   | 6           | 8          | T₁₀ = 68 + 6 = 74

Selection of a unit:
- Draw a random number between 1 and T₁₀ = 74.
- Suppose it is 38.
- Since T₅ = 28 < 38 ≤ T₆ = 40, unit 6 is selected.
- Repeat the procedure for each further draw.
Selection of a sample using Lahiri's method:

In this case,

M = max_{i=1,2,...,10} X_i = 14.

Pairs of random numbers (i, j), with 1 ≤ i ≤ 10 and 1 ≤ j ≤ 14, are drawn and unit i is selected whenever j ≤ X_i. Here P_i, the probability of selecting the ith unit of the population at any given draw, is proportional to its size X_i.
Consider the varying probability scheme with replacement for a sample of size n. Let y_r be the value of the rth observation on the study variable in the sample and p_r be its initial probability of selection. Define

z_r = y_r / (N p_r), r = 1, 2, ..., n.

Then

z̄ = (1/n) ∑_{r=1}^{n} z_r
is an unbiased estimator of the population mean Ȳ; the variance of z̄ is

Var(z̄) = σ_z²/n, where σ_z² = ∑_{i=1}^{N} P_i (Y_i/(N P_i) − Ȳ)²,

and an unbiased estimate of the variance of z̄ is

Var_est(z̄) = s_z²/n, where s_z² = (1/(n − 1)) ∑_{r=1}^{n} (z_r − z̄)².
Proof:
Note that z_r can take any one of the N values Z₁, Z₂, ..., Z_N (where Z_i = Y_i/(N P_i)) with corresponding initial probabilities P₁, P₂, ..., P_N. So

E(z_r) = ∑_{i=1}^{N} Z_i P_i = ∑_{i=1}^{N} [Y_i/(N P_i)] P_i = Ȳ.

Thus

E(z̄) = (1/n) ∑_{r=1}^{n} E(z_r) = (1/n) ∑_{r=1}^{n} Ȳ = Ȳ.
The variance of z̄ is

Var(z̄) = (1/n²) Var(∑_{r=1}^{n} z_r) = (1/n²) ∑_{r=1}^{n} Var(z_r)

(the z_r's are independent in the WR case). Now

Var(z_r) = E[z_r − E(z_r)]²
= E(z_r − Ȳ)²
= ∑_{i=1}^{N} (Z_i − Ȳ)² P_i
= ∑_{i=1}^{N} (Y_i/(N P_i) − Ȳ)² P_i
= σ_z², say.

Thus

Var(z̄) = (1/n²) ∑_{r=1}^{n} σ_z² = σ_z²/n.
To show that s_z²/n is an unbiased estimator of the variance of z̄, consider

(n − 1) E(s_z²) = E[∑_{r=1}^{n} (z_r − z̄)²]
= E[∑_{r=1}^{n} z_r² − n z̄²]
= ∑_{r=1}^{n} E(z_r²) − n E(z̄²)
= ∑_{r=1}^{n} [Var(z_r) + {E(z_r)}²] − n [Var(z̄) + {E(z̄)}²]
= ∑_{r=1}^{n} (σ_z² + Ȳ²) − n (σ_z²/n + Ȳ²)
= (n − 1) σ_z²,

so E(s_z²) = σ_z², or

E(s_z²/n) = σ_z²/n = Var(z̄)
⇒ Var_est(z̄) = s_z²/n = [1/(n(n − 1))] [∑_{r=1}^{n} (y_r/(N p_r))² − n z̄²].
Note: If P_i = 1/N, then z̄ = ȳ and

Var(z̄) = (1/n) ∑_{i=1}^{N} (1/N) [Y_i/(N · (1/N)) − Ȳ]² = (1/n) ∑_{i=1}^{N} (Y_i − Ȳ)²/N = σ_y²/n,

which is the same as in the case of SRSWR.
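A small numerical sketch (hypothetical values) of the pps-WR estimator and the unbiasedness of a single draw:

```python
# Sketch of the pps-with-replacement estimator z̄ = (1/n) Σ y_r / (N p_r),
# plus a direct check that E(z_r) = Ȳ for any single pps draw.
Y = [30.0, 60.0, 12.0, 6.0]                  # population values (hypothetical)
P = [0.2, 0.4, 0.3, 0.1]                     # selection probabilities, Σ P_i = 1
N = len(Y)

# E(z_r) = Σ_i [Y_i / (N P_i)] P_i = Ȳ:
E_zr = sum((Y[i] / (N * P[i])) * P[i] for i in range(N))
Y_bar = sum(Y) / N

# A sample of n draws (indices chosen here arbitrarily for illustration):
sample_idx = [1, 3, 0]
z_bar = sum(Y[i] / (N * P[i]) for i in sample_idx) / len(sample_idx)
```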
Estimation of the population total:

An estimate of the population total is

Ŷ_tot = (1/n) ∑_{r=1}^{n} y_r/p_r = N z̄.

Since y_r/p_r can take any one of the values Y_i/P_i with probability P_i,

E(Ŷ_tot) = (1/n) ∑_{r=1}^{n} [(Y₁/P₁)P₁ + (Y₂/P₂)P₂ + ... + (Y_N/P_N)P_N]
= (1/n) ∑_{r=1}^{n} ∑_{i=1}^{N} Y_i
= (1/n) ∑_{r=1}^{n} Y_tot
= Y_tot.

Further,

Var(Ŷ_tot) = (1/n) ∑_{i=1}^{N} P_i (Y_i/P_i − Y_tot)² = (1/n) [∑_{i=1}^{N} Y_i²/P_i − Y_tot²].
Varying probability scheme without replacement:

In pps sampling WOR, the probability of selecting a unit changes at every draw. Let U_i denote the ith unit, i = 1, 2, ..., N, with ∑_{i=1}^{N} P_i = 1, and let P_i(r) denote the probability of selecting U_i at the rth draw. Then

P_i(1) = P_i.

Consider P_i(2), the probability of selecting U_i at the 2nd draw:

P_i(2) = P₁ P_i/(1 − P₁) + P₂ P_i/(1 − P₂) + ... + P_{i−1} P_i/(1 − P_{i−1}) + P_{i+1} P_i/(1 − P_{i+1}) + ... + P_N P_i/(1 − P_N)
= P_i ∑_{j(≠i)=1}^{N} P_j/(1 − P_j)
= P_i [∑_{j=1}^{N} P_j/(1 − P_j) − P_i/(1 − P_i)].

In general, P_i(2) ≠ P_i(1) for all i unless P_i = 1/N.
P_i(2) will, in general, be different for each i = 1, 2, ..., N. So E(y_i/p_i) will change with successive draws. This makes the varying probability scheme WOR more complex. Only y₁/(N p₁) provides an unbiased estimator of Ȳ; in general, y_i/(N p_i) (i ≠ 1) will not provide an unbiased estimator of Ȳ.
Ordered estimates:

To overcome the difficulty of the changing expectation at each draw, associate a new variate with each draw such that its expectation equals the population value of the variate under study. Such estimators take into account the order of the draws and are called ordered estimates. We consider the ordered estimators proposed by Des Raj, first for the case of two draws and then for the general case.

Let y₁ and y₂ denote the values of the units selected at the first and the second draw, respectively. Note that any one of the N units can be the first or the second unit drawn, so we use the notations U_i(1) and U_i(2) instead of U₁ and U₂; y₁ and y₂ are not the values of the first two units in the population. Further, let p₁ and p₂ denote the initial probabilities of selection of U_i(1) and U_i(2), respectively. Define

z₁ = y₁/(N p₁),
z₂ = (1/N) [y₁ + y₂ (1 − p₁)/p₂],

and

z̄ = (z₁ + z₂)/2.

Note that p₂/(1 − p₁) is the conditional probability P(U_i(2) | U_i(1)).
Estimation of the population mean:

First we show that z̄ is an unbiased estimator of Ȳ. Note that ∑_{i=1}^{N} P_i = 1. Consider

E(z₁) = (1/N) E(y₁/p₁).

Since y₁/p₁ can take any one of the N values Y₁/P₁, Y₂/P₂, ..., Y_N/P_N with probabilities P₁, P₂, ..., P_N,

E(z₁) = (1/N) [(Y₁/P₁)P₁ + (Y₂/P₂)P₂ + ... + (Y_N/P_N)P_N] = Ȳ.

Next,

E(z₂) = (1/N) E[y₁ + y₂ (1 − p₁)/p₂]
= (1/N) [E(y₁) + E₁ E₂( y₂ (1 − p₁)/p₂ | U_i(1) )]   (using E(Y) = E_X[E_Y(Y | X)]),

where E₂ is the conditional expectation after fixing the unit U_i(1) selected in the first draw. Since, conditionally, y₂/p₂ can take any one of the (N − 1) values Y_j/P_j (except the value selected in the first draw) with probability P_j/(1 − P₁),

E₂( y₂ (1 − p₁)/p₂ | U_i(1) ) = (1 − P₁) ∑_j* (Y_j/P_j) · P_j/(1 − P₁) = ∑_j* Y_j = Y_tot − y₁,

where the summation ∑* is taken over all the values of Y except the value y₁ selected at the first draw. Substituting this in E(z₂), we have

E(z₂) = (1/N) [E(y₁) + E₁(Y_tot − y₁)] = (1/N) E(Y_tot) = Y_tot/N = Ȳ.
Thus

E(z̄) = [E(z₁) + E(z₂)]/2 = (Ȳ + Ȳ)/2 = Ȳ.
Variance:

The variance of z̄ for the case of two draws is

Var(z̄) = [1 − (1/2) ∑_{i=1}^{N} P_i²] · (1/(2N²)) ∑_{i=1}^{N} P_i (Y_i/P_i − Y_tot)² − (1/(4N²)) ∑_{i=1}^{N} P_i² (Y_i/P_i − Y_tot)².

To derive it, start from

Var(z̄) = E(z̄²) − [E(z̄)]²
= E[ (1/(2N)) ( y₁(1 + p₁)/p₁ + y₂(1 − p₁)/p₂ ) ]² − Ȳ²,

where the first term in the parentheses depends only on the first draw and the second term depends on the first and the second draws. Taking the expectation over all ordered pairs (i, j), i ≠ j, which occur with probability P_i P_j/(1 − P_i),

Var(z̄) = (1/(4N²)) ∑_{i≠j=1}^{N} [ Y_i(1 + P_i)/P_i + Y_j(1 − P_i)/P_j ]² P_i P_j/(1 − P_i) − Ȳ².

Expanding the square, using ∑_{i≠j} a_i b_j = ∑_i a_i (∑_j b_j − b_i) and simplifying yields the expression above, which can be rearranged as

Var(z̄) = (1/2) ∑_{i=1}^{N} P_i (Y_i/(N P_i) − Ȳ)²
− (1/(4N²)) [ ∑_{j=1}^{N} P_j² · ∑_{i=1}^{N} P_i (Y_i/P_i − Y_tot)² + ∑_{i=1}^{N} P_i² (Y_i/P_i − Y_tot)² ].

The first term is the variance in the WR case for n = 2; the second term is the reduction in the variance achieved by sampling WOR with varying probabilities.
Estimation of Var(z̄):

Var(z̄) = E(z̄²) − [E(z̄)]² = E(z̄²) − Ȳ².

Since

E(z₁z₂) = E[z₁ E(z₂ | U_i(1))] = E(z₁ Ȳ) = Ȳ E(z₁) = Ȳ²,

we have

E(z̄² − z₁z₂) = E(z̄²) − E(z₁z₂) = E(z̄²) − Ȳ² = Var(z̄),

⇒ Var_est(z̄) = z̄² − z₁z₂ is an unbiased estimator of Var(z̄).

Alternative form:

Var_est(z̄) = z̄² − z₁z₂
= [(z₁ + z₂)/2]² − z₁z₂
= (z₁ − z₂)²/4
= (1/4) [ y₁/(N p₁) − y₁/N − (y₂/N)(1 − p₁)/p₂ ]²
= (1/(4N²)) [ (1 − p₁) y₁/p₁ − (1 − p₁) y₂/p₂ ]²
= [(1 − p₁)²/(4N²)] ( y₁/p₁ − y₂/p₂ )².
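Des Raj's ordered estimator for n = 2 can be computed directly (hypothetical values):

```python
# Sketch of Des Raj's ordered estimator for two draws: z1 uses only the first
# draw; z2 inflates the second draw by (1 - p1)/p2.
N = 10                       # population size
y1, p1 = 60.0, 0.40          # first-draw value and its initial probability
y2, p2 = 12.0, 0.15          # second-draw value and its initial probability

z1 = y1 / (N * p1)
z2 = (y1 + y2 * (1 - p1) / p2) / N
z_bar = (z1 + z2) / 2

# Unbiased variance estimate for n = 2: (1 - p1)^2/(4N^2) * (y1/p1 - y2/p2)^2
var_hat = (1 - p1) ** 2 / (4 * N ** 2) * (y1 / p1 - y2 / p2) ** 2
```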
Des Raj ordered estimator: general case

Let U_i(r) denote that the ith unit is drawn at the rth draw. Let (y₁, y₂, ..., y_r, ..., y_n) and (p₁, p₂, ..., p_r, ..., p_n) be the values of the study variable and the corresponding initial probabilities of selection, respectively. Define z₁ = y₁/(N p₁) and

z_r = (1/N) [ y₁ + y₂ + ... + y_{r−1} + (y_r/p_r)(1 − p₁ − ... − p_{r−1}) ] for r = 2, 3, ..., n.

Consider z̄ = (1/n) ∑_{r=1}^{n} z_r as an estimator of the population mean Ȳ.
Consider

E(z_r) = E₁ E₂[ z_r | U_i(1), U_i(2), ..., U_i(r−1) ],

where E₂ is the conditional expectation after fixing the units U_i(1), U_i(2), ..., U_i(r−1) drawn in the first (r − 1) draws. Consider

E[ (y_r/p_r)(1 − p₁ − ... − p_{r−1}) ] = E₁ E₂[ (y_r/p_r)(1 − p₁ − ... − p_{r−1}) | U_i(1), ..., U_i(r−1) ]
= E₁ [ (1 − P_i(1) − P_i(2) − ... − P_i(r−1)) E₂( y_r/p_r | U_i(1), ..., U_i(r−1) ) ].

Since, conditionally, y_r/p_r can take any one of the (N − r + 1) remaining values Y_j/P_j with probabilities P_j/(1 − P_i(1) − P_i(2) − ... − P_i(r−1)),

E[ (y_r/p_r)(1 − p₁ − ... − p_{r−1}) ] = E₁ [ (1 − P_i(1) − ... − P_i(r−1)) ∑_j* (Y_j/P_j) · P_j/(1 − P_i(1) − ... − P_i(r−1)) ] = E₁ [ ∑_j* Y_j ],

where ∑* denotes that the summation is taken over all the values of y except the values y₁, y₂, ..., y_{r−1} selected in the first (r − 1) draws.
Thus we can express

E(z_r) = (1/N) E₁E₂[ y₁ + y₂ + ... + y_{r−1} + (y_r/p_r)(1 − p₁ − ... − p_{r−1}) ]
= (1/N) E₁[ Y_i(1) + Y_i(2) + ... + Y_i(r−1) + ∑_j* Y_j ]
= (1/N) E₁{ Y_i(1) + ... + Y_i(r−1) + [Y_tot − (Y_i(1) + Y_i(2) + ... + Y_i(r−1))] }
= (1/N) E₁(Y_tot)
= Y_tot/N
= Ȳ for all r = 1, 2, ..., n.

Then

E(z̄) = (1/n) ∑_{r=1}^{n} E(z_r) = (1/n) ∑_{r=1}^{n} Ȳ = Ȳ.
Thus z̄ is an unbiased estimator of the population mean Ȳ.
The expression for the variance of z̄ in the general case is complex, but its estimate is simple.
Estimate of variance:

Var(z̄) = E(z̄²) − Ȳ².

Consider, for r < s,

E(z_r z_s) = E[ z_r E(z_s | U₁, U₂, ..., U_{s−1}) ] = E(z_r Ȳ) = Ȳ E(z_r) = Ȳ²,

and similarly, for s < r,

E(z_r z_s) = E[ z_s E(z_r | U₁, U₂, ..., U_{r−1}) ] = Ȳ E(z_s) = Ȳ².

Consider

E[ (1/(n(n − 1))) ∑_{r(≠s)=1}^{n} ∑_{s=1}^{n} z_r z_s ] = (1/(n(n − 1))) ∑_{r(≠s)=1}^{n} ∑_{s=1}^{n} E(z_r z_s) = (1/(n(n − 1))) n(n − 1) Ȳ² = Ȳ².

Substituting this for Ȳ² in Var(z̄) gives the unbiased estimator

Var_est(z̄) = z̄² − (1/(n(n − 1))) ∑_{r(≠s)=1}^{n} ∑_{s=1}^{n} z_r z_s.

Using (∑_{r=1}^{n} z_r)² = ∑_{r=1}^{n} z_r² + ∑_{r(≠s)=1}^{n} ∑_{s=1}^{n} z_r z_s,

∑_{r(≠s)=1}^{n} ∑_{s=1}^{n} z_r z_s = n² z̄² − ∑_{r=1}^{n} z_r²,

so

Var_est(z̄) = z̄² − (1/(n(n − 1))) [ n² z̄² − ∑_{r=1}^{n} z_r² ]
= (1/(n(n − 1))) [ ∑_{r=1}^{n} z_r² − n z̄² ]
= (1/(n(n − 1))) ∑_{r=1}^{n} (z_r − z̄)².
Unordered estimator:

In an ordered estimator, the order in which the units are drawn is taken into account. Corresponding to any ordered estimator, there exists an unordered estimator which does not depend on the order in which the units are drawn and has smaller variance than the ordered estimator.

In the case of sampling WOR from a population of size N, there are C(N, n) unordered samples of size n. Corresponding to any unordered sample of n units, there are n! ordered samples.

For example, for n = 2, if the units are u₁ and u₂, then there are 2! = 2 ordered samples, (u₁, u₂) and (u₂, u₁). Moreover,

Probability of unordered sample (u₁, u₂) = Probability of ordered sample (u₁, u₂) + Probability of ordered sample (u₂, u₁).

For n = 3, there are three units u₁, u₂, u₃, and there are the following 3! = 6 ordered samples:
(u₁, u₂, u₃), (u₁, u₃, u₂), (u₂, u₁, u₃), (u₂, u₃, u₁), (u₃, u₁, u₂), (u₃, u₂, u₁).
Let z_si, s = 1, 2, ..., C(N, n); i = 1, 2, ..., n! (= M), be an estimator of a population parameter θ based on the ordered sample s_i. Consider a scheme of selection in which the probability of selecting the ordered sample s_i is p_si. The probability of getting the unordered sample s is then the sum of these probabilities:

p_s = ∑_{i=1}^{M} p_si.

For a population of size N with units denoted 1, 2, ..., N, the ordered samples of size n are n-tuples, and the sample space consists of N(N − 1)···(N − n + 1) ordered sample points.
If all ordered samples are equally likely, then

p_si(o) = P[selection of any ordered sample] = 1/[N(N − 1)···(N − n + 1)],

p_s(u) = P[selection of any unordered sample] = n!/[N(N − 1)···(N − n + 1)] = n! · P[selection of any ordered sample],

and then, with M = n!,

p_s = ∑_{i=1}^{M} p_si(o) = n!(N − n)!/N! = 1/C(N, n).
Theorem: Let θ̂₀ = z_si, s = 1, 2, ..., C(N, n); i = 1, 2, ..., M (= n!), be an ordered estimator and let

θ̂_u = ∑_{i=1}^{M} z_si p'_si

be the corresponding unordered estimator, where z_si is a function of the s_i th ordered sample (hence a random variable), p_si is the probability of selection of the s_i th ordered sample and p'_si = p_si/p_s. Then E(θ̂_u) = E(θ̂₀) and Var(θ̂_u) ≤ Var(θ̂₀).
N
Proof: Total number of ordered sample = n !
n
N
n M
(i ) E (θˆ0 ) = ∑∑ zsi psi
=s 1 =i 1
N
n
M
E (θˆu ) = ∑ ∑ zsi psi' ps
=s 1 =i1
p
= ∑ ∑ zsi si ps
s i ps
= ∑∑ zsi psi
s i
= E (θˆ0 )
N
(ii) Since θˆ0 = zsi , so θˆ02 = zsi2 with probability
= psi , i 1,=
2,..., M , s 1, 2,..., .
n
2
M M
Similarly, θˆu
= '
si si
=i 1 =i 1
2
u ∑=
z p , so θˆ ∑ zsi psi' with probability ps
Consider

Var(θ̂₀) = E(θ̂₀²) − [E(θ̂₀)]² = ∑_s ∑_i z_si² p_si − [E(θ̂₀)]²,

Var(θ̂_u) = E(θ̂_u²) − [E(θ̂_u)]² = ∑_s ( ∑_i z_si p'_si )² p_s − [E(θ̂₀)]².

Hence, using p_si = p'_si p_s and ∑_i p'_si = 1,

Var(θ̂₀) − Var(θ̂_u) = ∑_s ∑_i z_si² p_si − ∑_s ( ∑_i z_si p'_si )² p_s
= ∑_s [ ∑_i z_si² p'_si − ( ∑_i z_si p'_si )² ] p_s
= ∑_s ∑_i ( z_si − ∑_i z_si p'_si )² p'_si p_s ≥ 0,

⇒ Var(θ̂₀) − Var(θ̂_u) ≥ 0, i.e.,

Var(θ̂_u) = Var(θ̂₀) − ∑_s ∑_i ( z_si − ∑_i z_si p'_si )² p'_si p_s.

Based on this result, we now use the ordered estimators to construct an unordered estimator. It follows from this theorem that the unordered estimator will be more efficient than the corresponding ordered estimators.
Murthy’s unordered estimator corresponding to Des Raj’s ordered estimator for the sample size 2

Suppose y_i and y_j are the values of the units U_i and U_j selected in the first and second draws, respectively, with varying probability and WOR in a sample of size 2, and let p_i and p_j be the corresponding initial probabilities of selection. The two ordered estimates corresponding to the ordered samples s₁* = (y_i, y_j) with (U_i, U_j) and s₂* = (y_j, y_i) with (U_j, U_i) are

z(s₁*) = (1/(2N)) [ (1 + p_i) y_i/p_i + (1 − p_i) y_j/p_j ]
= (1/(2N)) [ y_i + y_i/p_i + y_j (1 − p_i)/p_j ],

and

z(s₂*) = (1/(2N)) [ (1 + p_j) y_j/p_j + (1 − p_j) y_i/p_i ]
= (1/(2N)) [ y_j + y_j/p_j + y_i (1 − p_j)/p_i ].

The corresponding probabilities are

p(s₁*) = p_i p_j/(1 − p_i),
p(s₂*) = p_j p_i/(1 − p_j),
p(s) = p(s₁*) + p(s₂*) = p_i p_j (2 − p_i − p_j) / [(1 − p_i)(1 − p_j)].
The conditional probabilities are

p'(s₁*) = p(s₁*)/p(s) = (1 − p_j)/(2 − p_i − p_j),
p'(s₂*) = p(s₂*)/p(s) = (1 − p_i)/(2 − p_i − p_j).

Murthy’s unordered estimate z(u) corresponding to Des Raj’s ordered estimate is then

z(u) = z(s₁*) p'(s₁*) + z(s₂*) p'(s₂*)
= (1/(2N)) { [ (1 + p_i) y_i/p_i + (1 − p_i) y_j/p_j ](1 − p_j) + [ (1 + p_j) y_j/p_j + (1 − p_j) y_i/p_i ](1 − p_i) } / (2 − p_i − p_j)
= (1/(2N)) { (y_i/p_i)(1 − p_j)[(1 + p_i) + (1 − p_i)] + (y_j/p_j)(1 − p_i)[(1 − p_j) + (1 + p_j)] } / (2 − p_i − p_j)
= [ (1 − p_j) y_i/p_i + (1 − p_i) y_j/p_j ] / [ N (2 − p_i − p_j) ].
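Murthy's unordered estimate for n = 2 can be computed directly (hypothetical values). Note that the formula is symmetric in the two units, so it no longer depends on the order of the draws:

```python
# Sketch of Murthy's unordered estimator for a sample of size 2: each unit is
# weighted by (1 minus the other unit's initial probability).
N = 10
yi, pi = 60.0, 0.40          # unit drawn first
yj, pj = 12.0, 0.15          # unit drawn second

z_u = ((1 - pj) * yi / pi + (1 - pi) * yj / pj) / (N * (2 - pi - pj))

# Swapping the roles of the two units leaves the estimate unchanged:
z_u_swapped = ((1 - pi) * yj / pj + (1 - pj) * yi / pi) / (N * (2 - pj - pi))
```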
Unbiasedness:

Note that (y_i, p_i) can take any one of the values (Y₁, P₁), (Y₂, P₂), ..., (Y_N, P_N), and then (y_j, p_j) can take any one of the remaining values, i.e., all the values except the one taken at the first draw. Now
E[z(u)] = ∑_{i<j} { [ (1 − P_j) Y_i/P_i + (1 − P_i) Y_j/P_j ] / [ N(2 − P_i − P_j) ] } · [ P_iP_j/(1 − P_i) + P_jP_i/(1 − P_j) ]
= (1/(2N)) ∑_{i≠j} { [ (1 − P_j) Y_i/P_i + (1 − P_i) Y_j/P_j ] / (2 − P_i − P_j) } · [ P_iP_j/(1 − P_i) + P_jP_i/(1 − P_j) ]
= (1/(2N)) ∑_{i≠j} [ (1 − P_j) Y_i/P_i + (1 − P_i) Y_j/P_j ] · P_iP_j / [(1 − P_i)(1 − P_j)]
= (1/(2N)) ∑_{i≠j} [ Y_i P_j/(1 − P_i) + Y_j P_i/(1 − P_j) ].

Using the result ∑_{i≠j} a_i b_j = ∑_i a_i (∑_j b_j − b_i), we have

E[z(u)] = (1/(2N)) [ ∑_{i=1}^{N} (Y_i/(1 − P_i)) (∑_{j=1}^{N} P_j − P_i) + ∑_{j=1}^{N} (Y_j/(1 − P_j)) (∑_{i=1}^{N} P_i − P_j) ]
= (1/(2N)) [ ∑_{i=1}^{N} (Y_i/(1 − P_i)) (1 − P_i) + ∑_{j=1}^{N} (Y_j/(1 − P_j)) (1 − P_j) ]
= (1/(2N)) [ ∑_{i=1}^{N} Y_i + ∑_{j=1}^{N} Y_j ]
= (Ȳ + Ȳ)/2
= Ȳ.
Variance: The variance of z(u) can be found by weighting the squared deviation of the estimate for each unordered sample {i, j} by its probability p(s):

Var[z(u)] = (1/2) ∑_{i≠j=1}^{N} { (1 − P_i − P_j)(1 − P_i)(1 − P_j) / [ N²(2 − P_i − P_j)² ] } (Y_i/P_i − Y_j/P_j)² · { P_iP_j(2 − P_i − P_j) / [(1 − P_i)(1 − P_j)] }
= (1/2) ∑_{i≠j=1}^{N} { P_iP_j(1 − P_i − P_j) / [ N²(2 − P_i − P_j) ] } (Y_i/P_i − Y_j/P_j)².

Unbiased estimator of Var[z(u)]:

Var_est[z(u)] = { (1 − p_i − p_j)(1 − p_i)(1 − p_j) / [ N²(2 − p_i − p_j)² ] } ( y_i/p_i − y_j/p_j )²,

whose expectation over all unordered samples is the expression for Var[z(u)] above.
Horvitz–Thompson (HT) estimator:

Let y_i (i = 1, 2, ..., N) denote the value of the characteristic under study on the ith unit, and suppose a sample of size n is drawn by WOR using arbitrary probabilities of selection at each draw. Thus, prior to each succeeding draw, there is defined a new probability distribution for the units available at that draw. The probability distribution at each draw may or may not depend upon the initial probabilities at the first draw.
Let α_i indicate whether the ith unit is included in the sample s, and define

z_i = n y_i / (N E(α_i)), i = 1, ..., N, assuming E(α_i) > 0 for all i,

where

E(α_i) = 1·P(y_i ∈ s) + 0·P(y_i ∉ s) = π_i

is the probability of including unit i in the sample, called the inclusion probability. The estimator is

Ŷ_HT = z̄_n = (1/n) ∑_{i=1}^{n} z_i = (1/n) ∑_{i=1}^{N} α_i z_i.
Unbiasedness
$$
E(\hat{\bar Y}_{HT}) = \frac{1}{n}\sum_{i=1}^{N}E(z_i\alpha_i)
= \frac{1}{n}\sum_{i=1}^{N}z_iE(\alpha_i)
= \frac{1}{n}\sum_{i=1}^{N}\frac{n\,y_i}{N\,E(\alpha_i)}E(\alpha_i)
= \frac{1}{N}\sum_{i=1}^{N}y_i = \bar Y .
$$
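The unbiasedness can be illustrated numerically. The sketch below (hypothetical toy data) enumerates all SRSWOR samples, for which every unit has inclusion probability $\pi_i = n/N$, and averages the HT estimates:

```python
from itertools import combinations

# Toy population (hypothetical values); under SRSWOR, pi_i = n/N for every unit.
y = [2.0, 5.0, 8.0, 11.0, 14.0]
N, n = len(y), 2
Y_bar = sum(y) / N
pi = n / N

samples = list(combinations(range(N), n))
# HT estimator of the mean: (1/N) * sum over the sample of y_i / pi_i.
estimates = [sum(y[i] / pi for i in s) / N for s in samples]
E_ht = sum(estimates) / len(samples)

assert abs(E_ht - Y_bar) < 1e-9   # the HT estimator is unbiased
```

Averaging the estimator over all equally likely samples returns the population mean exactly, as the derivation above shows.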
Variance
$$
V(\hat{\bar Y}_{HT}) = V(\bar z_n) = E(\bar z_n^2) - \big[E(\bar z_n)\big]^2 = E(\bar z_n^2) - \bar Y^2 .
$$
Consider
$$
E(\bar z_n^2) = \frac{1}{n^2}E\Big[\sum_{i=1}^{N}\alpha_i z_i\Big]^2
= \frac{1}{n^2}E\Big[\sum_{i=1}^{N}\alpha_i^2z_i^2 + \sum_{i(\ne j)=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j z_iz_j\Big]
= \frac{1}{n^2}\Big[\sum_{i=1}^{N}z_i^2E(\alpha_i^2) + \sum_{i(\ne j)=1}^{N}\sum_{j=1}^{N}z_iz_jE(\alpha_i\alpha_j)\Big].
$$
If $S=\{s\}$ is the set of all possible samples and $\pi_i$ is the probability of selection of the $i$th unit in the sample $s$, then
$$
E(\alpha_i) = 1\cdot P(y_i\in s) + 0\cdot P(y_i\notin s) = \pi_i,\qquad
E(\alpha_i^2) = 1^2\cdot P(y_i\in s) + 0^2\cdot P(y_i\notin s) = \pi_i .
$$
So $E(\alpha_i)=E(\alpha_i^2)$ and
$$
E(\bar z_n^2) = \frac{1}{n^2}\Big[\sum_{i=1}^{N}\pi_i z_i^2 + \sum_{i(\ne j)=1}^{N}\sum_{j=1}^{N}\pi_{ij}z_iz_j\Big],
$$
where $\pi_{ij}$ is the probability of inclusion of the $i$th and $j$th units in the sample. This is called the second order inclusion probability.
Now
$$
\bar Y^2 = \big[E(\bar z_n)\big]^2
= \frac{1}{n^2}\Big[\sum_{i=1}^{N}z_i^2\big[E(\alpha_i)\big]^2 + \sum_{i(\ne j)=1}^{N}\sum_{j=1}^{N}z_iz_jE(\alpha_i)E(\alpha_j)\Big]
= \frac{1}{n^2}\Big[\sum_{i=1}^{N}z_i^2\pi_i^2 + \sum_{i(\ne j)=1}^{N}\sum_{j=1}^{N}\pi_i\pi_j z_iz_j\Big].
$$
Thus
$$
Var(\hat{\bar Y}_{HT}) = \frac{1}{n^2}\Big[\sum_{i=1}^{N}\pi_i(1-\pi_i)z_i^2 + \sum_{i(\ne j)=1}^{N}\sum_{j=1}^{N}(\pi_{ij}-\pi_i\pi_j)z_iz_j\Big]
$$
$$
= \frac{1}{n^2}\Big[\sum_{i=1}^{N}\pi_i(1-\pi_i)\frac{n^2y_i^2}{N^2\pi_i^2} + \sum_{i(\ne j)=1}^{N}\sum_{j=1}^{N}(\pi_{ij}-\pi_i\pi_j)\frac{n^2y_iy_j}{N^2\pi_i\pi_j}\Big]
= \frac{1}{N^2}\Big[\sum_{i=1}^{N}\frac{1-\pi_i}{\pi_i}y_i^2 + \sum_{i(\ne j)=1}^{N}\sum_{j=1}^{N}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}y_iy_j\Big].
$$
Estimate of variance
$$
\hat V_1 = \widehat{Var}(\hat{\bar Y}_{HT}) = \frac{1}{N^2}\Big[\sum_{i=1}^{n}\frac{(1-\pi_i)}{\pi_i^2}y_i^2 + \sum_{i(\ne j)=1}^{n}\sum_{j=1}^{n}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_{ij}}\frac{y_iy_j}{\pi_i\pi_j}\Big].
$$
This is an unbiased estimator of the variance.
Drawback: It does not reduce to zero when all the $\dfrac{y_i}{\pi_i}$ are the same, i.e., when $y_i\propto\pi_i$. Consequently, this estimator may assume negative values for some samples.
A more elegant expression for the variance of yˆ HT has been obtained by Yates and Grundy.
Since there are exactly $n$ values of $\alpha_i$ which are 1 and $(N-n)$ values which are zero,
$$
\sum_{i=1}^{N}\alpha_i = n\qquad\Rightarrow\qquad \sum_{i=1}^{N}E(\alpha_i) = n .
$$
Also
$$
E\Big(\sum_{i=1}^{N}\alpha_i\Big)^2 = \sum_{i=1}^{N}E(\alpha_i^2) + \sum_{i(\ne j)=1}^{N}\sum_{j=1}^{N}E(\alpha_i\alpha_j),
$$
$$
E(n^2) = n^2 = n + \sum_{i(\ne j)=1}^{N}\sum_{j=1}^{N}E(\alpha_i\alpha_j)\qquad\big(\text{using } E(\alpha_i)=E(\alpha_i^2)\big),
$$
$$
\Rightarrow\quad \sum_{i(\ne j)=1}^{N}\sum_{j=1}^{N}E(\alpha_i\alpha_j) = n(n-1).
$$
Thus
$$
E(\alpha_i\alpha_j) = P(\alpha_i=1,\alpha_j=1) = P(\alpha_i=1)P(\alpha_j=1\mid\alpha_i=1) = E(\alpha_i)E(\alpha_j\mid\alpha_i=1).
$$
Therefore
$$
\sum_{j(\ne i)=1}^{N}\big[E(\alpha_i\alpha_j)-E(\alpha_i)E(\alpha_j)\big]
= E(\alpha_i)\sum_{j(\ne i)=1}^{N}\big[E(\alpha_j\mid\alpha_i=1)-E(\alpha_j)\big]
= E(\alpha_i)\big[(n-1)-(n-E(\alpha_i))\big]
= -E(\alpha_i)\big[1-E(\alpha_i)\big] = -\pi_i(1-\pi_i).\qquad(1)
$$
Similarly
$$
\sum_{i(\ne j)=1}^{N}\big[E(\alpha_i\alpha_j)-E(\alpha_i)E(\alpha_j)\big] = -\pi_j(1-\pi_j).\qquad(2)
$$
Starting from
$$
Var(\hat{\bar Y}_{HT}) = \frac{1}{n^2}\Big[\sum_{i=1}^{N}\pi_i(1-\pi_i)z_i^2 + \sum_{i(\ne j)=1}^{N}\sum_{j=1}^{N}(\pi_{ij}-\pi_i\pi_j)z_iz_j\Big]
$$
and using (1) and (2) in this expression, we get
$$
Var(\hat{\bar Y}_{HT}) = \frac{1}{2n^2}\Big[\sum_{i(\ne j)=1}^{N}\sum_{j=1}^{N}(\pi_i\pi_j-\pi_{ij})z_i^2 + \sum_{i(\ne j)=1}^{N}\sum_{j=1}^{N}(\pi_i\pi_j-\pi_{ij})z_j^2 - 2\sum_{i(\ne j)=1}^{N}\sum_{j=1}^{N}(\pi_i\pi_j-\pi_{ij})z_iz_j\Big]
$$
$$
= \frac{1}{2n^2}\sum_{i(\ne j)=1}^{N}\sum_{j=1}^{N}(\pi_i\pi_j-\pi_{ij})(z_i-z_j)^2 .
$$
The expressions for $\pi_i$ and $\pi_{ij}$ can be written down for any given sample size. For example, for $n=2$, assume that at the second draw the probability of selecting a unit from the units available is proportional to the probability of selecting it at the first draw. Then
$$
E(\alpha_i) = \text{Probability of selecting } Y_i \text{ in a sample of two} = P_{i1} + P_{i2},
$$
where $P_{ir}$ is the probability of selecting $Y_i$ at the $r$th draw $(r=1,2)$. If $P_i$ is the probability of selecting the $i$th unit at the first draw $(i=1,2,\ldots,N)$, then we had earlier derived
$$
P_{i1} = P_i,
$$
$$
P_{i2} = \sum_{j(\ne i)=1}^{N}P\big(y_j \text{ selected at 1st draw}\big)\,P\big(y_i \text{ selected at 2nd draw}\mid y_j \text{ selected at 1st draw}\big)
= \sum_{j(\ne i)=1}^{N}\frac{P_jP_i}{1-P_j}
= P_i\Big[\sum_{j=1}^{N}\frac{P_j}{1-P_j}-\frac{P_i}{1-P_i}\Big].
$$
So
$$
\pi_i = E(\alpha_i) = P_i\Big[1+\sum_{j=1}^{N}\frac{P_j}{1-P_j}-\frac{P_i}{1-P_i}\Big].
$$
Again
$$
\pi_{ij} = E(\alpha_i\alpha_j) = \text{Probability of including both } y_i \text{ and } y_j \text{ in a sample of size two}
= P_{i1}P_{j2|i} + P_{j1}P_{i2|j}
= P_i\frac{P_j}{1-P_i}+P_j\frac{P_i}{1-P_j}
= P_iP_j\Big(\frac{1}{1-P_i}+\frac{1}{1-P_j}\Big).
$$
Estimate of Variance
The estimate of variance is given by
$$
\widehat{Var}(\hat{\bar Y}_{HT}) = \frac{1}{2n^2}\sum_{i(\ne j)=1}^{n}\sum_{j=1}^{n}\frac{\pi_i\pi_j-\pi_{ij}}{\pi_{ij}}(z_i-z_j)^2 .
$$
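The equivalence of the Horvitz-Thompson and Yates-Grundy variance forms can be checked numerically. The sketch below (hypothetical toy data) uses SRSWOR, where $\pi_i = n/N$ and $\pi_{ij} = \frac{n(n-1)}{N(N-1)}$, and compares both forms against the variance obtained by direct enumeration of all samples:

```python
from itertools import combinations

# Toy population (hypothetical values) under SRSWOR.
y = [2.0, 5.0, 8.0, 11.0, 14.0]
N, n = len(y), 2
pi = n / N
pij = n * (n - 1) / (N * (N - 1))   # second-order inclusion probability

# Direct variance of the HT estimator by enumerating all SRSWOR samples.
samples = list(combinations(range(N), n))
est = [sum(y[i] / pi for i in s) / N for s in samples]
mean = sum(est) / len(samples)
var_direct = sum((e - mean) ** 2 for e in est) / len(samples)

# Horvitz-Thompson form of the variance.
var_ht = (sum((1 - pi) / pi * y[i] ** 2 for i in range(N))
          + sum((pij - pi * pi) / (pi * pi) * y[i] * y[j]
                for i in range(N) for j in range(N) if i != j)) / N ** 2

# Yates-Grundy form, with z_i = n*y_i/(N*pi_i).
z = [n * yi / (N * pi) for yi in y]
var_yg = sum((pi * pi - pij) * (z[i] - z[j]) ** 2
             for i in range(N) for j in range(N) if i != j) / (2 * n ** 2)

assert abs(var_ht - var_direct) < 1e-9
assert abs(var_yg - var_direct) < 1e-9
```

Both closed-form expressions reproduce the enumerated variance, confirming the algebraic rearrangement above.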
Midzuno system of sampling:
Under this system of selection, the unit at the first draw is selected with unequal probabilities of selection (i.e., pps) and all the remaining units are selected by SRSWOR at the subsequent draws. The second order inclusion probability is
$$
\pi_{ij} = E(\alpha_i\alpha_j)
= (P_i+P_j)\frac{n-1}{N-1} + (1-P_i-P_j)\frac{(n-1)(n-2)}{(N-1)(N-2)}
= \frac{n-1}{N-1}\Big[\frac{N-n}{N-2}(P_i+P_j)+\frac{n-2}{N-2}\Big].
$$
Similarly,
$$
E(\alpha_i\alpha_j\alpha_k) = \pi_{ijk} = \text{Probability of including } U_i,\ U_j \text{ and } U_k \text{ in the sample}
= \frac{(n-1)(n-2)}{(N-1)(N-2)}\Big[\frac{N-n}{N-3}(P_i+P_j+P_k)+\frac{n-3}{N-3}\Big].
$$
By an extension of this argument, if $U_i, U_j,\ldots,U_r$ are $r$ specified units $(r<n)$, the probability of including them in the sample can be written down on the same lines. In particular, if $U_1, U_2,\ldots,U_n$ are $n$ specified units, the probability of including these units in the sample is
$$
E(\alpha_1\alpha_2\ldots\alpha_n) = \pi_{12\ldots n}
= \frac{(n-1)(n-2)\cdots 1}{(N-1)(N-2)\cdots(N-n+1)}\big(P_1+P_2+\cdots+P_n\big)
= \frac{P_1+P_2+\cdots+P_n}{\dbinom{N-1}{n-1}},
$$
which is obtained by substituting $r=n$. Thus if the $P_i$'s are proportional to some measure of size of the units in the population, then the probability of selecting a specified sample is proportional to the total measure of the size of the units included in the sample.
Substituting these $\pi_i$, $\pi_{ij}$, $\pi_{ijk}$ etc. in the HT estimator, we can obtain the estimator of the population mean and the estimate of its variance
$$
\widehat{Var}(\hat{\bar Y}_{HT}) = \frac{1}{2n^2}\sum_{i(\ne j)=1}^{n}\sum_{j=1}^{n}\frac{\pi_i\pi_j-\pi_{ij}}{\pi_{ij}}(z_i-z_j)^2,
$$
where
$$
\pi_i\pi_j-\pi_{ij} = \frac{N-n}{(N-1)^2}\Big[(N-n)P_iP_j+\frac{n-1}{N-2}(1-P_i-P_j)\Big].
$$
The main advantage of this method of sampling is that it is possible to compute a set of revised probabilities of selection such that the inclusion probabilities resulting from the revised probabilities are proportional to the initial probabilities of selection. This is desirable because the initial probabilities can be chosen proportional to some measure of size.
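The Midzuno second-order inclusion probability formula can be verified by direct computation. The sketch below (hypothetical toy probabilities) conditions on which unit is drawn first and uses the SRSWOR inclusion probabilities for the remaining draws:

```python
from math import comb

# Hypothetical initial selection probabilities P_i (sum to 1).
P = [0.1, 0.15, 0.2, 0.25, 0.3]
N, n = len(P), 3

def pi_ij_direct(i, j):
    # First unit drawn with probability P_k; the other n-1 units by SRSWOR from N-1.
    prob = 0.0
    for k in range(N):
        if k in (i, j):
            # the other specified unit must fall among the n-1 SRSWOR draws
            prob += P[k] * comb(N - 2, n - 2) / comb(N - 1, n - 1)
        else:
            # both specified units must fall among the n-1 SRSWOR draws
            prob += P[k] * comb(N - 3, n - 3) / comb(N - 1, n - 1)
    return prob

def pi_ij_formula(i, j):
    return (n - 1) / (N - 1) * ((N - n) / (N - 2) * (P[i] + P[j]) + (n - 2) / (N - 2))

checks = [abs(pi_ij_direct(i, j) - pi_ij_formula(i, j))
          for i in range(N) for j in range(i + 1, N)]
assert max(checks) < 1e-12
```

The conditioning argument in the code mirrors the derivation of $\pi_{ij}$ above: the pair is included either because one of its members is drawn first or because both survive the SRSWOR draws.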
Chapter 8
Double Sampling (Two Phase Sampling)
The ratio and regression methods of estimation require the knowledge of the population mean of the auxiliary variable $(\bar X)$ to estimate the population mean of the study variable $(\bar Y)$. If information on the auxiliary variable is not available, then there are two options. One option is to collect a sample on the study variable alone and use the sample mean as an estimator of the population mean.
An alternative solution is to use a part of the budget for collecting information on auxiliary variable to
collect a large preliminary sample in which xi alone is measured. The purpose of this sampling is to
furnish a good estimate of X . This method is appropriate when the information about xi is on file
cards that have not been tabulated. After collecting a large preliminary sample of size n ' units from
the population, select a smaller sample of size n from it and collect the information on y . These two
estimates are then used to obtain an estimator of population mean Y . This procedure of selecting a
large sample for collecting information on auxiliary variable x and then selecting a sub-sample from
it for collecting the information on the study variable y is called double sampling or two phase
sampling. It is useful when it is considerably cheaper and quicker to collect data on x than y and
there is high correlation between x and y.
In this sampling, the randomization is done twice. First a random sample of size n ' is drawn from a
population of size N and then again a random sample of size n is drawn from the first sample of size
n'.
So the sample mean in this sampling is a function of the two phases of sampling. If SRSWOR is
utilized to draw the samples at both the phases, then
- the number of possible samples at the first phase, when a sample of size $n'$ is drawn from a population of size $N$, is $\binom{N}{n'} = M_0$, say;
- the number of possible samples at the second phase, when a sample of size $n$ is drawn from the first phase sample of size $n'$, is $\binom{n'}{n} = M_1$, say.
[Diagram: population of $X$ ($N$ units) → first phase sample (large, $n'$ units; $M_0$ possible samples) → subsample (small, $n$ units; $M_1$ possible samples).]
Then the sample mean is a function of the two phases of sampling. Let $\hat\theta$ be the statistic calculated at the second phase, taking the value $\hat\theta_{ij}$, $i=1,2,\ldots,M_0$, $j=1,2,\ldots,M_1$, and let $P_{ij}$ be the probability that the $i$th sample is chosen at the first phase and the $j$th sample at the second phase. Then
$$
E(\hat\theta) = \sum_{i=1}^{M_0}\sum_{j=1}^{M_1}P_{ij}\hat\theta_{ij}
= \sum_{i=1}^{M_0}\sum_{j=1}^{M_1}P_iP_{j|i}\hat\theta_{ij}\qquad\big(\text{using } P(A\cap B)=P(A)P(B\mid A)\big)
$$
$$
= \sum_{i=1}^{M_0}P_i\Big[\sum_{j=1}^{M_1}P_{j|i}\hat\theta_{ij}\Big]
= E_1\big[E_2(\hat\theta)\big],
$$
where $E_2$ denotes the expectation over the second phase (the inner sum) and $E_1$ denotes the expectation over the first phase (the outer sum).
Variance of $\hat\theta$
$$
Var(\hat\theta) = E\big[\hat\theta-E(\hat\theta)\big]^2
= E\big[\{\hat\theta-E_2(\hat\theta)\}+\{E_2(\hat\theta)-E(\hat\theta)\}\big]^2
$$
$$
= E\big[\hat\theta-E_2(\hat\theta)\big]^2 + E\big[E_2(\hat\theta)-E(\hat\theta)\big]^2 + 0
$$
(the cross product term has zero expectation)
$$
= E_1E_2\big[\hat\theta-E_2(\hat\theta)\big]^2 + E_1E_2\big[E_2(\hat\theta)-E(\hat\theta)\big]^2\qquad\big(E_2(\hat\theta)-E(\hat\theta)\text{ is constant for }E_2\big)
$$
$$
= E_1\big[V_2(\hat\theta)\big] + E_1\big[E_2(\hat\theta)-E_1(E_2(\hat\theta))\big]^2
= E_1\big[V_2(\hat\theta)\big] + V_1\big[E_2(\hat\theta)\big].
$$
Note: Two phase sampling can be extended to more than two phases depending upon the need and objective of the experiment. The various expectations can also be extended on similar lines.
Ratio estimator in double sampling:
The ratio estimator of the population mean in double sampling is
$$
\hat{\bar Y}_{Rd} = \frac{\bar y}{\bar x}\,\bar x',
$$
where $\bar x'$ is the mean of $x$ based on the first phase sample of size $n'$, and $\bar y$, $\bar x$ are the means of $y$ and $x$ based on the second phase sample of size $n$. The exact expressions for the bias and mean squared error of $\hat{\bar Y}_{Rd}$ are difficult to derive, so we find their approximate expressions using the same approach as in the ratio method of estimation.
Let
$$
\varepsilon_0 = \frac{\bar y-\bar Y}{\bar Y},\qquad
\varepsilon_1 = \frac{\bar x-\bar X}{\bar X},\qquad
\varepsilon_2 = \frac{\bar x'-\bar X}{\bar X},
$$
so that $E(\varepsilon_0)=E(\varepsilon_1)=E(\varepsilon_2)=0$. Then
$$
E(\varepsilon_1^2) = \Big(\frac{1}{n}-\frac{1}{N}\Big)C_x^2,
$$
$$
E(\varepsilon_1\varepsilon_2) = \frac{1}{\bar X^2}E\big[(\bar x-\bar X)(\bar x'-\bar X)\big]
= \frac{1}{\bar X^2}E_1\big[E_2\{(\bar x-\bar X)(\bar x'-\bar X)\mid n'\}\big]
= \frac{1}{\bar X^2}E_1\big[(\bar x'-\bar X)^2\big]
= \Big(\frac{1}{n'}-\frac{1}{N}\Big)\frac{S_x^2}{\bar X^2}
= \Big(\frac{1}{n'}-\frac{1}{N}\Big)C_x^2
= E(\varepsilon_2^2).
$$
$$
E(\varepsilon_0\varepsilon_2) = \frac{1}{\bar X\bar Y}Cov(\bar y,\bar x')
= \frac{1}{\bar X\bar Y}\Big[Cov\big(E(\bar y\mid n'),E(\bar x'\mid n')\big) + E\big\{Cov(\bar y,\bar x')\mid n'\big\}\Big]
= \frac{1}{\bar X\bar Y}Cov(\bar y',\bar x')
$$
$$
= \Big(\frac{1}{n'}-\frac{1}{N}\Big)\frac{S_{xy}}{\bar X\bar Y}
= \Big(\frac{1}{n'}-\frac{1}{N}\Big)\rho\,\frac{S_x}{\bar X}\frac{S_y}{\bar Y}
= \Big(\frac{1}{n'}-\frac{1}{N}\Big)\rho\,C_xC_y,
$$
where $\bar y'$ is the sample mean of the $y$'s based on the sample of size $n'$.
$$
E(\varepsilon_0\varepsilon_1) = \frac{1}{\bar X\bar Y}Cov(\bar y,\bar x)
= \Big(\frac{1}{n}-\frac{1}{N}\Big)\frac{S_{xy}}{\bar X\bar Y}
= \Big(\frac{1}{n}-\frac{1}{N}\Big)\rho\,\frac{S_x}{\bar X}\frac{S_y}{\bar Y}
= \Big(\frac{1}{n}-\frac{1}{N}\Big)\rho\,C_xC_y .
$$
$$
E(\varepsilon_0^2) = \frac{1}{\bar Y^2}Var(\bar y)
= \frac{1}{\bar Y^2}\Big[V_1\big(E_2(\bar y\mid n')\big) + E_1\big(V_2(\bar y\mid n')\big)\Big]
= \frac{1}{\bar Y^2}\Big[V_1(\bar y') + E_1\Big\{\Big(\frac{1}{n}-\frac{1}{n'}\Big)s_y'^2\Big\}\Big]
$$
$$
= \frac{1}{\bar Y^2}\Big[\Big(\frac{1}{n'}-\frac{1}{N}\Big)S_y^2 + \Big(\frac{1}{n}-\frac{1}{n'}\Big)S_y^2\Big]
= \Big(\frac{1}{n}-\frac{1}{N}\Big)\frac{S_y^2}{\bar Y^2}
= \Big(\frac{1}{n}-\frac{1}{N}\Big)C_y^2,
$$
where $s_y'^2$ is the mean sum of squares of $y$ based on the initial sample of size $n'$.
Equivalently,
$$
E(\varepsilon_1\varepsilon_2) = \frac{1}{\bar X^2}Cov(\bar x,\bar x')
= \frac{1}{\bar X^2}\Big[Cov\big(E(\bar x\mid n'),E(\bar x'\mid n')\big)+0\Big]
= \frac{1}{\bar X^2}Var(\bar x')
= \Big(\frac{1}{n'}-\frac{1}{N}\Big)C_x^2,
$$
where $Var(\bar x')$ is the variance of the mean of $x$ based on the initial sample of size $n'$.
Estimation error of $\hat{\bar Y}_{Rd}$
Write $\hat{\bar Y}_{Rd}$ as
$$
\hat{\bar Y}_{Rd} = \frac{\bar Y(1+\varepsilon_0)(1+\varepsilon_2)\bar X}{(1+\varepsilon_1)\bar X}
= \bar Y(1+\varepsilon_0)(1+\varepsilon_2)(1+\varepsilon_1)^{-1}
= \bar Y(1+\varepsilon_0)(1+\varepsilon_2)(1-\varepsilon_1+\varepsilon_1^2-\cdots)
$$
$$
= \bar Y\big(1+\varepsilon_0+\varepsilon_2-\varepsilon_1+\varepsilon_0\varepsilon_2-\varepsilon_0\varepsilon_1-\varepsilon_1\varepsilon_2+\varepsilon_1^2\big)
$$
up to terms of order two. Other terms of degree greater than two are assumed negligible.
Bias and mean squared error of $\hat{\bar Y}_{Rd}$:
$$
Bias(\hat{\bar Y}_{Rd}) = \bar Y\,E\big(\varepsilon_1^2-\varepsilon_0\varepsilon_1+\varepsilon_0\varepsilon_2-\varepsilon_1\varepsilon_2\big)
= \bar Y\Big(\frac{1}{n}-\frac{1}{n'}\Big)C_x\big(C_x-\rho C_y\big).
$$
$$
MSE(\hat{\bar Y}_{Rd}) = \bar Y^2E\big(\varepsilon_0+\varepsilon_2-\varepsilon_1\big)^2
= \bar Y^2\Big[\Big(\frac{1}{n}-\frac{1}{N}\Big)C_y^2+\Big(\frac{1}{n}-\frac{1}{N}\Big)C_x^2-\Big(\frac{1}{n'}-\frac{1}{N}\Big)C_x^2-2\Big(\frac{1}{n}-\frac{1}{N}\Big)\rho C_xC_y+2\Big(\frac{1}{n'}-\frac{1}{N}\Big)\rho C_xC_y\Big]
$$
$$
= \bar Y^2\Big(\frac{1}{n}-\frac{1}{N}\Big)\big(C_y^2+C_x^2-2\rho C_xC_y\big) + \bar Y^2\Big(\frac{1}{n'}-\frac{1}{N}\Big)\big(2\rho C_xC_y-C_x^2\big)
= MSE(\text{ratio estimator}) + \bar Y^2\Big(\frac{1}{n'}-\frac{1}{N}\Big)\big(2\rho C_xC_y-C_x^2\big).
$$
The second term is the contribution of the second phase of sampling. This method is preferred over the usual ratio method when the second term is negative, i.e., if
$$
2\rho C_xC_y - C_x^2 < 0 \qquad\text{or}\qquad \rho < \frac{1}{2}\frac{C_x}{C_y}.
$$
The cost function is $C_0 = nC + n'C'$, where $C$ and $C'$ are the costs per unit for selecting the samples of sizes $n$ and $n'$ respectively. Ignoring the terms in $1/N$, write the mean squared error as $\dfrac{V}{n}+\dfrac{V'}{n'}$, where $V = \bar Y^2(C_y^2+C_x^2-2\rho C_xC_y)$ and $V' = \bar Y^2(2\rho C_xC_y-C_x^2)$. Now we find the optimum sample sizes $n$ and $n'$ for fixed cost $C_0$. The Lagrangian function is
$$
\psi = \frac{V}{n}+\frac{V'}{n'}+\lambda(nC+n'C'-C_0),
$$
$$
\frac{\partial\psi}{\partial n} = 0 \;\Rightarrow\; \lambda C = \frac{V}{n^2},\qquad
\frac{\partial\psi}{\partial n'} = 0 \;\Rightarrow\; \lambda C' = \frac{V'}{n'^2}.
$$
Thus $\lambda Cn^2 = V$, or $n = \sqrt{\dfrac{V}{\lambda C}}$, or $\sqrt\lambda\,nC = \sqrt{VC}$. Similarly $\sqrt\lambda\,n'C' = \sqrt{V'C'}$. Thus
$$
\sqrt\lambda\,C_0 = \sqrt{VC}+\sqrt{V'C'},
$$
and so
$$
\text{Optimum } n = \frac{C_0}{\sqrt{VC}+\sqrt{V'C'}}\sqrt{\frac{V}{C}} = n_{opt},\ \text{say},\qquad
\text{Optimum } n' = \frac{C_0}{\sqrt{VC}+\sqrt{V'C'}}\sqrt{\frac{V'}{C'}} = n'_{opt},\ \text{say},
$$
$$
Var_{opt}(\hat{\bar Y}_{Rd}) = \frac{V}{n_{opt}}+\frac{V'}{n'_{opt}} = \frac{\big(\sqrt{VC}+\sqrt{V'C'}\big)^2}{C_0}.
$$
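The optimum allocation can be computed directly from these formulas. The sketch below uses hypothetical values of $V$, $V'$, the unit costs and the budget, and checks the two identities derived above (budget exhaustion and the closed form of the optimum variance):

```python
from math import sqrt

# Hypothetical inputs: variance components V, V', unit costs C, C', budget C0.
V, Vp = 40.0, 10.0
C, Cp = 4.0, 1.0
C0 = 400.0

denom = sqrt(V * C) + sqrt(Vp * Cp)
n_opt = C0 / denom * sqrt(V / C)      # optimum second phase size n
np_opt = C0 / denom * sqrt(Vp / Cp)   # optimum first phase size n'

# The optimum sizes exhaust the budget exactly...
assert abs(n_opt * C + np_opt * Cp - C0) < 1e-9
# ...and the attained variance equals (sqrt(VC) + sqrt(V'C'))^2 / C0.
var_opt = V / n_opt + Vp / np_opt
assert abs(var_opt - denom ** 2 / C0) < 1e-9
```

In practice the optimum sizes would be rounded to integers, which perturbs both identities slightly.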
Regression estimator in double sampling:
The linear regression estimator of $\bar Y$ in double sampling is
$$
\hat{\bar Y}_{regd} = \bar y + \hat\beta(\bar x'-\bar x),
$$
where
$$
\hat\beta = \frac{s_{xy}}{s_x^2} = \frac{\sum_{i=1}^{n}(x_i-\bar x)(y_i-\bar y)}{\sum_{i=1}^{n}(x_i-\bar x)^2}
$$
is an estimator of $\beta = \dfrac{S_{xy}}{S_x^2}$ based on the sample of size $n$.
It is difficult to find the exact properties like the bias and mean squared error of $\hat{\bar Y}_{regd}$, so we derive the approximate expressions. Let
$$
\varepsilon_1 = \frac{\bar x-\bar X}{\bar X}\ \Rightarrow\ \bar x = (1+\varepsilon_1)\bar X,\qquad
\varepsilon_2 = \frac{\bar x'-\bar X}{\bar X}\ \Rightarrow\ \bar x' = (1+\varepsilon_2)\bar X,
$$
$$
\varepsilon_3 = \frac{s_{xy}-S_{xy}}{S_{xy}}\ \Rightarrow\ s_{xy} = (1+\varepsilon_3)S_{xy},\qquad
\varepsilon_4 = \frac{s_x^2-S_x^2}{S_x^2}\ \Rightarrow\ s_x^2 = (1+\varepsilon_4)S_x^2,
$$
with $E(\varepsilon_1)=E(\varepsilon_2)=E(\varepsilon_3)=E(\varepsilon_4)=0$. Define
$$
\mu_{21} = E\big[(x-\bar X)^2(y-\bar Y)\big],\qquad
\mu_{30} = E\big[(x-\bar X)^3\big].
$$
Estimation error:
Then
$$
\hat{\bar Y}_{regd} = \bar y + \frac{S_{xy}(1+\varepsilon_3)}{S_x^2(1+\varepsilon_4)}(\varepsilon_2-\varepsilon_1)\bar X
= \bar y + \beta\bar X(1+\varepsilon_3)(\varepsilon_2-\varepsilon_1)(1+\varepsilon_4)^{-1}.
$$
Retaining the powers of the $\varepsilon$'s up to order two, assuming $|\varepsilon_4|<1$ (using the same concept as detailed in the ratio method of estimation),
$$
\hat{\bar Y}_{regd} \approx \bar y + \beta\bar X\big(\varepsilon_2+\varepsilon_2\varepsilon_3-\varepsilon_2\varepsilon_4-\varepsilon_1-\varepsilon_1\varepsilon_3+\varepsilon_1\varepsilon_4\big).
$$
Bias:
The bias of $\hat{\bar Y}_{regd}$ up to the second order of approximation is
$$
Bias(\hat{\bar Y}_{regd}) = \beta\bar X\,E\big(\varepsilon_2\varepsilon_3-\varepsilon_2\varepsilon_4-\varepsilon_1\varepsilon_3+\varepsilon_1\varepsilon_4\big),
$$
where, up to the same order,
$$
E(\varepsilon_1\varepsilon_3) = \Big(\frac{1}{n}-\frac{1}{N}\Big)\frac{\mu_{21}}{\bar XS_{xy}},\qquad
E(\varepsilon_2\varepsilon_3) = \Big(\frac{1}{n'}-\frac{1}{N}\Big)\frac{\mu_{21}}{\bar XS_{xy}},
$$
$$
E(\varepsilon_1\varepsilon_4) = \Big(\frac{1}{n}-\frac{1}{N}\Big)\frac{\mu_{30}}{\bar XS_x^2},\qquad
E(\varepsilon_2\varepsilon_4) = \Big(\frac{1}{n'}-\frac{1}{N}\Big)\frac{\mu_{30}}{\bar XS_x^2},
$$
so that
$$
Bias(\hat{\bar Y}_{regd}) = \beta\Big(\frac{1}{n}-\frac{1}{n'}\Big)\Big(\frac{\mu_{30}}{S_x^2}-\frac{\mu_{21}}{S_{xy}}\Big).
$$
Mean squared error:
Retaining the powers of the $\varepsilon$'s up to order two, the mean squared error up to the second order of approximation is
$$
MSE(\hat{\bar Y}_{regd}) = E\big[(\bar y-\bar Y)+\beta\bar X(\varepsilon_2-\varepsilon_1)\big]^2
= S_y^2\Big[\Big(\frac{1}{n}-\frac{1}{N}\Big)-\rho^2\Big(\frac{1}{n}-\frac{1}{n'}\Big)\Big]
= \frac{S_y^2(1-\rho^2)}{n}+\frac{\rho^2S_y^2}{n'}-\frac{S_y^2}{N}.
$$
Clearly, $\hat{\bar Y}_{regd}$ is more efficient than the sample mean under SRS, i.e., the case when no auxiliary variable is used. We now address the issue of whether the reduction in variability is worth the extra expenditure required to observe the auxiliary variable. Let the cost function be $C_0 = nC_1 + n'C_2$, where $C_1$ and $C_2$ are the costs per unit of observing the study variable $y$ and the auxiliary variable $x$ respectively. Now minimize $MSE(\hat{\bar Y}_{regd})$ for fixed cost $C_0$ using the Lagrangian function with Lagrangian multiplier $\lambda$ as
$$
\psi = \frac{S_y^2(1-\rho^2)}{n}+\frac{\rho^2S_y^2}{n'}+\lambda(C_1n+C_2n'-C_0),
$$
$$
\frac{\partial\psi}{\partial n} = 0 \;\Rightarrow\; \frac{S_y^2(1-\rho^2)}{n^2} = \lambda C_1,\qquad
\frac{\partial\psi}{\partial n'} = 0 \;\Rightarrow\; \frac{\rho^2S_y^2}{n'^2} = \lambda C_2 .
$$
Thus
$$
n = \frac{S_y}{\sqrt\lambda}\sqrt{\frac{1-\rho^2}{C_1}}\qquad\text{and}\qquad n' = \frac{S_y\rho}{\sqrt{\lambda C_2}}.
$$
Substituting these values in the cost function, we have
$$
C_0 = C_1n+C_2n' = \frac{S_y}{\sqrt\lambda}\Big[\sqrt{C_1(1-\rho^2)}+\rho\sqrt{C_2}\Big]
\qquad\text{or}\qquad
\sqrt\lambda = \frac{S_y\big[\sqrt{C_1(1-\rho^2)}+\rho\sqrt{C_2}\big]}{C_0}.
$$
The optimum mean squared error of $\hat{\bar Y}_{regd}$ is obtained by substituting $n = n_{opt}$ and $n' = n'_{opt}$ as
$$
MSE_{opt}(\hat{\bar Y}_{regd}) = \frac{S_y^2(1-\rho^2)}{n_{opt}}+\frac{\rho^2S_y^2}{n'_{opt}}
= \frac{S_y^2\big[\sqrt{C_1(1-\rho^2)}+\rho\sqrt{C_2}\big]^2}{C_0}.
$$
The optimum variance of $\bar y$ under SRS, when no auxiliary information is used and the whole budget is spent on $y$ (so that $n = C_0/C_1$), is
$$
Var(\bar y_{SRS})_{opt} = \frac{C_1S_y^2}{C_0}.
$$
The ratio
$$
\frac{MSE_{opt}(\hat{\bar Y}_{regd})}{Var(\bar y_{SRS})_{opt}} = \Big[\sqrt{1-\rho^2}+\rho\sqrt{\frac{C_2}{C_1}}\Big]^2
$$
is less than 1, i.e., double sampling with the regression estimator leads to a gain in precision, if
$$
\frac{C_1}{C_2} > \frac{\rho^2}{\big(1-\sqrt{1-\rho^2}\big)^2}.
$$
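The allocation and the efficiency comparison can be worked through numerically. The sketch below uses hypothetical values of $\rho$, $S_y^2$, the unit costs and the budget:

```python
from math import sqrt

# Hypothetical values: correlation, variance, unit costs (C1: study var, C2: auxiliary), budget.
rho, Sy2 = 0.9, 4.0
C1, C2 = 5.0, 0.5
C0 = 1000.0

# Optimum allocation for the double-sampling regression estimator.
lam_sqrt = sqrt(Sy2) * (sqrt(C1 * (1 - rho ** 2)) + rho * sqrt(C2)) / C0
n_opt = sqrt(Sy2 * (1 - rho ** 2) / C1) / lam_sqrt
np_opt = rho * sqrt(Sy2 / C2) / lam_sqrt

# The optimum sizes exhaust the budget.
assert abs(C1 * n_opt + C2 * np_opt - C0) < 1e-6

# Compare with spending the whole budget on y alone under SRS.
mse_opt = Sy2 * (sqrt(C1 * (1 - rho ** 2)) + rho * sqrt(C2)) ** 2 / C0
var_srs_opt = C1 * Sy2 / C0
assert mse_opt < var_srs_opt   # double sampling pays off for these inputs
```

Here $C_1/C_2 = 10$ comfortably exceeds $\rho^2/(1-\sqrt{1-\rho^2})^2$, so the gain condition is met and the assertion holds; cheaper auxiliary measurements or a weaker correlation would shrink or reverse the gain.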
Theorem:
(1) An unbiased estimator of the population mean $\bar Y$ is given by
$$
\hat{\bar Y} = \frac{x'_{tot}}{n'n}\sum_{i=1}^{n}\frac{y_i}{x_i},
$$
where $x'_{tot}$ denotes the total of $x$ in the first sample and the second phase sample of size $n$ is drawn from the first phase sample with probability proportional to $x$ and with replacement.
(2)
$$
Var(\hat{\bar Y}) = \Big(\frac{1}{n'}-\frac{1}{N}\Big)S_y^2 + \frac{(n'-1)}{N(N-1)nn'}\sum_{i=1}^{N}\frac{x_i}{X_{tot}}\Big(\frac{y_i}{x_i}X_{tot}-Y_{tot}\Big)^2,
$$
where $X_{tot}$ and $Y_{tot}$ denote the totals of $x$ and $y$ respectively in the population.
Proof. Before deriving the results, we first mention the following results proved in the varying probability scheme sampling.
Result: In sampling with a varying probability scheme, for drawing a sample of size $n$ from a population of size $N$ with replacement:
(i) $\bar z = \dfrac{1}{n}\sum_{i=1}^{n}z_i$ is an unbiased estimator of the population mean $\bar Y$, where $z_i = \dfrac{y_i}{Np_i}$, $p_i$ being the probability of selection of the $i$th unit. Note that $y_i$ and $p_i$ can take any one of the $N$ values $Y_1, Y_2,\ldots,Y_N$ and $P_1, P_2,\ldots,P_N$.
(ii) $Var(\bar z) = \dfrac{1}{nN^2}\displaystyle\sum_{i=1}^{N}P_i\Big(\frac{Y_i}{P_i}-Y_{tot}\Big)^2$.
Let $E_2$ denote the expectation of $\hat{\bar Y}$ when the first sample is fixed. The second sample is selected with probability proportional to $x$, hence using result (i) with $p_i = \dfrac{x_i}{x'_{tot}}$, we find that
$$
E_2(\hat{\bar Y}) = \frac{x'_{tot}}{n'n}E_2\Big(\sum_{i=1}^{n}\frac{y_i}{x_i}\Big) = \bar y',
$$
where $\bar y'$ is the mean of $y$ for the first sample. Hence
$$
E(\hat{\bar Y}) = E_1\big[E_2(\hat{\bar Y}\mid n')\big] = E_1(\bar y_{n'}) = \bar Y,
$$
which proves part (1) of the theorem. Further,
$$
Var(\hat{\bar Y}) = V_1\big[E_2(\hat{\bar Y}\mid n')\big] + E_1\big[V_2(\hat{\bar Y}\mid n')\big]
= \Big(\frac{1}{n'}-\frac{1}{N}\Big)S_y^2 + E_1\big[V_2(\hat{\bar Y}\mid n')\big].
$$
Now, using result (ii), we get
$$
V_2(\hat{\bar Y}\mid n') = \frac{1}{nn'^2}\sum_{i=1}^{n'}\frac{x_i}{x'_{tot}}\Big(\frac{y_i}{x_i}x'_{tot}-y'_{tot}\Big)^2
= \frac{1}{nn'^2}\sum_{i<j}^{n'}\sum x_ix_j\Big(\frac{y_i}{x_i}-\frac{y_j}{x_j}\Big)^2,
$$
and hence, using the fact that the probability of a specified pair of units being selected in the first sample is $\dfrac{n'(n'-1)}{N(N-1)}$, we can express
$$
E_1\big[V_2(\hat{\bar Y}\mid n')\big] = \frac{1}{nn'^2}\cdot\frac{n'(n'-1)}{N(N-1)}\sum_{i<j}^{N}\sum x_ix_j\Big(\frac{y_i}{x_i}-\frac{y_j}{x_j}\Big)^2
= \frac{(n'-1)}{nn'N(N-1)}\sum_{i=1}^{N}\frac{x_i}{X_{tot}}\Big(\frac{y_i}{x_i}X_{tot}-Y_{tot}\Big)^2 .
$$
Substituting this in $Var(\hat{\bar Y})$, we get
$$
Var(\hat{\bar Y}) = \Big(\frac{1}{n'}-\frac{1}{N}\Big)S_y^2 + \frac{(n'-1)}{nn'N(N-1)}\sum_{i=1}^{N}\frac{x_i}{X_{tot}}\Big(\frac{y_i}{x_i}X_{tot}-Y_{tot}\Big)^2 .
$$
This proves part (2) of the theorem.
We now consider the estimation of $Var(\hat{\bar Y})$. Given the first sample,
$$
E_2\Big(\frac{1}{n}\sum_{i=1}^{n}\frac{y_i^2}{p_i}\Big) = \sum_{i=1}^{n'}y_i^2,\qquad\text{where } p_i = \frac{x_i}{x'_{tot}}.
$$
Also, given the first sample,
$$
E_2(\hat{\bar Y}^2) = V_2(\hat{\bar Y}\mid n') + \bar y'^2 .
$$
Since the second phase sample is drawn by ppswr, the standard ppswr variance estimator
$$
v_2 = \frac{1}{n(n-1)}\sum_{i=1}^{n}\Big(\frac{x'_{tot}\,y_i}{n'x_i}-\hat{\bar Y}\Big)^2
$$
satisfies $E_2(v_2) = V_2(\hat{\bar Y}\mid n')$, and hence
$$
E_1E_2(v_2) = E_1\big[V_2(\hat{\bar Y}\mid n')\big]
= \frac{(n'-1)}{nn'N(N-1)}\sum_{i=1}^{N}\frac{x_i}{X_{tot}}\Big(\frac{y_i}{x_i}X_{tot}-Y_{tot}\Big)^2,\qquad(2)
$$
i.e., $v_2$ is an unbiased estimator of the second component of $Var(\hat{\bar Y})$. Combining the two relations above, an unbiased estimator of $s_y'^2$, the mean sum of squares of $y$ for the first sample (and hence of $S_y^2$), can be constructed from $\dfrac{x'_{tot}}{n}\sum_{i=1}^{n}\dfrac{y_i^2}{x_i}$, which estimates $\sum_{i=1}^{n'}y_i^2$, together with $\hat{\bar Y}^2 - v_2$, which estimates $\bar y'^2$. An unbiased estimator of $Var(\hat{\bar Y})$ then follows by combining these two components as $\Big(\dfrac{1}{n'}-\dfrac{1}{N}\Big)\hat s_y'^2 + v_2$.
Chapter 9
Cluster Sampling
It is one of the basic assumptions in any sampling procedure that the population can be divided into a finite number of distinct and identifiable units, called sampling units. The smallest units into which the population can be divided are called the elements of the population, and groups of such elements are called clusters.
In many practical situations and for many types of populations, a list of elements is not available, so the use of an element as a sampling unit is not feasible. The method of cluster sampling or area sampling can be used in such situations.
In cluster sampling,
- divide the whole population into clusters according to some well defined rule;
- treat the clusters as sampling units;
- choose a sample of clusters according to some procedure;
- carry out a complete enumeration of the selected clusters, i.e., collect information on all the sampling units available in the selected clusters.
Area sampling
In case the entire area containing the population is subdivided into smaller area segments, and each element in the population is associated with one and only one such area segment, the procedure is called area sampling.
Examples:
In a city, a list of all the individual persons staying in the houses may be difficult to obtain or may not even be available, but a list of all the houses in the city may be available. So every individual person is treated as a sampling unit and every house is a cluster.
A list of all the agricultural farms in a village or a district may not be easily available, but lists of villages or districts are generally available. In this case, every farm is a sampling unit and every village or district is a cluster.
Moreover, it is easier, faster, cheaper and convenient to collect information on clusters rather than on
sampling units.
In both the examples, draw a sample of clusters from houses/villages and then collect the observations on
all the sampling units available in the selected clusters.
In a closed segment, the sum of the characteristic under study, i.e., area, livestock etc. for all the elements
associated with the segment will account for all the area, livestock etc. within the segment.
Construction of clusters:
The clusters are constructed such that the sampling units are heterogeneous within the clusters and
homogeneous among the clusters. The reason for this will become clear later. This is opposite to the
construction of the strata in the stratified sampling.
There are two options to construct the clusters - equal size and unequal size. We discuss the estimation of the population mean and its variance in both cases.
Case of equal clusters
Suppose the population is divided into $N$ clusters and each cluster is of size $M$. Select a sample of $n$ clusters from the $N$ clusters by the method of SRS, generally WOR. So
total population size $= NM$,
total sample size $= nM$.
Let
$y_{ij}$: value of the characteristic under study for the $j$th element $(j=1,2,\ldots,M)$ in the $i$th cluster $(i=1,2,\ldots,N)$,
$$
\bar y_i = \frac{1}{M}\sum_{j=1}^{M}y_{ij} = \text{mean per element of the } i\text{th cluster}.
$$
Estimation of population mean:
First select $n$ clusters from the $N$ clusters by SRSWOR. Based on the $n$ selected clusters, find the mean of each cluster separately, based on all the units in that cluster. So we have the cluster means $\bar y_1, \bar y_2,\ldots,\bar y_n$. Consider the mean of all such cluster means as an estimator of the population mean:
$$
\bar y_{cl} = \frac{1}{n}\sum_{i=1}^{n}\bar y_i .
$$
Bias:
$$
E(\bar y_{cl}) = \frac{1}{n}\sum_{i=1}^{n}E(\bar y_i) = \frac{1}{n}\sum_{i=1}^{n}\bar Y = \bar Y\qquad(\text{since SRS is used}),
$$
so $\bar y_{cl}$ is an unbiased estimator of $\bar Y$.
Variance:
The variance of $\bar y_{cl}$ can be derived on the same lines as the variance of the sample mean in SRSWOR. The only difference is that in SRSWOR the sampling units are individual observations, whereas here they are the cluster means $\bar y_1, \bar y_2,\ldots,\bar y_n$. Note that in SRSWOR, $Var(\bar y) = \dfrac{N-n}{Nn}S^2$ and $\widehat{Var}(\bar y) = \dfrac{N-n}{Nn}s^2$. Thus
$$
Var(\bar y_{cl}) = \frac{N-n}{Nn}S_b^2,\qquad
\widehat{Var}(\bar y_{cl}) = \frac{N-n}{Nn}s_b^2,
$$
where $S_b^2 = \dfrac{1}{N-1}\sum_{i=1}^{N}(\bar y_i-\bar Y)^2$ and $s_b^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}(\bar y_i-\bar y_{cl})^2$ is the mean sum of squares between the cluster means in the sample.
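Both the unbiasedness and the variance formula can be verified by enumeration. The sketch below builds a hypothetical toy population of equal-sized clusters and checks the two results exactly:

```python
from itertools import combinations

# Toy population (hypothetical): N = 4 clusters of M = 3 elements each.
clusters = [[2, 4, 6], [3, 5, 7], [10, 12, 14], [1, 2, 3]]
N, M, n = 4, 3, 2
cluster_means = [sum(c) / M for c in clusters]
Y_bar = sum(cluster_means) / N          # population mean per element

# Enumerate all SRSWOR samples of n clusters.
samples = list(combinations(cluster_means, n))
est = [sum(s) / n for s in samples]     # y_cl for each sample
E_ycl = sum(est) / len(samples)
var_direct = sum((e - E_ycl) ** 2 for e in est) / len(samples)

Sb2 = sum((m - Y_bar) ** 2 for m in cluster_means) / (N - 1)
var_formula = (N - n) / (N * n) * Sb2

assert abs(E_ycl - Y_bar) < 1e-9
assert abs(var_direct - var_formula) < 1e-9
```

Since $\bar y_{cl}$ is just an SRSWOR sample mean taken over the population of cluster means, the standard SRSWOR results apply verbatim, which is exactly what the enumeration confirms.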
Comparison with SRS:
If an equivalent sample of $nM$ units were to be selected from the population of $NM$ units by SRSWOR, the variance of the mean per element would be
$$
Var(\bar y_{nM}) = \frac{NM-nM}{NM\cdot nM}S^2 = \frac{f}{n}\cdot\frac{S^2}{M},
$$
where $f = \dfrac{N-n}{N}$ and $S^2 = \dfrac{1}{NM-1}\sum_{i=1}^{N}\sum_{j=1}^{M}(y_{ij}-\bar Y)^2$. Also
$$
Var(\bar y_{cl}) = \frac{N-n}{Nn}S_b^2 = \frac{f}{n}S_b^2 .
$$
Consider
$$
(NM-1)S^2 = \sum_{i=1}^{N}\sum_{j=1}^{M}(y_{ij}-\bar Y)^2
= \sum_{i=1}^{N}\sum_{j=1}^{M}\big[(y_{ij}-\bar y_i)+(\bar y_i-\bar Y)\big]^2
= \sum_{i=1}^{N}\sum_{j=1}^{M}(y_{ij}-\bar y_i)^2 + \sum_{i=1}^{N}\sum_{j=1}^{M}(\bar y_i-\bar Y)^2
= N(M-1)S_w^2 + M(N-1)S_b^2,
$$
where
$$
S_w^2 = \frac{1}{N}\sum_{i=1}^{N}S_i^2
$$
is the mean sum of squares within clusters in the population and
$$
S_i^2 = \frac{1}{M-1}\sum_{j=1}^{M}(y_{ij}-\bar y_i)^2
$$
is the mean sum of squares for the $i$th cluster. The relative efficiency of cluster sampling relative to SRS is then
$$
E = \frac{Var(\bar y_{nM})}{Var(\bar y_{cl})} = \frac{S^2}{MS_b^2}.
$$
Thus the relative efficiency increases when $S_w^2$ is large and $S_b^2$ is small. So cluster sampling will be efficient if the clusters are formed such that the variation between the cluster means is as small as possible while the variation within the clusters is as large as possible.
The efficiency can be expressed in terms of the intraclass correlation coefficient
$$
\rho = \frac{E(y_{ij}-\bar Y)(y_{ik}-\bar Y)}{E(y_{ij}-\bar Y)^2}
= \frac{\dfrac{1}{MN(M-1)}\displaystyle\sum_{i=1}^{N}\sum_{j=1}^{M}\sum_{k(\ne j)=1}^{M}(y_{ij}-\bar Y)(y_{ik}-\bar Y)}{\dfrac{MN-1}{MN}S^2}
= \frac{\displaystyle\sum_{i=1}^{N}\sum_{j=1}^{M}\sum_{k(\ne j)=1}^{M}(y_{ij}-\bar Y)(y_{ik}-\bar Y)}{(MN-1)(M-1)S^2}.
$$
Consider
$$
\sum_{i=1}^{N}(\bar y_i-\bar Y)^2 = \sum_{i=1}^{N}\Big[\frac{1}{M}\sum_{j=1}^{M}(y_{ij}-\bar Y)\Big]^2
= \frac{1}{M^2}\sum_{i=1}^{N}\Big[\sum_{j=1}^{M}(y_{ij}-\bar Y)^2 + \sum_{j=1}^{M}\sum_{k(\ne j)=1}^{M}(y_{ij}-\bar Y)(y_{ik}-\bar Y)\Big],
$$
i.e.,
$$
M^2(N-1)S_b^2 = (NM-1)S^2 + \rho(MN-1)(M-1)S^2,
$$
or
$$
S_b^2 = \frac{MN-1}{M^2(N-1)}\big[1+(M-1)\rho\big]S^2 .
$$
The variance of $\bar y_{cl}$ now becomes
$$
Var(\bar y_{cl}) = \frac{N-n}{Nn}S_b^2
= \frac{N-n}{Nn}\cdot\frac{MN-1}{M^2(N-1)}\big[1+(M-1)\rho\big]S^2 .
$$
For large $N$, $\dfrac{MN-1}{MN}\approx 1$, $N-1\approx N$, $\dfrac{N-n}{N}\approx 1$, and so
$$
Var(\bar y_{cl}) \approx \frac{S^2}{nM}\big[1+(M-1)\rho\big].
$$
The variance of the sample mean under SRSWOR for large $N$ is
$$
Var(\bar y_{nM}) \approx \frac{S^2}{nM}.
$$
The relative efficiency for large $N$ is now given by
$$
E = \frac{Var(\bar y_{nM})}{Var(\bar y_{cl})} = \frac{1}{1+(M-1)\rho},\qquad \rho\ge-\frac{1}{M-1}.
$$
If $M=1$, then $E=1$, i.e., SRS and cluster sampling are equally efficient; each cluster consists of one unit.
If $M>1$, then cluster sampling is more efficient when $E>1$, i.e., when $(M-1)\rho<0$, or $\rho<0$.
If $\rho=0$, then $E=1$, i.e., there is no gain or loss in efficiency: the units within each cluster are effectively arranged at random, so each cluster is internally heterogeneous.
In practice, $\rho$ is usually positive and decreases as $M$ increases, but the rate of decrease in $\rho$ is much lower than the rate of increase in $M$. The situation $\rho>0$ typically arises when nearby (and hence similar) units are grouped together to form the clusters, which are then completely enumerated. There are also situations where $\rho<0$.
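The role of $\rho$ can be made concrete with a toy population (hypothetical values) in which nearby values are grouped into the same cluster, so the clusters are internally homogeneous:

```python
# Toy population: N = 3 clusters of size M = 3, nearby values grouped together.
clusters = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
M = 3
vals = [y for c in clusters for y in c]
NM = len(vals)
Y = sum(vals) / NM

# Intraclass correlation: average cross-product within clusters over the variance.
num = sum((c[j] - Y) * (c[k] - Y) for c in clusters
          for j in range(M) for k in range(M) if j != k) / (NM * (M - 1))
den = sum((v - Y) ** 2 for v in vals) / NM
rho = num / den
E = 1 / (1 + (M - 1) * rho)   # large-N relative efficiency

assert rho > 0   # homogeneous clusters give a positive intraclass correlation
assert E < 1     # so cluster sampling is less efficient than SRS here
```

Shuffling the same nine values into mixed clusters would drive $\rho$ toward zero or below, and the efficiency $E$ back toward (or above) one.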
Estimation of relative efficiency:
The relative efficiency of cluster sampling relative to an equivalent SRSWOR is obtained as
$$
E = \frac{S^2}{MS_b^2}.
$$
Since $\bar y_{cl} = \dfrac{1}{n}\sum_{i=1}^{n}\bar y_i$ is the mean of $n$ means $\bar y_i$ drawn by SRSWOR from the population of $N$ means $\bar y_i$, $i=1,2,\ldots,N$, it follows from the theory of SRSWOR that
$$
E(s_b^2) = E\Big[\frac{1}{n-1}\sum_{i=1}^{n}(\bar y_i-\bar y_{cl})^2\Big]
= \frac{1}{N-1}\sum_{i=1}^{N}(\bar y_i-\bar Y)^2 = S_b^2 .
$$
Since $s_w^2 = \dfrac{1}{n}\sum_{i=1}^{n}S_i^2$ is the mean of $n$ mean sums of squares $S_i^2$ drawn from the population of $N$ mean sums of squares $S_i^2$, $i=1,2,\ldots,N$, it follows from the theory of SRSWOR that
$$
E(s_w^2) = \frac{1}{n}\sum_{i=1}^{n}E(S_i^2) = \frac{1}{N}\sum_{i=1}^{N}S_i^2 = S_w^2 .
$$
Also
$$
(MN-1)S^2 = \sum_{i=1}^{N}\sum_{j=1}^{M}\big[(y_{ij}-\bar y_i)+(\bar y_i-\bar Y)\big]^2
= \sum_{i=1}^{N}(M-1)S_i^2 + M(N-1)S_b^2
= N(M-1)S_w^2 + M(N-1)S_b^2 .
$$
An unbiased estimator of $S^2$ can therefore be obtained as
$$
\hat S^2 = \frac{1}{MN-1}\big[N(M-1)s_w^2 + M(N-1)s_b^2\big].
$$
So
$$
\widehat{Var}(\bar y_{cl}) = \frac{N-n}{Nn}s_b^2,\qquad
\widehat{Var}(\bar y_{nM}) = \frac{N-n}{Nn}\cdot\frac{\hat S^2}{M},
$$
where $s_b^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}(\bar y_i-\bar y_{cl})^2$. An estimate of the efficiency $E = \dfrac{S^2}{MS_b^2}$ is
$$
\hat E = \frac{N(M-1)s_w^2 + M(N-1)s_b^2}{M(NM-1)s_b^2}.
$$
For large $N$, the efficiency can be written as
$$
E \approx \frac{1}{M} + \frac{M-1}{M}\cdot\frac{S_w^2}{MS_b^2},
$$
and its estimate is
$$
\hat E \approx \frac{1}{M} + \frac{M-1}{M}\cdot\frac{s_w^2}{Ms_b^2}.
$$
Estimation of proportions:
Suppose that a sample of $n$ clusters is drawn from $N$ clusters by SRSWOR. Define $y_{ij}=1$ if the $j$th unit in the $i$th cluster belongs to the specified category (i.e., possesses the given attribute) and $y_{ij}=0$ otherwise. Then
$$
\bar y_i = P_i,\qquad \bar Y = \frac{1}{N}\sum_{i=1}^{N}P_i = P,
$$
$$
S_i^2 = \frac{MP_iQ_i}{M-1},\qquad
S_w^2 = \frac{M\sum_{i=1}^{N}P_iQ_i}{N(M-1)},\qquad
S^2 = \frac{NMPQ}{NM-1},
$$
$$
S_b^2 = \frac{1}{N-1}\sum_{i=1}^{N}(P_i-P)^2
= \frac{1}{N-1}\Big[\sum_{i=1}^{N}P_i^2 - NP^2\Big]
= \frac{1}{N-1}\Big[\sum_{i=1}^{N}P_i(1-Q_i) - NP^2\Big]
= \frac{1}{N-1}\Big[NPQ-\sum_{i=1}^{N}P_iQ_i\Big],
$$
where $P_i$ is the proportion of elements in the $i$th cluster belonging to the specified category, $Q_i = 1-P_i$, $i=1,2,\ldots,N$, and $Q = 1-P$. Then, using the result that $\bar y_{cl}$ is an unbiased estimator of $\bar Y$, we find that
$$
\hat P_{cl} = \frac{1}{n}\sum_{i=1}^{n}P_i
$$
is an unbiased estimator of $P$ and
$$
Var(\hat P_{cl}) = \frac{N-n}{Nn(N-1)}\Big[NPQ-\sum_{i=1}^{N}P_iQ_i\Big].
$$
This can also be written as
$$
Var(\hat P_{cl}) = \frac{N-n}{N-1}\cdot\frac{PQ}{nM}\big[1+(M-1)\rho\big],
$$
where
$$
\rho = \frac{M(N-1)S_b^2 - NS_w^2}{(MN-1)S^2}
= 1 - \frac{M}{M-1}\cdot\frac{\sum_{i=1}^{N}P_iQ_i}{NPQ},
$$
obtained by substituting $S_b^2$, $S_w^2$ and $S^2$. An estimate of the variance is
$$
\widehat{Var}(\hat P_{cl}) = \frac{N-n}{Nn}s_b^2
= \frac{N-n}{Nn(n-1)}\sum_{i=1}^{n}(P_i-\hat P_{cl})^2
= \frac{N-n}{Nn(n-1)}\Big[n\hat P_{cl}\hat Q_{cl}-\sum_{i=1}^{n}P_iQ_i\Big],
$$
where $\hat Q_{cl} = 1-\hat P_{cl}$. The relative efficiency of cluster sampling in this case is
$$
E = \frac{M(N-1)}{MN-1}\cdot\frac{1}{1+(M-1)\rho}
= \frac{(N-1)NPQ}{(NM-1)\big[NPQ-\sum_{i=1}^{N}P_iQ_i\big]}.
$$
If $N$ is large, then
$$
E \approx \frac{NPQ}{M\big[NPQ-\sum_{i=1}^{N}P_iQ_i\big]} = \frac{1}{1+(M-1)\rho}.
$$
An estimator of the total number of elements belonging to the specified category is obtained by multiplying $\hat P_{cl}$ by $NM$, i.e., $NM\hat P_{cl}$. The expressions for its variance and the variance estimator are obtained by multiplying the corresponding expressions for $\hat P_{cl}$ by $(NM)^2$.
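The proportion results can be checked by enumeration on a toy population (hypothetical cluster proportions):

```python
from itertools import combinations

# Hypothetical cluster proportions P_i for N = 4 clusters of size M = 5.
P = [0.2, 0.4, 0.6, 0.8]
N, M, n = 4, 5, 2
Pbar = sum(P) / N
Q = 1 - Pbar

# Enumerate all SRSWOR samples of n clusters.
samples = list(combinations(P, n))
est = [sum(s) / n for s in samples]
E_P = sum(est) / len(samples)
var_direct = sum((e - E_P) ** 2 for e in est) / len(samples)

sum_PQ = sum(p * (1 - p) for p in P)
var_formula = (N - n) / (n * N * (N - 1)) * (N * Pbar * Q - sum_PQ)

assert abs(E_P - Pbar) < 1e-9
assert abs(var_direct - var_formula) < 1e-9
```

The identity $\sum_i(P_i-P)^2 = NPQ-\sum_iP_iQ_i$ used in the derivation is what makes the closed-form variance match the enumerated one exactly.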
Case of unequal clusters:
Let there be $N$ clusters and let $M_i$ be the size of the $i$th cluster. Let
$$
M_0 = \sum_{i=1}^{N}M_i,\qquad
\bar M = \frac{1}{N}\sum_{i=1}^{N}M_i,
$$
$$
\bar y_i = \frac{1}{M_i}\sum_{j=1}^{M_i}y_{ij}:\ \text{mean of the } i\text{th cluster},
$$
$$
\bar Y = \frac{1}{M_0}\sum_{i=1}^{N}\sum_{j=1}^{M_i}y_{ij}
= \sum_{i=1}^{N}\frac{M_i}{M_0}\bar y_i
= \frac{1}{N}\sum_{i=1}^{N}\frac{M_i}{\bar M}\bar y_i .
$$
Suppose that $n$ clusters are selected by SRSWOR and all the elements in the selected clusters are surveyed. Assume that the $M_i$'s $(i=1,2,\ldots,N)$ are known. Based on this scheme, several estimators can be obtained to estimate the population mean. We consider four such estimators.
1. Mean of cluster means:
Consider the simple arithmetic mean of the cluster means
$$
\bar y_c = \frac{1}{n}\sum_{i=1}^{n}\bar y_i,\qquad
E(\bar y_c) = \frac{1}{N}\sum_{i=1}^{N}\bar y_i \ne \bar Y\qquad\Big(\text{where }\bar Y = \sum_{i=1}^{N}\frac{M_i}{M_0}\bar y_i\Big).
$$
The bias of $\bar y_c$ is
$$
Bias(\bar y_c) = E(\bar y_c)-\bar Y
= \frac{1}{N}\sum_{i=1}^{N}\bar y_i - \sum_{i=1}^{N}\frac{M_i}{M_0}\bar y_i
= -\frac{1}{M_0}\Big[\sum_{i=1}^{N}M_i\bar y_i - \frac{M_0}{N}\sum_{i=1}^{N}\bar y_i\Big]
= -\frac{1}{M_0}\sum_{i=1}^{N}(M_i-\bar M)(\bar y_i-\bar Y)
= -\frac{N-1}{M_0}S_{my},
$$
where $S_{my} = \dfrac{1}{N-1}\sum_{i=1}^{N}(M_i-\bar M)(\bar y_i-\bar Y)$. Thus $Bias(\bar y_c)=0$ if $M_i$ and $\bar y_i$ are uncorrelated. The mean squared error is
$$
MSE(\bar y_c) = \frac{N-n}{Nn}S_b^2 + \Big(\frac{N-1}{M_0}S_{my}\Big)^2,
$$
where
$$
S_b^2 = \frac{1}{N-1}\sum_{i=1}^{N}(\bar y_i-\bar Y)^2 .
$$
An estimate of $Var(\bar y_c)$ is
$$
\widehat{Var}(\bar y_c) = \frac{N-n}{Nn}s_b^2,\qquad
s_b^2 = \frac{1}{n-1}\sum_{i=1}^{n}(\bar y_i-\bar y_c)^2 .
$$
2. Weighted mean of cluster means:
Consider the estimator
$$
\bar y_c^* = \frac{1}{n}\sum_{i=1}^{n}\frac{M_i}{\bar M}\bar y_i,\qquad
E(\bar y_c^*) = \frac{1}{N}\sum_{i=1}^{N}\frac{M_i}{\bar M}\bar y_i = \bar Y .
$$
Thus $\bar y_c^*$ is an unbiased estimator of $\bar Y$. The variance of $\bar y_c^*$ and its estimate are given by
$$
Var(\bar y_c^*) = Var\Big(\frac{1}{n}\sum_{i=1}^{n}\frac{M_i}{\bar M}\bar y_i\Big) = \frac{N-n}{Nn}S_b^{*2},\qquad
\widehat{Var}(\bar y_c^*) = \frac{N-n}{Nn}s_b^{*2},
$$
where
$$
S_b^{*2} = \frac{1}{N-1}\sum_{i=1}^{N}\Big(\frac{M_i}{\bar M}\bar y_i-\bar Y\Big)^2,\qquad
s_b^{*2} = \frac{1}{n-1}\sum_{i=1}^{n}\Big(\frac{M_i}{\bar M}\bar y_i-\bar y_c^*\Big)^2,\qquad
E(s_b^{*2}) = S_b^{*2}.
$$
Note that the expressions for the variance of $\bar y_c^*$ and its estimate can be derived directly from the theory of SRSWOR as follows.
Let $z_i = \dfrac{M_i}{\bar M}\bar y_i$; then $\bar y_c^* = \dfrac{1}{n}\sum_{i=1}^{n}z_i = \bar z$. Since SRSWOR is followed,
$$
Var(\bar y_c^*) = Var(\bar z)
= \frac{N-n}{Nn}\cdot\frac{1}{N-1}\sum_{i=1}^{N}(z_i-\bar Y)^2
= \frac{N-n}{Nn}\cdot\frac{1}{N-1}\sum_{i=1}^{N}\Big(\frac{M_i}{\bar M}\bar y_i-\bar Y\Big)^2
= \frac{N-n}{Nn}S_b^{*2}.
$$
Since
$$
E(s_b^{*2}) = E\Big[\frac{1}{n-1}\sum_{i=1}^{n}(z_i-\bar z)^2\Big]
= E\Big[\frac{1}{n-1}\sum_{i=1}^{n}\Big(\frac{M_i}{\bar M}\bar y_i-\bar y_c^*\Big)^2\Big]
= \frac{1}{N-1}\sum_{i=1}^{N}\Big(\frac{M_i}{\bar M}\bar y_i-\bar Y\Big)^2 = S_b^{*2},
$$
an unbiased estimator of the variance can be easily derived.
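The contrast between the biased estimator $\bar y_c$ and the unbiased $\bar y_c^*$ can be seen directly on toy data (hypothetical clusters of unequal sizes):

```python
from itertools import combinations

# Hypothetical unequal clusters.
clusters = [[2, 4], [3, 5, 7, 9], [10, 14, 18]]
N, n = 3, 2
M = [len(c) for c in clusters]
Mbar = sum(M) / N
ybar = [sum(c) / len(c) for c in clusters]
Y = sum(sum(c) for c in clusters) / sum(M)   # population mean per element

samples = list(combinations(range(N), n))
# Weighted mean of cluster means (estimator 2) vs simple mean (estimator 1).
E_star = sum(sum(M[i] / Mbar * ybar[i] for i in s) / n for s in samples) / len(samples)
E_plain = sum(sum(ybar[i] for i in s) / n for s in samples) / len(samples)

assert abs(E_star - Y) < 1e-9      # y_c* is unbiased
assert abs(E_plain - Y) > 1e-9     # y_c is biased here (M_i and ybar_i are correlated)
```

In this toy population the larger clusters also have larger means, so $M_i$ and $\bar y_i$ are correlated and the simple mean of cluster means is visibly biased, exactly as the bias formula predicts.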
3. Estimator based on the ratio method of estimation:
Consider the estimator
$$
\bar y_c^{**} = \frac{\sum_{i=1}^{n}M_i\bar y_i}{\sum_{i=1}^{n}M_i}.
$$
It is easy to see that this estimator is a biased estimator of the population mean. Before deriving its bias and mean squared error, we note that this estimator can be derived using the philosophy of the ratio method of estimation. To see this, consider the study variable $U_i$ and auxiliary variable $V_i$ as
$$
U_i = \frac{M_i\bar y_i}{\bar M},\qquad
V_i = \frac{M_i}{\bar M},\qquad i=1,2,\ldots,N,
$$
so that
$$
\bar V = \frac{1}{N}\sum_{i=1}^{N}V_i = \frac{1}{N}\sum_{i=1}^{N}\frac{M_i}{\bar M} = 1,\qquad
\bar u = \frac{1}{n}\sum_{i=1}^{n}u_i,\qquad
\bar v = \frac{1}{n}\sum_{i=1}^{n}v_i .
$$
The ratio estimator based on $\bar u$ and $\bar v$ is
$$
\hat{\bar Y}_R = \frac{\bar u}{\bar v}\bar V
= \frac{\sum_{i=1}^{n}u_i}{\sum_{i=1}^{n}v_i}
= \frac{\sum_{i=1}^{n}\frac{M_i\bar y_i}{\bar M}}{\sum_{i=1}^{n}\frac{M_i}{\bar M}}
= \frac{\sum_{i=1}^{n}M_i\bar y_i}{\sum_{i=1}^{n}M_i}
= \bar y_c^{**}.
$$
Since the ratio estimator is biased, $\bar y_c^{**}$ is also a biased estimator. The approximate bias and mean squared error of $\bar y_c^{**}$ can be derived directly by using the bias and MSE of the ratio estimator. Using the results from the ratio method of estimation, the bias up to the second order of approximation is given as
$$
Bias(\bar y_c^{**}) = \frac{N-n}{Nn}\Big(\frac{S_v^2}{\bar V^2}-\frac{S_{uv}}{\bar U\bar V}\Big)\bar U
= \frac{N-n}{Nn}\big(S_v^2\bar U - S_{uv}\big),
$$
where
$$
\bar U = \frac{1}{N}\sum_{i=1}^{N}U_i = \frac{1}{N\bar M}\sum_{i=1}^{N}M_i\bar y_i = \bar Y,
$$
$$
S_v^2 = \frac{1}{N-1}\sum_{i=1}^{N}(V_i-\bar V)^2 = \frac{1}{N-1}\sum_{i=1}^{N}\Big(\frac{M_i}{\bar M}-1\Big)^2,
$$
$$
S_{uv} = \frac{1}{N-1}\sum_{i=1}^{N}(U_i-\bar U)(V_i-\bar V)
= \frac{1}{N-1}\sum_{i=1}^{N}\Big(\frac{M_i\bar y_i}{\bar M}-\frac{1}{N\bar M}\sum_{k=1}^{N}M_k\bar y_k\Big)\Big(\frac{M_i}{\bar M}-1\Big),
$$
$$
R_{uv} = \frac{\bar U}{\bar V} = \bar U = \frac{1}{N\bar M}\sum_{i=1}^{N}M_i\bar y_i = \bar Y .
$$
The mean squared error up to the second order of approximation is
$$
MSE(\bar y_c^{**}) = \frac{N-n}{Nn}\big(S_u^2 + R_{uv}^2S_v^2 - 2R_{uv}S_{uv}\big),
$$
where
$$
S_u^2 = \frac{1}{N-1}\sum_{i=1}^{N}\Big(\frac{M_i\bar y_i}{\bar M}-\frac{1}{N\bar M}\sum_{k=1}^{N}M_k\bar y_k\Big)^2 .
$$
Alternatively,
$$
MSE(\bar y_c^{**}) = \frac{N-n}{Nn}\cdot\frac{1}{N-1}\sum_{i=1}^{N}\big(U_i-R_{uv}V_i\big)^2
= \frac{N-n}{Nn}\cdot\frac{1}{N-1}\sum_{i=1}^{N}\Big(\frac{M_i}{\bar M}\Big)^2(\bar y_i-\bar Y)^2 .
$$
4. Estimator based on unbiased ratio type estimation:
Since $\bar y_c = \dfrac{1}{n}\sum_{i=1}^{n}\bar y_i$ (where $\bar y_i = \dfrac{1}{M_i}\sum_{j=1}^{M_i}y_{ij}$) is a biased estimator of the population mean with
$$
Bias(\bar y_c) = -\frac{N-1}{M_0}S_{my} = -\frac{N-1}{N\bar M}S_{my},
$$
and since SRSWOR is used,
$$
s_{my} = \frac{1}{n-1}\sum_{i=1}^{n}(M_i-\bar m)(\bar y_i-\bar y_c),\qquad
\bar m = \frac{1}{n}\sum_{i=1}^{n}M_i,
$$
is an unbiased estimator of
$$
S_{my} = \frac{1}{N-1}\sum_{i=1}^{N}(M_i-\bar M)(\bar y_i-\bar Y),
$$
i.e., $E(s_{my}) = S_{my}$. So it follows that
$$
E(\bar y_c) = \bar Y - \frac{N-1}{N\bar M}E(s_{my})
\qquad\text{or}\qquad
E\Big(\bar y_c + \frac{N-1}{N\bar M}s_{my}\Big) = \bar Y .
$$
So
$$
\bar y_c^{***} = \bar y_c + \frac{N-1}{N\bar M}s_{my}
$$
is an unbiased estimator of the population mean $\bar Y$.
This estimator is based on the unbiased ratio type estimator. It can be obtained by replacing the study variable (earlier $y_i$) by $\dfrac{M_i}{\bar M}\bar y_i$ and the auxiliary variable (earlier $x_i$) by $\dfrac{M_i}{\bar M}$. The exact variance of this estimator is complicated and does not reduce to a simple form. The approximate variance up to the first order of approximation is
$$
Var(\bar y_c^{***}) \approx \frac{N-n}{Nn}\cdot\frac{1}{N-1}\sum_{i=1}^{N}\Big[\frac{M_i}{\bar M}(\bar y_i-\bar Y) - \frac{N-1}{N\bar M}\bar y_i(M_i-\bar M)\Big]^2 .
$$
A consistent estimate of this variance is
$$
\widehat{Var}(\bar y_c^{***}) = \frac{1}{n(n-1)}\sum_{i=1}^{n}\Big[\frac{M_i}{\bar M}(\bar y_i-\bar y_c) - \frac{n-1}{n\bar M}\bar y_i\Big(M_i-\frac{1}{n}\sum_{k=1}^{n}M_k\Big)\Big]^2 .
$$
The variance of $\bar y_c^{***}$ will be smaller than that of $\bar y_c^{**}$ (based on the ratio method of estimation) provided the regression coefficient of $\dfrac{M_i\bar y_i}{\bar M}$ on $\dfrac{M_i}{\bar M}$ is nearer to $\dfrac{1}{N}\sum_{i=1}^{N}\bar y_i$ than to $\dfrac{1}{M_0}\sum_{i=1}^{N}M_i\bar y_i$.
Comparison with SRS:
Note that
$$
E\Big(\sum_{i=1}^{n}M_i\Big) = n\bar M .
$$
Now if a sample of size $n\bar M$ is drawn from a population of size $N\bar M$, then the variance of the corresponding sample mean based on SRSWOR is
$$
Var(\bar y_{SRS}) = \frac{N\bar M-n\bar M}{N\bar M\cdot n\bar M}S^2 = \frac{N-n}{Nn}\cdot\frac{S^2}{\bar M}.
$$
This variance can be compared with that of any of the four proposed estimators. For example, in the case of
$$
\bar y_c^* = \frac{1}{n\bar M}\sum_{i=1}^{n}M_i\bar y_i,\qquad
Var(\bar y_c^*) = \frac{N-n}{Nn}S_b^{*2}
= \frac{N-n}{Nn}\cdot\frac{1}{N-1}\sum_{i=1}^{N}\Big(\frac{M_i}{\bar M}\bar y_i-\bar Y\Big)^2,
$$
the relative efficiency of $\bar y_c^*$ relative to the SRS based sample mean is
$$
E = \frac{Var(\bar y_{SRS})}{Var(\bar y_c^*)} = \frac{S^2}{\bar MS_b^{*2}}.
$$
For $Var(\bar y_c^*) < Var(\bar y_{SRS})$, the variance between the clusters ($S_b^{*2}$) should be small. So the clusters should be formed in such a way that the variation between them is as small as possible.
Sampling with replacement and unequal probabilities (PPSWR)
In many practical situations, the cluster total for the study variable is likely to be positively correlated with the number of units in the cluster. In this situation, it is advantageous to select the clusters with probability proportional to the number of units in the cluster instead of with equal probability, or to stratify the clusters according to their sizes and then draw an SRSWOR of clusters from each stratum. We consider here the case where clusters are selected with probability proportional to the number of units in the cluster and with replacement.
Suppose that $n$ clusters are selected with ppswr, the size being the number of units in the cluster. Here
$$
P_i = \frac{M_i}{M_0} = \frac{M_i}{N\bar M},\qquad i=1,2,\ldots,N,
$$
is the probability of selection assigned to the $i$th cluster. Consider the following estimator of the population mean:
$$
\hat{\bar Y}_c = \frac{1}{n}\sum_{i=1}^{n}\bar y_i .
$$
This estimator can be expressed as
$$
\hat{\bar Y}_c = \frac{1}{n}\sum_{i=1}^{N}\alpha_i\bar y_i,
$$
where $\alpha_i$ denotes the number of times the $i$th cluster occurs in the sample. The random variables $\alpha_1,\alpha_2,\ldots,\alpha_N$ jointly follow a multinomial distribution with $E(\alpha_i) = nP_i$. Hence,
$$
E(\hat{\bar Y}_c) = \frac{1}{n}\sum_{i=1}^{N}E(\alpha_i)\bar y_i
= \sum_{i=1}^{N}P_i\bar y_i
= \sum_{i=1}^{N}\frac{M_i}{N\bar M}\bar y_i
= \frac{1}{N\bar M}\sum_{i=1}^{N}\sum_{j=1}^{M_i}y_{ij}
= \bar Y,
$$
so $\hat{\bar Y}_c$ is unbiased, and its variance is
$$
Var(\hat{\bar Y}_c) = \frac{1}{n}\sum_{i=1}^{N}P_i(\bar y_i-\bar Y)^2
= \frac{1}{nN\bar M}\sum_{i=1}^{N}M_i(\bar y_i-\bar Y)^2 .
$$
2
An unbiased estimator of $\operatorname{Var}(\hat{\bar{Y}}_c)$ is
$$\widehat{\operatorname{Var}}(\hat{\bar{Y}}_c) = \frac{1}{n(n-1)}\sum_{i=1}^{n}(\bar{y}_i - \hat{\bar{Y}}_c)^2,$$
which can be seen as follows. Using $E(\bar{y}_i^2) = \sum_{i=1}^{N}P_i\bar{y}_i^2$ for each PPSWR draw and $E(\hat{\bar{Y}}_c^2) = \operatorname{Var}(\hat{\bar{Y}}_c) + \bar{Y}^2$,
$$E\left[\widehat{\operatorname{Var}}(\hat{\bar{Y}}_c)\right] = \frac{1}{n(n-1)}E\left[\sum_{i=1}^{n}\bar{y}_i^2 - n\hat{\bar{Y}}_c^2\right] = \frac{1}{n(n-1)}\left[n\sum_{i=1}^{N}P_i\bar{y}_i^2 - n\left\{\frac{1}{n}\sum_{i=1}^{N}P_i(\bar{y}_i-\bar{Y})^2 + \bar{Y}^2\right\}\right]$$
$$= \frac{1}{n-1}\sum_{i=1}^{N}P_i(\bar{y}_i^2 - \bar{Y}^2) - \frac{1}{n(n-1)}\sum_{i=1}^{N}P_i(\bar{y}_i-\bar{Y})^2$$
$$= \frac{1}{n-1}\sum_{i=1}^{N}P_i(\bar{y}_i-\bar{Y})^2 - \frac{1}{n(n-1)}\sum_{i=1}^{N}P_i(\bar{y}_i-\bar{Y})^2 = \frac{1}{n}\sum_{i=1}^{N}P_i(\bar{y}_i-\bar{Y})^2 = \operatorname{Var}(\hat{\bar{Y}}_c).$$
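The unbiasedness of $\hat{\bar{Y}}_c$ under PPSWR can be checked by a quick Monte Carlo sketch (the population below is hypothetical; all numbers are illustrative assumptions):

```python
import random

random.seed(2)
N = 30
sizes = [random.randint(5, 40) for _ in range(N)]        # cluster sizes M_i
pop = [[random.gauss(20 + 0.1 * Mi, 4) for _ in range(Mi)] for Mi in sizes]

M0 = sum(sizes)
Y_bar = sum(sum(c) for c in pop) / M0                    # population mean per element
cluster_means = [sum(c) / len(c) for c in pop]

n, reps, acc = 8, 20000, 0.0
for _ in range(reps):
    # draw n clusters with replacement, selection probability P_i = M_i / M0
    picks = random.choices(range(N), weights=sizes, k=n)
    acc += sum(cluster_means[i] for i in picks) / n      # one realization of Y_hat_c
print(f"mean of Y_hat_c over {reps} draws: {acc / reps:.3f}  (Y_bar = {Y_bar:.3f})")
```

The Monte Carlo average of $\hat{\bar{Y}}_c$ settles near $\bar{Y}$, in line with the expectation derived above.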
Chapter 10
Two Stage Sampling (Subsampling)
In cluster sampling, all the elements in the selected clusters are surveyed. Moreover, the efficiency of cluster sampling depends on the size of the cluster: as the size increases, the efficiency decreases. This suggests that higher precision can be attained by distributing a given number of elements over a large number of clusters and then surveying only a sample of elements within each selected cluster, rather than enumerating all elements within a small number of clusters. This is achieved in subsampling.
In subsampling,
- divide the population into clusters;
- select a sample of clusters [first stage];
- from each of the selected clusters, select a sample of a specified number of elements [second stage].
The clusters which form the units of sampling at the first stage are called the first stage units, and the units or groups of units within clusters which form the units of sampling at the second stage are called the second stage units or subunits.
The procedure can be generalized to three or more stages and is then termed multistage sampling.
Cluster sampling is a special case of two stage sampling in the sense that, from a population of $N$ clusters of equal size $M$, a sample of $n$ clusters is chosen with $m = M$, i.e., all second stage units are enumerated.
If further $m = M = 1$, we get SRSWOR.
If $n = N$, we have the case of stratified sampling.
$y_{ij}$: value of the characteristic under study for the $j$th second stage unit of the $i$th first stage unit.
$\bar{Y}_i = \dfrac{1}{M}\sum_{j=1}^{M} y_{ij}$: mean per second stage unit of the $i$th first stage unit in the population.
$\bar{Y} = \dfrac{1}{MN}\sum_{i=1}^{N}\sum_{j=1}^{M} y_{ij} = \dfrac{1}{N}\sum_{i=1}^{N}\bar{Y}_i = \bar{Y}_{MN}$: mean per second stage unit in the population.
$\bar{y}_i = \dfrac{1}{m}\sum_{j=1}^{m} y_{ij}$: mean per second stage unit of the $i$th first stage unit in the sample.
$\bar{y} = \dfrac{1}{mn}\sum_{i=1}^{n}\sum_{j=1}^{m} y_{ij} = \dfrac{1}{n}\sum_{i=1}^{n}\bar{y}_i = \bar{y}_{mn}$: mean per second stage unit in the sample.
Advantages:
The principal advantage of two stage sampling is that it is more flexible than one stage sampling. It reduces to one stage sampling when $m = M$, but unless this is the best choice of $m$, we have the opportunity of taking some smaller value that appears more efficient. As usual, this choice reduces to a balance between statistical precision and cost. When the units of the first stage agree very closely, considerations of precision suggest a small value of $m$. On the other hand, it is sometimes as cheap to measure the whole of a unit as to sample it; for example, when the unit is a household, a single respondent can give as accurate data as all the members of the household.
[Diagram: a population of $MN$ units is divided into clusters of $M$ units each; at the first stage $n$ clusters are selected, and at the second stage $m$ units are drawn from each selected cluster, giving $mn$ units in the sample.]
Note: The expectations under the two stage sampling scheme depend on the stages. For example, the expectation at the second stage is conditional on the first stage, in the sense that a second stage unit can be in the sample only if its first stage unit was selected at the first stage.
In the case of two stage sampling,
$$E(\hat{\theta}) = E_1\left[E_2(\hat{\theta})\right],$$
where $E_1$ denotes the average over all first stage samples and $E_2$ denotes the average over all possible second stage selections from a fixed set of first stage units. In the case of three stage sampling,
$$E(\hat{\theta}) = E_1\left[E_2\left\{E_3(\hat{\theta})\right\}\right].$$
Consider
$$E_2(\hat{\theta}-\theta)^2 = E_2(\hat{\theta}^2) - 2\theta E_2(\hat{\theta}) + \theta^2 = V_2(\hat{\theta}) + \left[E_2(\hat{\theta})-\theta\right]^2.$$
Taking $E_1$ on both sides,
$$\operatorname{Var}(\hat{\theta}) = V_1\left[E_2(\hat{\theta})\right] + E_1\left[V_2(\hat{\theta})\right].$$
In the case of three stage sampling,
$$\operatorname{Var}(\hat{\theta}) = V_1\left[E_2\left\{E_3(\hat{\theta})\right\}\right] + E_1\left[V_2\left\{E_3(\hat{\theta})\right\}\right] + E_1\left[E_2\left\{V_3(\hat{\theta})\right\}\right].$$
Estimation of the population mean:
Consider $\bar{y} = \bar{y}_{mn}$ as an estimator of the population mean $\bar{Y}$.
Bias:
Consider
$$E(\bar{y}) = E_1\left[E_2(\bar{y}_{mn}\mid i)\right] \quad \text{(as the 2nd stage is dependent on the 1st stage)}$$
$$= E_1\left[\frac{1}{n}\sum_{i=1}^{n}\bar{Y}_i\right] = \frac{1}{N}\sum_{i=1}^{N}\bar{Y}_i = \bar{Y}.$$
Thus $\bar{y}_{mn}$ is an unbiased estimator of $\bar{Y}$.
Variance:
$$\operatorname{Var}(\bar{y}) = E_1\left[V_2(\bar{y}\mid i)\right] + V_1\left[E_2(\bar{y}\mid i)\right]$$
$$= E_1\left[V_2\left(\frac{1}{n}\sum_{i=1}^{n}\bar{y}_i\;\middle|\; i\right)\right] + V_1\left[E_2\left(\frac{1}{n}\sum_{i=1}^{n}\bar{y}_i\;\middle|\; i\right)\right]$$
$$= E_1\left[\frac{1}{n^2}\sum_{i=1}^{n}\left(\frac{1}{m}-\frac{1}{M}\right)S_i^2\right] + V_1\left[\frac{1}{n}\sum_{i=1}^{n}\bar{Y}_i\right]$$
$$= \frac{1}{n}\left(\frac{1}{m}-\frac{1}{M}\right)\frac{1}{N}\sum_{i=1}^{N}S_i^2 + V_1(\bar{y}_c) \quad \text{(where } \bar{y}_c \text{ is based on cluster means as in cluster sampling)}$$
$$= \frac{1}{n}\left(\frac{1}{m}-\frac{1}{M}\right)S_w^2 + \frac{N-n}{Nn}S_b^2 = \frac{1}{n}\left(\frac{1}{m}-\frac{1}{M}\right)S_w^2 + \left(\frac{1}{n}-\frac{1}{N}\right)S_b^2,$$
where
$$S_w^2 = \frac{1}{N}\sum_{i=1}^{N}S_i^2 = \frac{1}{N(M-1)}\sum_{i=1}^{N}\sum_{j=1}^{M}\left(y_{ij}-\bar{Y}_i\right)^2, \qquad S_b^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(\bar{Y}_i-\bar{Y}\right)^2.$$
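The variance formula can be verified by simulation. A sketch assuming equal-size first stage units and SRSWOR at both stages (the population values and seed are invented for illustration):

```python
import random

random.seed(3)
N, M, n, m = 40, 12, 8, 4
pop = []
for _ in range(N):
    effect = random.gauss(0, 8)                      # cluster-level effect
    pop.append([100 + effect + random.gauss(0, 5) for _ in range(M)])

Ybar_i = [sum(c) / M for c in pop]
Ybar = sum(Ybar_i) / N
Sw2 = sum((y - Ybar_i[i]) ** 2 for i in range(N) for y in pop[i]) / (N * (M - 1))
Sb2 = sum((yb - Ybar) ** 2 for yb in Ybar_i) / (N - 1)

theory = (1 / n) * (1 / m - 1 / M) * Sw2 + (1 / n - 1 / N) * Sb2

reps = 20000
means = []
for _ in range(reps):
    fsus = random.sample(range(N), n)                        # first stage: SRSWOR
    means.append(sum(sum(random.sample(pop[i], m)) / m       # second stage: SRSWOR
                     for i in fsus) / n)
mc_mean = sum(means) / reps
mc_var = sum((v - mc_mean) ** 2 for v in means) / (reps - 1)
print(f"theory {theory:.3f}  simulated {mc_var:.3f}")
```

The simulated variance of $\bar{y}_{mn}$ should agree with the formula up to Monte Carlo error, and the simulated mean with $\bar{Y}$.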
Estimate of variance:
An unbiased estimator of the variance of $\bar{y}$ can be obtained by replacing $S_b^2$ and $S_w^2$ by their unbiased estimators in the expression of $\operatorname{Var}(\bar{y})$.
Consider, as an estimator of
$$S_w^2 = \frac{1}{N}\sum_{i=1}^{N}S_i^2, \qquad \text{where } S_i^2 = \frac{1}{M-1}\sum_{j=1}^{M}\left(y_{ij}-\bar{Y}_i\right)^2,$$
the statistic
$$s_w^2 = \frac{1}{n}\sum_{i=1}^{n}s_i^2, \qquad \text{where } s_i^2 = \frac{1}{m-1}\sum_{j=1}^{m}\left(y_{ij}-\bar{y}_i\right)^2.$$
So
$$E(s_w^2) = E_1\left[E_2\left(s_w^2\mid i\right)\right] = E_1\left[\frac{1}{n}\sum_{i=1}^{n}E_2\left(s_i^2\mid i\right)\right] = E_1\left[\frac{1}{n}\sum_{i=1}^{n}S_i^2\right] \quad \text{(as SRSWOR is used at the second stage)}$$
$$= \frac{1}{n}\cdot n\cdot\frac{1}{N}\sum_{i=1}^{N}S_i^2 = \frac{1}{N}\sum_{i=1}^{N}S_i^2 = S_w^2.$$
Consider
$$s_b^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(\bar{y}_i-\bar{y}\right)^2$$
as an estimator of
$$S_b^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(\bar{Y}_i-\bar{Y}\right)^2.$$
So
$$(n-1)E(s_b^2) = E\left[\sum_{i=1}^{n}\bar{y}_i^2 - n\bar{y}^2\right] = E\left[\sum_{i=1}^{n}\bar{y}_i^2\right] - n\left[\operatorname{Var}(\bar{y}) + \left\{E(\bar{y})\right\}^2\right].$$
Now
$$E\left[\sum_{i=1}^{n}\bar{y}_i^2\right] = E_1\left[\sum_{i=1}^{n}\left\{V_2(\bar{y}_i\mid i) + \left(E_2(\bar{y}_i\mid i)\right)^2\right\}\right] = E_1\left[\sum_{i=1}^{n}\left\{\left(\frac{1}{m}-\frac{1}{M}\right)S_i^2 + \bar{Y}_i^2\right\}\right]$$
$$= n\left(\frac{1}{m}-\frac{1}{M}\right)\frac{1}{N}\sum_{i=1}^{N}S_i^2 + \frac{n}{N}\sum_{i=1}^{N}\bar{Y}_i^2 = n\left(\frac{1}{m}-\frac{1}{M}\right)S_w^2 + \frac{n}{N}\sum_{i=1}^{N}\bar{Y}_i^2,$$
and
$$\operatorname{Var}(\bar{y}) + \left\{E(\bar{y})\right\}^2 = \frac{1}{n}\left(\frac{1}{m}-\frac{1}{M}\right)S_w^2 + \left(\frac{1}{n}-\frac{1}{N}\right)S_b^2 + \bar{Y}^2.$$
Hence
$$(n-1)E(s_b^2) = (n-1)\left(\frac{1}{m}-\frac{1}{M}\right)S_w^2 + \frac{n}{N}\sum_{i=1}^{N}\bar{Y}_i^2 - n\bar{Y}^2 - \left(1-\frac{n}{N}\right)S_b^2.$$
Using $\sum_{i=1}^{N}\bar{Y}_i^2 - N\bar{Y}^2 = (N-1)S_b^2$,
$$(n-1)E(s_b^2) = (n-1)\left(\frac{1}{m}-\frac{1}{M}\right)S_w^2 + \frac{n(N-1)}{N}S_b^2 - \frac{N-n}{N}S_b^2 = (n-1)\left(\frac{1}{m}-\frac{1}{M}\right)S_w^2 + (n-1)S_b^2.$$
Thus
$$E(s_b^2) = \left(\frac{1}{m}-\frac{1}{M}\right)S_w^2 + S_b^2,$$
or
$$E\left[s_b^2 - \left(\frac{1}{m}-\frac{1}{M}\right)s_w^2\right] = S_b^2.$$
Thus
$$\widehat{\operatorname{Var}}(\bar{y}) = \frac{1}{n}\left(\frac{1}{m}-\frac{1}{M}\right)\hat{S}_w^2 + \left(\frac{1}{n}-\frac{1}{N}\right)\hat{S}_b^2$$
$$= \frac{1}{n}\left(\frac{1}{m}-\frac{1}{M}\right)s_w^2 + \left(\frac{1}{n}-\frac{1}{N}\right)\left[s_b^2 - \left(\frac{1}{m}-\frac{1}{M}\right)s_w^2\right]$$
$$= \frac{1}{N}\left(\frac{1}{m}-\frac{1}{M}\right)s_w^2 + \left(\frac{1}{n}-\frac{1}{N}\right)s_b^2.$$
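Given a single two stage sample, this variance estimator can be computed directly. A sketch (the sample values are simulated placeholders, not from the text):

```python
import random

random.seed(4)
N, M, n, m = 40, 12, 8, 4
# one two-stage sample: m observations from each of n selected clusters
sample = [[random.gauss(100, 10) for _ in range(m)] for _ in range(n)]

ybar_i = [sum(c) / m for c in sample]
ybar = sum(ybar_i) / n
si2 = [sum((y - yb) ** 2 for y in c) / (m - 1) for c, yb in zip(sample, ybar_i)]
sw2 = sum(si2) / n                                      # within mean square s_w^2
sb2 = sum((yb - ybar) ** 2 for yb in ybar_i) / (n - 1)  # between mean square s_b^2

# unbiased estimator of Var(ybar) from the formula above
var_hat = (1 / N) * (1 / m - 1 / M) * sw2 + (1 / n - 1 / N) * sb2
print(f"ybar = {ybar:.2f}, estimated Var(ybar) = {var_hat:.3f}")
```

Both terms of the estimator are non-negative here, so the estimate is always positive.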
Allocation of sample to the two stages (equal first stage units):
The variance of the sample mean under two stage sampling is
$$\operatorname{Var}(\bar{y}) = \frac{1}{n}\left(\frac{1}{m}-\frac{1}{M}\right)S_w^2 + \left(\frac{1}{n}-\frac{1}{N}\right)S_b^2.$$
It depends on $S_b^2$, $S_w^2$, $n$ and $m$, and the cost of the survey in a two stage sample also depends on $n$ and $m$. Suppose the cost of the survey is fixed at $C_0 = k_1 n + k_2 nm$, where $k_1$ is the cost per first stage unit and $k_2$ is the cost per second stage unit. This variance is a monotonic decreasing function of $n$ if $S_b^2 - \dfrac{S_w^2}{M} > 0$, so the variance is minimum when $n$ takes the largest value permitted by the cost.
If $S_b^2 - \dfrac{S_w^2}{M} < 0$ (i.e., the intraclass correlation is negative for large $N$), then the variance is a monotonic increasing function of $n$. It reaches its minimum when $n$ assumes the minimum value, i.e.,
$$\hat{n} = \frac{C_0}{k_1 + k_2 M}$$
with $m = M$ (i.e., no subsampling).
We have
$$C_0\left[\operatorname{Var}(\bar{y}) + \frac{S_b^2}{N}\right] = (k_1 + k_2 m)\left(S_b^2 - \frac{S_w^2}{M} + \frac{S_w^2}{m}\right) = k_1\left(S_b^2-\frac{S_w^2}{M}\right) + k_2S_w^2 + mk_2\left(S_b^2-\frac{S_w^2}{M}\right) + \frac{k_1S_w^2}{m}.$$
When $S_b^2 - \dfrac{S_w^2}{M} > 0$, then
$$C_0\left[\operatorname{Var}(\bar{y}) + \frac{S_b^2}{N}\right] = k_1\left(S_b^2-\frac{S_w^2}{M}\right) + k_2S_w^2 + \left[\sqrt{mk_2\left(S_b^2-\frac{S_w^2}{M}\right)} - \sqrt{\frac{k_1S_w^2}{m}}\right]^2 + 2\sqrt{k_1k_2S_w^2\left(S_b^2-\frac{S_w^2}{M}\right)},$$
which is minimum when the squared term on the right hand side is zero. So we obtain
$$\hat{m} = \sqrt{\frac{k_1}{k_2}\cdot\frac{S_w^2}{S_b^2-\dfrac{S_w^2}{M}}}, \qquad \hat{n} = \frac{C_0}{k_1+k_2\hat{m}}.$$
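A small numeric sketch of this allocation (the variance components, unit costs and budget below are invented; in practice $\hat{m}$ would be rounded to an integer):

```python
import math

Sw2, Sb2, M = 80.0, 12.0, 50          # assumed variance components and cluster size
k1, k2, C0 = 100.0, 4.0, 3000.0       # assumed cost per fsu, cost per ssu, budget

assert Sb2 - Sw2 / M > 0              # the case treated above
m_opt = math.sqrt((k1 / k2) * Sw2 / (Sb2 - Sw2 / M))   # optimum subsample size
n_opt = C0 / (k1 + k2 * m_opt)                         # clusters affordable at cost C0
print(f"m_hat = {m_opt:.2f}, n_hat = {n_opt:.2f}")
```

A large $k_1/k_2$ (expensive cluster visits, cheap interviews) pushes $\hat{m}$ up; strong between-cluster variation $S_b^2$ pushes it down.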
When $S_b^2 - \dfrac{S_w^2}{M} < 0$, then
$$mk_2\left(S_b^2-\frac{S_w^2}{M}\right) + \frac{k_1S_w^2}{m}$$
is a decreasing function of $m$, so $C_0\left[\operatorname{Var}(\bar{y}) + \dfrac{S_b^2}{N}\right]$ is minimized by taking $m$ as large as possible, i.e., $\hat{m} = M$ (no subsampling).
If, instead, the variance is pre-assigned at $V_0$,
$$V_0 = \frac{1}{n}\left(\frac{1}{m}-\frac{1}{M}\right)S_w^2 + \left(\frac{1}{n}-\frac{1}{N}\right)S_b^2,$$
then
$$n = \frac{S_b^2 - \dfrac{S_w^2}{M} + \dfrac{S_w^2}{m}}{V_0 + \dfrac{S_b^2}{N}}.$$
So, with cost $C = kmn$ ($k$ being the cost per second stage unit),
$$C = kmn = \frac{km\left(S_b^2 - \dfrac{S_w^2}{M}\right) + kS_w^2}{V_0 + \dfrac{S_b^2}{N}}.$$
If $S_b^2 - \dfrac{S_w^2}{M} > 0$, $C$ attains its minimum when $m$ assumes the smallest integral value, i.e., $\hat{m} = 1$.
If $S_b^2 - \dfrac{S_w^2}{M} < 0$, $C$ attains its minimum when $\hat{m} = M$.
Comparison of two stage sampling with one stage sampling:
Case 1: sampling $mn$ elements in one single stage.
The variance of the sample mean based on
- $mn$ elements selected by SRSWOR (one stage) is
$$V(\bar{y}_{SRS}) = \left(\frac{1}{mn}-\frac{1}{MN}\right)S^2;$$
- two stage sampling is
$$V(\bar{y}_{TS}) = \frac{1}{n}\left(\frac{1}{m}-\frac{1}{M}\right)S_w^2 + \left(\frac{1}{n}-\frac{1}{N}\right)S_b^2.$$
The comparison is facilitated by expressing $S_b^2$ and $S_w^2$ in terms of $S^2$ and the intraclass correlation coefficient $\rho$, using the relation
$$\rho = \frac{M(N-1)S_b^2 - NS_w^2}{(MN-1)S^2} \qquad (1)$$
and the identity
$$(NM-1)S^2 = \sum_{i=1}^{N}\sum_{j=1}^{M}\left(y_{ij}-\bar{Y}\right)^2 = (N-1)MS_b^2 + N(M-1)S_w^2, \qquad (2)$$
where $\bar{Y} = \dfrac{1}{MN}\sum_{i=1}^{N}\sum_{j=1}^{M}y_{ij}$ and $\bar{Y}_i = \dfrac{1}{M}\sum_{j=1}^{M}y_{ij}$.
Now we need to find $S_b^2$ and $S_w^2$ from (1) and (2) in terms of $S^2$. From (1), we have
$$S_w^2 = \frac{M(N-1)}{N}S_b^2 - \frac{(MN-1)\rho}{N}S^2. \qquad (3)$$
Substituting (3) in (2),
$$(NM-1)S^2 = (N-1)MS_b^2 + (M-1)\left[M(N-1)S_b^2 - (MN-1)\rho S^2\right],$$
so that
$$(NM-1)S^2\left[1+(M-1)\rho\right] = M^2(N-1)S_b^2, \qquad \text{i.e.,} \qquad S_b^2 = \frac{MN-1}{M(N-1)}\cdot\frac{S^2}{M}\left[1+(M-1)\rho\right].$$
Then, from (2),
$$N(M-1)S_w^2 = (NM-1)S^2 - (N-1)MS_b^2 = (MN-1)S^2\left[1-\frac{1+(M-1)\rho}{M}\right] = \frac{(MN-1)(M-1)}{M}S^2(1-\rho),$$
so
$$S_w^2 = \frac{MN-1}{MN}S^2(1-\rho).$$
Substituting $S_b^2$ and $S_w^2$ in $\operatorname{Var}(\bar{y}_{TS})$,
$$V(\bar{y}_{TS}) = \frac{MN-1}{MN}\cdot\frac{S^2}{mn}\left[\left(1-\frac{m}{M}\right)(1-\rho) + \frac{m(N-n)}{M(N-1)}\left\{1+(M-1)\rho\right\}\right].$$
When the subsampling rate $\dfrac{m}{M}$ is small, $MN-1 \approx MN$ and $M-1 \approx M$, then
$$V(\bar{y}_{SRS}) \approx \frac{S^2}{mn}, \qquad V(\bar{y}_{TS}) \approx \frac{S^2}{mn}\left[1 + \frac{N-n}{N-1}(m-1)\rho\right].$$
The relative efficiency of two stage sampling in relation to one stage sampling by SRSWOR is then
$$RE = \frac{\operatorname{Var}(\bar{y}_{TS})}{\operatorname{Var}(\bar{y}_{SRS})} = 1 + \frac{N-n}{N-1}(m-1)\rho.$$
If $N-1 \approx N$ and the finite population correction is ignorable, then $\dfrac{N-n}{N-1} \approx 1$ and
$$RE = 1 + (m-1)\rho.$$
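This "design effect" form is easy to tabulate. A minimal sketch (the $\rho$ value is illustrative, not from the text):

```python
def rel_efficiency(m: int, rho: float) -> float:
    """Var(two stage) / Var(one stage SRS of mn elements), fpc ignored."""
    return 1 + (m - 1) * rho

# variance inflation for a few subsample sizes at rho = 0.05
for m in (1, 5, 10, 20):
    print(m, round(rel_efficiency(m, rho=0.05), 3))
```

With $\rho > 0$ the variance grows roughly linearly in $m$, which is why small subsamples spread over many clusters are preferred when elements within a cluster are positively correlated.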
Case 2: comparison with a cluster sample. Ignoring the finite population correction (large $N$),
$$V(\bar{y}_{TS}) \approx \frac{1}{n}\left[S_b^2 + \left(\frac{1}{m}-\frac{1}{M}\right)S_w^2\right] \approx \frac{S^2}{mn}\left[1+(m-1)\rho\right],$$
where
$$S_b^2 = \frac{MN-1}{M(N-1)}\cdot\frac{S^2}{M}\left[1+(M-1)\rho\right], \qquad S_w^2 = \frac{MN-1}{MN}S^2(1-\rho),$$
whereas a cluster sample of the same number of elements has variance approximately $\dfrac{S^2}{mn}\left[1+(M-1)\rho\right]$. So the smaller the $m/M$, the larger the reduction in the variance of a two stage sample over a cluster sample. When $S_b^2 - \dfrac{S_w^2}{M} < 0$ (i.e., $\rho < 0$), subsampling will lead to a loss in precision.
Two stage sampling with unequal first stage units:
$M_i$: number of second stage units in the $i$th first stage unit, $i = 1,2,\ldots,N$.
$m_i$: number of second stage units to be selected from the $i$th first stage unit, if it is in the sample.
$m_0 = \sum_{i=1}^{n} m_i$: total number of second stage units in the sample.
$\bar{y}_{i(m_i)} = \dfrac{1}{m_i}\sum_{j=1}^{m_i} y_{ij}$
$\bar{Y}_i = \dfrac{1}{M_i}\sum_{j=1}^{M_i} y_{ij}$
$\bar{Y}_N = \dfrac{1}{N}\sum_{i=1}^{N}\bar{Y}_i$: mean of the first stage unit means.
$\bar{Y} = \dfrac{\sum_{i=1}^{N}\sum_{j=1}^{M_i} y_{ij}}{N\bar{M}} = \dfrac{\sum_{i=1}^{N}M_i\bar{Y}_i}{N\bar{M}} = \dfrac{1}{N}\sum_{i=1}^{N}u_i\bar{Y}_i$: population mean per second stage unit,
where $u_i = \dfrac{M_i}{\bar{M}}$ and $\bar{M} = \dfrac{1}{N}\sum_{i=1}^{N}M_i$.
The pictorial scheme of two stage sampling with unequal first stage units is as follows:
[Diagram: a population of $N$ first stage units of unequal sizes; at the first stage $n$ first stage units are selected, and at the second stage $m_1, m_2, \ldots, m_n$ (small) samples of second stage units are drawn from the selected first stage units.]
Now we consider different estimators for the estimation of the population mean.
1. Estimator based on the first stage unit means in the sample:
$$\hat{\bar{Y}} = \bar{y}_{S2} = \frac{1}{n}\sum_{i=1}^{n}\bar{y}_{i(m_i)}.$$
Bias:
$$E(\bar{y}_{S2}) = E\left[\frac{1}{n}\sum_{i=1}^{n}\bar{y}_{i(m_i)}\right] = E_1\left[\frac{1}{n}\sum_{i=1}^{n}E_2\left(\bar{y}_{i(m_i)}\mid i\right)\right] = E_1\left[\frac{1}{n}\sum_{i=1}^{n}\bar{Y}_i\right] \quad \text{[since a sample of size } m_i \text{ is selected out of } M_i \text{ units by SRSWOR]}$$
$$= \frac{1}{N}\sum_{i=1}^{N}\bar{Y}_i = \bar{Y}_N \ne \bar{Y},$$
so $\bar{y}_{S2}$ is a biased estimator of $\bar{Y}$.
$$\operatorname{Bias}(\bar{y}_{S2}) = E(\bar{y}_{S2}) - \bar{Y} = \frac{1}{N}\sum_{i=1}^{N}\bar{Y}_i - \frac{1}{N\bar{M}}\sum_{i=1}^{N}M_i\bar{Y}_i$$
$$= -\frac{1}{N\bar{M}}\left[\sum_{i=1}^{N}M_i\bar{Y}_i - \frac{1}{N}\left(\sum_{i=1}^{N}\bar{Y}_i\right)\left(\sum_{i=1}^{N}M_i\right)\right] = -\frac{1}{N\bar{M}}\sum_{i=1}^{N}\left(M_i-\bar{M}\right)\left(\bar{Y}_i-\bar{Y}_N\right).$$
This bias can be estimated by
$$\widehat{\operatorname{Bias}}(\bar{y}_{S2}) = -\frac{N-1}{N\bar{M}}\cdot\frac{1}{n-1}\sum_{i=1}^{n}\left(M_i-\bar{m}\right)\left(\bar{y}_{i(m_i)}-\bar{y}_{S2}\right),$$
which can be seen as follows:
$$E\left[\widehat{\operatorname{Bias}}(\bar{y}_{S2})\right] = -\frac{N-1}{N\bar{M}}E_1\left[E_2\left\{\frac{1}{n-1}\sum_{i=1}^{n}\left(M_i-\bar{m}\right)\left(\bar{y}_{i(m_i)}-\bar{y}_{S2}\right)\;\middle|\; i\right\}\right]$$
$$= -\frac{N-1}{N\bar{M}}E_1\left[\frac{1}{n-1}\sum_{i=1}^{n}\left(M_i-\bar{m}\right)\left(\bar{Y}_i-\bar{y}_n\right)\right] = -\frac{1}{N\bar{M}}\sum_{i=1}^{N}\left(M_i-\bar{M}\right)\left(\bar{Y}_i-\bar{Y}_N\right) = \bar{Y}_N - \bar{Y},$$
where $\bar{m} = \dfrac{1}{n}\sum_{i=1}^{n}M_i$, $\bar{y}_n = \dfrac{1}{n}\sum_{i=1}^{n}\bar{Y}_i$, and the sample covariance under SRSWOR is unbiased for the population covariance.
An unbiased estimator of the population mean $\bar{Y}$ is thus obtained as
$$\bar{y}_{S2} + \frac{N-1}{N\bar{M}}\cdot\frac{1}{n-1}\sum_{i=1}^{n}\left(M_i-\bar{m}\right)\left(\bar{y}_{i(m_i)}-\bar{y}_{S2}\right).$$
Note that the bias arises due to the inequality of the sizes of the first stage units, and because the probability of selection of the second stage units varies from one first stage unit to another.
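A sketch of this bias-corrected estimator for one simulated sample; here $N$ and $\bar{M}$ are assumed known population constants, and the sample values are hypothetical (the means are made to grow with $M_i$ so that the correction actually matters):

```python
import random

random.seed(5)
N, M_bar, n = 100, 20.0, 10
Mi = [random.randint(10, 30) for _ in range(n)]          # sizes of sampled fsus
ybar_i = [random.gauss(50 + 0.3 * M, 3) for M in Mi]     # fsu sample means

y_s2 = sum(ybar_i) / n                                   # simple mean of fsu means
m_bar = sum(Mi) / n
corr = ((N - 1) / (N * M_bar)) * sum((M - m_bar) * (yb - y_s2)
                                     for M, yb in zip(Mi, ybar_i)) / (n - 1)
print(f"y_S2 = {y_s2:.3f}, bias-corrected estimate = {y_s2 + corr:.3f}")
```

When the first stage unit means are positively associated with the sizes $M_i$, the correction shifts the estimate upward, offsetting the downward bias of $\bar{y}_{S2}$.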
Variance:
$$\operatorname{Var}(\bar{y}_{S2}) = V_1\left[E_2\left(\bar{y}_{S2}\mid i\right)\right] + E_1\left[V_2\left(\bar{y}_{S2}\mid i\right)\right]$$
$$= V_1\left[\frac{1}{n}\sum_{i=1}^{n}\bar{Y}_i\right] + E_1\left[\frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}\left(\bar{y}_{i(m_i)}\mid i\right)\right]$$
$$= \left(\frac{1}{n}-\frac{1}{N}\right)S_b^2 + E_1\left[\frac{1}{n^2}\sum_{i=1}^{n}\left(\frac{1}{m_i}-\frac{1}{M_i}\right)S_i^2\right] = \left(\frac{1}{n}-\frac{1}{N}\right)S_b^2 + \frac{1}{Nn}\sum_{i=1}^{N}\left(\frac{1}{m_i}-\frac{1}{M_i}\right)S_i^2,$$
where
$$S_b^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(\bar{Y}_i-\bar{Y}_N\right)^2, \qquad S_i^2 = \frac{1}{M_i-1}\sum_{j=1}^{M_i}\left(y_{ij}-\bar{Y}_i\right)^2.$$
Estimation of variance:
Consider the mean square between the cluster means in the sample,
$$s_b^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(\bar{y}_{i(m_i)}-\bar{y}_{S2}\right)^2.$$
It can be shown that
$$E(s_b^2) = S_b^2 + \frac{1}{N}\sum_{i=1}^{N}\left(\frac{1}{m_i}-\frac{1}{M_i}\right)S_i^2.$$
Also, with
$$s_i^2 = \frac{1}{m_i-1}\sum_{j=1}^{m_i}\left(y_{ij}-\bar{y}_{i(m_i)}\right)^2, \qquad E(s_i^2) = S_i^2 = \frac{1}{M_i-1}\sum_{j=1}^{M_i}\left(y_{ij}-\bar{Y}_i\right)^2,$$
so
$$E\left[\frac{1}{n}\sum_{i=1}^{n}\left(\frac{1}{m_i}-\frac{1}{M_i}\right)s_i^2\right] = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{1}{m_i}-\frac{1}{M_i}\right)S_i^2.$$
Thus
$$E(s_b^2) = S_b^2 + E\left[\frac{1}{n}\sum_{i=1}^{n}\left(\frac{1}{m_i}-\frac{1}{M_i}\right)s_i^2\right],$$
and an unbiased estimator of $S_b^2$ is
$$\hat{S}_b^2 = s_b^2 - \frac{1}{n}\sum_{i=1}^{n}\left(\frac{1}{m_i}-\frac{1}{M_i}\right)s_i^2.$$
So an estimator of the variance can be obtained by replacing $S_b^2$ and $S_i^2$ by their unbiased estimators as
$$\widehat{\operatorname{Var}}(\bar{y}_{S2}) = \left(\frac{1}{n}-\frac{1}{N}\right)\hat{S}_b^2 + \frac{1}{Nn}\sum_{i=1}^{n}\left(\frac{1}{m_i}-\frac{1}{M_i}\right)\hat{S}_i^2, \qquad \hat{S}_i^2 = s_i^2.$$
2. Estimator based on the first stage unit totals:
$$\hat{\bar{Y}} = \bar{y}_{S2}^{*} = \frac{1}{n}\sum_{i=1}^{n}\frac{M_i\bar{y}_{i(m_i)}}{\bar{M}} = \frac{1}{n}\sum_{i=1}^{n}u_i\bar{y}_{i(m_i)}.$$
Bias:
$$E(\bar{y}_{S2}^{*}) = E\left[\frac{1}{n}\sum_{i=1}^{n}u_i\bar{y}_{i(m_i)}\right] = E_1\left[\frac{1}{n}\sum_{i=1}^{n}u_iE_2\left(\bar{y}_{i(m_i)}\mid i\right)\right] = E_1\left[\frac{1}{n}\sum_{i=1}^{n}u_i\bar{Y}_i\right] = \frac{1}{N}\sum_{i=1}^{N}u_i\bar{Y}_i = \bar{Y},$$
so $\bar{y}_{S2}^{*}$ is an unbiased estimator of the population mean.
Variance:
$$\operatorname{Var}(\bar{y}_{S2}^{*}) = V_1\left[E_2\left(\bar{y}_{S2}^{*}\mid i\right)\right] + E_1\left[V_2\left(\bar{y}_{S2}^{*}\mid i\right)\right]$$
$$= V_1\left[\frac{1}{n}\sum_{i=1}^{n}u_i\bar{Y}_i\right] + E_1\left[\frac{1}{n^2}\sum_{i=1}^{n}u_i^2\operatorname{Var}\left(\bar{y}_{i(m_i)}\mid i\right)\right]$$
$$= \left(\frac{1}{n}-\frac{1}{N}\right)S_b^{*2} + \frac{1}{nN}\sum_{i=1}^{N}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)S_i^2,$$
where
$$S_i^2 = \frac{1}{M_i-1}\sum_{j=1}^{M_i}\left(y_{ij}-\bar{Y}_i\right)^2, \qquad S_b^{*2} = \frac{1}{N-1}\sum_{i=1}^{N}\left(u_i\bar{Y}_i-\bar{Y}\right)^2.$$
3. Estimator based on the ratio method of estimation:
$$\hat{\bar{Y}} = \bar{y}_{S2}^{**} = \frac{\sum_{i=1}^{n}M_i\bar{y}_{i(m_i)}}{\sum_{i=1}^{n}M_i} = \frac{\sum_{i=1}^{n}u_i\bar{y}_{i(m_i)}}{\sum_{i=1}^{n}u_i} = \frac{\bar{y}_{S2}^{*}}{\bar{u}_n},$$
where $u_i = \dfrac{M_i}{\bar{M}}$ and $\bar{u}_n = \dfrac{1}{n}\sum_{i=1}^{n}u_i$.
This estimator can be seen as arising from the ratio method of estimation as follows. Let
$$y_i^{*} = u_i\bar{y}_{i(m_i)}, \qquad x_i^{*} = \frac{M_i}{\bar{M}} = u_i, \qquad i = 1,2,\ldots,N,$$
be the values of the study variable and the auxiliary variable in reference to the ratio method of estimation. Then
$$\bar{y}^{*} = \frac{1}{n}\sum_{i=1}^{n}y_i^{*} = \bar{y}_{S2}^{*}, \qquad \bar{x}^{*} = \frac{1}{n}\sum_{i=1}^{n}x_i^{*} = \bar{u}_n, \qquad \bar{X}^{*} = \frac{1}{N}\sum_{i=1}^{N}x_i^{*} = 1.$$
The corresponding ratio estimator of $\bar{Y}$ is
$$\hat{\bar{Y}}_R = \frac{\bar{y}^{*}}{\bar{x}^{*}}\bar{X}^{*} = \frac{\bar{y}_{S2}^{*}}{\bar{u}_n}\cdot 1 = \bar{y}_{S2}^{**}.$$
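Numerically, $\bar{y}_{S2}^{**}$ is just a size-weighted mean of the first stage unit sample means, and $\bar{M}$ cancels, so it need not be known. A sketch with invented sample values:

```python
import random

random.seed(6)
Mi = [random.randint(10, 30) for _ in range(8)]      # sizes of sampled fsus
ybar_i = [random.gauss(50, 5) for _ in Mi]           # fsu sample means

# ratio-type estimator: sum(M_i * ybar_i) / sum(M_i)
y_ss2 = sum(M * yb for M, yb in zip(Mi, ybar_i)) / sum(Mi)
print(f"y_S2** = {y_ss2:.3f}")
```

Being a weighted mean with positive weights, the estimate always lies between the smallest and largest first stage unit sample means.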
So the bias and mean squared error of $\bar{y}_{S2}^{**}$ can be obtained directly from the results for the ratio estimator. Recall that in the ratio method of estimation, the bias and MSE of the ratio estimator up to the second order of approximation are
$$\operatorname{Bias}(\hat{\bar{y}}_R) = \frac{N-n}{Nn}\bar{Y}\left(C_x^2 - \rho C_xC_y\right) = \bar{Y}\left[\frac{\operatorname{Var}(\bar{x})}{\bar{X}^2} - \frac{\operatorname{Cov}(\bar{x},\bar{y})}{\bar{X}\bar{Y}}\right],$$
$$\operatorname{MSE}(\hat{\bar{Y}}_R) = \operatorname{Var}(\bar{y}) + R^2\operatorname{Var}(\bar{x}) - 2R\operatorname{Cov}(\bar{x},\bar{y}),$$
where $R = \dfrac{\bar{Y}}{\bar{X}}$.
Bias:
The bias of $\bar{y}_{S2}^{**}$ up to the second order of approximation requires $\operatorname{Cov}(\bar{x}_{S2}^{*},\bar{y}_{S2}^{*})$:
$$\operatorname{Cov}(\bar{x}_{S2}^{*},\bar{y}_{S2}^{*}) = \operatorname{Cov}\left[E\left(\frac{1}{n}\sum_{i=1}^{n}u_i\bar{x}_{i(m_i)}\,\middle|\,i\right), E\left(\frac{1}{n}\sum_{i=1}^{n}u_i\bar{y}_{i(m_i)}\,\middle|\,i\right)\right] + E\left[\operatorname{Cov}\left(\frac{1}{n}\sum_{i=1}^{n}u_i\bar{x}_{i(m_i)}, \frac{1}{n}\sum_{i=1}^{n}u_i\bar{y}_{i(m_i)}\,\middle|\,i\right)\right]$$
$$= \operatorname{Cov}\left[\frac{1}{n}\sum_{i=1}^{n}u_i\bar{X}_i, \frac{1}{n}\sum_{i=1}^{n}u_i\bar{Y}_i\right] + E_1\left[\frac{1}{n^2}\sum_{i=1}^{n}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)S_{ixy}\right]$$
$$= \left(\frac{1}{n}-\frac{1}{N}\right)S_{bxy}^{*} + \frac{1}{nN}\sum_{i=1}^{N}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)S_{ixy},$$
where
$$S_{bxy}^{*} = \frac{1}{N-1}\sum_{i=1}^{N}\left(u_i\bar{X}_i-\bar{X}\right)\left(u_i\bar{Y}_i-\bar{Y}\right), \qquad S_{ixy} = \frac{1}{M_i-1}\sum_{j=1}^{M_i}\left(x_{ij}-\bar{X}_i\right)\left(y_{ij}-\bar{Y}_i\right).$$
Similarly, $\operatorname{Var}(\bar{x}_{S2}^{*})$ can be obtained by replacing $y$ by $x$ in $\operatorname{Cov}(\bar{x}_{S2}^{*},\bar{y}_{S2}^{*})$ as
$$\operatorname{Var}(\bar{x}_{S2}^{*}) = \left(\frac{1}{n}-\frac{1}{N}\right)S_{bx}^{*2} + \frac{1}{nN}\sum_{i=1}^{N}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)S_{ix}^2,$$
where
$$S_{bx}^{*2} = \frac{1}{N-1}\sum_{i=1}^{N}\left(u_i\bar{X}_i-\bar{X}\right)^2, \qquad S_{ix}^2 = \frac{1}{M_i-1}\sum_{j=1}^{M_i}\left(x_{ij}-\bar{X}_i\right)^2.$$
Substituting $\operatorname{Cov}(\bar{x}_{S2}^{*},\bar{y}_{S2}^{*})$ and $\operatorname{Var}(\bar{x}_{S2}^{*})$ in $\operatorname{Bias}(\bar{y}_{S2}^{**})$, we obtain the approximate bias as
$$\operatorname{Bias}(\bar{y}_{S2}^{**}) = \bar{Y}\left[\left(\frac{1}{n}-\frac{1}{N}\right)\left(\frac{S_{bx}^{*2}}{\bar{X}^2} - \frac{S_{bxy}^{*}}{\bar{X}\bar{Y}}\right) + \frac{1}{nN}\sum_{i=1}^{N}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)\left(\frac{S_{ix}^2}{\bar{X}^2} - \frac{S_{ixy}}{\bar{X}\bar{Y}}\right)\right].$$
The MSE is
$$\operatorname{MSE}(\bar{y}_{S2}^{**}) = \left(\frac{1}{n}-\frac{1}{N}\right)\left(S_{by}^{*2} - 2R^{*}S_{bxy}^{*} + R^{*2}S_{bx}^{*2}\right) + \frac{1}{nN}\sum_{i=1}^{N}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)\left(S_{iy}^2 - 2R^{*}S_{ixy} + R^{*2}S_{ix}^2\right).$$
Also,
$$\operatorname{MSE}(\bar{y}_{S2}^{**}) = \left(\frac{1}{n}-\frac{1}{N}\right)\frac{1}{N-1}\sum_{i=1}^{N}u_i^2\left(\bar{Y}_i - R^{*}\bar{X}_i\right)^2 + \frac{1}{nN}\sum_{i=1}^{N}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)\left(S_{iy}^2 - 2R^{*}S_{ixy} + R^{*2}S_{ix}^2\right),$$
where $R^{*} = \dfrac{\bar{Y}}{\bar{X}}$.
Estimate of variance:
Consider
$$s_{bxy}^{*} = \frac{1}{n-1}\sum_{i=1}^{n}\left(u_i\bar{y}_{i(m_i)}-\bar{y}_{S2}^{*}\right)\left(u_i\bar{x}_{i(m_i)}-\bar{x}_{S2}^{*}\right),$$
$$s_{ixy} = \frac{1}{m_i-1}\sum_{j=1}^{m_i}\left(x_{ij}-\bar{x}_{i(m_i)}\right)\left(y_{ij}-\bar{y}_{i(m_i)}\right).$$
It can be shown that
$$E(s_{bxy}^{*}) = S_{bxy}^{*} + \frac{1}{N}\sum_{i=1}^{N}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)S_{ixy}, \qquad E(s_{ixy}) = S_{ixy}.$$
So
$$E\left[\frac{1}{n}\sum_{i=1}^{n}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)s_{ixy}\right] = \frac{1}{N}\sum_{i=1}^{N}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)S_{ixy}.$$
Thus
$$\hat{S}_{bxy}^{*} = s_{bxy}^{*} - \frac{1}{n}\sum_{i=1}^{n}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)s_{ixy},$$
$$\hat{S}_{bx}^{*2} = s_{bx}^{*2} - \frac{1}{n}\sum_{i=1}^{n}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)s_{ix}^2,$$
$$\hat{S}_{by}^{*2} = s_{by}^{*2} - \frac{1}{n}\sum_{i=1}^{n}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)s_{iy}^2.$$
Also
$$E\left[\frac{1}{n}\sum_{i=1}^{n}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)s_{ix}^2\right] = \frac{1}{N}\sum_{i=1}^{N}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)S_{ix}^2,$$
$$E\left[\frac{1}{n}\sum_{i=1}^{n}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)s_{iy}^2\right] = \frac{1}{N}\sum_{i=1}^{N}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)S_{iy}^2.$$
A consistent estimator of the MSE of $\bar{y}_{S2}^{**}$ can be obtained by substituting the unbiased estimators of the respective variances and covariances:
$$\widehat{\operatorname{MSE}}(\bar{y}_{S2}^{**}) = \left(\frac{1}{n}-\frac{1}{N}\right)\left(s_{by}^{*2} - 2r^{*}s_{bxy}^{*} + r^{*2}s_{bx}^{*2}\right) + \frac{1}{nN}\sum_{i=1}^{n}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)\left(s_{iy}^2 - 2r^{*}s_{ixy} + r^{*2}s_{ix}^2\right)$$
$$= \left(\frac{1}{n}-\frac{1}{N}\right)\frac{1}{n-1}\sum_{i=1}^{n}u_i^2\left(\bar{y}_{i(m_i)} - r^{*}\bar{x}_{i(m_i)}\right)^2 + \frac{1}{nN}\sum_{i=1}^{n}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)\left(s_{iy}^2 - 2r^{*}s_{ixy} + r^{*2}s_{ix}^2\right),$$
where $r^{*} = \dfrac{\bar{y}_{S2}^{*}}{\bar{x}_{S2}^{*}}.$