Chapter 3: Sampling for Proportions and Percentages
Sampling procedure:
The sampling procedures used for drawing a sample to study a quantitative characteristic can also be used to study a qualitative characteristic. The sampling procedure therefore does not depend on whether the characteristic under study is qualitative or quantitative. For example, the SRSWOR and SRSWR procedures for drawing a sample are the same for both kinds of characteristics, and so are other schemes such as stratified sampling and two-stage sampling.
Consider a qualitative characteristic on the basis of which the population can be divided into two mutually exclusive classes, say C and C*. For example, if C is the part of the population of persons saying ‘yes’ to (agreeing with) a proposal, then C* is the part of the population of persons saying ‘no’ (disagreeing). In a population of size N, let A units belong to C and (N − A) units belong to C*. Then the proportion of units in C is
$$P = \frac{A}{N}$$
and the proportion of units in C* is
$$Q = \frac{N - A}{N} = 1 - P.$$
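Define the indicator variable $Y_i = 1$ if the $i$-th population unit belongs to C and $Y_i = 0$ otherwise ($i = 1, 2, \ldots, N$). Then the population mean of the $Y_i$ is
$$\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i = \frac{A}{N} = P.$$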
Suppose a sample of size n is drawn from the population of size N by simple random sampling. If a units in the sample fall into class C and (n − a) units fall into class C*, then the sample proportion of units in C is
$$p = \frac{a}{n},$$
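which, with $y_i = 1$ if the $i$-th sampled unit belongs to C and $y_i = 0$ otherwise ($i = 1, 2, \ldots, n$), can be written as
$$p = \frac{a}{n} = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar{y}.$$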
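Since $Y_i^2 = Y_i$ for an indicator variable, $\sum_{i=1}^{N} Y_i^2 = A = NP$, and similarly $\sum_{i=1}^{n} y_i^2 = a = np$. So we can write $S^2$ and $s^2$ in terms of the population and sample proportions as follows: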
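$$\begin{aligned}
S^2 &= \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})^2 \\
&= \frac{1}{N-1}\left(\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\right) \\
&= \frac{1}{N-1}\left(NP - NP^2\right) \\
&= \frac{N}{N-1}\,PQ,
\end{aligned}$$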
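$$\begin{aligned}
s^2 &= \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2 \\
&= \frac{1}{n-1}\left(\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right) \\
&= \frac{1}{n-1}\left(np - np^2\right) \\
&= \frac{n}{n-1}\,pq,
\end{aligned}$$
where $q = 1 - p$.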
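As a quick numerical check of these identities, the Python sketch below builds a hypothetical 0/1 population (the sizes N = 20, A = 6 and n = 8 are arbitrary choices, not from the text) and compares the usual variances with the closed forms $\frac{N}{N-1}PQ$ and $\frac{n}{n-1}pq$. It relies on `statistics.variance`, which divides by (number of values − 1), matching the definitions of $S^2$ and $s^2$.

```python
import random
import statistics

# Hypothetical population: N = 20 units, A = 6 of which belong to class C.
N, A = 20, 6
Y = [1] * A + [0] * (N - A)           # indicator: Y_i = 1 if unit i is in C

P = statistics.fmean(Y)               # population mean of indicators = P
Q = 1 - P
S2 = statistics.variance(Y)           # divisor N - 1, as in the definition of S^2

# A SRSWOR sample of indicators behaves the same way.
random.seed(1)
y = random.sample(Y, 8)               # sample of size n = 8 without replacement
n = len(y)
p = statistics.fmean(y)               # sample proportion p = y-bar
s2 = statistics.variance(y)           # divisor n - 1

print(S2, N / (N - 1) * P * Q)        # the two numbers agree
print(s2, n / (n - 1) * p * (1 - p))  # the two numbers agree
```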
Note that the quantities $\bar{y}$, $\bar{Y}$, $s^2$ and $S^2$ have now been expressed as functions of the sample and population proportions. Since the sample is drawn by simple random sampling and the sample proportion is the sample mean of the indicator variables, the properties of the sample proportion under SRSWOR and SRSWR follow directly from the properties of the sample mean.
1. SRSWOR
Since the sample mean $\bar{y}$ is an unbiased estimator of the population mean $\bar{Y}$ under SRSWOR, i.e., $E(\bar{y}) = \bar{Y}$, we have
$$E(p) = E(\bar{y}) = \bar{Y} = P,$$
so $p$ is an unbiased estimator of $P$.
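The variance of $p$ is
$$\begin{aligned}
\operatorname{Var}(p) = \operatorname{Var}(\bar{y}) &= \frac{N-n}{Nn}\,S^2 \\
&= \frac{N-n}{Nn}\cdot\frac{N}{N-1}\,PQ \\
&= \frac{N-n}{N-1}\cdot\frac{PQ}{n}.
\end{aligned}$$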
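For a small population, both results can be checked exactly by enumerating every possible SRSWOR sample, since each of the $\binom{N}{n}$ samples is equally likely. The sketch below is an illustration under assumed values (N = 10, A = 4, n = 3); `exact_moments_of_p` is a hypothetical helper, and exact `Fraction` arithmetic avoids rounding error.

```python
from fractions import Fraction
from itertools import combinations

def exact_moments_of_p(N, A, n):
    """Exact E(p) and Var(p) under SRSWOR, by enumerating all C(N, n) samples.

    Hypothetical helper: the population has N units, A of them in class C.
    """
    Y = [1] * A + [0] * (N - A)
    # combinations() treats positions as distinct units, so each tuple is one sample
    ps = [Fraction(sum(s), n) for s in combinations(Y, n)]
    mean = sum(ps) / len(ps)
    var = sum((p - mean) ** 2 for p in ps) / len(ps)
    return mean, var

N, A, n = 10, 4, 3
P = Fraction(A, N)
Q = 1 - P
mean, var = exact_moments_of_p(N, A, n)
print(mean, P)                                   # E(p) = P
print(var, Fraction(N - n, N - 1) * P * Q / n)   # Var(p) = (N-n)/(N-1) * PQ/n
```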
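Similarly, using the unbiased estimate of $\operatorname{Var}(\bar{y})$, an unbiased estimate of the variance of $p$ is
$$\begin{aligned}
\widehat{\operatorname{Var}}(p) = \widehat{\operatorname{Var}}(\bar{y}) &= \frac{N-n}{Nn}\,s^2 \\
&= \frac{N-n}{Nn}\cdot\frac{n}{n-1}\,pq \\
&= \frac{N-n}{N(n-1)}\,pq.
\end{aligned}$$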
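The estimator and its variance estimate fit in a small function. In the sketch below the name `srswor_proportion` and all numbers are hypothetical; it also checks, by enumerating all samples of an assumed small population, that the average of $\widehat{\operatorname{Var}}(p)$ over samples equals $\operatorname{Var}(p)$, which holds because $s^2$ is unbiased for $S^2$ under SRSWOR.

```python
from fractions import Fraction
from itertools import combinations

def srswor_proportion(a, n, N):
    """Return p = a/n and the estimate (N - n) / (N (n - 1)) * p * q (needs n >= 2)."""
    p = Fraction(a, n)
    var_hat = Fraction(N - n, N * (n - 1)) * p * (1 - p)
    return p, var_hat

# Unbiasedness check on a hypothetical population: N = 10 units, A = 4 in class C.
N, A, n = 10, 4, 3
Y = [1] * A + [0] * (N - A)
samples = list(combinations(Y, n))               # all equally likely SRSWOR samples
mean_var_hat = sum(srswor_proportion(sum(s), n, N)[1] for s in samples) / len(samples)

P = Fraction(A, N)
true_var = Fraction(N - n, N - 1) * P * (1 - P) / n
print(mean_var_hat, true_var)                    # equal

# Usage with hypothetical survey counts: 12 "yes" answers in a sample of 40 from 500.
print(srswor_proportion(12, 40, 500))
```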
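Assuming that $p$ is approximately normally distributed, an approximate $100(1-\alpha)\%$ confidence interval for $P$ is
$$\left[\,p - Z_{\alpha/2}\sqrt{\widehat{\operatorname{Var}}(p)},\; p + Z_{\alpha/2}\sqrt{\widehat{\operatorname{Var}}(p)}\,\right],$$
where $Z_{\alpha/2}$ is the upper $\alpha/2$ point of the standard normal distribution.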
Sampling Theory| Chapter 3 | Sampling for Proportions | Shalabh, IIT Kanpur
Since a discrete random variable is being approximated here by a continuous one, a continuity correction of $1/(2n)$ can be applied to the normal approximation, and the confidence limits become
$$\left[\,p - Z_{\alpha/2}\sqrt{\widehat{\operatorname{Var}}(p)} - \frac{1}{2n},\; p + Z_{\alpha/2}\sqrt{\widehat{\operatorname{Var}}(p)} + \frac{1}{2n}\,\right].$$
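A minimal sketch of both intervals follows, assuming $\widehat{\operatorname{Var}}(p)$ is plugged in for the unknown variance and using `statistics.NormalDist` to obtain $Z_{\alpha/2}$. The function name `proportion_ci` and the example counts (a = 30, n = 100, N = 2000) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(a, n, N, alpha=0.05, continuity=False):
    """Approximate 100(1 - alpha)% normal-theory interval for P under SRSWOR."""
    p = a / n
    var_hat = (N - n) / (N * (n - 1)) * p * (1 - p)
    z = NormalDist().inv_cdf(1 - alpha / 2)     # Z_{alpha/2}, about 1.96 for alpha = 0.05
    half = z * sqrt(var_hat)
    if continuity:
        half += 1 / (2 * n)                      # move each limit outward by 1/(2n)
    return p - half, p + half

print(proportion_ci(30, 100, 2000))
print(proportion_ci(30, 100, 2000, continuity=True))
```

The continuity-corrected interval is wider than the uncorrected one by exactly $1/n$ in total length.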
Consider a situation in which the sampling units in a population are divided into two mutually exclusive classes, ‘1’ and ‘2’. Let P and Q be the proportions of sampling units in the population belonging to classes ‘1’ and ‘2’, respectively. Then NP and NQ are the numbers of sampling units in the population belonging to classes ‘1’ and ‘2’, respectively, so that NP + NQ = N. If n units are selected from the N units by SRSWOR, the probability that $n_1$ of them belong to class ‘1’ and $n_2 = n - n_1$ belong to class ‘2’ is given by the hypergeometric distribution
$$P(n_1) = \frac{\binom{NP}{n_1}\binom{NQ}{n_2}}{\binom{N}{n}}.$$
As N grows large (with P fixed), the hypergeometric distribution tends to the binomial distribution, and $P(n_1)$ is approximated by
$$P(n_1) = \binom{n}{n_1} P^{n_1} (1 - P)^{n_2}.$$
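The convergence can be seen numerically. The sketch below (function names and the values n = 10, P = 0.3 are hypothetical) computes both probability functions with `math.comb` and reports the largest absolute difference over $n_1 = 0, 1, \ldots, n$ for increasing N; it assumes NP is an integer.

```python
from math import comb

def hypergeom_pmf(n1, n, N, P):
    """P(n1) = C(NP, n1) C(NQ, n2) / C(N, n) with n2 = n - n1 (NP assumed integral)."""
    NP = round(N * P)                 # number of class-1 units in the population
    NQ = N - NP
    return comb(NP, n1) * comb(NQ, n - n1) / comb(N, n)   # comb(k, j) = 0 when j > k

def binom_pmf(n1, n, P):
    """Binomial approximation C(n, n1) P^n1 Q^n2."""
    return comb(n, n1) * P ** n1 * (1 - P) ** (n - n1)

n, P = 10, 0.3
gaps = [max(abs(hypergeom_pmf(k, n, N, P) - binom_pmf(k, n, P)) for k in range(n + 1))
        for N in (20, 200, 20000)]
print(gaps)                           # the maximum difference shrinks as N grows
```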
Inverse sampling
In the SRS methodology for a qualitative characteristic, it is generally assumed that the attribute under study is not rare. If the attribute is rare, then estimating the population proportion P by the sample proportion a/n is not suitable, because a sample of moderate size may contain very few or even no units with the attribute. Examples of such situations are the estimation of the frequency of a rare type of gene, the proportion of a rare type of cancer cell in a biopsy, and the proportion of a rare type of blood cell affecting the red blood cells. In such cases, the methodology of inverse sampling can be used.
In inverse sampling, sampling is continued until a predetermined number of units possessing the attribute under study has been obtained, and the resulting sample is used to estimate the population proportion. The sampling units are drawn one by one with equal probability and without replacement, and sampling stops as soon as the number of units in the sample possessing the attribute equals the predetermined number.
Let m denote this predetermined number of units possessing the attribute. Since sampling continues until m such units are obtained, the sample size n required to attain m is a random variable.
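The stopping rule can be sketched as a short simulation. In this illustration all names and values are assumptions: units are coded 1 if they possess the attribute and 0 otherwise, and drawing one by one with equal probability and without replacement is modelled by walking through a random permutation of the population.

```python
import random

def inverse_sample(population, m, rng=random):
    """Draw units one by one (equal probability, without replacement) until m units
    with the attribute (coded 1) have been found; return the sample size n."""
    order = rng.sample(population, len(population))   # random permutation = SRSWOR draws
    found = 0
    for n, unit in enumerate(order, start=1):
        found += unit
        if found == m:
            return n
    raise ValueError("population contains fewer than m units with the attribute")

# Hypothetical population: N = 1000 units, 2% of which have the rare attribute.
random.seed(7)
population = [1] * 20 + [0] * 980
n = inverse_sample(population, m=5)
print(n)                                   # the random sample size needed to see 5 such units
```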
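The sample size equals n exactly when the first (n − 1) draws contain (m − 1) units with the attribute and the n-th draw yields a unit with the attribute. Hence
$$P(n) = \frac{\binom{NP}{m-1}\binom{NQ}{n-m}}{\binom{N}{n-1}}\cdot\frac{NP-m+1}{N-n+1}, \qquad n = m, m+1, \ldots, m+NQ,$$
and $\sum_{n=m}^{m+NQ} P(n) = 1$. Therefore
$$\begin{aligned}
E\left(\frac{m-1}{n-1}\right) &= \sum_{n=m}^{m+NQ} \frac{m-1}{n-1}\cdot\frac{\binom{NP}{m-1}\binom{NQ}{n-m}}{\binom{N}{n-1}}\cdot\frac{NP-m+1}{N-n+1} \\
&= P\sum_{n=m}^{m+NQ} \frac{\binom{NP-1}{m-2}\binom{NQ}{n-m}}{\binom{N-1}{n-2}}\cdot\frac{NP-m+1}{N-n+1},
\end{aligned}$$
where the last sum equals 1: it is obtained from $\sum_{n} P(n)$ by replacing N by N − 1, NP by NP − 1, m by (m − 1) and n by (n − 1), i.e., it is the total probability of the same distribution with these parameters.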
Thus
$$E\left(\frac{m-1}{n-1}\right) = P.$$
So $\hat{P} = \dfrac{m-1}{n-1}$ is an unbiased estimator of P.
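Unbiasedness can be confirmed exactly for a small population by summing over the whole distribution of n. The sketch below uses assumed values (N = 30, NP = 6, m = 3) and a hypothetical helper `inverse_pmf` that implements P(n) from the text. It also evaluates the naive ratio m/n: since $m/n - (m-1)/(n-1) = (n-m)/\{n(n-1)\} > 0$ whenever n > m, the naive ratio overestimates P on average.

```python
from fractions import Fraction
from math import comb

def inverse_pmf(n, N, NP, m):
    """P(n): (m - 1) units with the attribute in the first (n - 1) draws,
    then a unit with the attribute at draw n (SRSWOR, drawn one by one)."""
    NQ = N - NP
    first = Fraction(comb(NP, m - 1) * comb(NQ, n - m), comb(N, n - 1))
    last = Fraction(NP - m + 1, N - n + 1)
    return first * last

N, NP, m = 30, 6, 3                      # hypothetical population with P = 1/5
support = range(m, m + (N - NP) + 1)     # n = m, m + 1, ..., m + NQ

total = sum(inverse_pmf(n, N, NP, m) for n in support)
E_P_hat = sum(Fraction(m - 1, n - 1) * inverse_pmf(n, N, NP, m) for n in support)
E_m_over_n = sum(Fraction(m, n) * inverse_pmf(n, N, NP, m) for n in support)
print(total, E_P_hat, float(E_m_over_n))  # 1, P exactly, and a value larger than P
```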
Estimate of variance of P̂
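Now we derive an estimate of the variance of $\hat{P}$. By definition,
$$\operatorname{Var}(\hat{P}) = E(\hat{P}^2) - \left[E(\hat{P})\right]^2 = E(\hat{P}^2) - P^2.$$
Thus an unbiased estimate of the variance is
$$\widehat{\operatorname{Var}}(\hat{P}) = \hat{P}^2 - \text{Estimate of } P^2,$$
where the estimate of $P^2$ is unbiased.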
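In order to obtain an estimate of $P^2$, consider the expectation of $\dfrac{(m-1)(m-2)}{(n-1)(n-2)}$ (assuming $m \ge 3$):
$$\begin{aligned}
E\left[\frac{(m-1)(m-2)}{(n-1)(n-2)}\right] &= \sum_{n=m}^{m+NQ}\frac{(m-1)(m-2)}{(n-1)(n-2)}\,P(n) \\
&= \frac{P(NP-1)}{N-1}\sum_{n=m}^{m+NQ}\frac{\binom{NP-2}{m-3}\binom{NQ}{n-m}}{\binom{N-2}{n-3}}\cdot\frac{NP-m+1}{N-n+1} \\
&= \frac{P(NP-1)}{N-1},
\end{aligned}$$
because the last sum is again a total probability, now with N, NP, m and n replaced by N − 2, NP − 2, m − 2 and n − 2. Since $\dfrac{P(NP-1)}{N-1} = \dfrac{NP^2 - P}{N-1}$, we get
$$P^2 = \frac{N-1}{N}\,E\left[\frac{(m-1)(m-2)}{(n-1)(n-2)}\right] + \frac{P}{N},$$
so an unbiased estimate of $P^2$ is
$$\begin{aligned}
\text{Estimate of } P^2 &= \frac{N-1}{N}\cdot\frac{(m-1)(m-2)}{(n-1)(n-2)} + \frac{\hat{P}}{N} \\
&= \frac{N-1}{N}\cdot\frac{(m-1)(m-2)}{(n-1)(n-2)} + \frac{1}{N}\cdot\frac{m-1}{n-1}.
\end{aligned}$$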
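$$\begin{aligned}
\widehat{\operatorname{Var}}(\hat{P}) &= \hat{P}^2 - \text{Estimate of } P^2 \\
&= \left(\frac{m-1}{n-1}\right)^2 - \left[\frac{N-1}{N}\cdot\frac{(m-1)(m-2)}{(n-1)(n-2)} + \frac{1}{N}\cdot\frac{m-1}{n-1}\right] \\
&= \frac{m-1}{n-1}\left[\frac{m-1}{n-1} - \frac{1}{N}\left\{1 + \frac{(N-1)(m-2)}{n-2}\right\}\right].
\end{aligned}$$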
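These identities can also be verified exactly by summing over the distribution of n for a small population. In the sketch below the helpers `inverse_pmf`, `p2_hat` and `var_hat_finite` are hypothetical names for P(n), the estimate of $P^2$ and the factored variance estimate; the values N = 30, NP = 6, m = 3 are assumptions (m ≥ 3 is needed).

```python
from fractions import Fraction
from math import comb

def inverse_pmf(n, N, NP, m):
    """P(n) for inverse sampling without replacement (formula from the text)."""
    NQ = N - NP
    return (Fraction(comb(NP, m - 1) * comb(NQ, n - m), comb(N, n - 1))
            * Fraction(NP - m + 1, N - n + 1))

def p2_hat(m, n, N):
    """Estimate of P^2: (N-1)/N * (m-1)(m-2)/((n-1)(n-2)) + (1/N)(m-1)/(n-1)."""
    return (Fraction(N - 1, N) * Fraction((m - 1) * (m - 2), (n - 1) * (n - 2))
            + Fraction(m - 1, N * (n - 1)))

def var_hat_finite(m, n, N):
    """Estimated Var(P_hat) in the factored form derived above."""
    a = Fraction(m - 1, n - 1)
    return a * (a - Fraction(1, N) * (1 + Fraction((N - 1) * (m - 2), n - 2)))

N, NP, m = 30, 6, 3                      # hypothetical population with P = 1/5
P = Fraction(NP, N)
pmf = {n: inverse_pmf(n, N, NP, m) for n in range(m, m + (N - NP) + 1)}

E_Phat = sum(Fraction(m - 1, n - 1) * w for n, w in pmf.items())
E_Phat2 = sum(Fraction(m - 1, n - 1) ** 2 * w for n, w in pmf.items())
E_P2_hat = sum(p2_hat(m, n, N) * w for n, w in pmf.items())
E_var_hat = sum(var_hat_finite(m, n, N) * w for n, w in pmf.items())

print(E_P2_hat == P ** 2)                     # estimate of P^2 is unbiased
print(E_var_hat == E_Phat2 - E_Phat ** 2)     # so the variance estimate is unbiased
```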
For large N, the distribution of n tends to the negative binomial distribution with probability mass function
$$P(n) = \binom{n-1}{m-1} P^m Q^{n-m}, \qquad n = m, m+1, \ldots$$
In this case,
$$\hat{P} = \frac{m-1}{n-1}$$
and
$$\widehat{\operatorname{Var}}(\hat{P}) = \frac{(m-1)(n-m)}{(n-1)^2(n-2)} = \frac{\hat{P}(1-\hat{P})}{n-2}.$$
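The sketch below compares the finite-population variance estimate with its large-N form for a hypothetical outcome (the 5th unit with the attribute appears at draw 120); function names and values are assumptions. A little algebra on the factored form shows that the finite-N estimate equals the large-N one multiplied by $1 - (n-1)/N$, so the two agree once N is much larger than n.

```python
from fractions import Fraction

def var_hat_finite(m, n, N):
    """Finite-population estimate of Var(P_hat) under inverse sampling (m >= 3)."""
    a = Fraction(m - 1, n - 1)
    return a * (a - Fraction(1, N) * (1 + Fraction((N - 1) * (m - 2), n - 2)))

def var_hat_large_N(m, n):
    """Large-N (negative binomial) estimate P_hat (1 - P_hat) / (n - 2)."""
    p_hat = Fraction(m - 1, n - 1)
    return p_hat * (1 - p_hat) / (n - 2)

m, n = 5, 120                            # hypothetical inverse-sampling outcome
for N in (1_000, 100_000, 10_000_000):
    print(N, float(var_hat_finite(m, n, N)), float(var_hat_large_N(m, n)))
```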
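Now suppose the population is divided into k mutually exclusive classes $C_1, C_2, \ldots, C_k$, and let $C_i$ also denote the number of units in the $i$-th class, so that $\sum_{i=1}^{k} C_i = N$. Let
$$P_1 = \frac{C_1}{N},\; P_2 = \frac{C_2}{N},\; \ldots,\; P_k = \frac{C_k}{N}$$
be the proportions of units in the classes $C_1, C_2, \ldots, C_k$, respectively.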
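Let a sample of size n be drawn by SRSWOR such that $c_1, c_2, \ldots, c_k$ units in it belong to $C_1, C_2, \ldots, C_k$, respectively, with $\sum_{i=1}^{k} c_i = n$. The probability of observing $c_1, c_2, \ldots, c_k$ is given by the multivariate hypergeometric distribution
$$P(c_1, c_2, \ldots, c_k) = \frac{\binom{C_1}{c_1}\binom{C_2}{c_2}\cdots\binom{C_k}{c_k}}{\binom{N}{n}}.$$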
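The population proportions $P_i$ can be estimated by
$$p_i = \frac{c_i}{n}, \qquad i = 1, 2, \ldots, k.$$
Treating class $C_i$ against all the other classes combined reduces this to the two-class case, so, with $Q_i = 1 - P_i$ and $q_i = 1 - p_i$,
$$E(p_i) = P_i, \qquad \operatorname{Var}(p_i) = \frac{N-n}{N-1}\cdot\frac{P_iQ_i}{n},$$
and
$$\widehat{\operatorname{Var}}(p_i) = \frac{N-n}{N}\cdot\frac{p_iq_i}{n-1}.$$
The number of units $C_i$ is estimated by $\hat{C}_i = Np_i$, with
$$\operatorname{Var}(\hat{C}_i) = N^2\operatorname{Var}(p_i), \qquad \widehat{\operatorname{Var}}(\hat{C}_i) = N^2\,\widehat{\operatorname{Var}}(p_i).$$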
Confidence intervals for each $P_i$ can be obtained from the single $p_i$, as in the case of two classes.
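A minimal sketch of the per-class computations, assuming SRSWOR counts $c_1, \ldots, c_k$ that sum to n; the function name `class_estimates` and the example counts (18, 50, 32 from a population of 5000) are hypothetical.

```python
def class_estimates(counts, N):
    """For SRSWOR class counts c_1..c_k (summing to n >= 2), return per-class tuples
    (p_i, Var_hat(p_i), C_hat_i, Var_hat(C_hat_i))."""
    n = sum(counts)
    rows = []
    for c in counts:
        p = c / n
        var_p = (N - n) / N * p * (1 - p) / (n - 1)   # two-class formula applied to class i
        rows.append((p, var_p, N * p, N ** 2 * var_p))
    return rows

rows = class_estimates([18, 50, 32], N=5000)
for row in rows:
    print(row)
```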
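When N is large, the multivariate hypergeometric distribution tends to the multinomial distribution given by
$$P(c_1, c_2, \ldots, c_k) = \frac{n!}{c_1!\,c_2!\cdots c_k!}\,P_1^{c_1}P_2^{c_2}\cdots P_k^{c_k}.$$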
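The approach of the multivariate hypergeometric probabilities to the multinomial ones can be checked numerically. In the sketch below the sample counts (2, 3, 5), the proportions (0.2, 0.3, 0.5) and the function names are hypothetical, and the class sizes are taken as $C_i = NP_i$ (assumed integral).

```python
from math import comb, factorial, prod

def mv_hypergeom_pmf(c, C):
    """prod_i C(C_i, c_i) / C(N, n), with N = sum(C) and n = sum(c)."""
    N, n = sum(C), sum(c)
    return prod(comb(Ci, ci) for Ci, ci in zip(C, c)) / comb(N, n)

def multinomial_pmf(c, P):
    """n! / (c_1! ... c_k!) * P_1^c_1 ... P_k^c_k."""
    n = sum(c)
    coef = factorial(n) // prod(factorial(ci) for ci in c)
    return coef * prod(Pi ** ci for Pi, ci in zip(P, c))

c = (2, 3, 5)                            # hypothetical sample counts, n = 10
P = (0.2, 0.3, 0.5)
diffs = []
for N in (50, 500, 50_000):
    C = tuple(round(N * Pi) for Pi in P)   # class sizes C_i = N * P_i
    diffs.append(abs(mv_hypergeom_pmf(c, C) - multinomial_pmf(c, P)))
print(multinomial_pmf(c, P), diffs)      # differences shrink as N grows
```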