Chapter3 Sampling Proportions Percentages

Chapter 3 discusses sampling methods for estimating proportions and percentages in qualitative research, emphasizing that sampling procedures are consistent regardless of the characteristic type. It covers estimation techniques for population proportions, variance calculations, and the application of hypergeometric distribution for sampling. Additionally, it introduces inverse sampling for rare attributes and provides methods for estimating population totals and confidence intervals.

Uploaded by

rishu maurya
Chapter 3

Sampling For Proportions and Percentages


In many situations, the characteristic under study on which the observations are collected is
qualitative in nature. For example, customers' responses in many marketing surveys are based
on replies like ‘yes’ or ‘no’, ‘agree’ or ‘disagree’ etc. Sometimes, the respondents are asked to
arrange several options in the order like the first choice, second choice, etc. Sometimes, the
objective of the survey is to estimate the population proportion or the percentage of brown-eyed
persons, unemployed persons, graduates, or persons favoring a proposal. In such situations,
two questions arise: first, how to draw the sample, and second, how to estimate population
parameters such as the population mean and the population variance.

Sampling procedure:
The same sampling procedures that are used for drawing a sample in case of quantitative
characteristics can also be used for drawing a sample for qualitative characteristics. So, the
sampling procedures remain the same irrespective of the nature of the characteristic under study
- either qualitative or quantitative. For example, the SRSWOR and SRSWR procedures for
drawing the samples remain the same for qualitative and quantitative characteristics. Similarly,
other sampling schemes like stratified sampling, two-stage sampling, etc. also remain the same.

Estimation of population proportion:


The population proportion in the case of qualitative characteristics can be estimated similarly as
the estimation of the population mean in the case of quantitative characteristics.

Consider a qualitative characteristic based on which the population can be divided into two
mutually exclusive classes, say C and C*. For example, if C is the part of the population of
persons saying ‘yes’ or ‘agreeing’ with the proposal, then C* is the part of the population of
persons saying ‘no’ or ‘disagreeing’ with the proposal. Let A be the number of units in C and
(N − A) the number of units in C* in a population of size N. Then the proportion of units in C is
$$P = \frac{A}{N}$$
and the proportion of units in C* is
$$Q = \frac{N - A}{N} = 1 - P.$$

Sampling Theory| Chapter 3 | Sampling for Proportions | Shalabh, IIT Kanpur


An indicator variable Y can be associated with the characteristic under study, and then for
i = 1, 2, …, N,
$$Y_i = \begin{cases} 1 & \text{if the } i\text{th unit belongs to } C \\ 0 & \text{if the } i\text{th unit belongs to } C^*. \end{cases}$$

Now the population total is
$$Y_{TOTAL} = \sum_{i=1}^{N} Y_i = A$$
and the population mean is
$$\bar{Y} = \frac{\sum_{i=1}^{N} Y_i}{N} = \frac{A}{N} = P.$$

Suppose a sample of size n is drawn from a population of size N by simple random sampling.

Let a be the number of units in the sample which fall into class C and (n − a) units fall in class
C*, then the sample proportion of units in C is
$$p = \frac{a}{n},$$
which can be written as
$$p = \frac{a}{n} = \frac{\sum_{i=1}^{n} y_i}{n} = \bar{y}.$$
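Since $\sum_{i=1}^{N} Y_i^2 = A = NP$, we can write $S^2$ and $s^2$ in terms of P and Q as follows:

$$\begin{aligned}
S^2 &= \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})^2 \\
&= \frac{1}{N-1}\left(\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\right) \\
&= \frac{1}{N-1}\left(NP - NP^2\right) \\
&= \frac{N}{N-1}PQ.
\end{aligned}$$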

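Similarly, $\sum_{i=1}^{n} y_i^2 = a = np$ and

$$\begin{aligned}
s^2 &= \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2 \\
&= \frac{1}{n-1}\left(\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right) \\
&= \frac{1}{n-1}\left(np - np^2\right) \\
&= \frac{n}{n-1}pq.
\end{aligned}$$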
Note that the quantities $\bar{y}$, $\bar{Y}$, $s^2$ and $S^2$ have been expressed as functions of sample and
population proportions. Since the sample has been drawn by simple random sampling and the
sample proportion is the same as the sample mean, the properties of sample proportion in
SRSWOR and SRSWR can be derived using the properties of the sample mean directly.
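
The two identities above can be checked numerically. The following is a minimal sketch, assuming a small made-up 0/1 population; the helper name `variance_identities` and the data are illustrative and not part of the original notes. It compares the divisor-(size − 1) variance computed directly with the closed form in terms of the proportion.

```python
from fractions import Fraction

def variance_identities(y):
    """For a 0/1 list y, return (direct, via_proportion) versions of the
    variance with divisor len(y) - 1."""
    k = len(y)
    mean = Fraction(sum(y), k)                      # proportion of ones
    direct = sum((yi - mean) ** 2 for yi in y) / (k - 1)
    via_proportion = Fraction(k, k - 1) * mean * (1 - mean)
    return direct, via_proportion

# Hypothetical population: N = 10 units, A = 4 of them in class C.
population = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
S2_direct, S2_formula = variance_identities(population)

# Hypothetical sample of n = 4 units with a = 1 in class C.
sample = [1, 0, 0, 0]
s2_direct, s2_formula = variance_identities(sample)

print(S2_direct, S2_formula)
print(s2_direct, s2_formula)
```

Exact fractions are used so that the two sides can be compared with `==` rather than with a floating-point tolerance.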

(i) SRSWOR
Since the sample mean $\bar{y}$ is an unbiased estimator of the population mean $\bar{Y}$, i.e., $E(\bar{y}) = \bar{Y}$ in the
case of SRSWOR,
$$E(p) = E(\bar{y}) = \bar{Y} = P,$$
and p is an unbiased estimator of P.

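Using the expression of $\mathrm{Var}(\bar{y})$, the variance of p can be derived as

$$\begin{aligned}
\mathrm{Var}(p) = \mathrm{Var}(\bar{y}) &= \frac{N-n}{Nn} S^2 \\
&= \frac{N-n}{Nn} \cdot \frac{N}{N-1} PQ \\
&= \frac{N-n}{N-1} \cdot \frac{PQ}{n}.
\end{aligned}$$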

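Similarly, using the estimate of $\mathrm{Var}(\bar{y})$, the estimate of variance can be derived as
$$\begin{aligned}
\widehat{\mathrm{Var}}(p) = \widehat{\mathrm{Var}}(\bar{y}) &= \frac{N-n}{Nn} s^2 \\
&= \frac{N-n}{Nn} \cdot \frac{n}{n-1} pq \\
&= \frac{N-n}{N(n-1)} pq.
\end{aligned}$$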
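
As a rough check of the SRSWOR results, the sketch below enumerates every possible sample from a tiny hypothetical population (the data and variable names are assumptions made for illustration). It confirms that p is unbiased, that its variance matches the closed form, and that the variance estimate is itself unbiased.

```python
from fractions import Fraction
from itertools import combinations

population = [1, 1, 0, 0, 0, 1]     # hypothetical: N = 6, A = 3
N, n = len(population), 3
P = Fraction(sum(population), N)
Q = 1 - P

# Under SRSWOR every subset of n positions is equally likely.
props = [Fraction(sum(s), n) for s in combinations(population, n)]

mean_p = sum(props) / len(props)
var_p = sum((p - mean_p) ** 2 for p in props) / len(props)
var_formula = Fraction(N - n, N - 1) * P * Q / n

def var_hat(p):
    """Estimate of Var(p): (N - n) / (N (n - 1)) * p * q."""
    return Fraction(N - n, N * (n - 1)) * p * (1 - p)

mean_var_hat = sum(var_hat(p) for p in props) / len(props)

print(mean_p == P)                  # unbiasedness of p
print(var_p == var_formula)         # Var(p) = (N-n)/(N-1) * PQ/n
print(mean_var_hat == var_p)        # the variance estimate is unbiased
```

Full enumeration is only feasible for very small N; it is used here because it makes the expectations exact.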

(ii) SRSWR
Since the sample mean $\bar{y}$ is an unbiased estimator of the population mean $\bar{Y}$ in the case of
SRSWR,
$$E(p) = E(\bar{y}) = \bar{Y} = P,$$
i.e., p is an unbiased estimator of P.
Using the expression of the variance of $\bar{y}$ and its estimate in the case of SRSWR, the variance
of p and its estimate can be derived as follows:
$$\begin{aligned}
\mathrm{Var}(p) = \mathrm{Var}(\bar{y}) &= \frac{N-1}{Nn} S^2 \\
&= \frac{N-1}{Nn} \cdot \frac{N}{N-1} PQ \\
&= \frac{PQ}{n},
\end{aligned}$$
$$\begin{aligned}
\widehat{\mathrm{Var}}(p) &= \frac{n}{n-1} \cdot \frac{pq}{n} \\
&= \frac{pq}{n-1}.
\end{aligned}$$
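
The SRSWR results can be checked the same way. This sketch, again on a made-up six-unit population, enumerates all ordered samples drawn with replacement; the names are illustrative only.

```python
from fractions import Fraction
from itertools import product

population = [1, 1, 0, 0, 0, 1]     # hypothetical: N = 6, A = 3
N, n = len(population), 3
P = Fraction(sum(population), N)
Q = 1 - P

# Under SRSWR every ordered sequence of n draws is equally likely.
props = [Fraction(sum(s), n) for s in product(population, repeat=n)]

mean_p = sum(props) / len(props)
var_p = sum((p - mean_p) ** 2 for p in props) / len(props)
mean_var_hat = sum(p * (1 - p) / (n - 1) for p in props) / len(props)

print(mean_p == P)                  # unbiasedness of p
print(var_p == P * Q / n)           # Var(p) = PQ/n
print(mean_var_hat == var_p)        # pq/(n-1) is unbiased for Var(p)
```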

Estimation of population total or total number of count


It is easy to see that an estimate of population total A (or total number of counts) is
$$\hat{A} = Np = \frac{Na}{n},$$
its variance is
$$\mathrm{Var}(\hat{A}) = N^2 \, \mathrm{Var}(p)$$
and the estimate of its variance is
$$\widehat{\mathrm{Var}}(\hat{A}) = N^2 \, \widehat{\mathrm{Var}}(p).$$
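
A minimal sketch of these formulas, assuming SRSWOR so that the variance estimate of p is (N − n)pq / (N(n − 1)). The function name `estimate_total` and the survey numbers are hypothetical.

```python
from fractions import Fraction

def estimate_total(N, n, a):
    """Return (A_hat, estimated Var(A_hat)) under SRSWOR.

    N: population size, n: sample size, a: sample count in class C.
    """
    p = Fraction(a, n)
    var_hat_p = Fraction(N - n, N * (n - 1)) * p * (1 - p)
    return N * p, N**2 * var_hat_p

# Hypothetical survey: 10 of 50 sampled units fall in class C, N = 1000.
A_hat, var_A_hat = estimate_total(N=1000, n=50, a=10)
print(A_hat, float(var_A_hat))
```

For SRSWR, the same function would use pq / (n − 1) in place of the SRSWOR variance estimate.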

Confidence interval estimation of P


If N and n are large, then $\frac{p - P}{\sqrt{\mathrm{Var}(p)}}$ approximately follows N(0,1). With this approximation, we
can write
$$P\left(-Z_{\alpha/2} \le \frac{p - P}{\sqrt{\mathrm{Var}(p)}} \le Z_{\alpha/2}\right) = 1 - \alpha$$
and the $100(1-\alpha)\%$ confidence interval of P is
$$\left(p - Z_{\alpha/2}\sqrt{\mathrm{Var}(p)},\; p + Z_{\alpha/2}\sqrt{\mathrm{Var}(p)}\right).$$
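It may be noted that in this case, a discrete random variable is being approximated by a
continuous random variable, so a continuity correction of $\frac{1}{2n}$ for the normal approximation can be
applied. The correction widens the interval by $\frac{1}{2n}$ on each side, and the confidence limits become
$$\left(p - Z_{\alpha/2}\sqrt{\mathrm{Var}(p)} - \frac{1}{2n},\; p + Z_{\alpha/2}\sqrt{\mathrm{Var}(p)} + \frac{1}{2n}\right).$$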
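
The normal-approximation interval without the continuity correction can be sketched as follows. Since Var(p) is unknown in practice, this sketch plugs in the SRSWOR variance estimate; that substitution, the function name `proportion_ci`, and the example numbers are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(a, n, N, alpha=0.05):
    """Approximate 100(1 - alpha)% interval for P under SRSWOR,
    using the estimated variance (N - n) p q / (N (n - 1))."""
    p = a / n
    var_hat = (N - n) / (N * (n - 1)) * p * (1 - p)
    z = NormalDist().inv_cdf(1 - alpha / 2)     # upper alpha/2 point of N(0,1)
    half_width = z * sqrt(var_hat)
    return p - half_width, p + half_width

# Hypothetical survey: 120 'yes' answers in a sample of 400 from N = 10000.
low, high = proportion_ci(a=120, n=400, N=10000)
print(round(low, 4), round(high, 4))
```

The approximation is intended for large N and n, as stated above; for small samples or proportions near 0 or 1 the interval may be unreliable.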

Use of Hypergeometric distribution :


When SRS is applied for the sampling of a qualitative characteristic, the methodology is to
draw the units one-by-one, and so the probability of selection of every unit remains the same at
every step. If n sampling units are selected together from N units, then the probability of
selection of units does not remain the same as in the case of SRS.

Consider a situation in which the sampling units in a population are divided into two mutually
exclusive classes. Let P and Q be the proportions of sampling units in the population belonging
to classes ‘1’ and ‘2’, respectively. Then NP and NQ are the total number of sampling units in
the population belonging to class ‘1’ and ‘2’, respectively, and so NP + NQ = N. The
probability that a sample of n units selected out of N units by SRS contains $n_1$ units
belonging to class ‘1’ and $n_2$ units belonging to class ‘2’, with $n_1 + n_2 = n$, is governed by the
hypergeometric distribution:
$$P(n_1) = \frac{\binom{NP}{n_1}\binom{NQ}{n_2}}{\binom{N}{n}}.$$
As N grows large, the hypergeometric distribution tends to the binomial distribution, and $P(n_1)$ is
approximated by
$$P(n_1) = \binom{n}{n_1} P^{n_1} (1 - P)^{n_2}.$$
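
To illustrate how the hypergeometric probabilities approach the binomial ones as N grows, here is a small sketch. The sample size, the proportion 0.3, and the population sizes are arbitrary choices for illustration.

```python
from math import comb

def hypergeom_pmf(n1, n, NP, NQ):
    """P(n1 class-1 units in an SRSWOR sample of n from NP + NQ units)."""
    return comb(NP, n1) * comb(NQ, n - n1) / comb(NP + NQ, n)

def binom_pmf(n1, n, P):
    """Binomial approximation with success probability P."""
    return comb(n, n1) * P**n1 * (1 - P) ** (n - n1)

n, P = 10, 0.3
gaps = []
for N in (50, 500, 5000):
    NP = round(N * P)
    gap = max(abs(hypergeom_pmf(k, n, NP, N - NP) - binom_pmf(k, n, P))
              for k in range(n + 1))
    gaps.append(gap)
    print(N, gap)       # largest pointwise difference between the two pmfs
```

The largest pointwise gap shrinks as N increases while n stays fixed, which is the sense in which the binomial form serves as an approximation.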

Inverse sampling
In general, it is understood in the SRS methodology for a qualitative characteristic that the
attribute under study is not a rare attribute. If the attribute is rare, then the procedure of
estimating the population proportion P by the sample proportion a/n is not suitable. Examples of
such situations are estimation of the frequency of a rare type of gene, the proportion of some
rare type of cancer cells in a biopsy, the proportion of a rare type of blood cell affecting the
red blood cells, etc. In such cases, the methodology of inverse sampling can be used.
In the methodology of inverse sampling, the sampling is continued until a predetermined
number of units possessing the attribute under study occur in the sampling, which is useful for
estimating the population proportion. The sampling units are drawn one by one with equal
probability and without replacement. The sampling is discontinued as soon as the number of
units in the sample possessing the characteristic or attribute equals a predetermined number.

Let m denote the predetermined number of units possessing the characteristic. The sampling
is continued until m such units are obtained. Therefore, the sample size n required to attain
m is a random variable.

Probability distribution function of n


To find the probability distribution function of n, note that the sampling stops at the nth
draw exactly when that draw yields the mth unit possessing the attribute. Thus, the first
(n - 1) draws must contain (m - 1) of the NP units that possess the characteristic and
(n - m) of the NQ units that do not possess it, and the nth draw must select a unit that
possesses the characteristic.

So, the probability distribution function of n can be expressed as

$$\begin{aligned}
P(n) &= P\left(\begin{array}{c}\text{in a sample of } (n-1) \text{ units} \\ \text{drawn from } N,\ (m-1) \text{ units} \\ \text{will possess the attribute}\end{array}\right) \times P\left(\begin{array}{c}\text{the unit drawn at} \\ \text{the } n\text{th draw will} \\ \text{possess the attribute}\end{array}\right) \\
&= \left[\frac{\binom{NP}{m-1}\binom{NQ}{n-m}}{\binom{N}{n-1}}\right] \left(\frac{NP - m + 1}{N - n + 1}\right), \quad n = m, m+1, \ldots, m + NQ.
\end{aligned}$$
Note that the first term (in square brackets) is derived using the hypergeometric distribution as the
probability of drawing a sample of size (n – 1) in which (m – 1) units are from the NP units and
(n – m) units are from the NQ units. The second term, $\frac{NP - m + 1}{N - n + 1}$, is the probability associated with
the last draw, where it is assumed that we get the unit possessing the characteristic.
Note that $\sum_{n=m}^{m+NQ} P(n) = 1.$
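
The distribution of n can be evaluated exactly for a small population. The sketch below uses hypothetical values NP = 5, NQ = 15 and m = 3, checks that the probabilities add up to one, and compares P(m) with the direct product of draw-by-draw probabilities.

```python
from fractions import Fraction
from math import comb

def inverse_sampling_pmf(n, m, NP, NQ):
    """P(n): probability that the m-th unit with the attribute
    appears exactly at the n-th draw (without replacement)."""
    N = NP + NQ
    first = Fraction(comb(NP, m - 1) * comb(NQ, n - m), comb(N, n - 1))
    last = Fraction(NP - m + 1, N - n + 1)
    return first * last

NP, NQ, m = 5, 15, 3                      # hypothetical population and target
support = range(m, m + NQ + 1)            # n = m, m+1, ..., m+NQ
total = sum(inverse_sampling_pmf(n, m, NP, NQ) for n in support)

# P(n = m): the first m draws all possess the attribute.
direct = Fraction(5, 20) * Fraction(4, 19) * Fraction(3, 18)

print(total)
print(inverse_sampling_pmf(m, m, NP, NQ) == direct)
```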

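Estimate of population proportion
Consider the expectation of $\frac{m-1}{n-1}$:
$$\begin{aligned}
E\left(\frac{m-1}{n-1}\right) &= \sum_{n=m}^{m+NQ} \left(\frac{m-1}{n-1}\right) P(n) \\
&= \sum_{n=m}^{m+NQ} \left(\frac{m-1}{n-1}\right) \frac{\binom{NP}{m-1}\binom{NQ}{n-m}}{\binom{N}{n-1}} \cdot \frac{NP - m + 1}{N - n + 1} \\
&= P \sum_{n=m}^{m+NQ} \left(\frac{NP - m + 1}{N - n + 1}\right) \frac{\binom{NP-1}{m-2}\binom{NQ}{n-m}}{\binom{N-1}{n-2}},
\end{aligned}$$
where the last step uses $(m-1)\binom{NP}{m-1} = NP\binom{NP-1}{m-2}$ and $(n-1)\binom{N}{n-1} = N\binom{N-1}{n-2}$. The summand is
the probability distribution function of n obtained by replacing N by (N – 1), NP by (NP – 1),
m by (m – 1) and n by (n – 1) in the earlier step, so the sum equals 1.
Thus
$$E\left(\frac{m-1}{n-1}\right) = P.$$
So $\hat{P} = \frac{m-1}{n-1}$ is an unbiased estimator of P.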
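
The unbiasedness of (m − 1)/(n − 1) can be verified exactly on a small example. The sketch below reuses the hypothetical values NP = 5, NQ = 15, m = 3 and, for contrast, also computes the expectation of the naive ratio m/n, which is not part of the derivation above.

```python
from fractions import Fraction
from math import comb

def pmf(n, m, NP, NQ):
    """Distribution of the sample size n under inverse sampling."""
    N = NP + NQ
    return (Fraction(comb(NP, m - 1) * comb(NQ, n - m), comb(N, n - 1))
            * Fraction(NP - m + 1, N - n + 1))

NP, NQ, m = 5, 15, 3                      # hypothetical values
P = Fraction(NP, NP + NQ)
support = range(m, m + NQ + 1)

E_P_hat = sum(Fraction(m - 1, n - 1) * pmf(n, m, NP, NQ) for n in support)
E_naive = sum(Fraction(m, n) * pmf(n, m, NP, NQ) for n in support)

print(E_P_hat == P)                       # (m-1)/(n-1) is unbiased
print(float(E_naive), float(P))           # m/n overestimates P on average
```

Because m/n exceeds (m − 1)/(n − 1) whenever n > m, the naive ratio has expectation larger than P, which is why the adjusted ratio is used.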

Estimate of variance of P̂
Now we derive an estimate of the variance of $\hat{P}$. By definition,
$$\begin{aligned}
\mathrm{Var}(\hat{P}) &= E(\hat{P}^2) - \left[E(\hat{P})\right]^2 \\
&= E(\hat{P}^2) - P^2.
\end{aligned}$$
Thus
$$\widehat{\mathrm{Var}}(\hat{P}) = \hat{P}^2 - \text{Estimate of } P^2.$$
In order to obtain an estimate of $P^2$, consider the expectation of $\frac{(m-1)(m-2)}{(n-1)(n-2)}$, i.e.,
$$\begin{aligned}
E\left[\frac{(m-1)(m-2)}{(n-1)(n-2)}\right] &= \sum_{n \ge m} \left[\frac{(m-1)(m-2)}{(n-1)(n-2)}\right] P(n) \\
&= \frac{P(NP - 1)}{N - 1} \sum_{n \ge m} \left[\left(\frac{NP - m + 1}{N - n + 1}\right) \frac{\binom{NP-2}{m-3}\binom{NQ}{n-m}}{\binom{N-2}{n-3}}\right]
\end{aligned}$$

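where the last term inside the square bracket is obtained by replacing NP by (NP − 2), N by
(N − 2), m by (m − 2) and n by (n − 2) in the probability distribution function of the hypergeometric
distribution. With these replacements the full summand, including the factor $\frac{NP - m + 1}{N - n + 1}$, is again a
probability distribution function of n, so the sum equals 1. This solves further to
$$E\left[\frac{(m-1)(m-2)}{(n-1)(n-2)}\right] = \frac{P(NP - 1)}{N - 1} = \frac{NP^2}{N-1} - \frac{P}{N-1}.$$
Thus, an unbiased estimate of $P^2$ is
$$\begin{aligned}
\text{Estimate of } P^2 &= \left(\frac{N-1}{N}\right) \frac{(m-1)(m-2)}{(n-1)(n-2)} + \frac{\hat{P}}{N} \\
&= \left(\frac{N-1}{N}\right) \frac{(m-1)(m-2)}{(n-1)(n-2)} + \frac{1}{N} \cdot \frac{m-1}{n-1}.
\end{aligned}$$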

Finally, an estimate of the variance of $\hat{P}$ is

$$\begin{aligned}
\widehat{\mathrm{Var}}(\hat{P}) &= \hat{P}^2 - \text{Estimate of } P^2 \\
&= \left(\frac{m-1}{n-1}\right)^2 - \left[\frac{N-1}{N} \cdot \frac{(m-1)(m-2)}{(n-1)(n-2)} + \frac{1}{N}\left(\frac{m-1}{n-1}\right)\right] \\
&= \left(\frac{m-1}{n-1}\right)\left[\left(\frac{m-1}{n-1}\right) - \frac{1}{N}\left(1 + \frac{(N-1)(m-2)}{n-2}\right)\right].
\end{aligned}$$

For large N, the distribution of n tends to the negative binomial distribution with
probability mass function $\binom{n-1}{m-1} P^{m} Q^{n-m}$. So
$$\hat{P} = \frac{m-1}{n-1}$$
and
$$\widehat{\mathrm{Var}}(\hat{P}) = \frac{(m-1)(n-m)}{(n-1)^2 (n-2)} = \frac{\hat{P}(1 - \hat{P})}{n-2}.$$
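
The variance estimate for inverse sampling can be checked in the same exact manner. This sketch, with the hypothetical values NP = 5, NQ = 15, m = 3, computes the estimate as the square of the estimated proportion minus the unbiased estimate of P², and compares its expectation with the true variance. The function names are illustrative.

```python
from fractions import Fraction
from math import comb

def pmf(n, m, NP, NQ):
    """Distribution of the sample size n under inverse sampling."""
    N = NP + NQ
    return (Fraction(comb(NP, m - 1) * comb(NQ, n - m), comb(N, n - 1))
            * Fraction(NP - m + 1, N - n + 1))

def var_hat(n, m, N):
    """Finite-population estimate: P_hat^2 minus the estimate of P^2."""
    p_hat = Fraction(m - 1, n - 1)
    t = Fraction((m - 1) * (m - 2), (n - 1) * (n - 2))
    est_p2 = Fraction(N - 1, N) * t + p_hat / N
    return p_hat**2 - est_p2

def var_hat_large_N(n, m):
    """Large-N form: P_hat (1 - P_hat) / (n - 2)."""
    p_hat = Fraction(m - 1, n - 1)
    return p_hat * (1 - p_hat) / (n - 2)

NP, NQ, m = 5, 15, 3                      # hypothetical values (m >= 3 needed)
N = NP + NQ
P = Fraction(NP, N)
support = range(m, m + NQ + 1)

true_var = sum(Fraction(m - 1, n - 1) ** 2 * pmf(n, m, NP, NQ)
               for n in support) - P**2
mean_var_hat = sum(var_hat(n, m, N) * pmf(n, m, NP, NQ) for n in support)

print(mean_var_hat == true_var)           # the estimate is unbiased
print(var_hat(5, m, N), var_hat_large_N(5, m))
```

The finite-population estimate is smaller than the large-N form for the same n and m, and the two agree more closely as N grows relative to n.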

Estimation of proportion for more than two classes


We have assumed up to now that there are only two classes where the population can be divided
based on a qualitative characteristic. There can be situations when the population is to be
divided into more than two classes. For example, the taste of coffee can be divided into four
categories: very strong, strong, mild, and very mild. Similarly, in another example, the damage
to crops due to the storm can be classified into categories like heavily damaged, damaged,
minor damage, and no damage, etc.

These types of situations can be represented by dividing the population of size N into, say, k
mutually exclusive classes $C_1, C_2, \ldots, C_k$. Corresponding to these classes, let

$$P_1 = \frac{C_1}{N},\; P_2 = \frac{C_2}{N},\; \ldots,\; P_k = \frac{C_k}{N}$$
be the proportions of units in the classes $C_1, C_2, \ldots, C_k$, respectively, where $C_i$ also denotes the
number of units in the ith class.

Let a sample of size n be observed such that $c_1, c_2, \ldots, c_k$ units have been drawn from
$C_1, C_2, \ldots, C_k$, respectively. Then the probability of observing $c_1, c_2, \ldots, c_k$ is

$$P(c_1, c_2, \ldots, c_k) = \frac{\binom{C_1}{c_1}\binom{C_2}{c_2}\cdots\binom{C_k}{c_k}}{\binom{N}{n}}.$$

The population proportions $P_i$ can be estimated by $p_i = \frac{c_i}{n}$, $i = 1, 2, \ldots, k$.

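It can be easily shown that, with $Q_i = 1 - P_i$ and $q_i = 1 - p_i$,

$$E(p_i) = P_i, \quad i = 1, 2, \ldots, k,$$
$$\mathrm{Var}(p_i) = \frac{N-n}{N-1} \cdot \frac{P_i Q_i}{n}$$
and
$$\widehat{\mathrm{Var}}(p_i) = \frac{N-n}{N} \cdot \frac{p_i q_i}{n-1}.$$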

For estimating the number of units in the ith class,

$$\hat{C}_i = N p_i,$$
$$\mathrm{Var}(\hat{C}_i) = N^2 \, \mathrm{Var}(p_i)$$
and
$$\widehat{\mathrm{Var}}(\hat{C}_i) = N^2 \, \widehat{\mathrm{Var}}(p_i).$$
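
The per-class estimates can be sketched as below, assuming SRSWOR. The function name `class_estimates`, the population size, and the sample counts (loosely following the coffee-taste example) are hypothetical.

```python
from fractions import Fraction

def class_estimates(counts, N):
    """For sample counts c_1..c_k from a population of size N, return
    per-class proportion, class-total estimate, and variance estimates."""
    n = sum(counts)
    results = []
    for c in counts:
        p = Fraction(c, n)
        var_hat_p = Fraction(N - n, N) * p * (1 - p) / (n - 1)
        results.append({
            "p": p,                        # estimate of P_i
            "var_p": var_hat_p,            # estimated Var(p_i)
            "C_hat": N * p,                # estimated class size
            "var_C": N**2 * var_hat_p,     # estimated Var(C_hat_i)
        })
    return results

# Hypothetical sample of n = 100 from N = 2000:
# very strong, strong, mild, very mild.
estimates = class_estimates([20, 50, 25, 5], N=2000)
for e in estimates:
    print(e["p"], e["C_hat"], float(e["var_C"]))
```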

The confidence intervals can be obtained based on a single pi as in the case of two classes.

If N is large, then the probability of observing $c_1, c_2, \ldots, c_k$ can be approximated by the multinomial
distribution given by
$$P(c_1, c_2, \ldots, c_k) = \frac{n!}{c_1! \, c_2! \cdots c_k!} P_1^{c_1} P_2^{c_2} \cdots P_k^{c_k}.$$

For this distribution,
$$E(p_i) = P_i, \quad i = 1, 2, \ldots, k,$$
$$\mathrm{Var}(p_i) = \frac{P_i(1 - P_i)}{n}$$
and
$$\widehat{\mathrm{Var}}(p_i) = \frac{p_i(1 - p_i)}{n}.$$
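
The multinomial approximation to the exact multi-class probability can be illustrated with a short sketch. The class sizes and the observed counts below are made up for the example.

```python
from math import comb, factorial, prod

def multi_hypergeom_pmf(c, C):
    """Exact probability of sample counts c given class sizes C (SRSWOR)."""
    n, N = sum(c), sum(C)
    return prod(comb(Ci, ci) for Ci, ci in zip(C, c)) / comb(N, n)

def multinomial_pmf(c, P):
    """Multinomial probability of counts c given class proportions P."""
    coef = factorial(sum(c))
    for ci in c:
        coef //= factorial(ci)            # exact integer division at each step
    return coef * prod(Pi**ci for Pi, ci in zip(P, c))

C = (4000, 4000, 2000)                    # hypothetical class sizes, N = 10000
P = tuple(Ci / sum(C) for Ci in C)        # class proportions (0.4, 0.4, 0.2)
c = (2, 2, 1)                             # hypothetical sample counts, n = 5

exact = multi_hypergeom_pmf(c, C)
approx = multinomial_pmf(c, P)
print(exact, approx)
```

With N much larger than n, the two probabilities agree to several decimal places.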
