
Chapter 3 - 2012

Uploaded by Admasu
Copyright © All Rights Reserved

Chapter 3: Sampling Proportions

3.1 Basic Concepts

In some surveys, the nature of the investigation requires recording attributes, which are expressed qualitatively. Such qualitative information can be quantified by counting the units that possess the attribute. Attributes can take various forms, such as living in an urban or rural area, being male or female, married or unmarried, literate or illiterate, aged 18 to 45 years or over 45 years, etc.

The main interest for such attributes is usually to estimate the total number and the proportion of units in the population that possess a given characteristic. An attribute can be converted into quantitative information by assigning the score “1” to units that possess it and “0” to units that do not, while a measurable variable can be converted into an attribute by grouping the population into categories.
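As a small illustration of this coding, the Python sketch below uses hypothetical records (the field names `sex` and `age` and all values are made up) to score an attribute as 1 or 0 and to turn a measurable variable, age, into the attribute "over 45 years".

```python
# Hypothetical records: sex is an attribute, age is a measurable variable.
people = [
    {"sex": "female", "age": 23},
    {"sex": "male", "age": 51},
    {"sex": "female", "age": 47},
    {"sex": "male", "age": 30},
    {"sex": "female", "age": 38},
]

# Attribute -> score: 1 if the unit has the characteristic (being female), else 0.
is_female = [1 if person["sex"] == "female" else 0 for person in people]

# Measurable variable -> attribute: 1 if the adult is over 45 years, else 0.
over_45 = [1 if person["age"] > 45 else 0 for person in people]

# Counting the 1s gives the number of units with the attribute;
# dividing by the number of units gives the proportion.
print(sum(is_female), sum(is_female) / len(people))  # 3 0.6
print(sum(over_45), sum(over_45) / len(people))      # 2 0.4
```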

It is worth presenting the particularly simple form that the variance of an estimated proportion takes when the design is simple random sampling. The following discussion considers a population divided into two categories, in which each member either has or does not have a specified characteristic of interest.

Notation for Population:

N = total number of units in the population
A = total number of units in the specified category C
P = A/N, the population proportion in C, i.e., the proportion (percentage) of the entire population that has the specified attribute
Q = 1 - P, the proportion of units not in C

Notation for Sample:

n = total number of members in the sample
a = total number of sampled members that have the specified attribute
p = a/n, the sample proportion, i.e., the proportion (percentage) of the sample that has the specified attribute
q = 1 - p, the proportion of sample members not in C

3.2 Variances and Standard Errors of the Estimates of the Population Proportion

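For any unit in the population or in the sample, we define an observation (variable) $y_i$ as follows to facilitate counting:

$$
y_i = \begin{cases} 1, & \text{if the unit is in } C \\ 0, & \text{if the unit is not in } C \end{cases}
$$

For the population,

$$
Y = \sum_{i=1}^{N} y_i = A, \qquad \bar{Y} = \frac{\sum_{i=1}^{N} y_i}{N} = \frac{A}{N} = P, \qquad S^2 = \frac{NPQ}{N-1},
$$

where the last result follows because $y_i^2 = y_i$, so that $\sum_{i=1}^{N} (y_i - P)^2 = \sum_{i=1}^{N} y_i^2 - NP^2 = NP - NP^2 = NPQ$. Similarly, for the sample,

$$
y = \sum_{i=1}^{n} y_i = a, \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{a}{n} = p, \qquad s^2 = \frac{npq}{n-1} \quad \text{(verify).}
$$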
As in the continuous case, the sample proportion p can be used to make inferences about the population proportion P. Like the sample mean ȳ, the sample proportion p is a random variable whose value depends on which members of the population are included in the sample.

Theorem 5:

The sample proportion $p = a/n$ is an unbiased estimate of the population proportion $P = A/N$, i.e., $E(p) = P$. Prove this theorem.
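Before proving Theorem 5 algebraically, it can be checked numerically: under simple random sampling without replacement every subset of n units is equally likely, so E(p) is the plain average of p over all possible samples. The tiny population below (N = 6, A = 2) and the sample size n = 3 are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical small population: N = 6 units, A = 2 of them in class C (coded 1).
population = [1, 0, 1, 0, 0, 0]
N, A = len(population), sum(population)
P = Fraction(A, N)
n = 3  # sample size, drawn without replacement

# combinations() works by position, so it lists all C(6, 3) = 20 equally likely samples.
samples = list(combinations(population, n))
expected_p = sum(Fraction(sum(s), n) for s in samples) / len(samples)

print(len(samples), expected_p, P)  # 20 1/3 1/3
```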

Theorem 6:

The variance of the sample proportion or percentage $p$ under simple random sampling without replacement is given by

$$
\operatorname{Var}(p) = E(p - P)^2 = \frac{PQ}{n}\left(\frac{N-n}{N-1}\right).
$$

Prove this theorem.
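Theorem 6 can likewise be verified by enumerating every possible sample and computing the variance of p exactly. This is a sketch on a hypothetical population (N = 8, A = 3); `exact_var_p` and `formula_var_p` are illustrative names.

```python
from fractions import Fraction
from itertools import combinations

def exact_var_p(population, n):
    """Variance of p over all equally likely SRSWOR samples of size n (exact enumeration)."""
    N = len(population)
    P = Fraction(sum(population), N)
    ps = [Fraction(sum(s), n) for s in combinations(population, n)]
    return sum((p - P) ** 2 for p in ps) / len(ps)

def formula_var_p(population, n):
    """Theorem 6: Var(p) = (PQ / n) * (N - n) / (N - 1)."""
    N = len(population)
    P = Fraction(sum(population), N)
    return P * (1 - P) / n * Fraction(N - n, N - 1)

population = [1, 0, 1, 0, 0, 0, 1, 0]  # hypothetical: N = 8, A = 3
for n in range(1, len(population) + 1):
    # The formula matches enumeration for every sample size, including n = N (variance 0).
    assert exact_var_p(population, n) == formula_var_p(population, n)

print(formula_var_p(population, 3))  # 25/448
```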

Corollary: i) The estimated total number of units in class C, $\hat{A} = Np$, is an unbiased estimate of $A$.
ii) The variance of $\hat{A} = Np$, the estimated total number of units in class C, is

$$
\operatorname{Var}(\hat{A}) = \frac{N^2 PQ}{n}\left(\frac{N-n}{N-1}\right).
$$
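Part (ii) follows because $\hat{A} = Np$ is p multiplied by the constant N, so its variance is $N^2$ times Var(p). The sketch below evaluates both formulas exactly for hypothetical values N = 1000, A = 300 and n = 100; the function names are illustrative.

```python
from fractions import Fraction

def var_p(N, A, n):
    """Theorem 6: Var(p) = (PQ / n) * (N - n) / (N - 1) under SRSWOR."""
    P = Fraction(A, N)
    return P * (1 - P) / n * Fraction(N - n, N - 1)

def var_A_hat(N, A, n):
    """Corollary (ii): Var(A_hat) = (N^2 PQ / n) * (N - n) / (N - 1)."""
    P = Fraction(A, N)
    return N**2 * P * (1 - P) / n * Fraction(N - n, N - 1)

N, A, n = 1000, 300, 100            # hypothetical population and sample sizes
v = var_A_hat(N, A, n)
assert v == N**2 * var_p(N, A, n)   # scaling p by N scales the variance by N^2
print(v, float(v) ** 0.5)           # exact variance and the standard error of A_hat
```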

3.3 Estimation of the Standard Error from a Sample

Theorem 7:
An unbiased estimate of the variance of $p$, computed from the sample, is

$$
\operatorname{var}(p) = \frac{pq}{n-1}\left(\frac{N-n}{N}\right) = \frac{pq}{n-1}(1-f), \qquad f = \frac{n}{N}.
$$

If $n$ is small relative to $N$, the finite population correction $(1-f)$ is negligible and $\operatorname{var}(p) \approx \dfrac{pq}{n-1}$. (Prove this theorem.)
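Unbiasedness in Theorem 7 means that the average of var(p) over all possible samples equals the true Var(p) of Theorem 6. A minimal enumeration check, assuming the same kind of hypothetical population (N = 8, A = 3) and n = 3; `var_p_hat` is an illustrative name.

```python
from fractions import Fraction
from itertools import combinations

def var_p_hat(sample, N):
    """Theorem 7: var(p) = (1 - f) * p q / (n - 1), with f = n / N (requires n >= 2)."""
    n = len(sample)
    p = Fraction(sum(sample), n)
    f = Fraction(n, N)
    return (1 - f) * p * (1 - p) / (n - 1)

population = [1, 0, 1, 0, 0, 0, 1, 0]  # hypothetical: N = 8, A = 3
N, n = len(population), 3
P = Fraction(sum(population), N)
true_var = P * (1 - P) / n * Fraction(N - n, N - 1)   # Theorem 6

# Average the estimator over all C(8, 3) equally likely samples.
samples = list(combinations(population, n))
mean_estimate = sum(var_p_hat(s, N) for s in samples) / len(samples)
print(mean_estimate == true_var)  # True
```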

Corollary: An unbiased estimate of the variance of the estimated total number of units in the specified category, $\hat{A} = Np$, is

$$
\operatorname{var}(\hat{A}) = \frac{N(N-n)}{n-1}\, pq.
$$

In each case, the standard error is obtained by taking the square root of the corresponding variance.
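Both estimated standard errors can be computed directly from the sample counts. The sketch below assumes a hypothetical sample with a = 30, n = 100 and N = 1000; the function `standard_errors` is illustrative.

```python
import math

def standard_errors(a, n, N):
    """Estimated standard errors of p = a/n and of A_hat = N p under SRSWOR."""
    p = a / n
    q = 1 - p
    var_p = (N - n) / N * p * q / (n - 1)     # Theorem 7
    var_A = N * (N - n) / (n - 1) * p * q     # Corollary: equals N**2 * var_p
    return math.sqrt(var_p), math.sqrt(var_A)

se_p, se_A = standard_errors(a=30, n=100, N=1000)
print(round(se_p, 4), round(se_A, 2))  # 0.0437 43.69
```

Since $\hat{A} = Np$, s.e.($\hat{A}$) is simply N times s.e.(p).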

Example: See Cochran 3rd edition page 52

3.4 Confidence Limits

For a large sample, the confidence limits for the population proportion are

$$
P = p \pm Z_{\alpha/2}\, \text{S.E.}(p).
$$

Since S.E.(p) is unknown, it is replaced by its sample estimate s.e.(p), giving the confidence interval

$$
P = p \pm Z_{\alpha/2}\, \text{s.e.}(p).
$$

A slight improvement can be achieved by applying the continuity correction for the normal approximation to the binomial, i.e.,

$$
P = p \pm \left[ Z_{\alpha/2}\, \text{s.e.}(p) + \frac{1}{2n} \right].
$$
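The confidence limits can be sketched as a small function. It assumes a hypothetical sample (a = 30, n = 100, N = 1000), uses z = 1.96 as the default for 95% confidence, and the name `proportion_ci` is illustrative.

```python
import math

def proportion_ci(a, n, N, z=1.96, continuity_correction=False):
    """Approximate confidence limits for P from an SRSWOR sample.

    Uses s.e.(p) = sqrt((1 - f) p q / (n - 1)) with f = n / N, and optionally
    widens each limit by 1/(2n) as a continuity correction.
    """
    p = a / n
    se = math.sqrt((1 - n / N) * p * (1 - p) / (n - 1))
    half_width = z * se
    if continuity_correction:
        half_width += 1 / (2 * n)
    return p - half_width, p + half_width

plain = proportion_ci(30, 100, 1000)
corrected = proportion_ci(30, 100, 1000, continuity_correction=True)
print([round(x, 4) for x in plain], [round(x, 4) for x in corrected])
# [0.2144, 0.3856] [0.2094, 0.3906]
```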

3.5 Relative Error

For the proportion $p$, we can write the coefficient of variation as

$$
CV(p) = \frac{\sqrt{\dfrac{PQ}{n}\,\dfrac{N-n}{N-1}}}{P} = \sqrt{\frac{Q\,(N-n)}{nP\,(N-1)}},
$$

which is approximately $\sqrt{Q/(nP)}$ if the finite population correction $(1-f)$ is ignored. Its estimate is

$$
cv(p) = \frac{\text{s.e.}(p)}{p} = \frac{1}{p}\sqrt{\frac{pq}{n-1}\,\frac{N-n}{N}} = \sqrt{\frac{q\,(1-f)}{(n-1)\,p}}.
$$

Generally, the coefficient of variation of an estimator $\hat{\theta}$ is $CV(\hat{\theta}) = \dfrac{\text{S.E.}(\hat{\theta})}{E(\hat{\theta})}$, and its square is known as the rel-variance, i.e., $CV^2(\hat{\theta}) = \dfrac{\operatorname{Var}(\hat{\theta})}{\left(E(\hat{\theta})\right)^2}$.
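The estimated coefficient of variation and rel-variance of p can be computed as follows. This is a sketch with a hypothetical sample (a = 30, n = 100, N = 1000); `cv_p` is an illustrative name. It also confirms that the closed form agrees with s.e.(p)/p.

```python
import math

def cv_p(a, n, N):
    """Estimated coefficient of variation of p: sqrt(q (1 - f) / ((n - 1) p)), f = n / N."""
    p = a / n
    q = 1 - p
    f = n / N
    return math.sqrt(q * (1 - f) / ((n - 1) * p))

# Hypothetical sample: a = 30 of n = 100 units in C, population size N = 1000.
a, n, N = 30, 100, 1000
p = a / n
se_p = math.sqrt((1 - n / N) * p * (1 - p) / (n - 1))

cv = cv_p(a, n, N)
assert math.isclose(cv, se_p / p)     # cv(p) = s.e.(p) / p
rel_variance = cv ** 2                # estimated rel-variance of p
print(round(cv, 4), round(rel_variance, 4))  # 0.1456 0.0212
```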

3.6 Sample Size Determination for Proportion

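The sample size required for estimating the population proportion $P$ can be obtained in a similar way, and the formulas have forms similar to those shown earlier for the mean. Assume that the estimate $p$ is normally distributed and that we specify either an absolute margin of error $d$, so that $|p - P| \le d$ with probability $1 - \alpha$, or a relative error $\varepsilon$, with $d = \varepsilon P$. Then the sample size $n$ can be calculated by

$$
n = \frac{Z^2 PQ / d^2}{1 - \dfrac{1}{N} + \dfrac{Z^2 PQ}{N d^2}}
$$

(verify this), where $Z = Z_{\alpha/2}$. If we put $n_0 = Z^2 PQ / d^2$, then we get

$$
n = \frac{n_0}{1 - \dfrac{1}{N} + \dfrac{n_0}{N}} = \frac{n_0}{1 + \dfrac{n_0 - 1}{N}}.
$$

For a large population size $N$, we have $n \approx \dfrac{n_0}{1 + n_0/N}$, and when $n_0/N$ is also negligible we can approximate $n$ by $n_0$, as we did for the mean.

Using the relative error $\varepsilon$ and the relation $d = \varepsilon P$, we set

$$
n_0 = \frac{Z^2 PQ}{d^2} = \frac{Z^2 Q}{\varepsilon^2 P}.
$$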
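The sample size calculation can be sketched as one function that accepts either an absolute error d or a relative error eps (converted via d = eps * P) and applies the finite-population adjustment n = n0 / (1 + (n0 - 1)/N), rounding up. The value P = 0.5, N = 5000 and the name `sample_size_proportion` are hypothetical choices for illustration.

```python
import math

def sample_size_proportion(P, N, z=1.96, d=None, eps=None):
    """Sample size for estimating P under SRSWOR.

    Give exactly one of the absolute margin of error d or the relative error eps (d = eps * P).
    Returns (n0, n), where n is rounded up to a whole number of units.
    """
    if (d is None) == (eps is None):
        raise ValueError("specify exactly one of d or eps")
    if d is None:
        d = eps * P
    n0 = z**2 * P * (1 - P) / d**2        # first approximation, ignoring N
    n = n0 / (1 + (n0 - 1) / N)           # finite-population adjustment
    return n0, math.ceil(n)

print(sample_size_proportion(0.5, 5000, d=0.05)[1])    # 357
print(sample_size_proportion(0.5, 5000, eps=0.1)[1])   # 357 (same d = 0.1 * 0.5)
```

For large N, the simpler form n0 / (1 + n0/N) gives practically the same answer.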

3.7 Estimates of the Population Parameters in Sample Size Determination (Continuous and Proportion)

In practice, the population parameters $S^2$, $\mu_y$ and $P$ must be estimated, while the other factors, $Z$ and $\varepsilon$ (or $d$), are usually set by the investigator (researcher). The sample size formulas lead to the following summary points.

• The smaller we make $\varepsilon$, the greater the required sample size $n$.
• If the degree of confidence $(1-\alpha)$ increases, $Z$ increases, so both our certainty and the required sample size increase.
• Since the population parameters are unknown, calculate $n_0$ using sample estimates. That is,

$$
n_0 = \frac{Z^2 pq}{d^2} \quad \text{or} \quad n_0 = \frac{Z^2 s_y^2}{d^2} = \frac{Z^2\, cv^2(y)}{\varepsilon^2},
$$

where the last form uses $d = \varepsilon \bar{y}$ and $cv(y) = s_y / \bar{y}$.

How do we obtain estimates of the population parameters for use in sample size determination? In actual practice, there are four possible ways of estimating them.

• By taking a small preliminary simple random sample of size $n_1$, from which the estimates $s_1^2$ or $p_1$ of $S^2$ or $P$, and hence the required $n$, are obtained. This method gives the most reliable estimates, but it slows down the completion of the survey, so it is not often used.
• By using the results of a pilot survey: to design a large sample efficiently in an unfamiliar field, a pilot study may be conducted before the survey to gain information for designing it; such a study also serves many other purposes.
• By using the results of previous surveys: search for data from past surveys on similar variables and use them after adjusting for changes over time.
• By guesswork about the nature of the population: this requires educated guesses or the services of experts such as survey statisticians, supported by specialists in the subject matter concerned, who may construct a model of the population distribution (its shape and probable limits) and deduce $S^2$ or $P$ from it.

Reading Assignment: Read Cochran 3rd ed., chapter 4, section 4.7, pages 78-81.

Examples:

1. Teacher training institutes are interested in estimating the proportion $P$ of teachers who consider the semester system more suitable than the three-term system of education. An SRS of $n = 120$ teachers is taken without replacement from a total of $N = 1200$ teachers. Some of the teachers favor the two-semester system while others do not, and 72 teachers are found to favor the semester system.
i) Estimate the proportion $P$ along with the standard error of your estimate.

ii) Calculate the 95% confidence interval for $P$.

iii) Do you think the sample size of 120 is sufficient if the tolerable error is 0.08? If not, how many more units should be included in the sample?
Solution: $n = 120$, $a = 72$, $N = 1200$.

i) $p = a/n = 72/120 = 0.6$

$$
\operatorname{var}(p) = \frac{pq}{n-1}\left(1 - \frac{n}{N}\right) = \frac{0.6 \times 0.4}{119}\left(1 - \frac{120}{1200}\right) = 0.001815 \;\Rightarrow\; \text{s.e.}(p) = 0.0426
$$

ii) 95% confidence limits:

$$
P = p \pm Z_{\alpha/2}\, \text{s.e.}(p) = 0.6 \pm 1.96 \times 0.0426 = 0.6 \pm 0.0835 \;\Rightarrow\; P \in (0.5165,\ 0.6835)
$$

Therefore, with 95% confidence, the proportion of teachers in the institutes favoring the semester system lies between about 51.7% and 68.4%. If an estimate of the total number of teachers in favor of the two-semester system is required, it can be computed as $\hat{A} = Np = 1200 \times 0.6 = 720$.

iii)

$$
n_0 = \frac{Z^2 pq}{d^2} = \frac{1.96^2 \times 0.6 \times 0.4}{0.08^2} \approx 144, \qquad \frac{n_0}{N} = \frac{144}{1200} = 0.12 > 5\%,
$$

so the finite population correction cannot be ignored:

$$
n = \frac{n_0}{1 + n_0/N} = \frac{144}{1 + 144/1200} = \frac{144}{1.12} \approx 128.57 \approx 129.
$$

Therefore a sample of 120 is not sufficient to achieve the given precision; 9 more teachers need to be selected.
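The arithmetic of this solution can be reproduced with a short script. It is a sketch that reuses the example's data; it keeps $n_0 \approx 144.06$ unrounded, which gives $n \approx 128.6$ instead of 128.57 but the same final answer of 129.

```python
import math

# Data from the worked example.
N, n, a = 1200, 120, 72
z, d = 1.96, 0.08

# (i) Estimate of P and its standard error (Theorem 7).
p = a / n
q = 1 - p
var_p = (p * q / (n - 1)) * (1 - n / N)
se_p = math.sqrt(var_p)

# (ii) 95% confidence limits and the estimated total in favour.
lower, upper = p - z * se_p, p + z * se_p
A_hat = N * p

# (iii) Required sample size for a tolerable error d = 0.08, with n0 / (1 + n0 / N).
n0 = z**2 * p * q / d**2
n_required = math.ceil(n0 / (1 + n0 / N))
extra = n_required - n

print(f"p = {p:.1f}, var(p) = {var_p:.6f}, s.e.(p) = {se_p:.4f}")
print(f"95% CI: ({lower:.4f}, {upper:.4f}), A_hat = {A_hat:.0f}")
print(f"n0 = {n0:.2f}, n = {n_required}, extra units = {extra}")
```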
