
SP Sampling Lect 8

The document provides an introduction to sampling theory, focusing on simple random sampling and the estimation of confidence limits for the population mean under known and unknown variance. It discusses methods for estimating population totals, determining sample sizes based on various criteria, and the implications of sample size on variance and cost. Additionally, it outlines different approaches for pre-specified variance, estimation error, confidence interval width, coefficient of variation, relative error, and cost considerations.

Uploaded by

rishu maurya

Introduction to Sampling Theory

Lecture 8
Simple Random Sampling

Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

Slides can be downloaded from http://home.iitk.ac.in/~shalab/sp
Estimation of Confidence limits for the population mean: $\sigma^2$ known
Assume that the population is normally distributed $N(\bar{Y}, \sigma^2)$ with mean $\bar{Y}$ and variance $\sigma^2$.

When $\sigma^2$ is known, the $100(1-\alpha)\%$ confidence interval is

\[
\bar{y} - Z_{\alpha/2}\sqrt{\mathrm{Var}(\bar{y})} \le \bar{Y} \le \bar{y} + Z_{\alpha/2}\sqrt{\mathrm{Var}(\bar{y})}
\]

where $Z_{\alpha/2}$ denotes the upper $100(\alpha/2)\%$ point of the $N(0, 1)$ distribution.

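The interval above can be sketched numerically in Python. This is a minimal illustration, not part of the lecture: it assumes SRSWOR, so that $\mathrm{Var}(\bar{y}) = \frac{N-n}{Nn}S^2$ (the expression used later in the lecture), and the function name and input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci_known_variance(ybar, S2, n, N, alpha=0.05):
    """Return (lower, upper) 100(1 - alpha)% limits for the population mean.

    Assumes SRSWOR, so Var(ybar) = (N - n) / (N * n) * S2, with S2 known.
    """
    var_ybar = (N - n) / (N * n) * S2
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper 100(alpha/2)% point of N(0, 1)
    half_width = z * sqrt(var_ybar)
    return ybar - half_width, ybar + half_width


# Illustrative values only: n = 100 units drawn from N = 1000, S^2 = 100.
lower, upper = mean_ci_known_variance(ybar=50.0, S2=100.0, n=100, N=1000)
print(f"95% confidence limits: ({lower:.2f}, {upper:.2f})")
```

The standard normal quantile comes from the standard library, so no third-party package is needed for the known-variance case.
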
Estimation of Confidence limits for the population mean: $\sigma^2$ unknown
When $\sigma^2$ is unknown, the $100(1-\alpha)\%$ confidence interval is

\[
\bar{y} - t_{\alpha/2}\sqrt{\widehat{\mathrm{Var}}(\bar{y})} \le \bar{Y} \le \bar{y} + t_{\alpha/2}\sqrt{\widehat{\mathrm{Var}}(\bar{y})}
\]

where $t_{\alpha/2}$ denotes the upper $100(\alpha/2)\%$ point of the $t$ distribution with $(n-1)$ degrees of freedom.

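Proof: Useful Result when $\sigma^2$ is known and unknown
Assume that the population is normally distributed $N(\bar{Y}, \sigma^2)$ with mean $\bar{Y}$ and variance $\sigma^2$.

Then

\[
\frac{\bar{y} - \bar{Y}}{\sqrt{\mathrm{Var}(\bar{y})}} \sim N(0, 1) \quad \text{when } \sigma^2 \text{ is known,}
\]

\[
\frac{\bar{y} - \bar{Y}}{\sqrt{\widehat{\mathrm{Var}}(\bar{y})}} \sim t_{(n-1)} \quad \text{when } \sigma^2 \text{ is unknown.}
\]
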
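Proof: Estimation of Confidence limits for the population mean: $\sigma^2$ known
When $\sigma^2$ is known, the $100(1-\alpha)\%$ confidence interval is given by

\[
P\left[-Z_{\alpha/2} \le \frac{\bar{y} - \bar{Y}}{\sqrt{\mathrm{Var}(\bar{y})}} \le Z_{\alpha/2}\right] = 1 - \alpha
\]

or

\[
P\left[\bar{y} - Z_{\alpha/2}\sqrt{\mathrm{Var}(\bar{y})} \le \bar{Y} \le \bar{y} + Z_{\alpha/2}\sqrt{\mathrm{Var}(\bar{y})}\right] = 1 - \alpha.
\]

The confidence limits are

\[
\left(\bar{y} - Z_{\alpha/2}\sqrt{\mathrm{Var}(\bar{y})},\; \bar{y} + Z_{\alpha/2}\sqrt{\mathrm{Var}(\bar{y})}\right)
\]

where $Z_{\alpha/2}$ denotes the upper $100(\alpha/2)\%$ point of the $N(0, 1)$ distribution.
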
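Proof: Estimation of Confidence limits for the population mean: $\sigma^2$ unknown
When $\sigma^2$ is unknown, the $100(1-\alpha)\%$ confidence interval is given by

\[
P\left[-t_{\alpha/2} \le \frac{\bar{y} - \bar{Y}}{\sqrt{\widehat{\mathrm{Var}}(\bar{y})}} \le t_{\alpha/2}\right] = 1 - \alpha
\]

or

\[
P\left[\bar{y} - t_{\alpha/2}\sqrt{\widehat{\mathrm{Var}}(\bar{y})} \le \bar{Y} \le \bar{y} + t_{\alpha/2}\sqrt{\widehat{\mathrm{Var}}(\bar{y})}\right] = 1 - \alpha.
\]

The confidence limits are given by

\[
\bar{y} - t_{\alpha/2}\sqrt{\widehat{\mathrm{Var}}(\bar{y})} \le \bar{Y} \le \bar{y} + t_{\alpha/2}\sqrt{\widehat{\mathrm{Var}}(\bar{y})}
\]

where $t_{\alpha/2}$ denotes the upper $100(\alpha/2)\%$ point of the $t$ distribution with $(n-1)$ degrees of freedom.
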
Estimation of Population Total
Sometimes it is also of interest to estimate the population total, e.g. total household income, total expenditure, etc.

Let $Y_T$ denote the population total, defined as

\[
Y_T = \sum_{i=1}^{N} Y_i = N\bar{Y}.
\]

$Y_T$ can be estimated by

\[
\hat{Y}_T = N\hat{\bar{Y}} = N\bar{y}.
\]

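A minimal Python sketch of this estimator follows. It also returns the estimated variance of $\hat{Y}_T$, using the expressions this lecture gives for the two sampling schemes, $N(N-n)s^2/n$ for SRSWOR and $N^2 s^2/n$ for SRSWR. The function name and the sample values are illustrative assumptions, not taken from the slides.

```python
def estimate_total(sample, N, with_replacement=False):
    """Estimate the population total and the variance of that estimate.

    sample: observed values from a simple random sample of size n
    N: population size
    """
    n = len(sample)
    ybar = sum(sample) / n
    # Sample variance s^2 with divisor (n - 1)
    s2 = sum((y - ybar) ** 2 for y in sample) / (n - 1)
    total_hat = N * ybar
    if with_replacement:
        var_hat = N**2 * s2 / n          # SRSWR
    else:
        var_hat = N * (N - n) * s2 / n   # SRSWOR
    return total_hat, var_hat


# Hypothetical data: 5 households sampled out of N = 100.
total_hat, var_hat = estimate_total([12, 15, 11, 14, 18], N=100)
print(total_hat, var_hat)
```

Passing `with_replacement=True` switches to the SRSWR variance estimate, which is never smaller than the SRSWOR one for the same data.
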
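Estimation of Population Total
Then $E(\hat{Y}_T) = N E(\bar{y}) = N\bar{Y} = Y_T$.

The variance of $\hat{Y}_T$ is

\[
\mathrm{Var}(\hat{Y}_T) = N^2\,\mathrm{Var}(\bar{y}) =
\begin{cases}
N^2 \left(\dfrac{N-n}{Nn}\right) S^2 = \dfrac{N(N-n)}{n} S^2 & \text{for SRSWOR} \\[2ex]
N^2 \left(\dfrac{N-1}{Nn}\right) S^2 = \dfrac{N(N-1)}{n} S^2 & \text{for SRSWR.}
\end{cases}
\]

The estimate of the variance of $\hat{Y}_T$ is

\[
\widehat{\mathrm{Var}}(\hat{Y}_T) = N^2\,\widehat{\mathrm{Var}}(\bar{y}) =
\begin{cases}
\dfrac{N(N-n)}{n} s^2 & \text{for SRSWOR} \\[2ex]
\dfrac{N^2}{n} s^2 & \text{for SRSWR.}
\end{cases}
\]
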
Determination of Sample Size
The size of the sample is needed before the survey starts and goes
into operation.

When the sample size increases, the variance of the estimators decreases but the cost of the survey increases, and vice versa.

So there has to be a balance between the two aspects.

Determination of Sample Size
The sample size can be determined on the basis of prescribed
values of the

• standard error of the sample mean,

• error of estimation,

• width of the confidence interval,

• coefficient of variation of the sample mean,

• relative error of the sample mean, or

• total cost,

among several others.

Determination of Sample Size
An important requirement for determining the sample size under these criteria is that the population standard deviation S should be known.

The reason and need for this will be clear when we derive the
sample size.

This raises the question of how to obtain information about S beforehand.

Determination of Sample Size
A possible solution is to conduct a pilot survey: collect a small preliminary sample, estimate S from it, and use that estimate as the known value of S.

Alternatively, such information can be collected from past data, past experience, the experimenter's long association with the experiment, prior information, etc.

Now we find the sample size under different criteria, assuming that the samples have been drawn using SRSWOR. The case of SRSWR can be derived similarly.

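1. Prespecified Variance
The sample size is to be determined such that the variance of $\bar{y}$ does not exceed a given value, say $V$. In this case, find $n$ such that

\[
\mathrm{Var}(\bar{y}) \le V
\]

or

\[
\frac{N-n}{Nn} S^2 \le V
\]

or

\[
\frac{1}{n} - \frac{1}{N} \le \frac{V}{S^2}
\]

or

\[
\frac{1}{n} - \frac{1}{N} \le \frac{1}{n_e} \quad \text{where } n_e = \frac{S^2}{V}
\]

or

\[
n \ge \frac{n_e}{1 + \dfrac{n_e}{N}}.
\]
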
1. Prespecified Variance
Note that $n_e$ can be determined only when $S^2$ is known. This is why $S$ has to be assumed known. The same requirement will arise in the other cases.

The smallest sample size needed in this case is

\[
n_{\text{smallest}} = \frac{n_e}{1 + \dfrac{n_e}{N}}.
\]

If $N$ is large, then the required $n$ is $n \ge n_e$ and $n_{\text{smallest}} = n_e$.

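The bound above translates directly into a small Python helper. This is a sketch under stated assumptions: the function name and the numbers are hypothetical, and rounding up to the next integer is added here because a sample size must be a whole number.

```python
from math import ceil


def sample_size_for_variance(S2, V, N):
    """Smallest integer n with ((N - n) / (N * n)) * S2 <= V under SRSWOR."""
    n_e = S2 / V                      # large-population sample size
    return ceil(n_e / (1 + n_e / N))  # finite-population adjustment, rounded up


# Illustrative values: S^2 = 100, target variance V = 1, population N = 1000.
print(sample_size_for_variance(S2=100.0, V=1.0, N=1000))
```

As N grows, the adjustment term $n_e/N$ vanishes and the result approaches $n_e$ itself.
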
2. Pre‐specified Estimation Error
It may be possible to have some prior knowledge of the population mean $\bar{Y}$, and it may be required that the sample mean $\bar{y}$ should not differ from it by more than a specified amount of absolute estimation error, say $e$, which is a small quantity.

Such a requirement can be satisfied by associating a probability $(1-\alpha)$ with it and can be expressed as

\[
P\left[\,|\bar{y} - \bar{Y}| \le e\,\right] = (1 - \alpha).
\]

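2. Pre‐specified Estimation Error
Assuming the normal distribution for the population,

\[
\bar{y} \sim N\left(\bar{Y},\; \frac{N-n}{Nn} S^2\right),
\]

we can write

\[
P\left[\frac{|\bar{y} - \bar{Y}|}{\sqrt{\mathrm{Var}(\bar{y})}} \le \frac{e}{\sqrt{\mathrm{Var}(\bar{y})}}\right] = 1 - \alpha
\]

which implies that

\[
\frac{e}{\sqrt{\mathrm{Var}(\bar{y})}} = Z_{\alpha/2}
\]

or

\[
Z_{\alpha/2}^2\,\mathrm{Var}(\bar{y}) = e^2.
\]
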
2. Pre‐specified Estimation Error
Now

\[
Z_{\alpha/2}^2\,\frac{N-n}{Nn} S^2 = e^2
\]

or

\[
n = \frac{\left(\dfrac{Z_{\alpha/2} S}{e}\right)^2}{1 + \dfrac{1}{N}\left(\dfrac{Z_{\alpha/2} S}{e}\right)^2}
\]

which is the required sample size. If $N$ is large, then

\[
n = \left(\frac{Z_{\alpha/2} S}{e}\right)^2.
\]

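As a rough illustration of this formula, the following Python sketch computes the sample size for a given error bound $e$. The function name and input values are hypothetical, and the result is rounded up to an integer, which is an assumption added here rather than something the slides state.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_error(S, e, N, alpha=0.05):
    """Sample size so that |ybar - Ybar| <= e with probability 1 - alpha (SRSWOR)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # Z_{alpha/2}
    n0 = (z * S / e) ** 2                    # large-N sample size
    return ceil(n0 / (1 + n0 / N))           # finite-population adjustment


# Illustrative values: S = 10, e = 2, N = 1000, 95% probability.
print(sample_size_for_error(S=10, e=2, N=1000))
```

Halving $e$ roughly quadruples the large-N sample size, since $e$ enters the formula squared.
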
3. Pre‐specified Width of the Confidence Interval
If the requirement is that the width of the confidence interval of $\bar{y}$ with confidence coefficient $(1-\alpha)$ should not exceed a prespecified amount $W$, then the sample size $n$ is determined such that

\[
2 Z_{\alpha/2}\sqrt{\mathrm{Var}(\bar{y})} \le W
\]

assuming $\sigma^2$ is known and the population is normally distributed.

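3. Pre‐specified Width of the Confidence Interval
This can be expressed as

\[
2 Z_{\alpha/2}\sqrt{\frac{N-n}{Nn}}\; S \le W
\]

or

\[
4 Z_{\alpha/2}^2 \left(\frac{1}{n} - \frac{1}{N}\right) S^2 \le W^2
\]

or

\[
\frac{1}{n} - \frac{1}{N} \le \frac{W^2}{4 Z_{\alpha/2}^2 S^2}
\]

or

\[
n \ge \frac{\dfrac{4 Z_{\alpha/2}^2 S^2}{W^2}}{1 + \dfrac{4 Z_{\alpha/2}^2 S^2}{N W^2}}.
\]
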
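3. Pre‐specified Width of the Confidence Interval
The minimum sample size required is

\[
n_{\text{smallest}} = \frac{\dfrac{4 Z_{\alpha/2}^2 S^2}{W^2}}{1 + \dfrac{4 Z_{\alpha/2}^2 S^2}{N W^2}}.
\]

If $N$ is large, then

\[
n \ge \frac{4 Z_{\alpha/2}^2 S^2}{W^2}
\]

and the minimum sample size needed is

\[
n_{\text{smallest}} = \frac{4 Z_{\alpha/2}^2 S^2}{W^2}.
\]
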
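The width criterion can be sketched the same way in Python. The function name and numbers below are illustrative assumptions, and the result is rounded up to the next whole unit.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_ci_width(S, W, N, alpha=0.05):
    """Smallest n whose 100(1 - alpha)% interval has width at most W (SRSWOR)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # Z_{alpha/2}
    n0 = 4 * z**2 * S**2 / W**2              # large-N value 4 Z^2 S^2 / W^2
    return ceil(n0 / (1 + n0 / N))           # finite-population adjustment


# Illustrative values: S = 10, maximum width W = 5, N = 1000.
print(sample_size_for_ci_width(S=10, W=5, N=1000))
```

Since $4 Z_{\alpha/2}^2 S^2 / W^2 = \left(Z_{\alpha/2} S / (W/2)\right)^2$, a width of $W$ is equivalent to an estimation error of $e = W/2$ under criterion 2.
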
4. Pre‐specified Coefficient of Variation
The coefficient of variation (CV) is defined as the ratio of the standard error (or standard deviation) to the mean.

Knowledge of the coefficient of variation has played an important role in sampling theory, as this information has helped in deriving efficient estimators.

Suppose it is desired that the CV of $\bar{y}$ should not exceed a given or pre‐specified value, say $C_0$. The required sample size is then determined from the condition $CV(\bar{y}) \le C_0$.

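4. Pre‐specified Coefficient of Variation
\[
CV(\bar{y}) \le C_0
\]

or

\[
\frac{\sqrt{\mathrm{Var}(\bar{y})}}{\bar{Y}} \le C_0
\]

or

\[
\frac{\dfrac{N-n}{Nn} S^2}{\bar{Y}^2} \le C_0^2
\]

or

\[
\frac{1}{n} - \frac{1}{N} \le \frac{C_0^2}{C^2}
\]

or

\[
n \ge \frac{\dfrac{C^2}{C_0^2}}{1 + \dfrac{C^2}{N C_0^2}}
\]

is the required sample size, where $C = \dfrac{S}{\bar{Y}}$ is the population coefficient of variation.
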
4. Pre‐specified Coefficient of Variation
The smallest sample size needed in this case is

\[
n_{\text{smallest}} = \frac{\dfrac{C^2}{C_0^2}}{1 + \dfrac{C^2}{N C_0^2}}.
\]

If $N$ is large, then

\[
n \ge \frac{C^2}{C_0^2}
\]

and

\[
n_{\text{smallest}} = \frac{C^2}{C_0^2}.
\]

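A short Python sketch of this rule is given below. The function name and values are hypothetical, and the population coefficient of variation $C$ is assumed to be available, for example from a pilot survey.

```python
from math import ceil


def sample_size_for_cv(C, C0, N):
    """Smallest n for which the CV of the sample mean is at most C0 (SRSWOR).

    C: population coefficient of variation S / Ybar (assumed known)
    C0: largest acceptable CV of the sample mean
    """
    n0 = (C / C0) ** 2              # large-N value C^2 / C0^2
    return ceil(n0 / (1 + n0 / N))  # finite-population adjustment


# Illustrative values: C = 0.5, target C0 = 0.04, N = 2000.
print(sample_size_for_cv(C=0.5, C0=0.04, N=2000))
```
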
5. Pre‐specified Relative Error
When $\bar{y}$ is used for estimating the population mean $\bar{Y}$, the relative estimation error is defined as $\dfrac{\bar{y} - \bar{Y}}{\bar{Y}}$.

If it is required that this relative estimation error should not exceed a pre‐specified value $R$ with probability $(1-\alpha)$, then the requirement can be expressed as

\[
P\left[\frac{|\bar{y} - \bar{Y}|}{\sqrt{\mathrm{Var}(\bar{y})}} \le \frac{R\bar{Y}}{\sqrt{\mathrm{Var}(\bar{y})}}\right] = 1 - \alpha.
\]

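5. Pre‐specified Relative Error
Assuming the population to be normally distributed,

\[
\bar{y} \sim N\left(\bar{Y},\; \frac{N-n}{Nn} S^2\right).
\]

So it can be written that

\[
\frac{R\bar{Y}}{\sqrt{\mathrm{Var}(\bar{y})}} = Z_{\alpha/2}
\]

or

\[
Z_{\alpha/2}^2 \left(\frac{N-n}{Nn}\right) S^2 = R^2 \bar{Y}^2
\]

or

\[
\frac{1}{n} - \frac{1}{N} = \frac{R^2}{C^2 Z_{\alpha/2}^2}
\]
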
5. Pre‐specified Relative Error
or

\[
n = \frac{\left(\dfrac{Z_{\alpha/2} C}{R}\right)^2}{1 + \dfrac{1}{N}\left(\dfrac{Z_{\alpha/2} C}{R}\right)^2}
\]

where $C = \dfrac{S}{\bar{Y}}$ is the population coefficient of variation and should be known.

If $N$ is large, then

\[
n = \left(\frac{Z_{\alpha/2} C}{R}\right)^2.
\]

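The relative-error formula can be illustrated with a few lines of Python. This is only a sketch: the function name and inputs are hypothetical, $C$ is assumed known, and rounding up to an integer is an added assumption.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_relative_error(C, R, N, alpha=0.05):
    """Sample size keeping the relative error within R with probability 1 - alpha."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # Z_{alpha/2}
    n0 = (z * C / R) ** 2                    # large-N value (Z C / R)^2
    return ceil(n0 / (1 + n0 / N))           # finite-population adjustment


# Illustrative values: C = 0.5, relative error R = 0.1 (10%), N = 500.
print(sample_size_for_relative_error(C=0.5, R=0.1, N=500))
```

Only the ratio $C/R$ matters, so the population mean itself does not need to be known, only the coefficient of variation.
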
6. Pre‐specified Cost
Let an amount of money $C$ be designated for the sample survey to collect $n$ observations, let $C_0$ be the overhead cost, and let $C_1$ be the cost of collecting one unit in the sample.

Then the total cost $C$ can be expressed as

\[
C = C_0 + nC_1
\]

so that

\[
n = \frac{C - C_0}{C_1}
\]

is the required sample size.

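A minimal Python sketch of the cost rule follows. The function and parameter names are hypothetical, and rounding down is an assumption added here so that the total cost never exceeds the budget.

```python
def sample_size_for_budget(total_budget, overhead, unit_cost):
    """Largest n with overhead + n * unit_cost <= total_budget.

    total_budget is C, overhead is C0 and unit_cost is C1 in the notation above.
    """
    if total_budget < overhead:
        raise ValueError("budget does not cover the overhead cost")
    return int((total_budget - overhead) // unit_cost)  # floor of (C - C0) / C1


# Illustrative values: C = 10000, C0 = 2000, C1 = 25 per sampled unit.
print(sample_size_for_budget(10000, 2000, 25))
```
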
