Lec. Note E2

The document outlines various sampling methods, categorizing them into probability and non-probability designs, with a focus on six basic random sampling techniques. It provides detailed explanations of simple random sampling, both with and without replacement, including estimators for population mean and total, variance, and confidence intervals. Additionally, it discusses the estimation of proportions and includes examples to illustrate the application of these sampling methods.

Uploaded by

Hansi Anjula
Copyright
© © All Rights Reserved

Lecture Note – 02 HPD-DSS

Sampling Methods (Designs)

Sampling designs can be broadly classified into two categories:


1. Probability sampling designs (Random sampling designs)
2. Non-probability sampling designs (Non-random sampling designs)
Probability Sampling Designs (Random Sampling Designs)
There are a number of sampling techniques that could be used for selecting a representative sample for
various purposes. The most appropriate technique to be used will depend on the purpose for which it is
to be used. Following are the six basic random sampling techniques that are available. Depending on the
situation we could use any one of them or a combination of them.
1. Simple random sampling (with or without replacement)
2. Systematic sampling
3. Stratified sampling
4. Cluster sampling
5. Multi-stage sampling
6. Probability proportional to size sampling (PPS)
Probability sampling designs can be further classified into two categories:
1. Equal probability selection sampling designs
2. Unequal probability selection sampling designs (Probability Proportional to Size (PPS) sampling design)

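| Population | Sample |
|---|---|
| Total population size $= N$ | Total sample size $= n$ |
| Population units $= y_1, y_2, \ldots, y_N$ | Sample units $= y_1, y_2, \ldots, y_n$ |
| Population mean $= \bar{Y} = \frac{1}{N}\sum_{i=1}^{N} y_i$ | Sample mean $= \hat{\bar{Y}} = \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ |
| Population variance $= S^2 = \frac{1}{N-1}\sum_{i=1}^{N}(y_i - \bar{Y})^2$ | Sample variance $= s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$ |
| Population total $= Y = \sum_{i=1}^{N} y_i$ | Estimated population total $= \hat{Y} = N\bar{y} = \frac{N}{n}\sum_{i=1}^{n} y_i$ |
| Population ratio $= R = \frac{Y}{X} = \frac{\sum_{i=1}^{N} y_i}{\sum_{i=1}^{N} x_i} = \frac{N\bar{Y}}{N\bar{X}}$ | Estimated population ratio $= \hat{R} = r = \frac{\bar{y}}{\bar{x}} = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i} = \frac{n\bar{y}}{n\bar{x}}$ |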

12 January, 2025


𝜇𝑦̄ = 𝐸 (𝑌) = 𝐸(𝑦̄ ) = 𝑌

(
 y2 = V  Y  = V ( y ) = E y − Y )2
=
S2
(1 − f )
  n
𝑆𝑡𝑎𝑛𝑑𝑎𝑟𝑑 𝐷𝑒𝑣𝑖𝑎𝑡𝑖𝑜𝑛 𝑜𝑓 𝑡ℎ𝑒 𝑆𝑎𝑚𝑝𝑙𝑒
𝑆𝑡𝑎𝑛𝑑𝑎𝑟𝑑 𝐸𝑟𝑟𝑜𝑟 𝑜𝑓 𝑡ℎ𝑒 𝑀𝑒𝑎𝑛𝑠 = 𝑆. 𝐸. (𝑦̄ ) = 𝜎̂𝑦̄ = 𝑠𝑦̄ =
√𝑆𝑎𝑚𝑝𝑙𝑒 𝑆𝑖𝑧𝑒

Simple Random Sampling


This is the simplest probabilistic approach. All samples of size $n$ (samples including $n$ sample units) have the same (equal) probability of selection. In the most usual situation of sampling without replacement, every sample unit has a probability of selection $n/N$, and each set of two units has a joint probability of selection $\frac{n(n-1)}{N(N-1)}$. This may appear to be difficult to implement because there are ${}^{N}C_{n} = \frac{N!}{n!\,(N-n)!}$ possible samples if sampling is without replacement (so that all $n$ units are distinct).

For example, for a small population of size 10 with two units selected, there are ${}^{10}C_{2} = \frac{10!}{2!\,8!} = 45$ possible distinct samples. Selecting distinct units is more efficient than selecting with replacement, where a unit can be selected and measured more than once. This is intuitively reasonable, since re-measuring a unit already in the sample does not provide any new information, as the measurement of a new unit would. Tied to this is the concept of the finite population correction, $(fpc) = \frac{N-n}{N} = 1 - \frac{n}{N}$, based on the sampling fraction $f = n/N$. The $(fpc)$ is usually part of the variance estimate and indicates that as the sample size $n$ approaches the population size $N$, the variance estimate becomes zero. This is true because we are essentially measuring the entire population as the sampling fraction goes to one or, stated another way, as the $(fpc)$ goes to zero. We often ignore the $(fpc)$ because many populations are quite large and sample sizes are small, so that the $(fpc)$ is essentially 1.
In this method of sampling, a sample of size $n$ is drawn from a population of size $N$ in such a way that every one of the ${}^{N}C_{n}$ possible samples of size $n$ has an equal probability of being chosen.
SRS is not hard to implement conceptually if a list of the population units is available. One only has to make sure that the selection of any one unit is not influenced by the others selected or to be selected. For example, one can give each of the units a distinct number from 1 to $N$ and then select $n$ distinct random numbers between 1 and $N$, or draw by placing the numbers 1 to $N$ in a bowl and mixing thoroughly. If the bowl is used, $n$ numbers are drawn out in succession. The units which bear these numbers constitute the sample. At any stage in the draw, the process gives an equal chance of selection to all numbers not previously drawn. It is easy to verify that all ${}^{N}C_{n}$ possible samples have an equal chance. Traditionally one could use a random number table, but it is now often more convenient to use a random number generator (with a computer application). SRS also has the advantage that, since all units have the same probability of selection, applicable analysis techniques are easy to implement and estimation is straightforward and understandable.
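The numbered-list selection procedure described above can be sketched with a random number generator. This is a minimal illustration rather than a prescribed method: the function names and the seed are assumptions made for the example, and Python's standard `random` module stands in for the random number table.

```python
import math
import random


def srs_without_replacement(population_size, sample_size, seed=None):
    """Draw n distinct unit numbers from 1..N (every subset equally likely)."""
    rng = random.Random(seed)
    return rng.sample(range(1, population_size + 1), sample_size)


def srs_with_replacement(population_size, sample_size, seed=None):
    """Draw n unit numbers from 1..N; the same unit may appear more than once."""
    rng = random.Random(seed)
    return [rng.randint(1, population_size) for _ in range(sample_size)]


def finite_population_correction(population_size, sample_size):
    """fpc = (N - n) / N = 1 - f, where f = n / N is the sampling fraction."""
    return 1 - sample_size / population_size


# Number of distinct samples of size 2 from a population of 10.
n_possible = math.comb(10, 2)

# Illustrative draws; the seed is arbitrary and only makes the run repeatable.
sample_wor = srs_without_replacement(10, 2, seed=1)
sample_wr = srs_with_replacement(10, 5, seed=1)

print(n_possible, sample_wor, sample_wr)
print(finite_population_correction(1000, 10))
```

For a large population and a small sample the correction is close to 1, which is why it is often ignored.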
Simple Random Sampling with Replacement - SRSWR
A simple random sample can be selected with replacement in the sense of replacing the selected units in
the population before the selection of another unit and in this scheme, there is a possibility of some units
getting repeated.
(I) Estimator for Population Mean $= \hat{\bar{Y}} = \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$

(II) Variance of Mean Estimator $= V(\hat{\bar{Y}}) = V(\bar{y}) = \frac{s^2}{n}$, where $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$

Standard Error of the Mean Estimator $= S.E.(\hat{\bar{Y}}) = S.E.(\bar{y}) = \sqrt{V(\bar{y})} = \sqrt{\frac{s^2}{n}}$

(III) $(1-\alpha)100\%$ Confidence Interval for Population Mean, $\bar{Y}$: $\bar{y} \pm t\,(S.E.(\bar{y}))$

(IV) Estimator for Population Total $= \hat{Y}_T = \hat{Y} = y_T = N\bar{y}$

(V) Variance of Total Estimator $= V(\hat{Y}) = V(y_T) = V(N\bar{y}) = N^2 V(\bar{y}) = N^2\frac{s^2}{n}$

S.E. of the Total Estimator $= S.E.(\hat{Y}) = S.E.(y_T) = \sqrt{V(y_T)} = \sqrt{N^2 V(\bar{y})} = N\sqrt{\frac{s^2}{n}}$

(VI) $(1-\alpha)100\%$ Confidence Interval for Population Total, $Y$: $y_T \pm t\,(N\,(S.E.(\bar{y})))$
Simple Random Sampling without Replacement - SRSWOR
Sampling without replacement is generally more efficient than sampling with replacement. In this technique, the selected units are not returned to the population before the selection of another unit, so no unit can be repeated in the sample.
(I) Estimator for Population Mean $= \hat{\bar{Y}} = \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$

(II) Variance of Mean Estimator $= V(\hat{\bar{Y}}) = V(\bar{y}) = \frac{s^2}{n}\left(\frac{N-n}{N}\right)$, where $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$

Standard Error of the Mean Estimator $= S.E.(\hat{\bar{Y}}) = S.E.(\bar{y}) = \sqrt{V(\bar{y})} = \sqrt{\frac{s^2}{n}(1-f)}$

(III) $(1-\alpha)100\%$ Confidence Interval for Population Mean, $\bar{Y}$: $\bar{y} \pm t\,(S.E.(\bar{y}))$

(IV) Estimator for Population Total $= \hat{Y}_T = \hat{Y} = y_T = N\bar{y}$

(V) Variance of Total Estimator $= V(\hat{Y}) = V(y_T) = V(N\bar{y}) = N^2 V(\bar{y}) = N^2\frac{s^2}{n}(1-f)$

S.E. of the Total Estimator $= S.E.(\hat{Y}) = S.E.(y_T) = \sqrt{V(y_T)} = \sqrt{N^2 V(\bar{y})} = N\sqrt{\frac{s^2}{n}(1-f)}$

(VI) $(1-\alpha)100\%$ Confidence Interval for Population Total, $Y$: $y_T \pm t\,(N\,(S.E.(\bar{y})))$
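Estimators (I) to (VI) for the mean and total can be collected into one small function. The sketch below is illustrative only: the function name, the dictionary keys, and the eight data values are hypothetical, and the default multiplier 1.96 is the normal value for 95% confidence (for a sample this small a Student's t value would normally be used instead).

```python
import math


def srs_estimates(sample, population_size, t=1.96, with_replacement=False):
    """Mean and total estimators with standard errors and confidence limits."""
    n = len(sample)
    y_bar = sum(sample) / n
    s2 = sum((y - y_bar) ** 2 for y in sample) / (n - 1)
    # The finite population correction (1 - f) applies only without replacement.
    f = 0.0 if with_replacement else n / population_size
    se_mean = math.sqrt(s2 / n * (1 - f))
    total = population_size * y_bar
    se_total = population_size * se_mean
    return {
        "mean": y_bar,
        "se_mean": se_mean,
        "ci_mean": (y_bar - t * se_mean, y_bar + t * se_mean),
        "total": total,
        "se_total": se_total,
        "ci_total": (total - t * se_total, total + t * se_total),
    }


# Hypothetical measurements on n = 8 units from a population of N = 40.
sample = [12, 15, 9, 20, 14, 11, 17, 13]
wor = srs_estimates(sample, population_size=40)
wr = srs_estimates(sample, population_size=40, with_replacement=True)

print(wor["mean"], wor["se_mean"], wr["se_mean"])
```

With the same data, the standard error without replacement is smaller by the factor $\sqrt{1-f}$, which is the efficiency gain mentioned in the text.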

Random Numbers Table


The symbol $t$ is the value of the standard normal distribution corresponding to the desired confidence probability. The most common values are:

| Confidence probability $(1-\alpha)$ (%) | 50 | 80 | 90 | 95 | 99 |
|---|---|---|---|---|---|
| $t$-value $(Z_{\alpha/2})$ | 0.67 | 1.28 | 1.64 | 1.96 | 2.58 |

If the sample size is less than 50 ($n < 50$) [although a threshold of 30 ($n < 30$) is commonly used], the percentage points may be taken from Student's $t$ table with $(n-1)$ degrees of freedom, these being the degrees of freedom in the estimated variance $s^2$.
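The tabulated multipliers are two-sided standard normal quantiles, so they can be reproduced with Python's standard library. The helper name below is an assumption made for this sketch.

```python
from statistics import NormalDist


def normal_t_value(confidence_percent):
    """Two-sided standard normal multiplier for a given confidence level (%)."""
    alpha = 1 - confidence_percent / 100
    return NormalDist().inv_cdf(1 - alpha / 2)


# Reproduce the tabulated values, rounded to two decimals as in the table.
t_values = {c: round(normal_t_value(c), 2) for c in (50, 80, 90, 95, 99)}
print(t_values)
```

For small samples, the corresponding Student's t percentage points would be looked up instead; the standard library has no t distribution, so that step is left out of the sketch.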

Example 1: Signatures to a particular petition were collected on 676 sheets. Each sheet had enough
space for 42 signatures, but on many sheets, a smaller number of signatures had been collected. The
number of signatures per sheet was counted on a random sample of 50 sheets (about a 7% sample), with
the results shown in the following table.

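| Number of signatures ($y_i$) | 42 | 41 | 36 | 32 | 29 | 27 | 23 | 19 | 16 | 15 | 14 | 11 | 10 | 9 | 7 | 6 | 5 | 4 | 3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| No. of sheets ($f_i$) | 23 | 4 | 1 | 1 | 1 | 2 | 1 | 1 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 3 | 2 | 1 | 1 |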

Estimate the total number of signatures to the petition and the 90% confidence limits.
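One way to work Example 1 is to apply the without-replacement formulas to the grouped data. The sketch below assumes the normal multiplier $t = 1.64$ for 90% confidence (the sample has $n = 50$ sheets); the variable names are chosen for this illustration.

```python
import math

# Frequency table from Example 1: signatures per sheet and number of sheets.
signatures = [42, 41, 36, 32, 29, 27, 23, 19, 16, 15, 14, 11, 10, 9, 7, 6, 5, 4, 3]
sheets = [23, 4, 1, 1, 1, 2, 1, 1, 2, 2, 1, 1, 1, 1, 1, 3, 2, 1, 1]
N = 676   # sheets in the petition
t = 1.64  # normal multiplier for 90% confidence (assumed appropriate for n = 50)

n = sum(sheets)
sum_fy = sum(f * y for f, y in zip(sheets, signatures))
sum_fy2 = sum(f * y * y for f, y in zip(sheets, signatures))

y_bar = sum_fy / n
s2 = (sum_fy2 - sum_fy ** 2 / n) / (n - 1)   # sample variance from grouped data
fpc = 1 - n / N                              # finite population correction
se_mean = math.sqrt(s2 / n * fpc)

total_hat = N * y_bar
se_total = N * se_mean
lower = total_hat - t * se_total
upper = total_hat + t * se_total

print(f"estimated total = {total_hat:.0f}")
print(f"90% limits = ({lower:.0f}, {upper:.0f})")
```

The grouped-data shortcut $s^2 = \left(\sum f_i y_i^2 - (\sum f_i y_i)^2 / n\right)/(n-1)$ is algebraically the same as the definition of $s^2$ applied to the 50 individual sheets.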
Example 2: In a particular sector of industry, a survey is conducted in an attempt to investigate the
extent of absenteeism not connected with illness or official holidays. A simple random sample of 1,000
men out of a total workforce of 36,000 was asked how many days they have taken off work in the
previous six months as “Casual Holidays”.

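| Days off ($y_i$) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| No. of men ($f_i$) | 451 | 162 | 187 | 112 | 49 | 21 | 5 | 11 | 2 | 0 |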

Estimate the total number of man-days lost due to absentees and the 95% confidence limits.
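A possible working of Example 2, using the without-replacement formulas and the normal multiplier $t = 1.96$ (the figures are rounded, so the limits are approximate):

$$
\begin{aligned}
n &= \sum f_i = 1000, \qquad \sum f_i y_i = 1296, \qquad \sum f_i y_i^2 = 4074 \\
\bar{y} &= \frac{1296}{1000} = 1.296, \qquad s^2 = \frac{4074 - 1296^2/1000}{999} \approx 2.397 \\
\hat{Y} &= N\bar{y} = 36000 \times 1.296 = 46656 \\
S.E.(\hat{Y}) &= N\sqrt{\frac{s^2}{n}(1-f)} = 36000\sqrt{\frac{2.397}{1000}\left(1 - \frac{1000}{36000}\right)} \approx 1738 \\
\hat{Y} \pm t\,S.E.(\hat{Y}) &\approx 46656 \pm 1.96 \times 1738 \approx 46656 \pm 3406
\end{aligned}
$$

so the 95% confidence limits for the total man-days lost are roughly 43,250 and 50,060.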

Estimation of a Proportion - P
Sometimes we wish to estimate the total number, the proportion, or the percentage of units in the population that possess some characteristic or attribute or fall into some defined class. Many of the results regularly published from censuses or surveys are of this form, for example, the number of unemployed persons, the number of smokers, the percentage of students who object to the educational reforms, and the percentage of the population that is native-born. The classification may be introduced directly into the questionnaire, as in questions that are answered by a simple “yes” or “no”, “agree” or “disagree”, etc.
Suppose we want to estimate the proportion of defective tires in a batch of tires. Suppose $A$ is the number of defective tires (the number of items with some characteristic or attribute) and $N$ is the total number of tires in the batch. A sample of $n$ tires is taken and a number $a$ of the tires is found to be defective. The total number and the proportion of defectives in the population, and their counterparts in the sample, are given below.

| Item | Population | Sample |
|---|---|---|
| Total | $A = NP$ | $a = np$ |
| Proportion | $\pi = P = \frac{A}{N}$ | $\hat{\pi} = p = \frac{a}{n}$ |

Simple Random Sampling with Replacement – SRSWR


(I) Estimator for Population Proportion $= \hat{P} = p = \frac{a}{n}$

(II) Variance of Proportion Estimator $= V(\hat{P}) = V(p) = \frac{pq}{n-1}$, where $q = 1 - p$

$p$ = Proportion of Successes, $q$ = Proportion of Failures

Standard Error of the Proportion Estimator $= S.E.(\hat{P}) = S.E.(p) = \sqrt{V(p)} = \sqrt{\frac{pq}{n-1}}$

(III) $(1-\alpha)100\%$ Confidence Interval for Population Proportion, $P$: $p \pm \left[t\,(S.E.(p)) + \frac{1}{2n}\right]$

(IV) Estimator for Population Total $= \hat{A} = Np$

(V) Variance of Total Estimator $= V(\hat{A}) = V(Np) = N^2 V(p) = N^2\frac{pq}{n-1}$

S.E. of the Total Estimator $= S.E.(\hat{A}) = \sqrt{V(\hat{A})} = \sqrt{N^2 V(p)} = N\sqrt{\frac{pq}{n-1}}$

(VI) $(1-\alpha)100\%$ Confidence Interval for Population Total, $A$: $Np \pm N\left[t\,(S.E.(p)) + \frac{1}{2n}\right]$
Simple Random Sampling without Replacement - SRSWOR
(I) Estimator for Population Proportion $= \hat{P} = p = \frac{a}{n}$

(II) Variance of Proportion Estimator $= V(\hat{P}) = V(p) = \frac{pq}{n-1}(1-f)$, where $q = 1 - p$

$p$ = Proportion of Successes, $q$ = Proportion of Failures

Standard Error of the Proportion Estimator $= S.E.(\hat{P}) = S.E.(p) = \sqrt{V(p)} = \sqrt{\frac{pq}{n-1}(1-f)}$

(III) $(1-\alpha)100\%$ Confidence Interval for Population Proportion, $P$: $p \pm \left[t\,(S.E.(p)) + \frac{1}{2n}\right]$

(IV) Estimator for Population Total $= \hat{A} = Np$

(V) Variance of Total Estimator $= V(\hat{A}) = V(Np) = N^2 V(p) = N^2\frac{pq}{n-1}(1-f)$

S.E. of the Total Estimator $= S.E.(\hat{A}) = \sqrt{V(\hat{A})} = \sqrt{N^2 V(p)} = N\sqrt{\frac{pq}{n-1}(1-f)}$

(VI) $(1-\alpha)100\%$ Confidence Interval for Population Total, $A$: $Np \pm N\left[t\,(S.E.(p)) + \frac{1}{2n}\right]$

Example 1: In a simple random sample of size 75 from a batch of 900 electric circuits, 9 were found to
be defective. Find the 95% confidence limits for the proportion and the total number of defective circuits
in the population.
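The circuit example can be worked through with the without-replacement formulas for a proportion, including the $\frac{1}{2n}$ continuity term in the interval. The sketch below assumes the normal multiplier $t = 1.96$ for 95% confidence; the variable names are chosen for this illustration.

```python
import math

N, n, a = 900, 75, 9   # batch size, sample size, defectives found in the sample
t = 1.96               # normal multiplier for 95% confidence (assumed)

p = a / n
q = 1 - p
f = n / N                                    # sampling fraction
se_p = math.sqrt(p * q / (n - 1) * (1 - f))  # S.E.(p) without replacement

# Half-width of the interval, including the 1/(2n) continuity correction.
half_width = t * se_p + 1 / (2 * n)
p_lower, p_upper = p - half_width, p + half_width

# Limits for the total number of defectives are N times the proportion limits.
A_hat = N * p
A_lower, A_upper = N * p_lower, N * p_upper

print(f"p = {p:.3f}, 95% limits = ({p_lower:.3f}, {p_upper:.3f})")
print(f"A = {A_hat:.0f}, 95% limits = ({A_lower:.0f}, {A_upper:.0f})")
```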

Estimation of a Ratio – R
Frequently the quantity that is to be estimated from a simple random sample is the ratio of two variables, both of which vary from unit to unit. In a household survey, examples are the average number of suits of clothes per adult male, the average expenditure on cosmetics per adult female, and the average number of hours per week spent watching television per child aged 10 to 15. In such situations we may need to estimate the ratio of two population characteristics, such as the means or totals of two variables $X$ and $Y$. The quantity of interest is then the ratio $R$.
Population Ratio $(R) = \dfrac{\bar{X}}{\bar{Y}} = \dfrac{\frac{1}{N}\sum_{i=1}^{N} x_i}{\frac{1}{N}\sum_{i=1}^{N} y_i} = \dfrac{\sum_{i=1}^{N} x_i}{\sum_{i=1}^{N} y_i} = \dfrac{X_T}{Y_T}$

If we wish to estimate the ratio of students to teachers,

Population Ratio $(R) = \dfrac{\text{Students}}{\text{Teachers}} = \dfrac{\sum_{i=1}^{N} x_i}{\sum_{i=1}^{N} y_i} = \dfrac{X_T}{Y_T}$

Similarly, the average income per household can be regarded as the ratio of the total household income to the total number of households in the population.
As mentioned above, we wish to estimate the population ratio $R = \frac{X_T}{Y_T}$ based on a simple random sample $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ of the bivariate population measures $(X_i, Y_i)$; $(i = 1, 2, 3, \ldots, n)$.

(I) Estimator for Population Ratio $= \hat{R} = r = \dfrac{\sum_{i=1}^{n} x_i}{\sum_{i=1}^{n} y_i} = \dfrac{\bar{x}}{\bar{y}}$

(II) Variance of Ratio Estimator $= V(\hat{R}) = V(r) = \dfrac{(1-f)}{n(n-1)\,\bar{y}^2}\left[\sum_{i=1}^{n} x_i^2 - 2r\sum_{i=1}^{n} x_i y_i + r^2\sum_{i=1}^{n} y_i^2\right]$

St. Error of the Ratio Estimator $= S.E.(\hat{R}) = S.E.(r) = \sqrt{V(r)} = \sqrt{\dfrac{(1-f)}{n(n-1)\,\bar{y}^2}\sum_{i=1}^{n}(x_i - r y_i)^2}$

(III) $(1-\alpha)100\%$ Confidence Interval for Population Ratio, $R$: $r \pm t\,(S.E.(r))$

Example-1: The following table shows the number of persons (x1), the weekly family income
(x2)(Rs.’000), and the weekly expenditure on food (y)(Rs.’000) in a simple random sample of 33 low-
income families. Estimate from the sample;
(i) the mean weekly expenditure on food per family
(ii) the mean weekly expenditure on food per person
(iii) the percentage of the income that is spent on food and
(iv) compute the standard errors of these estimates.

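| H/H No. | H/H Size | Income | Food Cost | H/H No. | H/H Size | Income | Food Cost |
|---|---|---|---|---|---|---|---|
| 1 | 2 | 62 | 14.3 | 18 | 4 | 83 | 36.0 |
| 2 | 3 | 62 | 20.8 | 19 | 2 | 85 | 20.6 |
| 3 | 3 | 87 | 22.7 | 20 | 4 | 73 | 27.7 |
| 4 | 5 | 65 | 30.5 | 21 | 2 | 66 | 25.9 |
| 5 | 4 | 58 | 41.2 | 22 | 5 | 58 | 23.3 |
| 6 | 7 | 92 | 28.2 | 23 | 3 | 77 | 39.8 |
| 7 | 2 | 88 | 24.2 | 24 | 4 | 69 | 16.8 |
| 8 | 4 | 79 | 30.0 | 25 | 7 | 65 | 37.8 |
| 9 | 2 | 83 | 24.2 | 26 | 3 | 77 | 34.8 |
| 10 | 5 | 62 | 44.4 | 27 | 3 | 69 | 28.7 |
| 11 | 3 | 63 | 13.4 | 28 | 6 | 95 | 63.0 |
| 12 | 6 | 62 | 19.8 | 29 | 2 | 77 | 19.5 |
| 13 | 4 | 60 | 29.4 | 30 | 2 | 69 | 21.6 |
| 14 | 4 | 75 | 27.1 | 31 | 6 | 69 | 18.2 |
| 15 | 2 | 90 | 22.2 | 32 | 4 | 67 | 20.1 |
| 16 | 5 | 75 | 37.7 | 33 | 2 | 63 | 20.7 |
| 17 | 3 | 69 | 22.6 | Total | 123 | 2394 | 907.2 |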
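The four parts of this example can be sketched in code. Two assumptions are made here and should be checked against the intended solution: the population size is not given, so the finite population correction is ignored ($f = 0$), and the helper functions take the numerator and denominator variables explicitly (in parts (ii) and (iii) the numerator is the food cost $y$). The function names are hypothetical.

```python
import math

# (household size x1, weekly income x2, weekly food cost y), households 1 to 33
families = [
    (2, 62, 14.3), (3, 62, 20.8), (3, 87, 22.7), (5, 65, 30.5), (4, 58, 41.2),
    (7, 92, 28.2), (2, 88, 24.2), (4, 79, 30.0), (2, 83, 24.2), (5, 62, 44.4),
    (3, 63, 13.4), (6, 62, 19.8), (4, 60, 29.4), (4, 75, 27.1), (2, 90, 22.2),
    (5, 75, 37.7), (3, 69, 22.6), (4, 83, 36.0), (2, 85, 20.6), (4, 73, 27.7),
    (2, 66, 25.9), (5, 58, 23.3), (3, 77, 39.8), (4, 69, 16.8), (7, 65, 37.8),
    (3, 77, 34.8), (3, 69, 28.7), (6, 95, 63.0), (2, 77, 19.5), (2, 69, 21.6),
    (6, 69, 18.2), (4, 67, 20.1), (2, 63, 20.7),
]
size = [row[0] for row in families]
income = [row[1] for row in families]
food = [row[2] for row in families]


def mean_with_se(values, f=0.0):
    """Sample mean and its standard error, sqrt((s^2 / n) * (1 - f))."""
    n = len(values)
    mean = sum(values) / n
    s2 = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(s2 / n * (1 - f))


def ratio_with_se(numer, denom, f=0.0):
    """Ratio estimate r = sum(numer) / sum(denom) and its standard error."""
    n = len(numer)
    r = sum(numer) / sum(denom)
    denom_mean = sum(denom) / n
    resid_ss = sum((u - r * v) ** 2 for u, v in zip(numer, denom))
    se = math.sqrt((1 - f) * resid_ss / (n * (n - 1) * denom_mean ** 2))
    return r, se


food_per_family, se_family = mean_with_se(food)           # part (i)
food_per_person, se_person = ratio_with_se(food, size)    # part (ii)
food_share, se_share = ratio_with_se(food, income)        # part (iii)

print(f"(i)   {food_per_family:.2f} (S.E. {se_family:.2f}) Rs.'000 per family")
print(f"(ii)  {food_per_person:.2f} (S.E. {se_person:.2f}) Rs.'000 per person")
print(f"(iii) {100 * food_share:.1f}% (S.E. {100 * se_share:.1f}%) of income")
```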

Determination of Sample Size - n
In the planning of a sample survey, a stage is always reached at which a decision must be made about the size of the sample. The decision is important. Too large a sample implies a waste of resources, and too small a sample diminishes the utility of the results. The decision cannot always be made satisfactorily; often we do not possess enough information to be sure that our choice of sample size is the best one. Sampling theory provides a framework for making this decision.
Under simple random sampling, suppose we seek the minimum value of $n$ that will ensure

$\Pr\left(|\bar{X} - \bar{x}| \ge d\right) \le \alpha$

That is, there is a $100\alpha\%$ risk of $|\bar{X} - \bar{x}|$ being greater than $d$.

$d$ = Margin of error or Tolerance, $\alpha$ = The risk of not obtaining such tolerance

Now $\bar{x}$ is taken as normally distributed. Then the $100(1-\alpha)\%$ confidence interval for the population mean $\bar{X}$ is

$\bar{x} \pm t\,S.E.(\bar{x})$

When sampling is done without replacement, this becomes

$\bar{x} \pm Z_{\alpha/2}\sqrt{(1-f)\,S^2/n}$, where $S^2$ = Population Variance

$d = Z_{\alpha/2}\sqrt{(1-f)\,S^2/n}$

$d^2 = Z_{\alpha/2}^2\,(1-f)\,S^2/n$
Rearranging for $n$ gives

$n = \dfrac{n_0}{1 + \frac{n_0}{N}} = n_0\left(1 + \dfrac{n_0}{N}\right)^{-1}$, where $n_0 = \dfrac{n}{1-f} = \dfrac{Z_{\alpha/2}^2\,S^2}{d^2}$

$S^2$ must be known; it can be obtained from (a) a pilot survey or (b) a previous survey. $d = (\varepsilon/100)\,\bar{X}$, where $\varepsilon$ = the percentage of $\bar{X}$ within which one would desire to have $\bar{x}$.
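The algebra between the expression for $d^2$ and the final formula for $n$ is not written out in the notes. One way to fill it in, substituting $f = n/N$, is:

$$
\begin{aligned}
d^2 &= Z_{\alpha/2}^2\left(1 - \frac{n}{N}\right)\frac{S^2}{n} \\
n\,d^2 &= Z_{\alpha/2}^2 S^2 - \frac{n}{N}\,Z_{\alpha/2}^2 S^2 \\
n\left(d^2 + \frac{Z_{\alpha/2}^2 S^2}{N}\right) &= Z_{\alpha/2}^2 S^2 \\
n &= \frac{Z_{\alpha/2}^2 S^2 / d^2}{1 + \dfrac{Z_{\alpha/2}^2 S^2 / d^2}{N}} = \frac{n_0}{1 + \dfrac{n_0}{N}}, \qquad n_0 = \frac{Z_{\alpha/2}^2 S^2}{d^2}
\end{aligned}
$$

Here $n_0$ is the sample size that would be needed if the finite population correction were ignored; dividing by $1 + n_0/N$ reduces it for a finite population.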

Example 1: To obtain an early indication of the total sales of Christmas cards throughout a network of 243 retail stationery shops, it is decided that a random sample of the shops should submit returns of their card sales by the end of January. How large a sample is needed to estimate the total sales to within 10% of the correct figure with 95% assurance?
By July each year, precise figures for total sales of cards are available. For the previous 3 years, the number of shops in the network has remained much the same; the total card sales and the standard deviation of sales from shop to shop have been as follows (in units of 10,000 cards):

| | 2020 | 2021 | 2022 | 2023 (approximately) |
|---|---|---|---|---|
| Total sales $X$ | 321.7 | 366.8 | 401.0 | 440 |
| Standard deviation $S$ | 0.826 | 0.772 | 0.804 | 0.8 |
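A possible working of this example uses the sample size formula $n = n_0 / (1 + n_0/N)$ with $n_0 = Z_{\alpha/2}^2 S^2 / d^2$. The sketch below takes the approximate 2023 figures (total of about 440 and $S \approx 0.8$) as planning values, which is an assumption about how the example is meant to be solved; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def srs_sample_size(S, d, N, confidence=0.95):
    """Return (n0, n): sample size before and after the finite population adjustment."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = (z * S / d) ** 2
    n = n0 / (1 + n0 / N)
    return n0, n


N = 243              # shops in the network
total_guess = 440.0  # anticipated total sales (units of 10,000 cards)
S_guess = 0.8        # anticipated shop-to-shop standard deviation

# A 10% tolerance on the total is the same as a 10% tolerance on the mean per shop.
d = 0.10 * total_guess / N

n0, n = srs_sample_size(S_guess, d, N)
n_required = math.ceil(n)  # round up so the tolerance is still met

print(f"n0 = {n0:.1f}, n = {n:.1f}, required sample = {n_required} shops")
```

Under these planning values the adjustment for the finite population matters: ignoring it would call for roughly 75 shops instead of 58.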


Advantages of Simple Random Sampling


1. Scientific method: There is less chance for personal bias.
2. More representative: As the sample size increases, the sample becomes more representative of the population.
3. Sampling error can be measured.
4. The theory of probability is applicable if a sample is random.
5. This method is economical as it saves time, money, and labour.

Disadvantages of Simple Random Sampling


1. This requires a complete list of the population, but such up-to-date lists are not available in many enquiries.
2. If the size of the sample is small, then it will not be representative of the population.
3. When the distribution between items is very large, this method cannot be used.
