Unit - IV Sampling
SAMPLING
Introduction: The totality of observations with which we are concerned, whether their
number is finite or infinite, constitutes the population. In this chapter we focus on sampling
from distributions or populations and on such important quantities as the sample mean and
sample variance.
Def: A population is the aggregate or totality of statistical data forming the subject of an
investigation.
The number of observations in the population is called the size of the population and is
denoted by N. It may be finite or infinite. Since a study of the entire population may not be
possible, only a part of the population is selected.
Def: A portion of the population that is examined with a view to determining the
population characteristics is called a sample. In other words, a sample is a subset of the
population. The size of the sample is denoted by n.
The process of selecting a sample is called sampling. There are different methods of
sampling.
Classification of Samples:
Central Limit Theorem: If $\bar{x}$ is the mean of a random sample of size n drawn from a
population having mean $\mu$ and standard deviation $\sigma$, then the sampling distribution of the
sample mean $\bar{x}$ is approximately a normal distribution with mean $\mu$ and
S.D. = S.E. of $\bar{x}$ = $\dfrac{\sigma}{\sqrt{n}}$, provided the sample size n is large.
Standard Error of a Statistic: The standard error of a statistic t is the standard deviation of
the sampling distribution of that statistic. For example, the S.E. of the sample mean is the
standard deviation of the sampling distribution of the sample mean.
S.E. of the difference of two sample means $\bar{x}_1$ and $\bar{x}_2$, i.e.,
S.E.$(\bar{x}_1 - \bar{x}_2) = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
S.E. of the difference of two proportions, i.e.,
S.E.$(p_1 - p_2) = \sqrt{\dfrac{P_1 Q_1}{n_1} + \dfrac{P_2 Q_2}{n_2}}$
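As a quick illustration, the three standard-error formulas can be sketched in Python. The function names are hypothetical and chosen only for this example; Q is taken as 1 − P.

```python
import math


def se_mean(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def se_diff_means(sigma1, n1, sigma2, n2):
    """Standard error of the difference of two sample means."""
    return math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)


def se_diff_proportions(p1, n1, p2, n2):
    """Standard error of the difference of two proportions, with Q = 1 - P."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


# Example: sigma = 5.1 and n = 100 give a standard error of about 0.51.
print(round(se_mean(5.1, 100), 4))
```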
Estimation:
Estimation is the use of a statistic obtained from a sample to estimate the unknown parameter
of the population from which the sample is drawn.
Ex. The sample proportion is an estimate of the population proportion, because the value of
the sample proportion can be used to estimate the value of the population proportion.
Types of Estimation:
In an interval estimation of the population parameter 𝜃, if we can find two quantities 𝑡1 and
𝑡2 based on sample observations drawn from the population such that the unknown parameter
𝜃 is included in the interval [𝑡1, 𝑡2] in a specified percentage of cases, then this interval is
called a confidence interval for the parameter 𝜃.
90% confidence limits are $(\bar{x}_1 - \bar{x}_2) \pm 1.645$ (S.E. of $(\bar{x}_1 - \bar{x}_2)$)
95% confidence limits are $(\bar{x}_1 - \bar{x}_2) \pm 1.96$ (S.E. of $(\bar{x}_1 - \bar{x}_2)$)
99% confidence limits are $(\bar{x}_1 - \bar{x}_2) \pm 2.58$ (S.E. of $(\bar{x}_1 - \bar{x}_2)$)
99.73% confidence limits are $(\bar{x}_1 - \bar{x}_2) \pm 3$ (S.E. of $(\bar{x}_1 - \bar{x}_2)$)
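The multipliers in confidence limits of this kind are quantiles of the standard normal distribution. A minimal sketch, assuming Python's standard-library `statistics.NormalDist` and a hypothetical helper name `z_multiplier`, recovers them from the confidence level:

```python
from statistics import NormalDist


def z_multiplier(confidence):
    """Two-sided standard normal multiplier for a confidence level in (0, 1)."""
    alpha = 1.0 - confidence
    # The upper alpha/2 point of the standard normal distribution.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.90, 0.95, 0.99, 0.9973):
    print(f"{level:.2%}: z = {z_multiplier(level):.3f}")
```

The values come out at roughly 1.645, 1.960, 2.576 and 3.000, so the tabulated 2.58 is 2.576 rounded to two decimals.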
Confidence limits for the difference of two population proportions
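Sample size for estimating a population mean:
$n = \left(\dfrac{z\sigma}{E}\right)^2$
where z is the critical value of z at the chosen level of significance,
$\sigma$ is the population standard deviation,
E is the maximum sampling error.
Sample size for estimating a population proportion:
$n = \dfrac{z^2 P Q}{E^2}$
where z is the critical value of z at the chosen level of significance,
P is the population proportion,
Q = 1 − P,
E is the maximum sampling error = |p − P|.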
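The two sample-size formulas can be sketched as small Python helpers. The function names are hypothetical, and rounding up with `math.ceil` is a choice made here so that the stated maximum error is not exceeded.

```python
import math


def sample_size_for_mean(z, sigma, max_error):
    """Smallest n with z * sigma / sqrt(n) <= max_error."""
    return math.ceil((z * sigma / max_error) ** 2)


def sample_size_for_proportion(z, p, max_error):
    """Smallest n with z * sqrt(p * (1 - p) / n) <= max_error."""
    return math.ceil(z ** 2 * p * (1 - p) / max_error ** 2)


# sigma = 60, E = 10, z = 1.645 gives (9.87)^2 = 97.42, rounded up.
print(sample_size_for_mean(1.645, 60, 10))
# P = 0.2, E = 0.05, z = 1.96 gives 245.86, rounded up.
print(sample_size_for_proportion(1.96, 0.2, 0.05))
```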
Testing of Hypothesis:
A hypothesis is an assumption or supposition, and the decision-making procedure for
accepting or rejecting that assumption is called hypothesis testing.
Def: Statistical Hypothesis: To arrive at a decision about the population on the basis of
sample information, we make assumptions about the population parameters involved. Such an
assumption is called a statistical hypothesis.
Null hypothesis: A definite statement about the population parameter. A null hypothesis
is usually a statement of no difference and is denoted by 𝐻0.
Ex. 𝐻0 : 𝜇 = 𝜇0
Alternative hypothesis: A statement which contradicts the null hypothesis is called an
alternative hypothesis. An alternative hypothesis is usually a statement of some difference
and is denoted by 𝐻1.
The form of the alternative hypothesis is important because it decides whether the test is
two-tailed or one-tailed, which depends on the question being asked.
Ex. 𝐻1 : 𝜇 ≠ 𝜇0 (Two-tailed test)
or
𝐻1 : 𝜇 > 𝜇0 (Right one tailed test)
or
𝐻1 : 𝜇 < 𝜇0 (Left one tailed test)
The level of significance (LOS), denoted by 𝛼, is the probability of rejecting the null
hypothesis when it is true. It is generally specified before the test procedure and is usually
5% (0.05), 1% or 10%. A 5% level means that there are about 5 chances in 100 that we would
reject the null hypothesis 𝐻0 when it is true, i.e., we are about 95% confident of making
the right decision. The same interpretation applies to other levels of significance.
There are several tests of significance, such as z, t and F. Depending on the nature of the
information given in the problem, we select the appropriate test, construct the test
criterion and choose the appropriate probability distribution.
One Tailed Test: The critical region under the curve is distributed on one side of
the mean.
Left one tailed test: If 𝐻1 has a < sign, the critical region is taken on the left side of the
distribution.
Right one tailed test: If 𝐻1 has a > sign, the critical region is taken on the right side of the
distribution.
The decision to accept or reject 𝐻0 is taken by comparing the computed value of the test
statistic with the critical value.
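The comparison of computed and critical values can be sketched for a z-test as follows. This is only an illustration: the function name `reject_h0` and the tail labels are hypothetical, and the critical values come from the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist


def reject_h0(z_computed, alpha, tail):
    """Return True if H0 is rejected at level alpha for the given tail."""
    std_normal = NormalDist()
    if tail == "two":
        # Critical region split equally between both tails.
        return abs(z_computed) > std_normal.inv_cdf(1 - alpha / 2)
    if tail == "right":
        # H1 has a > sign: critical region on the right.
        return z_computed > std_normal.inv_cdf(1 - alpha)
    if tail == "left":
        # H1 has a < sign: critical region on the left.
        return z_computed < std_normal.inv_cdf(alpha)
    raise ValueError("tail must be 'two', 'right' or 'left'")


# z = 1.7 at the 5% level: not rejected two-tailed, rejected right-tailed.
print(reject_h0(1.7, 0.05, "two"), reject_h0(1.7, 0.05, "right"))
```

The same computed value can lead to different decisions, which is why the form of 𝐻1 has to be fixed before the test is carried out.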
Errors of Sampling :
While drawing conclusions for population parameters on the basis of the sample results , we
have two types of errors.
Type I error : Reject 𝐻0 when it is true i.e, if the null hypothesis 𝐻0 is true but it is
rejected by test procedure .
Type II error : Accept 𝐻0 when it is false i.e, if the null hypothesis 𝐻0 is false but it is
accepted by test procedure.
DECISION TABLE
               𝑯𝟎 is accepted       𝑯𝟎 is rejected
𝑯𝟎 is true     Correct decision     Type I error
𝑯𝟎 is false    Type II error        Correct decision
Problems:
a) List all possible samples of size 3 that can be taken without replacement from
finite population
b) Calculate the mean of each of the sampling distribution of means
c) Find the standard deviation of sampling distribution of means
Sol: Mean of the population, $\mu = \dfrac{3+6+9+15+27}{5} = \dfrac{60}{5} = 12$
Standard deviation of the population, $\sigma = \sqrt{\dfrac{81+36+9+9+225}{5}} = \sqrt{\dfrac{360}{5}} = 8.4853$
= 13.3
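Parts a) to c) can be checked by brute-force enumeration. The sketch below assumes the population is 3, 6, 9, 15, 27, as suggested by the population mean calculation; the variable names are illustrative only.

```python
import math
from itertools import combinations
from statistics import mean, pstdev

population = [3, 6, 9, 15, 27]  # assumed from the population mean calculation
n = 3

# a) All samples of size 3 drawn without replacement (order ignored).
samples = list(combinations(population, n))

# b) Mean of each sample, then the mean of the sampling distribution of means.
sample_means = [mean(s) for s in samples]
mean_of_means = mean(sample_means)

# c) Standard deviation of the sampling distribution of means.
sd_of_means = pstdev(sample_means)

# Cross-check with sigma / sqrt(n) times the finite population correction.
N = len(population)
sd_formula = pstdev(population) / math.sqrt(n) * math.sqrt((N - n) / (N - 1))

print(len(samples), mean_of_means, round(sd_of_means, 4), round(sd_formula, 4))
```

The enumeration gives 10 samples, a mean of sample means equal to the population mean 12, and a standard deviation of about 3.46, which agrees with the finite population correction formula.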
2. A population consists of five numbers 2, 3, 6, 8 and 11. Consider all possible samples of size
two which can be drawn with replacement from this population. Find
Variance of the population, $\sigma^2 = \dfrac{16+9+0+4+25}{5} = 10.8$ ∴ $\sigma = 3.29$
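For sampling with replacement the same kind of enumeration applies, now over ordered pairs. This is a sketch with illustrative variable names; it compares the enumerated standard deviation of the sample means with σ/√n.

```python
import math
from itertools import product
from statistics import mean, pstdev

population = [2, 3, 6, 8, 11]
n = 2

# All 5 * 5 = 25 ordered samples of size 2 drawn with replacement.
samples = list(product(population, repeat=n))
sample_means = [mean(s) for s in samples]

mean_of_means = mean(sample_means)   # should equal the population mean
sd_of_means = pstdev(sample_means)   # S.D. of the sampling distribution
sd_formula = pstdev(population) / math.sqrt(n)  # sigma / sqrt(n)

print(len(samples), mean_of_means, round(sd_of_means, 4), round(sd_formula, 4))
```

The mean of the sample means equals the population mean 6, and the standard deviation is about 2.32, matching σ/√n = √(10.8/2).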
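3. When a sample is taken from an infinite population, what happens to the standard error of
the mean if the sample size is decreased from 800 to 200?
Sol: The standard error of the mean = $\dfrac{\sigma}{\sqrt{n}}$
Let n = $n_1$ = 800. Then S.E.$_1 = \dfrac{\sigma}{\sqrt{800}} = \dfrac{\sigma}{20\sqrt{2}}$
Let n = $n_2$ = 200. Then S.E.$_2 = \dfrac{\sigma}{\sqrt{200}} = \dfrac{\sigma}{10\sqrt{2}}$
∴ S.E.$_2 = \dfrac{\sigma}{10\sqrt{2}} = 2\left(\dfrac{\sigma}{20\sqrt{2}}\right) = 2\,($S.E.$_1)$
Hence if the sample size is reduced from 800 to 200, the S.E. of the mean is multiplied by 2.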
4. The variance of a population is 2. The size of the sample collected from the population is
169. What is the standard error of the mean?
Sol: Standard error of the mean = $\dfrac{\sigma}{\sqrt{n}} = \dfrac{\sqrt{2}}{\sqrt{169}} = \dfrac{1.414}{13} \approx 0.109$
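5. The mean height of students in a college is 155 cm and the standard deviation is 15 cm. What
is the probability that the mean height of 36 students is less than 157 cm?
Sol: S.E. of the mean = $\dfrac{\sigma}{\sqrt{n}} = \dfrac{15}{\sqrt{36}} = 2.5$, so $z = \dfrac{157 - 155}{2.5} = 0.8$
P($\bar{x}$ < 157) = P(z < 0.8) = 0.5 + 0.2881 = 0.7881
Thus the probability that the mean height of 36 students is less than 157 cm is 0.7881.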
6. A random sample of size 100 is taken from a population with 𝜎 = 5.1. Given that the
sample mean is 𝑥̅ = 21.6, construct 95% confidence limits for the population mean.
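A minimal sketch of the calculation for Problem 6, using the limits x̄ ± 1.96 (S.E. of x̄) with the tabulated multiplier 1.96; the variable names are illustrative only.

```python
import math

sigma = 5.1    # population standard deviation
n = 100        # sample size
x_bar = 21.6   # sample mean
z = 1.96       # tabulated multiplier for 95% confidence

standard_error = sigma / math.sqrt(n)   # 5.1 / 10
margin = z * standard_error             # maximum error at 95% confidence
lower, upper = x_bar - margin, x_bar + margin

print(f"95% confidence limits: ({lower:.2f}, {upper:.2f})")
```

With these inputs the limits are about (20.60, 22.60).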
7.It is desired to estimate the mean time of continuous use until an answering machine will
first require service . If it can be assumed that 𝜎 = 60 days, how large a sample is needed so
that one will be able to assert with 90% confidence that the sample mean is off by at most 10
days.
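Sol: We have maximum error E = 10 days, 𝜎 = 60 days and $z_{\alpha/2}$ = 1.645
∴ $n = \left[\dfrac{z_{\alpha/2}\,\sigma}{E}\right]^2 = \left[\dfrac{1.645 \times 60}{10}\right]^2 = 97.42$, which is rounded up to n = 98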
8.A random sample of size 64 is taken from a normal population with 𝜇 = 51.4 and 𝜎 =
6.8.What is the probability that the mean of the sample will a) exceed 52.9 b) fall between
50.5 and 52.3 c) be less than 50.6
∴ P($\bar{x}$ > 52.9) = P(z > 1.76)
$z_2 = \dfrac{\bar{x}_2 - \mu}{\sigma/\sqrt{n}} = \dfrac{52.3 - 51.4}{0.85} = 1.06$
P(50.5 < 𝑥̅ < 52.3) = P(-1.06 < z < 1.06)
= P(-1.06 < z < 0) + P(0 < z < 1.06)
= P(0 < z < 1.06) + P(0 < z < 1.06)
= 2( 0.3554) = 0.7108
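All three parts of Problem 8 can be checked numerically by treating the sample mean as normal with mean 𝜇 and standard deviation 𝜎/√n. This sketch assumes the standard library's `statistics.NormalDist`; because z is not rounded to two decimals, the results differ slightly from table-based answers.

```python
import math
from statistics import NormalDist

mu, sigma, n = 51.4, 6.8, 64

# Sampling distribution of the mean: N(mu, sigma / sqrt(n)), S.E. = 0.85.
x_bar_dist = NormalDist(mu, sigma / math.sqrt(n))

p_a = 1 - x_bar_dist.cdf(52.9)                     # a) mean exceeds 52.9
p_b = x_bar_dist.cdf(52.3) - x_bar_dist.cdf(50.5)  # b) mean between 50.5 and 52.3
p_c = x_bar_dist.cdf(50.6)                         # c) mean less than 50.6

print(round(p_a, 4), round(p_b, 4), round(p_c, 4))
```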
9. The mean of a certain normal population is equal to the standard error of the mean of
samples of size 64 from that distribution. Find the probability that the mean of a sample of
size 36 will be negative.
Sol: The standard error of the mean = $\dfrac{\sigma}{\sqrt{n}}$
Sample size, n = 64
Given mean, 𝜇 = standard error of the mean of the samples
$\mu = \dfrac{\sigma}{\sqrt{64}} = \dfrac{\sigma}{8}$
For a sample of size 36, $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{\bar{x} - \sigma/8}{\sigma/6} = \dfrac{6\bar{x}}{\sigma} - \dfrac{3}{4}$
$\bar{x}$ is negative exactly when z < −0.75
P(z < −0.75) = P(z > 0.75) = 0.50 − P(0 < z < 0.75) = 0.50 − 0.2734
= 0.2266
10. The guaranteed average life of a certain type of electric bulb is 1500 hrs with a S.D. of 120
hrs. It is decided to sample the output so as to ensure that 95% of bulbs do not fall short of the
guaranteed average by more than 2%. What will be the minimum sample size?
Sol: 2% of 1500 is 30, so the sample mean should not fall below $\bar{x}$ = 1470.
∴ $|z| = \left|\dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}\right| = \left|\dfrac{1470 - 1500}{120/\sqrt{n}}\right| = \dfrac{\sqrt{n}}{4}$
From the given condition, the area under the normal probability curve to the left of
$\dfrac{\sqrt{n}}{4}$ should be 0.95.
∴ The area between 0 and $\dfrac{\sqrt{n}}{4}$ is 0.45.
We do not want to know about the bulbs which have life above the guaranteed life.
∴ $\dfrac{\sqrt{n}}{4}$ = 1.65, i.e., $\sqrt{n}$ = 6.6
∴ n = 43.56, so the minimum sample size is n = 44
11.A normal population has a mean of 0.1 and standard deviation of 2.1 . Find the probability
that mean of a sample of size 900 will be negative .
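Problem 11 can be worked numerically in the same way as the other sampling-distribution problems. This is a sketch assuming the standard library's `statistics.NormalDist`; the variable names are illustrative only.

```python
import math
from statistics import NormalDist

mu, sigma, n = 0.1, 2.1, 900

standard_error = sigma / math.sqrt(n)   # 2.1 / 30 = 0.07
z = (0 - mu) / standard_error           # about -1.43

# P(sample mean < 0) = P(Z < z) under the standard normal distribution.
p_negative = NormalDist().cdf(z)

print(round(z, 2), round(p_negative, 4))
```

The probability comes out at roughly 0.077.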
12. In a study of automobile insurance, a random sample of 80 body repair costs had a mean
of Rs 472.36 and a S.D. of Rs 62.35. If 𝑥̅ is used as a point estimator of the true average
repair cost, with what confidence can we assert that the maximum error does not exceed Rs
10?
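For Problem 12, the maximum-error relation E = z · s/√n can be solved for z and converted to a confidence level. The sketch below assumes the sample S.D. is used in place of 𝜎 (the sample is large) and uses the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

n = 80          # sample size
s = 62.35       # sample standard deviation, used in place of sigma
max_error = 10  # maximum error E

# E = z * s / sqrt(n)  =>  z = E * sqrt(n) / s
z = max_error * math.sqrt(n) / s

# Confidence level = P(-z < Z < z) = 2 * Phi(z) - 1.
confidence = 2 * NormalDist().cdf(z) - 1

print(round(z, 3), round(confidence, 4))
```

This gives z of about 1.43 and a confidence level of roughly 85%.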
13. If we can assert with 95% confidence that the maximum error is 0.05 and P = 0.2, find the
size of the sample.
⇒ $0.05 = 1.96 \sqrt{\dfrac{0.2 \times 0.8}{n}}$
⇒ Sample size, $n = \dfrac{0.2 \times 0.8 \times (1.96)^2}{(0.05)^2} = 246$
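14. The mean and standard deviation of a population are 11,795 and 14,054 respectively.
What can one assert with 95% confidence about the maximum error if 𝑥̅ = 11,795 and n =
50? Also construct a 95% confidence interval for the true mean.
Sol: Maximum error $E = z_{\alpha/2} \cdot \dfrac{\sigma}{\sqrt{n}} = 1.96 \times \dfrac{14054}{\sqrt{50}} \approx 3896$
∴ Confidence interval = $\left(\bar{x} - z_{\alpha/2} \cdot \dfrac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2} \cdot \dfrac{\sigma}{\sqrt{n}}\right)$
= (11795 − 3896, 11795 + 3896)
= (7899, 15691)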
15. Find 95% confidence limits for the mean of a normally distributed population from which
the following sample was taken: 15, 17, 10, 18, 16, 9, 7, 11, 13, 14.
Sol: We have $\bar{x} = \dfrac{15+17+10+18+16+9+7+11+13+14}{10} = 13$
$S^2 = \sum \dfrac{(x_i - \bar{x})^2}{n-1}$
$= \dfrac{1}{9}[(15-13)^2 + (17-13)^2 + (10-13)^2 + (18-13)^2 + (16-13)^2 +
(9-13)^2 + (7-13)^2 + (11-13)^2 + (13-13)^2 + (14-13)^2]$
$= \dfrac{120}{9} = \dfrac{40}{3}$
$z_{\alpha/2} \cdot \dfrac{s}{\sqrt{n}} = 1.96 \cdot \dfrac{\sqrt{40}}{\sqrt{10}\,\sqrt{3}} = 2.26$
∴ Confidence limits are $\bar{x} \pm z_{\alpha/2} \cdot \dfrac{s}{\sqrt{n}} = 13 \pm 2.26 = (10.74, 15.26)$
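16. A random sample of 100 teachers in a large metropolitan area revealed a mean weekly
salary of Rs. 487 with a standard deviation of Rs. 48. With what degree of confidence can we
assert that the average weekly salary of all teachers in the metropolitan area is between
Rs. 472 and Rs. 502?
Sol: S.E. of the mean = $\dfrac{48}{\sqrt{100}} = 4.8$, so $z = \dfrac{472 - 487}{4.8} = -3.125$ and $z = \dfrac{502 - 487}{4.8} = 3.125$
P(472 < $\bar{x}$ < 502) = P(−3.125 < z < 3.125) = 2 P(0 < z < 3.125) = 2(0.4991) = 0.9982
Thus we can assert this with about 99.8% confidence.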