Sampling and Sampling Distribution
CHAPTER
Inferential Statistics
Introduction
Sampling is used not only in statistical investigation and research work
but also in everyday life. For example,
i. A housewife tastes a very small quantity of the food she is cooking to
judge the taste of the whole dish.
ii. A doctor tests a drop of a patient's blood to learn about the
characteristics of the patient's blood as a whole.
iii. A businessman places an order for a commodity after examining a small
unit of the same commodity.
iv. A planner takes a representative group of people to measure HDI and
PI for an overall study.
In practical decision making across many fields of human activity, most of
our conclusions, decisions and findings depend upon the inspection, testing
or examination of a few units of an aggregate or totality. The process of
drawing a conclusion about the totality (the universe, or whole group) by
examining only some parts of the universe under study is called
sampling. In other words, sampling is the process by which inference is
made about the whole by examining only a few parts. Sampling is a tool which
enables us to draw conclusions about the characteristics of the population
after studying the items in the sample.
The term census is used mostly in connection with national population and
housing censuses; other common censuses include the agriculture census,
business census, industrial census, etc. A census requires more
money, manpower and time.
The method or process of selecting a sample from a population under study
is called sampling.
A sample is a subset of units in a population selected to represent all units
in a population. It is a partial enumeration because it is a count from part
of the population. Therefore, the process (or survey) in which only part of
the population is selected and examined to estimate a certain characteristic
of the population is known as a sample survey. That is, the enumeration of the
selected units is known as a sample survey. Information from the sampled
units is used to estimate the characteristics for the entire population. A
sample survey will usually be less expensive than a census survey and the
desired information will be obtained in less time.
When and Where Sampling/Census is Appropriate:
A sampling technique is appropriate
» When the universe is very large
» When the universe possesses homogeneous characteristics
» When utmost accuracy is not required
» Where a census is impossible, i.e. when the testing is destructive
(or explosive) in nature.
A census is appropriate when
» The universe is small
» The population is heterogeneous
» Hundred percent accuracy is required
» The population frame is incomplete
Demerits of Sampling Technique:
» Less accuracy.
» Misleading conclusions.
» Need for specialized knowledge.
» When sampling is not possible.
» It cannot be used if the information of each and every unit of the
population is required.
Demerits of Census:
» Expensiveness
» Excessive time and effort.
» Not applicable for destructive testing.
» For infinite population, it is not possible.
Objective of Sampling
The important and basic objectives of sampling are as follows:
1. Sampling can save time, money and other resources.
2. Sampling may be the only way to study a universe having infinitely many
members.
3. Sampling may be the only choice if the test involves the destruction of
the item under investigation. If measuring the characteristic of interest
is highly sensitive, destructive or hazardous to health, complete
enumeration is impossible, so we use sampling.
4. Sampling makes it possible to draw inferences when resources of
various kinds are limited.
Sampling Techniques
The method or process of selecting a sample from a population under study
is called sampling. Various techniques are used to select a sample from
the population. The method of selection is of fundamental importance in
sampling theory and depends very much on the objective and scope of the
inquiry or investigation, the nature of the universe or population, the
size of the sample, the available resources, etc. Sampling methods can be
classified into the following broad categories.
a. Random sampling (Probability Sampling)
b. Non random sampling (Non- Probability Sampling)
c. Mixed sampling
a. Random sampling (Probability Sampling): Random sampling or
probability sampling is the scientific method of selecting samples
from the population according to some laws of chance in which each
and every unit in the population has some definite pre-assigned
probability of being selected in the sample. Probability samples are
selected in such a way so as to be representative of the population. The
various types of sampling are in which:
Measurements in Development 233
Merits:
» The units selected represent the whole universe.
» The estimation of population parameters is more efficient.
» For a large and heterogeneous population, stratified sampling is the
best design.
Demerits
» This method requires more time and cost.
» If each stratum of the population is not homogeneous, the result
obtained may not be reliable.
» The samples from each stratum should be selected only by
experts or experienced persons.
3. Systematic Sampling: A random sampling in which only the first unit
is selected at random and the remaining units are automatically selected
according to pre-determined pattern (i.e. at fixed equal intervals from
one another) is known as systematic sampling. Systematic sampling is
a commonly used technique if the complete and up-to-date sampling
frame is available i.e. complete and up-to-date list of sampling units is
available.
Suppose the N units of the population are numbered from 1 to N in some
order. Let N = nk, where n is the sample size and k is an integer known as
Demerits
» The sample selected may not be representative of the population
in some cases.
» The items of the population must be arranged in some order;
otherwise the result obtained will be misleading.
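The selection rule for systematic sampling described above (N = nk, a random start among the first k units, then every k-th unit) can be sketched in Python. This is only an illustrative sketch: the function name `systematic_sample`, the population of 20 numbered units and the seed are assumptions, not part of the text.

```python
import random

def systematic_sample(units, n, rng=None):
    """Return n units taken at a fixed interval k = N // n after a random start."""
    rng = rng or random.Random()
    N = len(units)
    if n <= 0 or N % n != 0:
        # The sketch only covers the case N = n * k treated in the text.
        raise ValueError("this sketch assumes N = n * k for an integer k")
    k = N // n
    start = rng.randrange(k)  # random position among the first k units
    return [units[start + j * k] for j in range(n)]

# Hypothetical population: units numbered 1..20, sample size 5, so k = 4.
population = list(range(1, 21))
sample = systematic_sample(population, n=5, rng=random.Random(0))
print(sample)
```

Only the first unit is random; every other unit follows from it, which is why an ordered and complete list of units is needed.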
4. Cluster Sampling: Cluster sampling is a method of
random sampling in which the population is divided into different
groups, called clusters, in such a way that the units within each
cluster are heterogeneous while the clusters themselves are
homogeneous with one another, and the number of sampling units in each
cluster is approximately the same. Then a cluster is selected as a sample
by using simple random sampling. In cluster sampling, individual clusters
are representative of the population as a whole. For example, let us
suppose that we want to study the economic condition of the people
of Kathmandu metropolitan city. First of all, the metropolitan city is divided
into different wards in such a way that the economic condition of people
within a ward is heterogeneous and between wards is homogeneous.
Then, a ward is selected as the sample by using the simple random sampling
method, and from it we can study the economic condition of people in
Kathmandu metropolitan city.
Merits
» It is less costly than simple random sampling and stratified
sampling
» It is useful even when the sampling frame of elements may not be
available.
» Selecting elements (units) by a well-designed cluster sampling
procedure is easier, faster, cheaper and more convenient than
simple random sampling and stratified sampling.
Demerits
» The efficiency decreases with increase in cluster size.
» The efficiency cost per unit may be more in cluster sampling.
» Enumeration of the sampling units within the selected clusters is
difficult when the population is large.
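The cluster sampling procedure described above (divide the population into clusters, pick one whole cluster by simple random sampling, then study every unit in it) might be sketched as follows. The ward names and income figures are made up purely for illustration, and `cluster_sample` is a hypothetical helper.

```python
import random

def cluster_sample(clusters, rng=None):
    """Pick one whole cluster by simple random sampling; return its name and units."""
    rng = rng or random.Random()
    name = rng.choice(sorted(clusters))  # every cluster is equally likely
    return name, list(clusters[name])

# Hypothetical wards with made-up household incomes (in thousands of rupees).
wards = {
    "ward_1": [12, 30, 45, 80],
    "ward_2": [15, 28, 50, 75],
    "ward_3": [10, 35, 48, 82],
}
chosen, units = cluster_sample(wards, rng=random.Random(1))
estimate = sum(units) / len(units)  # mean income of the selected ward
print(chosen, estimate)
```

The estimate is only as good as the assumption that each ward resembles the city as a whole.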
5. Multistage Sampling: Multi-stage sampling is a further development
of the principle of cluster sampling. It is also a method of
random sampling, in which the sampling procedure is carried out
in several stages. At the first stage, the population is divided into
large sampling units, i.e. large groups called first stage units (fsu) or
primary stage units, and clusters are selected from them at random
as the sample. At the second stage the selected clusters at the first
Sampling Error
Sampling errors arise because only a part of the population (i.e. a
sample) is used to estimate the population parameters and draw
inferences about the population. Sampling errors are therefore absent in a
complete enumeration survey or census. The magnitude of the sampling error
depends upon the nature of the population and the size of the sample: as the
sample size increases, the sampling error decreases.
Sampling errors occur for the following reasons:
» Faulty selection of the sample.
» Substitution of convenient unit of the population.
» Faulty demarcation of sampling units.
» Improper choice of the statistics for estimating population parameter.
[Figure: sampling error plotted against sample size]
Sample Size
The units selected from the population are known as the sample, and the
number of such units is known as the sample size. The sample size is directly
proportional to the level of accuracy required and to the scatter
(variability) of the data, and inversely proportional to the permissible
error, i.e. the difference between the sample statistic and the
population parameter.
Statistic
Statistics are functions of the sample observations. In other words, the
statistical measures or constants which are calculated from the sample
data are called statistics. The statistical constants of a sample, such as
the sample mean, sample variance, sample skewness, etc., are statistics. Hence,
a statistic is a characteristic of a sample, and it is used to estimate the
value of the corresponding population parameter.
Parameter                                      Statistic
Population size (N)                            Sample size (n)
Population mean (µ)                            Sample mean ($\bar{x}$)
Population standard deviation (σ)              Sample standard deviation (s)
Population proportion (P)                      Sample proportion (p)
Population correlation coefficient (ρ)         Sample correlation coefficient (r)
Population coefficient of skewness (β1)        Sample coefficient of skewness (b1)
Population coefficient of kurtosis (β2) etc.   Sample coefficient of kurtosis (b2) etc.
Sampling Distribution
A sampling distribution may be defined as the probability distribution of
a given statistic based on a random sample. A sample statistic is a random
variable, and since every random variable possesses a probability
distribution, each sample statistic possesses one too. It describes the
range of all possible values that the sample statistic can take.
Suppose the sample size is 2 and the population size is 5. Then the number
of possible samples of size 2 that can be drawn without replacement from a
finite population of size 5 is C(5, 2) = 10. By computing the mean and
standard deviation for each of these samples, we would quickly observe that
they differ from sample to sample. The probability distribution formed by
the 10 sample means is known as the sampling distribution of the mean.
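As a rough illustration of the paragraph above, the following Python sketch enumerates every sample of size 2 from a population of size 5 and tabulates the probability of each sample mean. The population values 1, 3, 5, 7, 9 are an assumption, chosen to match Example 1.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb

population = [1, 3, 5, 7, 9]  # assumed values (same as Example 1)
n = 2

# All samples of size n drawn without replacement (order ignored).
samples = list(combinations(population, n))
means = [Fraction(sum(s), n) for s in samples]

# Sampling distribution of the mean: each sample is equally likely.
counts = Counter(means)
distribution = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

print(len(samples), comb(len(population), n))
for mean, prob in distribution.items():
    print(mean, prob)
```

Exact fractions are used so that the probabilities add up to 1 without rounding error.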
EXAMPLE 1:
A population consists of five numbers 1, 3, 5, 7 and 9.
(i) Enumerate all possible samples of size two which can be
drawn from the population without replacement.
(ii) Calculate the mean and variance of the population.
(iii) Show that the mean of the sampling distribution of the sample
means is equal to the population mean.
(iv) Calculate the variance of the sampling distribution of the sample
mean.
(v) Find the standard error of the mean.
Solution:
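Here, N = 5, n = 2
Y     $Y - \bar{Y}$     $(Y - \bar{Y})^2$
1     –4     16
3     –2     4
5     0      0
7     2      4
9     4      16
$\sum Y = 25$     $\sum (Y - \bar{Y})^2 = 40$
Population mean $= \bar{Y} = \frac{\sum Y}{N} = \frac{25}{5} = 5$
Population variance $= \sigma^2 = \frac{\sum (Y - \bar{Y})^2}{N} = \frac{40}{5} = 8$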
iii. Calculation of sample means and variance of the sampling
distribution of means:
Sample no.  Sample values (y)  Sample mean ($\bar{y}$)  $\bar{y} - \bar{\bar{y}}$  $(\bar{y} - \bar{\bar{y}})^2$
1 (1, 3) 2 –3 9
2 (1, 5) 3 –2 4
3 (1, 7) 4 –1 1
4 (1, 9) 5 0 0
5 (3, 5) 4 –1 1
6 (3, 7) 5 0 0
7 (3, 9) 6 1 1
8 (5, 7) 6 1 1
9 (5, 9) 7 2 4
10 (7, 9) 8 3 9
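Total: $\sum \bar{y} = 50$, $\sum (\bar{y} - \bar{\bar{y}})^2 = 30$
Mean of the sample means $= \bar{\bar{y}} = \frac{\sum \bar{y}}{10} = \frac{50}{10} = 5$
Since the mean of the sample means $\bar{\bar{y}} = 5$ is equal to the
population mean $\bar{Y} = 5$, we conclude that the mean of the
sampling distribution of the sample means is equal to the
population mean.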
iv. Variance of the sample means is
$V(\bar{y}) = \frac{\sum (\bar{y} - \bar{\bar{y}})^2}{10} = \frac{30}{10} = 3$
v. The standard error of the sample mean is given by
$S.E.(\bar{y}) = \sqrt{V(\bar{y})} = \sqrt{3} = 1.732$
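The figures in Example 1 can be checked numerically. The sketch below recomputes the population variance, the mean and variance of the ten sample means, and compares the variance with the finite-population formula $\frac{\sigma^2}{n} \cdot \frac{N-n}{N-1}$; the variable names are illustrative only.

```python
from itertools import combinations
from math import isclose, sqrt

population = [1, 3, 5, 7, 9]
N, n = len(population), 2

# Population mean and variance (divisor N, as in the solution).
mu = sum(population) / N
sigma2 = sum((y - mu) ** 2 for y in population) / N

# Means of all samples of size 2 drawn without replacement.
means = [sum(s) / n for s in combinations(population, n)]
mean_of_means = sum(means) / len(means)
var_of_means = sum((m - mean_of_means) ** 2 for m in means) / len(means)

# Variance from the formula with the finite population correction.
var_by_formula = (sigma2 / n) * (N - n) / (N - 1)
standard_error = sqrt(var_of_means)

print(mu, sigma2, mean_of_means, var_of_means, var_by_formula)
print(round(standard_error, 3))
```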
$S.E.(\bar{x}) = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$   For finite population.
$S.E.(p) = \sqrt{\frac{PQ}{n}} \sqrt{\frac{N-n}{N-1}}$   For finite population.
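$\sum \bar{y} = 60$     $\sum (\bar{y} - \bar{\bar{y}})^2 = 30$
Mean of sample means $= \bar{\bar{y}} = \frac{\sum \bar{y}}{10} = \frac{60}{10} = 6$
∴ Mean of sample means = population mean = 6, i.e. $E(\bar{x}) = \mu$.
iv. Variance of sample means $= \sigma_{\bar{y}}^2 = V(\bar{y}) = \frac{\sum (\bar{y} - \bar{\bar{y}})^2}{10} = \frac{30}{10} = 3$.
Variance with formula:
$\sigma_{\bar{y}}^2 = V(\bar{y}) = \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1} = \frac{8}{2} \cdot \frac{5-2}{5-1} = 3$
So, $\sigma_{\bar{y}}^2 = V(\bar{y})$ is verified.
v. Standard error of mean
$= S.E.(\bar{x}) = \sqrt{V(\bar{x})} = \sqrt{3} = 1.732$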
Example 3:
A population consists of three numbers 1, 3 and 5.
i. Enumerate all possible samples of size 2 which can be drawn
from this population with replacement.
ii. Find the mean and variance of the population.
iii. Find the mean of the sampling distribution of means and show
that it is equal to the population mean.
iv. Find the variance of the sampling distribution of means and also
verify it with the formula $V(\bar{x}) = \frac{\sigma^2}{n}$,
where $\sigma^2$ = population variance and
n = sample size.
v. Find the standard error of mean.
Solution:
i. The number of possible samples of size 2 which can be drawn from this
population with replacement $= N^n = 3^2 = 9$.
Possible samples are:
(1, 1) (1, 3) (1, 5) (3, 1) (3, 3) (3, 5) (5, 1) (5, 3) (5, 5)
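$\sum \bar{y} = 27$     $\sum (\bar{y} - \bar{\bar{y}})^2 = 12$
Mean of sample means $= \bar{\bar{y}} = \frac{\sum \bar{y}}{9} = \frac{27}{9} = 3$
So, mean of sample means = population mean, i.e. $E(\bar{x}) = \mu$.
iv. Variance of sample means
$\sigma_{\bar{y}}^2 = V(\bar{x}) = \frac{\sum (\bar{y} - \bar{\bar{y}})^2}{9} = \frac{12}{9} = 1.33$
Variance with formula,
$\sigma_{\bar{y}}^2 = \frac{\sigma^2}{n} = \frac{2.66}{2} = 1.33$
Hence, $\sigma_{\bar{y}}^2 = \frac{\sigma^2}{n}$ is verified.
v. Standard error of mean
$= \sqrt{V(\bar{x})} = \sqrt{1.33} \approx 1.15$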
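Example 3 can be reproduced by enumerating the $N^n = 9$ ordered samples drawn with replacement. The sketch below is only a numerical check of the figures above; the variable names are illustrative.

```python
from itertools import product
from math import isclose, sqrt

population = [1, 3, 5]
N, n = len(population), 2

mu = sum(population) / N
sigma2 = sum((y - mu) ** 2 for y in population) / N  # population variance

# Sampling with replacement: ordered pairs, repeats allowed.
samples = list(product(population, repeat=n))
means = [sum(s) / n for s in samples]

mean_of_means = sum(means) / len(means)
var_of_means = sum((m - mean_of_means) ** 2 for m in means) / len(means)
var_by_formula = sigma2 / n  # no finite population correction needed
standard_error = sqrt(var_of_means)

print(len(samples), mean_of_means, round(var_of_means, 2), round(standard_error, 3))
```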
Example 4:
From a population of 520, a sample of 25 units is taken. If the
population standard deviation is 1.5, find the standard error of the
sample mean when the sample is taken (i) without replacement (ii)
with replacement.
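Solution:
i. Sampling without replacement
$S.E.(\bar{x}) = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}} = \frac{1.5}{\sqrt{25}} \sqrt{\frac{520-25}{520-1}} = 0.293$
ii. Sampling with replacement
$S.E.(\bar{x}) = \frac{\sigma}{\sqrt{n}} = \frac{1.5}{\sqrt{25}} = 0.3$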
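The two cases of Example 4 differ only by the finite population correction factor. A small helper along the following lines (the name `se_mean` is hypothetical) makes the comparison explicit.

```python
from math import sqrt

def se_mean(sigma, n, N=None):
    """Standard error of the sample mean.

    With N given, the finite population correction for sampling
    without replacement is applied; otherwise sigma / sqrt(n) is returned.
    """
    se = sigma / sqrt(n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))
    return se

without_replacement = se_mean(1.5, 25, N=520)
with_replacement = se_mean(1.5, 25)
print(round(without_replacement, 3), round(with_replacement, 3))
```

Because N = 520 is large relative to n = 25, the correction changes the result only slightly.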
Example 5:
A simple random sample of 25 apples is drawn without replacement
from a lot of 300. If the number of bad apples in the lot is 15, find the
standard error of the sample proportion of bad apples.
Solution:
Given, n = 25; N = 300
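P = Population proportion of bad apples
$= \frac{15}{300} = \frac{1}{20}$, so that
$Q = 1 - P = 1 - \frac{1}{20} = \frac{19}{20}$
We have,
Standard error of proportion (without replacement)
$= \sqrt{\frac{PQ}{n}} \cdot \sqrt{\frac{N-n}{N-1}}$
$= \sqrt{\frac{\frac{1}{20} \cdot \frac{19}{20}}{25}} \cdot \sqrt{\frac{300-25}{300-1}}$
= 0.042.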
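The arithmetic of Example 5 can be checked with a short function for the standard error of a proportion. The name `se_proportion` is an illustrative assumption.

```python
from math import sqrt

def se_proportion(P, n, N=None):
    """Standard error of a sample proportion, with optional finite population correction."""
    Q = 1 - P
    se = sqrt(P * Q / n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))  # sampling without replacement
    return se

P = 15 / 300  # proportion of bad apples in the lot
se = se_proportion(P, n=25, N=300)
print(round(se, 3))
```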
Exercise 3(A)
THEORETICAL QUESTIONS
1. What is sampling? Describe the various methods of sampling with
their merits and demerits.
2. Distinguish between simple random sampling and judgment
sampling.
3. What is meant by the sampling distribution of a statistic?
4. Differentiate between probability and non-probability sampling.
5. What is meant by the standard error?
Estimation
The two general areas of statistical inference are estimation and testing of
hypotheses. The statistical technique of estimating unknown population
parameters from the corresponding sample statistics is known as estimation.
The main objective of estimation is to obtain a guess or estimate of
the unknown true value from sample data or past experience. In
estimation, sample statistics are used to estimate population
parameters. For example, a campus chief estimates next year's
enrollment in BBS from current enrollment in the same
courses. A businessman estimates his future sales and profits from
past records. In each case, someone is trying to draw a conclusion about a
population from sample information. In our daily life we use estimation
both knowingly and unknowingly.
"Statistics is the science of error." (M.L. Singh) "Statistics is the science
of estimation and probability." (Boddington)
The main objective of estimation is to minimize the error and control the
variance in estimating a population parameter.
The "theory of estimation" is credited to R.A. Fisher (1920).
We use estimation in different sectors of society: planning and
development, working physiology, production, etc.
Types of Estimates
A random sample of a given size is selected from a given population,
and a statistic, which is a characteristic of the sample, is then computed;
this becomes an estimate of the corresponding characteristic of the population.
There are two types of estimates of a population parameter, namely a point
estimate and an interval estimate.
R.A. Fisher's theory of estimation (1920) gives two types of estimation: (i)
Point Estimate (ii) Interval Estimate.
Point Estimate
A single value of a statistic that is used to estimate a population
parameter is called a point estimate. The single computed value is
referred to as an estimate, whereas an estimator is a rule which tells us
how to compute this value. For example, the sample mean ($\bar{X}$) which we
use for estimating the population mean is an estimator of µ, and a single
numerical value of the sample mean is called an estimate of the parameter µ.
Similarly, the sample variance $s^2$ is an estimator of the population variance
$\sigma^2$. A point estimate carries the most risk for decision making, because
it gives no indication of its own error.
1. Unbiasedness
If $E(\bar{X}) = \mu$, i.e. if the expected value of the sample statistic is
equal to the population parameter, then the estimator is unbiased.
Similarly, $s^2$ is unbiased for $\sigma^2$ if $E(s^2) = \sigma^2$.
But if $E(\bar{X}) - \mu \neq 0$, then the estimator is biased. If
$E(\bar{X}) - \mu > 0$, it is positively biased.
If $E(\bar{X}) - \mu < 0$, it is negatively biased.
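Unbiasedness can be checked exactly on a tiny population by averaging an estimator over every possible sample. The sketch below assumes the population 1, 3, 5 and samples of size 2 drawn with replacement (as in Example 3); it also assumes that $s^2$ is computed with the divisor n − 1, which the text does not state explicitly.

```python
from fractions import Fraction
from itertools import product

population = [1, 3, 5]  # assumed population (same as Example 3)
N, n = len(population), 2

mu = Fraction(sum(population), N)
sigma2 = sum((y - mu) ** 2 for y in population) / N

samples = list(product(population, repeat=n))  # sampling with replacement

def sample_mean(s):
    return Fraction(sum(s), len(s))

def sample_var(s, ddof):
    """Sample variance with divisor len(s) - ddof."""
    m = sample_mean(s)
    return sum((y - m) ** 2 for y in s) / (len(s) - ddof)

# Expected values: plain averages, since all samples are equally likely.
E_mean = sum(sample_mean(s) for s in samples) / len(samples)
E_s2_unbiased = sum(sample_var(s, 1) for s in samples) / len(samples)
E_s2_biased = sum(sample_var(s, 0) for s in samples) / len(samples)

print(E_mean, mu)
print(E_s2_unbiased, E_s2_biased, sigma2)
```

With the divisor n the expected value falls below $\sigma^2$, which is an instance of negative bias.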
2. Consistency (Sample size method):
An estimator $\hat{\theta}$ is said to be a consistent estimator if it tends to
have a greater probability of being close to the population parameter
as the sample size increases. That is, an estimator $\hat{\theta}$ is said to be
a consistent estimator of θ if, for every ε > 0,
$P(|\hat{\theta} - \theta| < \varepsilon) \to 1$ as $n \to \infty$.
3. Efficiency (variation method)
An estimator $\hat{\theta}$ is said to be an efficient estimator of θ if it has
a smaller variance than any other estimator of θ. That is, the smaller
the variance of the estimator, the greater is the probability that
the estimate will be close to the true value of the parameter.
Such a criterion, which is based on the variances of the sampling
distributions of estimators, is usually known as efficiency.
4. Sufficiency (Information coverage method)
An estimator is said to be sufficient for a parameter if it contains
all the information in the sample regarding the parameter. More
precisely, if T = t(x1, x2, ....., xn) is an estimator of a parameter θ,
based on a sample x1, x2, ....., xn of size n from a population with
density f(x, θ), such that the conditional distribution of x1, x2, ...., xn
given T is independent of θ, then T is a sufficient estimator for θ.
Interval Estimate
A range of values used to estimate a population parameter is called an
interval estimate. An interval estimate indicates the error in two ways: by
the extent of its range and by the probability of the true population
parameter lying within that range. In this case, the decision maker has a
better idea of the reliability of the estimate. For example, if the average
monthly salary of all professors of T.U. is estimated to lie between
Rs. 30,000 and Rs. 40,000, then T.U. management can expect the exact
average salary to fall within this interval. Such an estimate is an
interval estimate.
[Figure: interval from d1 to d2, located at ± Zα]
The interval [d1, d2] within which the unknown value of the parameter θ
is expected to lie is known as the confidence interval (Neyman) or fiducial
interval (R.A. Fisher). The limits d1 and d2 are two constants determined
so that the interval contains θ with probability 1 – α. The region with
probability 1 – α is called the acceptance region, and the region with
probability α is called the rejection region.
Confidence Level
In interval estimation we have to fix the level of our confidence, i.e.
the probability that the interval contains the population mean; this is
called the confidence level. Setting the confidence level at 95% means
that, in repeated sampling, about 95% of the intervals constructed in this
way would contain the true population mean. For a large sample, the
sampling distribution follows the normal distribution. The value of Z at
the 95% confidence level is 1.96. The confidence level can also be taken
as 90%, 99% or any other value, depending upon the needs of the problem.
$= p \pm Z_\alpha \sqrt{\frac{PQ}{n}}$   For infinite population (SRSWR)
$= p \pm Z_\alpha \sqrt{\frac{PQ}{n}} \sqrt{\frac{N-n}{N-1}}$   For finite population (SRSWOR)
We have,
$Z_\alpha = \frac{\bar{X} - \mu}{SE(\bar{X})}$
Or, $Z_\alpha = \frac{E}{\sigma / \sqrt{n}}$
Or, $\sqrt{n} = \frac{Z_\alpha \sigma}{E}$
∴ $n = \left( \frac{Z_\alpha \sigma}{E} \right)^2$
Where,
n = Sample size
σ = Population standard deviation
E = Permissible error, which is the difference
between the sample mean and the population mean
Zα = Significant value or critical value of Z corresponding to
the level of significance α
α = Risk = Level of significance = P(reject H0 | H0 true)
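The sample size formula for the mean, $n = (Z_\alpha \sigma / E)^2$, is easy to evaluate in code. In the sketch below the function name and the input values (Zα = 1.96, σ = 30, E = 5) are hypothetical, and rounding the result up to the next whole unit is a choice made here rather than something stated in the text.

```python
from math import ceil

def sample_size_for_mean(z, sigma, E):
    """Smallest whole n satisfying n >= (z * sigma / E) ** 2."""
    return ceil((z * sigma / E) ** 2)

# Hypothetical inputs: 95% confidence, sigma = 30, permissible error E = 5.
n_required = sample_size_for_mean(z=1.96, sigma=30, E=5)
print(n_required)
```

Halving the permissible error roughly quadruples the required sample size, since E enters the formula squared.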
ii. Estimating sample size for Population Proportion
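We have,
$Z_\alpha = \frac{p - P}{SE(p)}$
Or, $Z_\alpha = \frac{E}{\sqrt{\frac{PQ}{n}}}$
Or, $\sqrt{n} = \sqrt{PQ} \cdot \frac{Z_\alpha}{E}$
∴ $n = PQ \left( \frac{Z_\alpha}{E} \right)^2$
Where, P = Population proportion
Q = 1 − P
Zα = Standard normal variate (SNV) value for the level of significance
α = Level of significance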
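The corresponding calculation for a proportion, $n = PQ (Z_\alpha / E)^2$, can be sketched in the same way. The function name and the inputs (P = 0.5, E = 0.05, Zα = 1.96) are hypothetical, and the result is rounded up as a convention adopted here.

```python
from math import ceil

def sample_size_for_proportion(z, P, E):
    """Smallest whole n satisfying n >= P * Q * (z / E) ** 2, with Q = 1 - P."""
    Q = 1 - P
    return ceil(P * Q * (z / E) ** 2)

# Hypothetical inputs: 95% confidence, P = 0.5, permissible error E = 0.05.
n_required = sample_size_for_proportion(z=1.96, P=0.5, E=0.05)
print(n_required)
```

The product PQ is largest at P = 0.5, so that value gives the most cautious sample size when P is unknown.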
List of Formula
1. Confidence interval for the population mean (fiducial limits, R.A.
Fisher)
$\mu = \bar{X} \pm Z_\alpha \cdot SE(\bar{X}) = \bar{X} \pm Z_\alpha \frac{\sigma}{\sqrt{n}}$ ... (i)
2. Confidence interval for the population proportion.
$P = p \pm Z_\alpha \cdot SE(p) = p \pm Z_\alpha \sqrt{\frac{PQ}{n}}$ ... (ii)
3. Sample size for the population mean.
$n = \left( \frac{Z_\alpha \sigma}{E} \right)^2$
4. Sample size for the population proportion.
$n = PQ \left( \frac{Z_\alpha}{E} \right)^2$
α = Risk of non-representation
= Level of significance.
The following table gives common confidence levels and their Z-values.
Level of confidence     50%     68.27%  90%    95%   96%   98%   99%    99.73%
Level of significance   50%     31.73%  10%    5%    4%    2%    1%     0.27%
Zα                      0.6745  1       1.645  1.96  2.05  2.33  2.575  3
Note: When no reference to the confidence level is given, the level of
significance is generally taken as 5%, i.e.
α = 0.05
Zα = 1.96
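The two-sided Z-values in the table can be reproduced from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library (available from Python 3.8); the helper name `z_value` is an assumption.

```python
from statistics import NormalDist

def z_value(confidence):
    """Two-sided critical value Z for a given confidence level (e.g. 0.95)."""
    alpha = 1 - confidence  # level of significance
    # Half of alpha lies in each tail of the standard normal curve.
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.50, 0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}  {z_value(level):.3f}")
```

Small differences from the tabulated values in the third decimal place are due to rounding in the table.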
Example 1:
A sample of 500 bulbs of a company showed an average life of 1400
hours with standard deviation of 30 hours. Obtain 95% and 99%
fiducial limits for population mean.
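Solution:
Sample size (n) = 500
Average life ($\bar{X}$) = 1,400
Standard deviation (σ) = 30
We have, Zα = 1.96 for 95%.
Standard error, i.e. $SE(\bar{X}) = \frac{\sigma}{\sqrt{n}} = \frac{30}{\sqrt{500}} = 1.342$
For 95% fiducial limits,
$\mu = \bar{X} \pm Z_\alpha \cdot SE(\bar{X}) = 1400 \pm 1.96 (1.342) = 1400 \pm 2.63$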
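The fiducial limits asked for in this example follow from $\bar{X} \pm Z_\alpha \cdot SE(\bar{X})$. The sketch below computes both sets of limits; the helper name is hypothetical, and Zα = 2.58 is assumed for the 99% level (the table of Z-values gives the slightly more precise 2.575).

```python
from math import sqrt

def confidence_limits(mean, sigma, n, z):
    """Lower and upper limits mean ± z * sigma / sqrt(n)."""
    se = sigma / sqrt(n)
    return mean - z * se, mean + z * se

low95, high95 = confidence_limits(1400, 30, 500, z=1.96)
low99, high99 = confidence_limits(1400, 30, 500, z=2.58)  # assumed Z for 99%

print(round(low95, 2), round(high95, 2))
print(round(low99, 2), round(high99, 2))
```

The 99% interval is wider than the 95% interval: more confidence costs precision.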
n = 100, $\bar{X}$ = 65, σ = 3
SE, $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{100}} = 0.3$
i. 99% confidence limit is
$\mu = \bar{X} \pm Z_\alpha (SE)$
= 65 ± 2.58 (0.3)
Upper limit of population mean
= 65 + 2.58 (0.3)
= 65 + 0.774
= 65.774
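Solution:
Given, confidence level = 95%
n = 36
N = ∞, σ = 12.6
When N = ∞,
$SE(\bar{X}) = \frac{\sigma}{\sqrt{n}} = \frac{12.6}{\sqrt{36}} = \frac{12.6}{6} = 2.1$
When N = 101,
$SE(\bar{X}) = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}} = \frac{12.6}{6} \sqrt{\frac{101-36}{101-1}} = 1.69$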