Sampling and Sampling Distribution


Measurements in Development 229

CHAPTER

Inferential Statistics

3.1 Sampling Techniques

Introduction
Sampling is used not only in statistical investigations and research work but also in the daily life of human beings. For example,
i. A housewife tastes a very small quantity of the food she is cooking to judge the taste of the whole dish.
ii. A doctor tests a drop of a patient's blood to learn about the characteristics of the patient's whole blood.
iii. A businessman places an order for a commodity after examining a small unit of it.
iv. A planner takes a representative group of people to measure HDI and PI for the overall study.
In such practical decision-making processes in various fields of human activity, most of our conclusions, decisions and findings depend upon the inspection, test or examination of a few units of an aggregate or totality. This process of obtaining information or drawing conclusions about the totality (the universe, or whole group) by examining only some parts of the universe under study (investigation) is called sampling. In other words, sampling is the process by which an inference about the whole is made by examining only a few of its parts. Sampling is a tool which enables us to draw conclusions about the characteristics of the population after studying the items in the sample.

Meaning of Population and Sample


Population
When we think of the term “population,” we usually think of the people in our town, region, state or country. But in statistics, the term takes a slightly different meaning: a “population” or “universe” is the aggregate of objects under a statistical investigation. That is, a statistical population refers to the totality of all units or items under investigation or study. In other words, the group of items or units under study (investigation or inquiry) is called the population. There are two types of population: the target population and the sampling population.
The population or universe may be finite or infinite. A population containing a finite number of objects or items is known as a finite population, for example, the total number of BBS students at T.U., the total number of books in a library, or the total number of households in Kathmandu.
On the other hand, a population having an infinite number of objects is called an infinite population. In practice, very large populations are treated as infinite, e.g. the stars in the sky, the fish in the Pacific Ocean, or the trees in a forest.
Sample
A part or portion of the population which is considered for study and
analysis is called a sample. In other words, a finite subset of the population, selected in order to investigate the properties of the population, is called a sample, and the number of units in the sample is known as the sample size.
Sampling Frame: A complete list of the sampling units which represents the population to be covered is called the sampling frame, popularly known as the frame. In other words, it is a list of all the elements in the population.
Sampling Units: The sampling unit is the basic unit containing the
elements of the population to be sampled. It may be the element itself or a
unit in which the element is contained. The sampling unit selected is often
dependent upon the sampling frame. The selection of sampling units is also
partially dependent upon the overall design of the project.

Census Versus Sampling


A census is a study of every unit, everyone or everything, in a population. That is, a census is the complete enumeration or count of all units of the population with respect to a certain characteristic. Hence, the complete enumeration of all units of the population is known as a census survey.
The term census is used mostly in connection with national population and housing censuses; other common censuses include the agriculture census, business census and industrial census. A census requires more money, manpower and time.
The method or process of selecting a sample from a population under study is called sampling.
A sample is a subset of units in a population selected to represent all units in the population. It is a partial enumeration because it is a count from only part of the population. Therefore, the process (or survey) in which only part of the population is selected and examined to estimate a certain characteristic of the population is known as a sample survey. That is, the enumeration of the selected units is known as a sample survey. Information from the sampled units is used to estimate the characteristics of the entire population. A sample survey will usually be less expensive than a census survey, and the desired information will be obtained in less time.
When and Where Sampling/Census is Appropriate:
A sampling technique is appropriate
» When the universe is very large
» When the universe possesses homogeneous characteristics
» When utmost accuracy is not required
» Where a census is impossible, i.e. when testing is destructive or explosive in nature
A census is appropriate when
» The universe is small
» The population is heterogeneous
» Hundred percent accuracy is required
» The population frame is incomplete
Demerits of Sampling Technique:
» Less accuracy.
» Possibility of misleading conclusions.
» Need for specialized knowledge.
» Sampling is not possible in some situations.
» It cannot be used if information on each and every unit of the population is required.
Demerits of Census:
» Expensiveness.
» Excessive time and effort.
» Not applicable for destructive testing.
» Not possible for an infinite population.

Distinction between Census and Sampling


Census | Sampling
1. It takes each and every unit of the population. | 1. It takes a representative part of the population.
2. It is costly and time-consuming, i.e. it requires more money, manpower and time. | 2. It is less expensive and less time-consuming, i.e. it requires less money, manpower and time.
3. It is more accurate. | 3. It is less accurate.
4. It has less scope. | 4. It has a greater scope.
5. It is appropriate when the universe is small. | 5. It is appropriate when the universe is large.
6. It is appropriate when the population is heterogeneous. | 6. It is appropriate when the universe possesses homogeneous characteristics.

Objective of Sampling
The important and basic objectives of sampling are as follows:
1. Sampling saves time, money and resources.
2. Sampling may be the only way to study a universe having infinitely many members.
3. Sampling may be the only choice if the test involves the destruction of the item under investigation. If the characteristic of interest relates to something highly sensitive, or testing involves destruction or health hazards, complete enumeration is impossible, so we use the sampling process.
4. Sampling makes it possible to draw inferences about the population within practical limitations of time, money and resources.
Sampling Techniques
The method or process of selecting a sample from a population under study
is called sampling. There are various methods of sampling techniques used
for the selection of sample from the population. The method of selecting
a sample from the population is of fundamental importance in sampling
theory and depends very much on the objective and scope of the inquiry
or investigation, the nature of the universe or population, the size of the
sample, the available resources, etc. Sampling methods can be classified into
the following broad categories:
a. Random sampling (Probability Sampling)
b. Non-random sampling (Non-Probability Sampling)
c. Mixed sampling
a. Random sampling (Probability Sampling): Random sampling or probability sampling is the scientific method of selecting samples from the population according to some law of chance, in which each and every unit in the population has some definite, pre-assigned probability of being selected in the sample. Probability samples are selected in such a way as to be representative of the population. Depending on the design, random sampling may be one in which:
(i) Each sample unit has an equal chance of being selected.
(ii) Sampling units have different probabilities of being selected.
(iii) The probability of selecting a unit is proportional to the size of that unit (probability proportional to size).
The types of random sampling are discussed below:
1. Simple Random Sampling: Random sampling in which units are selected in such a way that each and every unit in the population has an equal and independent chance of being selected is called simple random sampling. It is the simplest and most common method of sampling: the sample is drawn unit by unit, with equal probability of selection for each unit at each draw. It is also known as equal probability sampling and is the most elementary form of random sampling.
If a sample is to be taken at random, the following two methods can be used:
(i) Lottery method (ii) Use of a table of random numbers.
(i) Lottery method: This is the simplest method of simple random sampling. Suppose one wants to select ‘n’ units out of ‘N’. Distinct numbers from 1 to N are assigned to the units, and these numbers are written on ‘N’ homogeneous slips. The slips are put in a box or a bag and mixed thoroughly. Then ‘n’ slips are drawn one by one.
(ii) Use of a table of random numbers: Another easy way of selecting samples is through the use of random numbers, which may be generated by a computer or read from a table of random numbers. Tippett's random number tables, Fisher and Yates tables, and Kendall and Smith's tables are some commonly used random number tables. This method is generally considered better than the lottery method, which is time-consuming.
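Both the lottery method and the random number table amount to drawing n distinct unit numbers from 1 to N with equal probability. The sketch below assumes Python's standard `random` module as a stand-in for the slips or the printed table; the frame of 120 numbered units, the sample size of 8, the seed and the function name are hypothetical values chosen for illustration.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units with equal probability (SRSWOR), like drawing n slips."""
    rng = random.Random(seed)     # seeded generator plays the role of a random number table
    return rng.sample(frame, n)   # every subset of size n is equally likely


# Hypothetical frame: units numbered 1 to N, as written on the lottery slips
N, n = 120, 8
frame = list(range(1, N + 1))
sample = simple_random_sample(frame, n, seed=42)
print(sorted(sample))
```

Fixing the seed makes the draw reproducible, much as recording the starting row and column of a random number table does.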
There are two types of simple random sampling (SRS):
» SRSWOR: If the unit selected in any draw is not replaced in the population before the next draw is made, the procedure is known as simple random sampling without replacement (SRSWOR). The number of possible samples of size n that can be drawn from a population of size N in SRSWOR is NCn, and each of these samples has probability 1/NCn of being selected.
» SRSWR: If a selected unit is noted and then returned (replaced) to the population before the next draw is made, the sampling procedure is called simple random sampling with replacement (SRSWR). The number of possible (ordered) samples of size n that can be drawn from a population of size N in SRSWR is N^n, and each has probability 1/N^n of being selected.
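The two counts can be checked with a few lines of Python (3.8 or later, for `math.comb`). This sketch uses N = 5 and n = 2 purely as illustrative values; under SRSWR it counts ordered draws, since the same unit may appear more than once. The function names are hypothetical.

```python
from math import comb


def count_srswor(N, n):
    """Number of possible (unordered) samples without replacement: NCn."""
    return comb(N, n)


def count_srswr(N, n):
    """Number of possible ordered samples with replacement: N^n."""
    return N ** n


N, n = 5, 2
print(count_srswor(N, n), 1 / count_srswor(N, n))  # 10 0.1
print(count_srswr(N, n), 1 / count_srswr(N, n))    # 25 0.04
```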
Merits
» Each item has an equal chance of being selected, so selection depends on chance and not on personal judgement.
» This method is quite economical and saves time and money compared with a census.
» It is more representative of the population than judgement or purposive sampling.
Demerits
» Expensive and time consuming especially when the population is
large.
» If the population is heterogeneous in nature, it may not give
accurate results.
» It requires a completely up-to-date list of the population units from which the sample is to be drawn.
2. Stratified Sampling: Stratified random sampling is used when the characteristics of the elements of the population are heterogeneous. In a stratified random sampling approach, the population is first divided into subpopulations, or strata, based on one or more classification criteria in such a way that units are homogeneous within strata and heterogeneous between strata. A sample is then drawn from each stratum using simple random sampling, and these samples are combined (pooled) to form the required sample of the population. Thus, the procedure of first stratifying and then applying simple random sampling within each stratum is known as stratified random sampling. In proportionate allocation, the number of units sampled from each stratum depends on the size of the stratum relative to the population.

For example, suppose we want to study the examination results of the 2000 students of M.M. Campus, NPJ. We first divide the students into faculties such as management, humanities and science. If the numbers of students in management, humanities and science are 800, 700 and 500 respectively, and we select 20% of each, the samples consist of 160, 140 and 100 students from the three faculties respectively. The samples taken from each stratum are then pooled (combined) to form the overall (required) sample. Stratified sampling is similarly used in surveys of average crop yield per hectare.
A stratified sample is a mini-reproduction of the population. Before sampling, the population is divided according to characteristics that are important for the research, for example gender, social class, education level or religion. The population is then randomly sampled within each category or stratum.
A stratified sample may be either
(i) Proportionate or (ii) Disproportionate.
In proportionate stratified sampling, the number of items drawn from each stratum is proportional to the size of the stratum. On the other hand, if an equal number of items is drawn from each stratum regardless of how the stratum is represented in the population (universe), it is called disproportionate sampling.
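Using the M.M. Campus figures from the example (2000 students; 800, 700 and 500 per faculty; a 20% sample of 400), the sketch below computes proportionate and equal (disproportionate) allocations and then draws an SRS within each stratum before pooling. The function names, the equal-allocation total of 300 and the student IDs are hypothetical; note that rounding can make proportionate allocations differ slightly from the intended total when the shares are not whole numbers.

```python
import random


def proportional_allocation(strata_sizes, total_sample):
    """Proportionate allocation: n_h = n * N_h / N, rounded to whole units."""
    N = sum(strata_sizes.values())
    return {h: round(total_sample * N_h / N) for h, N_h in strata_sizes.items()}


def equal_allocation(strata_sizes, total_sample):
    """Disproportionate (equal) allocation: the same number from every stratum."""
    per_stratum = total_sample // len(strata_sizes)
    return {h: per_stratum for h in strata_sizes}


def stratified_sample(strata, allocation, seed=None):
    """Draw an SRS of the allocated size within each stratum, then pool the results."""
    rng = random.Random(seed)
    pooled = []
    for h, units in strata.items():
        pooled.extend(rng.sample(units, allocation[h]))
    return pooled


faculties = {"management": 800, "humanities": 700, "science": 500}
print(proportional_allocation(faculties, 400))  # {'management': 160, 'humanities': 140, 'science': 100}
print(equal_allocation(faculties, 300))         # {'management': 100, 'humanities': 100, 'science': 100}

# Hypothetical student IDs standing in for the real list of each faculty
strata = {h: [f"{h}-{i}" for i in range(1, size + 1)] for h, size in faculties.items()}
sample = stratified_sample(strata, proportional_allocation(faculties, 400), seed=1)
print(len(sample))  # 400
```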

Merits:
» The units selected represent the whole universe.
» The estimation of population parameters is more efficient.
» For a large and heterogeneous population, stratified sampling is the best design.
Demerits
» This method requires more time and cost.
» If the strata are not homogeneous, the results obtained may not be reliable.
» The samples from each stratum should be selected only by experts or experienced persons.
3. Systematic Sampling: A random sampling method in which only the first unit is selected at random and the remaining units are selected automatically according to a predetermined pattern (i.e. at fixed, equal intervals from one another) is known as systematic sampling. Systematic sampling is commonly used when a complete and up-to-date sampling frame, i.e. a complete and up-to-date list of sampling units, is available.
Suppose the N units of the population are numbered from 1 to N in some order. Let $N = nk$, where n is the sample size and k is an integer known as the sampling interval. Thus, $k = N/n$. Systematic sampling is a statistical method involving the selection of elements from an ordered sampling frame; it is random sampling with a system. From the sampling frame, a starting point is chosen at random, and subsequent choices are made at regular intervals. For example, suppose you want to sample 8 houses from a street of 120 houses. Since 120/8 = 15, every 15th house is chosen after a random starting point between 1 and 15. If the random starting point is 11, the houses selected are 11, 26, 41, 56, 71, 86, 101 and 116. This sampling is mostly used in forest surveys, fisheries surveys, etc.
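The house example can be reproduced with a short sketch. It assumes, as the text does, that N is an exact multiple of n; the function name is hypothetical, and passing `start` explicitly fixes the random starting point at 11 so the output matches the worked example.

```python
import random


def systematic_sample(N, n, start=None, seed=None):
    """Every k-th unit after a random start in 1..k, where k = N / n."""
    if N % n != 0:
        raise ValueError("this sketch assumes N = n * k with k an integer")
    k = N // n                                     # sampling interval
    if start is None:
        start = random.Random(seed).randint(1, k)  # randint includes both ends
    return [start + i * k for i in range(n)]


print(systematic_sample(120, 8, start=11))
# [11, 26, 41, 56, 71, 86, 101, 116]
```

Only one random number is needed (the start), which is why the method takes less time and labour than drawing every unit independently.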
Merits
» This method is simple and convenient to use.
» In selecting the sample by this method, it takes less time and
labour.
» Most of the results obtained from this method are satisfactory.
» If the complete list of the population is available and if the items
are arranged systematically, this method is more efficient.

Demerits
» The sample selected may not be representative of the population in some cases.
» The items of the population must be arranged in some order; otherwise the results obtained will be misleading.
4. Cluster Sampling: Cluster sampling is a method of random sampling in which the population is divided into different groups, called clusters, in such a way that the units within each cluster are heterogeneous while the clusters are homogeneous with respect to one another, and each cluster contains approximately the same number of sampling units. A cluster is then selected as the sample by simple random sampling. In cluster sampling, individual clusters are representative of the population as a whole. For example, suppose that we want to study the economic condition of the people of Kathmandu Metropolitan City. First, the metropolitan city is divided into wards in such a way that the economic condition of people within a ward is heterogeneous and the wards are homogeneous with one another. Then a ward is selected as the sample by simple random sampling, and from it we can study the economic condition of people in Kathmandu Metropolitan City.
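A minimal sketch of the procedure: whole clusters are chosen by SRS and every unit inside them is then enumerated. The ward labels, household IDs, number of wards and the function name are hypothetical; only the list of cluster labels is needed up front, and a list of individual elements is required only for the chosen clusters.

```python
import random


def cluster_sample(clusters, m, seed=None):
    """Select m whole clusters by SRS; every unit in a chosen cluster enters the sample."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)   # SRS of cluster labels
    return {label: clusters[label] for label in chosen}


# Hypothetical frame: 10 wards, each listing 5 household IDs
wards = {f"ward-{w}": [f"w{w}-h{h}" for h in range(1, 6)] for w in range(1, 11)}
selected = cluster_sample(wards, m=1, seed=7)
households = [u for units in selected.values() for u in units]
print(list(selected), len(households))
```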
Merits
» It is less costly than simple random sampling and stratified sampling.
» It is useful even when a sampling frame of elements is not available.
» Selecting elements (units) by a well-designed cluster sampling procedure is easier, faster, cheaper and more convenient than simple random sampling and stratified sampling.
Demerits
» The efficiency decreases as the cluster size increases.
» The cost per unit of efficiency may be higher in cluster sampling.
» Enumerating the sampling units within the selected clusters is difficult when the population is large.
5. Multistage Sampling: Multistage sampling is a further development of the principle of cluster sampling. It is a random sampling method in which the sampling procedure is carried out in several stages. At the first stage, the population is divided into large sampling units, called first stage units (fsu) or primary stage units, and clusters are selected from them at random. At the second stage, the clusters selected at the first stage are further divided, or sub-sampled, into smaller sampling units. Selecting only some of the units of the selected clusters in this way is called two-stage sampling. When the procedure is extended to a third or further stages until the ultimate sampling units are reached, it is known as multistage sampling. Multistage sampling is thus a complex form of cluster sampling. Although cluster sampling and stratified sampling bear some superficial similarities, they are substantially different: in stratified sampling, a random sample is drawn from every stratum, whereas in cluster sampling only the selected clusters are studied, either in a single stage or in multiple stages.
For example, in crop surveys for estimating the yield of a crop in a district, the VDC can be considered the primary sampling unit (psu), villages the second stage units, crop fields the third stage units, and a plot of fixed size the ultimate unit of sampling.
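The crop-survey example can be sketched as a recursive procedure that applies SRS at every stage. The nested frame (5 VDCs, 4 villages per VDC, 6 fields per village), the stage sizes and the function name are hypothetical, and the sketch stops at crop fields rather than going down to plots of fixed size.

```python
import random


def multistage_sample(frame, sizes, rng):
    """Select sizes[0] units at this stage by SRS, then sub-sample inside each.

    frame: nested dict whose innermost values are lists of ultimate units.
    sizes: number of units to select at each stage, outermost stage first.
    """
    if isinstance(frame, list):                   # last stage: ultimate units
        return rng.sample(frame, sizes[0])
    chosen = rng.sample(sorted(frame), sizes[0])  # SRS of units at this stage
    return {unit: multistage_sample(frame[unit], sizes[1:], rng) for unit in chosen}


# Hypothetical district: 5 VDCs -> 4 villages each -> 6 crop fields each
district = {
    f"VDC-{a}": {
        f"village-{a}.{b}": [f"field-{a}.{b}.{c}" for c in range(1, 7)]
        for b in range(1, 5)
    }
    for a in range(1, 6)
}

# Three stages: 2 VDCs, then 2 villages in each, then 3 fields in each village
result = multistage_sample(district, [2, 2, 3], random.Random(0))
for vdc, villages in result.items():
    for village, fields in villages.items():
        print(vdc, village, fields)
```

Because units are listed only inside the clusters chosen at the previous stage, a complete frame of ultimate units for the whole district is never needed.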
Merits:
» It is more convenient when the area of investigation is very large.
» It is commonly used in large-scale surveys.
» This method is more flexible than other sampling methods.
» As the sample size is reduced at each stage, this sampling technique saves time and cost.
Demerits:
» If the samples are not carefully taken at the different stages, it may give faulty results.
» The chance of sampling error is high when the number of sampling units selected is small.
b. Non-random sampling (Non-Probability Sampling): Non-random sampling is the method of selecting samples in which the choice of sampling units depends entirely on the discretion, judgement, convenience, beliefs or biases of the sampler or investigator. This method is mainly used for opinion surveys, but it cannot be recommended for general use as it is subject to the drawbacks of prejudice and bias of the investigator. However, if the researcher is experienced and expert, it is possible that judgement sampling may yield useful results. Even so, this method suffers from a serious defect: it is not possible to compute the degree of precision of an estimate from the sample values.
Types of non-random sampling or non-probability sampling are as follows:
1. Judgment sampling: A sampling method in which the choice of sample items depends entirely on the judgement of the investigator is called judgement sampling.
In other words, the investigator uses his or her own judgement in the choice and includes only those items of the universe which he or she considers suitable. It is a method for quick decisions.
For instance, if we want to study corruption in Nepalese society, we can select a sample of twenty senior professors of T.U. to give their opinions on the subject, on the view that the judgement of these professors is better informed than that of a sample chosen merely for convenience. We can then obtain the desired information.
Merits:
» It is a simple method of sampling for quick decisions.
» It gives better results when the sample size is small.
Demerits:
» It gives unreliable conclusions if the investigator is personally biased.
» Though simple, the method is not scientific and is not in general use.
» Sampling error cannot be estimated because it is not based on random sampling.
2. Convenience sampling (Accidental sampling): A sampling method in which the researcher selects the sample neither by probability nor by judgement but by convenience is called convenience sampling. It is also called chunk sampling. The selection of sampling units is based entirely on the convenience of the researcher. The results obtained by this method can hardly be representative of the population; they are generally biased and unsatisfactory. However, convenience sampling is often used for pilot studies. For instance, anyone who wants to conduct ‘man-on-the-street’ interviews can stand at a street corner and interview the desired number of passers-by to obtain the required information.
Merits:
» It is useful for making pilot studies.
» It is a simple method of sampling for quick decisions.
» When both time and money are limited, convenience sampling is widely used, i.e. it is less expensive and less time-consuming.
Demerits:
» The results obtained by this method can hardly be
representative of the population.
» Sampling error cannot be estimated because it is also not
based on random sampling.
3. Quota Sampling: A non-random sampling method in which the investigators are given quotas to be filled from different strata, and the required samples are drawn from these strata by judgement within the pre-assigned quotas, is called quota sampling. Quota sampling is thus a type of judgement sampling. Sample quotas may be fixed according to specified characteristics such as income group, sex, occupation, or political or religious affiliation.
For instance, in a radio listening survey collecting comments on the fiscal year budget, the interviewer may select a sample of 50 persons from different groups, such as 20 officials, 10 professors, 10 businessmen, 5 farmers and 5 students. Here, the interviewer is free to choose the particular people to be interviewed within each quota.
Merits:
» It saves time and money compared with other sampling methods.
» It is stratified-cum-purposive, so the investigator enjoys the benefits of both.
Demerits:
» It may be biased because of the personal beliefs and prejudices of the investigator.
» Sampling error cannot be estimated because it is also not based on random sampling.
4. Snowball sampling or Network sampling: Snowball sampling is a special type of non-probability sampling used when the desired sample characteristic is very rare. This sampling design is therefore widely used in applications where respondents are difficult to identify and are best located through referral networks; hence it is also known as chain referral sampling or network sampling. In this sampling, an initial group of respondents is identified, and subsequent respondents possessing similar characteristics are identified through referrals provided by the initial respondents.
This sampling is particularly used to study drug use, teenage gang activities, prostitution, political activities, illegal activities, etc.
Merits:
» This method is cheap, simple and cost-efficient.
» It is useful for rare populations for which no sampling frames are readily available.
» The chain referral process allows the researcher to reach populations that are difficult to sample with other sampling methods.
Demerits:
» It is difficult to apply when the population is large.
» It does not ensure the inclusion of all elements in the list.
c. Mixed Sampling: If the samples are selected partly according to some law of chance and partly according to a fixed sampling rule (i.e. without assignment of probabilities), they are termed mixed samples, and the technique of selecting such samples is known as mixed sampling.

Sampling Error and Non-sampling Error


An ‘error’ refers to the difference between the true value of a population parameter and its estimate provided by the corresponding sample statistic.
The inaccuracies or errors in any statistical investigation, arising in the collection, processing and analysis of the data, may be broadly classified as follows:
(i) Sampling error (ii) Non-sampling error

Sampling Error
Sampling errors arise because only a part of the population (i.e. a sample) is used to estimate the population parameters and draw inferences about the population. Sampling errors are therefore absent in a complete enumeration survey or census. The magnitude of the sampling error depends upon the nature of the population and the size of the sample: as the sample size increases, the sampling error decreases.
Sampling errors occur for the following reasons:
» Faulty selection of the sample.
» Substitution of a convenient unit of the population.
» Faulty demarcation of sampling units.
» Improper choice of the statistic for estimating the population parameter.
Figure: Relationship of sampling error with sample size
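The inverse relationship shown in the figure can be checked exactly on a small population by enumerating every possible sample of each size. In this sketch the population {1, 3, 5, 7, 9} is illustrative, the design is assumed to be SRSWOR, the function name is hypothetical, and sampling error is measured by the standard deviation of the sample means over all possible samples (the standard error of the mean).

```python
from itertools import combinations
from statistics import mean, pstdev

population = [1, 3, 5, 7, 9]   # small illustrative population (N = 5)


def se_of_mean(pop, n):
    """Exact S.E. of the sample mean under SRSWOR: SD of the means of all NCn samples."""
    means = [mean(s) for s in combinations(pop, n)]
    return pstdev(means)       # divisor = number of samples


for n in range(1, len(population) + 1):
    print(n, round(se_of_mean(population, n), 4))
# the S.E. falls from 2.8284 at n = 1 to 0.0 at n = 5 (a complete census)
```

At n = N every "sample" is the whole population, so the sampling error vanishes, which matches the statement that sampling errors are absent in a census.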


Non-sampling Error
Non-sampling errors arise at the stages of planning, observation, ascertainment and processing of the data, and are thus present in both complete enumeration surveys (censuses) and sample surveys. Thus, data obtained from a complete census are free from sampling errors but may not be free from non-sampling errors, whereas data obtained from a sample survey are subject to both sampling and non-sampling errors. Non-sampling errors are more serious in a census survey than in a sample survey.
Non-sampling errors arise from the following factors:
» Faulty planning or definitions
» Response errors
» Non-response errors
» Compiling errors
» Publication errors
» Coverage errors

Sample Size
The units selected from the population form the sample, and the number of such units is known as the sample size. The required sample size increases with the desired level of accuracy and with the scatter (variability) of the data, and it decreases as the permissible error, i.e. the permissible difference between the sample statistic and the population parameter, increases.

Factors Affecting Sample Size


The size of sample depends upon different factors. They are:
1. Nature of population (variability)
2. Number of classes
3. Nature of the study
4. Types of sampling
5. Degree of accuracy

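The sample size can be estimated by

$$n = \left( \frac{Z_{\alpha/2}\,\sigma}{E} \right)^{2}$$

where
σ = standard deviation (SD) of the population
$Z_{\alpha/2}$ = standard normal variate (SNV) corresponding to the level of significance α
E = permissible error, i.e. the permissible difference between the sample statistic and the population parameter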
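A minimal sketch of the formula. The values σ = 10, Z = 1.96 (the usual value for α = 0.05) and E = 2 are hypothetical, as is the function name; rounding up to the next whole unit is a common convention rather than part of the formula itself.

```python
import math


def required_sample_size(sigma, z, E):
    """n = (z * sigma / E)^2, rounded up because a sample must contain whole units."""
    return math.ceil((z * sigma / E) ** 2)


# Hypothetical inputs: sigma = 10, z = 1.96 (alpha = 0.05), permissible error E = 2
print(required_sample_size(10, 1.96, 2))  # 97
print(required_sample_size(10, 1.96, 4))  # 25: doubling E cuts n to about a quarter
```

Because E enters squared in the denominator, halving the permissible error roughly quadruples the required sample size.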

Parameter and Statistic


Parameter
Parameters are functions of the population values. In other words, the values that describe the characteristics of a population are called parameters. Statistical constants of the population such as the population mean, population variance and skewness are parameters. Hence, a parameter is a characteristic of a population.

Statistic
Statistics are functions of the sample observations. In other words, the statistical measures or constants which are calculated from the sample data are called statistics. Statistical constants of the sample such as the sample mean, sample variance and sample skewness are statistics. Hence, a statistic is a characteristic of a sample, and it is used to estimate the value of the corresponding population parameter.
Parameter | Statistic
Population size (N) | Sample size (n)
Population mean (µ) | Sample mean (x̄)
Population standard deviation (σ) | Sample standard deviation (s)
Population proportion (P) | Sample proportion (p)
Population correlation coefficient (ρ) | Sample correlation coefficient (r)
Population coefficient of skewness (β₁) | Sample coefficient of skewness (b₁)
Population coefficient of kurtosis (β₂), etc. | Sample coefficient of kurtosis (b₂), etc.



3.2 Sampling Distribution

Sampling Distribution
Sampling distribution may be defined as the probability distribution of a given statistic based on a random sample. A sample statistic is a random variable, and since every random variable possesses a probability distribution, each sample statistic possesses a probability distribution. It describes all the possible values that the sample statistic can take, together with their probabilities.
Suppose the sample size is 2 and the population size is 5. Then the number of possible samples of size 2 that can be drawn without replacement from a finite population of size 5 is C(5,2) = 10. We may compute the mean and the standard deviation for each sample, and we would quickly observe that they differ from sample to sample. The probability distribution of these 10 sample means is known as the sampling distribution of the mean.
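To make the idea concrete, the sketch below enumerates the ten samples of size 2 for an illustrative population of five values (1, 3, 5, 7, 9, the same values used in Example 1 of this section) and tabulates the sampling distribution of the mean as exact probabilities. Sampling is assumed to be without replacement.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [1, 3, 5, 7, 9]   # illustrative values for a population of size N = 5
n = 2

samples = list(combinations(population, n))        # all C(5, 2) = 10 samples
means = [Fraction(sum(s), n) for s in samples]      # exact sample means
k = len(samples)
distribution = {m: Fraction(c, k) for m, c in sorted(Counter(means).items())}

for m, p in distribution.items():
    print(m, p)   # value of the sample mean and its probability
# 2 1/10, 3 1/10, 4 1/5, 5 1/5, 6 1/5, 7 1/10, 8 1/10
```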

Properties of Sampling Distribution of the Mean


1. The mean of the sampling distribution of the sample mean is equal to the population mean, $E(\bar{X}) = \mu$, so the sample mean is an unbiased estimator of µ.
2. The variance of the sampling distribution of the sample mean is less than the population variance (for sample size n > 1).
3. The sampling distribution of the sample mean is approximately normal, and the approximation improves as the sample size increases; this is the central limit theorem.
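Properties 1 and 2 can be verified exactly with fractions. The check below also compares the variance of the sample means with σ²/n · (N − n)/(N − 1), the standard finite-population result for SRSWOR, which this section does not state; the population {1, 3, 5, 7, 9} and the function name are illustrative. Property 3 concerns large samples and is not checked here.

```python
from fractions import Fraction
from itertools import combinations


def sampling_distribution_summary(pop, n):
    """Return (mu, sigma^2, mean of sample means, variance of sample means) under SRSWOR."""
    N = len(pop)
    mu = Fraction(sum(pop), N)
    sigma2 = sum((Fraction(y) - mu) ** 2 for y in pop) / N           # divisor N
    means = [Fraction(sum(s), n) for s in combinations(pop, n)]
    k = len(means)                                                   # NCn samples
    mean_of_means = sum(means) / k
    var_of_means = sum((m - mean_of_means) ** 2 for m in means) / k
    return mu, sigma2, mean_of_means, var_of_means


N, n = 5, 2
mu, sigma2, m_bar, v_bar = sampling_distribution_summary([1, 3, 5, 7, 9], n)
print(mu == m_bar)                                    # True  (property 1)
print(v_bar < sigma2)                                 # True  (property 2)
print(v_bar == sigma2 / n * Fraction(N - n, N - 1))   # True  (SRSWOR formula)
```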

Standard Error (S.E.)


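The standard deviation of the sampling distribution of a sample statistic is known as the standard error (S.E.) of the statistic. Thus, the standard error of a statistic t is given by

$$\text{S.E.}(t) = \sqrt{\operatorname{Var}(t)} = \sqrt{\frac{1}{k}\sum_{i=1}^{k}\left(t_i - \bar{t}\right)^{2}}$$

where $t_1, \dots, t_k$ are the values of t over the k possible samples and $\bar{t}$ is their mean.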
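The formula translates directly into code. The sketch takes the values of the statistic over all k possible samples (here, the ten sample means tabulated in Example 1 of this section) and uses the divisor k, as in the formula; the function name is hypothetical.

```python
import math


def standard_error(values):
    """S.E. of a statistic t: sqrt((1/k) * sum (t_i - t_bar)^2) over its k sample values."""
    k = len(values)
    t_bar = sum(values) / k
    return math.sqrt(sum((t - t_bar) ** 2 for t in values) / k)


# Sample means of the ten samples of size 2 in Example 1 of this section
sample_means = [2, 3, 4, 5, 4, 5, 6, 6, 7, 8]
print(round(standard_error(sample_means), 4))   # 1.7321, i.e. sqrt(3)
```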

The standard deviation of the distribution of the sample mean is called the standard error of the sample mean. It is denoted by $\sigma_{\bar{x}}$ or S.E.($\bar{x}$). The standard error of the mean is a measure of dispersion of the distribution of the sample mean. Similarly, the standard deviation of the sampling distribution of a proportion is called the standard error of the proportion.
Statisticians prefer the term standard error of the mean to "standard deviation of the distribution of sample means", and standard error of the proportion to "standard deviation of the sample proportion", because the term standard error conveys a specific meaning. For example, the standard error of the mean measures the extent to which we expect the means of different samples to vary because of chance error in the sampling process.
The standard error indicates not only the size of the chance error that has been made but also the accuracy we are likely to get if we use a sample statistic to estimate a population parameter. When the distribution of sample means is less spread out (has a smaller standard error), the sample mean is a better estimator of the population mean than when the distribution is widely spread (has a larger standard error).
The standard errors of some well-known statistics, where n is the sample size, σ is the population standard deviation, P is the population proportion, and n1 and n2 are the sizes of two samples, are given in the following table.

Example 1:
A population consists of the five numbers 1, 3, 5, 7 and 9.
(i) Enumerate all possible samples of size two which can be
drawn from the population without replacement.
(ii) Calculate the mean and variance of the population.
(iii) Show that the mean of the sampling distribution of the sample
means is equal to the population mean.
(iv) Calculate the variance of the sampling distribution of the sample
mean.
(v) Find the standard error of the mean.
Solution:
Here, N = 5, n = 2

i. The number of possible samples of size 2 that can be drawn from the
population without replacement is ${}^{N}C_{n} = {}^{5}C_{2} = 10$.
Thus the possible samples are: (1, 3), (1, 5), (1, 7), (1, 9), (3, 5),
(3, 7), (3, 9), (5, 7), (5, 9), (7, 9)

ii. Calculation of population mean and population variance:

Y     Y − Ȳ    (Y − Ȳ)²
1      −4        16
3      −2         4
5       0         0
7       2         4
9       4        16
ΣY = 25        Σ(Y − Ȳ)² = 40

Population mean $\bar{Y} = \frac{\sum Y}{N} = \frac{25}{5} = 5$

Population variance $\sigma^2 = \frac{\sum (Y - \bar{Y})^2}{N} = \frac{40}{5} = 8$
iii. Calculation of sample means and variance of the sampling
distribution of means:

Sample no.   Sample values   Sample mean (ȳ)   ȳ − Ȳ   (ȳ − Ȳ)²
1            (1, 3)          2                 −3       9
2            (1, 5)          3                 −2       4
3            (1, 7)          4                 −1       1
4            (1, 9)          5                  0       0
5            (3, 5)          4                 −1       1
6            (3, 7)          5                  0       0
7            (3, 9)          6                  1       1
8            (5, 7)          6                  1       1
9            (5, 9)          7                  2       4
10           (7, 9)          8                  3       9
Total                        50                         30

Mean of the sample means $= \frac{\sum \bar{y}}{10} = \frac{50}{10} = 5$

Since the mean of the sample means (5) is equal to the
population mean Ȳ = 5, we conclude that the mean of the
sampling distribution of the sample means is equal to the
population mean.
iv. Variance of the sample means is

$$V(\bar{y}) = \frac{1}{k}\sum (\bar{y} - \bar{Y})^2 = \frac{30}{10} = 3$$

where k = 10 is the number of possible samples.
v. The standard error of the sample mean is given by

$$\text{S.E.}(\bar{y}) = \sqrt{V(\bar{y})} = \sqrt{3} = 1.732$$
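The enumeration in Example 1 can be reproduced mechanically. This is a minimal Python sketch using only the standard library (`itertools.combinations` for samples without replacement, `statistics.pvariance` for variances with divisor N or the number of samples); the variable names are illustrative.

```python
from itertools import combinations
from math import sqrt
from statistics import fmean, pvariance

population = [1, 3, 5, 7, 9]
n = 2

# (i) All 5C2 = 10 unordered samples drawn without replacement
samples = list(combinations(population, n))
sample_means = [fmean(s) for s in samples]

# (ii) Population mean and variance (divisor N)
mu = fmean(population)
sigma2 = pvariance(population)

# (iii)-(v) Mean, variance (divisor = number of samples) and S.E. of the sample means
mean_of_means = fmean(sample_means)
var_of_means = pvariance(sample_means)
se_mean = sqrt(var_of_means)

print(samples)
print(mu, sigma2, mean_of_means, var_of_means, round(se_mean, 3))
```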

Sampling distribution                                   Standard error
Mean ($\bar{x}$)                                        $\sigma/\sqrt{n}$
Proportion (p)                                          $\sqrt{PQ/n}$
Standard deviation (s)                                  $\sigma/\sqrt{2n}$
Sample median                                           $1.2533\,\sigma/\sqrt{n}$
Sample quartiles                                        $1.3626\,\sigma/\sqrt{n}$
Sample correlation coefficient (r)                      $(1-\rho^2)/\sqrt{n}$
Difference of two means ($\bar{x}_1 - \bar{x}_2$)       $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$
Difference of two standard deviations ($s_1 - s_2$)     $\sqrt{\sigma_1^2/(2n_1) + \sigma_2^2/(2n_2)}$
Difference of two proportions ($p_1 - p_2$)             $\sqrt{P_1Q_1/n_1 + P_2Q_2/n_2}$

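For sampling without replacement from a finite population of size N, the standard errors
of the mean and of the proportion are multiplied by the finite population correction
$\sqrt{(N-n)/(N-1)}$:

$$\text{S.E.}(\bar{x}) = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} \quad \text{for a finite population}$$

$$\text{S.E.}(p) = \sqrt{\frac{PQ}{n}}\sqrt{\frac{N-n}{N-1}} \quad \text{for a finite population}$$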
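The formulas in the table translate directly into small functions. The sketch below is illustrative and the function names are hypothetical; it covers the mean, the proportion and the two difference formulas, and applies the finite population correction only when a population size `N` is passed. The two print lines reuse the figures of Examples 4 and 5 later in this section.

```python
from math import sqrt


def fpc(N, n):
    """Finite population correction sqrt((N - n) / (N - 1)), used for SRSWOR."""
    return sqrt((N - n) / (N - 1))


def se_mean(sigma, n, N=None):
    """S.E. of the sample mean; pass N when sampling without replacement."""
    se = sigma / sqrt(n)
    return se if N is None else se * fpc(N, n)


def se_proportion(P, n, N=None):
    """S.E. of the sample proportion, with Q = 1 - P."""
    se = sqrt(P * (1 - P) / n)
    return se if N is None else se * fpc(N, n)


def se_diff_means(sigma1, n1, sigma2, n2):
    """S.E. of the difference of two sample means from independent samples."""
    return sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)


def se_diff_proportions(P1, n1, P2, n2):
    """S.E. of the difference of two sample proportions from independent samples."""
    return sqrt(P1 * (1 - P1) / n1 + P2 * (1 - P2) / n2)


print(round(se_mean(1.5, 25), 3), round(se_mean(1.5, 25, N=520), 3))
print(round(se_proportion(15 / 300, 25, N=300), 3))
```

Leaving `N` as `None` corresponds to sampling with replacement (or an infinite population).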

Application of Standard Error in Hypothesis Testing


The standard error plays an important role in statistical inference and large
sample theory, and it forms the basis of estimation and hypothesis testing. Its main uses are:

1. To determine the precision and reliability of a sample estimate:
$$\text{Precision of } \bar{X} = \frac{1}{\text{SE}(\bar{X})}$$
2. To test whether a sample statistic differs significantly from the population
parameter:
$$Z = \frac{\text{Difference}}{\text{SE}(\bar{X})} = \frac{\bar{X} - \mu}{\text{SE}(\bar{X})}$$
3. To obtain a point estimate of the population parameter.
4. To determine an interval estimate of the population parameter at
a given level of significance.
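Uses 1 and 2 amount to simple arithmetic on the standard error. In this sketch the figures (µ = 50, σ = 10, n = 100, X̄ = 52) are hypothetical, chosen only so the numbers are easy to follow, and the helper names are illustrative.

```python
from math import sqrt


def standard_error(sigma, n):
    """S.E. of the sample mean for sampling with replacement."""
    return sigma / sqrt(n)


def precision(se):
    """Precision of an estimate, taken as the reciprocal of its standard error."""
    return 1 / se


def z_statistic(xbar, mu, se):
    """Z = (sample mean - population mean) / SE(sample mean)."""
    return (xbar - mu) / se


# Hypothetical figures for illustration only: mu = 50, sigma = 10, n = 100, xbar = 52
se = standard_error(10, 100)
z = z_statistic(52, 50, se)
print(se, precision(se), z)
```

Here Z = 2 exceeds 1.96, the critical value at the 5% level, so a difference this large would be judged significant at that level.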
Example 2:
A population consists of the 5 members 2, 4, 6, 8 and 10.
i. Enumerate all possible samples of size 2 which can be drawn
from the population without replacement.
ii. Find the population mean and population variance.
iii. Find the mean of the sampling distribution of means and show that
it is equal to the population mean.
iv. Find the variance of the sampling distribution of means and verify that
$\sigma_{\bar{x}}^2 = V(\bar{x}) = \frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}$.
v. Find the standard error of the mean.
Solution:
i. Here N = 5, n = 2.
The number of possible samples which can be drawn from the population
without replacement is
$${}^{N}C_{n} = {}^{5}C_{2} = \frac{5!}{2!\,3!} = 10$$
The possible samples are
(2, 4) (2, 6) (2, 8) (2, 10) (4, 6) (4, 8) (4, 10) (6, 8) (6, 10) (8, 10)
ii. Calculation of population mean and population variance.

X     x − µ    (x − µ)²
2      −4        16
4      −2         4
6       0         0
8       2         4
10      4        16
Σx = 30        Σ(x − µ)² = 40

Population mean $\mu = \frac{\sum x}{N} = \frac{30}{5} = 6$

Population variance $\sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{40}{5} = 8$

iii. Calculation of mean and variance of the sampling distribution of the
mean.

Sample no.   Sample values   Sample mean (ȳ)   ȳ − µ   (ȳ − µ)²
1            (2, 4)          3                 −3       9
2            (2, 6)          4                 −2       4
3            (2, 8)          5                 −1       1
4            (2, 10)         6                  0       0
5            (4, 6)          5                 −1       1
6            (4, 8)          6                  0       0
7            (4, 10)         7                  1       1
8            (6, 8)          7                  1       1
9            (6, 10)         8                  2       4
10           (8, 10)         9                  3       9
Total                        60                         30

Mean of sample means $= \frac{\sum \bar{y}}{10} = \frac{60}{10} = 6$

∴ Mean of sample means = population mean = 6, i.e. $E(\bar{x}) = \mu$.

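iv. Variance of sample means (direct calculation):
$$V(\bar{y}) = \frac{\sum (\bar{y} - \mu)^2}{10} = \frac{30}{10} = 3$$
Variance with the formula:
$$\sigma_{\bar{y}}^2 = \frac{\sigma^2}{n}\cdot\frac{N-n}{N-1} = \frac{8}{2}\cdot\frac{5-2}{5-1} = 4 \times \frac{3}{4} = 3$$
So $\sigma_{\bar{y}}^2 = V(\bar{y})$ is verified.
v. Standard error of the mean:
$$\text{S.E.}(\bar{y}) = \sqrt{V(\bar{y})} = \sqrt{3} = 1.732$$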
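A short check of part (iv): the variance of the 10 sample means, computed directly, matches $\frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}$, while σ²/n alone (the with-replacement formula) would give 4. Illustrative Python, standard library only.

```python
from itertools import combinations
from statistics import fmean, pvariance

population = [2, 4, 6, 8, 10]
N, n = len(population), 2

# Means of the 5C2 = 10 samples drawn without replacement
means = [fmean(s) for s in combinations(population, n)]

sigma2 = pvariance(population)              # population variance, 8
direct = pvariance(means)                   # variance of the 10 sample means
formula = sigma2 / n * (N - n) / (N - 1)    # (sigma^2 / n) * (N - n) / (N - 1)
without_fpc = sigma2 / n                    # sigma^2 / n alone overstates it here

print(direct, formula, without_fpc)
```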
Example 3:
A population consists of the three numbers 1, 3 and 5.
i. Enumerate all possible samples of size 2 which can be drawn
from this population with replacement.
ii. Find the mean and variance of the population.
iii. Find the mean of the sampling distribution of means and show
that it is equal to the population mean.
iv. Find the variance of the sampling distribution of means and also
verify it with the formula $V(\bar{x}) = \frac{\sigma^2}{n}$,
where σ² = population variance and
n = sample size.
v. Find the standard error of the mean.
Solution:
i. The number of possible samples of size 2 which can be drawn from this
population with replacement is $N^n = 3^2 = 9$.
The possible samples are:
(1, 1) (1, 3) (1, 5) (3, 1) (3, 3) (3, 5) (5, 1) (5, 3) (5, 5)

ii. Calculation of population mean and variance.

X     x − µ    (x − µ)²
1      −2         4
3       0         0
5       2         4
Σx = 9         Σ(x − µ)² = 8

Population mean $\mu = \frac{\sum x}{N} = \frac{9}{3} = 3$

Population variance $\sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{8}{3} = 2.67$
iii. Calculation of mean and variance of sample means

Sample no.   Sample values   Sample mean (ȳ)   ȳ − µ   (ȳ − µ)²
1            (1, 1)          1                 −2       4
2            (1, 3)          2                 −1       1
3            (1, 5)          3                  0       0
4            (3, 1)          2                 −1       1
5            (3, 3)          3                  0       0
6            (3, 5)          4                  1       1
7            (5, 1)          3                  0       0
8            (5, 3)          4                  1       1
9            (5, 5)          5                  2       4
Total                        27                         12

Mean of sample means $= \frac{\sum \bar{y}}{9} = \frac{27}{9} = 3$

So the mean of the sample means equals the population mean, i.e. $E(\bar{x}) = \mu$.
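iv. Variance of sample means (direct calculation):
$$V(\bar{y}) = \frac{\sum (\bar{y} - \mu)^2}{9} = \frac{12}{9} = 1.33$$
Variance with the formula:
$$\sigma_{\bar{y}}^2 = \frac{\sigma^2}{n} = \frac{8/3}{2} = \frac{4}{3} = 1.33$$
Hence $V(\bar{y}) = \frac{\sigma^2}{n}$ is verified.
v. Standard error of the mean:
$$\text{S.E.}(\bar{y}) = \sqrt{V(\bar{y})} = \sqrt{12/9} = 1.155$$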
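For sampling with replacement the samples are ordered pairs, which `itertools.product` generates. This illustrative sketch reproduces Example 3 and the check V(ȳ) = σ²/n.

```python
from itertools import product
from math import sqrt
from statistics import fmean, pvariance

population = [1, 3, 5]
n = 2

# With replacement, ordered samples: N**n = 3**2 = 9 equally likely samples
samples = list(product(population, repeat=n))
means = [fmean(s) for s in samples]

sigma2 = pvariance(population)   # 8/3, about 2.67
var_means = pvariance(means)     # 12/9, about 1.33

print(len(samples), fmean(means), var_means, sigma2 / n, round(sqrt(var_means), 3))
```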
Example 4:
From a population of 520, a sample of 25 units is taken. If the
population standard deviation is 1.5, find the standard error of the
sample mean when the sample is taken (i) without replacement (ii)
with replacement.
Solution:
i. Sampling without replacement
$$\text{S.E.}(\bar{x}) = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = \frac{1.5}{\sqrt{25}}\sqrt{\frac{520-25}{520-1}} = 0.293$$
ii. Sampling with replacement
$$\text{S.E.}(\bar{x}) = \frac{\sigma}{\sqrt{n}} = \frac{1.5}{\sqrt{25}} = 0.3$$

Example 5:
A simple random sample of 25 apples is drawn without replacement
from a lot of 300. If the number of bad apples in the lot is 15, find the
standard error of the sample proportion of bad apples.
Solution:
Given, n = 25, N = 300.
P = population proportion of bad apples $= \frac{15}{300} = \frac{1}{20}$, so that
$Q = 1 - P = 1 - \frac{1}{20} = \frac{19}{20}$.
We have the standard error of the proportion (without replacement):
$$\text{S.E.}(p) = \sqrt{\frac{PQ}{n}}\sqrt{\frac{N-n}{N-1}} = \sqrt{\frac{\frac{1}{20}\cdot\frac{19}{20}}{25}}\sqrt{\frac{300-25}{300-1}} = 0.042$$

Exercise 3(A)

THEORETICAL QUESTIONS
1. What is sampling? Describe the various methods of sampling with
their merits and demerits.
2. Distinguish between simple random sampling and judgment
sampling.
3. What is meant by the sampling distribution of a statistic?
4. Differentiate between probability and non-probability sampling.
5. What is meant by the standard error?

6. What is estimation? Differentiate between estimators and estimates.


7. How does the census method of collecting the data differ from the
sample method?

Numerical and Practical Problems


8. Construct the sampling distribution of $\bar{x}$ for the population 1, 2, 3,
4, 5 and 6, taking samples of size 2.
9. A population consists of 4 members 1, 2, 3 and 6.
i. Write down all possible samples of size 2 that can be drawn with
replacement.
ii. Construct the sampling distribution of the mean.
iii. Compute the standard error of mean.
10. A population consists of 5 numbers 1, 3, 5, 7 and 9.
i. Enumerate all possible samples of size 2 which can be drawn from
the population without replacement.
ii. Find the mean and variance of the population.
iii. Find the mean of the sampling distribution of means and show
that it is equal to the population mean.
iv. Find the variance of the sampling distribution of means and also
verify it with the formula $\sigma_{\bar{x}}^2 = V(\bar{x}) = \frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}$.
v. Find the standard error of means. [Ans: 1.732]
11. A population consists of the four numbers 2, 5, 8 and 1.
i. Enumerate all possible samples of size 2 which can be drawn from
this population with replacement.
ii. Find mean and variance of the population.
iii. Find mean of the sampling distribution of means and show that it
is equal to population mean.
iv. Find the variance of the sampling distribution of the mean and also verify
it with the formula $\sigma_{\bar{x}}^2 = V(\bar{x}) = \frac{\sigma^2}{n}$.
v. Find the standard error of mean. [Ans: 1.94]
12. A sample of size 25 is drawn from a finite population consisting of 150
units. If the population standard deviation is 10, find the standard error
of the sample mean when the sample is drawn (i) without replacement (ii)
with replacement. [Ans: (i) 1.832 (ii) 2]
13. A simple random sample of size 20 is drawn without replacement
from a finite population of 75 units. If the number of defective units in
the population is 12, find the standard error of the sample proportion.
[Ans: 0.071]



3.3 Point Estimation

Estimation
The two main areas of statistical inference are estimation and testing of
hypotheses. The statistical technique of estimating unknown population
parameters from the corresponding sample statistics is known as estimation.
The main objective of estimation is to obtain a guess or estimate of
the unknown true value from the sample data or past experience. In
estimation, sample statistics are used to estimate population
parameters. For example, a campus chief estimates next year's
enrollments in BBS from current enrollments in the same
courses. A businessman estimates his future sales and profits from
past records. In each case, someone is trying to draw a conclusion about a
population from sample information. In daily life we use estimation,
knowingly or unknowingly.
"Statistics is the science of error." (M.L. Singh); "Statistics is the science
of estimation and probability." (Boddington)
Estimation also aims to minimize the error and control the variance when
estimating population parameters.
The theory of estimation was developed by R.A. Fisher in 1920.
We use estimation in different sectors of society: planning and
development, working physiology, production, etc.

Estimators and Estimates


A sample statistic used to estimate a population parameter is called an
estimator. The sample mean ($\bar{X}$) is an estimator of the population mean
(µ), the sample standard deviation (s) is an estimator of the population
standard deviation (σ) and the sample proportion (p) is an estimator of the
population proportion (P).
A specific observed numerical value of an estimator is called an estimate. In
other words, an estimate is a specific observed value of a statistic. An
estimate is formed by taking a sample and computing the value taken by the
estimator in that sample. For example, if a random sample of 100 students
of Nepal Commerce Campus gives a mean weight of 58 kilograms with a
standard deviation of 4 kilograms, and we use these specific values
to estimate the mean weight and S.D. of all the students, then the values 58
kg and 4 kg are estimates of the population mean and population
standard deviation respectively.
The population parameters, sample statistics and estimates are shown in the
following table.

Characteristic            Population parameter   Sample statistic   Estimate
Size                      N                      n                  100
Mean                      µ                      X̄                  58 kg
Standard deviation        σ                      s                  4 kg
Proportion                P                      p                  0.4
Correlation coefficient   ρ                      r                  0.78

Types of Estimates
A random sample of a given size is selected from a given population,
and a statistic, which is a characteristic of the sample, is computed;
this becomes an estimate of the corresponding characteristic of the population.
There are two types of estimates of a population parameter, namely a point
estimate and an interval estimate, as distinguished in R.A. Fisher's theory of
estimation (1920).

Point Estimate
A single value of a statistic that is used to estimate a population
parameter is called a point estimate. The single computed value is
referred to as an estimate, whereas an estimator is a rule which tells us
how to compute this value. For example, the sample mean ($\bar{X}$) which we
use for estimating the population mean is an estimator of µ, and a single
numerical value of the sample mean is called an estimate of the parameter µ.
Similarly, the sample variance s² is an estimator of the population variance
σ². A point estimate on its own is risky for decision making, because it gives
no indication of how close it is likely to be to the true value.

Criteria of a Good Estimator


A good estimator is one which is as close to the true value of the
parameter as possible. A good estimator must possess the following
characteristics:
1. Unbiasedness (Average method)
2. Consistency (Sample size method)
3. Efficiency (variation method)
4. Sufficiency (Information coverage method)

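1. Unbiasedness
If $E(\bar{X}) = \mu$, that is, if the expected value of the sample statistic equals the
population parameter, then the estimator is said to be unbiased. Similarly, the sample
variance $s^2 = \frac{1}{n-1}\sum (x - \bar{x})^2$ is unbiased for σ², since $E(s^2) = \sigma^2$.
In general, an estimator $\hat{\theta}$ of θ is biased if $E(\hat{\theta}) - \theta \neq 0$. If
$E(\hat{\theta}) - \theta > 0$, it is positively biased;
if $E(\hat{\theta}) - \theta < 0$, it is negatively biased.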
2. Consistency (Sample size method):
An estimator $\hat{\theta}$ is said to be a consistent estimator of θ if it tends to
have a greater probability of being close to the population parameter
as the sample size increases. That is, $\hat{\theta}$ is a
consistent estimator of θ if, for every ε > 0, $P(|\hat{\theta} - \theta| < \varepsilon) \to 1$ as $n \to \infty$.
3. Efficiency (variation method)
An estimator $\hat{\theta}$ is said to be an efficient estimator of θ if it has
the minimum variance among the estimators of θ. The smaller
the variance of an estimator, the greater the probability that
the estimate will be close to the true value of the parameter.
A criterion based on the variances of the sampling
distributions of estimators is therefore known as efficiency.
4. Sufficiency (Information coverage method)
An estimator is said to be sufficient for a parameter if it contains
all the information in the sample regarding the parameter. More
precisely, if $T = t(x_1, x_2, \ldots, x_n)$ is an estimator of a parameter θ,
based on a sample $x_1, x_2, \ldots, x_n$ of size n from a population with
density f(x, θ), such that the conditional distribution of $x_1, x_2, \ldots, x_n$
given T is independent of θ, then T is a sufficient estimator for θ.
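Unbiasedness can be checked exactly on a small population by averaging an estimator over every equally likely sample. This sketch is an added illustration under two assumptions: sampling is with replacement, and the population is 1, 3, 5, 7, 9 (borrowed from Example 1 on sampling distributions). Python's `statistics.variance` uses the divisor n − 1 and `statistics.pvariance` the divisor n; the first averages exactly to σ² = 8, the second to σ²(n − 1)/n = 4, so only the n − 1 version is unbiased.

```python
from itertools import product
from statistics import fmean, pvariance, variance

population = [1, 3, 5, 7, 9]   # illustrative population, sigma^2 = 8
n = 2
sigma2 = pvariance(population)

# Sampling with replacement: all 5**2 = 25 ordered samples are equally likely,
# so the expected value of a statistic is its plain average over them.
samples = list(product(population, repeat=n))
mean_s2_n_minus_1 = fmean(variance(s) for s in samples)   # divisor n - 1
mean_s2_n = fmean(pvariance(s) for s in samples)          # divisor n

print(sigma2, mean_s2_n_minus_1, mean_s2_n)
```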

Interval Estimate
A range of values used to estimate a population parameter is called an
interval estimate. An interval estimate indicates the error in two ways: by the extent
of its range and by the probability of the true population parameter
lying within that range. In this case, the decision maker has a better idea
of the reliability of the estimate. For example, if the average monthly salary
of all professors of T.U. is estimated to lie between Rs. 30,000 and Rs. 40,000, then
T.U. management can expect the true average salary to fall within this
interval. Such an estimate is an interval estimate.

[Figure: sampling distribution with a central acceptance region of area 1 − α between the limits d1 and d2 (at ±Zα) and a rejection region of area α/2 in each tail.]

$$P(d_1 < \theta < d_2) = 1 - \alpha$$

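The interval [d1, d2] within which the unknown value of the parameter θ
is expected to lie is known as the confidence interval (Neyman) or the fiducial
interval (R.A. Fisher), and d1 and d2 are called the confidence (fiducial) limits.
The two constants d1 and d2 are determined so that this probability is 1 − α;
the central region of probability 1 − α is called the acceptance region, and the
remaining region of probability α is called the rejection region.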

Confidence Level
While making an interval estimate we have to decide on the level of
our confidence, which is called the confidence level. Setting the confidence
level at 95% means that, if many samples were drawn and an interval were
constructed from each in the same way, about 95% of those intervals would contain
the true population mean. For large samples, the sampling distribution approximately
follows a normal distribution, and the value of Z at the 95% confidence level is 1.96.
The confidence level can also be taken as 90%, 99% or any other value,
depending on the needs of the problem.

[Figure: normal curve with a central confidence region of area 1 − α between d1 and d2 (at ±Zα), flanked by significance regions of area α/2 in each tail.]

a. Confidence limits for estimating the population mean for a large
sample.
The (1 − α) × 100% confidence limits (or interval estimate) for the
population mean µ, based on the sample mean $\bar{X}$ of a large sample, are given by

$$\mu = \bar{X} \pm Z_\alpha \frac{\sigma}{\sqrt{n}} \quad \text{for an infinite population (SRSWR)}$$

$$\mu = \bar{X} \pm Z_\alpha \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} \quad \text{for a finite population (SRSWOR)}$$

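b. Confidence limits for the population proportion

$$P = p \pm Z_\alpha\,\text{SE}(p) = p \pm Z_\alpha \sqrt{\frac{PQ}{n}} \quad \text{for an infinite population (SRSWR)}$$

$$P = p \pm Z_\alpha \sqrt{\frac{PQ}{n}}\sqrt{\frac{N-n}{N-1}} \quad \text{for a finite population (SRSWOR)}$$

When P is unknown, P and Q are replaced by the sample values p and q = 1 − p.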
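The limits in (a) and (b) can be sketched as two functions. This is illustrative Python with hypothetical names; `z` is supplied by the caller (for example 1.96 for 95%), and for the proportion the sample value p stands in for the unknown P, as in large-sample practice.

```python
from math import sqrt


def ci_mean(xbar, sigma, n, z, N=None):
    """(lower, upper) = xbar -/+ z * SE(xbar); pass N for SRSWOR (finite population)."""
    se = sigma / sqrt(n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))
    return xbar - z * se, xbar + z * se


def ci_proportion(p, n, z, N=None):
    """(lower, upper) = p -/+ z * sqrt(p q / n), with the sample p standing in for P."""
    se = sqrt(p * (1 - p) / n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))
    return p - z * se, p + z * se


# Hypothetical proportion example: p = 0.4 from n = 100, 95% level (z = 1.96)
print(ci_proportion(0.4, 100, 1.96))
```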

c. Estimation of sample size


It is well known that the larger the sample size, the closer the sample
statistic tends to be to the population parameter. A main problem of
sampling is choosing the sample size. If the sample size is too small, we
may fail to achieve the objectives of our analysis;
if it is too large, resources are wasted in
collecting the sample.
A major problem in sampling theory is therefore to determine the
appropriate sample size so that the population parameters may be
estimated with a desired degree of accuracy.

i. Sample size for estimating a population mean (µ)


Let $\bar{X}$ be the sample mean of a random sample of size n and µ
be the population mean. The ideal situation would be that the
sample mean $\bar{X}$ is an unbiased estimate of the population mean
(µ). If the entire population is taken as the sample, then $\bar{X}$ will
be equal to µ and there will be no error in our estimation.
Here $E = \bar{X} - \mu$ can be considered as the error, i.e. the difference
between the sample statistic and the population parameter.

We have,
$$Z_\alpha = \frac{\bar{X} - \mu}{\text{SE}(\bar{X})} = \frac{E}{\sigma/\sqrt{n}}$$
or, $\sqrt{n} = \frac{Z_\alpha\,\sigma}{E}$
$$\therefore\; n = \left(\frac{Z_\alpha\,\sigma}{E}\right)^2$$
where,
n = sample size
σ = population standard deviation
E = permissible error, i.e. the largest difference allowed between the sample mean and
the population mean
Zα = significant (critical) value of Z corresponding to the level of significance α
α = risk = level of significance = P(reject H0 | H0 true)
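ii. Estimating sample size for the population proportion
We have,
$$Z_\alpha = \frac{p - P}{\text{SE}(p)} = \frac{E}{\sqrt{PQ/n}}$$
or, $n = \frac{PQ\,Z_\alpha^2}{E^2}$
$$\therefore\; n = PQ\left(\frac{Z_\alpha}{E}\right)^2$$
where P = population proportion,
Q = 1 − P,
E = permissible error (p − P),
Zα = standard normal variate (SNV) for the level of significance α,
α = level of significance.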
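Both sample-size formulas in a hedged sketch. The inputs are hypothetical, the function names are illustrative, and rounding up with `math.ceil` is an added convention (a fractional unit cannot be sampled, and rounding down would give slightly less precision than required).

```python
from math import ceil


def sample_size_mean(z, sigma, E):
    """n = (z * sigma / E)^2, rounded up to a whole number of units."""
    return ceil((z * sigma / E) ** 2)


def sample_size_proportion(z, P, E):
    """n = P * Q * (z / E)^2 with Q = 1 - P, rounded up."""
    return ceil(P * (1 - P) * (z / E) ** 2)


# Hypothetical targets at the 5% level (z = 1.96):
print(sample_size_mean(1.96, 10, 2))            # sigma = 10, error within 2 units
print(sample_size_proportion(1.96, 0.5, 0.05))  # P = 0.5, error within 0.05
```

When P is unknown, taking P = 0.5 maximizes PQ and therefore gives the most conservative n.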

List of Formulae
1. Confidence interval (fiducial limits, R.A. Fisher) for the population mean:
$$\mu = \bar{X} \pm Z_\alpha\,\text{SE}(\bar{X}) = \bar{X} \pm Z_\alpha \frac{\sigma}{\sqrt{n}} \quad \ldots (i)$$
2. Confidence interval for the population proportion:
$$P = p \pm Z_\alpha\,\text{SE}(p) = p \pm Z_\alpha \sqrt{\frac{PQ}{n}} \quad \ldots (ii)$$
3. Sample size for the population mean:
$$n = \left(\frac{Z_\alpha\,\sigma}{E}\right)^2$$
4. Sample size for the population proportion:
$$n = \left(\frac{Z_\alpha}{E}\right)^2 PQ$$
α = risk of non-representation = level of significance.
The following table gives common confidence levels and their Z-values.

Level of confidence     50%      68.27%   90%     95%    96%    98%    99%     99.73%
Level of significance   50%      31.73%   10%     5%     4%     2%     1%      0.27%
Zα                      0.6745   1        1.645   1.96   2.05   2.33   2.576   3

Note: When no confidence level is specified, the 5% level of significance is used by
default, i.e.
α = 0.05 and
Zα = 1.96.
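The Z row of the table can be reproduced with the standard normal quantile function. This sketch assumes Python 3.8 or later, where `statistics.NormalDist` provides `inv_cdf`; for a two-sided interval the critical value is the quantile at 1 − α/2. The helper name `z_value` is illustrative.

```python
from statistics import NormalDist


def z_value(confidence):
    """Two-sided critical value: P(-z < Z < z) = confidence for Z ~ N(0, 1)."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.50, 0.6827, 0.90, 0.95, 0.96, 0.97, 0.98, 0.99, 0.9973):
    print(f"{level:7.2%}   Z = {z_value(level):.4f}")
```

The computed values round to the table entries; for example, the exact 99% value is about 2.5758 (quoted as 2.576 or 2.58), and 97% gives about 2.17, the value used in Example 4 below.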

Example 1:
A sample of 500 bulbs of a company showed an average life of 1400
hours with standard deviation of 30 hours. Obtain 95% and 99%
fiducial limits for population mean.
Solution:
Sample size (n) = 500
Average life ($\bar{X}$) = 1400 hours
Standard deviation (σ) = 30 hours
Standard error:
$$\text{SE}(\bar{X}) = \frac{\sigma}{\sqrt{n}} = \frac{30}{\sqrt{500}} = 1.342$$
For 95% fiducial limits:
α = (100 − 95)% = 5% = 0.05, so Zα = Z0.05 = 1.96.
The confidence (or fiducial) limits for the population mean are
$$\bar{X} \pm Z_\alpha\,\text{SE}(\bar{X}) = 1400 \pm 1.96 \times 1.342 = 1400 \pm 2.63$$
Taking the −ve sign, the lower limit is 1400 − 2.63 = 1397.37; taking the +ve sign,
the upper limit is 1400 + 2.63 = 1402.63.
∴ 95% fiducial limits for the population mean = (1397.37, 1402.63)
For 99% confidence limits:
α = (100 − 99)% = 1% = 0.01, so Zα = Z0.01 = 2.576.
$$\mu = \bar{X} \pm Z_\alpha\,\text{SE}(\bar{X}) = 1400 \pm 2.576 \times 1.342 = 1400 \pm 3.457$$
Taking the +ve sign (upper limit of the population mean):
µU = 1400 + 3.457 = 1403.457
Taking the −ve sign (lower limit of the population mean):
µL = 1400 − 3.457 = 1396.543
∴ The 99% fiducial limits for the population mean = (1396.543, 1403.457)
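A numerical check of Example 1, using exact normal quantiles from `statistics.NormalDist` in place of the rounded table values (an assumption of this sketch, not the book's procedure).

```python
from math import sqrt
from statistics import NormalDist

n, xbar, sigma = 500, 1400, 30
se = sigma / sqrt(n)   # about 1.342

limits = {}
for conf in (0.95, 0.99):
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 1.960 and 2.576
    limits[conf] = (xbar - z * se, xbar + z * se)
    print(f"{conf:.0%} limits: ({limits[conf][0]:.3f}, {limits[conf][1]:.3f})")
```

The 95% limits agree with the worked answer; at 99% the exact quantiles give about (1396.544, 1403.456), which differs from (1396.543, 1403.457) only in the third decimal because the worked solution rounds SE to 1.342 and Zα to 2.576 before multiplying.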
Example 2:
In a sample of 100 students, the average mark is 65 with a
standard deviation of 3 marks. Find the 98% confidence
limits.
Solution:
n = 100, $\bar{X}$ = 65, σ = 3
$$\text{SE} = \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{100}} = 0.3$$
For 98% confidence, α = 2%, with 1% in each tail, so Zα = 2.33.
The 98% confidence limits are
$$\mu = \bar{X} \pm Z_\alpha\,(\text{SE}) = 65 \pm 2.33\,(0.3) = 65 \pm 0.699$$
Upper limit of population mean = 65 + 0.699 = 65.699
Lower limit of population mean = 65 − 0.699 = 64.301
∴ 98% confidence limits for the population mean = (64.301, 65.699)
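Because 98% and 99% limits are easy to confuse, this sketch computes both for the data of Example 2. Exact quantiles from `statistics.NormalDist` are assumed in place of table values: a 98% interval leaves 1% in each tail (Z ≈ 2.326), a 99% interval leaves 0.5% in each tail (Z ≈ 2.576).

```python
from math import sqrt
from statistics import NormalDist

n, xbar, sigma = 100, 65, 3
se = sigma / sqrt(n)                    # 0.3

z98 = NormalDist().inv_cdf(0.99)        # 98%: 1% in each tail, about 2.326
z99 = NormalDist().inv_cdf(0.995)       # 99%: 0.5% in each tail, about 2.576

lo98, hi98 = xbar - z98 * se, xbar + z98 * se
lo99, hi99 = xbar - z99 * se, xbar + z99 * se
print(f"98%: ({lo98:.3f}, {hi98:.3f})   99%: ({lo99:.3f}, {hi99:.3f})")
```

With Z rounded to 2.33 the 98% limits become 65 ± 0.699, i.e. (64.301, 65.699); the 99% limits, about (64.227, 65.773), are slightly wider.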
Example 3:
A random sample of 50 gave a mean of 7.5 kg and a standard
deviation of 1.5 kg. Find 99% confidence limits for the population
mean.
Solution:
Given,
Sample size (n) = 50
Sample standard deviation (s) = 1.5
Sample mean ($\bar{X}$) = 7.5
Since the sample is large, s is used in place of σ:
$$\text{SE}(\bar{X}) = \frac{s}{\sqrt{n}} = \frac{1.5}{\sqrt{50}} = 0.212$$
For comparison, the 95% confidence limits: α = (100 − 95)% = 5% = 0.05, so
Zα = Z0.05 = 1.96, and
$$\bar{X} \pm Z_\alpha\,\text{SE}(\bar{X}) = 7.5 \pm 1.96 \times 0.212 = 7.5 \pm 0.4155$$
i.e. (7.0845, 7.9155).
For 99% confidence limits: α = (100 − 99)% = 1% = 0.01, so Zα = Z0.01 = 2.576, and
$$\bar{X} \pm Z_\alpha\,\text{SE}(\bar{X}) = 7.5 \pm 2.576 \times 0.212 = 7.5 \pm 0.546$$
∴ Lower confidence limit for the population mean = 7.5 − 0.546 = 6.954
Upper confidence limit for the population mean = 7.5 + 0.546 = 8.046
Example 4:
From a population of 1000 employees of a factory, a sample of 50
employees is selected at random. From this sample, the mean
income is found to be Rs. 6000 and the standard deviation of income
is Rs. 500.
a. Find the standard error of the mean income of all the workers.
b. Construct a 97% confidence interval for the mean income of all the
workers.
Solution:
Population size (N) = 1000
Sample size (n) = 50
Sample mean ($\bar{X}$) = 6000
Sample standard deviation (s) = 500
Since the sampling fraction n/N = 50/1000 = 0.05 is not negligible, the finite population
correction is applied, with s used in place of σ ($\hat{\sigma} = s$):
$$\text{SE}(\bar{X}) = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = \frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = \frac{500}{\sqrt{50}}\sqrt{\frac{1000-50}{1000-1}} = 68.955$$
For the 97% confidence interval:
Level of significance α = (100 − 97)% = 3% = 0.03
$$\frac{1-\alpha}{2} = \frac{1-0.03}{2} = 0.485$$
From the normal table, the value of Z corresponding to an area of 0.485 is 2.17, so
Zα = 2.17.
The 97% confidence interval for the mean income of the employees of the factory is
given by
$$\bar{X} \pm Z_\alpha\,\text{SE}(\bar{X}) = 6000 \pm 2.17 \times 68.955 = 6000 \pm 149.632$$
Lower confidence limit for the mean income = 6000 − 149.632 = 5850.368
Upper confidence limit for the mean income = 6000 + 149.632 = 6149.632
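A check of Example 4 with the finite population correction. As in the worked solution, s is used in place of σ; the exact 97% quantile from `statistics.NormalDist` is an assumption of this sketch (the solution reads 2.17 from the normal table).

```python
from math import sqrt
from statistics import NormalDist

N, n, xbar, s = 1000, 50, 6000, 500

# s stands in for sigma (large sample); the FPC is applied as in the worked solution
se = s / sqrt(n) * sqrt((N - n) / (N - 1))
z = NormalDist().inv_cdf(1 - 0.03 / 2)   # 97% confidence, about 2.17

lower, upper = xbar - z * se, xbar + z * se
print(round(se, 3), round(z, 3), round(lower, 2), round(upper, 2))
```

The exact limits, about (5850.36, 6149.64), differ from the worked answer only in the second decimal place.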
Example 5:
A random sample of size 36 is drawn from a finite population consisting of
101 units. If the population standard deviation is 12.6, find the
standard error of the sample mean when the sample is drawn (i) with
replacement (ii) without replacement.
Solution:
Given, n = 36, σ = 12.6.
(i) With replacement (equivalent to N = ∞):
$$\text{SE}(\bar{X}) = \frac{\sigma}{\sqrt{n}} = \frac{12.6}{\sqrt{36}} = \frac{12.6}{6} = 2.1$$
(ii) Without replacement, N = 101:
$$\text{SE}(\bar{X}) = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = \frac{12.6}{6}\sqrt{\frac{101-36}{101-1}} = 1.69$$
If, in addition, the sample mean is taken to be 60, the 95% confidence interval for the
population mean (sampling with replacement) is
$$\mu = \bar{X} \pm Z_\alpha\,\text{SE}(\bar{X}) = 60 \pm 1.96 \times \frac{12.6}{\sqrt{36}} = 60 \pm 1.96 \times 2.1 = 60 \pm 4.116$$
∴ Upper limit of the population mean at the 95% confidence level = 64.116
Lower limit of the population mean at the 95% confidence level = 55.884
Example 6:
From a population of 540, a sample of 60 individuals is taken. From
this sample, the mean is found to be 6.2 and the standard deviation
1.368.
a. Find the estimated standard error of the mean.
b. Construct a 96% confidence interval for the mean.
Solution:
Given, N = 540, n = 60
When N = ∞,
