Sampling & Sampling Distribution

Uploaded by

rafatanha804
SAMPLING & SAMPLING DISTRIBUTIONS

Dr. M. Musa Khan


Associate Professor
DEB, IIUC
Population: All possible units or items specified by certain characteristics
under the targeted study area constitute a population.
Generally, the population consists of a large number of living or non-living
units under study. The units or objects of the population vary from survey
to survey in the same region or sphere of activity depending on the aims
and objectives of the study.
One should keep in mind that statistical population is not only the human
population which is usually conceived in literature. It is generally a group
or collection of items specified by certain characteristics or defined under
certain restrictions.
However, the statistical population can be classified into four major
categories. These are (i) Finite population, (ii) Infinite population, (iii) Real
population, (iv) Hypothetical population.
(i) Finite population: If the number of items or units constituting the
population is fixed and limited, it is known as finite population. This
type of population usually consists of existing units. For example, the
workers in a factory, students in a college, etc.
(ii) Infinite population: If the number of items or units constituting the
population is not fixed or is infinite, it is called an infinite population. The
population of stars in the sky, the population of fish in a sea, the
lifetimes of bulbs, etc., are some examples.
(iii) Real population: A population consisting of items which are all
present physically is termed a real population. The population of
products of a factory for a particular time, the population of employees
of a garment factory, the population of inhabitants in a specific area, etc., are
some examples of real populations.
(iv) Hypothetical population: A population consisting of units which
have not happened, but are likely to happen, is called a hypothetical
population. These types of populations usually result from repeated
trials. For example, the population of outcomes resulting from tossing a
coin repeatedly, the population of outcomes resulting from rolling a die
again and again, etc.

15.2. Complete Enumeration or Census Method


The reasons why a census must be used are:
i) When every item or unit of the population is required to be
considered in the study,
ii) When extreme accuracy of the results of the study is needed,
iii) When a crucial decision will have to be made on the basis of the results
obtained from the study,
iv) Moreover, if the population size is small and finite, it is easy to
enumerate all units of the population.
Census: If the relevant data/information is collected from each and every unit of the
targeted population under enquiry, the method is called a census.

Population census, agriculture census and animal census are some examples
where a census or complete enumeration has been done. The Bangladesh
census is done every ten years. The first census in Bangladesh was done in 1974,
although it was supposed to be done in 1971.

Advantages of census: In spite of the limitations of
conducting a census, one may get some advantages if it is possible to
conduct it. We have already mentioned the situations
in which one should perform a census for an enquiry. From that point of view,
the two most important advantages of a census are (i) detailed information can
be obtained for each and every unit of the population and (ii) greater
accuracy can be obtained by a census than by a sample survey.
On the other hand, considering the extremely large effort, money, time and
destructiveness (as in checking the lifetime of tube lights) involved in
carrying out a complete enumeration, the idea of collecting information by
the census method may have to be dropped. In the tube light case, the choice left to the researcher is
to check the lifetimes by collecting a random sample from the lot,
which is known as the sampling method. Thus, unless the information is
required for all units in the domain of study, one can resort to the
sampling method to obtain information about, or to study, the
population.

Sampling
Sampling is the process of selecting individual units of a population,
starting from the formulation of the objective of the study and ending with the collection
of individual units using an appropriate technique.
The intermediate steps include selecting the target population and sampling
units, designing a sampling frame and determining an appropriate sample
size. A sampling frame is the complete list of sampling units from which
the sample is to be selected. Examples of sampling frames are the telephone
directories, electoral rolls, list of books in a library, list of students enrolled
in a university, list of schools and colleges in a country, list of the employees
working in a firm, list of workers in a garments factory, etc. Sometimes
these lists are in existence and can be readily obtained from the respective
authority. Sometimes these have to be prepared at an extra cost before
selection of units is done. The effectiveness of sampling mainly depends on the
construction of an appropriate sampling frame.
The term sampling refers to the process of collecting a sample from a
population. This term is sometimes used as a synonym of sample survey,
which means studying the characteristics of a population through a sample.
Sample: A sample is a representative part of a population.
Sampling frame: A sampling frame is the complete list of all sampling units
of the targeted population. It is necessary to prepare a sampling frame before
sampling is done.
Sampling: Sampling is defined as the total process involved in collecting a
sample from a target population for a particular study.
Purpose of sampling: A sample is not studied for its own sake. The basic
objective of its study is to draw inference about the population. In other
words, sampling is only the tool which helps us to know the characteristics
of the universe or population by examining only a small portion of it. Such
values or characteristics obtained from the study of a sample are called
statistics (statistic in singular), while their counterparts in respective
population are called parameters.
Principles of Sampling: The following are two important principles which
determine the possibility of arriving at a valid statistical inference about the
features of a population or process:
i) Principle of statistical regularity: The principle is based on the
mathematical theory of probability. According to King, ‘The law of statistical
regularity lays down that a moderately large number of items chosen at
random from a large group are almost sure on the average to possess the
characteristics of large group’. This principle implies that a sample
taken at random from a population of interest is likely to possess the features of the
parent population. A random sample is one in which items are
chosen from a population in such a way that each item has an equal
probability of being selected in the sample. When the term random sample
is used without any specification, it usually refers to a simple random
sample. Such a sample would be representative of the population, and only
this type of sample would provide fairly accurate characteristics of the
population. For example, to understand the book buying habits of the
students of a university, instead of approaching all students, the investigator can
talk to a randomly selected group of students to draw an inference about
all students in the university.
ii) Principle of ‘inertia of large numbers’: This principle is a corollary of the
principle of statistical regularity and plays a significant role in sampling
theory. This principle states that, under similar conditions, as the sample size
increases, the statistical results are likely to be more accurate and stable. For
example, if a fair coin is tossed a small number of times, we may not get
equal numbers of heads and tails, as we expect, but if it is tossed a large
number of times, then the chance of the relative frequencies of heads
and tails being nearly equal would be very high; that is, the results would be
very near to 50% heads and 50% tails. Similarly, if it is intended to study the
variation in the hourly production of a machine, say in a readymade garments factory,
over a number of years and data are randomly collected for ten or twelve
hours only, the results would reflect large variation in production due to
deterministic or non-deterministic causes. However, if the data for
production are collected for a large number of hours, say 500, it is quite
likely that only a little variation in the aggregate would be found. This does not
mean that the production would remain constant for all hours; rather it
implies that the changes in production of the individual hours will
counterbalance one another and reflect small variation in production for the factory
as a whole.
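The coin-tossing illustration can be checked with a small simulation. The following is only a sketch: the seed and the numbers of tosses are arbitrary choices made for the illustration and do not come from the text.

```python
import random

def head_proportion(n_tosses, rng):
    """Return the fraction of heads in n_tosses simulated fair-coin tosses."""
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

rng = random.Random(2024)  # fixed seed (arbitrary) so that the run is repeatable
proportions = {n: head_proportion(n, rng) for n in (10, 100, 10_000, 100_000)}

for n, p in proportions.items():
    print(f"{n:>7} tosses: proportion of heads = {p:.4f}")
```

With few tosses the proportion can be far from 0.5; with many tosses it is expected to settle close to 0.5, which is what the principle describes.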
Advantages of Sampling. The advantages of sampling are:
i) Reduced cost
ii) Greater speed
iii) Greater scope
iv) Greater accuracy
Reduced cost: When data are collected only from a small fraction of the
entire population, expenditure is smaller than if the entire group is studied.
Greater speed: The volume of data to be collected will obviously be
smaller than for the whole population. Hence it can be collected, tabulated and
summarized more quickly with a sample than with the total
population. In applied research where urgent answers to certain
problems are needed, the speed of sampling takes on
additional importance.
Greater scope: In studies where a complete enumeration or census of all
units of the population is impracticable and the research requires the use of
highly trained personnel or specialized equipment, the choice may be
between collecting the information by sampling and abandoning the
research itself. In this case, surveys using sampling provide greater
flexibility and scope. Sampling is also obligatory where the
population is infinite or countably infinite.
Greater Accuracy: With the reduction in the volume of work, personnel of
higher expertise and training can be employed and a more careful
supervision of the field work and processing of the data are possible. Hence
sampling may produce results which are more accurate than those which
could have been obtained through a complete census. Moreover, sampling
is particularly more important in obtaining accurate results about
phenomena which are undergoing rapid changes such as opinions about
political and social issues, the effect of a price hike in the market, etc.
Census vs. Sample survey: The difference between a census and a sample
survey is tabulated below:
No. | Census | Sample survey
1 | It is a study which considers all units of the population. | It is a study which considers a part of the units of the population.
2 | It is useful when the population size is small and finite. | It is useful when the population size is large and/or infinite.
3 | It is more expensive and more time consuming. | It is less expensive and less time consuming.
4 | If the study is performed with trained personnel, the results obtained from a census may be more accurate and adequate. | Even if the study is performed with trained personnel, the results obtained from a sample survey may not be accurate and adequate.
5 | There is a possibility of occurrence of only non-sampling errors, if any. | There is a possibility of occurrence of both sampling and non-sampling errors.

Requirements of a good sample. If information from sample data is to
be generalized to a population, it is essential that the sample should be
representative of that population. A representative sample would be a
miniature, in all respects, of the population from which it has been drawn. An
adequate sample is one that contains enough cases to ensure reliable results.
The basic requirements of a good sample are (i) it should be
representative, and (ii) it should be adequate, or of sufficient size to allow
confidence in the stability of its characteristics.

Methods of Sampling
When a sample is to be selected from a population, it is necessary
to decide which method of sampling should be applied. The various
methods of sampling or sampling designs can be grouped under the heads
random or probabilistic sampling, non-random or non-probabilistic
sampling and mixed sampling. If the sampling process is random, the laws
of probability can be applied, and the sampling distribution
can be used to interpret and evaluate the sample. A non-random sample is
selected on a basis other than probability considerations, such as expert
judgment, convenience or some other criterion. The common methods of
sampling are as follows:
(a) Random or probabilistic sampling methods
i. Simple random sampling
ii. Stratified sampling
(b) Mixed sampling method
iii. Systematic sampling
(c) Non-random sampling or non-probabilistic sampling methods
iv. Quota sampling
v. Judgment sampling
vi. Convenience sampling
vii. Snowball sampling
Brief descriptions of some of the sampling methods are provided below:
Simple random Sampling: Simple random sampling refers to the
sampling technique in which each and every item of the population is given
an equal chance of being included in the sample. Thus simple random
sampling is a method of selecting n units out of a population of size N by
assigning equal probability to all units, or a sampling procedure in which
all possible combinations of n units that may be formed from the population
of size N have the same chance of being a sample. That’s why it is also
sometimes referred to as unrestricted random sampling.
This method is appropriate when the population size is not too large and
population units are homogeneous with respect to the characteristics of
interest. This method is very easy to use.
If a unit is selected and noted and then returned to the population before
the next drawing is made and this procedure is repeated n times, it gives
rise to a simple random sample of n units and this procedure is called a
simple random sampling with replacement. On the contrary, if this procedure
is repeated until n distinct units are selected and all repetitions are ignored,
then the procedure is called a simple random sampling without replacement.
For example, suppose there are 500 students in a class. We have to draw a
sample of size 50. Let us assume that the gender of the students will not
affect the objective of the study. The following steps are taken to draw a
random sample of size 50:
i) First, collect the list of the students from the academic office of the
school.
ii) Assign a three-digit number, starting from 001, to each
successive student. Suppose the numbers are 001 to 500.
iii) Then draw 50 numbers following any column or row of a random
number table available in any book on sampling. Usually the
repetition of any random number is not allowed, i.e., the sample is
drawn without replacement and no student will appear in the
sample twice.
iv) Select the students corresponding to the numbers obtained from the
random number table. These 50 students will constitute the sample
of size 50.
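The four steps above can be sketched in Python, with the standard library's random generator standing in for the printed random number table. The student IDs are hypothetical labels created for the illustration, and the seed is arbitrary.

```python
import random

N, n = 500, 50

# Step ii: label the students 001 to 500 (hypothetical IDs standing in for the class list).
student_ids = [f"{i:03d}" for i in range(1, N + 1)]

rng = random.Random(7)  # fixed seed only so that the illustration is repeatable

# Steps iii and iv: sample() draws without replacement, so no student appears twice.
sample = rng.sample(student_ids, n)

print(sorted(sample))
```

Because `sample()` never returns the same element twice, the result corresponds to simple random sampling without replacement.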
Methods of obtaining a simple random sample: To ensure the randomness
of selection, the following methods are adopted for collection of the data by
simple random sampling method.
Lottery Method: This is a very popular method of taking a random sample.
Under this method, all items of the population are numbered or named on
separate sheets of paper of identical size, colour and shape. The sheets are
then folded and mixed up in an urn. A blindfold selection is then made of
the number of sheets required to constitute the desired size of sample. The
selection of items thus depends entirely on chance. For example, if we want
to take a sample of twelve dealers from the population of 300 dealers of a
company, it is required to write the names of dealers on 300 separate papers
of same size, shape and colour, fold these papers, mix them thoroughly and
then make a blindfold selection of 12 papers, which would provide the
required sample of twelve dealers.
Random number table method: The lottery method is quite burdensome if
the size of the population is large. An alternative and more efficient method
of drawing a simple random sample is to use a table of random
numbers. Tables of random numbers have been prepared by Kendall
and Smith, Fisher and Yates, and Tippett, and constitute a very convenient
and objective method of random selection. Tippett's table consists of 41600
digits taken from census reports and combined by fours to give 10400 four-figure
numbers. From the members of the population already numbered from 1
to N, the required number of units is selected from one of these tables in
any convenient and systematic way. These tables are so prepared that all the
ten digits from 0 to 9 have an equal chance of being selected. If we examine
these tables, we will see that whether we go down a column or across a row,
there is no distinct pattern. Many modern tables are computer generated and,
for practical purposes, can be treated as random.
In random number sampling each element of the population is assigned a
number; for example, for a population of size 400, the numbers 001, 002,
..., 400 are usually assigned. Once this has been done, one can use the
tables for random sampling. Although tables of random numbers are
available in most books on statistics or sampling theory, for the sake
of explanation, a sample of random numbers is provided below:
3905 9796
0946 9133
0106 6465
1840 9779
7056 3015
9736 5661
9915 5686
5614 7123
5477 6629
5701 8733

Let us illustrate the method with an example. Suppose we want to obtain a sample of
10 students from 400 students who have already been assigned numbers
from 001 to 400. Using the first three digit columns (because our population size
is a three-digit number), we get the following usable three-digit numbers: 390, 094, 010, 184.
In the first column only these four numbers are within the range
001-400; the others are ignored as they do not lie in the range 001 to 400. When
the end of the table is reached, one can start again at the top with the next
three unused digits along the top row (597 in this case). Once we have
selected our sample of 10 numbers, we go to the population and
select the corresponding students. Although we started at the top of the table
and read downwards, it is better to start at a randomly selected place in the
table, and any direction can be used for the selection of a random sample.
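The reading procedure described above can be traced in code. This is a sketch under two assumptions: the two four-digit blocks of each row are read as one run of eight digits, and `read_table` is a helper name made up for the illustration.

```python
# The ten rows of the random number extract, with the two four-digit blocks joined.
rows = ["39059796", "09469133", "01066465", "18409779", "70563015",
        "97365661", "99155686", "56147123", "54776629", "57018733"]

def read_table(rows, population_size, width=3):
    """Read width-digit numbers down the rows, then move right by width digits."""
    selected = []
    for start in range(0, len(rows[0]) - width + 1, width):
        for row in rows:
            number = int(row[start:start + width])
            # Keep numbers in 1..population_size and skip repeats (no replacement).
            if 1 <= number <= population_size and number not in selected:
                selected.append(number)
    return selected

selected = read_table(rows, population_size=400)
print(selected)  # [390, 94, 10, 184, 97, 187]
```

The first pass gives 390, 094, 010 and 184; the second pass, starting with 597 on the top row, adds 097 and 187. This short extract therefore yields only six usable numbers, so a full table would be needed to complete a sample of 10.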
Sampling without replacement: When all the sampling units are
considered distinct from one another and a unit that has been drawn is not put back
in the population before another unit is drawn, the procedure is known as
sampling without replacement. In this case, the same unit cannot appear
more than once, and the size of the population does not remain
the same after each draw. If there are N different units, n units can be
selected from these N units one by one without replacement in $^{N}P_{n}$ ways if
the order of the units is important. On the other hand, n units can be
selected from these N units one by one
without replacement in $^{N}C_{n}$ ways if the order is not important, which is
equivalent to the selection of n units at a time from N units. That is, if there
are N distinct units in the population and a sample of size n is to be drawn,
the number of distinct samples of size n that can be drawn from the N units
is given by the combinatorial formula
N N!
N Cn or   =
 n  n! (N − n)!
Suppose a population contains 4 units denoted by A, B, C and D. If a sample
of size 3 is drawn from this population, there will be $\binom{N}{n} = \binom{4}{3} = 4$ distinct
samples, which are: ABC, ABD, ACD, BCD.
Here, the same letter is not repeated in any sample and the order of
occurrence of the letters has been ignored. For example, ABC and ACB are
identical, and so only one of them is counted as a sample.
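The counts for the four-unit population can be verified by listing the samples directly. The sketch below uses the standard library's `itertools` and `math` functions; the variable names are chosen for the illustration.

```python
from itertools import combinations, permutations
from math import comb, perm

units = ["A", "B", "C", "D"]
n = 3

unordered = ["".join(c) for c in combinations(units, n)]  # order ignored
ordered = ["".join(p) for p in permutations(units, n)]    # order matters

print(unordered)                   # ['ABC', 'ABD', 'ACD', 'BCD']
print(len(unordered), comb(4, 3))  # 4 4
print(len(ordered), perm(4, 3))    # 24 24
```

Each unordered sample corresponds to 3! = 6 ordered ones, which is why the 24 ordered selections reduce to 4 distinct samples.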
The probability of each drawing in the case of sampling without replacement is
shown in the following table.
Table. Probabilities of different elements in sampling without replacement.
Position of element in the sample | Number of possible choices | Probability
1st | N | 1/N
2nd | N-1 | 1/(N-1)
3rd | N-2 | 1/(N-2)
... | ... | ...
(n-1)th | N-n+2 | 1/(N-n+2)
nth | N-n+1 | 1/(N-n+1)

Two important things need to be noted in sampling without replacement.
Firstly, with the drawing of every unit, the possible number of choices
decreases by one, implying that at every draw we are
sampling from a population of a different size. Secondly, the probability of
selecting any particular remaining unit changes from draw to draw, although it is known. In a population of 100,
the first unit to be drawn has 100 choices with a probability of 1/100; the
second unit to be drawn has 99 choices with a probability of 1/99, and so
on.
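The draw-by-draw probabilities can be multiplied together to link them with the counting formulas given earlier. In this sketch N = 100 follows the text, while n = 3 is an assumed sample size chosen only for the illustration.

```python
from fractions import Fraction
from math import comb, factorial, perm

N, n = 100, 3  # N as in the text; n = 3 is an assumed sample size

# Probability of picking one specific remaining unit at each successive draw.
draw_probs = [Fraction(1, N - k) for k in range(n)]

p_ordered = Fraction(1)
for p in draw_probs:
    p_ordered *= p                      # chance of one particular ordered sample

p_unordered = p_ordered * factorial(n)  # n! orderings give the same set of units

print(draw_probs)                             # [Fraction(1, 100), Fraction(1, 99), Fraction(1, 98)]
print(p_ordered == Fraction(1, perm(N, n)))   # True
print(p_unordered == Fraction(1, comb(N, n))) # True
```

So although the per-draw probabilities differ, every possible sample of n units ends up with the same overall chance of being selected.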
Sampling with Replacement: Random sampling in which a unit
that has been drawn is put back or replaced, and can be drawn again, is
called sampling with replacement or unrestricted sampling. In this case,
the same unit of the population can appear in the sample
more than once. If there are N distinct units in the population and a sample
of size n is to be drawn, the total number of samples will be $N^{n}$; however,
the number of distinct samples of size n that can be drawn from the N units
is given by the combinatorial formula:
$$\frac{(N+n-1)!}{n!\,(N-1)!}$$

For example, suppose a sample of size 3 is to be drawn from the population
of four units A, B, C and D using the method of sampling with replacement.
In this case, the number of possible samples will be
$$\frac{(4+3-1)!}{3!\,(4-1)!} = \frac{6!}{3! \times 3!} = \frac{720}{6 \times 6} = 20$$
with no two groups being identical, which are listed below:
AAA BBB CCC DDD AAB AAC AAD ABB BBC BBD
ACC BCC CCD ADD BDD CDD ABC ABD ACD BCD

In both cases, samples are selected using Random Number Table.


However, if we consider all possible occurrences of samples (identical
groups with the sampling units drawn in different orders), there will be $4^{3}$ =
64 total samples. For example, AAB can occur in three possible ways, AAB,
ABA, BAA; similarly ABC can occur in six possible ways, namely ABC, ACB,
BCA, BAC, CAB, CBA; and all such arrangements will constitute the 64 (4
for the same sampling units, 12 × 3 = 36 for two same units and one
different unit, 4 × 6 = 24 for all distinct sampling units) possible samples.
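The figures 64, 20 and the split 4 + 36 + 24 can be checked by enumeration. The sketch below is an illustration using the standard library; the classification by number of different units is a convenience introduced here.

```python
from collections import Counter
from itertools import combinations_with_replacement, product
from math import comb

units = "ABCD"
n = 3

ordered = list(product(units, repeat=n))                  # N**n ordered samples
distinct = list(combinations_with_replacement(units, n))  # order ignored

# Classify each ordered sample by how many different units it contains.
pattern = Counter(len(set(s)) for s in ordered)

print(len(ordered))                        # 64
print(len(distinct), comb(4 + 3 - 1, 3))   # 20 20
print(pattern[1], pattern[2], pattern[3])  # 4 36 24
```

Here `pattern[1]` counts samples such as AAA, `pattern[2]` counts arrangements such as AAB, ABA and BAA, and `pattern[3]` counts arrangements of three different units.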
Stratified Random Sampling: This is a form of random sampling in
which the population is divided into groups or categories that are mutually
exclusive so that no individual or item can belong to two groups. The
groups are formed on the principle that within a group the items are
homogeneous with respect to some criterion (for example, sex, age, religion,
income level, etc.) and between the groups they are heterogeneous. These
groups are called strata (singular: stratum). Within each of these strata a
simple random sample is selected. If the same proportion of each stratum is
taken, then each stratum will be represented in the correct proportion in the
overall result.
In stratified sampling, the population of N units is sub-divided into k
sub-populations, called strata, the ith sub-population having $N_i$ units (i = 1, 2, ...,
k). These sub-populations are non-overlapping and together they comprise the
whole population, such that
$N_1 + N_2 + \dots + N_k = N$
In order to draw a sample of size n from the whole population, a sample of
size $n_i$ is drawn from the ith stratum such that $n_1 + n_2 + \dots + n_k = n$. Hence, the
procedure of taking a simple random sample from every stratum separately is
known as stratified sampling.
In the case of proportional allocation, the sample taken from each stratum is
proportional to the stratum size, i.e. $n_i \propto N_i$.
For example, suppose a large class of 500 students contains 300
male and 200 female students. So, there are two strata, which are
mutually exclusive. In order to take a sample of 50 students, we can allot the
sample to each stratum in proportion to the stratum size. Thus,
$\frac{50}{500} \times 300 = 30$ male students out of 300 and
$\frac{50}{500} \times 200 = 20$ female students out of 200 will represent
the students of each gender proportionally, and we can be sure of getting a
balanced sample of male and female students. This type of procedure is
called stratified sampling.
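The proportional allocation in this example can be sketched as follows. The student IDs are hypothetical, the seed is arbitrary, and integer division is used only because the allocation divides exactly here; in general the stratum sample sizes would need rounding.

```python
import random

strata = {
    "male": [f"M{i:03d}" for i in range(1, 301)],    # 300 hypothetical IDs
    "female": [f"F{i:03d}" for i in range(1, 201)],  # 200 hypothetical IDs
}
N = sum(len(units) for units in strata.values())
n = 50

rng = random.Random(11)  # fixed seed only for a repeatable illustration

# Proportional allocation: n_i = n * N_i / N (exact here, so // loses nothing).
allocation = {name: n * len(units) // N for name, units in strata.items()}

# A simple random sample without replacement is drawn inside each stratum.
sample = {name: rng.sample(units, allocation[name]) for name, units in strata.items()}

print(allocation)  # {'male': 30, 'female': 20}
```

The design choice is that randomness operates only within each stratum, while the split between strata is fixed in advance by the allocation.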
Stratified sampling procedure is appropriate when
i) The population is large and heterogeneous with respect to some criteria, and it is
possible to divide the whole population into some mutually
exclusive groups.
ii) It is required to estimate the characteristics for different strata
separately.
The advantages of a stratified random sampling are as follows:
i) Stratification ensures adequate representation of various groups of
population, which may be of some interest or importance,
ii) Stratification also ensures selection of a better cross-section of
population than that under un-stratified sampling,
iii) Stratification brings a gain in precision in the estimation of a
characteristic of a population when there are clear strata present,
because, as a result of stratification, the variances within the strata will be as
small as possible,
iv) Estimates of population characteristics for different strata can be
obtained separately by this sampling procedure.
However, this method has some inherent disadvantages as well. For
instance,
i) If the strata are not clearly defined or if the stratification cannot be
done properly, this procedure may lead to inaccurate results, and
ii) This method is not suitable if population size is small.
Systematic Sampling: Instead of going through the laborious process of
choosing a sample randomly from a list that is not necessarily assumed to be
random, it would be much simpler if we could select only the first unit
randomly with the help of random numbers, with the rest of the units
selected automatically or systematically according to some pre-designed
pattern. This type of sampling is known as systematic sampling. In
this case, the sample is selected at regular intervals from an ordered list of
sampling units. In order to select a sample of size n from a population of
size N, let N = nk, where n is the number of groups and k is the number of
items in each group. In this method the first unit is selected randomly from the
first k units listed in the sampling frame. Let this be the rth unit of the
first group of k units; then the rth unit is selected from each of the
subsequent n-1 groups, so that a sample of size n is obtained, i.e. the (k + r)th
item will be the second member of the sample, the (2k + r)th item will be the third
member of the sample, and so on.
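The selection rule r, k + r, 2k + r, ... can be sketched as below. The frame of 500 numbered units and the function name `systematic_sample` are assumptions made for the illustration, and the sketch assumes N = nk exactly, as in the text.

```python
import random

def systematic_sample(frame, n, rng):
    """Pick a random start r in the first k units, then every k-th unit after it."""
    k = len(frame) // n    # sampling interval; assumes N = n * k exactly
    r = rng.randint(1, k)  # random start, 1-based position within the first group
    return [frame[r - 1 + j * k] for j in range(n)]

frame = list(range(1, 501))  # hypothetical ordered frame of N = 500 units
rng = random.Random(3)       # fixed seed only for a repeatable illustration
sample = systematic_sample(frame, n=50, rng=rng)

print(sample[:5])
```

Only the starting position is random; once r is chosen, every other member of the sample is determined by the interval k.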
The procedure of systematic selection is easier and more convenient than
simple random sampling. It provides a more even spread of the sample over
the population list and hence can lead to greater precision. The dependence
or linkage of one member of the sample on the previous one makes the
process different from the simple random sampling method, in which the selection
of every member is independent of the others. That is why the method is
sometimes termed quasi-random sampling or mixed sampling.
This method of sampling is appropriate when the population is too large for
simple random sampling, or if a quick sample is to be selected and an equal chance
of selection for all units is not essential. It is especially
useful for populations with a more or less definite periodic trend, for
example weekly sales, 12-monthly rainfall, quarterly remittance, etc.
The main advantages of this sampling are its simplicity of selection, operational
convenience and even spread of the sample over the population. A second
advantage is that, because of the simplicity of drawing the sample, it is very
useful for large samples.
The serious disadvantage of systematic sampling lies in its use with
populations having unforeseen periodicity, which may contribute substantial
bias to the estimate of the parameter; likewise, if the list itself is biased,
serious error may arise in estimation. Again, it does not provide a strictly
random sample; the sample is random only if the ordered list of the population is itself in
random order.
Quota Sampling: The chief characteristic of simple, stratified and
systematic sampling is that a known probability is associated with the
selection of every individual of the sample; that is, the sample is
random or quasi-random. Sometimes non-random sampling methods are
also used when it is not possible to use random sampling, particularly
when the whole population is not known.
Quota sampling is an example of non-probability sampling. It involves the
selection of sample units within each group or quota, on the basis of the
judgment of interviewers rather than on calculable chance of being included
in it. Interviewer is given considerable freedom in choosing the individual
cases. Quota sampling is a method in which an interviewer is instructed to
interview a certain number of respondents with specific characteristics. The
quotas are selected before sampling takes place and they are chosen so that
they reflect the known population characteristics. Age, sex and social class
are the three universally used quota controls.
It is useful when the number of sampling units is pre-fixed for groups of
population of the same characteristics.
This method is extensively used in opinion survey, for example product
satisfaction opinion, polling opinion, etc. Suppose, a company wants to
know the customers’ opinion regarding the quality of their product, and
decides to take opinion about quality from 100 female and male consumers
of apparently young and old age as per the following table:
Age group | Sex | Number
Young (age group 20-40 years) | Male | 25
Young (age group 20-40 years) | Female | 10
Old (age group above 40 years) | Male | 40
Old (age group above 40 years) | Female | 25

The number corresponding to each age/sex group is the quota for the
respective group.
The investigator can collect the opinions just by asking an indefinite number
of customers one by one while standing at the exit of a market, or by a
house-to-house survey. In this case, the investigator will start asking the customers
coming out of the market after shopping, regardless of age and sex. Once
the investigator finishes taking information from 25 young male customers,
he or she will not seek the opinion of any more young male customers.
To reach this quota the investigator may have to ask an indefinite number of young male customers,
because some of the customers may not use the particular product or some
may not respond properly. Similarly, the investigator needs to ask
an indefinite number of young female customers to fill the
quota of 10 females. The same procedure is to be followed for the other
groups of customers.
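The quota-filling procedure can be imitated with a small simulation. Everything about the shopper stream here is an assumption made for the illustration: each age/sex group is taken as equally likely to walk out of the market, and 70% of shoppers are assumed able to give an opinion. Only the four quotas come from the table.

```python
import random

quotas = {("young", "male"): 25, ("young", "female"): 10,
          ("old", "male"): 40, ("old", "female"): 25}
filled = {group: 0 for group in quotas}
approached = 0

rng = random.Random(5)  # fixed seed only for a repeatable illustration

while filled != quotas:
    approached += 1
    # Hypothetical stream of shoppers: each age/sex group equally likely at the exit.
    group = (rng.choice(["young", "old"]), rng.choice(["male", "female"]))
    uses_product = rng.random() < 0.7  # assumed share of shoppers who can answer
    if uses_product and filled[group] < quotas[group]:
        filled[group] += 1             # interview counts toward this group's quota

print(filled, "after approaching", approached, "shoppers")
```

The number of shoppers approached is not fixed in advance and always exceeds 100 in practice, which mirrors the "indefinite number of customers" mentioned in the text.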
Advantages:
i) It enables the fieldwork to be done quickly because a
representative sample can be achieved with a small sample size.
ii) Costs are kept to a minimum level.
iii) Administration is relatively easy.
Disadvantages:
i) It is not possible to estimate the sampling error, because the
process is not a random process.
ii) The interviewer has to choose the respondents and may not be able
to judge their characteristics easily.
iii) Non-responses are not recorded in this method.
iv) The process does not allow for easy supervision of the field
worker, hence the correctness of the data collected remains
doubtful.
Judgment Sampling: In this method of sampling, the choice of sample
items depends exclusively on the judgment of investigator. The investigator
exercises his judgment in the choice of sample items and includes those
items in the sample which he thinks are most typical of the population with
regard to the characteristics under investigation. For example, if a sample of
twenty workers is to be selected from a factory having 100 workers for
analyzing their spending habits, the investigator would select twenty
workers, who in his/her opinion represent the factory.
Advantage:
The only advantage of this type of sampling is that it is very easy to select
a sample according to the judgment of the investigator. There is no need
for further query. An investigator can justify the choice of sampling
units in his own way. However, the success of this method depends
completely on the excellence of the judgment.
Disadvantage:
This method has a number of disadvantages; some important
limitations are mentioned below:
i) The method is not at all scientific; hence the results may be
considerably affected by personal prejudice or bias.
ii) This method provides a quick estimation.
iii) This method involves the risk that the investigator may establish
foregone conclusions by including those sampling units which
conform to his preconceived notions.
iv) There is no objective method for determining the sample size or
the likelihood of sampling error, which is considered a big defect of
this method.
Applications:
Although the principles of sampling theory are not applicable to judgment
sampling, this method is often used in solving many types of economic and
business problems, such as :
i) It is used when the sample size is small; in such a case a simple random
sample may miss the more important elements, whereas judgment
selection would certainly include them in the sample.
ii) In solving everyday business problems and making public policy
decisions, executives and public officials are often in a hurry and
cannot wait for probability sampling. In this situation this is the only
practical method.
iii) Judgment sampling may be used to conduct pilot survey. In any case,
the reliability of sample results in judgment sampling depends on the
quality of the sampler’s expert knowledge or judgment.
Convenience Sampling: The method of convenience sampling is also
known as chunk or portion sampling. A chunk is a fraction of a population
taken for investigation because of its convenient availability. Thus a chunk
is selected neither by probability nor by judgment, but by convenience. A
sample obtained from a readily available list, such as telephone directories
or automobile registrations (not the complete list of these), is a convenience
sample and not a random sample, even if the sample is drawn at random
from the list.
Advantage:
Like judgment sampling, this method is also very simple and convenient for
analyzing and obtaining quick results.
Disadvantage:
Since this type of sample is a convenient part of the whole population, it can
hardly be representative of the population; hence one cannot be sure whether
the sampling units included in this type of sample are representative of the
target population.
Applications:
Formerly this method was frequently used in public opinion surveys, when
interviewers stood near a railway station or bus stop, or in front of a
nearby office building, to interview people. Today, accountants
still use this sampling method to analyze or audit accounts. It is also
useful in pilot surveys: questions may be tested and preliminary
information may be obtained from the mass before the final sampling
design is constructed.

Snowball Sampling: The term ‘snowball’ comes from the analogy of a
snowball, which begins small but becomes bigger and bigger as it rolls
downhill. The ‘snowball sampling’ has been used to describe a sampling
procedure in which the sample goes on becoming bigger and bigger as the
observation or study proceeds. For example, an opinion survey is to be
conducted on smokers of a particular brand of tobacco. At the first stage, we
may pick up a few persons who are known to us or can be identified to be
the smokers of that brand. At the time of interviewing them, we may obtain
the names of other persons known to the first stage subject. Thus, the
subjects go on serving as informants for the identification of more subjects
and sample goes on increasing.
Snowball sampling, which is generally considered to be non-probabilistic,
can be converted into probabilistic by selecting the subjects randomly
within each stage. For a non-probabilistic sample, some methods such as
quota sampling can be used at each stage.

Sampling and Non-sampling Errors


Sampling Error: Error or variation among sample statistics due to chance,
that is, the differences between each sample and the population, and
among several samples, which are solely due to the elements we happen to
choose for the sample. Sampling errors arise because samples are
used and because of the particular method used in selecting the items from
the population. Hence, complete enumeration is free of sampling error,
because the whole population is studied and no question of drawing a
sample arises. Sampling errors are of two types: biased and unbiased
errors.
Sampling Error: The error due to drawing inference about the population
on the basis of a sample is termed as sampling error.
Biased error arises due to faulty process of selection, faulty work during the
collection of information and faulty method of analysis. Faulty process of
selection may arise in a number of ways, such as deliberate selection of a
sample, conscious or unconscious fault in the selection of random sample,
substitution, non-response, etc. Faulty work during collection of
information may include – poorly designed questionnaire, ill-trained
interviewer, failure of a respondent’s memory, errors in measurements, etc.
Again, bias in analysis may arise, particularly, due to faulty method of
analysis such as from improper use of statistical measurements, improper
selection of models, etc.
Non-sampling Error: When a complete enumeration of units in the
population is made, one would expect that it would give rise to data free
from any error. Unfortunately, it is not so in practice. For example, it is very
difficult to avoid errors of observation or measurement. Again, in the
processing of data, tabulation errors may be committed, affecting the final
results. Such errors are termed non-sampling errors, because they are
due to factors other than the inductive process of inferring about the
population from a sample. Thus, the data obtained by complete
enumeration, although free from sampling error, are still subject to
non-sampling errors, which can occur at every stage of planning and
execution of the census or survey. This type of error can arise from a
number of causes, such as defective methods of data collection and
tabulation, faulty definition, incomplete coverage of population or sample,
inappropriate questionnaire, etc. Some of the major sources of
non-sampling error are as follows:
i) Data specification being inadequate and inconsistent with respect to
the objective of the study, whatever the study method is, census or
survey.
ii) Omission or duplication of units due to imprecise definition or
boundaries of area units, incomplete or wrong identification of units,
or faulty methods of enumeration.
iii) Defective frame, faulty selection of sampling units. Inaccurate or
inappropriate questionnaire, methods of interview, definition or
instruction may also cause non-sampling error.
iv) Lack of trained and experienced investigators,
v) Lack of adequate inspection and supervision of primary staff
vi) Errors due to non-response, that means, incomplete coverage in
respect of units,
vii) Errors in data processing operations such as coding, punching,
certification, tabulation, etc.
viii) Errors committed during presentation or printing of tabulated results
ix) Errors in scrutiny of primary or basic data

Non-Sampling Error: The possible error which may arise at any stage of
investigation, either in census or in sampling, is termed as non-sampling
error. This type of error arises due to faulty questionnaire, due to non-
response, due to faulty tabulation method, etc.
However, the non-sampling error tends to increase with the sample size,
while sampling error decreases with increase of sample size. In case of
complete enumeration, non-sampling errors and in case of sample survey,
both sampling and non-sampling errors require to be controlled and
reduced to a level at which their presence does not distort the final results.

Sampling Distribution
Parameter: An unknown constant, or any function of such constants, that
appears in the mathematical specification of a population is known as a
parameter. Any numerical quantity calculated from the population data is
also called a parameter.

Statistic: Any function of a random sample is known as statistic.


Any numerical quantity calculated from sample is also called statistic.
Sampling distribution: The probability distribution of a statistic derived
from all possible random samples of a given population is called sampling
distribution.

Concept of Standard Error


Suppose we wish to know about average daily working hours of the
workers of a large industry with the help of sample. It is possible to take
several samples of workers of particular size from the population. If we
calculate average daily working hours for all samples, it would be highly
unlikely that all of these sample means would be the same; some variability
in the sample means would be observed. This variability in the sample
statistics results from sampling error due to chance error in sampling
process. This variability occurs because there are differences between each
sample and the population, and among the selected samples. The standard
deviation of the distribution of sample means measures the extent to which
we expect the means from the different samples to vary because of chance
error in the sampling process. Thus, the standard deviation of a statistic is
called its standard error. For example, standard deviations of sample mean,
sample median, sample proportion etc. are known as the standard errors of
mean, median and proportion, etc.
Standard Error: The positive square root of the variance of a statistic
(sample mean, sample median, sample proportion, etc) is known as the
standard error of the statistic.
Suppose $X_1, X_2, \ldots, X_n$ constitute a random sample of size n from a
normal population with mean $\mu$ and variance $\sigma^2$, and $\bar{X}$
denotes the mean of the random sample. Then the expectation and standard
error of the sample mean are given by $\mu$ and $\sigma/\sqrt{n}$ respectively.
Importance of standard error: Standard error plays an important role in
statistical inference, such as estimation and test of hypothesis. All of the test
statistics are defined based on standard error of the statistic. Its importance
in estimation is cited below:
i) Standard error gives an index of the precision of the estimate of the
parameter.
ii) Standard error enables us to determine the probable limits within
which the population parameter may be expected to lie.

Central Limit Theorem


The central limit theorem states that ‘Regardless of the shape of the
population, the distribution of the sample means approaches the normal
probability distribution as the sample size increases.’ In practice, the sample
size of 30 or more is considered adequate for this purpose. However,
regardless of the sample size, the sampling distribution would be normal, if
the original population is normally distributed.
Statement of Central limit theorem. Let $X_1, X_2, \ldots, X_n$ be a random
sample from a population having mean $\mu$ and variance $\sigma^2$. Let
$\bar{X}$ be the sample mean. Then the central limit theorem states that as n
becomes large, the sampling distribution of

$Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$

approaches the standard normal distribution, whatever the form of the
population distribution may be. That is, $Z \sim N(0, 1)$ approximately for
large n.

It can be seen that the possible values of sample means tend to be close to
the population mean, and according to the central limit theorem, the
distribution of these sample means tend to be approximately normal for a
sample size larger than 30.
Remarks.
1. The central limit theorem holds only if the mean and variance of the
distribution from which the random sample is drawn exist.
2. If the random sample has been drawn from a normal population,
then the sampling distribution of $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
is exactly N(0,1) for any sample size n.
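The theorem can be illustrated with a small simulation. The sketch below is an assumption-laden illustration, not part of the original notes: it draws samples of size 30 from an Exponential(1) population (a strongly skewed distribution with $\mu = 1$ and $\sigma = 1$), using a fixed random seed, and checks that the sample means behave roughly as the theorem predicts.

```python
import random
import statistics

def sample_means(n, replicates, rng):
    """Means of `replicates` samples of size n from an Exponential(1) population."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(replicates)]

rng = random.Random(42)
n = 30
means = sample_means(n, 5000, rng)

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.pstdev(means)

# Exponential(1) has mu = 1 and sigma = 1, so sigma / sqrt(n) is about 0.183.
print(round(mean_of_means, 3), round(sd_of_means, 3), round(1 / n ** 0.5, 3))

# Share of standardized means within +/-1.96; near 0.95 if Z is close to N(0, 1).
z = [(m - 1.0) / (1.0 / n ** 0.5) for m in means]
within = sum(abs(v) <= 1.96 for v in z) / len(z)
print(round(within, 3))
```

The mean of the simulated sample means should lie close to 1, their standard deviation close to $\sigma/\sqrt{n}$, and roughly 95% of the standardized means should fall within ±1.96, even though the parent population is far from normal.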

Some Important Sampling Distributions


The important sampling distributions are
i) $\chi^2$-distribution (Chi-square distribution)
ii) t-distributions
iii) F-distributions
iv) Distribution of sample mean
v) Distribution of difference between two sample means
vi) Distribution of sample proportion.
These distributions play important roles in test of hypothesis. All these
distributions are derived from the normal distribution. Now we shall give a
brief survey of these distributions .

Chi-square ($\chi^2$) distribution: The sum of squares of n independent
standard normal variates is called chi-square with n degrees of freedom.
Let $Z_1, Z_2, \ldots, Z_n$ be n independent standard normal variables; then
chi-square, denoted $\chi^2_n$, is defined as

$\chi^2_n = \sum_{i=1}^{n} Z_i^2 .$

However, if $X_1, X_2, \ldots, X_n$ are n independently and identically
distributed random variables, each of which is normally distributed with
mean $\mu$ and variance $\sigma^2$, then

$\chi^2_n = \sum_{i=1}^{n} \left( \dfrac{X_i - \mu}{\sigma} \right)^2$

is distributed as $\chi^2$ with n df.

The probability density function of $\chi^2$ with n degrees of freedom is

$f(\chi^2) = \dfrac{1}{2^{n/2}\,\Gamma(n/2)}\; e^{-\chi^2/2}\, (\chi^2)^{n/2 - 1}; \quad \chi^2 > 0$
Important Properties of $\chi^2$ distribution
i) The distribution contains only one parameter, which is the degrees
of freedom of the distribution.
ii) The mean of the distribution is n and the variance is 2n.
iii) The mode of the distribution is n-2 (for n ≥ 2).
iv) It is a positively skewed distribution for smaller values of n; the
distribution becomes symmetrical as n tends to infinity.
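Property (ii) can be checked numerically. The sketch below is a rough illustration under stated assumptions: it codes the chi-square density with n degrees of freedom, $f(x) = x^{n/2-1} e^{-x/2} / (2^{n/2}\Gamma(n/2))$, and integrates it with a simple midpoint rule for n = 4. The helper names `chi2_pdf` and `moments`, the upper limit and the step count are choices made for this example.

```python
import math

def chi2_pdf(x, n):
    """Density of the chi-square distribution with n degrees of freedom."""
    return (x ** (n / 2 - 1) * math.exp(-x / 2)) / (2 ** (n / 2) * math.gamma(n / 2))

def moments(n, upper=200.0, steps=200_000):
    """Midpoint-rule approximations of the total probability, mean and variance."""
    h = upper / steps
    total = mean = second = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        w = chi2_pdf(x, n) * h
        total += w
        mean += x * w
        second += x * x * w
    return total, mean, second - mean ** 2

total, mean, var = moments(4)
print(round(total, 4), round(mean, 4), round(var, 4))
```

For n = 4 the integral of the density should come out close to 1, the mean close to n = 4 and the variance close to 2n = 8.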
Applications of Chi-square distribution: The $\chi^2$-distribution has a large
number of applications in statistics, some of which are enumerated below:
i) To test the goodness of fit.
ii) To test the population variance.
iii) To test the independence of two attributes in a contingency table.
iv) To test the homogeneity of several variances.
v) To test the equality of several population correlation coefficients.
vi) To test the equality of several proportions.
Student’s t-Distribution: Let $X_1, X_2, \ldots, X_n$ be a random sample from a
normal distribution with mean $\mu$ and variance $\sigma^2$; then $\bar{x}$ is
normally distributed with mean $\mu$ and variance $\sigma^2/n$. Now, if the
estimators of $\mu$ and $\sigma^2$ are given by

$\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$ and $s^2 = \dfrac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$

respectively, then the statistic t is defined as

$t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$

which follows Student’s t distribution with n-1 degrees of freedom (df).
A continuous random variable t is said to have a t-distribution with n df if
its probability density function is given by

$f(t) = \dfrac{1}{\sqrt{n}\, B(1/2,\, n/2)} \left(1 + \dfrac{t^2}{n}\right)^{-(n+1)/2}; \quad -\infty < t < \infty$


Properties of t distribution.
i) The distribution has only one parameter which is the degree of
freedom of the distribution.
ii) The distribution is symmetric about mean zero, its variance is
n/(n-2) for n > 2, and all odd-order moments of the t-distribution
are zero.
iii) Since the distribution is symmetric about t = 0, the mean,
median and mode are all zero.
iv) As the degrees of freedom increase, the t-distribution tends to the
normal distribution. In practice, the t-distribution is closely
approximated by the normal distribution when n > 30.
Application of t-distribution: The t-distribution is used in the following
cases:
i. To test the mean of a population when the variance is unknown and
the sample size is less than 30.
ii. To test the equality of two population means when the variances are
equal but unknown and the sample sizes are small.
iii. To test whether the population correlation coefficient $\rho = 0$.
iv. To test the population regression coefficient β = 0 or β = β0.
v. To test the equality of two independent regression coefficients.

Remarks. t-tests are called small-sample tests (n < 30).

F-Distribution: Let $X_{11}, X_{12}, \ldots, X_{1n_1}$ be a random sample of size
$n_1$ drawn from a normal population with mean $\mu_1$ and variance
$\sigma_1^2$, and $X_{21}, X_{22}, \ldots, X_{2n_2}$ be another random sample of
size $n_2$ drawn from a normal population with mean $\mu_2$ and variance
$\sigma_2^2$. Let $\bar{x}_1$ and $s_1^2$ be the estimators of $\mu_1$ and
$\sigma_1^2$, and $\bar{x}_2$ and $s_2^2$ the estimators of $\mu_2$ and
$\sigma_2^2$ respectively, defined by

$\bar{x}_1 = \dfrac{1}{n_1}\sum_{i=1}^{n_1} x_{1i}, \quad s_1^2 = \dfrac{1}{n_1 - 1}\sum_{i=1}^{n_1} (x_{1i} - \bar{x}_1)^2,$

$\bar{x}_2 = \dfrac{1}{n_2}\sum_{j=1}^{n_2} x_{2j}, \quad s_2^2 = \dfrac{1}{n_2 - 1}\sum_{j=1}^{n_2} (x_{2j} - \bar{x}_2)^2 .$

Thus, $\chi_1^2 = \dfrac{(n_1 - 1)s_1^2}{\sigma_1^2}$ is a $\chi^2$-variate with $(n_1 - 1)$ df
and $\chi_2^2 = \dfrac{(n_2 - 1)s_2^2}{\sigma_2^2}$ is a $\chi^2$-variate with $(n_2 - 1)$ df.
Since the two samples are independent, these $\chi^2$-variates are also
independent. The ratio of two independent chi-squares, each divided by
its respective degrees of freedom, is called an F-variate and is defined as

$F = \dfrac{\chi_1^2/(n_1 - 1)}{\chi_2^2/(n_2 - 1)} = \dfrac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2},$

which reduces to $s_1^2/s_2^2$ when $\sigma_1^2 = \sigma_2^2$, and which
follows Snedecor’s F distribution with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$
degrees of freedom.

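The density function of F with $\nu_1$ and $\nu_2$ df is given by

$f(F) = \dfrac{\left(\dfrac{\nu_1}{\nu_2}\right)^{\nu_1/2} F^{\,\nu_1/2 - 1}}{B\!\left(\dfrac{\nu_1}{2}, \dfrac{\nu_2}{2}\right)\left(1 + \dfrac{\nu_1}{\nu_2}F\right)^{(\nu_1 + \nu_2)/2}}; \quad F \ge 0$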
Remark. The sampling distribution of F-statistic does not involve any
population parameters and depends only on the degrees of freedom.

Properties of F-distribution
i) The distribution contains two parameters which are the degrees of
freedom of the distribution.
ii) The mean and variance of the F-distribution are
Mean $= \dfrac{\nu_2}{\nu_2 - 2}$; $\nu_2 > 2$, and
Variance $= \mathrm{var}(F) = \dfrac{2\nu_2^2(\nu_1 + \nu_2 - 2)}{\nu_1(\nu_2 - 2)^2(\nu_2 - 4)}$; $\nu_2 > 4$.
iii) The mode of the distribution is Mode $= \dfrac{\nu_2(\nu_1 - 2)}{\nu_1(\nu_2 + 2)}$, which
is always less than unity. The mode exists if $\nu_1 > 2$.
iv) The distribution is positively skewed.

Applications of F-distribution. The F-distribution is used in the following
cases:
i) To test the equality of several population means.
ii) To test the equality of two variances.
iii) To test the equality of several regression coefficients.
iv) To test the equality of multiple correlation coefficient.
v) It is widely used in analysis of variances in design of experiments.

Sampling distribution of sample mean: Let $X_1, X_2, \ldots, X_n$ be a
random sample from a population with mean $\mu$ and variance $\sigma^2$.
Then the sample mean is $\bar{X} = \dfrac{1}{n}\sum_{i=1}^{n} X_i$.
Mean of sample mean. It can be easily shown that the expectation of the
sample mean is equal to the population mean. That is,

$E(\bar{X}) = E\left[\dfrac{1}{n}(X_1 + X_2 + \cdots + X_n)\right] = \dfrac{n\mu}{n} = \mu .$

Hence, the mean of the sampling distribution of sample means is the
population mean. This is an important result of random sampling and
indicates the protection that random samples provide against
unrepresentative samples. A single sample mean could be larger or smaller
than the population mean. However, on average, there is no reason for us to
expect a sample mean that is either higher or lower than the population
mean.

Variance of sample mean. We know that the variance of a linear
combination of independent random variables is the sum of the linear
coefficients squared times the variance of the random variables. It follows
that

$\mathrm{Var}(\bar{X}) = \mathrm{Var}\left(\dfrac{1}{n}X_1 + \dfrac{1}{n}X_2 + \cdots + \dfrac{1}{n}X_n\right) = \sum_{i=1}^{n}\left(\dfrac{1}{n}\right)^2 \sigma^2 = \dfrac{n\sigma^2}{n^2} = \dfrac{\sigma^2}{n} .$

And the corresponding standard error of the mean is given by
$se(\bar{X}) = \sigma/\sqrt{n}$,
which shows that the variance or standard error of the mean decreases as
the sample size n increases. The above results are also true for all possible
samples of size n drawn with replacement from a finite population of size
N.
Theorem 1. If all possible random samples of size n are drawn with
replacement from a finite population of size N with mean $\mu$ and standard
deviation $\sigma$, then the sampling distribution of the mean $\bar{X}$ follows
a distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.

Remarks. According to the central limit theorem,
$Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ approximately follows the standard
normal distribution, which has mean zero and variance one. Now we shall
verify the theorem with the help of an example.
Example 1. Suppose a population consists of four values 0, 1, 2, 3. Draw
all possible samples of size 2 with replacement and show that the sample
mean follows the above theorem.
Solution. The population mean is $\mu = \dfrac{0 + 1 + 2 + 3}{4} = 1.5$.

Population variance $= \sigma^2 = \dfrac{1}{N}\sum (x - \mu)^2$
$= \dfrac{1}{4}\left[(0 - 1.5)^2 + (1 - 1.5)^2 + (2 - 1.5)^2 + (3 - 1.5)^2\right]$
$= \dfrac{2.25 + 0.25 + 0.25 + 2.25}{4} = \dfrac{5}{4} = 1.25.$
All possible samples and their means are shown in the table given below:
Sample No.   Sample   $\bar{x}$      Sample No.   Sample   $\bar{x}$
1            0,0      0.0      9            2,0      1.0
2            0,1      0.5      10           2,1      1.5
3            0,2      1.0      11           2,2      2.0
4            0,3      1.5      12           2,3      2.5
5            1,0      0.5      13           3,0      1.5
6            1,1      1.0      14           3,1      2.0
7            1,2      1.5      15           3,2      2.5
8            1,3      2.0      16           3,3      3.0

The sampling distribution of $\bar{X}$ is:

$\bar{x}$     f     $p(\bar{x}) = f/k$
0       1     1/16
0.5     2     2/16
1.0     3     3/16
1.5     4     4/16
2.0     3     3/16
2.5     2     2/16
3.0     1     1/16

Here k = 16 is the number of samples, f/k is the relative frequency, and the
ratios are the probabilities for different values of $\bar{X}$.

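Mean of $\bar{X}$ is given by
$E(\bar{X}) = \sum \bar{x}_i\, p(\bar{x}_i) = 0 \times \dfrac{1}{16} + 0.5 \times \dfrac{2}{16} + 1 \times \dfrac{3}{16} + 1.5 \times \dfrac{4}{16} + 2 \times \dfrac{3}{16} + 2.5 \times \dfrac{2}{16} + 3 \times \dfrac{1}{16}$
$= \dfrac{0 + 1 + 3 + 6 + 6 + 5 + 3}{16} = \dfrac{24}{16} = 1.5 = \mu$ = population mean.

Variance of $\bar{X}$ $= \sigma^2(\bar{X}) = \sum (\bar{x} - \mu)^2\, p(\bar{x})$
$= (0 - 1.5)^2 \dfrac{1}{16} + (0.5 - 1.5)^2 \dfrac{2}{16} + (1 - 1.5)^2 \dfrac{3}{16} + (1.5 - 1.5)^2 \dfrac{4}{16} + (2 - 1.5)^2 \dfrac{3}{16} + (2.5 - 1.5)^2 \dfrac{2}{16} + (3 - 1.5)^2 \dfrac{1}{16}$
$= \dfrac{1}{16}(2.25 + 2 + 0.75 + 0 + 0.75 + 2 + 2.25) = \dfrac{10}{16} = \dfrac{5}{8} = \dfrac{5/4}{2} = \dfrac{\sigma^2}{n}.$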
Theorem 2. If all possible random samples of size n are drawn without
replacement from a finite population of size N with mean µ and standard
deviation $\sigma$, then the sampling distribution of the mean $\bar{X}$ follows
a distribution with mean µ and standard error
$\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N - n}{N - 1}}$.
Remarks. The sampling distribution of the sample mean follows the above
theorem whether the n observations are taken from the N population
observations one by one without replacement, or all n observations are
taken at a time. In the first case the total number of (ordered) samples is
k = N(N-1)(N-2)…(N-n+1). In the second case the total number of
(unordered) samples is l = $^{N}C_n$. Although the total number of samples in
the first case is n! times that in the second case, the distribution of the
sample mean remains the same.
Example 2: Suppose a population consists of four values 0, 1, 2 and 3.
Draw all possible samples of size 2 without replacement and show that the
sample mean follows the above theorem.
All possible samples and their means are shown in the table given below:
Sample No.   Sample   $\bar{x}$      Sample No.   Sample   $\bar{x}$
1            0,1      0.5      7            1,0      0.5
2            0,2      1.0      8            2,0      1.0
3            0,3      1.5      9            3,0      1.5
4            1,2      1.5      10           2,1      1.5
5            1,3      2.0      11           3,1      2.0
6            2,3      2.5      12           3,2      2.5

The sampling distribution of $\bar{X}$ is

$\bar{x}$     f     $p(\bar{x}) = f/k$
0.5     2     1/6
1.0     2     1/6
1.5     4     1/3
2.0     2     1/6
2.5     2     1/6

Here k = 12 is the number of samples, f/k is the relative frequency, and the
ratios are the probabilities for different values of $\bar{X}$.

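Mean of $\bar{X}$ $= E(\bar{X}) = \sum \bar{x}_i\, p(\bar{x}_i)$
$= 0.5 \times \dfrac{1}{6} + 1.0 \times \dfrac{1}{6} + 1.5 \times \dfrac{1}{3} + 2 \times \dfrac{1}{6} + 2.5 \times \dfrac{1}{6}$
$= \dfrac{0.5 + 1 + 3 + 2 + 2.5}{6} = \dfrac{9}{6} = 1.5$

Variance of $\bar{X}$ $= \sigma^2(\bar{X}) = \sum (\bar{x} - \mu)^2\, p(\bar{x})$
$= (0.5 - 1.5)^2 \dfrac{1}{6} + (1 - 1.5)^2 \dfrac{1}{6} + (1.5 - 1.5)^2 \dfrac{1}{3} + (2 - 1.5)^2 \dfrac{1}{6} + (2.5 - 1.5)^2 \dfrac{1}{6}$
$= \dfrac{1}{6}(1 + 0.25 + 0 + 0.25 + 1) = \dfrac{2.5}{6} = \dfrac{5}{12}$
$= \dfrac{5/4}{2} \times \dfrac{4 - 2}{4 - 1} = \dfrac{\sigma^2}{n} \times \dfrac{N - n}{N - 1}.$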

The term (N – n)/(N – 1) is often termed the finite population correction
factor. In practice the formula can be simplified, because we generally deal
with very large populations, which can be considered infinite. When the
population size N is very large and the sample size n is small relative to it,
(N-n)/(N-1) approaches 1, and we can use the formula

$\mathrm{Var}(\bar{X}) = \dfrac{\sigma^2}{n} .$

We have the same probability distribution of the sample mean if we take
the $^{4}C_2 = 6$ unordered samples.

The possible samples and sample means are

Sample No.   Sample values   Sample mean $\bar{X}$
1            0,1             0.5
2            0,2             1.0
3            0,3             1.5
4            1,2             1.5
5            1,3             2.0
6            2,3             2.5

The sampling distribution of $\bar{X}$ is

$\bar{x}$     f     $p(\bar{x}) = f/k$
0.5     1     1/6
1.0     1     1/6
1.5     2     1/3
2.0     1     1/6
2.5     1     1/6
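As a check on Theorem 2 and the remark that ordered and unordered samples give the same distribution, the sketch below enumerates both sets of samples for the population {0, 1, 2, 3} with n = 2. The helper name `mean_distribution` is an illustrative choice for this example.

```python
from fractions import Fraction
from itertools import combinations, permutations
from collections import Counter

def mean_distribution(samples, n):
    """Exact probability distribution of the sample mean over equally likely samples."""
    counts = Counter(Fraction(sum(s), n) for s in samples)
    total = sum(counts.values())
    return {m: Fraction(f, total) for m, f in sorted(counts.items())}

population = [0, 1, 2, 3]
N, n = len(population), 2

ordered = mean_distribution(permutations(population, n), n)    # 12 samples
unordered = mean_distribution(combinations(population, n), n)  # 6 samples

mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N
var_xbar = sum((m - mu) ** 2 * p for m, p in ordered.items())

print(ordered == unordered)                            # True
print(var_xbar, sigma2 / n * Fraction(N - n, N - 1))   # 5/12 5/12
```

Both enumerations give the same probabilities for each value of the sample mean, and the variance agrees with $\dfrac{\sigma^2}{n} \times \dfrac{N-n}{N-1}$.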

Properties of sampling distribution of sample mean: Let $\bar{X}$ denote the
sample mean of a random sample of n observations from a population with
mean $\mu$ and variance $\sigma^2$, then
i. The sampling distribution of $\bar{X}$ has mean $E(\bar{X}) = \mu$.
ii. The sampling distribution of $\bar{X}$ has standard error $\sigma/\sqrt{n}$.
iii. However, if the sample size n is not small compared to the
population size N, the population is finite, and the sample is drawn
without replacement, then the standard error of $\bar{X}$ is given by
$\dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N - n}{N - 1}}$. If the sample size is very
small relative to the population size, this standard error approaches
the standard error given in (ii), which is then usually used instead.
Here the term $\sqrt{\dfrac{N - n}{N - 1}}$ is called the finite population
multiplier or correction.
iv. If the parent population distribution is normal, then the sampling
distribution of $\bar{X}$ is also normal, and the random variable
$Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ follows a normal distribution with
mean 0 and standard deviation 1.
However, if the sample size n is small (< 30) and the population variance
$\sigma^2$ is not known, it is to be estimated using the formula
$s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$, and the standard error of the mean is
given by $s/\sqrt{n}$. The standardized mean then does not follow the normal
distribution; rather, the corresponding statistic
$t = \dfrac{\bar{X} - \mu}{s/\sqrt{n}}$ is distributed as Student’s t with n-1
degrees of freedom. Again, if the sample size is large, this t tends to the
standard normal distribution.

Let us illustrate the concept of sampling distribution with different
examples.

Example 3. The MBA class has a total of 60 students. Their average score in
statistics after the final term was 70 with a standard deviation of 8. A
sample of 36 students is taken at random from this class. Calculate the
standard error of the mean for this sample.
Solution. Here N = 60, n = 36 and $\sigma = 8$. Since the sample size is not a
small portion of the population size, the standard error of the mean is
given by

$\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N - n}{N - 1}} = \dfrac{8}{\sqrt{36}} \sqrt{\dfrac{60 - 36}{60 - 1}} = 0.85.$
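The calculation in Example 3 can be wrapped in a small helper. The function name `standard_error` and its optional `N` argument are assumptions made for this sketch; with `N` omitted it returns the infinite-population value $\sigma/\sqrt{n}$.

```python
import math

def standard_error(sigma, n, N=None):
    """Standard error of the mean; applies the finite population correction if N is given."""
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se

se_fpc = standard_error(8, 36, N=60)
se_plain = standard_error(8, 36)
print(round(se_fpc, 2), round(se_plain, 2))   # 0.85 1.33
```

Ignoring the correction here would overstate the standard error considerably (1.33 instead of 0.85), because the sample covers 60% of the class.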

Example 4. A large bag contains some counters; 60% of the counters have
the number 0 on them and 40% have the number 1. A random sample of 3
counters is taken from the bag. Find the sampling distribution of the sample
mean and the sample mode.
Solution. Let X be the number on a counter, which can take the values 0
and 1. The distribution of the population is given by

x:      0     1
p(x):   0.6   0.4

And the population mean is E(X) = 0 × 0.6 + 1 × 0.4 = 0.4.
The possible samples of size 3 are
(0,0,0), (1,0,0), (0,1,0), (0,0,1), (1,1,0), (1,0,1), (0,1,1), (1,1,1)

$p(\bar{X} = 0) = (0.6)^3 = 0.216$ for the (0,0,0) case
$p(\bar{X} = 1/3) = 3 \times 0.4 \times 0.6^2 = 0.432$ for the (1,0,0), (0,1,0), (0,0,1) cases
$p(\bar{X} = 2/3) = 3 \times 0.4^2 \times 0.6 = 0.288$ for the (1,1,0), (1,0,1), (0,1,1) cases
$p(\bar{X} = 1) = (0.4)^3 = 0.064$ for the (1,1,1) case

The sampling distribution of the sample mean $\bar{X}$ is

$\bar{x}$:      0       1/3     2/3     1
$p(\bar{x})$:   0.216   0.432   0.288   0.064

Thus $E(\bar{X})$ = 0 × 0.216 + 1/3 × 0.432 + 2/3 × 0.288 + 1 × 0.064 = 0.4, which is
exactly the same as the population mean µ.
Similarly, from the list of possible samples it is clear that the mode can
take two values, 0 or 1. Hence the sampling distribution of the mode (M) is
given by
M:      0       1
P(M):   0.648   0.352
Hints: p(Mode = 0) = $(0.6)^3 + 3 \times 0.4 \times 0.6^2$ = 0.648 for the first four cases
p(Mode = 1) = $3 \times 0.4^2 \times 0.6 + 0.064$ = 0.352 for the last four cases
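The probabilities in Example 4 can be recomputed by enumerating the eight possible samples. The sketch assumes, as the solution does for a large bag, that the three draws are independent with P(0) = 0.6 and P(1) = 0.4; the dictionary names are illustrative.

```python
from fractions import Fraction
from itertools import product
from collections import defaultdict

p = {0: Fraction(6, 10), 1: Fraction(4, 10)}   # population distribution
n = 3

mean_dist = defaultdict(Fraction)
mode_dist = defaultdict(Fraction)
for sample in product(p, repeat=n):
    prob = Fraction(1)
    for value in sample:
        prob *= p[value]
    mean_dist[Fraction(sum(sample), n)] += prob
    # With n = 3 and two possible values, the mode is the majority value.
    mode_dist[1 if sum(sample) >= 2 else 0] += prob

for m in sorted(mean_dist):
    print(m, float(mean_dist[m]))
print({k: float(v) for k, v in sorted(mode_dist.items())})
```

Exact fractions are used so that the probabilities add to exactly 1 and the expectation of the sample mean comes out as exactly 2/5.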
Example 5. Suppose a baby sitter has 5 children under her supervision with
an average age of 6 years. The individual ages of the five children are
X1 = 2, X2 = 4, X3 = 6, X4 = 8, X5 = 10 years. If a sample of 2 children is
selected at random, find the sampling distribution of the mean age.
Solution. Here we can consider $^{5}C_2 = 10$ samples in place of 5 × 4 = 20
samples. The possible samples, sample means and their probabilities are
given below:

Sample No.   Individuals   Sample ages   Mean age $\bar{x}$   Probability
1            X1, X2        (2, 4)        3              1/10 = 0.1
2            X1, X3        (2, 6)        4              1/10 = 0.1
3            X1, X4        (2, 8)        5              1/10 = 0.1
4            X1, X5        (2, 10)       6              1/10 = 0.1
5            X2, X3        (4, 6)        5              1/10 = 0.1
6            X2, X4        (4, 8)        6              1/10 = 0.1
7            X2, X5        (4, 10)       7              1/10 = 0.1
8            X3, X4        (6, 8)        7              1/10 = 0.1
9            X3, X5        (6, 10)       8              1/10 = 0.1
10           X4, X5        (8, 10)       9              1/10 = 0.1

The probability distribution of the sample mean, referred to as the sampling
distribution of the mean, is given by

Sample mean ($\bar{x}$):   3     4     5     6     7     8     9
$p(\bar{x})$:              0.1   0.1   0.2   0.2   0.2   0.1   0.1

Here the population mean is

$\mu = \dfrac{2 + 4 + 6 + 8 + 10}{5} = \dfrac{30}{5} = 6 .$

Variance $= \sigma^2 = \dfrac{1}{5}\left[(2 - 6)^2 + (4 - 6)^2 + (6 - 6)^2 + (8 - 6)^2 + (10 - 6)^2\right]$
$= \dfrac{16 + 4 + 0 + 4 + 16}{5} = \dfrac{40}{5} = 8$

The mean of the sample mean =
$E[\bar{X}]$ = 3 × .1 + 4 × .1 + 5 × .2 + 6 × .2 + 7 × .2 + 8 × .1 + 9 × .1
= .3 + .4 + 1.0 + 1.2 + 1.4 + .8 + .9 = 6.0.

$\sigma_{\bar{X}}^2 = \sum (\bar{x} - \mu)^2\, p(\bar{x})$ = 9 × .1 + 4 × .1 + 1 × .2 + 0 × .2 + 1 × .2 + 4 × .1 + 9 × .1
$= 3.0 = \dfrac{8}{2} \times \dfrac{5 - 2}{5 - 1} = \dfrac{\sigma^2}{n} \times \dfrac{N - n}{N - 1}.$

Example 6. Suppose the experiences in years of six employees are given as
2, 4, 6, 6, 7, 8. Find the sampling distribution of means for random samples
of size 2.
Solution. The number of samples of size two is $^{6}C_2 = 15$. The years of
experience of the possible 15 samples of 2 employees are listed below along
with the sample means.

Sample No.   Sample   Sample mean      Sample No.   Sample   Sample mean
1            2, 4     3.0              9            4, 8     6.0
2            2, 6     4.0              10           6, 6     6.0
3            2, 6     4.0              11           6, 7     6.5
4            2, 7     4.5              12           6, 8     7.0
5            2, 8     5.0              13           6, 7     6.5
6            4, 6     5.0              14           6, 8     7.0
7            4, 6     5.0              15           7, 8     7.5
8            4, 7     5.5

Thus the sampling distribution of sample means from the employee
population for samples of size 2 is

Sample mean ($\bar{x}$)   Probability of $\bar{x}$
3.0                 1/15
4.0                 2/15
4.5                 1/15
5.0                 3/15
5.5                 1/15
6.0                 2/15
6.5                 2/15
7.0                 2/15
7.5                 1/15

In this case too, it can be easily shown that the mean of the sampling
distribution is exactly the same as the population mean µ = 5.5.
Example 7. The time between two arrivals in a queuing process of cars on a
busy road is normally distributed with a mean of 2 minutes and a standard
deviation of 0.25 minutes. If a random sample of 36 such cars is taken, what
is the probability that the sample mean will be greater than 2.1 minutes?
Solution. Since the population is normally distributed, the sampling
distribution of the sample mean will follow a normal distribution with
mean $\mu_{\bar{x}} = 2$ and standard error
$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{0.25}{\sqrt{36}} = 0.042.$

Thus, the sampling distribution of $\bar{X}$ is given by $\bar{X} \sim N(2, 0.042^2)$.

Therefore the probability that the sample mean will be greater than 2.1
minutes is given by

$P(\bar{X} > 2.1) = P(Z > 2.38) = 1 - P(Z < 2.38) = 1 - 0.9913 = 0.0087.$
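The same probability can be computed without a normal table. The sketch below builds the standard normal CDF from `math.erf`; the helper name `normal_cdf` is an assumption made for this example. Because it does not round the standard error to 0.042 before standardizing, the z-value is 2.40 rather than 2.38 and the probability comes out at about 0.0082, slightly below the table-based 0.0087.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma, n = 2.0, 0.25, 36
se = sigma / math.sqrt(n)          # standard error of the mean
z = (2.1 - mu) / se                # standardized value of 2.1 minutes
p = 1.0 - normal_cdf(z)            # P(sample mean > 2.1)
print(round(se, 4), round(z, 2), round(p, 4))
```

The difference between the two answers is purely a rounding effect; either way the event is rare.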


Example 8. A spark plug manufacturer claims that the lives of its plugs are
normally distributed with mean 36000 miles and standard deviation 4000
miles. A random sample of 16 plugs had an average life of 34500 miles. If
the manufacturer’s claim is correct, what is the probability of finding a
sample mean of 34500 or less?
Solution. To compute the probability, we first need to obtain the standard
error of the sample mean, which is
$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{4000}{\sqrt{16}} = 1000.$

Hence, the desired probability is

$P(\bar{X} < 34500) = P\left(Z < \dfrac{34500 - 36000}{1000}\right) = P(Z < -1.50) = 0.0668$

which suggests that if the manufacturer’s claims, $\mu = 36000$ and
$\sigma = 4000$, are true, then a sample mean of 34500 or less has a small
probability.
Example 9. The weights of packets of cosmetics are normally distributed
with mean 120 pounds and standard deviation 10 pounds. (i) What is the
probability that the weight of any packet chosen at random is between 120
and 125 pounds? (ii) If a random sample of 25 packets is taken, what is the
probability that the mean of this sample will be between 120 and 125?
Solution. (i) Let X represent the weight of a packet, so that
$X \sim N(120, 10^2)$. We have to find $P(120 < X < 125)$. Using the
standardized normal variate $Z = \frac{X - \mu}{\sigma}$, we have
$Z = \frac{125 - 120}{10} = 0.5$ and $Z = \frac{120 - 120}{10} = 0$. Thus,
in terms of Z, we have to find $P(0 < Z < 0.5) = 0.1915$. This means there
is a 19.15% chance that a packet picked at random will weigh between 120
and 125 pounds.

(ii) Here we have to find $P(120 < \bar{X} < 125)$, so we use the
standardized normal variate for the sampling distribution of the sample
mean, defined as

$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$

So we have

$P(120 < \bar{X} < 125) = P\left(\frac{120 - 120}{10/\sqrt{25}} < Z < \frac{125 - 120}{10/\sqrt{25}}\right) = P(0 < Z < 2.5) = 0.4938.$

This shows that there is a 49.38% chance that the sample mean will be
between 120 and 125 pounds. It is also clear that the chance of a sample
mean being between 120 and 125 pounds is much higher than the probability
of an individual packet weighing between 120 and 125 pounds, and that this
probability increases as the sample size increases (approaching 0.5,
because the interval starts at the population mean).
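The effect of sample size noted in Example 9 can be tabulated directly. The sketch below, again based on `statistics.NormalDist`, evaluates the same interval for a few sample sizes; the helper name `prob_mean_between` and the extra sample sizes 4 and 100 are assumptions added for illustration.

```python
from math import sqrt
from statistics import NormalDist


def prob_mean_between(mu, sigma, n, low, high):
    """P(low < sample mean < high) for samples of size n from N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma / sqrt(n))        # sampling distribution of the mean
    return dist.cdf(high) - dist.cdf(low)


# n = 1 corresponds to a single packet (part i); n = 25 is part (ii)
for n in (1, 4, 25, 100):
    print(n, round(prob_mean_between(120, 10, n, 120, 125), 4))
```

Setting n = 1 reproduces the individual-packet probability, which makes the comparison between parts (i) and (ii) explicit.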

Example 10. A bank calculates that its individual savings deposits are
normally distributed with a mean of Tk. 2000 and a standard deviation of
Tk. 600. If the bank takes a random sample of 100 accounts, what is the
probability that the mean deposit will lie between Tk. 1900 and Tk. 2050?
Solution. Let X be the individual savings deposit, so that
$X \sim N(2000, 600^2)$. We have to find $P(1900 < \bar{X} < 2050)$. Using
the standardized normal variate for the sampling distribution of the sample
mean, we have

$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$

Here, the standard error of $\bar{X}$ is given by
$\sigma/\sqrt{n} = 600/\sqrt{100} = 60$.

So, the required probability is

$P(1900 < \bar{X} < 2050) = P\left(\frac{1900 - 2000}{60} < Z < \frac{2050 - 2000}{60}\right)$
$= P(-1.67 < Z < 0.83)$
$= P(Z < 0.83) - P(Z < -1.67) = 0.7967 - 0.0475 = 0.7492.$
Example 11. Suppose that the annual percentage salary increases for the
managers of a large industry are normally distributed with mean 12.2% and
standard deviation 3.6%. A random sample of nine managers is obtained
from this population and the sample mean computed. What is the
probability that the mean will be less than 10%?
Solution. Given $\mu = 12.2$, $\sigma = 3.6$ and $n = 9$.

Let $\bar{X}$ denote the sample mean. Computing the standard deviation of
the sample mean, we have

$\sigma_{\bar{X}} = \sigma/\sqrt{n} = 3.6/\sqrt{9} = 1.2.$

Thus, we have to compute

$P(\bar{X} < 10) = P\left(Z < \frac{10 - 12.2}{1.2}\right) = P(Z < -1.83) = 0.0336.$
Sampling Distribution of the difference between two means:
Suppose $X_{11}, X_{12}, \ldots, X_{1n_1}$ is a random sample of size $n_1$
taken from a normal population with mean $\mu_1$ and variance $\sigma_1^2$,
and $X_{21}, X_{22}, \ldots, X_{2n_2}$ is another independent sample of size
$n_2$ taken from a normal population with mean $\mu_2$ and variance
$\sigma_2^2$. Let $\bar{X}_1$ and $\bar{X}_2$ be the sample means of the two
samples, defined by
$\bar{X}_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} X_{1i}$ and
$\bar{X}_2 = \frac{1}{n_2}\sum_{i=1}^{n_2} X_{2i}$.
(a) When the population variances are known, the sampling distribution of
$(\bar{X}_1 - \bar{X}_2)$ follows exactly a normal distribution with

i) Mean = $(\mu_1 - \mu_2)$ and

ii) Standard error = $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$,
whatever may be the values of $n_1$ and $n_2$.

(b) When $\sigma_1$ and $\sigma_2$ are not known, but $n_1 > 29$ and
$n_2 > 29$ are sufficiently large, the standard error of the difference
between two sample means can be estimated by
$\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$, where $s_1^2$ and $s_2^2$ are
the variances obtained from the two samples respectively. In this case, the
sampling distribution of $(\bar{X}_1 - \bar{X}_2)$ follows approximately a
normal distribution with

i) Mean = $(\mu_1 - \mu_2)$ and

ii) Standard deviation = $\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$,
whatever may be the form of the parent populations from which the samples
are drawn.
(c) When the sample sizes are small and the population variances are equal
but unknown, the standard error of $(\bar{X}_1 - \bar{X}_2)$ is estimated by
$s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$, where
$s^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}$ is the pooled estimate of
the common population variance and $s_1^2$, $s_2^2$ are the sample variances
of the two samples.

Then the standardized difference follows a t-distribution with
$n_1 + n_2 - 2$ degrees of freedom, where the variate t is defined as

$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}.$

In the cases above, the parent distributions were taken to be normal.
However, if the parent distributions are not normal but the sample sizes
are large, then by virtue of the central limit theorem the difference
between the two sample means approximately follows a normal distribution
with the corresponding mean and variance.
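The three standard-error formulas in cases (a) to (c) can be written as small helper functions. This is a minimal sketch: the function names and the numbers in the usage lines are hypothetical, and the pooled variance follows the formula given in the text, which assumes the sample variances were computed with divisor n rather than n - 1.

```python
from math import sqrt


def se_diff_means(var1, n1, var2, n2):
    """Cases (a) and (b): sqrt(var1/n1 + var2/n2).

    Pass population variances for case (a) or sample variances for case (b).
    """
    return sqrt(var1 / n1 + var2 / n2)


def pooled_se_diff_means(s1_sq, n1, s2_sq, n2):
    """Case (c): s * sqrt(1/n1 + 1/n2), with the pooled variance
    s^2 = (n1*s1^2 + n2*s2^2) / (n1 + n2 - 2) as defined in the text."""
    pooled_var = (n1 * s1_sq + n2 * s2_sq) / (n1 + n2 - 2)
    return sqrt(pooled_var) * sqrt(1 / n1 + 1 / n2)


def t_statistic(xbar1, xbar2, mu_diff, s1_sq, n1, s2_sq, n2):
    """Case (c) t variate, referred to n1 + n2 - 2 degrees of freedom."""
    se = pooled_se_diff_means(s1_sq, n1, s2_sq, n2)
    return ((xbar1 - xbar2) - mu_diff) / se


# Hypothetical illustration values (not taken from the notes)
print(round(se_diff_means(4.0, 40, 5.0, 50), 4))
print(round(pooled_se_diff_means(4.0, 10, 5.0, 12), 4))
print(round(t_statistic(10.0, 8.0, 0.0, 4.0, 10, 5.0, 12), 3))
```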
Example 12. The strength of wire produced by company A has a mean of 4500
kg and a standard deviation of 200 kg, while that of company B has a mean
of 4000 kg and a standard deviation of 300 kg. If 50 wires of company A and
100 wires of company B are selected at random and tested for strength, what
is the probability that the sample mean strength of A will be at least
600 kg more than that of B?
Solution. For the sampling distribution of the difference between two
means, the mean of the difference between the two sample means is
$\mu_1 - \mu_2 = 4500 - 4000 = 500$,

and the standard error is
$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{200^2}{50} + \frac{300^2}{100}} = 41.23.$

Thus, the desired probability is given by

$P(\bar{X}_1 - \bar{X}_2 > 600) = P\left(\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} > \frac{600 - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}\right)$
$= P\left(Z > \frac{600 - 500}{41.23}\right)$
$= P(Z > 2.43) = 0.0075.$

Therefore, the probability that the sample mean strength of the wire
produced by company A will be at least 600 kg more than that of B is
0.0075.
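The steps of Example 12 can be reproduced with a short function. This is a sketch under the normal sampling distribution described above; the name `prob_diff_exceeds` is hypothetical. Since it standardizes without rounding Z to two decimals, the result may differ slightly from the table-based 0.0075.

```python
from math import sqrt
from statistics import NormalDist


def prob_diff_exceeds(mu1, sigma1, n1, mu2, sigma2, n2, d):
    """P(Xbar1 - Xbar2 > d) under the normal sampling distribution."""
    mean_diff = mu1 - mu2                          # mean of the difference
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)     # standard error of the difference
    z = (d - mean_diff) / se
    return 1.0 - NormalDist().cdf(z)


# Example 12: company A (4500, 200, n=50) versus company B (4000, 300, n=100)
p = prob_diff_exceeds(4500, 200, 50, 4000, 300, 100, 600)
print(round(p, 4))
```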
Example 13. A man buys 200 electric bulbs of each of two well-known
brands, taken at random from stock for testing purposes. He finds that
brand A has a mean life of 2560 hours with a standard deviation of 90 hours
and brand B has a mean life of 2650 hours with a standard deviation of 75
hours. Find the probability that the average life of brand A is at most
110 hours less than that of brand B.
Solution. Taking brand A as the first sample and brand B as the second,
the mean of the difference between the two sample means is
$\mu_1 - \mu_2 = 2560 - 2650 = -90$ hours,

and the standard error is
$\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{90^2}{200} + \frac{75^2}{200}} = 8.284.$

The average life of brand A is at most 110 hours less than that of brand B
when $\bar{X}_1 - \bar{X}_2 > -110$. Thus, the required probability is
given by

$P(\bar{X}_1 - \bar{X}_2 > -110) = P\left(Z > \frac{-110 - (-90)}{8.284}\right) = P(Z > -2.41)$
$= P(Z < 2.41) = 0.9920.$


Sampling distribution for proportion: Suppose a variable has two
categories, and the number of units falling in a particular category
follows a binomial distribution with parameters n and $\pi$. Suppose a
random sample of size n is taken from the population, and let P be the
sample proportion of that category. We know that the mean and variance of
the binomial distribution are $n\pi$ and $n\pi(1 - \pi)$ respectively. The
standard error of the estimated proportion P is given by
$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}}$.
Usually a large sample is considered for finding the distribution of the
sample proportion. Since the sample size is large, by virtue of the central
limit theorem,
$Z = \frac{P - \pi}{\sigma_p} = \frac{P - \pi}{\sqrt{\frac{\pi(1 - \pi)}{n}}}$
is approximately a standard normal variate. That is, Z defined above
follows the standard normal distribution with mean zero and variance unity.
Example 14. It is known that 65% of the items in a lot are defective.
(i) What is the probability that a simple random sample of 100 items will
reveal the proportion of defective items to be 60% or less? (ii) How would
this probability change if the sample size is increased to 500?
Solution. (i) The problem states that the population proportion of
defective items is 65%. This means that if all possible samples of size 100
are taken from the population, the sample proportions would be
approximately normally distributed with average proportion 65%, that is,
$E(P) = \pi = 0.65$, and the standard error of the proportion P is

$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{0.65(1 - 0.65)}{100}} = 0.0477.$

Now we have to find
$\Pr(P \le 0.60) = \Pr(Z \le -1.05) = 1 - \Phi(1.05) = 1 - 0.8531 = 0.1469$,
which is the required probability.
(ii) Again, when the sample size is increased to 500, then with the same
parameter value $\pi = 0.65$, the value of $\sigma_p$ would be 0.0213 and
that of Z defined above would be -2.35, so the required probability is
$\Pr(P \le 0.60) = \Pr(Z \le -2.35) = 1 - \Phi(2.35) = 0.0094$.
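Because Example 14 relies on the normal approximation to the binomial, it can be instructive to set the approximate probability beside the exact binomial one for both sample sizes. The sketch below does this with the standard library only; both function names are hypothetical, and the approximation is applied without a continuity correction, as in the worked solution, so the two values need not agree closely.

```python
from math import comb, floor, sqrt
from statistics import NormalDist


def normal_approx_prob_at_most(pi, n, p0):
    """Normal-approximation P(sample proportion <= p0)."""
    se = sqrt(pi * (1 - pi) / n)                   # standard error of the proportion
    return NormalDist().cdf((p0 - pi) / se)


def exact_binomial_prob_at_most(pi, n, p0):
    """Exact P(sample proportion <= p0) from the binomial distribution."""
    k_max = floor(p0 * n + 1e-9)                   # largest count k with k/n <= p0
    return sum(comb(n, k) * pi**k * (1 - pi)**(n - k) for k in range(k_max + 1))


# Example 14: pi = 0.65, threshold 0.60, for n = 100 and n = 500
for n in (100, 500):
    approx = normal_approx_prob_at_most(0.65, n, 0.60)
    exact = exact_binomial_prob_at_most(0.65, n, 0.60)
    print(n, round(approx, 4), round(exact, 4))
```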
Example 15. 45% of the workers in a garments factory are married. If a
sample of 200 workers is selected at random, what is the probability that
the proportion of married workers in this sample would be between 40% and
48%?
Solution. The proportions in samples of 200 each from the population of
workers of the factory would approximately follow a normal distribution
with average proportion $\pi = 0.45$ and standard error

$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{0.45(1 - 0.45)}{200}} = 0.035.$

Now, we have to find
$\Pr(0.40 < P < 0.48) = \Pr(-1.43 < Z < 0.86)$
$= \Phi(0.86) - \Phi(-1.43) = 0.8051 - 0.0764 = 0.7287.$

Example 16. A random sample of 250 homes was taken from a large
population of older homes to estimate the proportion of homes with unsafe
wiring. If, in fact, 30% of the homes have unsafe wiring, what is the
probability that the sample proportion of homes with unsafe wiring will be
between 25% and 35%?
Solution. For the given problem, we have $\pi = 0.30$, n = 250.
We can compute the standard error of the sample proportion, P, as
$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{0.30(1 - 0.30)}{250}} = 0.029.$
Thus, the required probability is
$\Pr(0.25 < P < 0.35) = \Pr\left(\frac{0.25 - \pi}{\sigma_p} < \frac{P - \pi}{\sigma_p} < \frac{0.35 - \pi}{\sigma_p}\right)$
$= \Pr\left(\frac{0.25 - 0.30}{0.029} < Z < \frac{0.35 - 0.30}{0.029}\right) = P(-1.72 < Z < 1.72)$
$= P(Z < 1.72) - P(Z < -1.72) = \Phi(1.72) - \Phi(-1.72)$
$= 0.9573 - 0.0427 = 0.9146.$
Example 17. It has been found that 43% of business graduates believe that a
course in business ethics is very important for imparting ethical values to
students. Find the probability that more than one-half of a random sample
of 80 business graduates believe this.
Solution. We are given $\pi = 0.43$, n = 80.
The standard error of the sample proportion P is calculated as

$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{0.43(1 - 0.43)}{80}} = 0.055.$

Thus, we have to find
$\Pr(P > 0.50) = P\left(\frac{P - \pi}{\sigma_p} > \frac{0.50 - \pi}{\sigma_p}\right) = P\left(Z > \frac{0.50 - 0.43}{0.055}\right) = P(Z > 1.27) = 0.1020.$

That means the probability that more than one-half of the sample believes
in the value of business ethics courses is approximately 0.1.
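Examples 15 to 17 all follow the same pattern, so a single helper can recheck them. The function below is an illustrative sketch (its name and the `None` convention for an open-ended bound are assumptions). It uses the unrounded standard error, so its results can differ from the table-based answers in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist


def prob_proportion_between(pi, n, low=None, high=None):
    """Normal-approximation P(low < sample proportion < high).

    A bound left as None is treated as open-ended on that side.
    """
    dist = NormalDist(pi, sqrt(pi * (1 - pi) / n))
    upper = 1.0 if high is None else dist.cdf(high)
    lower = 0.0 if low is None else dist.cdf(low)
    return upper - lower


print(round(prob_proportion_between(0.45, 200, 0.40, 0.48), 4))   # Example 15
print(round(prob_proportion_between(0.30, 250, 0.25, 0.35), 4))   # Example 16
print(round(prob_proportion_between(0.43, 80, low=0.50), 4))      # Example 17
```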
