CH-1 Sampling Distribution
1.1. INTRODUCTION
Sampling in statistics is as common and important as salt is in food. At home, a cook tastes a
single teaspoonful to judge the quality of the whole dish. In medical sciences, a few drops
of blood are taken and tested microscopically or chemically to determine whether the blood
contains any abnormalities.
In many fields of human activity, and in the ordinary actions of our daily lives, the
decision-making process is based on the observations of a few units that form a portion of the
total population. Nowadays, sampling methods are extensively used in socio-economic surveys
to understand the living conditions, cost of living index, etc., of a class of people. In biological
studies, experiments are conducted on some units (persons, animals, or plants), and inferences
are drawn about the breed or variety to which the units belong. In industries, sampling
procedures are predominantly used for quality control.
1.2. SAMPLING THEORY
Sampling theory is the study of relationships existing between a population and samples drawn
from the population.
1.2.1. BASIC DEFINITIONS
➢ Universe or Population: an aggregate of objects under study. It is a collection of
individuals, of their attributes, or of results of operations that can be numerically
specified.
If the population consists of a finite number of individuals, then it is called a Finite
Population. If the population consists of an infinite number of units, then it is practically
impossible to observe all the units, and it is called an Infinite Population.
Although many populations appear to be exceedingly large, no truly infinite population
of physical objects actually exists. Given limited resources and time, however, it is not
practically possible to count, say, the number of grains of sand on a beach. Such
populations are treated as infinite populations for our study.
➢ Sample: a finite subset of a population drawn from it to estimate the
characteristics of the population. Sampling is a tool which enables us to draw
conclusions about the characteristics of the population.
➢ Sampling: sampling consists of drawing a sample from a given population and
using the sample data to gain knowledge about the population parameters.
➢ Census: a census involves complete enumeration, or a full count, of all the units
comprising the population with respect to the parameter of interest.
➢ Sampling design: A sampling design is a definite plan for obtaining a sample from the
sampling frame. It refers to the technique or procedure that one would adopt in
selecting the sampling units from which inferences about the population are drawn.
The sampling design is determined before any data are collected.
➢ Sampling error: Sample surveys study only a small portion of the population, so a
certain amount of inaccuracy in the information collected is to be expected. This
inaccuracy is termed sampling error or error variance. The discrepancy between a
population parameter and its estimate (statistic) derived from a random sample is
also called the error or the sampling bias. In short, sampling error is the difference
between a sample statistic and its corresponding population parameter.
B. The physical impossibility of checking all items in the population
The populations of fish, birds, and other wildlife are large and are constantly moving, being
born, and dying. There is no mechanism to contact every individual member of such a
population.
C. The cost of studying all the items in a population is often prohibitive
Contacting the entire population is expensive; therefore, sampling is preferable. Public
opinion polls and consumer testing organizations typically contact only a fraction of the millions
of families. Consider a multinational corporation with 50 million customers worldwide. If this
company plans to conduct a market survey, it may need to sample only 2,000 of its 50 million
customers. If it costs 20 Br. per customer to mail the survey and tabulate the response, the
total survey will cost 40,000 Br. Meanwhile, the same survey involving the entire population of
50 million would cost about one billion Br.
D. The adequacy of sample results
If the sampling technique is effectively conducted, there is no need to study the entire
population; the sample itself leads to adequate generalization about the population. Even if funds
were available, it is doubtful whether the additional accuracy of census, i.e., studying the entire
population, is essential in most problems. To determine the monthly index of food prices (bread,
beans, milk, etc.), it is unlikely that the inclusion of all grocery stores and shops would
significantly affect the index, since the prices of such commodities usually do not vary by more
than a few cents from one store to another. Nor does studying the entire population guarantee
100% accuracy, since collecting and analyzing data in bulk carries its own risk of error.
E. To contact the whole population would often be time consuming
Studying the entire population is time consuming. Therefore, to save time, sampling is critical.
1.2.3. ERRORS
The term error denotes the difference between a population value and its estimate provided by
a sampling technique. The errors involved in the collection, processing, and analysis of data, as
well as sampling, may be broadly classified under the following two heads:
I. Sampling Errors and
II. Non-sampling Errors
Sampling error is the difference between the result of a sample and the result of a census. It
represents the discrepancy between the sample estimate and the actual value of the population.
This discrepancy arises because only a portion of the population (i.e., the sample) has been
used to estimate population parameters and draw inferences about the population. Sampling
error tends to decrease as the sample size increases.
Non-sampling errors are errors that arise from mistakes in the sampling or census
procedure, and they cannot be reduced or eliminated by increasing the sample size.
Non-sampling errors primarily arise at the stages of observation, ascertainment, and processing
of the data and are thus present in both the complete enumeration survey and the sample survey.
Such errors occur because of human mistakes and not chance variation.
1.2.4. TYPES OF SAMPLES
The sampling techniques may be broadly classified as follows:
1. Probability (random) Sampling
2. Non-Probability (non-random) Sampling
A simple random sample can be selected by the lottery method or by using a table
of random numbers. In the lottery method, each unit of the population is identified by a
distinct number written on one of a set of identical cards. The cards are put in a drum and
thoroughly shuffled before each unit is drawn. Several tables of random numbers are available.
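The lottery method can be imitated in software by letting a pseudo-random number generator play the role of the shuffled drum. The following is a minimal sketch, not a prescribed procedure; the function name `simple_random_sample` and the population of units numbered 1 to 50 are assumptions made for illustration.

```python
import random

def simple_random_sample(units, n, seed=None):
    """Draw n distinct units so that every unit has the same chance of selection."""
    rng = random.Random(seed)       # a seed makes the draw repeatable
    return rng.sample(units, n)     # sampling without replacement

# Hypothetical population: units numbered 1..50 (the "cards" in the drum).
population = list(range(1, 51))
sample = simple_random_sample(population, 5, seed=42)
print(sample)
```

Each run with a different seed corresponds to a fresh shuffle of the drum.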
2. Systematic Random Sampling
The items or individuals of the population are arranged in some order (for example, alphabetically).
A random starting point is selected, and then every Kth member of the population is
selected for the sample. A systematic random sample should not be used if there is a
predetermined pattern in the population, for example when the values are listed in ascending or
descending order.
- To get a systematic sample of size n from a population of size N, draw a random number i from
1 to K, where K = N/n, and then select units i, i + K, i + 2K, i + 3K, …
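The selection rule i, i + K, i + 2K, … can be sketched as follows. This is an illustrative sketch only: the function name `systematic_sample`, the population of 100 numbered units, and the choice to take the integer part of N/n when N is not a multiple of n are all assumptions.

```python
import random

def systematic_sample(units, n, seed=None):
    """Select n units: a random start i in 1..K, then every K-th unit after it."""
    N = len(units)
    k = N // n                      # sampling interval K = N / n (integer part)
    rng = random.Random(seed)
    start = rng.randint(1, k)       # random starting point i with 1 <= i <= K
    # 1-based positions i, i + K, i + 2K, ... mapped to 0-based list indices
    return [units[start - 1 + j * k] for j in range(n)]

# Hypothetical population: units numbered 1..100, sample of size 10 (K = 10).
population = list(range(1, 101))
sample = systematic_sample(population, 10, seed=1)
print(sample)
```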
Example: Suppose 200, 300 and 500 items are produced by factories located at three cities X, Y
and Z. We wish to draw a sample of 20 items under proportional stratified sampling.
Proportional samples to be selected are:
For Factory X = 20 x (200/1000) = 4
For Factory Y = 20 x (300/1000) = 6
For Factory Z = 20 x (500/1000) = 10
Total = 20
In non-proportional stratified sampling, an equal number of units is taken from each stratum
regardless of the size of the stratum.
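The proportional allocation in the factory example can be reproduced with a few lines of code. This is a sketch under the assumption that each stratum receives n × (stratum size / population size) units; the helper name `proportional_allocation` is hypothetical, and in general the results would need rounding to whole units.

```python
def proportional_allocation(stratum_sizes, n):
    """Allocate a total sample of n units to strata in proportion to their sizes."""
    total = sum(stratum_sizes.values())
    return {name: n * size / total for name, size in stratum_sizes.items()}

# Factory output from the example: X = 200, Y = 300, Z = 500 items.
factories = {"X": 200, "Y": 300, "Z": 500}
allocation = proportional_allocation(factories, 20)
print(allocation)   # {'X': 4.0, 'Y': 6.0, 'Z': 10.0}
```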
4. Cluster Sampling
The total population is divided into recognizable subdivisions, known as clusters, such that
units within each cluster are heterogeneous while the clusters themselves are similar to one
another. A random sample of clusters is then selected, and the units within the selected
clusters are observed, either completely or by a suitable sampling technique.
- Often employed to reduce cost of sampling a population scattered over a large geographic area.
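A one-stage cluster sample can be sketched as below, assuming that whole clusters are chosen at random and every unit in a chosen cluster is observed. The village names, household labels, and the function name `cluster_sample` are hypothetical and serve only to illustrate the idea.

```python
import random

def cluster_sample(clusters, m, seed=None):
    """Pick m clusters at random and keep every unit inside the chosen clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)    # random choice of cluster names
    return {name: clusters[name] for name in chosen}

# Hypothetical population scattered over four villages (the clusters).
villages = {
    "village_1": ["h1", "h2", "h3"],
    "village_2": ["h4", "h5"],
    "village_3": ["h6", "h7", "h8", "h9"],
    "village_4": ["h10", "h11"],
}
selected = cluster_sample(villages, 2, seed=7)
print(selected)
```

Only the selected villages need to be visited, which is where the cost saving comes from.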
1.2.4.2. Non-random (non-probability) sampling techniques
It is a sampling scheme in which no probability is attached to the selection of a
sample unit. Rather, the sample is selected with a definite purpose in view, and
the choice of the sampling units depends entirely on the discretion and judgment of the
investigator. Depending on the object of inquiry and other considerations, a predetermined
number of sample units are selected purposefully so that they represent the true characteristics of
the population. A serious drawback of this sampling design is that it is highly subjective in
nature. The selection of sample units depends entirely on the personal convenience, biases,
prejudices, and beliefs of the investigator. This method will be more successful if the investigator
is thoroughly skilled and experienced.
i. Judgment Sampling: The choice of sample items depends exclusively on the judgment of the
investigator. The investigator’s experience and knowledge about the population will help to
select the sample units. It is the most suitable method if the population size is small.
ii. Convenience Sampling: The sample units are selected according to the convenience of the
investigator. The sample is also called a “chunk”, which refers to a fraction of the population
that is selected neither by probability nor by judgment. Further, a list or
framework should be available for the selection of the sample. There is a high chance of bias
being introduced. It is often used for pilot studies.
iii. Quota Sampling: It is a type of judgment sampling. Under this design quotas are set up
according to some specified characteristic such as age group or income group. From each
group a specified number of units are sampled according to the quota allotted to the group.
Within the group the selection of sample units depends on personal judgment. It has a risk of
personal prejudice and bias entering the process. This method is often used in public opinion
studies.
iv. Snowball Sampling: Snowball sampling involves identifying one or a few initial participants
who meet the criteria for the study and then asking them to refer other potential participants. This
method is often used when the population of interest is difficult to identify or access.
1.3. SAMPLING DISTRIBUTIONS
A sampling distribution describes the way in which a statistic or a function of statistics, which
is a function of the random variables X1, X2, …, Xn, varies from one sample to another
sample of the same size. Such sampling distributions provide the foundation for a number
of test statistics used in hypothesis testing. We are often concerned with sampling distributions
in sampling analysis. If we take a certain number of samples and for each sample compute various
statistical measures such as the mean, standard deviation, etc., we find that each sample
may give its own value for the statistic under consideration. All such values of a particular
statistic, say the mean, together with their relative frequencies constitute the sampling
distribution of that statistic. Accordingly, we can have the sampling distribution
of the mean, the sampling distribution of the standard deviation, or the sampling distribution
of any other statistical measure. Note that each item in a sampling distribution is a particular
statistic of a sample. The sampling distribution of the mean tends toward the normal
distribution if the sample size is large. The significance of the sampling distribution follows
from the fact that the mean of the sampling distribution of the mean is the same as the mean of
the population.
1.3.1. Sampling Distribution of the mean
The sampling distribution of the sample mean, $\bar{x}$, is the probability distribution
consisting of a list of all possible sample means of a given sample size selected from a
population, and the probability of occurrence associated with each sample mean. The number of
samples that can be drawn from a population depends on whether we sample with replacement or
without replacement.
Example: Consider a finite population with five elements labeled A, B, C, D, and E. A sample
of size 2 is to be selected at random from the population without replacement.
A. How many samples of size 2 are possible?
B. List each of the different possible sample combinations.
Solution
N = 5 and n = 2
$${}^{N}C_{n} = \frac{N!}{n!\,(N - n)!}$$
A. $${}^{5}C_{2} = \frac{5!}{2!\,(5 - 2)!} = \frac{5!}{2!\,3!} = 10$$
B. AB, AC, AD, AE, BC, BD, BE, CD, CE, DE.
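The count and the list of samples in this example can be checked by enumeration. The sketch below uses Python's standard `math.comb` and `itertools.combinations`; the variable names are chosen for illustration.

```python
from itertools import combinations
from math import comb

population = ["A", "B", "C", "D", "E"]
n = 2

# Number of samples of size n drawn without replacement: N! / (n! (N - n)!)
num_samples = comb(len(population), n)

# Every distinct sample of size n, in the same order as the listing above
samples = ["".join(pair) for pair in combinations(population, n)]

print(num_samples)   # 10
print(samples)       # ['AB', 'AC', 'AD', 'AE', 'BC', 'BD', 'BE', 'CD', 'CE', 'DE']
```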
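❖ In establishing the relationship between the population and the sampling distribution of
the mean, the following symbols will be used.
N = population size; n = sample size
$\mu$ = population mean; $\bar{x}$ = sample mean
$\sigma$ = population standard deviation; $\mu_{\bar{x}}$ = mean of the sampling distribution of the means
$\sigma_{\bar{x}}$ = standard deviation of the sampling distribution of the means (standard error)
The population mean is
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{2 + 4 + 6 + 8 + 10}{5} = \frac{30}{5} = 6$$
and the population standard deviation is
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} = \sqrt{\frac{40}{5}} \approx 2.83$$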
The mean of the distribution of sample means (the grand mean) is obtained by summing the various
sample means and dividing the sum by the number of samples (N).
The grand mean of the distribution of these sample means is:
$$\mu_{\bar{x}} = \frac{\sum_{i=1}^{10} \bar{x}_i}{N} = \frac{3 + 4 + 5 + 6 + 5 + 6 + 7 + 7 + 8 + 9}{10} = \frac{60}{10} = 6$$
The grand mean has the same value as the population mean.
Let’s organize this distribution of sample means into a frequency distribution and probability
distribution.
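One way to build such a frequency and probability distribution is sketched below. It assumes the population values 2, 4, 6, 8, and 10 that appear in the population-mean calculation, and samples of size 2 drawn without replacement; the variable names are illustrative.

```python
from collections import Counter
from itertools import combinations

population = [2, 4, 6, 8, 10]

# Mean of every possible sample of size 2 drawn without replacement
sample_means = [(a + b) / 2 for a, b in combinations(population, 2)]

freq = Counter(sample_means)        # how often each sample mean occurs
total = len(sample_means)           # 10 samples in all
prob = {m: freq[m] / total for m in sorted(freq)}

print("mean  frequency  probability")
for m in sorted(freq):
    print(f"{m:4.1f}  {freq[m]:9d}  {prob[m]:11.1f}")

grand_mean = sum(sample_means) / total
print("grand mean:", grand_mean)    # 6.0
```

The probabilities sum to 1, and the grand mean equals the population mean of 6.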
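The standard deviation of the sampling distribution of the means (the standard error of the mean) is
$$\sigma_{\bar{x}} = \sqrt{\frac{\sum (\bar{x}_i - \mu_{\bar{x}})^2}{N}}$$
where N = number of sample means.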
However, it is usually not possible to take all possible samples from the population, so we must
use an alternative method to compute $\sigma_{\bar{x}}$.
If the population standard deviation is given for a finite population,
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$$
where $\sigma$ = population standard deviation
N = population size
n = sample size.
Since we generally deal with very large populations, which can be considered infinite,
$\sqrt{(N - n)/(N - 1)}$ would approach 1.
Hence, $\sigma_{\bar{x}} = \sigma / \sqrt{n}$. For the above example of the sampling
distribution of the hourly wage of 5 employees, the standard error of the means is:
$2.83 / \sqrt{2} = 2.0011$
- The factor $\sqrt{(N - n)/(N - 1)}$ is known as the finite correction factor and should be
used when the population size is finite.
➢ Any single sample mean will tend to be closer to the population mean as the value of
$\sigma_{\bar{x}}$ decreases.
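The three ways of obtaining the standard error can be compared numerically. The sketch below assumes the five-element population 2, 4, 6, 8, 10 used in the calculations above and samples of size 2 without replacement; the variable names are illustrative.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev

population = [2, 4, 6, 8, 10]
N, n = len(population), 2

sigma = pstdev(population)          # population standard deviation, about 2.83

# 1) Directly, from the distribution of all possible sample means
sample_means = [mean(s) for s in combinations(population, n)]
se_direct = pstdev(sample_means)

# 2) Large-population formula: sigma / sqrt(n)
se_infinite = sigma / sqrt(n)

# 3) Finite-population formula: sigma / sqrt(n) * sqrt((N - n) / (N - 1))
se_finite = se_infinite * sqrt((N - n) / (N - 1))

print(round(se_direct, 4), round(se_infinite, 4), round(se_finite, 4))
```

For this very small population the finite-population formula agrees with the direct calculation (about 1.73), while σ/√n gives about 2.00, which suggests the correction factor matters when n is a sizeable fraction of N.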
Properties of Sampling distribution of the mean
✓ The mean of the sampling distribution of the means and the mean of the population are
equal.
✓ The spread of the sample means in the distribution is smaller than in the population values.
For instance, the spread in the distribution of sample means above is from 3 to 9, while the
spread in the population was from 2 to 10.
The central limit theorem
The Central Limit Theorem states that: “Regardless of the shape of the population distribution,
the distribution of the sample means approaches the normal probability distribution as the
sample size increases.”
The question is how large the sample size should be for the distribution of sample means
to approximate the normal distribution for any type of population. In practice, sample sizes
of 30 or larger are considered adequate for this purpose. It should be noted, however, that the
sampling distribution of the mean is normally distributed if the original population is normally
distributed, no matter what the sample size.
The discussion of the sampling distribution is concerned with the proximity of a sample mean to
the population mean. The possible values of sample means cluster around the population mean,
and according to the central limit theorem, the distribution of sample means tends
to be normal for a sample size n larger than 30.
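A small simulation can illustrate the central limit theorem. The sketch below is an illustration under stated assumptions, not part of the original discussion: the population is taken to be exponential with mean 1 and standard deviation 1 (strongly right-skewed), the sample size is 30, and 5,000 samples are drawn with a fixed seed.

```python
import random
from math import sqrt
from statistics import pstdev

rng = random.Random(0)      # fixed seed so the run is repeatable
n = 30                      # sample size
reps = 5000                 # number of repeated samples

def sample_mean(size):
    """Mean of one sample from an exponential population (mean 1, sd 1)."""
    return sum(rng.expovariate(1.0) for _ in range(size)) / size

sample_means = [sample_mean(n) for _ in range(reps)]

mean_of_means = sum(sample_means) / reps    # should be close to 1
sd_of_means = pstdev(sample_means)          # should be close to 1 / sqrt(30)
expected_se = 1.0 / sqrt(n)

# Share of sample means within 1.96 standard errors of the population mean;
# for a normal distribution this share would be about 0.95.
within = sum(abs(m - 1.0) <= 1.96 * expected_se for m in sample_means) / reps

print(round(mean_of_means, 3), round(sd_of_means, 3), round(expected_se, 3))
print(round(within, 3))
```

Even though the population is far from normal, the sample means behave approximately like a normal variable centered on the population mean.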
Distribution of the standardized statistics for the sample mean
In order to use the central limit theorem, we need to know the population standard deviation
$\sigma$. When it is not known, the standard deviation of the sample, designated by s, is used to
approximate it. The standardized distribution of the sample means is Z, where
$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
if the population standard deviation is known, or
$$Z = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
if the population standard deviation is unknown.
Example 1:
The annual wages of all employees of a company have a mean of 20,400 per year with a standard
deviation of 3,200. The personnel manager is going to take a random sample of 36 employees and
calculate the sample mean wage. What is the probability that the sample mean will exceed
21,000?
n = 36, $\mu$ = 20,400 and $\sigma$ = 3,200
$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{21000 - 20400}{3200 / \sqrt{36}} = 1.125 \approx 1.13$$
Using the standard normal table, the area between 0 and Z = 1.13 is 0.3708, so the area under
the normal curve to the right of 1.13 is 0.5 - 0.3708 = 0.1292. That is,
P($\bar{x}$ > 21,000) = P(Z > 1.13) = 0.1292.
Therefore, the probability that the sample mean will exceed 21,000 is approximately 13%.
Example 2:
A company makes engines used in speedboats. The company’s engineers believe that the engine
delivers an average power of 220 horsepower (HP) and that the standard deviation of power
delivered is 15 HP. A potential buyer intends to sample 100 engines. What is the probability that
the sample mean will be lower than 217 HP?
n = 100, $\mu$ = 220 and $\sigma$ = 15
$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{217 - 220}{15 / \sqrt{100}} = -2$$
Using the standard normal table, the area between 0 and Z = 2 is 0.4772, so the area under the
normal curve to the left of -2 is 0.5 - 0.4772 = 0.0228. That is, P(Z < -2) = 0.0228.
Thus, the probability that the potential buyer’s tests will result in a sample mean lower than 217
HP is 2.28%.
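Both examples can be checked without a printed table by using the standard normal distribution in Python's `statistics` module. This is a sketch for verification only; the helper name `z_for_sample_mean` is hypothetical. Because it uses Z = 1.125 rather than the rounded 1.13, the first probability differs slightly from the table-based 0.1292.

```python
from math import sqrt
from statistics import NormalDist

def z_for_sample_mean(x_bar, mu, sigma, n):
    """Standardize a sample mean: Z = (x_bar - mu) / (sigma / sqrt(n))."""
    return (x_bar - mu) / (sigma / sqrt(n))

std_normal = NormalDist()   # mean 0, standard deviation 1

# Example 1: P(sample mean > 21,000) with mu = 20,400, sigma = 3,200, n = 36
z1 = z_for_sample_mean(21000, 20400, 3200, 36)
p1 = 1 - std_normal.cdf(z1)

# Example 2: P(sample mean < 217) with mu = 220, sigma = 15, n = 100
z2 = z_for_sample_mean(217, 220, 15, 100)
p2 = std_normal.cdf(z2)

print(round(z1, 3), round(p1, 4))   # 1.125 and roughly 0.13
print(round(z2, 3), round(p2, 4))   # -2.0 and roughly 0.0228
```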
1.3.2. Sampling distribution of the proportion
The population proportion, represented by P, is the proportion of items in the entire
population with the characteristic of interest. The sample proportion, represented by p, is the
proportion of items in the sample with the characteristic of interest. The sample proportion, a
statistic, is used to estimate the population proportion, a parameter.
To calculate the sample proportion, you assign each item the value 1 or 0 to represent the
presence or absence of the characteristic, respectively.
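Sample proportion
$$p = \frac{X}{n} = \frac{\text{number of items having the characteristic of interest}}{\text{sample size}}$$
Standard error of the proportion
$$\sigma_p = \sqrt{\frac{P(1 - P)}{n}}$$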
In most cases in which inferences are made about the proportion, the sample size is substantial
enough to meet the conditions for using the normal approximation. Therefore, in many instances,
you can use the normal distribution to estimate the sampling distribution of the proportion.
Finding Z for the sampling distribution of the proportion
$$Z = \frac{p - P}{\sqrt{P(1 - P)/n}}$$
Where
p = sample proportion
P = population proportion
n = sample size
Example
To illustrate the sampling distribution of the proportion, suppose that the manager of the local
branch of a bank determines that 40% of all depositors have multiple accounts at the bank. If you
select a random sample of 200 depositors, what is the probability that the sample proportion of
depositors with multiple accounts is less than 0.30?
Solution
$$Z = \frac{p - P}{\sqrt{P(1 - P)/n}} = \frac{0.30 - 0.40}{\sqrt{0.40(1 - 0.40)/200}} = -2.89$$
Using the standard normal table, the area between 0 and Z = 2.89 is 0.4981, so the area under
the normal curve to the left of -2.89 is 0.5 - 0.4981 = 0.0019. Therefore, if the population
proportion of items of interest is 0.40, only 0.19% of the samples would be expected to have
sample proportions less than 0.30.
In the above discussion, the population is assumed to be very large. If the sampled population is
finite, or small, we will make some adjustments to the standard error of the sample proportions
using the finite-population correction factor.
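The bank example can be verified numerically with the normal approximation. The sketch below assumes a large population (no finite-population correction) and uses illustrative variable names; the result differs from the table value only through rounding of Z.

```python
from math import sqrt
from statistics import NormalDist

P = 0.40    # population proportion of depositors with multiple accounts
n = 200     # sample size
p = 0.30    # sample proportion of interest

se = sqrt(P * (1 - P) / n)      # standard error of the sample proportion
z = (p - P) / se                # standardized value of p
prob = NormalDist().cdf(z)      # P(sample proportion < 0.30)

print(round(se, 4), round(z, 2), round(prob, 4))
```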
Exercise:
A manufacturer of screws has noticed that on average two percent of the screws produced are
defective. A random sample of 400 screws is examined for the proportion of defective screws.
Find the probability that the proportion of defective screws in the sample is between 0.01 and
0.03.