Statistics for finance Chapter -3- Sampling and Sampling Distributions
CHAPTER THREE
3. SAMPLING AND SAMPLING DISTRIBUTIONS
I. Sampling Theory
Basic Concepts of Some Statistical Terms:
Population: aggregation of the elements from which a sample is actually selected. It is the entire
group of individuals or objects under consideration.
Sample: a subgroup or part of the population selected by some method in order to estimate
population characteristics.
Elementary unit (unit of analysis): an element or group of elements on which information is
required or it is the object that we observe or measure. Thus, persons, vehicles, households, farms
are examples of elementary units.
Sampling units: for the purpose of sample selection, the population is divided into a finite number
of distinct, non-overlapping and identifiable units called sampling units.
Sampling frame: a list of elements covering the survey population, which serves as a base for sample
selection.
Data: measurements or observations (values) recorded for each element.
Variable: a characteristic or attribute that can assume different values.
Population parameter: a fact about, or a description of, a population.
Statistic: a characteristic of, or a fact about, a sample.
THE NEED FOR SAMPLING:
The following points summarize the benefits of studying samples.
Sampling can save time and money. A sample study is usually less expensive than a census
study and produces results at a relatively faster speed. There could be resource (time, finance,
manpower, etc.) limitations which would make it difficult to study the whole population.
Sampling may enable more accurate measurements, because a sample study is generally conducted
by trained and experienced investigators.
Sampling remains the only choice when a test involves the destruction of the item under study.
In some cases, tests may be destructive. For example, when we test the breaking strength of
materials, we must destroy them. A census would mean complete destruction of materials. In
such a case, we must sample.
Sampling provides much quicker results than does a census. When the time between the
recognition of the need of information and the availability of that information is short, sampling
helps not to miss the information.
Sampling is the only process possible if the population is infinite.
Sampling usually makes it possible to estimate the sampling error and thus assists in obtaining
information about some characteristic of the population.
ERRORS IN SAMPLING
Sample results may not always be correct, because they are based on a partial, incomplete
analysis of the population. The resulting error is referred to as the sampling error.
Sampling error (estimation error): is caused by observing a sample instead of a whole
population.
Non-sampling errors: arise in both census and sample surveys due to biases and
mistakes. The errors that occur in the collection, recording, and tabulation of data
are called non-sampling errors.
TYPES OF SAMPLING
Several alternative ways to take a sample are available. The main alternative sampling plans may be
grouped into two categories: probability techniques and non-probability techniques.
In probability sampling, every element in the population has a known, nonzero probability of
selection. The simple random sample, in which each member of the population has an equal
probability of being selected, is the best-known probability sample.
In non-probability sampling, the probability of any particular member of the population being
chosen is unknown. The selection of sampling units in non-probability sampling is quite arbitrary,
as researchers rely heavily on personal judgment.
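As a minimal sketch of the simple random sample described above, the following Python snippet draws n units without replacement so that every unit has the same chance of selection. The sampling frame of 1,000 numbered units and the sample size of 36 are illustrative assumptions, not values taken from this section.

```python
import random

# Hypothetical sampling frame: units numbered 1..1000 (an assumption for illustration).
frame = list(range(1, 1001))

rng = random.Random(42)           # fixed seed so the draw can be reproduced
sample = rng.sample(frame, 36)    # simple random sample, drawn without replacement

# Under simple random sampling every unit has the same inclusion probability n/N.
inclusion_probability = len(sample) / len(frame)
print(sorted(sample)[:5], inclusion_probability)
```

Because the selection is driven by a random number generator rather than personal judgment, the probability of selection is known for every unit, which is what distinguishes probability sampling from non-probability sampling.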
SAMPLING DISTRIBUTIONS
A sampling distribution is not the distribution of the individual values included in one sample. It is the
distribution of all the possible values of a sample statistic for a given sample size drawn from a population.
In the previous chapter we defined a random variable as a numerical description of the outcome of
an experiment. If we consider the process of selecting a simple random sample (a type of sampling
in which every unit in the population has an equal and known chance of being selected) as
an experiment, the sample mean X̄ is the numerical description of the outcome of the experiment.
Thus, the sample mean X̄ is a random variable. As a result, just like other random variables, X̄
has a mean or expected value, a standard deviation, and a probability distribution. Because the
various possible values of X̄ are the result of different simple random samples, the probability
distribution of X̄ is called the sampling distribution of X̄.
Sampling Distribution of the Mean (X̄)
It is a probability distribution of all possible sample means of a given sample size. The sampling
distribution of the mean is described by determining the mean of such a distribution, which is the
expected value $E(\bar{X})$, and the standard deviation of the distribution of sample means, designated
by $\sigma_{\bar{X}}$.
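Example: Assume a population of size N = 4. The random variable X is the age of an individual, with values
X: 18, 20, 22, 24 (years). Find the population mean and standard deviation.
$$\mu = \frac{\sum X_i}{N} = \frac{18 + 20 + 22 + 24}{4} = 21$$
$$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = \sqrt{\frac{9 + 1 + 1 + 9}{4}} = \sqrt{5} \approx 2.236$$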
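The same population mean and standard deviation can be checked with Python's standard library. This is only a sketch; note that `pstdev` divides by N (population formula), which is the formula used in this example, whereas `stdev` would divide by N - 1.

```python
from statistics import mean, pstdev

ages = [18, 20, 22, 24]   # the four population values from the example

mu = mean(ages)           # population mean
sigma = pstdev(ages)      # population standard deviation (divides by N)

print(mu, round(sigma, 3))
```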
Example 2: ABC Industries has seven production employees (considered the population). The
hourly earnings of each employee are given below.
Employee   Hourly Earnings (Br)
A          7
B          7
C          8
D          8
E          7
F          8
G          9
REQUIRED:
(1) Determine the population mean.
Ans: The population mean is Br 7.71, found by
$$\mu = \frac{\sum X}{N} = \frac{7 + 7 + 8 + 8 + 7 + 8 + 9}{7} = \frac{54}{7} \approx \text{Br } 7.71$$
(2) Determine the sampling distribution of the sample mean for samples of size 2.
To arrive at the sampling distribution of the sample mean, we need to select all possible samples of 2 without
replacement from the population, and then compute the mean of each sample. There are 21 possible samples:
$${}_{N}C_{n} = {}_{7}C_{2} = \frac{7!}{2!\,5!} = 21$$
where N = 7 is the number of items in the population and n = 2 is the number of items in the
sample.
Summarized results of Sampling Distribution of the Sample Mean for n = 2
Sample Mean (Br)   Number of Means   Probability
7.00               3                 0.1429
7.50               9                 0.4286
8.00               6                 0.2857
8.50               3                 0.1429
Total              21                1.00
(3) What is the mean of the sampling distribution?
The mean of the sampling distribution of the sample mean is obtained by summing the
various sample means and dividing the sum by the number of samples. The mean of all the
sample means is usually written $\mu_{\bar{x}}$. The μ reminds us that it is a population value
because we have considered all possible samples. The subscript x̄ indicates that it is the
sampling distribution of the sample mean.
$$\mu_{\bar{x}} = \frac{\text{Sum of all sample means}}{\text{Total number of samples}} = \frac{(7 \times 3) + (7.50 \times 9) + (8 \times 6) + (8.50 \times 3)}{21} = \frac{162}{21} \approx \text{Br } 7.71$$
The following observations can be made about the population and the sampling distribution.
The mean of the distribution of the sample mean (Br 7.71) is equal to the mean of the
population.
The range (spread) of the distribution of the sample mean is less than the range (spread)
of the population values. The sample means range from Br 7.00 to Br 8.50, while the
population values vary from Br 7.00 up to Br 9.00.
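The enumeration in this example can be reproduced with a short Python sketch. It lists every sample of size 2 without replacement, tabulates the sample means, and compares the mean of the sample means with the population mean. Variable names are illustrative; exact fractions are used so the equality of the two means is not blurred by rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

earnings = [7, 7, 8, 8, 7, 8, 9]   # hourly earnings (Br) of employees A..G

# Every sample of 2 employees without replacement (order does not matter).
samples = list(combinations(earnings, 2))
sample_means = [Fraction(a + b, 2) for a, b in samples]

# Sampling distribution: each distinct sample mean with its probability.
counts = Counter(sample_means)
distribution = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

population_mean = Fraction(sum(earnings), len(earnings))
mean_of_sample_means = sum(sample_means) / len(sample_means)

for m, p in distribution.items():
    print(f"{float(m):.2f}  {counts[m]:2d}  {float(p):.4f}")
print(float(population_mean), float(mean_of_sample_means))
```

`combinations` treats the seven employees as distinct even when their earnings coincide, which is why it yields 21 samples rather than only the distinct pairs of values.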
The Standard Deviation of the Sample Mean
Let us define the standard deviation of the sampling distribution of x̄. We will use the following
notation.
$\sigma_{\bar{x}}$ = the standard deviation of x̄
σ = the standard deviation of the population
n = the sample size
N = the population size
It can be shown that the formula for the standard deviation of x̄ depends on whether the
population is finite or infinite. The two formulas for the standard deviation of x̄ follow.
STANDARD DEVIATION OF x̄
Finite population: $\sigma_{\bar{x}} = \sqrt{\frac{N - n}{N - 1}} \cdot \frac{\sigma}{\sqrt{n}}$
Infinite population: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
As a general rule, when n < 0.05N, we can use the formula $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ even for a finite population.
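Example:
Take the above example and determine the population standard deviation, then obtain the standard
deviation, $\sigma_{\bar{x}}$, of the variable x̄ for samples of size 2. Indicate any apparent relationship between
$\sigma_{\bar{x}}$ and σ.
Solutions:
To determine the population standard deviation we will use the formula:
$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$$
$$\sigma = \sqrt{\frac{(7 - 7.71)^2 + (7 - 7.71)^2 + (8 - 7.71)^2 + (8 - 7.71)^2 + (7 - 7.71)^2 + (8 - 7.71)^2 + (9 - 7.71)^2}{7}}$$
$$\sigma = \sqrt{\frac{3.4287}{7}} = \sqrt{0.4898} \approx 0.6998 \quad \text{(population standard deviation)}$$
To obtain the standard deviation of the variable x̄ for samples of size 2, we weight each squared
deviation of a sample mean by the number of samples that produce it:
$$\sigma_{\bar{x}} = \sqrt{\frac{(7 - 7.71)^2 \times 3 + (7.50 - 7.71)^2 \times 9 + (8 - 7.71)^2 \times 6 + (8.50 - 7.71)^2 \times 3}{21}}$$
$$\sigma_{\bar{x}} = \sqrt{\frac{4.2861}{21}} = \sqrt{0.2041} \approx 0.4517 \quad \text{(standard deviation of the sample mean)}$$
OR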
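When sampling is done without replacement from a finite population, as in the above example:
$$\sigma_{\bar{x}} = \sqrt{\frac{N - n}{N - 1}} \cdot \frac{\sigma}{\sqrt{n}}$$
Thus,
$$\sigma_{\bar{x}} = \sqrt{\frac{7 - 2}{7 - 1}} \cdot \frac{0.6998}{\sqrt{2}} = \sqrt{\frac{5}{6}} \times 0.4948 \approx 0.4517$$
When sampling is done with replacement from a finite population or when it is done from an
infinite population, the appropriate formula is $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$.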
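The two formulas can be wrapped in one small helper. The sketch below is illustrative (the function name `standard_error` and its optional `N` argument are assumptions made here, not notation from these notes): passing `N` applies the finite population correction, and omitting it gives the infinite-population formula. Applied to the ABC Industries data, it reproduces the hand calculation up to rounding in the fourth decimal place.

```python
from math import sqrt

def standard_error(sigma, n, N=None):
    """Standard deviation of the sample mean.

    N=None means an infinite population or sampling with replacement;
    otherwise the finite population correction sqrt((N - n) / (N - 1)) is applied.
    """
    se = sigma / sqrt(n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))
    return se

earnings = [7, 7, 8, 8, 7, 8, 9]          # ABC Industries population (Br)
N = len(earnings)
mu = sum(earnings) / N
sigma = sqrt(sum((x - mu) ** 2 for x in earnings) / N)   # population std dev

se_finite = standard_error(sigma, n=2, N=N)   # without replacement
se_infinite = standard_error(sigma, n=2)      # with replacement / infinite population

print(round(sigma, 3), round(se_finite, 3), round(se_infinite, 3))
```

The correction factor is always at most 1, so the finite-population standard error never exceeds the infinite-population value; the two agree closely once n is a small fraction of N.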
Example 1:
Suppose the mean of a very large population is μ = 50 and the standard deviation of the
measurements is σ = 12. We determine the sampling distribution of the sample means for a sample
size of n = 36, in terms of the expected value and the standard error of the distribution, as follows:
$$E(\bar{x}) = \mu = 50 \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{12}{\sqrt{36}} = 2$$
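Example 2:
Suppose that in the above example the sample of n = 36 values were taken from a population of just
100 values. The sample thus constitutes 36 percent of the population, so the finite population
correction is needed. The expected value and standard error of the sampling distribution of the mean are:
$$E(\bar{x}) = \mu = 50 \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} = \frac{12}{\sqrt{36}} \sqrt{\frac{100 - 36}{100 - 1}} = 2 \times 0.804 \approx 1.61$$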
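Example 3:
As reported by MOFED, the mean living expense for a single family is 1,742 Birr. Assume a
standard deviation of 568 Birr.
A. For samples of 25 single families, determine the mean and standard deviation of the variable x̄.
$$\mu_{\bar{x}} = \mu = 1{,}742 \text{ Birr} \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{568}{\sqrt{25}} = 113.6 \text{ Birr}$$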
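B. Repeat part (A) for a sample of size 500.
$$\mu_{\bar{x}} = \mu = 1{,}742 \text{ Birr} \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{568}{\sqrt{500}} \approx 25.40 \text{ Birr}$$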
Exercise:
The mean wage per hour for all 5000 employees who work at a large company is $17.50 and the
standard deviation is $2.90. Let x̄ be the mean wage per hour for a random sample of
employees selected from this company. Find the mean and standard deviation of x̄ for a sample
size of: (a) 30 (b) 75 (c) 200
Ans: (a) $\mu_{\bar{x}} = \mu = \$17.50$ and $\sigma_{\bar{x}} = \$0.529$
(b) $\mu_{\bar{x}} = \mu = \$17.50$ and $\sigma_{\bar{x}} = \$0.335$
(c) $\mu_{\bar{x}} = \mu = \$17.50$ and $\sigma_{\bar{x}} = \$0.205$
From the preceding calculations we observe that the mean of the sampling distribution of x̄ is
always equal to the mean of the population whatever the size of the sample. However, the value of
the standard deviation of x̄ decreases from $0.529 to $0.335 and then to $0.205 as the sample size
increases from 30 to 75 and then to 200.
N.B:
The larger the sample size, the smaller the standard deviation of x̄.
The smaller the standard deviation of x̄, the more closely the possible values of x̄ (the possible
sample means) cluster around the mean of x̄.
The mean of x̄ equals the population mean.
The Central Limit Theorem
If the population or process from which a sample is taken is normally distributed, then the
sampling distribution of the mean also will be normally distributed, regardless of sample size.
However, what if a population is not normally distributed? Remarkably, a theorem from
mathematical statistics still permits application of the normal distribution with respect to such
sampling distributions.
The central limit theorem states that as sample size is increased, the sampling distribution of the
mean (and for other sample statistics as well) approaches the normal distribution in form,
regardless of the form of the population distribution from which the sample was taken.
In selecting random samples of size n from a population, the sampling distribution of the sample
mean can be approximated by a normal distribution as the sample size becomes large.
The sample size is usually considered to be large if n > 30.
Thus, according to the central limit theorem,
1. When n > 30, the shape of the sampling distribution of x̄ is approximately normal irrespective
of the shape of the population distribution.
2. The mean of x̄, $\mu_{\bar{x}}$, is equal to the mean of the population, μ.
3. The standard deviation of x̄ is $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$.
Again, remember that the formula $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ applies when n < 0.05N.
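A small simulation can make the theorem concrete. The sketch below assumes a strongly skewed population (exponential with mean 1 and standard deviation 1, chosen here purely for illustration) and draws 5,000 samples of size n = 36. If the theorem holds, the simulated sample means should centre near 1 with a standard deviation near 1/√36 ≈ 0.167, even though the population itself is far from normal.

```python
import random
from math import sqrt
from statistics import mean, pstdev

rng = random.Random(0)   # fixed seed for reproducibility

def draw_sample_mean(n):
    """Mean of one random sample of size n from an exponential(1) population."""
    return sum(rng.expovariate(1.0) for _ in range(n)) / n

n = 36
sample_means = [draw_sample_mean(n) for _ in range(5000)]

simulated_mean = mean(sample_means)     # should be close to the population mean, 1
simulated_sd = pstdev(sample_means)     # should be close to sigma / sqrt(n)
theoretical_sd = 1 / sqrt(n)

print(round(simulated_mean, 3), round(simulated_sd, 3), round(theoretical_sd, 3))
```

Plotting a histogram of `sample_means` would show the roughly bell-shaped form; the numerical check above only confirms the mean and standard deviation.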
Determining Probability Values for the Sample Mean
If the sampling distribution of the mean is normally distributed, either because the population is
normally distributed or because the central limit theorem is invoked, then we can determine
probabilities regarding the possible values of the sample mean, given that the population mean and
standard deviation are known. The process is similar to determining probabilities for individual
observations using the normal distribution, as described in chapter 2. In the present application,
however, it is the designated value of the sample mean that is converted into a value of z in order to
use the table of normal probabilities. This conversion formula uses the standard error of the mean
because this is the standard deviation for the x̄ variable. Thus, the conversion formula is
$$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$$
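Example:
An auditor takes a random sample of size n = 36 from a population of 1,000 accounts receivable. The
mean value of the accounts receivable for the population is μ = Br 260, with the population
standard deviation σ = Br 45. What is the probability that the sample mean will be less than Br 250?
Since n = 36 is less than 0.05N = 50, the standard error is $\sigma_{\bar{x}} = \frac{45}{\sqrt{36}} = 7.5$, and
$$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{250 - 260}{7.5} = -1.33$$
[Figure: normal curve of the mean account balance, marking x̄ = 250 (z = -1.33) and μ = 260 (z = 0).]
Therefore:
$$P(\bar{x} < 250) = P(z < -1.33) = 0.5000 - 0.4082 = 0.0918$$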
Using the above example what is the probability that the sample mean will be within Br 15 of the
population mean?
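Both auditor questions can be checked numerically with `statistics.NormalDist` from the Python standard library. This is a sketch that treats the sampling distribution of x̄ as normal with mean 260 and standard error 45/√36 = 7.5. Because it uses the unrounded z value (-1.3333 rather than -1.33), the first probability comes out slightly below the table-based 0.0918.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 260, 45, 36        # population mean, std dev (Br) and sample size
se = sigma / sqrt(n)              # standard error; n/N = 36/1000 < 0.05, so no correction

sampling_dist = NormalDist(mu, se)   # approximate distribution of the sample mean

# P(sample mean < 250)
p_below_250 = sampling_dist.cdf(250)

# P(sample mean within Br 15 of the population mean), i.e. between 245 and 275
p_within_15 = sampling_dist.cdf(mu + 15) - sampling_dist.cdf(mu - 15)

print(round(p_below_250, 4), round(p_within_15, 4))
```

For the second question the limits are exactly two standard errors from the mean (15 / 7.5 = 2), so the result is the familiar two-standard-deviation area of about 0.95.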
Sampling distribution of difference between two sample means
Another sampling distribution that you will soon encounter is that of the difference between two
sample means. The difference between two sample means, $\bar{x}_1 - \bar{x}_2$, is normally distributed if both
populations are normal. Through the use of the laws of expected value and variance we derive the
expected value and variance of the sampling distribution of $\bar{x}_1 - \bar{x}_2$:
$$E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2$$
and
$$V(\bar{x}_1 - \bar{x}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$
Thus, it follows that in repeated independent sampling from two populations with means $\mu_1$ and
$\mu_2$ and standard deviations $\sigma_1$ and $\sigma_2$, respectively, the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is
normal with mean
$$\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$$
And standard deviation (which is the standard error of the difference between two means)
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
Example:
Suppose that the starting salaries of MScs at Haramaya University (HU) are normally distributed,
with a mean of Br 62,000 and a standard deviation of Br 14,500. The starting salaries of MScs at the
Private University (PU) are normally distributed, with a mean of Br 60,000 and a standard deviation
of Br 18,300. If a random sample of 50 HU MScs and a random sample of 60 PU MScs are selected,
what is the probability that the sample mean starting salary of HU graduates will exceed that of the
PU graduates?
Given:
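Haramaya University: $\mu_1$ = 62,000, $\sigma_1$ = 14,500, $n_1$ = 50
Private University: $\mu_2$ = 60,000, $\sigma_2$ = 18,300, $n_2$ = 60
We want to determine $P(\bar{x}_1 - \bar{x}_2 > 0)$. We know that $\bar{x}_1 - \bar{x}_2$ is normally distributed with mean
$\mu_1 - \mu_2$ = 62,000 - 60,000 = 2,000 and standard deviation
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{14{,}500^2}{50} + \frac{18{,}300^2}{60}} = \sqrt{4{,}205{,}000 + 5{,}581{,}500} \approx 3{,}128.3$$
Hence
$$P(\bar{x}_1 - \bar{x}_2 > 0) = P\left(Z > \frac{0 - 2{,}000}{3{,}128.3}\right)$$
= P (Z > -0.64)
= 0.50 + 0.2389 = 0.7389
There is a 0.7389 probability that for a sample of size 50 from the HU graduates and a sample of size
60 from the PU graduates, the sample mean starting salary of HU graduates will exceed the sample
mean of PU graduates.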
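The salary example can be verified with a few lines of Python. The sketch below builds the normal distribution of the difference x̄1 - x̄2 from the given means, standard deviations and sample sizes, and evaluates the probability that the difference is positive. Since it does not round z to two decimals, the result differs from the table-based 0.7389 only in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

mu1, sigma1, n1 = 62_000, 14_500, 50   # Haramaya University (Br)
mu2, sigma2, n2 = 60_000, 18_300, 60   # Private University (Br)

mean_diff = mu1 - mu2                                   # mean of x̄1 - x̄2
se_diff = sqrt(sigma1**2 / n1 + sigma2**2 / n2)         # standard error of the difference

# P(x̄1 - x̄2 > 0) under the normal sampling distribution of the difference
p_hu_exceeds_pu = 1 - NormalDist(mean_diff, se_diff).cdf(0)

print(round(se_diff, 1), round(p_hu_exceeds_pu, 4))
```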
POPULATION AND SAMPLE PROPORTIONS
The concept of proportion is the same as the concept of relative frequency discussed in previous
Chapters and the concept of probability of success in a binomial experiment. The relative frequency
of a category or class gives the proportion of the sample or population that belongs to that category
or class. Similarly, the probability of success in a binomial experiment represents the proportion of
the sample or population that possesses a given characteristic.
The population proportion, denoted by p, is obtained by taking the ratio of the number of elements
in a population with a specific characteristic to the total number of elements in the population. The
sample proportion, denoted by p̂ (pronounced p hat), gives a similar ratio for a sample.
EXAMPLE:
Suppose a total of 789,654 families live in a city and 563,282 of them own homes. A sample of 240
families is selected from this city, and 158 of them own homes. Find the proportion of families who
own homes in the population and in the sample.
Solutions
For the population of this city,
N = population size = 789,654
X = families in the population who own homes = 563,282
The proportion of all families in this city who own homes is
$$p = \frac{X}{N} = \frac{563{,}282}{789{,}654} \approx 0.71$$
Now, suppose a sample of 240 families is taken from this city and 158 of them are homeowners.
Then,
n = sample size = 240
x = families in the sample who own homes = 158
The sample proportion is
$$\hat{p} = \frac{x}{n} = \frac{158}{240} \approx 0.66$$
Sampling distribution of p̂
Just like the sample mean x̄, the sample proportion p̂ is a random variable. Hence, it possesses
a probability distribution, which is called its sampling distribution.
Sampling Distribution of the Sample Proportion: the probability distribution of the sample
proportion, p̂, is called its sampling distribution. It gives the various values that p̂ can assume
and their probabilities.
The value of p̂ calculated for a particular sample depends on which elements of the population are
included in that sample.
Mean and Standard Deviation of p̂
The mean of p̂, which is the same as the mean of the sampling distribution of p̂, is always equal to
the population proportion, p, just as the mean of the sampling distribution of x̄ is always equal to
the population mean, μ.
Mean of the Sample Proportion: The mean of the sample proportion, p̂, is denoted by $\mu_{\hat{p}}$ and is
equal to the population proportion, p. Thus, $\mu_{\hat{p}} = p$.
The standard deviation of p̂, denoted by $\sigma_{\hat{p}}$, is given by the following formula. This formula
is true only when the sample size is small compared to the population size. The sample size is said
to be small compared to the population size if n < 0.05N.
Standard Deviation of the Sample Proportion: The standard deviation of the sample proportion,
p̂, is denoted by $\sigma_{\hat{p}}$ and is given by the formula
$$\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$$
Where p is the population proportion, q = 1 - p, and n is the sample size. This formula is used when
n < 0.05N, where N is the population size.
However, if n > 0.05N, then $\sigma_{\hat{p}}$ is calculated as follows:
$$\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} \sqrt{\frac{N - n}{N - 1}}$$
We use the concepts of the mean, standard deviation, and shape of the sampling distribution of p̂ to
determine the probability that the value of p̂ computed from one sample falls within a given interval.
The z value for p̂ is computed using the following formula.
$$z = \frac{\hat{p} - p}{\sigma_{\hat{p}}}$$
EXAMPLE -1-
The proportion of a population with a characteristic of interest is p = 0.37. Find the mean and
standard deviation of the sample proportion p̂ obtained from random samples of size 1,600.
Ans: $\mu_{\hat{p}} = p = 0.37$ and $\sigma_{\hat{p}} = \sqrt{\frac{0.37 \times 0.63}{1600}} \approx 0.012$
EXAMPLE -2-
A random sample of size 121 is taken from a population in which the proportion with the
characteristic of interest is p = 0.47. Find the indicated probabilities.
A. P (0.45 ≤ p̂ ≤ 0.50) Ans = 0.4154
B. P ( p̂ ≥ 0.50) Ans. = 0.2546
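The two proportion examples can be checked with the sketch below. The helper name `proportion_se` and its optional `N` argument are assumptions introduced here for illustration; the helper returns the standard deviation of p̂, with the finite population correction applied only when a population size is supplied. The probabilities use the normal approximation with unrounded z values, so they agree with the table-based answers (0.4154 and 0.2546) to about three decimal places.

```python
from math import sqrt
from statistics import NormalDist

def proportion_se(p, n, N=None):
    """Standard deviation of the sample proportion.

    N=None corresponds to the n < 0.05N case (no finite population correction).
    """
    se = sqrt(p * (1 - p) / n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))
    return se

# Example 1: p = 0.37, n = 1,600
se_example_1 = proportion_se(0.37, 1600)

# Example 2: p = 0.47, n = 121, normal approximation to the distribution of p-hat
p, n = 0.47, 121
dist = NormalDist(p, proportion_se(p, n))
prob_a = dist.cdf(0.50) - dist.cdf(0.45)   # P(0.45 <= p-hat <= 0.50)
prob_b = 1 - dist.cdf(0.50)                # P(p-hat >= 0.50)

print(round(se_example_1, 3), round(prob_a, 3), round(prob_b, 3))
```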
Sampling Distribution of the Difference of Sample Proportions
When sampling is done from two populations with proportions $p_1$ and $p_2$ respectively, the sampling
distribution of the difference of sample proportions, $\hat{p}_1 - \hat{p}_2$, approaches a normal distribution
with mean $p_1 - p_2$ and standard deviation
$$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$
as the sample sizes $n_1$ and $n_2$ increase.
Example:
It has been experienced that proportions of defaulters (in tax payments) belonging to business class
and professional class are 0.20 and 0.15 respectively. The results of a sample survey are:
                                   Business class    Professional class
Sample size:                       n1 = 400          n2 = 420
Sample proportion of defaulters:   p̂1 = 0.21         p̂2 = 0.14
Find the probability of drawing two samples with a difference in the two sample proportions larger
than what is observed.
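Solution:
Given
p1 = 0.20                 p2 = 0.15
q1 = 1 - 0.20 = 0.80      q2 = 1 - 0.15 = 0.85
n1 = 400                  n2 = 420
p̂1 = 0.21                 p̂2 = 0.14
Since the population is infinite and also the sample sizes are large, the central limit theorem applies,
i.e.
$$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} = \sqrt{\frac{0.20 \times 0.80}{400} + \frac{0.15 \times 0.85}{420}} = \sqrt{0.0004 + 0.0003} \approx 0.0265$$
So we can find the required probability using the standard normal variable
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}}$$
$$P(\hat{p}_1 - \hat{p}_2 > 0.07) = P\left(Z > \frac{(0.21 - 0.14) - (0.20 - 0.15)}{\sqrt{\frac{0.20 \times 0.80}{400} + \frac{0.15 \times 0.85}{420}}}\right) = P(Z > 0.75) = 0.5000 - 0.2734 = 0.2266$$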
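The defaulter example can be reproduced with the sketch below, which computes the standard error of the difference of sample proportions, the z value for the observed difference of 0.07, and the upper-tail probability. It keeps full precision instead of rounding z to 0.75, so the probability is marginally below the table-based 0.2266.

```python
from math import sqrt
from statistics import NormalDist

p1, n1 = 0.20, 400     # business class: population proportion, sample size
p2, n2 = 0.15, 420     # professional class: population proportion, sample size
observed_diff = 0.21 - 0.14   # difference of the two sample proportions

# Standard error of the difference of sample proportions
se_diff = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Standardize the observed difference and take the upper-tail area
z = (observed_diff - (p1 - p2)) / se_diff
prob_larger = 1 - NormalDist().cdf(z)

print(round(se_diff, 4), round(z, 2), round(prob_larger, 3))
```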