CH 3 Statistical Estimation
CHAPTER THREE
STATISTICAL ESTIMATION
INTRODUCTION
Managers in business, education, social work, and other fields make decisions without complete
information. Automobile manufacturers do not know exactly how many people will purchase
new cars next year. The college registrar does not know exactly how many students will enroll
next fall but, based on past experience, can prepare a plan around an estimate. Everyone makes
estimates. When you get ready to cross a street, you estimate the speed of the approaching car,
the distance between you and the car, and your own speed. Having made these
quick estimates, you decide whether to wait, walk or run. Decisions made without complete
information carry considerable uncertainty.
In statistical inference, one draws conclusions about a population based on the results obtained
from a sample selected from that population. Thus, estimation is the process by which we estimate
unknown population parameters from sample statistics.
Any sample statistic that is used to estimate a population parameter is called an estimator, and an
estimate is a numerical value of an estimator.
The sample mean is often used as an estimator of the population mean. Suppose that we calculate
the mean daily revenue of a store for a random sample of 6 days and find it to be 1,110 birr. If
we use this value to estimate the daily revenue for the whole year, then the value 1,110 birr
would be an estimate.
Definition of Terms:
Interval estimate – The interval within which a population parameter probably lies, based on
sample information.
Point estimate – A single number computed from a sample and used to estimate a population
parameter.
Sampling error – The difference between a sample statistic and its corresponding population
parameter.
Confidence interval – An interval estimate that has an associated degree of confidence of
containing the population parameter.
TYPES OF ESTIMATION
1) POINT ESTIMATION
Point estimation is a statistical procedure in which we use a single value to estimate a population
parameter. A point estimate is a single number that is used as an estimate of a population
parameter, and is derived from a random sample taken from the population of interest.
Some of the most important point estimators are given below:
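Mean, $\bar{X} = \dfrac{\sum_i X_i}{n}$
Variance, $S^2 = \dfrac{\sum_i (X_i - \bar{X})^2}{n-1}$
Standard deviation, $S = \sqrt{S^2}$
Proportion, $\hat{p} = \dfrac{x}{n}$, where x is the number of sample items that have the characteristic of interest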
To set the price of a product, one strategy is competition-oriented: you fix the price of
your product at the average level charged by other producers. Suppose you want to market a
200-gram bar of soap that you produce. The current wholesale prices charged by a random sample of
10 soap producers (in birr) are:
1.00 1.35 1.50 0.95 0.90 1.25 1.00 1.20 0.90 and 1.50
What is an estimate of the mean wholesale price charged by all soap producers? Find an estimate
of the standard deviation in the wholesale prices of all the producers.
Solution: - The mean wholesale price of all producers, the population mean ($\mu$), is estimated by the sample
mean $\bar{X}$, given by
$\bar{X} = \dfrac{\sum_i X_i}{n} = \dfrac{1.00 + 1.35 + \cdots + 1.50}{10} = 1.155$
Thus, an estimate of the mean wholesale price charged by all soap producers is 1.155 Birr. Based
on this information, you might set the wholesale price per unit of your product at 1.155 Birr.
The standard deviation in the wholesale prices of all producers, which we call the population
standard deviation ($\sigma$), is estimated by the sample standard deviation:
$S = \sqrt{\dfrac{\sum_i (X_i - \bar{X})^2}{n-1}} = \sqrt{\dfrac{(1.00 - 1.155)^2 + (1.35 - 1.155)^2 + \cdots + (1.50 - 1.155)^2}{9}} = 0.237$
Thus, the wholesale prices fluctuate below and above their mean by about 0.237 Birr, which is
an estimate of the standard deviation in the wholesale prices of all producers.
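The two point estimates for the soap price sample can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and not part of the original text.

```python
import statistics

# Wholesale prices (birr) from the sample of 10 soap producers in the text.
prices = [1.00, 1.35, 1.50, 0.95, 0.90, 1.25, 1.00, 1.20, 0.90, 1.50]

# Point estimate of the population mean: the sample mean.
mean_price = statistics.mean(prices)

# Point estimate of the population standard deviation:
# statistics.stdev uses the n - 1 divisor, matching the formula for S.
sd_price = statistics.stdev(prices)

print(round(mean_price, 3), round(sd_price, 3))
```

Note that `statistics.pstdev` divides by n instead of n - 1, so it would not reproduce the sample standard deviation S used here.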
Suppose you are interested in knowing the proportion of fish that are inedible as a result of
chemical pollution of a certain lake. In a random sample of 400 fish caught from this lake, 55
were found to be inedible. Out of all the fish in this lake, what is an estimate of the proportion
of inedible fish?
Solution: -
The proportion of inedible fish in the entire lake is what we call the population proportion (p).
This is estimated by the sample proportion:
$\hat{p} = \dfrac{x}{n} = \dfrac{55}{400} = 0.1375 = 13.75$ percent.
Although point estimates are often useful, they do have one serious drawback: we do not know
how close or far these values are from the population value they are supposed to estimate, and
hence, we cannot be certain of their reliability. In other words, a point estimate will be more
useful if it is accompanied by an estimate of the error that might be involved. To this end, we use
interval estimation.
2) INTERVAL ESTIMATION
Interval estimation is a statistical procedure in which we find a random interval with a specified
probability of containing the parameter being estimated. An interval estimate is an interval that
provides an upper bound and a lower bound for a specific population parameter whose value is
unknown. This interval estimate has an associated degree of confidence of containing the
population parameter. Such interval estimates are also called Confidence intervals and are
calculated from random samples.
The interval estimate is an interval that includes the point estimate. For example, if the sample
mean is 0.28, one may report that the population mean lies between 0.25 and 0.31 with
95 percent confidence; that is, the 95 percent confidence interval for the population mean is (0.25, 0.31).
Clearly, this interval contains the point estimate 0.28.
CONFIDENCE INTERVAL FOR THE POPULATION MEAN ($\mu$)
Case 1. Sampling from a normally distributed population with known variance $\sigma^2$
Recall that $Z_{\alpha}$ denotes the value of Z for which the area under the standard normal curve to its right
is equal to $\alpha$. Analogously, $Z_{\alpha/2}$ denotes the value of Z for which the area to its right is $\alpha/2$, and $-Z_{\alpha/2}$
denotes the value for which the area to its left is $\alpha/2$.
Consider the following figure
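$P\left(-Z_{\alpha/2} < \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}\right) = 1 - \alpha$
$P\left(-Z_{\alpha/2}\,\sigma/\sqrt{n} < \bar{X} - \mu < Z_{\alpha/2}\,\sigma/\sqrt{n}\right) = 1 - \alpha$
$P\left(\bar{X} - Z_{\alpha/2}\,\sigma/\sqrt{n} < \mu < \bar{X} + Z_{\alpha/2}\,\sigma/\sqrt{n}\right) = 1 - \alpha$
Thus, a $(1 - \alpha)100\%$ confidence interval for the population mean is given by:
$\bar{X} \pm Z_{\alpha/2}\,\sigma/\sqrt{n}$
where $\bar{X}$ is the sample mean and $Z_{\alpha/2}$ is the value of Z for which the area to its right is $\alpha/2$.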
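The interval formula can be sketched in code as follows. This is an illustration only: the function name `z_interval` and the sample values (mean 50, sigma 10, n = 25) are hypothetical and do not come from the text. The critical value is taken from the standard library's `NormalDist.inv_cdf` instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """Return (lower, upper) limits for the mean when sigma is known."""
    alpha = 1 - confidence
    # Z_{alpha/2}: the point with area alpha/2 to its right
    # under the standard normal curve.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

# Critical values for the two most common confidence levels.
z95 = NormalDist().inv_cdf(0.975)   # about 1.96
z99 = NormalDist().inv_cdf(0.995)   # about 2.576; tables often round to 2.58

# Hypothetical sample: mean 50, known sigma 10, n = 25.
lower, upper = z_interval(50, 10, 25)
print(round(z95, 2), round(z99, 3), round(lower, 2), round(upper, 2))
```

The computed 99 percent value is slightly smaller than the rounded table value 2.58 used elsewhere in this chapter, so intervals computed this way can differ from the hand calculations in the last decimal place.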
Common confidence intervals are the 95 percent and the 99 percent confidence intervals. The 95
percent confidence interval means that about 95 percent of the similarly constructed intervals
will contain the parameter being estimated. If we use the 99 percent level of confidence, then we
expect about 99 percent of the intervals to contain the parameter being estimated.
Another interpretation of the 95 percent confidence interval is that 95 percent of the sample
means for a specified sample size will be within 1.96 standard deviations of the hypothesized
population mean. Similarly, for a 99 percent confidence interval, 99 percent of the sample means
will lie within 2.58 standard deviations of the hypothesized population mean.
If $\alpha = 0.05$, then the $(1 - \alpha)100$ percent confidence interval is the $(1 - 0.05)100 = 95$
percent confidence interval, and if $\alpha = 0.01$, then it is
the $(1 - 0.01)100 = 99$ percent confidence interval. Here $1 - \alpha$ is called the confidence
coefficient.
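The statement that about 95 percent of similarly constructed intervals contain the parameter can be illustrated with a small simulation. This is a sketch under stated assumptions: the population values (mean 100, standard deviation 15), the sample size 36, and the number of trials are all hypothetical choices made for illustration.

```python
import random
from math import sqrt

random.seed(1)                    # fixed seed so the run is repeatable
mu, sigma, n = 100.0, 15.0, 36    # hypothetical normal population and sample size
z = 1.96                          # Z_{0.025}, for 95 percent confidence
trials = 2000

covered = 0
for _ in range(trials):
    # Draw one random sample and build its 95 percent interval.
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    margin = z * sigma / sqrt(n)
    # Count the interval if it contains the true mean.
    if xbar - margin < mu < xbar + margin:
        covered += 1

coverage = covered / trials
print(coverage)
```

The printed fraction should be close to 0.95. Any single interval either contains the population mean or it does not; the 95 percent describes the long-run behavior of the procedure.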
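When the population standard deviation $\sigma$ is unknown and the sample is large, $\sigma$ is estimated by the
sample standard deviation S.
Then the 95% confidence interval is given by
$\bar{X} \pm 1.96\,\dfrac{S}{\sqrt{n}}$
and the 99% confidence interval is given by
$\bar{X} \pm 2.58\,\dfrac{S}{\sqrt{n}}$
where $S = \sqrt{\dfrac{\sum_i (X_i - \bar{X})^2}{n-1}}$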
Given: n = 64, $\bar{X}$ = 19,200 miles, S = 2,000 miles. Although we have no information about the
normality of the population, by the central limit theorem the sampling distribution of the mean is
approximately normal for large n, say n ≥ 30. In our case n = 64 ≥ 30, so we may use the normal distribution:
$\bar{X} \pm Z_{\alpha/2}\,S/\sqrt{n}$
Solution: -
Estimated mean = 402.7 grams
It is a point estimate
The interval is between 399.11 and 406.29 grams, found by:
$\bar{X} \pm 2.58\,\dfrac{S}{\sqrt{n}} = 402.7 \pm 2.58\,\dfrac{8.8}{\sqrt{40}} = 402.7 \pm 3.59$
399.11 and 406.29 are the two limits
0.99, or 99%.
If we were to construct 100 similar intervals, about 99 should include the population mean. Or
we are 99% confident that the population mean is located in the interval.
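The limits in this solution can be reproduced with a few lines of code. This is a minimal check that uses the rounded table value 2.58 from the text; the variable names are illustrative.

```python
from math import sqrt

xbar, s, n = 402.7, 8.8, 40   # sample mean (grams), sample SD and sample size from the solution
z = 2.58                      # table value used in the text for 99 percent confidence

# Margin of error: z times the estimated standard error of the mean.
margin = z * s / sqrt(n)
lower, upper = xbar - margin, xbar + margin

print(round(margin, 2), round(lower, 2), round(upper, 2))
```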
Small sample confidence interval for the population mean: Sampling from a normally
distributed population with $\sigma^2$ unknown and n < 30.
If the population variance $\sigma^2$ is not known, then it must be estimated by the sample variance $S^2$
as
$S^2 = \dfrac{\sum_i (X_i - \bar{X})^2}{n-1}$
In this situation, since $\sigma^2$ is estimated by $S^2$, the sampling distribution of the standardized mean deviates
from the Normal distribution for small samples: the statistic $(\bar{X} - \mu)/(S/\sqrt{n})$ follows
the Student's t distribution with n – 1 degrees of freedom.
For n > 30, the Student's t distribution can be approximated by the Normal distribution.
Like the Normal distribution, the t-distribution is symmetrical about its mean of 0, but it is flatter
than the Normal distribution. However, as the sample size increases, the t-distribution
loses its flatness and becomes approximately Normal.
The shape of the t-distribution is determined by the degrees of freedom. Degrees of freedom can
be defined as the number of values we can choose freely. Suppose we are dealing with a sample
of size n = 6, and we know the mean of these 6 numbers is 5. Symbolically, we have:
$\dfrac{a+b+c+d+e+f}{6} = 5$
Now, we are free to assign any values to a, b, c, d and e,
say a = 3, b = 2, c = 4, d = 5 and e = 3. But we are no longer free to assign a value to f, since:
MAU College of Business and Economics Dep’t of Accounting and FinancePage 8
Business Statistics Chapter 3 Statistical Estimation
$\dfrac{a+b+c+d+e+f}{6} = 5 \;\Rightarrow\; \dfrac{17 + f}{6} = 5 \;\Rightarrow\; 17 + f = 30 \;\Rightarrow\; f = 13$
That is, in order for the mean of these 6 numbers to be 5, f must be 13. If we assign another
number for f, then the mean will not be equal to 5. Thus, we are free to choose only 5 values and
the 6th one is determined automatically.
Hence, the number of degrees of freedom is:
n – 1 = 6 – 1 = 5
Generally, for a sample of size n, the number of degrees of freedom is n – 1. The values of t for different
degrees of freedom and different values of $\alpha$ are tabulated. $t_{\alpha}(n-1)$ denotes the value of t for
which the area under the curve to its right is equal to $\alpha$ with (n – 1) degrees of freedom.
Example 1.
a. For n = 20 and $\alpha$ = 0.025, find $t_{\alpha}(n-1)$.
Solution:
From the t-distribution table, $t_{0.025}(19) = 2.093$ (shaded area = 0.025)
b. If n = 26 and $\alpha$ = 0.005,
then $t_{\alpha}(n-1) = t_{0.005}(25) = 2.787$
(from the table of the t-distribution)
Under such situations, a $(1 - \alpha)100\%$ confidence interval for the population mean is given
by:
$\bar{X} \pm t_{\alpha/2}(n-1)\,S/\sqrt{n}$
Example 2: One measure of a company’s financial health is its debt-to-equity ratio. This quantity
is defined as the ratio of the company’s corporate debt to the company’s equity. If this ratio
is too high, it is one indication of financial instability. For obvious reasons, banks often monitor
the financial health of companies to which they have extended commercial loans. Suppose that,
in order to reduce risk, a large bank has decided to initiate a policy limiting the mean
debt-to-equity ratio for its portfolio of commercial loans to 1.5. In order to estimate the mean
debt-to-equity ratio of its loan portfolio, the bank randomly selects a sample of 15 of its commercial loan
accounts. Audits of these companies result in the following debt-to-equity ratios:
1.31 1.05 1.45 1.21 1.19
A stem-and-leaf display of these ratios is reasonably mound shaped. Furthermore, the sample
mean and standard deviation of these ratios can be calculated to be $\bar{X}$ = 1.343 and S = 0.192.
Suppose that the bank wishes to calculate a 95% confidence interval for the loan portfolio’s mean
debt-to-equity ratio, $\mu$. Since the bank has taken a small sample of size 15, it is appropriate to
calculate an interval based on the t distribution. We have n – 1 = 15 – 1 = 14 degrees of freedom,
and the level of confidence $100(1 - \alpha)$ percent = 95 percent implies that $\alpha$ = 0.05. Therefore, we
use the t point $t_{\alpha/2} = t_{0.05/2} = t_{0.025} = 2.145$ (from the table). It follows that the 95 percent
confidence interval for $\mu$ is
$\left[\bar{X} \pm t_{0.025}\,\dfrac{S}{\sqrt{n}}\right] = \left[1.343 \pm 2.145\left(\dfrac{0.192}{\sqrt{15}}\right)\right]$
$= [1.343 \pm 0.106]$
$= [1.237,\ 1.449]$
This interval says that the bank is 95 percent confident that the mean debt-to-equity ratio for its
portfolio of commercial loan accounts is between 1.237 and 1.449. Based on this interval, the
bank has strong evidence that the portfolio’s mean ratio is less than 1.5 (or that the bank is in
compliance with its new policy).
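The arithmetic of Example 2 can be verified with a short script. This is a sketch that starts from the summary statistics reported in the example and takes the critical value 2.145 from the t table, since Python's standard library has no t-distribution quantile function; the variable names are illustrative.

```python
from math import sqrt

xbar, s, n = 1.343, 0.192, 15   # sample mean, sample SD and sample size from Example 2
t_crit = 2.145                  # t_{0.025} with 14 degrees of freedom, read from the t table

# Margin of error: t point times the estimated standard error of the mean.
margin = t_crit * s / sqrt(n)
lower, upper = xbar - margin, xbar + margin

print(round(margin, 3), round(lower, 3), round(upper, 3))

# The policy limit of 1.5 lies above the whole interval.
print(upper < 1.5)
```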
INTERVAL ESTIMATION OF THE POPULATION PROPORTION
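The sample proportion $\hat{p}$ is an unbiased point estimator of the population proportion p, and its sampling
distribution is approximately normal when n is large (np ≥ 5 and nq ≥ 5, where q = 1 − p), with:
$z = \dfrac{\hat{p} - p}{\sqrt{\dfrac{pq}{n}}}$
Solving for p gives the expression $p = \hat{p} - z\,\sigma_{\hat{p}}$, where $\sigma_{\hat{p}} = \sqrt{pq/n}$ is the standard error of the proportion.
Here, however, p is unknown, and therefore the standard error is estimated using $\hat{p}$ and $\hat{q} = 1 - \hat{p}$. The above expression
becomes
$p = \hat{p} - z\sqrt{\dfrac{\hat{p}\hat{q}}{n}}$
Since z is determined by the confidence level and can be positive or negative, we can write the above expression as
$p = \hat{p} \pm z_{\alpha/2}\,\sigma_{\hat{p}}$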
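$p = 0.39 \pm 1.96\,\sigma_{\hat{p}}$
I. Compute the standard error: $\sigma_{\hat{p}} = \sqrt{\dfrac{(0.39)(0.61)}{87}} = 0.0523$
II. Compute $\alpha/2 = 0.025$ and look up $z_{\alpha/2} = 1.96$ from the table.
$p = 0.39 \pm 1.96(0.0523)$
$= 0.39 \pm 0.1025$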
0.2875≤ p ≤0.4925
Interpretation of results: We state with 95% confidence that the proportion of companies
that used telemarketing to assist order processing lies between 0.2875 and 0.4925.
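The proportion interval above can be recomputed directly. This is a minimal sketch using the sample proportion 0.39 and sample size 87 from the worked figures; the variable names are illustrative.

```python
from math import sqrt

p_hat, n = 0.39, 87   # sample proportion and sample size from the example
z = 1.96              # Z_{0.025}, for 95 percent confidence

# Estimated standard error of the sample proportion.
se = sqrt(p_hat * (1 - p_hat) / n)
margin = z * se
lower, upper = p_hat - margin, p_hat + margin

print(round(se, 4), round(margin, 4), round(lower, 4), round(upper, 4))

# Large-sample check: n*p_hat and n*q_hat should both be at least 5.
print(n * p_hat >= 5 and n * (1 - p_hat) >= 5)
```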
DETERMINING THE SAMPLE SIZE IN ESTIMATION
Whenever we take a sample for inferential purposes, there is always a sampling error. This
sampling error is controlled by selecting a sample that is adequate in size. If the sample size is
small, then we may fail to achieve the objective of our analysis, and if it is too large, then we
waste the resources when we gather the sample.
1. When we estimate the population mean $\mu$ by the sample mean $\bar{X}$, with probability $(1 - \alpha)$
the maximum error E will be:
$E = Z_{\alpha/2}\,\sigma/\sqrt{n}$
2. With probability $(1 - \alpha)$, the sampling error will not exceed some prescribed quantity E if
the sample size is at least:
$n = \left(\dfrac{Z_{\alpha/2}\,\sigma}{E}\right)^2$
If n comes out fractional, round up to the next integer.
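The two formulas above can be sketched as a pair of small functions. The function names and the input values (sigma = 12, E = 2, 95 percent confidence) are hypothetical and chosen only to illustrate the rounding rule.

```python
from math import ceil, sqrt

def required_sample_size(z, sigma, max_err):
    """Smallest n for which z * sigma / sqrt(n) does not exceed max_err."""
    # Round up: a fractional n must go to the next integer.
    return ceil((z * sigma / max_err) ** 2)

def maximum_error(z, sigma, n):
    """Maximum error E = z * sigma / sqrt(n) for a given sample size."""
    return z * sigma / sqrt(n)

# Hypothetical inputs: sigma = 12, allowed error 2, Z_{0.025} = 1.96.
n_needed = required_sample_size(1.96, 12, 2)
print(n_needed)                                    # (1.96 * 12 / 2)^2 = 138.2976, rounded up
print(round(maximum_error(1.96, 12, n_needed), 3))
```

Rounding down instead would leave the maximum error slightly above the prescribed quantity E, which is why the rule always rounds up.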
Example 3: The owner of a chain of hotels wants to determine the mean number of rooms
occupied per day (so that he can have an estimate of the average daily revenue obtained by
renting rooms). From past records, the standard deviation of the daily occupancy is known to be
9 rooms.
a. How large a sample of days should be taken so that the true mean number of rooms
occupied per day will not differ from the sample mean by more than 3 rooms at the 95
percent confidence level?
b. At the 99 percent confidence level, what is the maximum error committed in estimating
the true mean by the sample mean if a random sample of 64 days is taken?
Solution: -
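Given $\sigma$ = 9 rooms
E = 3 rooms, $(1 - \alpha)100\% = 95\% \Rightarrow \alpha = 0.05$
$Z_{\alpha/2} = Z_{0.025} = 1.96$
$n = \left(\dfrac{Z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\dfrac{1.96 \times 9}{3}\right)^2 = 34.5744.$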