Topic 6
STATISTICAL ESTIMATION
Outline
I Point Estimates
Introduction
[Figure: a population (what is the value of the parameter, e.g. µ?) from which a random sample is drawn; the sample statistic is used to estimate the parameter.]
The above figure illustrates the statistical estimation problem. We wish to estimate a characteristic
of a population. In more technical terms, we want to estimate a parameter. To do this, we take a
random sample from the population and use the sample to calculate a sample statistic that estimates
the parameter.
Topic 6: Statistical Estimation
I Point Estimates
Definition
A point estimate is a _____________________ that is used to estimate an unknown population
parameter.
Example
(a) The estimated average age of all ABC Company employees is 47.
The number _______ is the point estimate.
(b) The course has an estimated total enrolment of 350 students this year.
The number __________ is a point estimate.
(b) It is often __________________ as you do not know how wrong the estimate is and cannot
be certain of the ________________ of the estimate (i.e. you cannot tell how close it is to
the true population value).
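Population parameter | Sample statistic (point estimator)           | Point estimate notation
µ                    | $\bar{x} = \frac{\sum x}{n}$                 | $\hat{\mu}$ (mu hat)
σ²                   | $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$   | $\hat{\sigma}^2$ (sigma hat square)
σ                    | $s = \sqrt{s^2}$                             | $\hat{\sigma}$ (sigma hat)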
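As an illustration of the three estimators in the table above, the following Python sketch computes them with the standard-library statistics module. The sample values are made up for illustration and do not come from these notes; the variable names are likewise only illustrative.

```python
import statistics

# Hypothetical waiting times in minutes (illustrative values, not data from the notes).
sample = [21.5, 30.2, 18.4, 25.0, 27.9, 19.6, 23.3, 22.1]

mu_hat = statistics.mean(sample)            # x-bar = (sum of x) / n
sigma2_hat = statistics.variance(sample)    # s^2 = sum((x - x-bar)^2) / (n - 1)
sigma_hat = statistics.stdev(sample)        # s = square root of s^2

print(f"mu hat = {mu_hat:.2f}, sigma hat squared = {sigma2_hat:.2f}, sigma hat = {sigma_hat:.2f}")
```

Note that statistics.variance and statistics.stdev divide by n - 1, which matches the formula for s². The functions pvariance and pstdev divide by n instead and would not match the table.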
Example 1
The average waiting time of customers at a popular restaurant during peak hours is 30 minutes. The
service quality manager of the restaurant implemented some time-saving measures and would like
to know if the target of reducing customers’ average waiting time by at least 10 minutes had been
achieved. 35 customers were randomly selected and data on their waiting time at the restaurant
during peak hours was recorded. The sample mean waiting time was 23.6 minutes with a standard
deviation of 6.4 minutes. Calculate the point estimates of the population mean and population
standard deviation of customers’ waiting time after the time-saving measures were implemented.
Solution:
Example 2
A retiree is doing a study on the one-year rate of return of investment funds in Singapore for the
past year. A random sample of 15 investment funds was chosen to estimate the mean rate of return
of investment funds. The performance of the sample was listed below. Calculate the point
estimates of the population mean rate of return and the standard deviation.
Solution:
Because a point estimate gives no indication of how accurate it is, an interval estimate is
often used. A confidence interval (interval estimate) is a
____________________________________ (computed from sample data) used to estimate a
population parameter. This interval is likely to capture the true population value.
Example
(a) The mean life span of car batteries that a company manufactures is estimated to be between
34.6 and 37.4 months.
The range of values from _______________________ is the interval estimate.
The interval estimate describes a range of values within which the population mean is likely
to lie. In this case, we say that the interval estimate for the average life of all batteries might
be 34.6 < µ < 37.4.
3. Provides a degree of confidence about where the unknown population parameter lies.
Example: the population mean is estimated to be between 50 and 70 with 95% confidence.
This interval estimate is known as a confidence interval.
Example
We can calculate a 95% confidence interval stating that between 33% and 39% of all
Singaporeans suffer from allergies.
Notice that this statement conveys a range of estimates of a population value (33% to 39%) as well
as the likelihood (95%) that the interval does indeed capture the population value.
Confidence Level
A __________________________________ (usually a percentage or probability) is assigned before
an interval estimate is calculated. For instance, one may wish to be 95% confident that the interval
contains the true population mean.
Example
The 95% confidence interval for the population mean battery life is between 34.6 and 37.4 months.
We are 95% confident that the population mean battery life is between 34.6 and 37.4 months.
95% or 0.95 is the designated probability.
The probability that the population parameter lies within the interval
= Confidence level (1 - α) = 0.95
[Figure: the confidence interval from the Lower Confidence Limit (34.6) to the Upper Confidence Limit (37.4), centred on the Point Estimate, with one Sampling Error on each side; the area between the limits is 1 - α (Confidence Level) = 0.95.]
Confidence level   Probability (1 - α)
90%                0.90
95%                0.95
80%                0.80
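[Figure: the confidence interval written as point estimate ± sampling error, with E = 1.4 on each side of the point estimate: 36 ± 1.4.]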
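The arithmetic behind 36 ± 1.4 can be sketched in a few lines of Python. This is only an illustration: the helper name split_interval is made up here, and it assumes the interval is symmetric about the point estimate, as it is for the confidence intervals in these notes.

```python
def split_interval(lower, upper):
    """Return (point_estimate, sampling_error) for a symmetric interval."""
    point_estimate = (lower + upper) / 2   # midpoint of the interval
    sampling_error = (upper - lower) / 2   # half of the interval width
    return point_estimate, sampling_error

# Battery-life interval from the notes: 34.6 to 37.4 months.
estimate, error = split_interval(34.6, 37.4)
print(f"{estimate:.1f} ± {error:.1f}")   # 36.0 ± 1.4
```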
The confidence interval for the population mean with level of confidence 100(1-α)% is:
$$\bar{X} \pm Z \frac{\sigma}{\sqrt{n}}$$
Since the population mean is at the middle of the normal distribution, the two endpoints of the
confidence interval should be of equal distance from the population mean. In other words, the area
under the normal curve that is consistent with probability of 1-α should be symmetric about the mean.
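A minimal Python sketch of this formula is given below. It assumes σ is known, and it computes the Z value with statistics.NormalDist (Python 3.8 or later) instead of reading it from a table, so later decimal places may differ slightly from table-based answers. The function name is illustrative, and the inputs (sample mean 370.16, σ = 75, n = 25, 95% confidence) are the ones used in the discussion of interval width later in these notes.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence):
    """Confidence interval for the mean: sample_mean ± z * sigma / sqrt(n)."""
    alpha = 1 - confidence
    # z leaves an area of alpha / 2 in each tail, so the central area is 1 - alpha.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    error = z * sigma / sqrt(n)
    return sample_mean - error, sample_mean + error

lower, upper = z_confidence_interval(370.16, 75, 25, 0.95)
print(f"{lower:.2f} to {upper:.2f}")
```

Because the two tails each receive α/2, the two limits are the same distance from the centre of the interval, which is the symmetry described above.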
Example 3
The average waiting time of customers at a popular restaurant during peak hours is 30 minutes. The
service quality manager of the restaurant implemented some time-saving measures and would like
to know if the target of reducing customers’ average waiting time by at least 10 minutes had been
achieved. 35 customers were randomly selected and data on their waiting time at the restaurant
during peak hours was recorded. The sample mean waiting time was 23.6 minutes with a standard
deviation of 6.4 minutes. Calculate with the help of a diagram, the 90% confidence interval for the
population mean waiting time of customers after the time-saving measures were implemented.
Solution:
Student’s t Distribution
[Figure: the Z distribution curve compared with t-distribution curves for df = 13 and df = 5, all centred at 0; horizontal axis: Z or t.]
The t-score in the t-distribution table performs the same function as the Z-score. However,
the t-score that cuts off a given area under the curve depends on 2 factors:
1. Degrees of freedom (n - 1)
2. Confidence level (1 - α)
Consider the t distribution with 9 degrees of freedom. Find the value of t such that the area in the
right tail of the distribution is 5%.
Consider the t distribution with 9 degrees of freedom. Find the value of t such that the area in the
left tail of the distribution is 5%.
Consider the t distribution with 9 degrees of freedom. Find the values t1 and t2 such that each tail of
the distribution contains an area of 0.025.
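The exercises above are meant to be answered from the t-table. As a cross-check on how such table values arise, here is a rough Python sketch that finds t critical values numerically, using Simpson's rule for the area and bisection for the cut-off. This is a swapped-in numerical technique, not the method used in these notes, and all function names are illustrative. It uses df = 5 rather than df = 9 so that the results can be compared with the df = 5 row of the t-table excerpt at the end of the notes (2.015 and 2.571).

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=1000):
    """P(T <= x), using Simpson's rule on [0, |x|] and the symmetry about 0."""
    if x == 0:
        return 0.5
    a = abs(x)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # area between 0 and |x|
    return 0.5 + area if x > 0 else 0.5 - area

def t_critical(right_tail_area, df):
    """Value t with the given area to its right (right_tail_area < 0.5)."""
    target = 1 - right_tail_area
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:      # expand until the target is bracketed
        hi *= 2
    for _ in range(50):                # bisection on the bracket [lo, hi]
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

right = t_critical(0.05, 5)        # 5% of the area in the right tail
left = -right                      # by symmetry, 5% of the area in the left tail
two_sided = t_critical(0.025, 5)   # 2.5% of the area in each tail
print(f"{right:.3f} {left:.3f} ±{two_sided:.3f}")
```

The left-tail value is simply the negative of the right-tail value because the t distribution is symmetric about 0.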
Case 4: Sample Size (n < 30) and Population Standard Deviation (σ) unknown
When do you use the t-score for the calculation of a confidence interval?
When the sample size is small, the t-distribution is used instead of the Normal Distribution in
estimating the interval of a parameter, as the Central Limit Theorem can no longer be relied on.
When the population standard deviation σ is unknown, the sample standard deviation s is
used to estimate σ.
Thus, when the sample size is small (i.e. n < 30) and the population standard deviation σ is
unknown, the confidence interval formula is
$$\bar{X} \pm t \frac{s}{\sqrt{n}}$$
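The t-based formula can be sketched in Python as follows. The function name is illustrative, and the t-score is passed in as a value read from a t-table. The example call uses the investment-fund figures quoted elsewhere in these notes (n = 15, s = 8.3098, 95% confidence), with two assumptions: t = 2.145 is the standard table value for df = 14 with 0.025 in each tail, and the sample mean of -0.2853 is inferred as the midpoint of the interval stated in the notes (-4.8876% to 4.3170%).

```python
from math import sqrt

def t_confidence_interval(sample_mean, s, n, t_value):
    """Confidence interval for the mean: sample_mean ± t * s / sqrt(n)."""
    error = t_value * s / sqrt(n)
    return sample_mean - error, sample_mean + error

# Assumed inputs: mean inferred from the stated interval, t from a standard
# t-table (df = n - 1 = 14, area 0.025 in each tail).
lower, upper = t_confidence_interval(-0.2853, 8.3098, 15, 2.145)
print(f"{lower:.4f}% to {upper:.4f}%")
```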
The formulas to be used for calculating the confidence interval for the population
mean are summarized in the table below.
Example 4
A retiree is doing a study on the one-year rate of return of investment funds in Singapore for the
past year. A random sample of 15 investment funds was chosen to estimate the mean rate of return
of investment funds. The performance of the sample was listed below. Calculate with the help of a
diagram, the 95% confidence interval for the population mean rate of return of investment funds for
the past year.
Solution
There are 3 key considerations when we are calculating a confidence interval to estimate the
population mean.
The reliability of the estimate depends on the __________________of the confidence interval.
In general, we would desire to have a more reliable estimate. All other things constant, the higher
the confidence level, the ________________________will be the confidence interval.
Example
A 95% confidence interval will be more reliable than a 90% confidence interval for the same
population mean. Why?
A 95% confidence interval of 34.6 to 37.4 months for the population mean life span of the car batteries
that a company manufactures can be interpreted as follows:
We estimate the population mean life span of the car batteries to be between 34.6 and 37.4 months,
and an estimate constructed in this way is correct 95% of the time.
Or: we are 95% confident that the population mean life span of the car batteries is between 34.6 and
37.4 months.
Comparatively,
A 90% confidence interval for the population mean life span of car batteries means that the estimate
is only________________________________.
The precision of a confidence interval deals with the exactness of the estimate. This in turn depends on the
______________of the confidence interval or the size of the ________________of our estimate.
In general, we would desire to have a more precise estimate. That is to say, we would want the width of
the confidence interval to be ___________________ or the sampling error to be ______________.
Let us now examine the factors that can affect the width of the confidence interval and the size of
the sampling error of our estimate.
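[Figure: normal curve with central area 1 - α, centred on $\bar{x}$, with lower limit $\bar{x} - z\frac{\sigma}{\sqrt{n}}$ and upper limit $\bar{x} + z\frac{\sigma}{\sqrt{n}}$; the distance from $\bar{x}$ to each limit is E.]
$$E = z\frac{\sigma}{\sqrt{n}}$$
$$W = 2E$$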
The width of the confidence interval is dependent on 3 factors that make up the formula:
1) Population Standard Deviation (σ)
2) Sample Size (n)
3) Z score which depends on the confidence level, (1 - α)
Thus, these are the 3 factors that can affect the width of the confidence interval, the size of the
sampling error of our estimate and the precision of the confidence interval.
The width of the confidence interval can be widened or narrowed. Consider the three factors that affect
the confidence interval: the population standard deviation (σ), the sample size (n), and the confidence
level (1 - α). Remember that these 3 factors are the components of the sampling error (SE), $z\frac{\sigma}{\sqrt{n}}$.
Thus, changing any of these three components has an impact on the width of the confidence interval.
1. Population Standard Deviation (σ)
Keeping all other factors constant, the more that the population standard deviation σ can be
reduced, the smaller the sampling error. Consider an example where σ was assumed to be 75, with
n = 25 and a 95% confidence level.
The interval estimate was 370.16 ± 29.40 = (340.76, 399.56). If σ increased to 150, the 95%
confidence interval estimate would become:
$$\bar{x} \pm z\frac{\sigma}{\sqrt{n}} = 370.16 \pm 1.96 \cdot \frac{150}{\sqrt{25}} = 370.16 \pm 58.80 = (311.36,\ 428.96)$$
Thus, doubling the population standard deviation has the effect of doubling the width of the
confidence interval estimate. This result is quite logical. If there is a great deal of variation in
the random variable (reflected by a large standard deviation), then it becomes more difficult to
accurately estimate the population mean. This difficulty is translated into a wider interval.
[Figure: the original interval (340.76 to 399.56) compared with the wider interval (311.36 to 428.96) obtained when σ = 150.]
The reverse is also true, i.e. the smaller the population standard deviation, the narrower
the confidence interval. We generally have no control over the value of σ.
2. Sample Size (n)
While we have no control over the value of σ, we do have the power to select the
sample size. Had the sample size been 100 instead of 25 (with σ = 75), the 95% confidence interval estimate
would be
$$\bar{x} \pm z\frac{\sigma}{\sqrt{n}} = 370.16 \pm 1.96 \cdot \frac{75}{\sqrt{100}} = 370.16 \pm 14.70 = (355.46,\ 384.86)$$
Increasing the sample size decreases the width of the interval. A larger sample size provides more
potential information. The increased amount of information is reflected in a narrower interval.
However, there is a trade-off: increasing the sample size increases sampling cost.
[Figure: the original interval (340.76 to 399.56, n = 25) compared with the narrower interval (355.46 to 384.86, n = 100).]
The reverse is also true, i.e. decreasing the sample size would increase the width of
the confidence interval. We generally have control over the sample size n when
estimating the population mean.
3. Confidence Level, (1 - α)
Finally keeping all other factors constant, if the confidence level (1 - α) is decreased, the
sampling error will be reduced. For example, a 95% confidence interval will be narrower than a
99% confidence interval based on the same information. If we had chosen 99% confidence
interval, the interval estimate would have been:
$$\bar{x} \pm z\frac{\sigma}{\sqrt{n}} = 370.16 \pm 2.575 \cdot \frac{75}{\sqrt{25}} = 370.16 \pm 38.63 = (331.53,\ 408.79)$$
As you can see, increasing the confidence level widens the interval. However, a high
confidence level is generally desirable since it means that a larger proportion of confidence
interval estimates will be correct in the long run. There is a direct relationship between the
width of the interval and the confidence level: in order to be more confident in
the estimate, we need to widen the interval. Caution: reducing the confidence level
reduces the probability that the interval includes the true population value.
[Figure: the 95% interval (340.76 to 399.56) compared with the wider 99% interval (331.53 to 408.79); n = 25, σ = 75, 1 - α = 0.99.]
We generally have control over the confidence level when estimating the population mean
using a confidence interval.
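The effect of each of the three factors can be checked side by side with a short Python sketch. The z values (1.96 and 2.575), σ values and sample sizes are the ones used in the worked figures above; the function and dictionary names are illustrative only.

```python
from math import sqrt

def sampling_error(z, sigma, n):
    """E = z * sigma / sqrt(n); the interval width is W = 2E."""
    return z * sigma / sqrt(n)

# (z, sigma, n) for the baseline and for one change at a time.
scenarios = {
    "baseline (sigma = 75, n = 25, 95%)": (1.96, 75, 25),
    "sigma doubled to 150": (1.96, 150, 25),
    "n increased to 100": (1.96, 75, 100),
    "confidence level raised to 99%": (2.575, 75, 25),
}

errors = {name: sampling_error(*args) for name, args in scenarios.items()}
for name, e in errors.items():
    print(f"{name}: E = {e:.3f}, W = {2 * e:.3f}")
```

Doubling σ doubles E, quadrupling n halves E, and raising the confidence level increases E through the larger z value.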
Relationship between Confidence Level (Reliability) and the Width (Precision) of Confidence
Interval
A high confidence level does not mean a higher degree of accuracy. If more confidence is desired in an
estimate, the allowance for sampling error (SE) must be increased.
1. Decision Making
In business, decision makers usually have to make decisions without perfect information about the
population mean values. They will have to
Example 5
The average waiting time of customers at a popular restaurant during peak hours is 30 minutes. The
service quality manager of the restaurant implemented some time-saving measures and would like to
know if the target of reducing customers’ average waiting time by at least 10 minutes had been
achieved. 35 customers were randomly selected and data on their waiting time at the restaurant during
peak hours was recorded. The sample mean waiting time was 23.6 minutes with a standard deviation of
6.4 minutes. The 90% confidence interval for the population mean waiting time of customers after the
time-saving measures were implemented is between 21.8150 and 25.3850 minutes.
(a) Explain if the target of the service quality manager of the restaurant had been achieved.
(b) Suggest how the service quality manager should respond to the results obtained.
Solution:
Example 6
A retiree is doing a study on the one-year rate of return of investment funds in Singapore for the
past year. A random sample of 15 investment funds was chosen to estimate the mean rate of return
of investment funds. The performance of the sample was listed below. The 95% confidence interval
for the population mean rate of return of investment funds for the past year is between
-4.8876% and 4.3170%.
Based on the 95% confidence interval calculated above, explain if the following people should
invest in investment funds in Singapore.
(a) A retiree is risk-averse and does not want to incur a loss in his investment.
(b) A 30 year-old professional wants to invest his CPF Special Account balance, which
currently pays 6% guaranteed interest.
Solution
Generally the more precise you want your estimate to be, the larger the sample you need to take.
However, too large a sample size would be costly. It would therefore be useful to be able to
determine the minimum sample size for a certain level of precision (E).
From the previous sections, we know that when we have an infinite population with known
standard deviation, the confidence interval for µ is $(\bar{x} - z\sigma/\sqrt{n},\ \bar{x} + z\sigma/\sqrt{n})$. If σ is not given,
we use the sample standard deviation s as the estimate, i.e. $(\bar{x} - zs/\sqrt{n},\ \bar{x} + zs/\sqrt{n})$.
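[Figure: normal curve with central area 1 - α, centred on $\bar{x}$, with lower limit $\bar{x} - z\frac{\sigma}{\sqrt{n}}$ and upper limit $\bar{x} + z\frac{\sigma}{\sqrt{n}}$; the distance from $\bar{x}$ to each limit is E.]
Since
$$E = z\frac{\sigma}{\sqrt{n}}$$
Solving for n:
$$\sqrt{n} = \frac{z\sigma}{E}$$
Therefore,
$$n = \left(\frac{z\sigma}{E}\right)^2$$
If σ is unknown, use the point estimate s, i.e.
$$n = \left(\frac{zs}{E}\right)^2$$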
Always round up your answer to the nearest bigger whole number.
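The sample size formula and the round-up rule can be sketched in Python as below. The function name is illustrative, and z is computed with statistics.NormalDist rather than read from a table, so a borderline case could round differently from a table-based answer. In the example call, σ = 75 is borrowed from the interval-width example earlier in the notes, and the target error E = 10 is a hypothetical choice.

```python
from math import ceil
from statistics import NormalDist

def minimum_sample_size(sigma, error, confidence):
    """Smallest whole n with z * sigma / sqrt(n) <= error: n = (z * sigma / E)^2, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / error) ** 2)

# sigma = 75 comes from the notes; E = 10 is a hypothetical target for illustration.
n_required = minimum_sample_size(75, 10, 0.95)
print(n_required)   # (1.96 * 75 / 10)^2 is about 216.1, so n = 217
```

Rounding up rather than to the nearest whole number matters: rounding down would give a sampling error slightly larger than the target E.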
Example 7
The average waiting time of customers at a popular restaurant during peak hours is 30 minutes. The
service quality manager of the restaurant implemented some time-saving measures and would like to
know if the target of reducing customers’ average waiting time by at least 10 minutes had been
achieved. 35 customers were randomly selected and data on their waiting time at the restaurant during
peak hours was recorded. The sample mean waiting time was 23.6 minutes with a standard deviation of
6.4 minutes. The 90% confidence interval for the population mean waiting time of customers after the
time-saving measures were implemented is between 21.8150 and 25.3850 minutes.
The service quality manager wants to estimate the population mean waiting time to within 1 minute
of its true value with 90% confidence. Calculate the minimum number of customers he should use
in his sample. Assume the standard deviation remains the same at 6.4 minutes.
Solution
Example 8
A retiree is doing a study on the one-year rate of return of investment funds in Singapore for the
past year. A random sample of 15 investment funds was chosen to estimate the mean rate of return
of investment funds. The performance of the sample was listed below. The 95% confidence interval
for the population mean rate of return of investment funds for the past year is between
-4.8876% and 4.3170%.
Determine the minimum sample size required to reduce the width of the 95% confidence interval
above by half. Assume the standard deviation remains the same at 8.3098.
Solution
Tutorial Questions
1.
2.
An apartment-finder service would like to estimate
the average cost of a one-bedroom apartment in Kansas
City. A random sample of 41 apartment complexes
yielded a mean of $310 with a standard deviation of $29.
Construct a 90% confidence interval for the mean cost
of one-bedroom apartments in Kansas City.
3.
An investment advisor believes that the return on interest-
sensitive stocks is approximately normally distributed. A
sample of 24 interest-sensitive stocks was
selected, and their yearly return (including dividends and
capital appreciation) was as follows (in percentages):
11.1, 12.5, 13.6, 9.1, 8.7, 10.6, 12.5, 15.6, 13.8, 8.0, 10.9, 7.6,
5.2, 1.2, 12.8, 16.7, 13.9, 10.1, 9.6, 10.8, 11.6, 12.3, 12.9, 11.6
Find a 90% confidence interval for the mean yearly return on
interest-sensitive stocks.
4.
A quality-assurance manager uses a random sample of size 25 to construct a 95%
confidence interval on the average weight of a bag of dry feed. This interval is found to be
49.9174 to 50.0826.
a. What is the point estimate of the average weight of a bag of dry feed?
b. What is the sample standard deviation of the weight of a bag of dry feed?
5.
6.
If a sample size of 70 was necessary to estimate the
mean of a normal population to within 1.2 with 90% confidence,
what is the approximate value of the standard
deviation of the population?
Question 7
(a) State two (2) reasons why statistical estimation or inferences have confidence levels
associated with them.
(4 marks)
(b) Monthly mobile phone price plans offered by various Telco companies in Singapore are very
competitive. Singtel is keen to experiment with an innovative bundled services mobile plan
for its subscribers. To determine the monthly price to charge for this new plan, it randomly
selected 15 mobile plan subscribers and the monthly price paid for their mobile plans are
shown in the following table.
(ii) Construct the 90% confidence interval for the population mean monthly price paid
by mobile plan subscribers.
(9 marks)
(iii)
Determine the minimum sample size required if Singtel wants to estimate the
mean monthly price paid by mobile plan subscribers to be within $2 of the true
mean with 95% confidence. The standard deviation is estimated to be $10.
(5 marks)
Question 8
(a) Explain how an increase in sample size would affect the confidence interval.
(3 marks)
(b) For interval estimation, explain if a wide confidence interval is better than a narrow
confidence interval for the same confidence level.
(3 marks)
(c) According to the Managing Director (MD) of a food court chain in Singapore, the average
monthly sales per stall in his food court was $160,000. To maintain profitability, he set a
target for each food court outlet manager to improve average monthly sales per stall by at
least $5000. 3 months after the target was set, the MD wanted to determine if his target was
met. 100 stalls in his food court were selected at random. The sample mean monthly sales
was $175,000 and the standard deviation was $45,500.
(i) State the point estimates for the population mean and population standard deviation
of the monthly sales per stall 3 months after the target was set.
(2 marks)
(ii) Construct the 95% confidence interval for the mean monthly sales per stall 3
months after the target was set.
(9 marks)
(iii) Using the confidence interval constructed in part (c)(ii), explain if the MD’s target
was met.
(3 marks)
Student’s t-Table
[Figure: t-distribution curve with upper-tail area α to the right of tα.]
Level of Significance for One-Tailed Test (α)
df   0.1     0.05    0.025    0.01     0.005    0.001     0.0005
1    3.078   6.314   12.706   31.821   63.657   318.310   636.619
2    1.886   2.920   4.303    6.965    9.925    22.326    31.599
3    1.638   2.353   3.182    4.541    5.841    10.213    12.924
4    1.533   2.132   2.776    3.747    4.604    7.173     8.610
5    1.476   2.015   2.571    3.365    4.032    5.893     6.869
6    1.440   1.943   2.447    3.143    3.707    5.208     5.959