TOAE201-LecturerNotes-Chapter 5. Parameter Estimation
(Some contents of this document are based on the Lecture Notes of Panayiotis Skordi, Fullerton University)
Chapter 5
Parameter Estimation of Random Variables
in economics and business
5.1. Introduction
Inferential Statistics
Statistical inference is the process by which we gather information about populations
from samples.
We can gather information from a sample e.g. the mean. We then use this mean derived
from the sample as an estimate for the population mean.
Estimation
1) Point Estimate
2) Interval Estimate
Point Estimate
A point estimator draws inference about a population by estimating the value of an
unknown parameter using a single value or point.
It is the single value that best describes the population of interest; the sample mean is
the most common example. The advantage of a point estimate is that it is easy to calculate
and easy to understand.
The disadvantages of a point estimate are:
1) The estimate will almost certainly be wrong, i.e. not exactly the figure that would have
been derived from using the whole population (most samples will not be exact
representations of the population).
3) We would expect that as the sample size increases, the sample would produce results
that are more accurate. The point estimate does not explicitly account for the size of the
sample (it does so implicitly but it is not measured directly).
Interval Estimate
Confidence Interval
A confidence interval is a range of values used to estimate a population parameter, and it
is associated with a specific confidence level.
The range of values is constructed from sample data so that the population parameter is
likely to occur in that range at a specified probability – the confidence level.
Confidence Level \(1 - \alpha\)
A confidence level, \(1 - \alpha\), is the probability that the interval estimate will include the
population parameter \(\mu\).
[Figure: the confidence level \(1 - \alpha\) shown as the area under the curve over the confidence interval]
Level of Significance, \(\alpha\)
This is the complement of the confidence level. If the confidence level is 90% then the
significance level is 10%. The level of significance is commonly denoted by \(\alpha\).
The level of significance is also referred to as the probability of making a type I error
(see Chapter 6).
\(z_{\alpha/2}\) is known as the critical z score. It is the number of standard deviations away
from the mean of zero, on a standard normal chart, within which \((0.5 - \alpha/2) \times 100\%\) of the
observations lie on one side of the mean. It is based on the chosen confidence level, \(1 - \alpha\).
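It is the point on the Z axis such that the area under the standard normal curve to the right
of the point \(z_{\alpha/2}\) is \(\alpha/2\).
[Figure: standard normal curve, Z axis centred at 0, with area \(\alpha/2\) in the right tail]
Using the properties of the normal distribution we can derive the following chart from
the one above:
[Figure: standard normal curve, Z axis centred at 0]
And this leads to the following chart:
[Figure: standard normal curve with central area \(1 - \alpha\) between \(-z_{\alpha/2}\) and \(z_{\alpha/2}\)]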
Suppose we wanted to construct an interval estimate with a 90% confidence level. This
confidence level corresponds to a z score, \(z_{\alpha/2} = z_{5\%}\), from the standard normal tables,
equal to 1.645.
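[Figure: 90% confidence interval on the standard normal curve, with 90% of the area between \(-z_{\alpha/2} = -z_{5\%}\) and \(+z_{\alpha/2} = +z_{5\%}\), and 5% in each tail]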
We begin by choosing a sample of size n and calculating the sample mean \(\bar{X}\). The
distribution of \(\bar{X}\) will be normal if X is normally distributed. According to the
central limit theorem, \(\bar{X}\) will be approximately normally distributed if X is not normal
and the sample size n is at least 30.
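This means that
\[ Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \]
has a standard normal distribution.
[Figure: standard normal curve with central area \(1 - \alpha\) between \(-z_{\alpha/2}\) and \(z_{\alpha/2}\)]
In terms of formulas:
\[ P\left(-z_{\alpha/2} \le Z \le z_{\alpha/2}\right) = 1 - \alpha \]
\[ P\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha \]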
Conclusion
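With repeated sampling from the population, the proportion of values of \(\bar{X}\) for which the
interval
\[ \left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) \]
includes the population mean \(\mu\) is equal to \(1 - \alpha\). This gives the confidence interval
estimator of \(\mu\). So the \(100(1 - \alpha)\%\) confidence interval for \(\mu\) is given by the limits above.
This emphasizes the fact that it is the interval and not \(\mu\) that is random.
Writing \(\sigma_{\bar{x}} = \sigma/\sqrt{n}\) for the standard error of the mean, the interval with confidence
level \(1 - \alpha\) is:
\[ \bar{x} - z_{\alpha/2}\,\sigma_{\bar{x}} \le \mu \le \bar{x} + z_{\alpha/2}\,\sigma_{\bar{x}} \]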
=AVERAGE(data set) in Excel gives us the point estimate or mean of the data
=CONFIDENCE(α,σ,n) in Excel gives us the margin of error, E
where:
α is the significance level
σ is the population standard deviation
n is the size of the data set
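As a rough cross-check outside Excel, the same margin of error can be sketched in Python using
only the standard library. The function names `margin_of_error` and `confidence_interval`, and
the input values (a sample mean of 50, σ = 10, n = 25), are hypothetical and chosen purely for
illustration; the calculation is \(E = z_{\alpha/2}\,\sigma/\sqrt{n}\) as defined above.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(alpha, sigma, n):
    """Margin of error E = z_{alpha/2} * sigma / sqrt(n), for a known sigma."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical z score
    return z * sigma / sqrt(n)


def confidence_interval(sample_mean, alpha, sigma, n):
    """Lower and upper confidence limits around the point estimate."""
    e = margin_of_error(alpha, sigma, n)
    return sample_mean - e, sample_mean + e


# Hypothetical inputs: alpha = 5%, sigma = 10, n = 25, sample mean = 50
e = margin_of_error(0.05, 10, 25)
lower, upper = confidence_interval(50, 0.05, 10, 25)
print(f"E = {e:.4f}, interval = ({lower:.2f}, {upper:.2f})")
```

Like =CONFIDENCE(α,σ,n), the sketch takes the significance level α rather than α/2 and does the
halving internally.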
The point estimate for the true underlying height of all students would be 5’10”.
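The 90% interval estimate can be found by applying the above formulae.
Since \(\alpha = 10\%\) or 0.10, \(\alpha/2 = 5\%\) or 0.05, so \(z_{\alpha/2} = 1.645\).
[Figure: standard normal curve with 90% of the area between –1.645 and 1.645, and 5% in each tail]
\(\bar{x}\) = 5’10”, \(\sigma = 4.1\), \(n = 45\)
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{4.1}{\sqrt{45}}, \qquad z_{\alpha/2}\,\sigma_{\bar{x}} = 1.645\left(\frac{4.1}{\sqrt{45}}\right) = 1.0 \]
Upper limit: \(\bar{x} + z_{\alpha/2}\,\sigma_{\bar{x}}\) = 5’10” + 1.645(4.1/√45) = 5’11”
Lower limit: \(\bar{x} - z_{\alpha/2}\,\sigma_{\bar{x}}\) = 5’10” − 1.645(4.1/√45) = 5’9”
[Figure: 90% confidence interval from 5’9” to 5’11”, centred on 5’10”, with 5% in each tail]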
According to our results, the 90% confidence interval for the mean height of students, based
on this random sample, is between 5’9” and 5’11”.
Using Excel
We can estimate the amount this class spends on take out per week.
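Suppose we find that this class forms a sample of 45 students and that the average
amount spent on take out is $78.25 and the population standard deviation is $37.50.
Since \(\alpha = 10\%\) or 0.10, \(\alpha/2 = 5\%\) or 0.05, so \(z_{\alpha/2} = 1.645\).
\(\bar{x} = \$78.25\), \(\sigma = \$37.50\), \(n = 45\)
\[ \sigma_{\bar{x}} = \frac{37.50}{\sqrt{45}} = 5.59, \qquad z_{\alpha/2}\,\sigma_{\bar{x}} = 1.645\left(\frac{37.50}{\sqrt{45}}\right) = 9.20 \]
Upper limit:
\[ \bar{x} + z_{\alpha/2}\,\sigma_{\bar{x}} = \$78.25 + 1.645\left(\frac{37.50}{\sqrt{45}}\right) = \$87.45 \]
Lower limit:
\[ \bar{x} - z_{\alpha/2}\,\sigma_{\bar{x}} = \$78.25 - 1.645\left(\frac{37.50}{\sqrt{45}}\right) = \$69.05 \]
[Figure: 90% confidence interval from $69.05 to $87.45, centred on $78.25, with 5% in each tail]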
According to our results, the 90% confidence interval for students’ average takeout food
expenditure, based on this random sample, is between $69.05 and $87.45.
It would be wrong to say that there is a 90% probability that the population mean lies
within, say, the interval (69.05, 87.45). This would suggest that the mean is a variable.
It would be true to say that (69.05, 87.45) is a 90% confidence interval for \(\mu\). This
emphasizes the fact that it is the interval and not \(\mu\) that is random. In the long run 90%
of the intervals will include \(\mu\) and 10% will not include \(\mu\).
Thus the confidence level applies to the estimation procedure and not to any one interval.
In constructing ten 90% confidence intervals, we would expect about 90% (i.e. 9) of
these confidence intervals to contain the population mean. About 10% (i.e. 1) would not.
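The long-run interpretation can be illustrated with a small simulation. This is only a sketch:
the true mean of 80, the fixed random seed and the function name `coverage` are assumptions
made for illustration, and the population is assumed to be normal. It draws many samples,
builds a 90% interval from each, and counts how often the interval contains the true mean.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def coverage(mu, sigma, n, alpha, trials, seed=0):
    """Fraction of simulated confidence intervals that contain the true mean mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - alpha / 2)
    e = z * sigma / sqrt(n)  # margin of error, the same for every sample
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = mean(sample)
        if xbar - e <= mu <= xbar + e:  # the interval moves, mu does not
            hits += 1
    return hits / trials


# Hypothetical population: mu = 80, sigma = 37.5; samples of 45; 90% confidence
rate = coverage(mu=80.0, sigma=37.5, n=45, alpha=0.10, trials=2000)
print(f"Share of intervals containing the true mean: {rate:.3f}")
```

The printed share should be close to 0.90, although any single run will differ slightly from
it because the samples are random.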
So far we have only considered 90% confidence levels. We can choose any confidence
level but the ones most commonly used are shown below:
Confidence Level   Significance Level
(1 − α)            α          α/2       z(α/2)    −z(α/2)
0.90               0.10       0.05      1.645     -1.645
0.95               0.05       0.025     1.960     -1.960
0.98               0.02       0.01      2.330     -2.330
0.99               0.01       0.005     2.575     -2.575
Re-work example with “take out meal” expenditure per week, using different confidence
levels.
Confidence Level   α       α/2      z(α/2)   σ(x̄)     Sample Mean   Lower Limit   Upper Limit
0.90               0.10    0.05     1.645    $5.59    $78.25        $69.05        $87.45
0.95               0.05    0.025    1.960    $5.59    $78.25        $67.29        $89.21
0.98               0.02    0.01     2.330    $5.59    $78.25        $65.23        $91.27
0.99               0.01    0.005    2.575    $5.59    $78.25        $63.86        $92.64
You can see from the table that by increasing the confidence level, our interval estimate
of the population mean becomes wider and less precise.
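The limits in the table can be recomputed with a short Python sketch. It uses the tabulated z
values rather than exact quantiles, and it assumes that the standard error and the margin of
error are each rounded to two decimals before use, which appears to be how the table values
were obtained. The variable names are illustrative only.

```python
from math import sqrt

sigma, n, xbar = 37.50, 45, 78.25
std_error = round(sigma / sqrt(n), 2)  # 5.59, rounded as in the table

# Critical z scores taken from the table of common confidence levels
z_table = {0.90: 1.645, 0.95: 1.960, 0.98: 2.330, 0.99: 2.575}

rows = []
for level, z in z_table.items():
    e = round(z * std_error, 2)  # margin of error, rounded to cents
    rows.append((level, round(xbar - e, 2), round(xbar + e, 2)))

for level, lower, upper in rows:
    print(f"{level:.2f}  lower = {lower:.2f}  upper = {upper:.2f}  width = {upper - lower:.2f}")
```

Each step up in confidence level uses a larger z score, so the printed width grows from row to
row.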
Characteristics of Estimators
Selecting the right sample statistic to estimate a parameter value depends on the
characteristics of the statistic. Desirable characteristics for an estimator are:
Unbiased
An unbiased estimator is one whose expected value is equal to the parameter it estimates.
Consistency
An unbiased estimator is said to be consistent if the difference between the estimator and
the parameter grows smaller as the sample size increases. (Addresses third disadvantage
of point estimation)
Relative Efficiency
Chapter 5 - Lecture Notes by Vuong Thi Thao Binh 192
(Some contents of this slide are based on Lecture Notes by Panayiotis Skordi,
Fullerton University)
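For two unbiased estimators, the one with the smaller standard deviation is said to be
relatively more efficient.
The confidence interval with confidence level \(1 - \alpha\) is:
\[ \bar{x} - z_{\alpha/2}\,\sigma_{\bar{x}} \le \mu \le \bar{x} + z_{\alpha/2}\,\sigma_{\bar{x}} \]
The width of the confidence interval is calculated by subtracting the lower limit
from the upper limit:
\[ \left(\bar{x} + z_{\alpha/2}\,\sigma_{\bar{x}}\right) - \left(\bar{x} - z_{\alpha/2}\,\sigma_{\bar{x}}\right) = 2\,z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \]
and is therefore affected by the confidence level (through \(z_{\alpha/2}\)), the population standard
deviation \(\sigma\) and the sample size n.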
As the confidence level increases the interval estimate of the true population mean
becomes wider and less precise.
We can reduce the width of the interval while maintaining the same confidence level by
increasing the sample size n. (This is the principle of consistency.)
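Suppose we increase the size of our class from 45 to 90 students. Our standard error
becomes
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{37.50}{\sqrt{90}} = \$3.95 \]
Since \(\bar{x} = \$78.25\), \(\alpha/2 = 5\%\), \(z_{\alpha/2} = 1.645\), \(\sigma = \$37.50\) and \(n = 90\),
then
\[ z_{\alpha/2}\,\sigma_{\bar{x}} = 1.645\left(\frac{37.50}{\sqrt{90}}\right) = 6.50 \]
Our new 90% confidence interval will have:
Upper limit:
\[ \bar{x} + z_{\alpha/2}\,\sigma_{\bar{x}} = \$78.25 + 1.645\left(\frac{37.50}{\sqrt{90}}\right) = \$84.75 \]
Lower limit:
\[ \bar{x} - z_{\alpha/2}\,\sigma_{\bar{x}} = \$78.25 - 1.645\left(\frac{37.50}{\sqrt{90}}\right) = \$71.75 \]
[Figure: 90% confidence interval from $71.75 to $84.75, centred on $78.25, with 5% in each tail]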
According to our results, the 90% confidence interval for students’ average takeout food
expenditure, based on this random sample, is between $71.75 and $84.75.
Increasing the sample size from 45 to 90 has reduced the confidence interval from
($69.05, $87.45) to ($71.75, $84.75) which is a more precise interval.
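The effect of sample size on the width can be checked with a few lines of Python. This is a
minimal sketch using the width formula \(2\,z_{\alpha/2}\,\sigma/\sqrt{n}\); the function name
`interval_width` is an illustrative choice.

```python
from math import sqrt


def interval_width(z, sigma, n):
    """Width of the confidence interval: 2 * z * sigma / sqrt(n)."""
    return 2 * z * sigma / sqrt(n)


w45 = interval_width(1.645, 37.50, 45)
w90 = interval_width(1.645, 37.50, 90)
ratio = w45 / w90  # doubling n shrinks the width by a factor of sqrt(2)

print(f"width at n = 45: {w45:.2f}")
print(f"width at n = 90: {w90:.2f}")
print(f"ratio: {ratio:.4f}")
```

Because the width depends on \(1/\sqrt{n}\), doubling the sample size does not halve the width; it
divides it by about 1.41.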
Using Excel
=CONFIDENCE(α,σ,n) is =CONFIDENCE(0.1,37.5,90) giving 6.50185 (margin of
error, E). We now subtract E, then add E to the point estimate to obtain the lower and
upper confidence limits.
Example
What sample size would we need for a 95% confidence interval that has margin of error
of $8.00 in our “take out meal” example? We already have the population standard
deviation as $37.50, and sample mean is $78.25.
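We need to solve the margin of error equation for n:
\[ E = z_{\alpha/2}\,\sigma_{\bar{x}} = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \]
[Figure: standard normal curve with 95% of the area between –1.96 and 1.96, and 2.5% in each tail]
\(\bar{x} = \$78.25\), \(E = \$8\) and \(\sigma = \$37.50\). We need to find n.
Substituting into the margin of error formula:
\[ \$8 = 1.96\,\frac{37.50}{\sqrt{n}} \quad\Rightarrow\quad n = \left(\frac{1.96 \times 37.50}{8}\right)^{2} = 84.41 \]
So n = 84.41, which we round up to the next whole number, 85.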
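The same rearrangement, \(n = (z_{\alpha/2}\,\sigma / E)^2\) rounded up, can be sketched in Python. The
function name `required_sample_size` is hypothetical; the inputs are the values from the
example above.

```python
from math import ceil


def required_sample_size(z, sigma, margin):
    """Smallest whole n whose margin of error is no larger than `margin`."""
    return ceil((z * sigma / margin) ** 2)  # always round up, never down


n_needed = required_sample_size(1.96, 37.50, 8.00)
print(n_needed)
```

Rounding up rather than to the nearest whole number guarantees that the margin of error does
not exceed the target.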
Answer
1. We do not know. We can use the $45,420 as a point estimate.
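2. Using a 95% confidence level (this is an assumption as we are not given a confidence
level to test with) we obtain:
\(\alpha/2 = 2.5\%\), so \(z_{\alpha/2} = 1.96\).
We know that \(\bar{x} = \$45{,}420\), \(s = \$2{,}050\) and \(n = 256\).
Since \(\hat{\sigma}_{\bar{x}} = s/\sqrt{n}\), then
\[ z_{\alpha/2}\,\hat{\sigma}_{\bar{x}} = 1.96\left(\frac{2{,}050}{\sqrt{256}}\right) = 251.13 \]
Using Excel, =CONFIDENCE(α,σ,n) is =CONFIDENCE(0.05,2050,256), giving 251.12.
Upper limit:
\[ \bar{x} + z_{\alpha/2}\,\hat{\sigma}_{\bar{x}} = \$45{,}420 + 1.96\left(\frac{2{,}050}{\sqrt{256}}\right) = \$45{,}671 \]
Lower limit:
\[ \bar{x} - z_{\alpha/2}\,\hat{\sigma}_{\bar{x}} = \$45{,}420 - 1.96\left(\frac{2{,}050}{\sqrt{256}}\right) = \$45{,}169 \]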
3. Suppose we select many samples of 256 managers, perhaps several hundred. For each
sample we compute the mean and standard deviation and then construct a 95%
confidence interval. We would expect about 95% of these confidence intervals to contain
the population mean. About 5% would not.
When \(\sigma\) is unknown and the sample is small, we can use a very similar formula, replacing
the normal distribution with Student’s t-distribution. This distribution was developed in
1908 by William Gosset, an Englishman from the Guinness Brewery.
His papers were written under the assumed name of “Student”.
It takes account of the variability in the estimate s (of \(\sigma\)), due to the small sample
size.
The degrees of freedom (df) which are dependent on the sample size, indicate the
level of variability in the estimate, 𝑥̅ , and hence sampling distribution. The
smaller the df (and hence the sample size) the more variable the sampling
distribution. The shape of the curve depends on the degrees of freedom which,
when dealing with the sample mean, would be equal to n – 1. One degree of
freedom is taken away for each constraint.
The t-distribution is flatter than the normal distribution. As the number of degrees
of freedom increases, the shape of the t-distribution becomes very similar to the
normal distribution. With more than 30 degrees of freedom (a sample size of 30
or more), the two distributions are almost identical.
The degrees of freedom are the number of free choices we have after something has been
decided, such as the sample mean.
Given a sample of size n = 3 and a mean of 10, we can only vary 2 values (n – 1).
After we have chosen those two numbers, the value of the remaining number is fixed
because our mean must equal 10. For example, if we choose 8 and 9, the third value must
be 30 − 17 = 13.
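We can now set up the confidence interval for the mean using a small sample, with
confidence level \(1 - \alpha\):
\[ \bar{x} - t^{\,n-1}_{\alpha/2}\,\hat{\sigma}_{\bar{x}} \le \mu \le \bar{x} + t^{\,n-1}_{\alpha/2}\,\hat{\sigma}_{\bar{x}} \]
Where: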
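\(\bar{x}\) = the sample mean
\(\alpha\) = the significance level
n = sample size
s = sample standard deviation
\(t^{\,n-1}_{\alpha/2}\) = the critical t value with (n − 1) degrees of freedom
\(\hat{\sigma}_{\bar{x}} = s/\sqrt{n}\) = the standard error of the mean
Equivalently, the interval is:
\[ \bar{X} \pm t^{\,n-1}_{\alpha/2} \cdot \frac{s}{\sqrt{n}} \]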
We may need to use DESCRIPTIVE STATISTICS, which is within DATA ANALYSIS
in Excel.
If we are given the sample mean, \(\bar{X}\), but we are not given the raw data (all the individual
observations), then we may use the Excel function
=TINV(probability, degrees of freedom)
which returns the t value of Student’s t-distribution as a function of the probability and
the degrees of freedom. We do not have to look up the figure manually from the t-tables.
In fact the t-table gives only a limited number of values. Excel does not have this limitation.
Example
If we wanted to find the value of t, to use in finding the confidence interval where the
significance level was 5% and the number of observations is 10, we would use
=TINV(0.05, 9)
which will return the value of 2.262157. This is the value for \(t^{\,n-1}_{\alpha/2} = t^{\,9}_{2.5\%}\). Notice that we
did not divide \(\alpha\) by 2 (\(\alpha = 5\% = 0.05\) in this example) in the Excel formula. Excel does
that for us automatically.
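For readers without Excel, the two-tailed critical value can be approximated in plain Python.
The sketch below is not how Excel computes TINV; it is an illustrative approach that
integrates the t density numerically (Simpson's rule) and then searches for the critical value
by bisection. The function names `t_pdf`, `two_tail_area` and `t_critical` are hypothetical,
and the approach is only intended for small to moderate degrees of freedom.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tail_area(t, df, steps=2000):
    """Combined area in both tails beyond -t and +t (Simpson's rule on [0, t])."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    return 1 - 2 * area_zero_to_t  # the density is symmetric about zero


def t_critical(alpha, df):
    """Value t such that the two-tailed area beyond +/-t equals alpha."""
    lo, hi = 0.0, 1.0
    while two_tail_area(hi, df) > alpha:  # widen the bracket until it contains t
        hi *= 2
    for _ in range(60):  # bisection
        mid = (lo + hi) / 2
        if two_tail_area(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


t_crit = t_critical(0.05, 9)
print(f"{t_crit:.4f}")
```

As with TINV, the first argument is the full significance level α; the split into two tails of
α/2 each happens inside the calculation.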
Example-Using t-Tables
Let the following represent the amount spent on takeout meals for 10 people, i.e. our
sample size is 10. Assume the data come from a normal distribution.
29 70 89 100 48 40 137 75 39 88
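We do not know \(\sigma\). We will construct a 95% confidence interval around the sample
mean.
To determine the value \(t^{\,n-1}_{\alpha/2}\) in this example we need to calculate the number of degrees
of freedom. As n = 10, we have n – 1 = 9 d.f. This corresponds to \(t^{\,9}_{2.5\%} = 2.262\) from
the t-distribution tables.
[Figure: t-distribution with 95% of the area between –2.262 and 2.262, and 2.5% in each tail]
\(\bar{X} = \$71.50\) (worked out from the sample above) and \(s = \$33.50\) (calculated from
the sample above)
\(t^{\,9}_{2.5\%} = 2.262\) from the t-table (or TINV(0.05, 9) = 2.262157 from Excel)
\[ \hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{\$33.50}{\sqrt{10}} = \$10.59 \]
Now we can construct the 95% confidence interval:
\[ \bar{X} \pm t^{\,9}_{2.5\%}\,\hat{\sigma}_{\bar{x}} = 71.50 \pm 2.262 \times 10.59 = 71.50 \pm 23.95 \]
giving limits of approximately $47.55 and $95.45.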
Example-Using Excel
Let the following represent the amount spent on take out meals for 10 people, i.e. our
sample size is 10. Assume the data come from a normal distribution.
29 70 89 100 48 40 137 75 39 88
We do not know \(\sigma\).
Go to Data Analysis in Excel. Then click on the Descriptive Statistics option and press
OK.
The input range is the data we were given, for the amount spent on take out meals.
Press Output Range, if you want to specify where the output will go. The cursor
sometimes moves up to the Input Range box at this stage. Bring it back down to the
Output Range.
Check confidence level for mean. For this question the confidence level is 95%.
Press OK
Lower limit: 47.53287   Upper limit: 95.46713
So the final answer is (47.53287, 95.46713). This answer is more precise than the
previous solution, which did not use Excel, because Excel does not round the intermediate
values.
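The same interval can be reproduced from the raw data in Python. This is a sketch using only
the standard library; the critical value 2.262157 is taken from the TINV(0.05, 9) result
quoted in these notes rather than computed, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Amount spent on take out meals by the 10 people in the sample
spend = [29, 70, 89, 100, 48, 40, 137, 75, 39, 88]

t_crit = 2.262157  # TINV(0.05, 9) as quoted in the notes (n - 1 = 9 d.f.)

xbar = mean(spend)                      # sample mean
s = stdev(spend)                        # sample standard deviation (n - 1 divisor)
std_error = s / sqrt(len(spend))        # estimated standard error of the mean
margin = t_crit * std_error

lower, upper = xbar - margin, xbar + margin
print(f"mean = {xbar:.2f}, s = {s:.2f}, standard error = {std_error:.2f}")
print(f"95% confidence interval: ({lower:.5f}, {upper:.5f})")
```

Keeping full precision in the mean, standard deviation and critical value is what brings the
limits into line with the Excel output rather than the hand calculation.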
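The Excel function TDIST gives tail areas under Student's t-distribution:
=TDIST(t,degrees of freedom,1) gives the area in the right tail, beyond a positive value of t.
[Figure: Student's t-distribution with the area to the right of t]
1 − TDIST(t,degrees of freedom,1) gives the area to the left of a positive value of t.
[Figure: Student's t-distribution with the area to the left of t]
Note that TDIST does not work for negative values of t. However, since the
distribution is symmetric, we use this property to derive the areas for negative t
values: the area to the left of −t equals TDIST(t,degrees of freedom,1), and the area to the
right of −t equals 1 − TDIST(t,degrees of freedom,1).
[Figure: Student's t-distribution with the areas on either side of −t]
=TDIST(t,degrees of freedom,2) gives the combined area in both tails, to the left of −t and
to the right of t.
[Figure: Student's t-distribution with both tails beyond −t and t]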
If the samples are large enough we may use the normal distribution as an approximation
to the binomial.
The conditions that must apply for this to be the case are:
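\(n p \ge 5\) and \(n q \ge 5\), where \(q = 1 - p\). The standard error of the proportion is:
\[ \sigma_p = \sqrt{\frac{p(1-p)}{n}} \]
Formulas for Confidence Intervals for the Proportion with Large Samples
Using the sample proportion \(p_s\) in place of p, the estimated standard error is:
\[ \hat{\sigma}_p = \sqrt{\frac{p_s(1-p_s)}{n}} \]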
During an investigation of a facility which produces light bulbs a sample of 200 light
bulbs was randomly selected. It was found that 11 of this sample were defective.
Calculate the 95% confidence interval around this sample proportion.
Solution
\(n = 200\)
\[ p_s = \frac{11}{200} = 0.055 \]
\[ \hat{\sigma}_p = \sqrt{\frac{p_s(1-p_s)}{n}} = \sqrt{\frac{0.055(1-0.055)}{200}} = 0.0161 \]
\(\alpha = 5\%\), so \(Z_{\alpha/2} = 1.96\)
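The remaining step can be sketched in Python, assuming the interval takes the form
\(p_s \pm Z_{\alpha/2}\,\hat{\sigma}_p\) by analogy with the interval for the mean (the notes as extracted
do not show this formula explicitly). The function name `proportion_interval` is hypothetical.

```python
from math import sqrt


def proportion_interval(successes, n, z):
    """Large-sample confidence interval for a proportion: p_s +/- z * sqrt(p_s(1 - p_s)/n)."""
    p_s = successes / n
    std_error = sqrt(p_s * (1 - p_s) / n)
    margin = z * std_error
    return p_s - margin, p_s + margin


defective, n_bulbs = 11, 200

# Large-sample conditions, checked with the sample counts: n*p >= 5 and n*q >= 5
conditions_ok = defective >= 5 and (n_bulbs - defective) >= 5

low, high = proportion_interval(defective, n_bulbs, 1.96)
print(conditions_ok)
print(f"95% confidence interval: ({low:.4f}, {high:.4f})")
```

Under that assumption the interval runs from roughly 2.3% to 8.7% defective.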
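Margin of error used in the confidence interval for the mean:
             Large sample (n ≥ 30)     Small sample (n < 30)
σ known      z(α/2) · σ/√n             z(α/2) · σ/√n
σ unknown    z(α/2) · s/√n             t(α/2, n−1) · s/√n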
3. If we repeatedly draw samples of size 100 from the population of teenagers, 95% of
the values of the sample mean will produce intervals that include the population mean
amount of time teenagers spend on the internet, and 5% of the values of the sample mean
will produce intervals that do not include the population mean. For our sample, the
interval is 6.206 hours to 6.794 hours.
4. A quality control engineer is interested in the mean length of sheet insulation being cut
automatically by machine. The desired length of the insulation is 12 feet. It is known that
the standard deviation in the cutting length is 0.15 feet. A sample of 60 cut sheets yields a
mean length of 12.15 feet. This sample will be used to obtain a 99% confidence interval
for the mean length cut by machine.
5. Find the sample size needed to estimate a population mean to within 2 units of the
sample mean, with a 95% confidence level, when the population standard deviation
equals 8.
6. To estimate with 99% confidence the mean of a normal population, whose standard
deviation is assumed to be 6 and the maximum allowable sampling error is assumed to be
1.2, requires a random sample of what size?
7. The sample size needed to estimate a population mean to within 50 units of the sample
mean was found to be 97. If the population standard deviation was 250, what
confidence level was used?
9. How large a sample of state employees should be taken if we want to estimate with
98% confidence the mean salary to within $2000? The population standard deviation is
assumed to be $10,500.
10. The owner of Britten’s egg farm wants to estimate the mean number of eggs laid per
chicken. A sample of 20 chickens shows they laid an average of 20 eggs per month with a
standard deviation of 2 eggs per month.
a) What is the value of the population mean? What is the best estimate of this value?
b) What assumption do we need to make?
c) For the 95% confidence level what is the value of the critical point?
d) Develop the 95% confidence interval for the population mean.
e) Would it be reasonable to conclude that the population mean is 21 eggs?
f) Would it be reasonable to conclude that the population mean is 25 eggs?
11. The American Sugar Producers Association wants to estimate the mean yearly sugar
consumption. A sample of 16 people reveals the mean yearly consumption to be 60
pounds with a standard deviation of 20 pounds.
12. The union representing the Institute of Actuaries (IOA) is considering a proposal to
merge with the Faculty of Actuaries (FOA). According to the IOA union bylaws, at least
three-fourths of the union membership must approve any merger. A random sample of
2,000 current IOA members reveals 1,600 plan to vote for the merger proposal.
a) Develop a 95% confidence interval for the population proportion
b) Basing your decision on this sample information, can you conclude that the
necessary proportion of IOA members favor the merger?
c) Explain your answer to part b).
13. The owner of the Quick Fill Gas Station wished to determine the proportion of
customers who use a credit or debit card for their fuel purchase. A sample of 100
customers was surveyed and it was found that 80 paid at the pump.
14. Foxy TV network is looking into replacing its prime time CSI show with a family
oriented comedy show. Before a decision is made, a sample of 400 viewers is asked for
their opinions. After viewing the pilot for the comedy, 250 indicated that they would
watch the show and suggested it should replace the CSI show.
15. Silkscreen Printing Inc., purchases coffee mugs on which to print funny logos. A
large shipment was received and the owner wants to ensure that the quality is good and
that they are not defective. A sample of 300 cups was chosen at random and it was found
that 15 were defective.