TOAE201-LecturerNotes-Chapter 5. Parameter Estimation


TOAE201 Lecture Notes by Vuong Thi Thao Binh

(Some contents of this document are based on Lecture Notes of Panayiotis Skordi, Fullerton University)

Chapter 5
Parameter Estimation of Random Variables
in economics and business
5.1. Introduction

Inferential Statistics
Statistical inference is the process by which we gather information about populations
from samples.

We can gather information from a sample e.g. the mean. We then use this mean derived
from the sample as an estimate for the population mean.

How good an estimate is this sample mean?

Confidence intervals provide us with an answer to this question.

There are two procedures for making inferences:

Estimation

Hypothesis Testing (to be covered later)

Concepts of Estimation
The objective of estimation is to determine the value of a population parameter on the
basis of a sample statistic. There are two types of estimators:

1) Point Estimate

2) Interval Estimate

Point Estimate
A point estimator draws inference about a population by estimating the value of an
unknown parameter using a single value or point.

It is a single value best describing the population of interest. The sample mean is the
most common. The advantage of a point estimate is that it is easy to calculate and easy to
understand.

The disadvantages are:

1) The estimate will almost certainly be wrong i.e. not exactly the figure that would have
been derived from using the whole population (most samples will not be exact
representations of the population).

2) It does not tell us how close the estimator is to the parameter.

3) We would expect that as the sample size increases, the sample would produce results
that are more accurate. The point estimate does not explicitly account for the size of the
sample (it does so implicitly but it is not measured directly)

To deal with these problems we can use an interval estimate.

Interval Estimate

An interval estimator draws inferences about a population by estimating the value of an
unknown parameter using an interval.
It provides a range of values that best describe the population.
To develop an interval estimate we need to understand confidence levels.

Confidence Interval
A range of values used to estimate a population parameter, associated with a specific confidence level.

The range of values is constructed from sample data so that the population parameter is likely to occur in that range at a specified probability, the confidence level.

Confidence Level (1 − α)
A confidence level, (1 − α), is the probability that the interval estimate will include the population parameter.

[Figure: confidence interval with confidence level 1 − α]

Population Parameter (Reminder)


A numerical description of a population characteristic, such as the mean.

Level of Significance, α
This is the complement of the confidence level. If the confidence level is 90% then the significance level is 10%. The level of significance is commonly denoted by α.
The level of significance is also referred to as the probability of making a Type I error (see Chapter 6).

What is \(z_{\alpha/2}\)?

\(z_{\alpha/2}\) is known as the critical z score. It is the number of standard deviations away from the mean of zero, on the standard normal curve, such that a proportion \(0.5 - \alpha/2\) of the observations lies between zero and that point. It is based on the confidence level, 1 − α, chosen.

It is the point on the Z axis such that the area under the standard normal curve to the right of \(z_{\alpha/2}\) is α/2.

[Figure: standard normal curve with 0 and \(z_{\alpha/2}\) marked on the Z axis]

Using the properties of the normal distribution we can derive the following chart from the one above:

[Figure: standard normal curve with 0 and \(z_{\alpha/2}\) marked on the Z axis]

And this leads to the following chart:

[Figure: standard normal curve with \(-z_{\alpha/2}\), 0 and \(z_{\alpha/2}\) marked on the Z axis]

Constructing an Interval Estimate with 90% Confidence Level

Suppose we wanted to construct an interval estimate with a 90% confidence level. This confidence level corresponds to a z score, \(z_{\alpha/2} = z_{5\%}\), from the standard normal tables equal to 1.645.

[Figure: 90% confidence interval on the standard normal curve, with 90% of the area between \(-z_{\alpha/2} = -z_{5\%}\) and \(+z_{\alpha/2} = +z_{5\%}\) and 5% in each tail]

From the standard normal tables, 5% of the area under the curve lies to the right of +1.645 and 95% of the area lies to the left.

Note that a z score of \(z_{\alpha/2}\) = 1.645 corresponds to a 90% confidence interval.
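
The table lookup described above can also be sketched in code. The snippet below is only an illustration, not part of the original notes: the helper name `critical_z` is made up, and it relies on `NormalDist` from Python's standard `statistics` module to invert the standard normal distribution.

```python
from statistics import NormalDist


def critical_z(confidence_level):
    """Return z_{alpha/2} for a two-sided interval (hypothetical helper)."""
    alpha = 1.0 - confidence_level
    # The area to the LEFT of z_{alpha/2} is 1 - alpha/2.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


z90 = critical_z(0.90)
print(round(z90, 3))  # 1.645
```

For a 90% confidence level the area to the left of the critical point is 0.95, which is why the tables are read at 0.95 rather than at 0.90.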

How is an Interval Estimator Produced from a Sampling Distribution?
Suppose we have a population with mean μ and standard deviation σ. Let us assume further that the population mean is unknown, and we want to estimate its value.

We begin by choosing a sample of size n and calculating the sample mean \(\bar{X}\). If X is normally distributed, \(\bar{X}\) is also normally distributed. If X is not normal then, according to the central limit theorem, \(\bar{X}\) will be approximately normally distributed provided the sample size n is at least 30.

This means that

\[ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \]

has a standard normal distribution.

Consider the following standard normal distribution:

[Figure: standard normal curve with \(-z_{\alpha/2}\), 0 and \(z_{\alpha/2}\) marked on the Z axis]

In terms of formulas:

\[ P\left(-z_{\alpha/2} < Z < z_{\alpha/2}\right) = 1 - \alpha \]

\[ P\left(-z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha \]

\[ P\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha \]

Conclusion

With repeated sampling from the population, the proportion of values of \(\bar{X}\) for which the interval

\[ \left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) \]

includes the population mean μ is equal to 1 − α. This gives the confidence interval estimator of μ. So the 100(1 − α)% confidence interval for μ is given by the limits above. This emphasizes the fact that it is the interval and not μ that is random.

5.2. Confidence Interval Estimation for the Mean of a Normal Distribution

Calculating the Confidence Interval for the Population Mean
The central limit theorem states that the sampling distribution of the sample means is
approximately normal when the sample contains at least 30 observations.

When σ is Known and Sample is large (greater than or equal to 30 observations)
Assume we have a large sample size n, where n ≥ 30 (and any population).
Constructing a confidence interval around our sample mean is done with the following equations:

\(\bar{x} + z_{\alpha/2}\,\sigma_{\bar{x}}\)   Upper limit of confidence interval
\(\bar{x} - z_{\alpha/2}\,\sigma_{\bar{x}}\)   Lower limit of confidence interval

Where:
\(\bar{x}\) = the sample mean
\(z_{\alpha/2}\) = the critical z score, which is the number of standard deviations based on the confidence level
\(\sigma_{\bar{x}} = \sigma/\sqrt{n}\) = the standard error, or standard deviation of the distribution of the sample mean
σ = population standard deviation
α = the significance level
n = sample size
\(z_{\alpha/2}\,\sigma_{\bar{x}} = z_{\alpha/2}\,\sigma/\sqrt{n}\) = the margin of error, denoted by E

[Figure: confidence interval around the sample mean, from \(\bar{x} - z_{\alpha/2}\,\sigma_{\bar{x}}\) to \(\bar{x} + z_{\alpha/2}\,\sigma_{\bar{x}}\)]

Calculating Z intervals Using Excel

\[ \bar{X} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} \]

Lower Confidence Limit (LCL) and Upper Confidence Limit (UCL)

LCL = \(\bar{X} - z_{\alpha/2} \cdot \sigma/\sqrt{n}\) = AVERAGE(data set) - CONFIDENCE(α,σ,n)
UCL = \(\bar{X} + z_{\alpha/2} \cdot \sigma/\sqrt{n}\) = AVERAGE(data set) + CONFIDENCE(α,σ,n)

=AVERAGE(data set) in Excel gives us the point estimate or mean of the data
=CONFIDENCE(α,σ,n) in Excel gives us the margin of error, E

where:
α is the significance level
σ is the population standard deviation
n is the size of the data set
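
For readers working outside Excel, the same calculation can be sketched in Python. This is only an illustrative sketch: the function names `margin_of_error` and `z_interval` are made up here, and the numbers in the usage lines (mean 100, σ = 15, n = 36) are invented purely for demonstration.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(alpha, sigma, n):
    """E = z_{alpha/2} * sigma / sqrt(n), the quantity CONFIDENCE(alpha, sigma, n) returns."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * sigma / sqrt(n)


def z_interval(mean, alpha, sigma, n):
    """Return (LCL, UCL) = mean -/+ E."""
    e = margin_of_error(alpha, sigma, n)
    return mean - e, mean + e


# Made-up illustration values, not taken from the notes.
lcl, ucl = z_interval(mean=100.0, alpha=0.05, sigma=15.0, n=36)
print(f"LCL = {lcl:.2f}, UCL = {ucl:.2f}")
```

Note that, as with CONFIDENCE, the function takes the significance level α and not the confidence level.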

Example
Suppose we have 45 students in this room. Further, let’s say that the average height of
students is 5’ 10” and the standard deviation of the population is 4.1”.

The point estimate for the true underlying height of all students would be 5’10”.

The 90% interval estimate can be found by applying the above formulae.

Since α = 10% or 0.10, α/2 = 5% or 0.05 and \(z_{\alpha/2}\) = 1.645.

[Figure: 90% confidence interval on the standard normal curve, with 90% of the area between –1.645 and 1.645 and 5% in each tail]

\(\bar{x}\) = 5'10", σ = 4.1", n = 45

As \(\sigma_{\bar{x}} = \sigma/\sqrt{n} = 4.1/\sqrt{45}\), the margin of error is \(z_{\alpha/2}\,\sigma_{\bar{x}} = 1.645(4.1/\sqrt{45}) = 1.0\) inch.

Upper limit: \(\bar{x} + z_{\alpha/2}\,\sigma_{\bar{x}}\) = 5'10" + 1.645(4.1/√45)" = 5'11"
Lower limit: \(\bar{x} - z_{\alpha/2}\,\sigma_{\bar{x}}\) = 5'10" − 1.645(4.1/√45)" = 5'9"

[Figure: 90% confidence interval from 5'9" to 5'11", centred on 5'10", with 5% in each tail]

According to our results, the 90% confidence interval for the mean height, based on this random sample of students, is between 5'9" and 5'11".

Using Excel

=CONFIDENCE(α,σ,n) is =CONFIDENCE(0.1,4.1,45) giving 1.005321 (margin of error, E). We now subtract E from, then add E to, the point estimate to obtain the lower and upper confidence limits.

Example
Consider the average amount of money spent on takeout food by this university's students.

We can find the amount spent on takeout per week by this class.

Suppose we find that this class forms a sample of 45 students and that the average amount spent on takeout is $78.25 and the population standard deviation is $37.50.

We can calculate the 90% confidence interval as follows:

Since α = 10% or 0.10, α/2 = 5% or 0.05 and \(z_{\alpha/2}\) = 1.645.

\(\bar{x}\) = $78.25, σ = $37.50, n = 45

\(\sigma_{\bar{x}} = 37.50/\sqrt{45} = 5.59\), so \(z_{\alpha/2}\,\sigma_{\bar{x}} = 1.645(37.50/\sqrt{45}) = 9.20\)

Upper limit: \(\bar{x} + z_{\alpha/2}\,\sigma_{\bar{x}}\) = $78.25 + 1.645(37.50/√45) = $87.45
Lower limit: \(\bar{x} - z_{\alpha/2}\,\sigma_{\bar{x}}\) = $78.25 − 1.645(37.50/√45) = $69.05

[Figure: 90% confidence interval from $69.05 to $87.45, centred on $78.25, with 5% in each tail]

According to our results, the 90% confidence interval for students' average takeout food expenditure, based on this random sample, is between $69.05 and $87.45.

Using Excel

=CONFIDENCE(α,σ,n) is =CONFIDENCE(0.1,37.5,45) giving 9.195011 (margin of error, E). We now subtract E from, then add E to, the point estimate to obtain the lower and upper confidence limits.

Interpreting the Confidence Interval

A confidence interval is a range of values used to estimate a population parameter and is associated with a specific confidence level. A confidence interval needs to be described in the context of several samples. If we select 10 samples from the "takeout food" population and construct 90 percent confidence intervals around each of the sample means, then we would expect about 9 of the 10 intervals to contain the true population mean, which remains unknown.

It would be wrong to say that there is a 90% probability that the population mean lies within, say, the interval (69.05, 87.45). This would suggest that the mean is a random variable.
It would be true to say that (69.05, 87.45) is a 90% confidence interval for μ. This emphasizes the fact that it is the interval and not μ that is random. In the long run 90% of the intervals will include μ and 10% will not include μ.

The population mean is a FIXED but unknown quantity.

Thus the confidence level applies to the estimation procedure and not to any one interval.

Interpretation of a 90% Confidence Interval

[Figure: interpretation of a 90% confidence interval; one interval is labelled as not including the population mean]

In constructing ten 90% confidence intervals, we would expect about 90% (i.e. 9) of these confidence intervals to contain the population mean. About 10% (i.e. 1) would not.

However, a particular confidence interval either contains the population parameter or it does not. When we construct an interval we do not know if it contains the population mean. To know this we would need to know the population mean, and if we knew the population mean we would not be carrying out this analysis.
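
The long-run interpretation can be checked with a small simulation. The sketch below is an illustration under stated assumptions rather than anything from the original notes: it pretends the population is normal with a known mean of 80 (a made-up value, since in practice μ is unknown), reuses σ = 37.5 and n = 45 from the takeout example, and counts how often the 90% interval covers μ. The function name `coverage` is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage(mu, sigma, n, confidence_level, repetitions, seed=0):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence_level) / 2.0)
    e = z * sigma / sqrt(n)  # margin of error, the same for every sample
    hits = 0
    for _ in range(repetitions):
        sample_mean = fmean(rng.gauss(mu, sigma) for _ in range(n))
        # mu is fixed; the interval moves from sample to sample.
        if sample_mean - e <= mu <= sample_mean + e:
            hits += 1
    return hits / repetitions


# mu = 80.0 is an assumed value chosen only so the simulation can be scored.
rate = coverage(mu=80.0, sigma=37.5, n=45, confidence_level=0.90, repetitions=4000)
print(f"{rate:.3f}")
```

The printed proportion should be close to 0.90, while any single interval in the loop either contains μ or does not.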
The Effect of Changing Confidence Levels

So far we have only considered 90% confidence levels. We can choose any confidence
level but the ones most commonly used are shown below:

Confidence Level (1 − α)   Significance Level α   α/2     z(α/2)   −z(α/2)
0.90                       0.10                   0.05    1.645    -1.645
0.95                       0.05                   0.025   1.960    -1.960
0.98                       0.02                   0.01    2.330    -2.330
0.99                       0.01                   0.005   2.575    -2.575

Re-work the example with "take out meal" expenditure per week, using different confidence levels.

Confidence Level   α      α/2     z(α/2)   Standard Error   Sample Mean   Lower Limit   Upper Limit
0.90               0.10   0.05    1.645    $5.59            $78.25        $69.05        $87.45
0.95               0.05   0.025   1.960    $5.59            $78.25        $67.29        $89.21
0.98               0.02   0.01    2.330    $5.59            $78.25        $65.23        $91.27
0.99               0.01   0.005   2.575    $5.59            $78.25        $63.86        $92.64

You can see from the table that by increasing the confidence level, our interval estimate
of the population mean becomes wider and less precise.
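
The table can be regenerated with a short loop. This is a sketch only; it uses the takeout figures from the text (mean $78.25, σ = $37.50, n = 45) and computes the critical values with Python's `NormalDist` instead of reading them from tables, so the 98% and 99% rows may differ from the table by a cent or two because the table rounds z to 2.330 and 2.575.

```python
from math import sqrt
from statistics import NormalDist

mean, sigma, n = 78.25, 37.50, 45      # values from the takeout example
std_error = sigma / sqrt(n)            # about 5.59

rows = []
for level in (0.90, 0.95, 0.98, 0.99):
    alpha = 1.0 - level
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    e = z * std_error
    rows.append((level, z, mean - e, mean + e))

for level, z, lower, upper in rows:
    print(f"{level:.2f}  z={z:.3f}  lower={lower:.2f}  upper={upper:.2f}  "
          f"width={upper - lower:.2f}")
```

Each row is wider than the one before it, which is the trade-off between confidence and precision described above.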

Estimator Characteristics
Selecting the right sample statistic to estimate a parameter value depends on the statistic's characteristics. Desirable characteristics for an estimator are:

Unbiased
An unbiased estimator is one whose expected value is equal to the parameter it estimates.

Consistency
An unbiased estimator is said to be consistent if the difference between the estimator and the parameter grows smaller as the sample size increases. (Addresses the third disadvantage of point estimation.)

Relative Efficiency
For two unbiased estimators, the one with the smaller standard deviation is said to be relatively more efficient.

Changing Confidence Levels

[Figure: confidence interval around the sample mean, from \(\bar{x} - z_{\alpha/2}\,\sigma_{\bar{x}}\) to \(\bar{x} + z_{\alpha/2}\,\sigma_{\bar{x}}\)]

The width of the confidence interval is calculated by subtracting the lower limit from the upper limit:

\[ (\bar{x} + z_{\alpha/2}\,\sigma_{\bar{x}}) - (\bar{x} - z_{\alpha/2}\,\sigma_{\bar{x}}) = 2 z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \]

and is therefore affected by:

a) Population standard deviation σ

b) Confidence level (1 − α)

c) The sample size n (can address the third disadvantage of point estimation)

As the confidence level increases the interval estimate of the true population mean becomes wider and less precise.

We can reduce the width of the interval while maintaining the same confidence level. We do this by increasing the sample size n. (This is the principle of consistency.)

Example
Let’s rework our last example.

Suppose we increase the size of our class from 45 to 90 students. Our standard error becomes \(\sigma/\sqrt{n} = 37.50/\sqrt{90}\) = $3.95.

Since \(\bar{x}\) = $78.25, α/2 = 5%, \(z_{\alpha/2}\) = 1.645, σ = 37.5 and n = 90, then

\(z_{\alpha/2}\,\sigma_{\bar{x}} = 1.645(37.50/\sqrt{90}) = 6.50\)

Our new 90% confidence interval will have:

Upper limit: \(\bar{x} + z_{\alpha/2}\,\sigma_{\bar{x}}\) = $78.25 + 1.645(37.50/√90) = $84.75
Lower limit: \(\bar{x} - z_{\alpha/2}\,\sigma_{\bar{x}}\) = $78.25 − 1.645(37.50/√90) = $71.75

[Figure: 90% confidence interval from $71.75 to $84.75, centred on $78.25, with 5% in each tail]

According to our results, the 90% confidence interval for students' average takeout food expenditure, based on this random sample, is between $71.75 and $84.75.
Increasing the sample size from 45 to 90 has narrowed the confidence interval from ($69.05, $87.45) to ($71.75, $84.75), which is a more precise interval.
Using Excel
=CONFIDENCE(α,σ,n) is =CONFIDENCE(0.1,37.5,90) giving 6.50185 (margin of error, E). We now subtract E from, then add E to, the point estimate to obtain the lower and upper confidence limits.
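
To see how the margin of error shrinks with the sample size, the calculation can be repeated for several values of n. The sketch below is illustrative only: `margin_of_error` is a hypothetical helper, and n = 180 is an extra value added here to show the pattern.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(alpha, sigma, n):
    """E = z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * sigma / sqrt(n)


# 45 and 90 come from the example; 180 is added for illustration.
margins = {n: margin_of_error(0.10, 37.50, n) for n in (45, 90, 180)}
for n, e in margins.items():
    print(f"n = {n:3d}  E = {e:.2f}")
```

Because n sits under a square root, doubling the sample size divides E by √2, and it takes four times as many observations to halve the margin of error.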

Determining the Sample Size for the Mean
We can calculate the minimum sample size that would be needed to provide a specific
margin of error.

Example
What sample size would we need for a 95% confidence interval that has margin of error
of $8.00 in our “take out meal” example? We already have the population standard
deviation as $37.50, and sample mean is $78.25.

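We need to solve the margin of error equation, \(E = z_{\alpha/2}\,\sigma_{\bar{x}} = z_{\alpha/2}\,\sigma/\sqrt{n}\), for n.

For a 95% confidence level, α = 5% or 0.05 and \(z_{\alpha/2} = z_{2.5\%}\) = 1.96:

[Figure: 95% confidence interval on the standard normal curve, with 95% of the area between –1.96 and 1.96 and 2.5% in each tail]

\(\bar{x}\) = $78.25, E = $8 and σ = $37.50. We need to find n.

Substitute into the margin of error formula \(E = z_{\alpha/2}\,\sigma/\sqrt{n}\):

\[ 8 = 1.96\,\frac{37.50}{\sqrt{n}} \]

\[ n = \left(\frac{1.96 \times 37.50}{8}\right)^2 = 84.41 \]

So n = 84.41, which we round up to the next whole number, 85.

So to obtain a 95% confidence interval that ranges from $78.25 - $8.00 = $70.25 to $78.25 + $8.00 = $86.25 would require a sample size of 85 students.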
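
Solving the margin of error equation for n gives n = (z σ / E)², rounded up. A minimal sketch of that step follows; the function name `required_sample_size` is made up for illustration.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(confidence_level, sigma, margin):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= margin."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence_level) / 2.0)
    # Always round UP: rounding down would give a margin larger than requested.
    return ceil((z * sigma / margin) ** 2)


n_needed = required_sample_size(0.95, 37.50, 8.00)
print(n_needed)  # 85
```

Note that the sample mean ($78.25) plays no part in the calculation; only σ, the confidence level and the target margin matter.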

When σ is Known and Sample is small (less than 30 observations)
This will be the same as for large samples, provided we assume that the population is normally distributed.

When σ is Unknown and Sample is large (greater than or equal to 30 observations)
When σ is unknown we can still use the same formula. Instead of the population standard deviation σ we use the sample standard deviation s. This is true for a normal and non-normal population. The formula becomes

\(\bar{x} + z_{\alpha/2}\,\hat{\sigma}_{\bar{x}}\)   Upper limit of confidence interval
\(\bar{x} - z_{\alpha/2}\,\hat{\sigma}_{\bar{x}}\)   Lower limit of confidence interval

Where:
\(\bar{x}\) = the sample mean
\(z_{\alpha/2}\) = the critical z score, which is the number of standard deviations based on the confidence level
\(\hat{\sigma}_{\bar{x}} = s/\sqrt{n}\) = the standard error of the mean
\(z_{\alpha/2}\,\hat{\sigma}_{\bar{x}}\) = the margin of error, e.g. when carrying out polls
α = the significance level
n = sample size
s = sample standard deviation

Example
The American Management Association wishes to have information on the mean income of middle managers in the retail industry. A random sample of 256 managers reveals a sample mean of $45,420. The standard deviation of this sample is $2,050. The association would like answers to the following questions:

1. What is the population mean?
2. What is a reasonable range of values for the population mean?
3. What do these results mean?

Answer
1. We do not know. We can use the $45,420 as a point estimate.

2. Using a 95% confidence level (this is an assumption, as we are not given a confidence level to work with) we obtain:
α/2 = 2.5%, \(z_{\alpha/2}\) = 1.96
We know that \(\bar{x}\) = $45,420, s = $2,050, n = 256
And since \(\hat{\sigma}_{\bar{x}} = s/\sqrt{n}\), then
\(z_{\alpha/2}\,\hat{\sigma}_{\bar{x}} = 1.96(2{,}050/\sqrt{256}) = 251.13\)
Using Excel, =CONFIDENCE(α,σ,n) is =CONFIDENCE(0.05,2050,256) giving 251.12

Our 95% confidence interval will have:

Upper limit: \(\bar{x} + z_{\alpha/2}\,\hat{\sigma}_{\bar{x}}\) = $45,420 + 1.96(2,050/√256) = $45,671
Lower limit: \(\bar{x} - z_{\alpha/2}\,\hat{\sigma}_{\bar{x}}\) = $45,420 − 1.96(2,050/√256) = $45,169
3. Suppose we select many samples of 256 managers, perhaps several hundred. For each
sample we compute the mean and standard deviation and then construct a 95%
confidence interval. We would expect about 95% of these confidence intervals to contain
the population mean. About 5% would not.

However, a particular confidence interval either contains the population parameter or it does not. The following diagram shows the results of selecting samples of 256 managers from the population and deriving the 95% confidence interval for the population mean.

[Figure: 95% confidence intervals from repeated samples of 256 managers, compared with the population mean]

Note that not all intervals include the population mean. Both the endpoints of the 5th sample are less than the population mean. We attribute this to sampling error, and it is the risk we assume when we select the level of confidence.
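
The arithmetic in answer 2 can be reproduced as a quick check. This is only a sketch using the figures quoted in the example; the variable names are arbitrary, and the critical value is computed with `NormalDist` rather than rounded to 1.96, which is why the margin matches the Excel figure (251.12) rather than the hand calculation (251.13).

```python
from math import sqrt
from statistics import NormalDist

x_bar, s, n = 45_420.0, 2_050.0, 256   # sample mean, sample std dev, sample size
z = NormalDist().inv_cdf(0.975)        # 95% confidence: alpha/2 = 0.025 in each tail

e = z * s / sqrt(n)                    # s stands in for the unknown sigma
lower, upper = x_bar - e, x_bar + e
print(f"E = {e:.2f}, interval = ({lower:,.0f}, {upper:,.0f})")
```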

When σ is Unknown and Sample is small (less than 30 observations)
The only time that we must change the formula slightly is when
1. The sample is small, i.e. less than 30 and
2. We do not know the standard deviation of the population, but we still assume the
population is normal.

Both conditions must hold simultaneously.

In this case, we can use a very similar formula replacing the normal distribution with the
student’s t-distribution. This distribution was developed in 1908, by an Englishman from
the Guinness Brewery, called William Gosset.
His papers were written under the assumed name of “Student”.

The t-distribution is a continuous probability distribution with the following properties:

It is bell shaped and symmetrical.

The area under the curve is equal to 1.0.

It takes account of the variability in the estimate s (of  ), due to the small sample
size.

The degrees of freedom (df) which are dependent on the sample size, indicate the
level of variability in the estimate, 𝑥̅ , and hence sampling distribution. The
smaller the df (and hence the sample size) the more variable the sampling
distribution. The shape of the curve depends on the degrees of freedom which,
when dealing with the sample mean, would be equal to n – 1. One degree of
freedom is taken away for each constraint.

The t-distribution is flatter than the normal distribution. As the number of degrees
of freedom increase, the shape of the t-distribution becomes very similar to the
normal distribution. With more than 30 degrees of freedom (a sample size of 30
or more), the 2 distributions are almost identical.

Degrees of Freedom
The number of values that are free to vary once some information, such as the sample mean, is known.

It is the number of free choices we have after something has been decided such as the
sample mean.

Given we have a sample of size n=3, and a mean of 10, we can only vary 2 values (n – 1).

After we have chosen those two numbers, we have now fixed the value of the remaining
number because our mean must equal 10.

We can now set up the confidence intervals for the mean using a small sample:

Upper limit of confidence interval: \(\bar{x} + t^{n-1}_{\alpha/2}\,\hat{\sigma}_{\bar{x}}\)
Lower limit of confidence interval: \(\bar{x} - t^{n-1}_{\alpha/2}\,\hat{\sigma}_{\bar{x}}\)

[Figure: confidence interval around the sample mean, from \(\bar{x} - t^{n-1}_{\alpha/2}\,\hat{\sigma}_{\bar{x}}\) to \(\bar{x} + t^{n-1}_{\alpha/2}\,\hat{\sigma}_{\bar{x}}\)]

Where:
\(\bar{x}\) = the sample mean
α = the significance level
n = sample size
s = sample standard deviation
\(t^{n-1}_{\alpha/2}\) = the critical t value with (n − 1) degrees of freedom
\(\hat{\sigma}_{\bar{x}} = s/\sqrt{n}\) = the standard error of the mean

Calculating t intervals Using Excel (When we have a Normal Distribution, σ is Unknown and the Sample is small, i.e. less than 30 observations)

\[ \bar{X} \pm t^{n-1}_{\alpha/2} \cdot \frac{s}{\sqrt{n}} \]

Lower Confidence Limit (LCL) and Upper Confidence Limit (UCL)

LCL = \(\bar{X} - t^{n-1}_{\alpha/2} \cdot s/\sqrt{n}\) = MEAN – CONFIDENCE LEVEL
UCL = \(\bar{X} + t^{n-1}_{\alpha/2} \cdot s/\sqrt{n}\) = MEAN + CONFIDENCE LEVEL

The method of calculation will depend on the data presented to us in a question.

We may need to find DESCRIPTIVE STATISTICS which are within DATA ANALYSIS
in Excel.

If we are given the sample mean, \(\bar{X}\), but we are not given the raw data (all the individual observations), then we may use the Excel function

=TINV(probability, degrees of freedom)

=TINV(α, n − 1)
where:
α is the significance level
n is the size of the data set

which returns the t value of the Student's t-distribution as a function of the probability and the degrees of freedom. We do not have to look up the figure manually from the t-tables. In fact the t-table gives only a limited number of values. Excel does not have this limitation.
Example
If we wanted to find the value of t to use in finding the confidence interval where the significance level is 5% and the number of observations is 10, we would use
=TINV(0.05, 9)
which will return the value of 2.262157. This is the value for \(t^{n-1}_{\alpha/2} = t^{9}_{2.5\%}\). Notice that we did not divide α by 2 (α = 5% = 0.05 in this example) in the Excel formula. TINV is a two-tailed function, so Excel does that for us automatically.
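
For readers without Excel, the two-tailed critical value can be approximated numerically using only the Python standard library. The sketch below is an illustration and not how Excel computes TINV: it integrates the Student's t density with Simpson's rule and finds the critical value by bisection. The function names `t_pdf`, `t_upper_tail` and `t_critical` are made up, and the search assumes the answer lies below 1000 and that the degrees of freedom are small enough for `math.gamma` not to overflow.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=4000):
    """Area to the right of t >= 0, using Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3   # half the area lies to the right of zero


def t_critical(alpha, df):
    """Two-tailed critical value: the t with area alpha/2 to its right."""
    lo, hi = 0.0, 1000.0          # assumed search bracket
    for _ in range(60):           # bisection
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha / 2:
            lo = mid              # tail still too large, so t must be bigger
        else:
            hi = mid
    return (lo + hi) / 2


print(round(t_critical(0.05, 9), 3))  # 2.262
```

As with TINV, the function takes the full significance level α and splits it between the two tails itself.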
Example-Using t-Tables
Let the following represent the amount spent on takeout meals for 10 people, i.e. our sample size is 10. Assume the data come from a normal distribution.

29 70 89 100 48 40 137 75 39 88

We do not know σ. We will construct a 95% confidence interval around the sample mean.

To determine the value \(t^{n-1}_{\alpha/2}\) in this example we need to calculate the number of degrees of freedom. As n = 10, we have n − 1 = 9 d.f. This corresponds to \(t^{9}_{2.5\%}\) = 2.262 from the t-distribution tables.

[Figure: 95% confidence interval on the t-distribution, with 95% of the area between –2.262 and 2.262 and 2.5% in each tail]

\(\bar{X}\) = $71.50 (worked out from the sample above) and s = $33.50 (calculated from the sample above)
\(t^{9}_{2.5\%}\) = 2.262 from the t-table (or TINV(0.05,9) = 2.262157 from Excel)
\(\hat{\sigma}_{\bar{x}} = s/\sqrt{n}\) = $33.50/√10 = $10.59
Now we can construct the 95% confidence interval:

Upper limit of confidence interval: \(\bar{x} + t^{n-1}_{\alpha/2}\,\hat{\sigma}_{\bar{x}}\) = $71.50 + 2.262($10.59) = $95.45
Lower limit of confidence interval: \(\bar{x} - t^{n-1}_{\alpha/2}\,\hat{\sigma}_{\bar{x}}\) = $71.50 − 2.262($10.59) = $47.55


Example-Using Excel
Let the following represent the amount spent on takeout meals for 10 people, i.e. our sample size is 10. Assume the data come from a normal distribution.

29 70 89 100 48 40 137 75 39 88

We do not know σ.

Go in to Data Analysis in Excel. Then click on the Descriptive Statistics option and press
OK.

The Descriptive Statistics box will open up.

The input range is the data we were given, for the amount spent on take out meals.

Since the data here are in a column, select Columns.

Press labels only if you have included a column heading.

Press Output Range, if you want to specify where the output will go. The cursor
sometimes moves up to the Input Range box at this stage. Bring it back down to the
Output Range.

Check Summary statistics.

Check confidence level for mean. For this question the confidence level is 95%.

Press OK

The following output will result.

From before we had:

Lower Confidence Limit (LCL) and Upper Confidence Limit (UCL)

LCL = \(\bar{X} - t^{n-1}_{\alpha/2} \cdot s/\sqrt{n}\) = MEAN – CONFIDENCE LEVEL = 71.5 – 23.96713 = 47.53287
UCL = \(\bar{X} + t^{n-1}_{\alpha/2} \cdot s/\sqrt{n}\) = MEAN + CONFIDENCE LEVEL = 71.5 + 23.96713 = 95.46713

So the final answer is (47.53287, 95.46713). This answer is more accurate than the previous solution, which used rounded values from the t-table rather than Excel.
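
The same interval can be reproduced from the raw data in a few lines of Python. This is a sketch, not the notes' own method: the critical value 2.262157 is simply copied from the TINV(0.05, 9) result quoted in the text rather than computed, and the variable names are arbitrary.

```python
from math import sqrt
from statistics import mean, stdev

spending = [29, 70, 89, 100, 48, 40, 137, 75, 39, 88]  # data from the example
t_crit = 2.262157   # TINV(0.05, 9) as quoted in the text (assumed, not computed here)

n = len(spending)
x_bar = mean(spending)
s = stdev(spending)               # sample standard deviation (divides by n - 1)
e = t_crit * s / sqrt(n)          # the "Confidence Level(95.0%)" figure in Excel

lower, upper = x_bar - e, x_bar + e
print(f"mean = {x_bar}, s = {s:.2f}, E = {e:.5f}")
print(f"95% interval = ({lower:.5f}, {upper:.5f})")
```

Using `stdev` (not `pstdev`) matters here: σ is unknown, so the sample standard deviation with the n − 1 divisor is the appropriate estimate.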

The t-distribution in Excel
We may also need to evaluate the area under a t-distribution curve, given a particular
value for t.

For this we can use =TDIST(t,degrees of freedom,1)

[Figure: Student's t-distribution, =TDIST(t,degrees of freedom,1)]

=TDIST(t,degrees of freedom,1) gives the area to the right of a positive value of t. The 1 in the parentheses indicates a one-tailed distribution.

What do we do if we need the area to the left of a positive value of t?

[Figure: Student's t-distribution, 1 minus =TDIST(t,degrees of freedom,1)]

1 - TDIST(t,degrees of freedom,1) gives the area to the left of a positive value of t.

Note that TDIST does not work for negative values of t. However, since the distribution is symmetric, we use this property to derive the areas for negative t values.

The area to the left of a negative value of t = area to the right of the corresponding positive value of t, as can be seen below.

[Figure: two Student's t-distribution curves, one with -t marked]

Both of the above areas are given by =TDIST(t,degrees of freedom,1).

What will 1 - TDIST(t,degrees of freedom,1) give?

[Figure: two Student's t-distribution curves, each labelled 1 minus =TDIST(t,degrees of freedom,1), one with -t marked]

As we can see, these areas are the same.

Finding the Significance Level α for a t Distribution
=TDIST(t,degrees of freedom,2) gives twice the area to the right of a positive value of t. The 2 within the parentheses indicates two tails. This gives us the significance level.

[Figure: Student's t-distribution with -t and t marked, =TDIST(t,degrees of freedom,2)]

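
The tail areas discussed for TDIST can be approximated with the Python standard library. The sketch below is an illustration only and does not reproduce Excel's internals: it integrates the t density with Simpson's rule, and the names `t_pdf` and `t_right_tail` are made up. Unlike TDIST, this helper accepts negative t by using the symmetry argument from the text.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_right_tail(t, df, steps=2000):
    """Area to the right of t (one tail)."""
    if t < 0:
        # Symmetry: area right of -t equals area left of +t.
        return 1.0 - t_right_tail(-t, df, steps)
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


t, df = 2.262157, 9                 # critical value quoted earlier in the text
one_tail = t_right_tail(t, df)      # plays the role of TDIST(t, df, 1)
left_of_t = 1.0 - one_tail          # plays the role of 1 - TDIST(t, df, 1)
two_tails = 2.0 * one_tail          # plays the role of TDIST(t, df, 2)
print(f"{one_tail:.4f} {left_of_t:.4f} {two_tails:.4f}")
```

With the critical value for 9 degrees of freedom the two-tailed area comes back as approximately 0.05, the significance level used to obtain that value.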
Confidence Interval for the Proportion

Confidence intervals for estimates of population proportions may be constructed using sample data.

Recall from before:

The sample proportion, \(p_s\), can be calculated by

\[ p_s = \frac{\text{number of successes in the sample}}{n\ (\text{total possible outcomes in the sample})} \]

This is just the probability of success for the sample.

If the samples are large enough we may use the normal distribution as an approximation to the binomial.

The conditions that must apply for this to be the case are:

\(np \ge 5\) and \(nq \ge 5\), where:

p = probability of a single success in the population

q = probability of a single failure in the population (q = 1 − p)

n = the number of items in the sample in question

Chapter 5 - Lecture Notes by Vuong Thi Thao Binh 212


(Some contents of this slide are based on Lecture Notes by Panayiotis Skordi,
Fullerton University)
The standard deviation of such a distribution, known as the standard error of the proportion, is denoted by $\sigma_p$, where:

$$\sigma_p = \sqrt{\frac{p(1 - p)}{n}}$$

Formulas for Confidence Intervals for the Proportion with Large Samples

Lower Confidence Limit

$$p_s - Z_{\alpha/2} \cdot \sigma_p$$

Upper Confidence Limit

$$p_s + Z_{\alpha/2} \cdot \sigma_p$$

$\sigma_p$ requires us to know the value of $p$, the population proportion, which is exactly what we are trying to estimate. Since we do not actually know $p$, we use $p_s$ as an estimate of it. The standard error of the proportion is then estimated by $\hat{\sigma}_p$, where

$$\hat{\sigma}_p = \sqrt{\frac{p_s(1 - p_s)}{n}}$$

The confidence limits then become

Lower Confidence Limit

$$p_s - Z_{\alpha/2} \cdot \hat{\sigma}_p$$

Upper Confidence Limit

$$p_s + Z_{\alpha/2} \cdot \hat{\sigma}_p$$

Example

During an investigation of a facility that produces light bulbs, a sample of 200 light bulbs was randomly selected. It was found that 11 of the bulbs in this sample were defective. Calculate the 95% confidence interval around this sample proportion.

Solution

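$n = 200$

$$p_s = \frac{11}{200} = 0.055$$

$$\hat{\sigma}_p = \sqrt{\frac{p_s(1 - p_s)}{n}} = \sqrt{\frac{0.055(1 - 0.055)}{200}} = 0.0161$$

$\alpha = 5\%$, so $Z_{\alpha/2} = 1.96$

Lower Confidence Limit

$$p_s - Z_{\alpha/2} \cdot \hat{\sigma}_p = 0.055 - 1.96 \times 0.0161 = 0.023$$

Upper Confidence Limit

$$p_s + Z_{\alpha/2} \cdot \hat{\sigma}_p = 0.055 + 1.96 \times 0.0161 = 0.087$$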

So the point estimate for the proportion of defective light bulbs is 5.5% and the 95%
interval estimate runs from 2.3% to 8.7%.
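
The light bulb calculation can be reproduced in Python as a check on the arithmetic. This is a minimal sketch using only the standard library; the function name `proportion_confidence_interval` is an assumption made for illustration, and the critical value is taken from the standard normal distribution instead of being rounded to 1.96.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence=0.95):
    """Large-sample confidence limits for a population proportion."""
    p_s = successes / n                       # sample proportion
    se = sqrt(p_s * (1 - p_s) / n)            # estimated standard error
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # Z_{alpha/2}, about 1.96 for 95%
    return p_s - z * se, p_s + z * se


# 11 defective bulbs in a sample of 200
lower, upper = proportion_confidence_interval(11, 200)
print(f"{lower:.3f} to {upper:.3f}")  # 0.023 to 0.087
```

Using the unrounded critical value and standard error changes the limits only beyond the third decimal place, so the result agrees with the hand calculation.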

Summary Table for Margin of Error, E

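                 Large Sample                              Small Sample

σ known          $z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$    $z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$

σ unknown        $z_{\alpha/2} \frac{s}{\sqrt{n}}$         $t_{\alpha/2} \frac{s}{\sqrt{n}}$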
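
Every cell of the summary table has the same shape: a critical value multiplied by a standard deviation divided by the square root of n. The sketch below illustrates this with made-up numbers (a standard deviation of 4, and samples of 64 and 16); the function name `margin_of_error` is hypothetical, and the t critical value 2.131 (15 degrees of freedom, 95% confidence) is read from a t-table because the Python standard library has no inverse t function.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(critical_value, std_dev, n):
    """E = critical value * standard deviation / sqrt(n)."""
    return critical_value * std_dev / sqrt(n)


# sigma known (or a large sample): z critical value for 95% confidence
z = NormalDist().inv_cdf(0.975)
e_z = margin_of_error(z, 4.0, 64)

# sigma unknown, small sample: t critical value looked up in a t-table
t_table_value = 2.131  # assumed table entry for 15 degrees of freedom, 95%
e_t = margin_of_error(t_table_value, 4.0, 16)

print(round(e_z, 3), round(e_t, 3))
```

Passing the critical value in as an argument keeps the choice between z and t, which is the only thing that differs across the table, visible at the call site.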

STATS
Exercises Confidence Intervals
1. The temperature readings for 20 winter days in Grand Rapids, Michigan are normally
distributed with mean 5.5 degrees and a standard deviation of 1.5. Determine the 90%
confidence interval estimate for the winter mean temperature.

2. A sample of 49 measurements of tensile strength (roof hanger) is calculated to have a mean of 2.45 and a standard deviation of 0.25. Determine the 95% confidence interval for the measurements of all hangers.

3. If we repeatedly draw samples of size 100 from the population of teenagers, 95% of
the values of sample means will be such that the population mean amount of time
teenagers spend on the internet would be somewhere between 6.206 hours and 6.794, and
5% of the values of the sample mean will produce intervals that would not include the
population mean.

a. Determine the 99% confidence interval estimate of the population mean.


b. Determine the 90% confidence interval estimate of the population mean.
c. Determine the 95% confidence interval estimate of the population mean if the
sample size is changed to 300.

4. A quality control engineer is interested in the mean length of sheet insulation being cut
automatically by machine. The desired length of the insulation is 12 feet. It is known that
the standard deviation in the cutting length is 0.15 feet. A sample of 60 cut sheets yields a
mean length of 12.15 feet. This sample will be used to obtain a 99% confidence interval
for the mean length cut by machine.

a. What is the critical value to use in obtaining the confidence interval?


b. Develop the 99% confidence interval for μ.

5. Find the sample size needed to estimate a population mean to within 2 units of the
sample mean, with a 95% confidence level, when the population standard deviation
equals 8.

6. To estimate with 99% confidence the mean of a normal population, whose standard
deviation is assumed to be 6 and the maximum allowable sampling error is assumed to be
1.2, requires a random sample of what size?

7. The sample size needed to estimate a population mean to within 50 units of the sample mean was found to be 97. If the population standard deviation was 250, what confidence level was used?

8. A statistician wants to estimate the mean weekly family expenditure on clothes. She
believes that the standard deviation of the weekly expenditure is $125. Determine with
99% confidence the number of families that must be sampled to estimate the mean
weekly family expenditure on clothes to within $15.

9. How large a sample of state employees should be taken if we want to estimate with
98% confidence the mean salary to within $2000? The population standard deviation is
assumed to be $10,500.

10. The owner of Britten’s egg farm wants to estimate the mean number of eggs laid per
chicken. A sample of 20 chickens shows they laid an average of 20 eggs per month with a
standard deviation of 2 eggs per month.

a) What is the value of the population mean? What is the best estimate of this value?
b) What assumption do we need to make?
c) For the 95% confidence level what is the value of the critical point?
d) Develop the 95% confidence interval for the population mean.
e) Would it be reasonable to conclude that the population mean is 21 eggs?
f) Would it be reasonable to conclude that the population mean is 25 eggs?

11. The American Sugar Producers Association wants to estimate the mean yearly sugar
consumption. A sample of 16 people reveals the mean yearly consumption to be 60
pounds with a standard deviation of 20 pounds.

a) What is the value of the population mean?


b) What distribution do we use?
c) Find the critical point for the 90% confidence level.
d) Develop the 90% confidence interval for the population mean.
e) Would it be reasonable to conclude that the population mean is 63 pounds?

12. The union representing the Institute of Actuaries (IOA) is considering a proposal to
merge with the Faculty of Actuaries (FOA). According to the IOA union bylaws, at least
three-fourths of the union membership must approve any merger. A random sample of
2,000 current IOA members reveals 1,600 plan to vote for the merger proposal.
a) Develop a 95% confidence interval for the population proportion
b) Basing your decision on this sample information, can you conclude that the
necessary proportion of IOA members favor the merger?
c) Explain your answer to part b).

13. The owner of the Quick Fill Gas Station wished to determine the proportion of
customers who use a credit or debit card for their fuel purchase. A sample of 100
customers was surveyed and it was found that 80 paid at the pump.

a. Estimate the value of the population proportion
b. Calculate the standard error of the proportion
c. Develop a 95% confidence interval for the population proportion
d. Interpret your findings

14. Foxy TV network is looking into replacing its prime time CSI show with a family-oriented comedy show. Before a decision is made, a sample of 400 viewers is asked for their opinions. After viewing the pilot for the comedy, 250 indicated that they would watch the show and suggested it should replace the CSI show.

a. Estimate the value of the population proportion


b. Calculate the standard error of the proportion
c. Develop a 99% confidence interval for the population proportion
d. Interpret your findings

15. Silkscreen Printing Inc. purchases coffee mugs on which to print funny logos. A large shipment was received and the owner wants to ensure that the quality is good and that the mugs are not defective. A sample of 300 cups was chosen at random and it was found that 15 were defective.

a. Estimate the value of the population proportion


b. Develop a 95% confidence interval for the proportion defective
c. The owner of Silkscreen Printing Inc. has an agreement with the supplier that if 10% or more of the shipment is defective, he may return it. Should he return this shipment?

Exercise 7.13, 7.14 (pp. 296 – Paul Newbold)
