Lecture 12
Linear Regression: Tests and Confidence Intervals
Fall 2013
Prof. Yao Xie, yao.xie@isye.gatech.edu
H. Milton Stewart School of Industrial and Systems Engineering, Georgia Tech
Outline
• Properties of β̂1 and β̂0 as point estimators
• Hypothesis tests on the slope and intercept
• Confidence intervals for the slope and intercept
• Real example: house prices and taxes
Regression analysis
• Step 1: graphical display of the data (scatter plot of sales vs. advertisement cost)

[Figure: scatter plot of sales vs. advertisement cost]

• Calculate the correlation
• Step 2: find the relationship or association between Sales and Advertisement Cost via regression
Simple linear regression
Based on the scatter diagram, it is reasonable to assume that the mean of the random variable Y is related to X by the following simple linear regression model:

Yi = β0 + β1 Xi + εi,  i = 1, 2, …, n,  εi ~ N(0, σ²)

where β0 (the intercept) and β1 (the slope) are called the regression coefficients and εi is a random error.
• Simple linear regression considers a single regressor or predictor x and a dependent or response variable Y.
Regression coefficients
Given data (x1, y1), (x2, y2), …, (xn, yn), let

Sxx = Σi xi² − (Σi xi)² / n,  Sxy = Σi xi yi − (Σi xi)(Σi yi) / n

The least squares estimates of the regression coefficients are

β̂1 = Sxy / Sxx,  β̂0 = ȳ − β̂1 x̄

and the fitted (estimated) regression model is

ŷi = β̂0 + β̂1 xi

Caveat: the regression relationship is valid only for values of the regressor variable within the range of the original data. Be careful with extrapolation.
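The least squares formulas can be sketched in a few lines of Python; the data below are made up purely for illustration.

```python
# Least squares estimates for simple linear regression (illustrative data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
Sxx = sum(xi * xi for xi in x) - sum_x ** 2 / n               # Σxi² − (Σxi)²/n
Sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n

b1 = Sxy / Sxx                    # slope estimate β̂1 = Sxy / Sxx
b0 = sum_y / n - b1 * sum_x / n   # intercept estimate β̂0 = ȳ − β̂1·x̄
print(b1, b0)
```

For these made-up points the fit comes out close to slope 2 and intercept 0, as the data were constructed to suggest.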
Estimation of variance
• Using the fitted model, we can estimate the value of the response variable for a given predictor:

ŷi = β̂0 + β̂1 xi

• Residuals: ri = yi − ŷi
• Our model: Yi = β0 + β1 Xi + εi, i = 1, …, n, with Var(εi) = σ²
• Unbiased estimator (MSE: mean squared error):

σ̂² = MSE = Σi ri² / (n − 2)
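A residual and MSE computation matching the formula above, using the same made-up data as before with its least squares coefficients plugged in as constants:

```python
# Unbiased estimate of σ² from the residuals (illustrative data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = 0.05, 1.99                        # least squares fit for these data

y_hat = [b0 + b1 * xi for xi in x]         # fitted values ŷi
r = [yi - yh for yi, yh in zip(y, y_hat)]  # residuals ri = yi − ŷi
mse = sum(ri * ri for ri in r) / (len(x) - 2)  # σ̂² = Σ ri² / (n − 2)
print(mse)
```

Note that the residuals of a least squares fit sum to zero, which is a useful sanity check on any implementation.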
Punchline
• The coefficients β̂1 and β̂0 are both calculated from data, so they are subject to error.
• If the true model is y = β1 x + β0, then β̂1 and β̂0 are point estimators of the true coefficients.
• We can therefore talk about the "accuracy" of β̂1 and β̂0.
Assessing the linear regression model
• Test hypotheses about the true slope and intercept: β1 = ?, β0 = ?
• Construct confidence intervals:

β1 ∈ [β̂1 − a, β̂1 + a],  β0 ∈ [β̂0 − b, β̂0 + b]  with probability 1 − α

• Assume the errors are normally distributed: εi ~ N(0, σ²)
Properties of regression estimators
• Because β̂1 is a linear combination of the observations Yi, we can use the properties of linear combinations of normal random variables to show that it is an unbiased estimator of the true slope:

E(β̂1) = β1

• Since we have assumed that V(εi) = σ², it follows that

V(β̂1) = σ² / Sxx

• For the intercept, we can show in a similar manner that

E(β̂0) = β0  and  V(β̂0) = σ² [1/n + x̄²/Sxx]

Thus β̂0 is an unbiased estimator of the intercept β0. The covariance of the random variables β̂0 and β̂1 is not zero; it can be shown (see Exercise 11-98 in the textbook) that cov(β̂0, β̂1) = −σ² x̄ / Sxx.

Standard errors of coefficients
• The estimate of σ² can be used in the variance formulas above. We call the square roots of the resulting variance estimators the estimated standard errors of the slope and intercept.
• We can replace σ² with its estimator σ̂²:

σ̂² = MSE = Σi ri² / (n − 2),  ri = yi − ŷi,  ŷi = β̂0 + β̂1 xi

• Using the variances from the previous page, the estimated standard errors of the coefficients are

se(β̂1) = √(σ̂² / Sxx)  and  se(β̂0) = √(σ̂² [1/n + x̄²/Sxx])

where σ̂² is computed from Equation 11-13.
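These standard-error formulas translate directly into code. The values of n, Sxx, and σ̂² below are the ones quoted in the oxygen-purity example later in the lecture; x̄ = 1.196 is the mean of the hydrocarbon levels listed in that example's table.

```python
import math

# Estimated standard errors of the slope and intercept:
#   se(β̂1) = sqrt(σ̂²/Sxx),  se(β̂0) = sqrt(σ̂²(1/n + x̄²/Sxx))
n, Sxx, xbar, sigma2_hat = 20, 0.68088, 1.196, 1.18

se_b1 = math.sqrt(sigma2_hat / Sxx)                        # se(β̂1)
se_b0 = math.sqrt(sigma2_hat * (1.0 / n + xbar ** 2 / Sxx))  # se(β̂0)
print(se_b1, se_b0)
```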
Hypothesis tests in simple linear regression
• An important part of assessing the adequacy of a linear regression model is testing statistical hypotheses about the model parameters and constructing confidence intervals.
• To test hypotheses about the slope, say, to test whether the slope equals a constant β1,0:

H0: β1 = β1,0
H1: β1 ≠ β1,0  (two-sided alternative)

• We must make the additional assumption that the errors are normally distributed. Thus, the complete assumption is that the errors εi are normally and independently distributed with mean zero and variance σ², abbreviated NID(0, σ²).
• E.g., relating ads to sales: we are interested in studying whether or not an increase in ads will increase sales.
• Is sales = a · ads + constant?
Significance of regression
• A related and important question: is the slope zero?

H0: β1 = 0
H1: β1 ≠ 0

• If β1 = 0, that means Y does not depend on X, i.e., there is no linear relationship between x and Y.
• Failure to reject H0: β1 = 0 may imply either that x is of little value in explaining the variation in Y and that the best estimator of Y for any x is ŷ = ȳ [Fig. 11-5(a)], or that the true relationship between x and Y is not linear [Fig. 11-5(b)]. Alternatively, if H0: β1 = 0 is rejected, x is of value in explaining the variability in Y (see Fig. 11-6): either the straight-line model is adequate [Fig. 11-6(a)] or, although there is a linear effect of x, better results could be obtained with the addition of higher-order terms in x [Fig. 11-6(b)].
• With normal errors, β1 = 0 means Y and X are independent.
• In the advertisement example: does an increase in ads increase sales, or is there no effect?
[Figure: y versus x, panels (a) and (b)]

Under the normality assumption,

β̂1 ~ N(β1,0, σ²/Sxx) under H0

and the test statistic is

T0 = (β̂1 − β1,0) / se(β̂1)

Reject H0 if |t0| > t_{α/2,n−2} (two-sided test), where t0 is computed from Equation 11-19.
Example: oxygen purity tests of coefficients
We will test for significance of regression using the model for the oxygen purity data from Example 11-1. The hypotheses are

H0: β1 = 0
H1: β1 ≠ 0

and we will use α = 0.01. From Example 11-1 and Table 11-2 we have

n = 20,  β̂1 = 14.947,  Sxx = 0.68088,  σ̂² = 1.18

• Calculate the test statistic:

t0 = β̂1 / se(β̂1) = β̂1 / √(σ̂²/Sxx) = 14.947 / √(1.18/0.68088) = 11.35

• Threshold: t_{α/2,n−2} = t_{0.005,18} = 2.88
• Reject H0 since |t0| = 11.35 > 2.88

Practical interpretation: the test statistic falls in the critical region, so H0: β1 = 0 is rejected; there is strong evidence to support the claim that the slope is not zero. The P-value for this test is P ≈ 1.23 × 10⁻⁹. Table 11-2 (the Minitab output) reports this t-statistic, and also the t-statistic for testing that the intercept is zero, t0 = 46.62; clearly, then, the hypothesis that the intercept is zero is rejected as well.

Oxygen purity data (Example 11-1):

Number  Hydrocarbon level x (%)  Purity y (%)
1       0.99                     90.01
2       1.02                     89.05
3       1.15                     91.43
4       1.29                     93.74
5       1.46                     96.73
6       1.36                     94.45
7       0.87                     87.59
8       1.23                     91.77
9       1.55                     99.42
10      1.40                     93.65
11      1.19                     93.54
12      1.15                     92.52
13      0.98                     90.56
14      1.01                     89.54
15      1.11                     89.85
16      1.20                     90.39
17      1.26                     93.25
18      1.32                     93.41
19      1.43                     94.98
20      0.95                     87.33

[Figure 11-1: scatter diagram of oxygen purity y versus hydrocarbon level x]
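The arithmetic of this test can be checked directly; the values are the ones quoted in the oxygen-purity example, and the critical value t_{0.005,18} = 2.88 is taken from a t table.

```python
import math

# t-test for significance of regression, oxygen-purity numbers.
b1_hat, sigma2_hat, Sxx = 14.947, 1.18, 0.68088

t0 = b1_hat / math.sqrt(sigma2_hat / Sxx)  # t0 = β̂1 / se(β̂1)
t_crit = 2.88                              # t_{0.005,18}: α = 0.01, two-sided, 18 df
reject = abs(t0) > t_crit
print(t0, reject)
```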
Use a t-test for the intercept
• A similar form of test can be used for hypotheses about the intercept. To test

H0: β0 = β0,0
H1: β0 ≠ β0,0

• we would use the test statistic

T0 = (β̂0 − β0,0) / se(β̂0) = (β̂0 − β0,0) / √(σ̂² [1/n + x̄²/Sxx])

• Under the null hypothesis, T0 ~ t distribution with n − 2 degrees of freedom.
• Reject H0 if |t0| > t_{α/2,n−2}. Note that the denominator of the test statistic is just the standard error of the intercept.
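The intercept test statistic can be sketched as a small function; the numbers passed in below are made up purely to show the call, not taken from any example in the lecture.

```python
import math

# t statistic for H0: β0 = β0,0 (made-up illustrative inputs).
def intercept_t(b0_hat, b00, sigma2_hat, n, xbar, Sxx):
    se_b0 = math.sqrt(sigma2_hat * (1.0 / n + xbar ** 2 / Sxx))  # se(β̂0)
    return (b0_hat - b00) / se_b0

t0 = intercept_t(b0_hat=5.0, b00=0.0, sigma2_hat=2.0, n=25, xbar=3.0, Sxx=100.0)
print(t0)
```

Reject H0 when |t0| exceeds the t_{α/2,n−2} critical value, exactly as for the slope.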
Class activity
Given the regression line y = 22.2 + 10.5x, estimated for x = 1, 2, 3, …, 20:

1. The estimated slope is:
A. β̂1 = 22.2  B. β̂1 = 10.5  C. biased

2. The predicted value for x* = 10 is:
A. y* = 22.2  B. y* = 127.2  C. y* = 32.7

3. The predicted value for x* = 40 is:
A. y* = 442.2  B. y* = 127.2  C. cannot extrapolate
Class activity
1. The estimated slope is significantly different from zero when:
A. β̂1 √Sxx / σ̂ > t_{α/2,n−2}
B. β̂1 √Sxx / σ̂ < t_{α/2,n−2}
C. β̂1² Sxx / σ̂² > F_{α/2,n−1,1}

2. The estimated intercept is plausibly zero when:
A. Its confidence interval contains 0.
B. β̂0 √Sxx / σ̂ < t_{α/2,n−2}
C. β̂0 / (σ̂ √(1/n + x̄²/Sxx)) > t_{α/2,n−2}
Confidence intervals
• We can obtain confidence interval estimates of the slope and the intercept.
• The width of a confidence interval is a measure of the overall quality of the regression.
• With β1 and β0 the true parameters, the pivot quantities

(β̂1 − β1) / se(β̂1)  (slope)  and  (β̂0 − β0) / se(β̂0)  (intercept)

are both distributed as t random variables with n − 2 degrees of freedom. This leads to the following definition of 100(1 − α)% confidence intervals on the slope and intercept.
• Under the assumption that the observations are normally and independently distributed, a 100(1 − α)% confidence interval on the slope β1 in simple linear regression is

β̂1 − t_{α/2,n−2} √(σ̂²/Sxx) ≤ β1 ≤ β̂1 + t_{α/2,n−2} √(σ̂²/Sxx)   (11-29)

• Similarly, a 100(1 − α)% confidence interval on the intercept β0 is

β̂0 − t_{α/2,n−2} √(σ̂² [1/n + x̄²/Sxx]) ≤ β0 ≤ β̂0 + t_{α/2,n−2} √(σ̂² [1/n + x̄²/Sxx])   (11-30)
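Equation 11-29 can be sketched directly in code. The inputs below are the oxygen-purity values quoted elsewhere in the lecture (β̂1 = 14.947, σ̂² = 1.18, Sxx = 0.68088), with t_{0.025,18} = 2.101 from a t table.

```python
import math

# 100(1−α)% CI on the slope: β̂1 ± t_{α/2,n−2}·sqrt(σ̂²/Sxx)  (Equation 11-29)
b1_hat, sigma2_hat, Sxx = 14.947, 1.18, 0.68088
t_crit = 2.101                 # t_{0.025,18} for a 95% interval with 18 df

half_width = t_crit * math.sqrt(sigma2_hat / Sxx)
lo, hi = b1_hat - half_width, b1_hat + half_width
print(lo, hi)
```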
Example: oxygen purity confidence interval on the slope
We will find a 95% confidence interval on the slope of the regression line using the data in Example 11-1 (α = 0.05). Recall that β̂1 = 14.947, Sxx = 0.68088, and σ̂² = 1.18 (see Table 11-2). Then, from Equation 11-29, with t_{0.025,18} = 2.101,

14.947 − 2.101 √(1.18/0.68088) ≤ β1 ≤ 14.947 + 2.101 √(1.18/0.68088)

This simplifies to

12.181 ≤ β1 ≤ 17.713

Practical interpretation: the confidence interval does not include 0, so there is enough evidence (at α = 0.05) that the slope is not zero, i.e., that there is correlation between X and Y. The CI is reasonably narrow (±2.766) because the error variance is fairly small.
Example: house selling price and annual taxes
Wellington ["Prediction, Linear Regression, and a Minimum Sum of Relative Errors" (Vol. 19, 1977)] presents data on the selling price and annual taxes for 24 houses. The data are shown in the following table.

Sale Price/1000   Taxes (Local, School, County)/1000
25.9              4.9176
29.5              5.0208
27.9              4.5429
25.9              4.5573
29.9              5.0597
29.9              3.8910
30.9              5.8980
28.9              5.6039
35.9              5.8282
31.5              5.3003
31.0              6.2712
30.9              5.9592
30.0              5.0500
36.9              8.2464
41.9              6.6969
40.5              7.7841
43.9              9.0384
37.5              5.9894
37.9              7.5422
44.5              8.7951
37.9              6.0831
38.9              8.3607
36.9              8.1400
45.8              9.1416

Calculate the correlation: r = 0.8760
• In the computation below, the predictor is x = sale price and the response is y = taxes (both in $1000).
• Summary statistics: n = 24, x̄ = 34.6125, ȳ = 6.4049

Sxx = Σi (xi − x̄)² = 829.0462
Sxy = Σi (yi − ȳ)(xi − x̄) = 191.3612

• Therefore, the least squares estimates of the slope and intercept are

β̂1 = Sxy / Sxx = 191.3612 / 829.0462 = 0.2308
β̂0 = ȳ − β̂1 x̄ = 6.4049 − 0.2308 × 34.6125 = −1.5837
Fitted simple linear regression model
The fitted simple linear regression model (with the coefficients reported to four decimal places) is

ŷ = −1.5837 + 0.2308x

• Residuals: σ̂² = MSE = Σi ri² / (n − 2) = 0.6088
• Standard errors of the regression coefficients:

se(β̂1) = √(σ̂²/Sxx) = √(0.6088 / 829.0462) = 0.0271

se(β̂0) = √(σ̂² [1/n + x̄²/Sxx]) = √(0.6088 [1/24 + 34.6125²/829.0462]) = 0.9514
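The whole house-data fit can be reproduced from the table. The code below treats sale price as x and taxes as y, matching the roles used in these slides' computation (both variables in $1000s); the assertions check it against the quoted summary numbers.

```python
import math

# Reproduce the house-data fit: x = sale price, y = taxes (both /1000).
x = [25.9, 29.5, 27.9, 25.9, 29.9, 29.9, 30.9, 28.9, 35.9, 31.5, 31.0, 30.9,
     30.0, 36.9, 41.9, 40.5, 43.9, 37.5, 37.9, 44.5, 37.9, 38.9, 36.9, 45.8]
y = [4.9176, 5.0208, 4.5429, 4.5573, 5.0597, 3.8910, 5.8980, 5.6039, 5.8282,
     5.3003, 6.2712, 5.9592, 5.0500, 8.2464, 6.6969, 7.7841, 9.0384, 5.9894,
     7.5422, 8.7951, 6.0831, 8.3607, 8.1400, 9.1416]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
Sxx = sum(xi * xi for xi in x) - sum_x ** 2 / n
Sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
b1 = Sxy / Sxx                       # slope estimate
b0 = sum_y / n - b1 * sum_x / n      # intercept estimate

r = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]   # residuals
mse = sum(ri * ri for ri in r) / (n - 2)            # σ̂²
se_b1 = math.sqrt(mse / Sxx)                        # se(β̂1)
print(b1, b0, mse, se_b1)
```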
Test for significance of regression
• Consider the house selling price and tax data (Exercise 11-26, using the data from Exercise 11-4).
• Test H0: β1 = 0 against H1: β1 ≠ 0 using the t-test.
• Calculate the test statistic:

t0 = β̂1 / se(β̂1) = 0.2308 / 0.0271 = 8.5166

• Threshold: t_{α/2,n−2} = t_{0.0025,22} = 3.119 (here α = 0.005)
• The value of the test statistic is greater than the threshold
• Conclusion: reject H0
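The same test, using the rounded slope and standard error from the slides and the tabulated critical value:

```python
# t-test for H0: β1 = 0 on the house data (rounded slide values).
b1_hat, se_b1 = 0.2308, 0.0271
t0 = b1_hat / se_b1
t_crit = 3.119            # t_{0.0025,22} from a t table
reject = abs(t0) > t_crit
print(t0, reject)
```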
• Construct the confidence interval for the slope parameter. Under the assumption that the observations are normally and independently distributed, a 100(1 − α)% confidence interval on the slope β1 is (Equation 11-29)

β̂1 − t_{α/2,n−2} √(σ̂²/Sxx) ≤ β1 ≤ β̂1 + t_{α/2,n−2} √(σ̂²/Sxx)

• For the house data, with t_{0.0025,22} = 3.119:

0.14631 ≤ β1 ≤ 0.3153
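This interval can be checked numerically with the rounded slope and standard error (the tiny discrepancy in the lower bound relative to the slides comes from rounding se(β̂1) to 0.0271):

```python
# CI on the slope for the house data, rounded inputs.
b1_hat, se_b1 = 0.2308, 0.0271
t_crit = 3.119                      # t_{0.0025,22}

lo, hi = b1_hat - t_crit * se_b1, b1_hat + t_crit * se_b1
print(lo, hi)
```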