Chapter 14 Solutions


Chapter 14 - Multiple Regression and Model Building
14.1 SSE (the least squares point estimates are the values that minimize the sum of squared errors).
14.2 Insert the x values into the least squares equation and solve for ŷ.
14.3 a. b0 = 29.347, b1 = 5.6128, b2 = 3.8344
b0 = meaningless
b1 = 5.6128 implies that we estimate that mean sales price increases by $5,612.80 for each increase of 100 square feet in house size, when the niceness rating stays constant.
b2 = 3.8344 implies that we estimate that mean sales price increases by $3,834.40 for each increase in niceness rating of 1, when the square footage remains constant.
b. ŷ = 172.28, from ŷ = 29.347 + 5.6128(20) + 3.8344(8)
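The arithmetic in part b can be checked with a few lines of Python (a sketch; the coefficients are the least squares estimates quoted above):

```python
# Least squares point estimates from Exercise 14.3 (sales price model)
b0, b1, b2 = 29.347, 5.6128, 3.8344

def predict(x1, x2):
    """Point estimate of mean sales price (in $1000s) for
    home size x1 (hundreds of sq ft) and niceness rating x2."""
    return b0 + b1 * x1 + b2 * x2

y_hat = predict(20, 8)   # 29.347 + 5.6128(20) + 3.8344(8)
print(round(y_hat, 2))   # 172.28
```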


14.4 a. b0 = 7.5891, b1 = -2.3577, b2 = 1.6122, b3 = 0.5012
b. ŷ = 8.41099, from ŷ = 7.5891 - 2.3577(3.70) + 1.6122(3.90) + .5012(6.50)


14.5 a. b0 = 1946.8020, b1 = 0.0386, b2 = 1.0394, b3 = -413.7578
b. ŷ = 15896.24725, from ŷ = 1946.802 + .03858(56194) + 1.0394(14077.88) - 413.7578(6.89) = 15896.25.
c. Therefore, actual hours were 17207.31 - 15896.25 = 1311.06 hours greater than predicted.
14.6 σ² and σ (the variance and standard deviation of the populations of potential error term values).
14.7 a. The proportion of the total variation in the observed values of the dependent variable
explained by the multiple regression model.
b. The adjusted R-squared differs from R-squared by taking into consideration the
number of independent variables in the model.
14.8 The overall F test is used to determine whether at least one of the independent variables is significantly related to y.

14.9 (1) SSE = 73.6; s² = SSE/(n - (k + 1)) = 73.6/(10 - (2 + 1)) = 73.6/7 = 10.5; s = √10.5 = 3.242
(2) Total variation = 7447.5
Unexplained variation = 73.6
Explained variation = 7374
(3) R² = 7374/7447.5 = .99
Adjusted R² = (R² - k/(n - 1))((n - 1)/(n - (k + 1))) = (.99 - 2/(10 - 1))((10 - 1)/(10 - (2 + 1))) = .987
R² and adjusted R² are close together and close to 1.
(4) F(model) = (Explained variation/k)/(Unexplained variation/(n - (k + 1))) = (7374/2)/(73.6/(10 - (2 + 1))) = (7374/2)/(73.6/7) = 350.87
(5) Based on 2 and 7 degrees of freedom, F.05 = 4.74. Since F(model) = 350.87 > 4.74, we reject H0: β1 = β2 = 0 by setting α = .05.
(6) Based on 2 and 7 degrees of freedom, F.01 = 9.55. Since F(model) = 350.87 > 9.55, we reject H0: β1 = β2 = 0 by setting α = .01.
(7) p-value = 0.00 (which means less than .001). Since this p-value is less than α = .10, .05, .01, and .001, we have extremely strong evidence that H0: β1 = β2 = 0 is false. That is, we have extremely strong evidence that at least one of x1 and x2 is significantly related to y.

14.10 (1) SSE = 1.4318; s² = SSE/(n - (k + 1)) = 1.4318/(30 - (3 + 1)) = .0551; s = √.0551 = .235
(2) Total variation = 13.4586
Unexplained variation = 1.4318
Explained variation = 12.0268
(3) R² = .894
Adjusted R² = (R² - k/(n - 1))((n - 1)/(n - (k + 1))) = (.894 - 3/(30 - 1))((30 - 1)/(30 - (3 + 1))) = .8818
R² and adjusted R² are close together.
(4) F(model) = (Explained variation/k)/(Unexplained variation/(n - (k + 1))) = (12.0268/3)/(1.4318/(30 - (3 + 1))) = 72.80
(5) Based on 3 and 26 degrees of freedom, F.05 = 2.98. Since F(model) = 72.80 > 2.98, we reject H0: β1 = β2 = β3 = 0 by setting α = .05.
(6) Based on 3 and 26 degrees of freedom, F.01 = 4.64. Since F(model) = 72.80 > 4.64, we reject H0: β1 = β2 = β3 = 0 by setting α = .01.
(7) p-value is less than .001. Since this p-value is less than α = .10, .05, .01, and .001, we have extremely strong evidence that H0: β1 = β2 = β3 = 0 is false. That is, we have extremely strong evidence that at least one of x1, x2, x3 is significantly related to y.
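The quantities in steps (1)-(4) all follow mechanically from SSE, the total variation, n, and k. A small sketch of the pattern, using the numbers from Exercise 14.9:

```python
import math

def model_summary(sse, total_variation, n, k):
    """Return s^2, s, R^2, adjusted R^2, and the overall F statistic
    for a multiple regression with k independent variables."""
    s2 = sse / (n - (k + 1))                  # mean square error
    explained = total_variation - sse
    r2 = explained / total_variation
    r2_adj = (r2 - k / (n - 1)) * ((n - 1) / (n - (k + 1)))
    f = (explained / k) / (sse / (n - (k + 1)))
    return s2, math.sqrt(s2), r2, r2_adj, f

s2, s, r2, r2_adj, f = model_summary(sse=73.6, total_variation=7447.5, n=10, k=2)
print(round(s2, 1), round(r2, 2), round(r2_adj, 3))
```

The F statistic computed this way agrees with the value on the output up to rounding of the explained variation.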

14.11 (1) SSE = 1,798,712.2179
s² = SSE/(n - (k + 1)) = 1,798,712.2179/(16 - (3 + 1)) = 149,892.68483
s = √149,892.68483 = 387.15977
(2) Total variation = 464,126,601.6
Unexplained variation = 1,798,712.2179
Explained variation = 462,327,889.39
(3) R² = 462,327,889.39/464,126,601.6 = .9961
Adjusted R² = (R² - k/(n - 1))((n - 1)/(n - (k + 1))) = (.9961 - 3/(16 - 1))((16 - 1)/(16 - (3 + 1))) = .9952
R² and adjusted R² are close to each other and to 1.
(4) F = (Explained variation/k)/(Unexplained variation/(n - (k + 1))) = (462,327,889.39/3)/(1,798,712.2179/(16 - (3 + 1))) = 1028.131
(5) Based on 3 and 12 degrees of freedom, F.05 = 3.49
F = 1028.131 > F.05 = 3.49. Reject H0: β1 = β2 = β3 = 0 at α = .05
(6) Based on 3 and 12 degrees of freedom, F.01 = 5.95
F = 1028.131 > F.01 = 5.95. Reject H0: β1 = β2 = β3 = 0 at α = .01
(7) p-value = .0001. Reject H0 at α = .05, .01, and .001.
14.12 xj is significantly related to y, with strong evidence in (a) and very strong evidence in (b).
14.13 Explanations will vary.
14.14 We first consider the intercept β0.
(1) b0 = 29.347, s_b0 = 4.891, t = 6.00, where t = b0/s_b0 = 29.347/4.891 = 6.00
(2) We reject H0: β0 = 0 (and conclude that the intercept is significant) with α = .05 if |t| > t.05/2 = t.025.
Since t.025 = 2.365 (with n - (k + 1) = 10 - (2 + 1) = 7 degrees of freedom), we have t = 6.00 > t.025 = 2.365.
We reject H0: β0 = 0 with α = .05 and conclude that the intercept is significant at the .05 level.
(3) We reject H0: β0 = 0 with α = .01 if |t| > t.01/2 = t.005.
Since t.005 = 3.499 (with 7 degrees of freedom), we have t = 6.00 > t.005 = 3.499.
We reject H0: β0 = 0 with α = .01 and conclude that the intercept is significant at the .01 level.
(4) The Minitab output tells us that the p-value for testing H0: β0 = 0 is 0.000. Since this p-value is less than each given value of α, we reject H0: β0 = 0 at each of these values of α. We can conclude that the intercept β0 is significant at the .10, .05, .01, and .001 levels of significance.
(5) A 95% confidence interval for β0 is
[b0 ± t.025 s_b0] = [29.347 ± 2.365(4.891)]
= [17.780, 40.914]
This interval has no practical interpretation since β0 is meaningless.
(6) A 99% confidence interval for β0 is
[b0 ± t.005 s_b0] = [29.347 ± 3.499(4.891)]
= [12.233, 46.461]
We next consider β1.
(1) b1 = 5.6128, s_b1 = .2285, t = 24.56, where t = b1/s_b1 = 5.6128/.2285 = 24.56
(2), (3), and (4):
We reject H0: β1 = 0 (and conclude that the independent variable x1 is significant) at level of significance α if |t| > t_α/2. Here t_α/2 is based on n - (k + 1) = 10 - 3 = 7 degrees of freedom.
For α = .05, t_α/2 = t.025 = 2.365, and for α = .01, t_α/2 = t.005 = 3.499.
Since t = 24.56 > t.025 = 2.365, we reject H0: β1 = 0 with α = .05.
Since t = 24.56 > t.005 = 3.499, we reject H0: β1 = 0 with α = .01.
Further, the Minitab output tells us that the p-value related to testing H0: β1 = 0 is 0.000. Since this p-value is less than each given value of α, we reject H0 at each of these values of α (.10, .05, .01, and .001).
The rejection points and p-values tell us to reject H0: β1 = 0 with α = .10, α = .05, α = .01, and α = .001. We conclude that the independent variable x1 (home size) is significant at the .10, .05, .01, and .001 levels of significance.
(5) and (6):
95% interval for β1: [b1 ± t.025 s_b1] = [5.6128 ± 2.365(.2285)] = [5.072, 6.153]
99% interval for β1: [b1 ± t.005 s_b1] = [5.6128 ± 3.499(.2285)] = [4.813, 6.412]
For instance, we are 95% confident that the mean sales price increases by between $5072 and $6153 for each increase of 100 square feet in home size, when the rating stays constant.
Last, we consider β2.
(1) b2 = 3.8344, s_b2 = .4332, t = 8.85, where t = b2/s_b2 = 3.8344/.4332 = 8.85
(2), (3), and (4):
We reject H0: β2 = 0 (and conclude that the independent variable x2 is significant) at level of significance α if |t| > t_α/2. Here, t_α/2 is based on n - (k + 1) = 10 - 3 = 7 degrees of freedom.
For α = .05, t_α/2 = t.025 = 2.365, and for α = .01, t_α/2 = t.005 = 3.499.
Since t = 8.85 > t.025 = 2.365, we reject H0: β2 = 0 with α = .05.
Since t = 8.85 > t.005 = 3.499, we reject H0: β2 = 0 with α = .01.
Further, the Minitab output tells us that the p-value related to testing H0: β2 = 0 is 0.000. Since this p-value is less than each given value of α, we reject H0 at each of these values of α (.10, .05, .01, and .001).
The rejection points and p-values tell us to reject H0: β2 = 0 with α = .10, α = .05, α = .01, and α = .001.
We conclude that the independent variable x2 (niceness rating) is significant at the .10, .05, .01, and .001 levels of significance.
(5) and (6):
95% interval for β2: [b2 ± t.025 s_b2] = [3.8344 ± 2.365(.4332)] = [2.810, 4.860]
99% interval for β2: [b2 ± t.005 s_b2] = [3.8344 ± 3.499(.4332)] = [2.319, 5.350]
For instance, we are 95% confident that the mean sales price increases by between $2810 and $4860 for each increase of one rating point, when the home size remains constant.
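Every coefficient test above follows one recipe: t = bj/s_bj, and the interval is bj ± t_α/2 · s_bj. A sketch using the numbers for β1 (the t-table value 2.365 for 7 degrees of freedom is hard-coded):

```python
# t test and 95% confidence interval for a regression coefficient,
# using the beta_1 numbers from Exercise 14.14
b1, s_b1 = 5.6128, 0.2285
t_stat = b1 / s_b1                  # the t statistic on the output

t_025 = 2.365                       # t table value, 7 degrees of freedom
reject_at_05 = abs(t_stat) > t_025  # two-sided test of H0: beta_1 = 0

ci_95 = (b1 - t_025 * s_b1, b1 + t_025 * s_b1)
print(round(t_stat, 2), reject_at_05,
      round(ci_95[0], 3), round(ci_95[1], 3))
```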

14.15 Works like Exercise 14.14.
y = β0 + β1x1 + β2x2 + β3x3 + ε
n - (k + 1) = 30 - (3 + 1) = 26
Rejection points: t.025 = 2.056, t.005 = 2.779
H0: β0 = 0: t = 7.5891/2.4450 = 3.104; Reject H0 at α = .05, α = .01
H0: β1 = 0: t = -2.3577/.6379 = -3.696; since |t| = 3.696 > 2.779, Reject H0 at α = .05, α = .01
H0: β2 = 0: t = 1.6122/.2954 = 5.459; Reject H0 at α = .05, α = .01
H0: β3 = 0: t = .5012/.1259 = 3.981; Reject H0 at α = .05, α = .01
p-value for testing H0: β1 = 0 is .001; Reject H0 at α = .01
p-value for testing H0: β2 = 0 is less than .001; Reject H0 at α = .001
p-value for testing H0: β3 = 0 is .0005; Reject H0 at α = .001
95% C.I.: [bj ± 2.056 s_bj]
99% C.I.: [bj ± 2.779 s_bj]
14.16 Works like Exercise 14.14.
y = β0 + β1x1 + β2x2 + β3x3 + ε
n - (k + 1) = 16 - (3 + 1) = 12
Rejection points: t.025 = 2.179, t.005 = 3.055
H0: β0 = 0: t = 1946.8020/504.1819 = 3.861; Reject H0 at α = .05, α = .01
H0: β1 = 0: t = .0386/.0130 = 2.958; Reject H0 at α = .05, not at α = .01
H0: β2 = 0: t = 1.0394/.0676 = 15.386; Reject H0 at α = .05, α = .01
H0: β3 = 0: t = -413.7578/98.5983 = -4.196; since |t| = 4.196 > 3.055, Reject H0 at α = .05, α = .01
p-value for testing H0: β1 = 0 is .0120; Reject H0 at α = .05
p-value for testing H0: β2 = 0 is .0001; Reject H0 at α = .001
p-value for testing H0: β3 = 0 is .0012; Reject H0 at α = .01
95% C.I.: [bj ± 2.179 s_bj]
99% C.I.: [bj ± 3.055 s_bj]
14.17 You can be x% confident that a confidence interval contains the true mean value of y given particular values of the x's, while you can be x% confident that a prediction interval contains an individual value of y given particular values of the x's.

14.18 The midpoint of both the confidence interval and the prediction interval is the value of ŷ given by the least squares model.

14.19 a. Point estimate is ŷ = 172.28 ($172,280)
95% confidence interval is [168.56, 175.99]
b. Point prediction is ŷ = 172.28
95% prediction interval is [163.76, 180.80]
c. Stdev Fit = s√(Distance value) = 1.57
This implies that Distance value = (1.57/s)² = (1.57/3.242)² = 0.2345
The 99% confidence interval for mean sales price is
[ŷ ± t.005 s√(Distance value)], with t.005 based on 7 degrees of freedom
= [172.28 ± 3.499(1.57)]
= [172.28 ± 5.49]
= [166.79, 177.77]
The 99% prediction interval for an individual sales price is
[ŷ ± t.005 s√(1 + Distance value)]
= [172.28 ± 3.499(3.242)√(1 + 0.2345)]
= [172.28 ± 12.60]
= [159.68, 184.88]
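The only difference between the two 99% intervals in part c is the extra "1 +" under the square root, which is why the prediction interval is wider. A sketch with the exercise's numbers:

```python
import math

s, t_005 = 3.242, 3.499          # standard error and t table value (7 d.f.)
y_hat, stdev_fit = 172.28, 1.57  # point estimate and "Stdev Fit" from Minitab

distance = (stdev_fit / s) ** 2  # the distance value, about 0.2345

ci_half = t_005 * s * math.sqrt(distance)      # equals t_005 * stdev_fit
pi_half = t_005 * s * math.sqrt(1 + distance)  # extra 1 for the new error term
ci = (y_hat - ci_half, y_hat + ci_half)
pi = (y_hat - pi_half, y_hat + pi_half)
print(ci, pi)
```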
14.20 a. 95% PI: [7.91876, 8.90255]
890,255 bottles
791,876($3.70) = $2,929,941.20
b. 99% PI; t.005 = 2.779
[8.41065 ± 2.779(.235)√1.04] = [7.74465, 9.07665]

14.21 y = 17207.31 is above the upper limit of the interval [14906.2, 16886.3]; this y-value is
unusually high.

14.22 An independent variable that is measured on a categorical scale.
14.23 A dummy variable can be defined for each value a categorical variable takes on, but you use m - 1 dummy variables to model a categorical variable that can take on m values.
14.24 The difference in the mean of the dependent variable when the dummy variable equals 1 and when the dummy variable equals 0, with the other independent variables held constant.
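The m - 1 coding described in 14.23 can be sketched in plain Python. Here a categorical variable with m = 3 levels gets two dummies; the level names are illustrative only:

```python
def dummies(value, levels, baseline):
    """Encode a categorical value as m - 1 dummy variables,
    one 0/1 indicator for each non-baseline level."""
    return {f"D_{lev}": int(value == lev)
            for lev in levels if lev != baseline}

levels = ["mutual", "stock", "other"]      # m = 3 hypothetical firm types
print(dummies("stock", levels, "mutual"))  # {'D_stock': 1, 'D_other': 0}
print(dummies("mutual", levels, "mutual")) # both dummies 0 -> the baseline
```

The baseline level is the one for which every dummy equals 0, so each dummy coefficient measures a shift relative to that baseline.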
14.25 a. Different y-intercepts for the different types of firms.
b. β2 equals the difference between the mean innovation adoption times of stock companies and mutual companies.
c. p-value is less than .001; Reject H0 at both levels of α. β2 is significant at both levels of α.
95% CI for β2: [4.9770, 11.1339]; we are 95% confident that for any size of insurance firm, the mean speed at which an insurance innovation is adopted is between 4.9770 and 11.1339 months faster if the firm is a stock company rather than a mutual company.
d. No interaction
14.26 a. The pool coefficient is $25,862.30. Since the cost of the pool is $35,000, you expect to recoup $25,862.30/$35,000 = 74% of the cost.
b. There is no interaction between pool and any other independent variable.
14.27 a. μB = βB + βM(0) + βT(0) = βB
μM = βB + βM(1) + βT(0) = βB + βM
μT = βB + βM(0) + βT(1) = βB + βT
b. F = 184.57, p-value < .001:
Reject H0; conclude there are differences.
c. μM - μB = (βB + βM) - βB = βM, with point estimate bM = 21.4
μT - μB = (βB + βT) - βB = βT, with point estimate bT = -4.300
μM - μT = (βB + βM) - (βB + βT) = βM - βT, with point estimate bM - bT = 21.4 - (-4.3) = 25.7
95% C.I. for βM: [21.4 ± 2.131(1.433)] = [18.346, 24.454]
95% C.I. for βT: [-4.300 ± 2.131(1.433)] = [-7.354, -1.246]
d. 77.20, [75.040, 79.360], [71.486, 82.914]
e. [25.700 ± 2.131(1.433)] = [22.646, 28.754]
p-value < .001, so μM - μT is significant.
14.28 a. The point estimate of the effect on the mean of campaign B compared to campaign A is b4 = 0.2695.
The 95% confidence interval = [0.1262, 0.4128]
The point estimate of the effect on the mean of campaign C compared to campaign A is b5 = 0.4396.
The 95% confidence interval = [0.2944, 0.5847]
Campaign C is probably most effective even though the intervals overlap.
b. ŷ = 8.7154 - 2.768(3.7) + 1.6667(3.9) + 0.4927(6.5) + 0.4396 = 8.61621
Confidence interval = [8.5138, 8.71862]; Prediction interval = [8.28958, 8.94285]
c. β5 = the effect on the mean of Campaign C compared to Campaign B.
d. β5 is significant at α = 0.10 and α = 0.05 because the p-value = 0.0179. Thus there is strong evidence that β5 is greater than 0.
95% C.I.: [0.0320, 0.3081]; since 0 is not in the interval, we are confident that β5 > 0.
14.29 a. No interaction, since the p-values are so large.
b. ŷ = 8.61178 (861,178 bottles)
95% prediction interval = [8.27089, 8.95266], slightly bigger.
14.30 The situation in which the independent variables used in a regression analysis are
related to each other.
Multicollinearity can affect the least squares point estimates, the standard errors of
these estimates, and the t statistics. Multicollinearity can also cause combinations of
values of the independent variables that one might expect to be in the experimental
region to actually not be in this region.
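A common first screen for this problem is to compute the pairwise correlations among the independent variables, as is done in Exercise 14.32. A small sketch with made-up data (the three predictor lists are hypothetical):

```python
import math

def corr(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x1 = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical predictor
x2 = [2.1, 4.0, 6.2, 7.9, 10.1]  # nearly 2 * x1: collinear with x1
x3 = [5.0, 1.0, 4.0, 2.0, 3.0]   # unrelated to x1

print(round(corr(x1, x2), 3))  # close to 1 -> severe multicollinearity
print(round(corr(x1, x3), 3))  # far from +/-1 -> little concern
```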
14.31 Answers will vary; use R², s, adjusted R², prediction intervals, and C. Also use stepwise regression and backward elimination.

14.32 a. 0.9999 (Load-BedDays), 0.9353 (Load-Pop) and 0.9328 (BedDays-Pop)

b. x1, x3, x4
c. b1 and b4
d. yes
e. Model 1; Model 2; Model 1; Model 2; Could choose either model, although Model 2
may be considered better because of the smaller C and p-value and because of the
results of the stepwise regression and backward elimination.
14.33 Second-largest adjusted R², smallest C, smallest p-value, second-smallest s. It is desirable to have a large store with a small percentage of the floor space devoted to the prescription department.
14.34 When plotting residuals versus predicted values of y, you look for the residuals to fall in a horizontal band, indicating constant variance. When plotting residuals versus the individual x's, you look for the residuals to fall in a random pattern, indicating that a linear model is sufficient. When doing a normal probability plot of the residuals, you look for a straight-line pattern to conclude that the errors follow a normal distribution. Finally, you plot the residuals versus time (or observation order) and look for a random pattern to conclude that the error terms are independent.
14.35 By doing a normal probability plot of the residuals.
14.36 a. straight line appearance
b. explanations will vary
Chapter 14 - Multiple Regression and Model Building
14-11
14.37 Autocorrelation still present based on the residual plot below.

14.38 251056.6 7 1821 52 86 56 1 607 29 28000 893 3 626 30 ~ + + = ) . ( . ) . ( , ) ( . , y
14.39 The estimates of the coefficients indicate that at a specified square footage, adding rooms
increases selling price while adding bedrooms reduces selling price. Thus building both a
family room and a living room (while maintaining square footage) should increase sales
price. In addition, adding a bedroom at the cost of another room will tend to decrease
selling price.
14.40 If you do promotions on weekends, then that promotion on average nets you 4745 - 4690 = 55 in attendance, while promotions at day games net you on average 4745 + 5059 = 9804. Thus promotions at day games gain, on average, more attendance. They should change when some promotions are done.
14.41 ŷ20 = b0 + b1(20) + bQ2(0) + bQ3(0) + bQ4(1) = 8.75 + 0.5(20) + 4.5 = 23.250
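The quarterly-dummy forecast above can be sketched as:

```python
# Seasonal dummy-variable model from Exercise 14.41:
# y-hat(t) = b0 + b1*t + bQ2*DQ2 + bQ3*DQ3 + bQ4*DQ4
b0, b1, bQ4 = 8.75, 0.5, 4.5   # intercept, trend slope, Q4 dummy coefficient

def forecast_q4(t):
    """Point forecast for a period t falling in quarter 4
    (DQ2 = DQ3 = 0, DQ4 = 1)."""
    return b0 + b1 * t + bQ4

y20 = forecast_q4(20)
print(y20)  # 23.25
```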
14.42 ln ŷ133 = 4.69618 + .0103075(133) + .01903 = 6.086108
Point estimate: e^6.0861 = 439.7
95% prediction interval: [e^5.96593, e^6.20627] = [389.92, 495.85]
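The back-transform in 14.42 is just exponentiation of the log-scale fitted value and of the log-scale interval endpoints:

```python
import math

# Fitted value on the log scale, from Exercise 14.42
ln_yhat = 4.69618 + 0.0103075 * 133 + 0.01903

point = math.exp(ln_yhat)                    # point estimate in original units
pi = (math.exp(5.96593), math.exp(6.20627))  # back-transformed 95% PI

print(round(ln_yhat, 5), round(point, 1))
```

Exponentiating the endpoints works because e^x is increasing, so the ordering of the limits is preserved.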
14.43 a. b0 = 25.7152, b1 = 4.9762, b2 = -1.01905; ŷ = 25.7152 + 4.9762x - 1.01905x²
b. p-values for x and x² are 0.000, confirming both terms are significant and a quadratic relationship.
c. ŷ = 25.7152 + 4.9762(2.44) - 1.01905(2.44²) = 31.7901 mpg
d. 95% CI: [31.5481, 32.0322]
e. 95% PI: [31.1215, 32.4588]
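Part c's quadratic prediction can be sketched as:

```python
# Quadratic model from Exercise 14.43
b0, b1, b2 = 25.7152, 4.9762, -1.01905

def mpg(x):
    """Predicted mileage: b0 + b1*x + b2*x^2."""
    return b0 + b1 * x + b2 * x ** 2

y_hat = mpg(2.44)
print(round(y_hat, 4))  # 31.7901
```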
14.44 a. Both terms have small p-values, so both terms are significant and therefore important.
b. 8.32725 = 29.1133 + 11.1342(0.20) - 7.6080(6.5) + 0.6712(6.5²) - 1.4777(0.20)(6.5)
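Reading the last term as a (price difference × advertising) interaction reproduces the stated value almost exactly; a sketch, with the variable roles an assumption:

```python
# Coefficients from Exercise 14.44; the x4 * x3 interaction reading
# of the final term is an assumption that matches the stated result.
b0, b1, b2, b3, b4 = 29.1133, 11.1342, -7.6080, 0.6712, -1.4777

def predict(x4, x3):
    """Fitted value with a squared term (x3**2) and an
    interaction term (x4 * x3)."""
    return b0 + b1 * x4 + b2 * x3 + b3 * x3 ** 2 + b4 * x4 * x3

y_hat = predict(0.20, 6.5)
print(round(y_hat, 3))  # close to the 8.32725 reported above
```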

[Residual plot for Exercise 14.37: residuals versus observation number; gridlines = std. error]