Ch12 - Multiple Linear Regression
12-1: Multiple Linear Regression Model
12-1.1 Introduction

In a multiple linear regression model with an interaction term, the expected change in Y when x1 is changed by one unit (say) is a function of both x1 and x2. For example, changing x1 from 2 to 8 produces a much smaller change in E(Y) when x2 = 2 than when x2 = 10. Interaction effects occur frequently in the study and analysis of real-world systems, and regression methods are one of the techniques that we can use to describe them. Quadratic and interaction terms in such a model produce a mound-shaped response function.

12-1.2 Least Squares Estimation of the Parameters

The method of least squares may be used to estimate the regression coefficients in the multiple regression model

Y = β0 + β1 x1 + β2 x2 + … + βk xk + ε

Suppose that n > k observations are available, and let xij denote the ith observation or level of variable xj. The observations are

(xi1, xi2, …, xik, yi),  i = 1, 2, …, n

It is customary to present the data for multiple regression in a table such as Table 12-1.

Table 12-1 Data for Multiple Linear Regression

y     x1    x2    …    xk
y1    x11   x12   …    x1k
y2    x21   x22   …    x2k
⋮     ⋮     ⋮          ⋮
yn    xn1   xn2   …    xnk
Copyright © 2014 John Wiley & Sons, Inc. All rights reserved.
12-1.2 Least Squares Estimation of the Parameters (continued)

The least squares function is

L = Σ_{i=1}^{n} εi² = Σ_{i=1}^{n} ( yi − β0 − Σ_{j=1}^{k} βj xij )²   (12-2)

The least squares estimates β̂0, β̂1, …, β̂k must satisfy

∂L/∂β0 = −2 Σ_{i=1}^{n} ( yi − β̂0 − Σ_{j=1}^{k} β̂j xij ) = 0   (12-3)

and

∂L/∂βj = −2 Σ_{i=1}^{n} ( yi − β̂0 − Σ_{j=1}^{k} β̂j xij ) xij = 0,  j = 1, 2, …, k   (12-4)

The least squares normal equations are

n β̂0      + β̂1 Σ xi1      + β̂2 Σ xi2      + … + β̂k Σ xik      = Σ yi
β̂0 Σ xi1  + β̂1 Σ xi1²     + β̂2 Σ xi1 xi2  + … + β̂k Σ xi1 xik  = Σ xi1 yi
⋮
β̂0 Σ xik  + β̂1 Σ xik xi1  + β̂2 Σ xik xi2  + … + β̂k Σ xik²     = Σ xik yi   (12-5)

where all sums run over i = 1, 2, …, n. The solution to the normal equations is the least squares estimators of the regression coefficients β̂0, β̂1, …, β̂k.

EXAMPLE 12-1 Wire Bond Strength  We used data on the pull strength of a wire bond in a semiconductor manufacturing process, wire length, and die height to illustrate building an empirical model. The 25 observations are shown in Table 12-2.

Table 12-2 Wire Bond Pull Strength Data

Obs   Pull Strength y   Wire Length x1   Die Height x2
1         9.95              2                50
2        24.45              8               110
3        31.75             11               120
4        35.00             10               550
5        25.02              8               295
6        16.86              4               200
7        14.38              2               375
8         9.60              2                52
9        24.35              9               100
10       27.50              8               300
11       17.08              4               412
12       37.00             11               400
13       41.95             12               500
14       11.66              2               360
15       21.65              4               205
16       17.89              4               400
17       69.00             20               600
18       10.30              1               585
19       34.93             10               540
20       46.59             15               250
21       44.88             15               290
22       54.12             16               510
23       56.63             17               590
24       22.13              6               100
25       21.15              5               400
Example 12-1 (continued)

The summary statistics needed for the normal equations are

n = 25,  Σ yi = 725.82,  Σ xi1 = 206,  Σ xi2 = 8294,
Σ xi1² = 2396,  Σ xi1 xi2 = 77,177,  Σ xi2² = 3,531,848,
Σ xi1 yi = 8,008.47,  Σ xi2 yi = 274,816.71

Figure 12-4 Matrix of computer-generated scatter plots for the wire bond pull strength data in Table 12-2 (produced in R, e.g. with library(psych)).
Example 12-1 (continued)

For the model Y = β0 + β1x1 + β2x2 + ε, the normal equations (12-5) are

n β̂0      + β̂1 Σ xi1      + β̂2 Σ xi2  = Σ yi
β̂0 Σ xi1  + β̂1 Σ xi1²     + β̂2 Σ xi1 xi2 = Σ xi1 yi
β̂0 Σ xi2  + β̂1 Σ xi1 xi2  + β̂2 Σ xi2² = Σ xi2 yi

Inserting the computed summations into the normal equations, we obtain

25 β̂0   + 206 β̂1     + 8294 β̂2      = 725.82
206 β̂0  + 2396 β̂1    + 77,177 β̂2    = 8,008.47
8294 β̂0 + 77,177 β̂1  + 3,531,848 β̂2 = 274,816.71

The solution to this set of equations is

β̂0 = 2.26379,  β̂1 = 2.74427,  β̂2 = 0.01253

Therefore, the fitted regression equation is

ŷ = 2.26379 + 2.74427 x1 + 0.01253 x2

Practical Interpretation: This equation can be used to predict pull strength for pairs of values of the regressor variables wire length (x1) and die height (x2).

12-1.3 Matrix Approach to Multiple Linear Regression

Suppose the model relating the regressors to the response is

yi = β0 + β1 xi1 + β2 xi2 + … + βk xik + εi,  i = 1, 2, …, n

In matrix notation this model can be written as

y = Xβ + ε   (12-6)
12-1.3 Matrix Approach to Multiple Linear Regression (continued)

where

y = [y1, y2, …, yn]′,  β = [β0, β1, …, βk]′,  ε = [ε1, ε2, …, εn]′

and

X = | 1  x11  x12  …  x1k |
    | 1  x21  x22  …  x2k |
    | ⋮   ⋮    ⋮        ⋮  |
    | 1  xn1  xn2  …  xnk |

We wish to find the vector of least squares estimators β̂ that minimizes

L = Σ_{i=1}^{n} εi² = (y − Xβ)′(y − Xβ)

The resulting least squares estimate is

β̂ = (X′X)⁻¹ X′y   (12-7)

The fitted regression model is

ŷi = β̂0 + Σ_{j=1}^{k} β̂j xij,  i = 1, 2, …, n   (12-8)

In matrix notation, the fitted model is

ŷ = Xβ̂

The difference between the observation yi and the fitted value ŷi is a residual, ei = yi − ŷi. The (n × 1) vector of residuals is denoted by

e = y − ŷ   (12-9)
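The matrix computations of Equations 12-7 through 12-9 take only a few lines of code. The following is an illustrative sketch (not from the text) using NumPy, with the first five wire bond observations from Table 12-2 as a small example; the variable names are our own.

```python
import numpy as np

# Five observations from Table 12-2: wire length x1, die height x2, pull strength y.
x1 = np.array([2.0, 8.0, 11.0, 10.0, 8.0])
x2 = np.array([50.0, 110.0, 120.0, 550.0, 295.0])
y = np.array([9.95, 24.45, 31.75, 35.00, 25.02])

# Model matrix X: a column of ones followed by the regressor columns.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Equation 12-7: beta-hat = (X'X)^(-1) X'y.  Solving the normal equations
# directly is preferred numerically over forming an explicit inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Equation 12-8: fitted values; Equation 12-9: residuals.
y_hat = X @ beta_hat
e = y - y_hat

# The normal equations (12-5) say exactly that X'e = 0.
print(beta_hat, np.round(X.T @ e, 6))
```

The final print illustrates the defining property of least squares residuals: they are orthogonal to every column of X.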
Example 12-2

In Example 12-1, we illustrated fitting the multiple regression model

y = β0 + β1x1 + β2x2 + ε

where y is the observed pull strength for a wire bond, x1 is the wire length, and x2 is the die height. The 25 observations are in Table 12-2. We will now use the matrix approach to fit the regression model above to these data. The model matrix X and y vector for this model are

X = | 1   2   50  |        y = |  9.95 |
    | 1   8  110  |            | 24.45 |
    | 1  11  120  |            | 31.75 |
    | 1  10  550  |            | 35.00 |
    | 1   8  295  |            | 25.02 |
    | 1   4  200  |            | 16.86 |
    | 1   2  375  |            | 14.38 |
    | 1   2   52  |            |  9.60 |
    | 1   9  100  |            | 24.35 |
    | 1   8  300  |            | 27.50 |
    | 1   4  412  |            | 17.08 |
    | 1  11  400  |            | 37.00 |
    | 1  12  500  |            | 41.95 |
    | 1   2  360  |            | 11.66 |
    | 1   4  205  |            | 21.65 |
    | 1   4  400  |            | 17.89 |
    | 1  20  600  |            | 69.00 |
    | 1   1  585  |            | 10.30 |
    | 1  10  540  |            | 34.93 |
    | 1  15  250  |            | 46.59 |
    | 1  15  290  |            | 44.88 |
    | 1  16  510  |            | 54.12 |
    | 1  17  590  |            | 56.63 |
    | 1   6  100  |            | 22.13 |
    | 1   5  400  |            | 21.15 |

The X′X matrix is

X′X = | 25       206      8,294     |
      | 206      2,396    77,177    |
      | 8,294    77,177   3,531,848 |

and the X′y vector is

X′y = [725.82, 8,008.47, 274,816.71]′

The least squares estimates are found from Equation 12-7 as

β̂ = (X′X)⁻¹ X′y

In R, with X and y as above, this is betahat <- solve(t(X) %*% X, t(X) %*% y). (The slide's inv() is not a base R function; solve() is equivalent and numerically preferable.)
Example 12-2 (continued)

| β̂0 |   | 25       206      8,294     |⁻¹ |  725.82    |
| β̂1 | = | 206      2,396    77,177    |   |  8,008.47  |
| β̂2 |   | 8,294    77,177   3,531,848 |   |  274,816.71|

        |  0.214653   −0.007491   −0.000340  | |  725.82    |
      = | −0.007491    0.001671   −0.000019  | |  8,008.47  |
        | −0.000340   −0.000019    0.0000015 | |  274,816.71|

        | 2.26379143 |
      = | 2.74426964 |
        | 0.01252781 |

Therefore, the fitted regression model with the regression coefficients rounded to five decimal places is

ŷ = 2.26379 + 2.74427 x1 + 0.01253 x2

This is identical to the results obtained in Example 12-1.

This regression model can be used to predict values of pull strength for various values of wire length (x1) and die height (x2). We can also obtain the fitted values ŷi by substituting each observation (xi1, xi2), i = 1, 2, …, n, into the equation. For example, the first observation has x11 = 2 and x12 = 50, and the fitted value is

ŷ1 = 2.26379 + 2.74427 x11 + 0.01253 x12
   = 2.26379 + 2.74427(2) + 0.01253(50)
   = 8.38

The corresponding observed value is y1 = 9.95, so the residual corresponding to the first observation is

e1 = y1 − ŷ1 = 9.95 − 8.38 = 1.57

Table 12-3 displays all 25 fitted values and the corresponding residuals. The fitted values and residuals are calculated to the same accuracy as the original data.

Table 12-3 Observations, Fitted Values, and Residuals for Example 12-2

Obs    yi      ŷi      ei = yi − ŷi
1      9.95    8.38     1.57
2     24.45   25.60    −1.15
3     31.75   33.95    −2.20
4     35.00   36.60    −1.60
5     25.02   27.91    −2.89
6     16.86   15.75     1.11
7     14.38   12.45     1.93
8      9.60    8.40     1.20
9     24.35   28.21    −3.86
10    27.50   27.98    −0.48
11    17.08   18.40    −1.32
12    37.00   37.46    −0.46
13    41.95   41.46     0.49
14    11.66   12.26    −0.60
15    21.65   15.81     5.84
16    17.89   18.25    −0.36
17    69.00   64.67     4.33
18    10.30   12.34    −2.04
19    34.93   36.47    −1.54
20    46.59   46.56     0.03
21    44.88   47.06    −2.18
22    54.12   52.56     1.56
23    56.63   56.31     0.32
24    22.13   19.98     2.15
25    21.15   21.00     0.15
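The hand calculation above can be checked numerically. This is an illustrative sketch (not from the text) in Python/NumPy that solves the normal equations using the X′X and X′y values quoted in Example 12-2.

```python
import numpy as np

# X'X and X'y for the wire bond data (Example 12-2).
XtX = np.array([[25.0,    206.0,    8294.0],
                [206.0,   2396.0,   77177.0],
                [8294.0,  77177.0,  3531848.0]])
Xty = np.array([725.82, 8008.47, 274816.71])

# Equation 12-7: beta-hat = (X'X)^(-1) X'y.
beta_hat = np.linalg.solve(XtX, Xty)
print(np.round(beta_hat, 5))  # approximately [2.26379, 2.74427, 0.01253]

# Fitted value and residual for the first observation (x11 = 2, x12 = 50).
y1_hat = beta_hat @ np.array([1.0, 2.0, 50.0])
e1 = 9.95 - y1_hat
print(round(float(y1_hat), 2), round(float(e1), 2))  # approximately 8.38 and 1.57
```

The solved coefficients reproduce the fitted equation ŷ = 2.26379 + 2.74427x1 + 0.01253x2 and the first residual e1 = 1.57 from Table 12-3.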
12-1: Multiple Linear Regression Model

Estimating σ²

An unbiased estimator of σ² is

σ̂² = Σ_{i=1}^{n} ei² / (n − p) = SSE / (n − p)   (12-10)

where p = k + 1 is the number of parameters in the model.

12-1.4 Properties of the Least Squares Estimators

Unbiased estimators:

E(β̂) = E[(X′X)⁻¹X′Y]
     = E[(X′X)⁻¹X′(Xβ + ε)]
     = E[(X′X)⁻¹X′Xβ + (X′X)⁻¹X′ε]
     = β

Covariance matrix: the variances and covariances of the β̂j are determined by σ² and the elements of (shown here for k = 2 regressors)

C = (X′X)⁻¹ = | C00  C01  C02 |
              | C10  C11  C12 |
              | C20  C21  C22 |

For the wire bond data, the R regression output reports:

Residual standard error: 2.288 on 22 degrees of freedom
Multiple R-squared: 0.9811, Adjusted R-squared: 0.9794
F-statistic: 572.2 on 2 and 22 DF, p-value: < 2.2e-16
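Equation 12-10 can be checked for the wire bond data. This is a quick numeric sketch (not from the text) using the value SSE = 115.1716 reported later in Section 12-2; the slides quote σ̂² = 5.2352, which agrees with this calculation to three decimal places.

```python
import math

# Unbiased estimate of sigma^2 (Equation 12-10) for the wire bond data:
# SSE = 115.1716, n = 25 observations, p = 3 parameters.
sse, n, p = 115.1716, 25, 3

sigma2_hat = sse / (n - p)   # approximately 5.235
rse = math.sqrt(sigma2_hat)  # residual standard error, approximately 2.288
print(round(sigma2_hat, 3), round(rse, 3))
```

The square root of σ̂² matches the "Residual standard error: 2.288 on 22 degrees of freedom" line in the R output above.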
12-2: Hypothesis Tests in Multiple Linear Regression

12-2.1 Test for Significance of Regression

The appropriate hypotheses are

H0: β1 = β2 = … = βk = 0
H1: βj ≠ 0 for at least one j   (12-11)

The test statistic is

F0 = (SSR/k) / [SSE/(n − p)] = MSR/MSE   (12-12)

Table 12-9 Analysis of Variance for Testing Significance of Regression in Multiple Regression

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F0
Regression            SSR              k                    MSR           MSR/MSE
Error or residual     SSE              n − p                MSE
Total                 SST              n − 1

EXAMPLE 12-3 Wire Bond Strength ANOVA  We will test for significance of regression (with α = 0.05) using the wire bond pull strength data from Example 12-1. The total sum of squares is

SST = y′y − (Σ_{i=1}^{n} yi)²/n = 27,178.5316 − (725.82)²/25 = 6105.9447
and by subtraction

SSE = SST − SSR = y′y − β̂′X′y = 115.1716

Since f0 = MSR/MSE = 572.2 is large (the computer output reports a P-value smaller than 2.2 × 10⁻¹⁶), we reject H0 and conclude that pull strength is linearly related to either wire length or die height, or both.

Practical Interpretation: Rejection of H0 does not necessarily imply that the relationship found is an appropriate model for predicting pull strength as a function of wire length and die height. Further tests of model adequacy are required before we can be comfortable using this model in practice.
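The ANOVA arithmetic of Example 12-3 can be reproduced in a few lines. An illustrative sketch (not from the text) in Python, using SST and SSE as computed above:

```python
# Significance-of-regression F statistic (Equation 12-12) for the wire bond data.
sst, sse = 6105.9447, 115.1716
k, n, p = 2, 25, 3  # k regressors, n observations, p = k + 1 parameters

ssr = sst - sse                   # regression sum of squares, by subtraction
msr = ssr / k
mse = sse / (n - p)
f0 = msr / mse                    # Equation 12-12
print(round(ssr, 4), round(f0, 1))  # f0 is approximately 572.2
```

The computed f0 agrees with the "F-statistic: 572.2 on 2 and 22 DF" line of the R output shown earlier.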
12-3: Confidence Intervals in Multiple Linear Regression

12-3.1 Confidence Intervals on Individual Regression Coefficients

Definition: A 100(1 − α)% confidence interval on the regression coefficient βj, j = 0, 1, …, k in the multiple linear regression model is given by

β̂j − t_{α/2,n−p} √(σ̂² Cjj) ≤ βj ≤ β̂j + t_{α/2,n−p} √(σ̂² Cjj)   (12-22)

Example 12-7 Wire Bond Strength Confidence Interval

We will construct a 95% confidence interval on the parameter β1 in the wire bond pull strength problem. The point estimate of β1 is β̂1 = 2.74427, and the diagonal element of (X′X)⁻¹ corresponding to β1 is C11 = 0.001671. The estimate of σ² is σ̂² = 5.2352, and t0.025,22 = 2.074. Therefore, the 95% CI on β1 is computed from Equation 12-22 as

2.74427 − (2.074)√((5.2352)(0.001671)) ≤ β1 ≤ 2.74427 + (2.074)√((5.2352)(0.001671))

which reduces to

2.55029 ≤ β1 ≤ 2.93825

Also, computer software such as Minitab can be used to help calculate this confidence interval. From the regression output, β̂1 = 2.74427 and the standard error of β̂1 is 0.0935. This standard error is the multiplier of the t-table constant in the confidence interval; that is, 0.0935 = √((5.2352)(0.001671)). Consequently, all the numbers needed to construct the interval are available from the computer output, and this is the typical method used in practice.

12-3.2 Confidence Interval on the Mean Response

The mean response at a point x0 = [1, x01, x02, …, x0k]′ is estimated by

μ̂_{Y|x0} = x0′β̂   (12-23)

The variance of the estimated mean response is

V(μ̂_{Y|x0}) = σ² x0′(X′X)⁻¹x0   (12-24)
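The interval of Example 12-7 is simple arithmetic once the pieces are in hand. An illustrative sketch (not from the text) in Python, with the t quantile taken from the text rather than computed:

```python
import math

# 95% CI on beta_1 (Equation 12-22) using the numbers from Example 12-7.
beta1_hat = 2.74427
c11 = 0.001671         # diagonal element of (X'X)^(-1) for beta_1
sigma2_hat = 5.2352
t_crit = 2.074         # t_{0.025, 22}, as quoted in the text

se_beta1 = math.sqrt(sigma2_hat * c11)   # approximately 0.0935
lower = beta1_hat - t_crit * se_beta1
upper = beta1_hat + t_crit * se_beta1
print(round(se_beta1, 4), round(lower, 5), round(upper, 5))
```

This reproduces the interval 2.55029 ≤ β1 ≤ 2.93825 and the standard error 0.0935 shown in the computer output.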
12-3.2 Confidence Interval on the Mean Response (continued)

Definition: For the multiple linear regression model, a 100(1 − α)% confidence interval on the mean response at the point x01, x02, …, x0k is

μ̂_{Y|x0} − t_{α/2,n−p} √(σ̂² x0′(X′X)⁻¹x0) ≤ μ_{Y|x0} ≤ μ̂_{Y|x0} + t_{α/2,n−p} √(σ̂² x0′(X′X)⁻¹x0)   (12-25)

Example 12-8 Wire Bond Strength Confidence Interval on the Mean Response

The engineer in Example 12-1 would like to construct a 95% CI on the mean pull strength for a wire bond with wire length x1 = 8 and die height x2 = 275. Therefore,

x0 = [1, 8, 275]′

The estimated mean response at this point is found from Equation 12-23 as

μ̂_{Y|x0} = x0′β̂ = [1  8  275] [2.26379, 2.74427, 0.01253]′ = 27.66

The variance of μ̂_{Y|x0} is estimated by

σ̂² x0′(X′X)⁻¹x0 = 5.2352 [1  8  275] |  0.214653   −0.007491   −0.000340  | | 1   |
                                     | −0.007491    0.001671   −0.000019  | | 8   |
                                     | −0.000340   −0.000019    0.0000015 | | 275 |
                = 5.2352 (0.0444) = 0.23244

Therefore, a 95% CI on the mean pull strength at this point is found from Equation 12-25 as

27.66 − 2.074 √0.23244 ≤ μ_{Y|x0} ≤ 27.66 + 2.074 √0.23244

which reduces to

26.66 ≤ μ_{Y|x0} ≤ 28.66
12-4: Prediction of New Observations

A point estimate of the future observation Y0 at the point x01, x02, …, x0k is

ŷ0 = x0′β̂

A 100(1 − α)% prediction interval for this future observation is

ŷ0 − t_{α/2,n−p} √(σ̂²(1 + x0′(X′X)⁻¹x0)) ≤ Y0 ≤ ŷ0 + t_{α/2,n−p} √(σ̂²(1 + x0′(X′X)⁻¹x0))   (12-26)

Example 12-9 Wire Bond Strength Prediction Interval

Suppose that the engineer in Example 12-1 wishes to construct a 95% prediction interval on the wire bond pull strength when the wire length is x1 = 8 and the die height is x2 = 275. Note that x0 = [1  8  275]′, and the point estimate of the pull strength is ŷ0 = x0′β̂ = 27.66. Also, in Example 12-8 we calculated x0′(X′X)⁻¹x0 = 0.0444. Therefore, from Equation 12-26 we have

27.66 − 2.074 √(5.2352(1 + 0.0444)) ≤ Y0 ≤ 27.66 + 2.074 √(5.2352(1 + 0.0444))

and the 95% prediction interval is

22.81 ≤ Y0 ≤ 32.51

Notice that the prediction interval is wider than the confidence interval on the mean response at the same point, calculated in Example 12-8. The Minitab output in Table 12-4 also displays this prediction interval.
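Examples 12-8 and 12-9 differ only in the extra "1 +" under the square root. An illustrative sketch (not from the text) in Python computing both intervals side by side:

```python
import math

# 95% CI on the mean response and 95% prediction interval at x0 = (1, 8, 275),
# using the quantities quoted in Examples 12-8 and 12-9.
y0_hat = 27.66         # x0' beta-hat
h00 = 0.0444           # x0' (X'X)^(-1) x0
sigma2_hat = 5.2352
t_crit = 2.074         # t_{0.025, 22}

ci_half = t_crit * math.sqrt(sigma2_hat * h00)        # mean response (Eq. 12-25)
pi_half = t_crit * math.sqrt(sigma2_hat * (1 + h00))  # new observation (Eq. 12-26)
print(round(y0_hat - ci_half, 2), round(y0_hat + ci_half, 2))  # ~26.66, 28.66
print(round(y0_hat - pi_half, 2), round(y0_hat + pi_half, 2))  # ~22.81, 32.51
```

The prediction interval (22.81, 32.51) is visibly wider than the mean-response interval (26.66, 28.66), because it must also cover the variability of a single new observation.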
12-5: Model Adequacy Checking

12-5.1 Residual Analysis

The model assumptions on the error term include:
• The error term ε has zero mean, so the residuals should average to zero: average(ei) = 0.
• The error term ε has constant variance σ².

Plotting residuals is a very effective way to investigate how well the regression model fits the data and to check these assumptions.
12-5.1 Residual Analysis (continued)

Example 12-10 examines the residual plots for the wire bond data. Figure 12-9 shows the plot of residuals against x2.

The studentized residual is

ri = ei / √( σ̂² (1 − hii) ),  i = 1, 2, …, n   (12-28)

where hii is the ith diagonal element of the hat matrix H = X(X′X)⁻¹X′. Since each row of the matrix X corresponds to a vector, say xi′ = [1, xi1, xi2, …, xik], another way to write the diagonal element of the hat matrix is

hii = xi′(X′X)⁻¹xi   (12-29)
12-5.1 Residual Analysis (continued)

To illustrate, consider the two observations identified in the wire bond strength data (Example 12-10) as having residuals that might be unusually large, observations 15 and 17. The standardized residuals are

d15 = e15/√σ̂² = 5.84/√5.2352 = 2.55   and   d17 = e17/√σ̂² = 4.33/√5.2352 = 1.89

Now h15,15 = 0.0737 and h17,17 = 0.2593, so the studentized residuals are

r15 = e15/√(σ̂²(1 − h15,15)) = 5.84/√(5.2352(1 − 0.0737)) = 2.65

and

r17 = e17/√(σ̂²(1 − h17,17)) = 4.33/√(5.2352(1 − 0.2593)) = 2.20

Notice that the studentized residuals are larger than the corresponding standardized residuals. However, the studentized residuals are still not so large as to cause us serious concern about possible outliers.

12-5.2 Influential Observations

Cook's distance measure is

Di = (β̂(i) − β̂)′ X′X (β̂(i) − β̂) / (p σ̂²),  i = 1, 2, …, n

where β̂(i) is the estimate obtained by deleting the ith observation. It can be computed without refitting as

Di = (ri²/p) · hii/(1 − hii),  i = 1, 2, …, n   (12-30)

EXAMPLE 12-11 Wire Bond Strength Cook's Distances  Table 12-12 lists the values of the hat matrix diagonals hii and Cook's distance measure Di for the wire bond pull strength data in Example 12-1. To illustrate the calculations, consider the first observation:

D1 = (r1²/p) · h11/(1 − h11)
   = ( [e1/√(MSE(1 − h11))]² / p ) · h11/(1 − h11)
   = ( [1.57/√(5.2352(1 − 0.1573))]² / 3 ) · 0.1573/(1 − 0.1573)
   = 0.035

The Cook's distance measure Di does not identify any potentially influential observations in the data, for no value of Di exceeds unity.
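The residual diagnostics above are small helper formulas. An illustrative sketch (not from the text) in Python, reproducing d15, r15, r17, and D1 with the quantities quoted in the text:

```python
import math

# Diagnostics for the wire bond data: sigma^2-hat and p = number of parameters.
sigma2_hat = 5.2352
p = 3

def standardized(e):
    """Standardized residual d_i = e_i / sqrt(sigma^2-hat)."""
    return e / math.sqrt(sigma2_hat)

def studentized(e, h):
    """Studentized residual r_i (Equation 12-28)."""
    return e / math.sqrt(sigma2_hat * (1 - h))

def cooks_d(e, h):
    """Cook's distance D_i (Equation 12-30)."""
    r = studentized(e, h)
    return (r * r / p) * h / (1 - h)

print(round(standardized(5.84), 2))         # d15, approximately 2.55
print(round(studentized(5.84, 0.0737), 2))  # r15, approximately 2.65
print(round(studentized(4.33, 0.2593), 2))  # r17, approximately 2.20
print(round(cooks_d(1.57, 0.1573), 3))      # D1,  approximately 0.035
```

All four values agree with the hand calculations in Examples 12-10 and 12-11.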
Example 12-11 (continued)

Table 12-12 Influence Diagnostics for the Wire Bond Pull Strength Data

Obs i   hii      Di        Obs i   hii      Di
1       0.1573   0.035     14      0.1129   0.003
2       0.1116   0.012     15      0.0737   0.187
3       0.1419   0.060     16      0.0879   0.001
4       0.1019   0.021     17      0.2593   0.565
5       0.0418   0.024     18      0.2929   0.155
6       0.0749   0.007     19      0.0962   0.018
7       0.1181   0.036     20      0.1473   0.000
8       0.1561   0.020     21      0.1296   0.052
9       0.1280   0.160     22      0.1358   0.028
10      0.0413   0.001     23      0.1824   0.002
11      0.0925   0.013     24      0.1091   0.040
12      0.0526   0.001     25      0.0729   0.000
13      0.0820   0.001

Exercise. A study investigates the effect of factors such as temperature, time, and chemical composition on CO2 yield. The data from this study can be summarized in a table. The main objective of the study is to find a linear regression model to predict CO2 yield, and to assess the influence of these factors.

y = CO2 yield;
x1 = time (minutes);
x2 = temperature (°C);
x3 = percent dissolved;
x4 = amount of oil (g/100 g);
x5 = amount of coal;
x6 = total amount dissolved;
x7 = hydrogen consumed.

12-6: Aspects of Multiple Regression Modeling

12-6.1 Polynomial Regression Models

The linear model y = Xβ + ε is a general model that can be used to fit any relationship that is linear in the unknown parameters β. This includes the important class of polynomial regression models. For example, the second-degree polynomial in one variable

Y = β0 + β1x + β11x² + ε   (12-31)

and the second-degree polynomial in two variables

Y = β0 + β1x1 + β2x2 + β11x1² + β22x2² + β12x1x2 + ε   (12-32)

are linear regression models.
12-6: Aspects of Multiple Regression Modeling (continued)

EXAMPLE 12-12 Airplane Sidewall Panels  Sidewall panels for the interior of an airplane are formed in a 1500-ton press. The unit manufacturing cost varies with the production lot size. The data shown below give the average cost per unit (in hundreds of dollars) for this product (y) and the production lot size (x). The scatter diagram, shown in Fig. 12-11, indicates that a second-order polynomial may be appropriate.

y   1.81  1.70  1.65  1.55  1.48  1.40
x   20    25    30    35    40    50

y   1.30  1.26  1.24  1.21  1.20  1.18
x   60    65    70    75    80    90

Figure 12-11 Data for Example 12-12.

We will fit the model

Y = β0 + β1x + β11x² + ε

The y vector, the model matrix X, and the β vector are as follows:

y = | 1.81 |     X = | 1  20   400 |     β = | β0  |
    | 1.70 |         | 1  25   625 |         | β1  |
    | 1.65 |         | 1  30   900 |         | β11 |
    | 1.55 |         | 1  35  1225 |
    | 1.48 |         | 1  40  1600 |
    | 1.40 |         | 1  50  2500 |
    | 1.30 |         | 1  60  3600 |
    | 1.26 |         | 1  65  4225 |
    | 1.24 |         | 1  70  4900 |
    | 1.21 |         | 1  75  5625 |
    | 1.20 |         | 1  80  6400 |
    | 1.18 |         | 1  90  8100 |
Copyright © 2014 John Wiley & Sons, Inc. All rights reserved. Copyright © 2014 John Wiley & Sons, Inc. All rights reserved. Copyright © 2014 John Wiley & Sons, Inc. All rights reserved.
8
6/28/2020
Example 12-12 (continued)

Solving the normal equations X′X β̂ = X′y gives the fitted model

ŷ = 2.19826629 − 0.02252236 x + 0.00012507 x²

Conclusions: The test for significance of regression is shown in Table 12-13. Since f0 = 1762.3 is significant at 1%, we conclude that at least one of the parameters β1 and β11 is not zero. Furthermore, the standard tests for model adequacy do not reveal any unusual behavior, and we would conclude that this is a reasonable model for the sidewall panel cost data.

Table 12-13 Test for Significance of Regression for the Second-Order Model in Example 12-12

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   f0        P-value
Regression            0.52516          2                    0.26258       1762.28   2.12E-12
Error                 0.00134          9                    0.00015
Total                 0.52650          11

R output:

> summary(lm(y ~ poly(x, 2, raw = TRUE), data = Polyreg))

Call:
lm(formula = y ~ poly(x, 2, raw = TRUE), data = Polyreg)

Residuals:
       Min         1Q     Median         3Q        Max
-0.0174763 -0.0065087  0.0001297  0.0071482  0.0151887

Coefficients:
                            Estimate    Std. Error  t value  Pr(>|t|)
(Intercept)               2.198266288  0.022549823    97.48  6.38e-15 ***
poly(x, 2, raw = TRUE)1  -0.022522358  0.000942435   -23.90  1.88e-09 ***
poly(x, 2, raw = TRUE)2   0.000125065  0.000008658    14.45  1.56e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.01219 on 9 degrees of freedom
Multiple R-squared: 0.9975, Adjusted R-squared: 0.9969
F-statistic: 1767 on 2 and 9 DF, p-value: 2.096e-12

(The F statistic printed by R, 1767, differs slightly from the table value 1762.28, most likely because of rounding in the tabulated sums of squares.)

12-6.2 Categorical Regressors and Indicator Variables

• Many problems may involve qualitative or categorical variables.
• The usual method for handling the different levels of a qualitative variable is to use indicator variables.
• For example, to introduce the effect of two different operators into a regression model, we could define an indicator variable as follows:

x = 0 if the observation is from operator 1
x = 1 if the observation is from operator 2
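Because Equation 12-31 is linear in the parameters, the quadratic fit of Example 12-12 is an ordinary least squares problem. An illustrative sketch (not from the text) in Python/NumPy, refitting the sidewall panel data:

```python
import numpy as np

# Sidewall panel data from Example 12-12: lot size x, average unit cost y.
x = np.array([20, 25, 30, 35, 40, 50, 60, 65, 70, 75, 80, 90], dtype=float)
y = np.array([1.81, 1.70, 1.65, 1.55, 1.48, 1.40,
              1.30, 1.26, 1.24, 1.21, 1.20, 1.18])

# np.polyfit returns coefficients highest degree first: [b11, b1, b0].
b11, b1, b0 = np.polyfit(x, y, 2)
print(round(b0, 5), round(b1, 5), round(b11, 6))
# Expected to agree with y-hat = 2.19827 - 0.02252 x + 0.000125 x^2
```

The recovered coefficients match the fitted model ŷ = 2.19826629 − 0.02252236x + 0.00012507x² and the R `lm` output above.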
12-6: Aspects of Multiple Regression Modeling (continued)

EXAMPLE 12-13 Surface Finish  A mechanical engineer is investigating the surface finish of metal parts produced on a lathe and its relationship to the speed (in revolutions per minute) of the lathe. The data are shown in Table 12-11. Note that the data have been collected using two different types of cutting tools. Since the type of cutting tool likely affects the surface finish, we will fit the model

Y = β0 + β1x1 + β2x2 + ε

where Y is the surface finish, x1 is the lathe speed in revolutions per minute, and x2 is an indicator variable denoting the type of cutting tool used; that is,

x2 = 0 for tool type 302
x2 = 1 for tool type 416

Table 12-11 Surface Finish Data for Example 12-13

Obs i   Surface Finish yi   RPM   Tool Type     Obs i   Surface Finish yi   RPM   Tool Type
1       45.44               225   302           11      33.50               224   416
2       42.03               200   302           12      31.23               212   416
3       50.10               250   302           13      37.52               248   416
4       48.75               245   302           14      37.13               260   416
5       47.92               235   302           15      34.70               243   416
6       47.79               237   302           16      33.92               238   416
7       52.26               265   302           17      32.13               224   416
8       50.52               259   302           18      35.47               251   416
9       45.58               221   302           19      33.49               232   416
10      44.78               218   302           20      32.29               216   416

The parameters in this model may be easily interpreted. If x2 = 0, the model becomes

Y = β0 + β1x1 + ε

which is a straight-line model with slope β1 and intercept β0. However, if x2 = 1, the model becomes

Y = β0 + β1x1 + β2(1) + ε = (β0 + β2) + β1x1 + ε

which is a straight-line model with slope β1 and intercept β0 + β2. Thus, the model Y = β0 + β1x1 + β2x2 + ε implies that surface finish is linearly related to lathe speed and that the slope β1 does not depend on the type of cutting tool used. However, the type of cutting tool does affect the intercept, and β2 indicates the change in the intercept associated with a change in tool type from 302 to 416.
12-6: Aspects of Multiple Regression Modeling

Example 12-13 (continued)

The model matrix X and the vector y for this problem are as follows:

        | 1  225  0 |        | 45.44 |
        | 1  200  0 |        | 42.03 |
        | 1  250  0 |        | 50.10 |
        | 1  245  0 |        | 48.75 |
        | 1  235  0 |        | 47.92 |
        | 1  237  0 |        | 47.79 |
        | 1  265  0 |        | 52.26 |
        | 1  259  0 |        | 50.52 |
        | 1  221  0 |        | 45.58 |
    X = | 1  218  0 |    y = | 44.78 |
        | 1  224  1 |        | 33.50 |
        | 1  212  1 |        | 31.23 |
        | 1  248  1 |        | 37.52 |
        | 1  260  1 |        | 37.13 |
        | 1  243  1 |        | 34.70 |
        | 1  238  1 |        | 33.92 |
        | 1  224  1 |        | 32.13 |
        | 1  251  1 |        | 35.47 |
        | 1  232  1 |        | 33.49 |
        | 1  216  1 |        | 32.29 |

The fitted model is

    ŷ = 14.27620 + 0.14115 x1 − 13.28020 x2

Conclusions: The analysis of variance for this model is shown in Table 12-12. Note that the hypothesis H0: b1 = b2 = 0 (significance of regression) would be rejected at any reasonable level of significance because the P-value is very small. The table also contains the sums of squares

    SSR = SSR(b1, b2 | b0) = SSR(b1 | b0) + SSR(b2 | b1, b0)

so a test of the hypothesis H0: b2 = 0 can be made. Since this hypothesis is also rejected, we conclude that tool type has an effect on surface finish.

TABLE 12-12 Analysis of Variance for Example 12-13

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square      f0       P-value
Regression               1012.0595             2              506.0297    1103.69    1.02E-18
  SSR(b1 | b0)            130.6091             1              130.6091     284.87    4.70E-12
  SSR(b2 | b1, b0)        881.4504             1              881.4504    1922.52    6.24E-19
Error                       7.7943            17                0.4585
Total                    1019.8538            19
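The fit and the extra-sum-of-squares decomposition above can be checked numerically. The following is a minimal NumPy sketch (it is not part of the text's Minitab analysis) using the surface-finish data from the model matrix:

```python
import numpy as np

# Data from Example 12-13: y = surface finish, x1 = lathe RPM,
# x2 = tool-type indicator (0 for type 302, 1 for type 416).
x1 = np.array([225, 200, 250, 245, 235, 237, 265, 259, 221, 218,
               224, 212, 248, 260, 243, 238, 224, 251, 232, 216], float)
x2 = np.array([0] * 10 + [1] * 10, float)
y = np.array([45.44, 42.03, 50.10, 48.75, 47.92, 47.79, 52.26, 50.52,
              45.58, 44.78, 33.50, 31.23, 37.52, 37.13, 34.70, 33.92,
              32.13, 35.47, 33.49, 32.29])

X = np.column_stack([np.ones_like(y), x1, x2])   # model matrix
b, *_ = np.linalg.lstsq(X, y, rcond=None)        # least-squares estimates

def sse(Xm):
    """Residual sum of squares for the model with matrix Xm."""
    r = y - Xm @ np.linalg.lstsq(Xm, y, rcond=None)[0]
    return r @ r

syy = np.sum((y - y.mean()) ** 2)       # total SS
sse_full = sse(X)                       # error SS of the full model
ssr_full = syy - sse_full               # SSR(b1, b2 | b0)
ssr_1 = syy - sse(X[:, :2])             # SSR(b1 | b0): x1 entered first
ssr_2_given_1 = ssr_full - ssr_1        # SSR(b2 | b1, b0): extra SS for tool type

print(b)                                # approx [14.276, 0.14115, -13.280]
print(ssr_1, ssr_2_given_1)
```

The sequential sums of squares printed here match the SSR(b1|b0) and SSR(b2|b1,b0) rows of Table 12-12.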
Logistic regression

Call:
glm(formula = prob ~ conc, family = "binomial", weights = total)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   22.708      2.266  10.021   <2e-16 ***
conc         -10.662      1.083  -9.849   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 198.7115 on 8 degrees of freedom
Residual deviance: 8.5568 on 7 degrees of freedom
AIC: 37.096

12-6: Aspects of Multiple Regression Modeling
12-6.3 Selection of Variables and Model Building

All Possible Regressions

    Cp = SSE(p) / σ̂²  −  n + 2p                                (12-33)

    PRESS = Σ (yi − ŷ(i))²  =  Σ [ ei / (1 − hii) ]²,   i = 1, 2, ..., n

12-6: Aspects of Multiple Regression Modeling
12-6.3 Selection of Variables and Model Building

EXAMPLE 12-14 Wine Quality. Table 12-13 presents data on taste-testing 38 brands of pinot noir wine (the data were first reported by Kwan, Kowalski, and Skogerboe in the Journal of Agricultural and Food Chemistry, Vol. 27, 1979, and they also appear as one of the default data sets in Minitab). The response variable is y = quality, and we wish to find the "best" regression equation that relates quality to the other five variables.

Figure 12-12 is the matrix of scatter plots for the wine quality data, as constructed by Minitab. We notice that there are some indications of possible linear relationships between quality and the regressors, but there is no obvious visual impression of which regressors would be appropriate. Table 12-14 lists the all-possible-regressions output from Minitab. In this analysis, we asked Minitab to present the best three equations for each subset size. Note that Minitab reports the values of R², R²adj, Cp, and S = √MSE for each model.
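The Cp and PRESS criteria in Equation 12-33 and the PRESS formula above are easy to compute directly from a model matrix. The sketch below uses simulated data (the wine data of Table 12-13 are not reproduced here), so the numbers are illustrative only; it also exploits the fact that Cp for the full model always equals p:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data (a stand-in, NOT the wine data): n = 38 cases,
# two useful regressors and one pure-noise regressor.
n = 38
x = rng.normal(size=(n, 3))
y = 5 + 2.0 * x[:, 0] - 1.5 * x[:, 1] + rng.normal(scale=0.5, size=n)

def hat_matrix(X):
    """H = X (X'X)^-1 X'."""
    return X @ np.linalg.inv(X.T @ X) @ X.T

def press_and_cp(cols, sigma2_full):
    """PRESS = sum((e_i/(1-h_ii))^2) and Cp = SSE(p)/sigma2 - n + 2p."""
    X = np.column_stack([np.ones(n)] + [x[:, j] for j in cols])
    H = hat_matrix(X)
    e = y - H @ y                                  # residuals of this model
    press = np.sum((e / (1 - np.diag(H))) ** 2)
    p = X.shape[1]
    cp = (e @ e) / sigma2_full - n + 2 * p
    return press, cp

# sigma^2 is estimated from the full model, as in the Cp definition
Xf = np.column_stack([np.ones(n), x])
ef = y - hat_matrix(Xf) @ y
sigma2 = (ef @ ef) / (n - Xf.shape[1])

for cols in [(0,), (0, 1), (0, 1, 2)]:
    print(cols, press_and_cp(cols, sigma2))
```

Dropping a regressor that carries real signal inflates both PRESS and Cp, while the full model lands at Cp = p by construction.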
12-6: Aspects of Multiple Regression Modeling
12-6.3 Selection of Variables and Model Building
All Possible Regressions – Example 12-14

From Table 12-14 we see that the three-variable equation with x2 = aroma, x4 = flavor, and x5 = oakiness produces the minimum-Cp equation, whereas the four-variable model, which adds x1 = clarity to the previous three regressors, results in the maximum R²adj (or minimum MSE).

The three-variable model is

    ŷ = 6.47 + 0.580 x2 + 1.20 x4 − 0.602 x5

and the four-variable model is

    ŷ = 4.99 + 1.79 x1 + 0.530 x2 + 1.26 x4 − 0.659 x5

TABLE 12-14 All Possible Regressions Computer Output for the Wine Quality Data

Best Subsets Regression: Quality versus Clarity, Aroma, Body, Flavor, Oakiness
Response is quality

Vars   R-Sq   R-Sq(adj)   C-p      S      Variables in model
  1    62.4     61.4       9.0   1.2712   Flavor
  1    50.0     48.6      23.2   1.4658
  1    30.1     28.2      46.0   1.7335
  2    66.1     64.2       6.8   1.2242   Flavor, Oakiness
  2    65.9     63.9       7.1   1.2288
  2    63.3     61.2      10.0   1.2733
  3    70.4     67.8       3.9   1.1613   Aroma, Flavor, Oakiness
  3    68.0     65.2       6.6   1.2068
  3    66.5     63.5       8.4   1.2357
  4    71.5     68.0       4.7   1.1568   Clarity, Aroma, Flavor, Oakiness
  4    70.5     66.9       5.8   1.1769
  4    69.3     65.6       7.1   1.1996
  5    72.1     67.7       6.0   1.1625   Clarity, Aroma, Body, Flavor, Oakiness

12-6.3 Selection of Variables and Model Building – Stepwise Regression

Example 12-15

TABLE 12-15 Stepwise Regression Output for the Wine Quality Data

Stepwise Regression: Quality versus Clarity, Aroma, Body, Flavor, Oakiness
Alpha-to-Enter: 0.15   Alpha-to-Remove: 0.15
Response is Quality on 5 predictors, with N = 38

Step            1       2       3
Constant    4.941   6.912   6.467

Flavor       1.57    1.64    1.20
T-Value      7.73    8.25    4.36
P-Value     0.000   0.000   0.000

Oakiness            -0.54   -0.60
T-Value             -1.95   -2.28
P-Value             0.059   0.029

Aroma                        0.58
T-Value                      2.21
P-Value                     0.034

S            1.27    1.22    1.16
R-Sq        62.42   66.11   70.38
R-Sq(adj)   61.37   64.17   67.76
C-p           9.0     6.8     3.9
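An all-possible-regressions search like the one behind Table 12-14 can be sketched in a few lines. The data below are a simulated stand-in (the wine data themselves are in Table 12-13), so the chosen subset is illustrative, not the text's result:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(7)

# Illustrative stand-in for the wine data: n = 38 cases, 5 candidate regressors;
# only columns 1, 3, and 4 carry signal, loosely echoing aroma/flavor/oakiness.
n, k = 38, 5
X = rng.normal(size=(n, k))
y = 6.5 + 0.6 * X[:, 1] + 1.2 * X[:, 3] - 0.6 * X[:, 4] \
    + rng.normal(scale=1.1, size=n)

def fit_stats(cols):
    """R^2, adjusted R^2, and S for the subset model using the given columns."""
    Xm = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    b, *_ = np.linalg.lstsq(Xm, y, rcond=None)
    e = y - Xm @ b
    sse, syy = e @ e, np.sum((y - y.mean()) ** 2)
    p = Xm.shape[1]
    r2 = 1 - sse / syy
    r2_adj = 1 - (sse / (n - p)) / (syy / (n - 1))
    return r2, r2_adj, np.sqrt(sse / (n - p))

# Enumerate all 2^5 - 1 nonempty subsets, as an all-subsets run would
results = {cols: fit_stats(cols)
           for size in range(1, k + 1)
           for cols in combinations(range(k), size)}
best = max(results, key=lambda c: results[c][1])   # maximum adjusted R^2
print(best, results[best])
```

R² can only increase as variables are added, which is why the selection is done on adjusted R² (or Cp, or S) rather than on R² itself.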
12-6.3 Selection of Variables and Model Building – Backward Elimination

Example 12-15 (continued)

TABLE 12-16 Backward Elimination Output for the Wine Quality Data

Stepwise Regression: Quality versus Clarity, Aroma, Body, Flavor, Oakiness
Backward elimination. Alpha-to-Remove: 0.1
Response is Quality on 5 predictors, with N = 38

Step            1       2       3
Constant    3.997   4.986   6.467

Clarity       2.3     1.8
T-Value      1.35    1.12
P-Value     0.187   0.269

Aroma        0.48    0.53    0.58
T-Value      1.77    2.00    2.21
P-Value     0.086   0.054   0.034

Body         0.27
T-Value      0.82
P-Value     0.418

Flavor       1.17    1.26    1.20
T-Value      3.84    4.52    4.36
P-Value     0.001   0.000   0.000

Oakiness    -0.68   -0.66   -0.60
T-Value     -2.52   -2.46   -2.28
P-Value     0.017   0.019   0.029

S            1.16    1.16    1.16
R-Sq        72.06   71.47   70.38
R-Sq(adj)   67.69   68.01   67.76
C-p           6.0     4.7     3.9

12-6: Aspects of Multiple Regression Modeling
12-6.4 Multicollinearity

The multiple regression model in matrix form is

    y = Xβ + ε

where y is an n × 1 vector of responses, X is an n × p matrix of the regressor variables, β is a p × 1 vector of unknown constants, and ε is an n × 1 vector of random errors, with εi ~ NID(0, σ²).

The inverse matrix (X′X)⁻¹ exists only if the regressors are linearly independent, that is, if no column of the X matrix is a linear combination of the other columns.

When there are near-linear dependencies among the regressors, the problem of multicollinearity is said to exist.

The vectors X1, X2, ..., Xp are linearly dependent if there is a set of constants t1, t2, ..., tp, not all zero, such that

    t1 X1 + t2 X2 + ... + tp Xp = 0

12-6: Aspects of Multiple Regression Modeling
12-6.4 Multicollinearity

Four primary sources of multicollinearity:
1. The data collection method employed
2. Constraints on the model or in the population
3. Model specification
4. An overdefined model
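The consequence of an exact linear dependence among the columns of X is easy to demonstrate: X′X becomes singular, so the least-squares normal equations no longer have a unique solution. A small sketch with simulated regressors (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)

# Construct an exact dependence: t1*X1 + t2*X2 + t3*X3 = 0
# with t = (2, -3, -1), so X3 is a linear combination of X1 and X2.
n = 20
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 2.0 * x1 - 3.0 * x2

X = np.column_stack([np.ones(n), x1, x2, x3])
XtX = X.T @ X

print(np.linalg.matrix_rank(XtX))   # 3, not 4: X'X is rank deficient
print(np.linalg.cond(XtX))          # enormous condition number
```

With a near-linear dependence (adding a little noise to x3) the rank is restored but the condition number stays huge, which is exactly the multicollinearity situation described above.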
12-6: Aspects of Multiple Regression Modeling
Effect of Multicollinearity

➢ the least-squares normal equations are

    (X′X) β̂ = X′ y

➢ the inverse of (X′X) is C = (X′X)⁻¹, whose diagonal elements Cjj determine the variances Var(β̂j) = σ² Cjj; these become very large when the regressors are nearly collinear

12-6: Aspects of Multiple Regression Modeling
Multicollinearity Diagnostics

• Examination of the correlation matrix of the regressors
• Variance Inflation Factor (VIF):

    VIF(βj) = 1 / (1 − Rj²),   j = 1, 2, ..., k        (12-34)

  where Rj² is the coefficient of multiple determination obtained when xj is regressed on the other regressor variables.

12-6: Aspects of Multiple Regression Modeling
12-6.4 Multicollinearity

The presence of multicollinearity can be detected in several ways. Two of the more easily understood of these are:
1. The variance inflation factors, defined in Equation 12-34, are very useful measures of multicollinearity. The larger the variance inflation factor, the more severe the multicollinearity. Some authors have suggested that if any variance inflation factor exceeds 10, multicollinearity is a problem. Other authors consider this value too liberal and suggest that the variance inflation factors should not exceed 4 or 5. Minitab will calculate the variance inflation factors. Table 12-4 presents the Minitab multiple regression output for the wire bond pull strength data. Since both VIF1 and VIF2 are small, there is no problem with multicollinearity.
2. If the F-test for significance of regression is significant but the tests on the individual regression coefficients are not significant, multicollinearity may be present.

Diagnostics for Leverage and Influence

• focus attention on the diagonal elements hii of the hat matrix H = X(X′X)⁻¹X′
• large hat diagonals reveal observations that are potentially influential
• assume that any observation for which the hat diagonal exceeds twice the average, 2p/n, is remote enough from the rest of the data to be considered a leverage point

[Figure: an example of a leverage point]   [Figure: an example of an influential observation]
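The VIF definition in Equation 12-34 and the 2p/n leverage rule can both be sketched directly. The data here are simulated (illustrative only), with x2 deliberately constructed to be nearly collinear with x1:

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated regressors: x2 is nearly a copy of x1, x3 is independent.
n = 40
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """VIF_j = 1/(1 - Rj^2), Rj^2 from regressing column j on the others."""
    others = np.column_stack(
        [np.ones(X.shape[0])] + [X[:, m] for m in range(X.shape[1]) if m != j])
    b, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
    e = X[:, j] - others @ b
    r2 = 1 - (e @ e) / np.sum((X[:, j] - X[:, j].mean()) ** 2)
    return 1.0 / (1.0 - r2)

print([round(vif(X, j), 1) for j in range(3)])   # VIF1, VIF2 large; VIF3 near 1

# Leverage: flag cases whose hat diagonal exceeds twice the average p/n
Xm = np.column_stack([np.ones(n), X])
H = Xm @ np.linalg.inv(Xm.T @ Xm) @ Xm.T
p = Xm.shape[1]
leverage_points = np.where(np.diag(H) > 2 * p / n)[0]
print(leverage_points)
```

A useful sanity check on the hat matrix is that its diagonal elements sum to p, so the average hat diagonal is p/n and the cutoff 2p/n is simply twice that average.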