
Basic Business Statistics

Multiple Regression Model Building
Linear vs. Nonlinear Fit

[Figure: two panels of Y vs. X, one with a linear fit and one with a nonlinear fit, each with its residual plot vs. X]

A linear fit does not give random residuals; a nonlinear fit gives random residuals.
Nonlinear Relationships

The relationship between the dependent variable and an independent variable may not be linear. Review the scatter plot to check for non-linear relationships.

Example: Quadratic model

Yi = β0 + β1X1i + β2X1i² + εi

The second independent variable is the square of the first variable.
Quadratic Regression Model

Model form:

Yi = β0 + β1X1i + β2X1i² + εi

where:
β0 = Y intercept
β1 = regression coefficient for linear effect of X on Y
β2 = regression coefficient for quadratic effect of X on Y
εi = random error in Y for observation i
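To make the model form concrete, here is a minimal sketch of fitting a quadratic model by ordinary least squares, assuming Python with numpy and statsmodels; the arrays x and y are hypothetical and do not appear in the original slides:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical illustration data (not from the slides)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 4.9, 9.2, 16.1, 24.8, 36.3])

# Design matrix: intercept, linear term X1, and its square X1^2
X = sm.add_constant(np.column_stack([x, x ** 2]))

# Fit Yi = b0 + b1*X1i + b2*X1i^2
model = sm.OLS(y, X).fit()
print(model.params)  # b0, b1, b2
```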
Quadratic Regression Model

Yi = β0 + β1X1i + β2X1i² + εi

Quadratic models may be considered when the scatter plot takes on one of the following shapes:

[Figure: four scatter-plot shapes of Y vs. X1, corresponding to (β1 > 0, β2 > 0), (β1 > 0, β2 < 0), (β1 < 0, β2 > 0), and (β1 < 0, β2 < 0)]

β1 = the coefficient of the linear term
β2 = the coefficient of the squared term
Testing for Significance: Quadratic Effect

Testing the Quadratic Effect

Compare the quadratic regression equation

Ŷi = b0 + b1X1i + b2X1i²

with the linear regression equation

Ŷi = b0 + b1X1i
Testing for Significance: Quadratic Effect
(continued)

Testing the Quadratic Effect

Consider the quadratic regression equation

Ŷi = b0 + b1X1i + b2X1i²

Hypotheses:
H0: β2 = 0 (The quadratic term does not improve the model)
H1: β2 ≠ 0 (The quadratic term improves the model)
Testing for Significance: Quadratic Effect
(continued)

Testing the Quadratic Effect

Hypotheses:
H0: β2 = 0 (The quadratic term does not improve the model)
H1: β2 ≠ 0 (The quadratic term improves the model)

The test statistic is

t_STAT = (b2 − β2) / S_b2,  with d.f. = n − 3

where:
b2 = squared-term slope coefficient
β2 = hypothesized slope (zero)
S_b2 = standard error of the slope
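Continuing the sketch above, the same t statistic can be read off the fitted statsmodels results (model, X, and y carry over from the earlier snippet):

```python
# t statistic for H0: beta2 = 0; index 2 is the squared-term coefficient
b2 = model.params[2]      # squared-term slope coefficient
s_b2 = model.bse[2]       # standard error of that coefficient
t_stat = (b2 - 0) / s_b2  # hypothesized slope is zero
df = len(y) - 3           # n - 3 degrees of freedom

print(t_stat, model.tvalues[2])  # the two values agree
print(model.pvalues[2])          # two-sided p-value for the quadratic term
```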
Testing for Significance: Quadratic Effect
(continued)

Testing the Quadratic Effect

Compare the adjusted r² from the simple regression model to the adjusted r² from the quadratic model. If the adjusted r² from the quadratic model is larger than the adjusted r² from the simple model, then the quadratic model is likely a better model.
Example: Quadratic Model

Filter purity increases as filter time increases:

Purity   Time
  3        1
  7        2
  8        3
 15        5
 22        7
 33        8
 40       10
 54       12
 67       13
 70       14
 78       15
 85       15
 87       16
 99       17

[Figure: scatter plot "Purity vs. Time", Purity (0–100) against Time (0–20)]
Example: Quadratic Model

(continued)
Simple regression results:

Ŷ = −11.283 + 5.985 Time

            Coefficients   Standard Error   t Stat     P-value
Intercept   -11.28267      3.46805          -3.25332   0.00691
Time          5.98520      0.30966          19.32819   2.078E-10

Regression Statistics
R Square            0.96888
Adjusted R Square   0.96628
Standard Error      6.15997

The t statistics and r² are high, but the residuals are not random:

[Figure: Time residual plot; residuals (−10 to 10) vs. Time (0 to 20) show a curved, non-random pattern]
Example: Quadratic Model in Excel & Minitab

(continued)

Quadratic regression results:

Ŷ = 1.539 + 1.565 Time + 0.245 (Time)²

Excel:
               Coefficients   Standard Error   t Stat    P-value
Intercept      1.53870        2.24465          0.68550   0.50722
Time           1.56496        0.60179          2.60052   0.02467
Time-squared   0.24516        0.03258          7.52406   1.165E-05

Minitab:
The regression equation is
Purity = 1.54 + 1.56 Time + 0.245 Time Squared

Predictor      Coef      SE Coef   T      P
Constant       1.5390    2.24500   0.69   0.507
Time           1.5650    0.60180   2.60   0.025
Time Squared   0.24516   0.03258   7.52   0.000

S = 2.59513   R-Sq = 99.5%   R-Sq(adj) = 99.4%

The quadratic term is statistically significant (p-value very small).


Example: Quadratic Model in Excel & Minitab

(continued)

Quadratic regression results:

Ŷ = 1.539 + 1.565 Time + 0.245 (Time)²

Excel:
Regression Statistics
R Square            0.99494
Adjusted R Square   0.99402
Standard Error      2.59513

Minitab:
The regression equation is
Purity = 1.54 + 1.56 Time + 0.245 Time Squared

Predictor      Coef      SE Coef   T      P
Constant       1.5390    2.24500   0.69   0.507
Time           1.5650    0.60180   2.60   0.025
Time Squared   0.24516   0.03258   7.52   0.000

S = 2.59513   R-Sq = 99.5%   R-Sq(adj) = 99.4%

The adjusted r² of the quadratic model is higher than the adjusted r² of the simple regression model. The quadratic model explains 99.4% of the variation in Y.
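These results can be reproduced from the data table earlier in the example; below is a minimal sketch, assuming Python with numpy and statsmodels, that fits both models and compares their adjusted r² values:

```python
import numpy as np
import statsmodels.api as sm

# Purity/Time data from the example table
time = np.array([1, 2, 3, 5, 7, 8, 10, 12, 13, 14, 15, 15, 16, 17], dtype=float)
purity = np.array([3, 7, 8, 15, 22, 33, 40, 54, 67, 70, 78, 85, 87, 99], dtype=float)

# Simple linear model: Purity = b0 + b1*Time
linear = sm.OLS(purity, sm.add_constant(time)).fit()

# Quadratic model: Purity = b0 + b1*Time + b2*Time^2
quad = sm.OLS(purity, sm.add_constant(np.column_stack([time, time ** 2]))).fit()

print(linear.rsquared_adj)  # about 0.966, as in the simple regression output
print(quad.rsquared_adj)    # about 0.994, as in the quadratic output
```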
Example: Quadratic Model Residual Plots

(continued)

Quadratic regression results:

Ŷ = 1.539 + 1.565 Time + 0.245 (Time)²

[Figure: Time residual plot and Time-squared residual plot; residuals (−5 to 10) scatter randomly around zero against Time (0–20) and Time-squared (0–400)]

The residuals plotted versus both Time and Time-squared show a random pattern.
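Plots like these can be drawn from the fitted model in the previous sketch (quad and time carry over); matplotlib is an assumption here, since the slides use Excel and Minitab charts:

```python
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

# Residuals vs. Time: should show no systematic pattern
ax1.scatter(time, quad.resid)
ax1.axhline(0, linewidth=0.5)
ax1.set_xlabel("Time")
ax1.set_ylabel("Residuals")

# Residuals vs. Time-squared: should also look random
ax2.scatter(time ** 2, quad.resid)
ax2.axhline(0, linewidth=0.5)
ax2.set_xlabel("Time-squared")

plt.tight_layout()
plt.show()
```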
Collinearity

Including two highly correlated independent variables can adversely affect the regression results: the second variable provides no new information.
Some Indications of Strong Collinearity

- Incorrect signs on the coefficients
- Large change in the value of a previous coefficient when a new variable is added to the model
- A previously significant variable becomes non-significant when a new independent variable is added
Detecting Collinearity (Variance Inflationary Factor)

VIFj is used to measure collinearity:

VIFj = 1 / (1 − R²j)

where R²j is the coefficient of determination of variable Xj regressed on all other X variables.

If VIFj > 5, Xj is highly correlated with the other independent variables.
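The formula translates directly to code. This is a minimal sketch assuming Python with numpy and statsmodels, where X is a matrix whose columns are the independent variables; statsmodels also ships a ready-made variance_inflation_factor in statsmodels.stats.outliers_influence.

```python
import numpy as np
import statsmodels.api as sm

def vif(X, j):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 is the coefficient of
    determination from regressing column j of X on all other columns."""
    others = np.delete(X, j, axis=1)
    r_sq = sm.OLS(X[:, j], sm.add_constant(others)).fit().rsquared
    return 1.0 / (1.0 - r_sq)
```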
Example: Pie Sales

Week   Pie Sales   Price ($)   Advertising ($100s)
 1      350         5.50        3.3
 2      460         7.50        3.3
 3      350         8.00        3.0
 4      430         8.00        4.5
 5      350         6.80        3.0
 6      380         7.50        4.0
 7      430         4.50        3.0
 8      470         6.40        3.7
 9      450         7.00        3.5
10      490         5.00        4.0
11      340         7.20        3.5
12      300         7.90        3.2
13      440         5.90        4.0
14      450         5.00        3.5
15      300         7.00        2.7

Recall the multiple regression equation from the earlier chapter on multiple regression:

Sales = b0 + b1 (Price) + b2 (Advertising)
Detecting Collinearity in Excel using PHStat

PHStat / regression / multiple regression …
Check the “variance inflationary factor (VIF)” box

Output for the pie sales example (Price regressed on all other X variables):

Regression Analysis: Price and all other X
Multiple R          0.030438
R Square            0.000926
Adjusted R Square   -0.075925
Standard Error      1.21527
Observations        15
VIF                 1.000927

VIF is < 5, so there is no evidence of collinearity between Price and Advertising.
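Reusing the vif helper and imports from the sketch above on the pie sales data should give the same picture; with only two independent variables, VIFj reduces to 1 / (1 − r²), where r is the correlation between Price and Advertising:

```python
# Price and Advertising columns from the pie sales table
price = np.array([5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40,
                  7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00])
advertising = np.array([3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7,
                        3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7])

# VIF for Price (column 0) given Advertising (column 1)
X = np.column_stack([price, advertising])
print(vif(X, 0))  # about 1.0009, matching the PHStat output above
```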
