Multiple Regression
logistic regression
Multiple regression model with k independent variables:
Yi = β0 + β1X1i + β2X2i + ··· + βkXki + εi
Estimated multiple regression equation:
Ŷi = b0 + b1X1i + b2X2i + ··· + bkXki
In this chapter we will use Excel or Minitab to obtain the
regression slope coefficients and other regression
summary measures.
Basic Business Statistics, 11e © 2009 Prentice-Hall, Inc.. Chap 14-4
Multiple Regression Equation
Two variable model: Ŷ = b0 + b1X1 + b2X2
[Figure: fitted regression plane, with Y on the vertical axis and X1 and X2 on the horizontal axes]
Example:
2 Independent Variables
• A distributor of frozen dessert pies wants to evaluate factors thought to influence demand
R Square             0.52148
Adjusted R Square    0.44172
Standard Error       47.46341
Observations         15

Sales = 306.526 - 24.975(Price) + 74.131(Advertising)

ANOVA         df    SS           MS           F          Significance F
Regression     2    29460.027    14730.013    6.53861    0.01201
Residual      12    27033.306    2252.776
Total         14    56493.333
Analysis of Variance
Source            DF    SS       MS       F       P
Regression         2    29460    14730    6.54    0.012
Residual Error    12    27033    2253
Total             14    56493
Check the “confidence and prediction interval estimates” box.

Input values:
New Obs    Price    Advertising
1          5.50     3.50

New Obs    Fit      SE Fit    95% CI            95% PI
1          428.6    17.2      (391.1, 466.1)    (318.6, 538.6)

Fit: predicted Y value (Ŷ)
95% CI: confidence interval for the mean value of Y, given these X values
95% PI: prediction interval for an individual Y value, given these X values
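The Fit value can be checked by hand from the fitted equation. The short Python sketch below plugs the input values into Sales = 306.526 - 24.975(Price) + 74.131(Advertising); the function and variable names are illustrative choices, not part of the Excel or Minitab output.

```python
# Coefficients copied from the fitted equation in the regression output.
B0 = 306.526
B_PRICE = -24.975
B_ADVERTISING = 74.131


def predict_sales(price, advertising):
    """Point prediction from Sales = b0 + b1(Price) + b2(Advertising)."""
    return B0 + B_PRICE * price + B_ADVERTISING * advertising


# Input values from the new observation: Price = 5.50, Advertising = 3.50.
fit = predict_sales(price=5.50, advertising=3.50)
print(round(fit, 1))
```

The rounded result should agree with the Fit column; the interval estimates additionally need the standard errors that the software reports.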
Analysis of Variance
Source            DF    SS       MS       F       P
Regression         2    29460    14730    6.54    0.012
Residual Error    12    27033    2253
Total             14    56493

52.1% of the variation in pie sales is explained by the variation in price and advertising
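The 52.1% figure is the coefficient of multiple determination, r² = SSR / SST. A minimal Python check using the sums of squares from the ANOVA table (the variable names are illustrative):

```python
# Sums of squares taken from the ANOVA table for the pie sales model.
SSR = 29460.027   # regression sum of squares
SST = 56493.333   # total sum of squares

r_squared = SSR / SST   # share of the variation in Y explained by the model
SSE = SST - SSR         # unexplained (residual) sum of squares

print(f"r^2 = {r_squared:.5f}")
print(f"SSE = {SSE:.3f}")
```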
models
• What is the net effect of adding a new variable?
• We lose a degree of freedom when a new X variable is added
• Did the new X variable add enough explanatory power to offset the loss of one degree of freedom?
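The trade-off in these bullets is what the Adjusted R Square in the regression output measures. The sketch below uses the standard formula, adjusted r² = 1 - (1 - r²)(n - 1)/(n - k - 1), with n = 15 observations and k = 2 independent variables from the pie sales output; the function name is an illustrative choice.

```python
def adjusted_r_squared(r_squared, n, k):
    """Penalize r^2 for the degrees of freedom used by k independent variables."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# Values from the pie sales ANOVA table: SSR / SST, 15 observations, 2 X variables.
r_squared = 29460.027 / 56493.333
adj_r_squared = adjusted_r_squared(r_squared, n=15, k=2)

print(f"adjusted r^2 = {adj_r_squared:.5f}")
```

Because the penalty grows with k, adjusted r² can fall when a new X variable adds little explanatory power, while r² itself never decreases.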
H0: β1 = β2 = 0
H1: β1 and β2 not both zero
α = .05
df1 = 2, df2 = 12
Critical Value: F0.05 = 3.885
Test Statistic: FSTAT = MSR / MSE = 6.5386
Decision: Since the FSTAT test statistic is in the rejection region (p-value < .05), reject H0
Conclusion: There is evidence that at least one independent variable affects Y
[Figure: F distribution with α = .05 in the upper tail; do not reject H0 below F0.05 = 3.885, reject H0 above it]
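The overall F test can be reproduced from the sums of squares alone. The Python sketch below recomputes MSR, MSE, and FSTAT for the pie sales model and compares FSTAT with the critical value quoted on the slide (3.885 is copied from the slide, not computed here); the variable names are illustrative.

```python
# Sums of squares and sample size from the pie sales ANOVA table.
SSR = 29460.027
SSE = 27033.306
n, k = 15, 2

MSR = SSR / k              # mean square regression, df1 = k
MSE = SSE / (n - k - 1)    # mean square error, df2 = n - k - 1
F_STAT = MSR / MSE

F_CRITICAL = 3.885         # F0.05 with df1 = 2, df2 = 12, as given on the slide
reject_h0 = F_STAT > F_CRITICAL

print(f"FSTAT = {F_STAT:.4f}, reject H0: {reject_h0}")
```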
Residuals in Multiple Regression
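Two variable model: Ŷ = b0 + b1X1 + b2X2
Residual = ei = (Yi – Ŷi)
[Figure: a sample observation Yi at (x1i, x2i) and its fitted value Ŷi on the regression plane; the residual is the vertical distance between them]
The best fit equation is found by minimizing the sum of squared errors, Σe²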
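To make the least squares idea concrete, the sketch below fits a two variable model by solving the normal equations in plain Python and reports the sum of squared residuals. The data are a hypothetical toy set, not the pie sales data, and the function names are illustrative; Excel and Minitab use their own numerical routines.

```python
def fit_two_variable_model(x1, x2, y):
    """Return (b0, b1, b2) that minimize the sum of squared residuals."""
    n = len(y)
    # Normal equations (X'X) b = X'y for the columns [1, x1, x2].
    cols = [[1.0] * n, list(x1), list(x2)]
    a = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    # Solve the 3x3 system by Gauss-Jordan elimination with partial pivoting.
    m = [row + [r] for row, r in zip(a, rhs)]
    for i in range(3):
        pivot = max(range(i, 3), key=lambda r: abs(m[r][i]))
        m[i], m[pivot] = m[pivot], m[i]
        for r in range(3):
            if r != i:
                factor = m[r][i] / m[i][i]
                m[r] = [vr - factor * vi for vr, vi in zip(m[r], m[i])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def sum_squared_errors(coefs, x1, x2, y):
    """Sum of e_i^2 where e_i = Y_i - (b0 + b1*X1_i + b2*X2_i)."""
    b0, b1, b2 = coefs
    return sum((yi - (b0 + b1 * u + b2 * v)) ** 2 for u, v, yi in zip(x1, x2, y))


# Hypothetical toy data (assumption for illustration only).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
y = [9.5, 7.5, 19.0, 18.5, 25.5]

coefs = fit_two_variable_model(x1, x2, y)
best_sse = sum_squared_errors(coefs, x1, x2, y)
print(coefs, best_sse)
```

Moving any coefficient away from the fitted value increases the sum of squared errors, which is what "best fit" means here.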
Multiple Regression Assumptions
Errors (residuals) from the regression model: ei = (Yi – Ŷi)
Assumptions:
• The errors are normally distributed
Residual plots used to check the assumptions:
• Residuals vs. Ŷi
• Residuals vs. X1i
• Residuals vs. X2i
• Residuals vs. time (if time series data)
Test Statistic:
tSTAT = (bj – 0) / Sbj    (df = n – k – 1)
bj ± tα/2 Sbj, where t has (n – k – 1) d.f.
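A rough Python sketch of this interval for the price slope follows. This excerpt does not list Sbj, so the sketch backs it out from numbers that do appear in the deck, under two stated assumptions: tSTAT² for a slope equals its partial F statistic, SSR(X1 | X2) / MSE, and the two-tailed t critical value is the square root of F0.05 = 4.75 (df = 1 and 12). The function name is illustrative.

```python
import math


def slope_confidence_interval(b_j, s_bj, t_crit):
    """Return (lower, upper) for b_j ± t_crit * S_bj."""
    half_width = t_crit * s_bj
    return b_j - half_width, b_j + half_width


b1 = -24.975                             # price slope from the fitted equation
partial_f = 11975.80438 / 2252.775539    # SSR(X1 | X2) / MSE, values from the deck
s_b1 = abs(b1) / math.sqrt(partial_f)    # assumption: |tSTAT| = sqrt(partial F)
t_crit = math.sqrt(4.75)                 # assumption: t(alpha/2)^2 = F0.05 with df 1, 12

lower, upper = slope_confidence_interval(b1, s_b1, t_crit)
print(f"95% CI for the price slope: ({lower:.2f}, {upper:.2f})")
```

With the standard error and t value printed by Excel or Minitab, the same function applies directly and the back-calculation is unnecessary.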
SSR(X1 | X2)
= SSR (all variables) – SSR(X2)
α = .05, df = 1 and 12
F0.05 = 4.75

ANOVA (for X1 and X2)
              df    SS             MS
Regression     2    29460.02687    14730.01343
Residual      12    27033.30647    2252.775539
Total         14    56493.33333

ANOVA (for X2 only)
              df    SS
Regression     1    17484.22249
Residual      13    39009.11085
Total         14    56493.33333
(ta)² = F1,a
where a = degrees of freedom
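The partial F test can be worked through with the sums of squares in this deck. The Python sketch below computes SSR(X1 | X2), divides by the MSE of the full model, compares the result with F0.05 = 4.75, and takes the square root to show the matching t magnitude. Variable names are illustrative, and the sign of t is not recoverable from F.

```python
import math

# Sums of squares from the ANOVA tables and intermediate calculations.
SSR_ALL = 29460.02687      # SSR(X1, X2)
SSR_X2 = 17484.22249       # SSR(X2), model with X2 only
SSR_X1 = 11100.43803       # SSR(X1), model with X1 only
MSE_ALL = 2252.775539      # MSE of the model with both variables
F_CRITICAL = 4.75          # F0.05 with df = 1 and 12, from the slide

ssr_x1_given_x2 = SSR_ALL - SSR_X2   # contribution of X1 given X2
ssr_x2_given_x1 = SSR_ALL - SSR_X1   # contribution of X2 given X1

f_x1 = ssr_x1_given_x2 / MSE_ALL
f_x2 = ssr_x2_given_x1 / MSE_ALL

for name, f_stat in (("X1", f_x1), ("X2", f_x2)):
    # |t| for the slope equals sqrt(F) because t^2 = F with 1 numerator df.
    print(f"{name}: F = {f_stat:.3f}, |t| = {math.sqrt(f_stat):.3f}, "
          f"significant: {f_stat > F_CRITICAL}")
```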
Regression Analysis
Coefficients of Partial Determination
Intermediate Calculations
SSR(X1,X2)      29460.02687
SST             56493.33333
SSR(X2)         17484.22249
SSR(X1 | X2)    11975.80438
SSR(X1)         11100.43803
SSR(X2 | X1)    18359.58884
Coefficients
r2 Y1.2         0.307000188
r2 Y2.1         0.404459524
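The two coefficients can be reproduced from the intermediate calculations. The sketch below uses the usual formula, r²Y1.2 = SSR(X1 | X2) / [SST - SSR(X1,X2) + SSR(X1 | X2)], and the symmetric version for X2; the function name is an illustrative choice.

```python
# Intermediate calculations from the output above.
SSR_X1_X2 = 29460.02687
SST = 56493.33333
SSR_X1_GIVEN_X2 = 11975.80438
SSR_X2_GIVEN_X1 = 18359.58884


def partial_determination(ssr_j_given_others, sst, ssr_all):
    """Share of the variation left unexplained by the other X that X_j explains."""
    return ssr_j_given_others / (sst - ssr_all + ssr_j_given_others)


r2_y1_2 = partial_determination(SSR_X1_GIVEN_X2, SST, SSR_X1_X2)
r2_y2_1 = partial_determination(SSR_X2_GIVEN_X1, SST, SSR_X1_X2)

print(f"r2 Y1.2 = {r2_y1_2:.6f}")
print(f"r2 Y2.1 = {r2_y2_1:.6f}")
```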
Using Dummy Variables
• y = b0 + b1x1 + b2x2 + e
• Where:
  x1 = size of firm
  x2 = 1 if stock company, 0 otherwise
Dummy-Variable Example
(with 2 Levels)
Ŷ = b0 + b1X1 + b2 X2
Let:
Y = pie sales
X1 = price
X2 = holiday (X2 = 1 if a holiday occurred during the week)
(X2 = 0 if there was no holiday that week)
[Figure: Y (sales) for holiday and no-holiday weeks; two lines with different intercepts (b0 + b2 for holiday, b0 for no holiday) and the same slope]
If H0: β2 = 0 is rejected, then “Holiday” has a significant effect on pie sales
Y = house price
X1 = square feet
X2 = 1 if ranch, 0 otherwise
X3 = 1 if split level, 0 otherwise
Ŷ = b0 + b1X1 + b2 X2 + b3 X3
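A categorical variable with three levels needs two dummy variables; the level with no dummy of its own acts as the baseline. The Python sketch below shows one way to build X2 and X3 for the house style example, assuming colonial is the third style and the baseline (it is the case with X2 = X3 = 0). The dictionary and function names are illustrative.

```python
# Dummy coding for house style: (X2, X3) = (ranch indicator, split level indicator).
# Assumption: colonial is the baseline level, coded as X2 = 0 and X3 = 0.
STYLE_DUMMIES = {
    "colonial": (0, 0),
    "ranch": (1, 0),
    "split level": (0, 1),
}


def encode_style(style):
    """Return the (X2, X3) dummy values for a house style."""
    return STYLE_DUMMIES[style]


for style in STYLE_DUMMIES:
    x2, x3 = encode_style(style)
    print(f"{style}: X2 = {x2}, X3 = {x3}")
```

Using three dummies for three levels would make the columns add up to the intercept column, so one level is always left out.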
Interpreting the Dummy Variable
Coefficients (with 3 Levels)
Consider the regression equation:
Ŷ = 20.43 + 0.045X1 + 23.53X2 + 18.84X3

For a colonial: X2 = X3 = 0
Ŷ = 20.43 + 0.045X1

For a ranch: X2 = 1; X3 = 0
Ŷ = 20.43 + 0.045X1 + 23.53
With the same square feet, a ranch will have an estimated average price of 23.53 thousand dollars more than a colonial.

For a split level: X2 = 0; X3 = 1
Ŷ = 20.43 + 0.045X1 + 18.84
With the same square feet, a split-level will have an estimated average price of 18.84 thousand dollars more than a colonial.
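The interpretation can be confirmed numerically: holding square feet fixed, the gap between styles equals the dummy coefficient. In the Python sketch below, the 2,000 square feet value is a hypothetical input chosen only for illustration, and the function name is an assumption.

```python
def predicted_price(square_feet, x2, x3):
    """Estimated price (thousand dollars) from the fitted dummy-variable equation."""
    return 20.43 + 0.045 * square_feet + 23.53 * x2 + 18.84 * x3


SQUARE_FEET = 2000  # hypothetical house size, not from the slides

colonial = predicted_price(SQUARE_FEET, x2=0, x3=0)
ranch = predicted_price(SQUARE_FEET, x2=1, x3=0)
split_level = predicted_price(SQUARE_FEET, x2=0, x3=1)

print(f"ranch - colonial       = {ranch - colonial:.2f}")
print(f"split level - colonial = {split_level - colonial:.2f}")
```

The differences do not depend on the square feet chosen, because the model gives every style the same slope on X1.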
Interaction Between
Independent Variables
• Hypothesizes interaction between pairs of X variables
• Response to one X variable may vary at different levels of another X variable
• Ŷ = b0 + b1X1 + b2X2 + b3X3
    = b0 + b1X1 + b2X2 + b3(X1X2)
Effect of Interaction
X2 = 0:
Y = 1 + 2X1 + 3(0) + 4X1(0) = 1 + 2X1
[Figure: Y plotted against X1 (from 0 to 1.5), one line for each value of X2]
Slopes are different if the effect of X1 on Y depends on X2 value
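The coefficients in the X2 = 0 line imply the model Y = 1 + 2X1 + 3X2 + 4X1X2. The Python sketch below evaluates that model at X2 = 0 and X2 = 1 and measures the slope with respect to X1 in each case; treating X2 as a 0/1 variable and the function names are assumptions for illustration.

```python
def y_hat(x1, x2):
    """Interaction model implied by the slide: Y = 1 + 2*X1 + 3*X2 + 4*X1*X2."""
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2


def slope_in_x1(x2):
    """Change in Y per one-unit change in X1, holding X2 fixed."""
    return y_hat(1, x2) - y_hat(0, x2)


for x2 in (0, 1):
    print(f"X2 = {x2}: intercept = {y_hat(0, x2)}, slope in X1 = {slope_in_x1(x2)}")
```

Without the cross product term the two lines would be parallel; the X1X2 coefficient is what changes the slope from one X2 level to the other.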
Significance of Interaction Term
Odds ratio = probability of success / (1 – probability of success)
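As a quick illustration of this definition, the Python sketch below converts a probability of success into the ratio defined above. The probabilities used are hypothetical examples, and the function name is illustrative.

```python
def odds_ratio(p_success):
    """Probability of success divided by probability of failure."""
    if not 0 <= p_success < 1:
        raise ValueError("probability of success must be in [0, 1)")
    return p_success / (1 - p_success)


# Hypothetical probabilities, for illustration only.
print(odds_ratio(0.75))   # success is three times as likely as failure
print(odds_ratio(0.5))    # success and failure are equally likely
```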