Chapter 6
I Chapter Outline
Multicollinearity
Checking for multicollinearity using spreadsheet software
II Teaching Tips
2. Continuing with the exercise explained in the previous item, you could also play
the following game with your students to illustrate regression. Run a regression
model trying to predict weight in terms of height and/or age. Then, ask for
volunteers from both genders to submit their height and/or age and you will guess
their weight. This is usually a very engaging exercise, and you can enhance it by explaining why inaccurate predictions are sometimes the result of not having a good model. You can also use the example to explain the need for prediction intervals to cope with the uncertainty in the estimation.
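If you want to automate the classroom demonstration instead of using a spreadsheet, a minimal sketch in Python with statsmodels is given below; the variable names and the data values are purely hypothetical stand-ins for whatever the volunteers report.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical volunteer data; replace with the values collected in class.
data = pd.DataFrame({
    "height": [170, 165, 180, 175, 160, 185, 172, 168],  # cm
    "age":    [20, 22, 21, 23, 20, 24, 22, 21],           # years
    "weight": [65, 58, 80, 74, 55, 88, 70, 62],           # kg
})

# Fit weight = b0 + b1 x height + b2 x age by ordinary least squares.
model = smf.ols("weight ~ height + age", data=data).fit()
print(model.summary())

# Predict a new volunteer's weight with a 95% prediction interval, which is
# wider than the confidence interval for the mean response and illustrates
# the estimation uncertainty discussed above.
new = pd.DataFrame({"height": [178], "age": [22]})
pred = model.get_prediction(new).summary_frame(alpha=0.05)
print(pred[["mean", "obs_ci_lower", "obs_ci_upper"]])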
6.1
(a) The corresponding multiple regression model is predicted price = b0 + b1 x area + b2 x neighborhood rating + b3 x general rating. This is the result from running the regression:
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.90
R Square 0.81
Adjusted R Square 0.77
Standard Error 49.07
Observations 20
ANOVA
df SS MS F Significance F
Regression 3 163167.7802 54389.3 22.59 5.39018E-06
Residual 16 38525.96977 2407.9
Total 19 201693.75
(b) Since the coefficient of determination R2 (0.81) is fairly close to one, the regression model is acceptable in general. Also notice that the 95% confidence intervals for the regression coefficients do not contain the value of zero, so we are 95% confident that each of the coefficients is different from zero. Therefore, this model is recommended.
(c) Predicted price = -166.69 + 0.09 x area + 39.92 x neighborhood rating + 42.70
x general rating.
(d) Predicted price = -166.69 + 0.09 x 3,000 + 39.92 x 5 + 42.70 x 4 = 473.71, that is, about $473,710 (prices are measured in thousands of dollars).
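For instructors who prefer Python to Excel, a sketch of exercise 6.1 is given below; the file name and the column names (price, area, nbhd_rating, gen_rating) are assumptions about how the data might be stored, not the textbook's layout.

import pandas as pd
import statsmodels.formula.api as smf

homes = pd.read_excel("homes.xlsx")   # hypothetical file with the 20 observations

# Fit price = b0 + b1 x area + b2 x neighborhood rating + b3 x general rating.
model = smf.ols("price ~ area + nbhd_rating + gen_rating", data=homes).fit()
print(model.summary())                # R Square, coefficients, and 95% confidence intervals

# Part (d): a 3,000 square-foot house with neighborhood rating 5 and general rating 4.
new_house = pd.DataFrame({"area": [3000], "nbhd_rating": [5], "gen_rating": [4]})
print(model.predict(new_house))       # about 473.71, i.e. roughly $473,710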
6.2
(a) We use the following model: predicted taxes = b0 + b1 x labor hours + b2 x computer hours. After running the regression, we obtain the following output:
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.97
R Square 0.93
Adjusted R Square 0.91
Standard Error 1.07
Observations 10
ANOVA
df SS MS F Significance F
Regression 2 112.38 56.19 49.02 0.00
Residual 7 8.02 1.15
Total 9 120.40
(d) The regression equation produced by the model is as follows: predicted taxes = -101.82 + 2.56 x labor hours + 1.10 x computer hours. Comparing the regression coefficients associated with labor time and computer time (2.56 > 1.10), it is clear that increasing the field-audit time by one hour would have a bigger impact on uncovering unpaid taxes than increasing the computer time by one hour.
6.3
(a) We use the following model: predicted taxes = b0 + b1 x gross income + b2 x
schedule A + b3 x schedule C income + b4 x schedule C % + b5 x home office.
After running the regression model, we obtain the following summary report:
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.94
R Square 0.88
Adjusted R Square 0.84
Standard Error 3572.44
Observations 24
ANOVA
df SS MS F Significance F
Regression 5 1653944963.89 330788992.78 25.92 0.00
Residual 18 229721457.07 12762303.17
Total 23 1883666420.96
Since the R2 value is close to one, we can safely say that the model is valid. However, a few of the coefficients are not statistically significant. In particular, the 95% confidence intervals for the coefficients corresponding to the variables Schedule A deductions, Schedule C income, and home office contain the value of zero, and so they do not pass Student's t-test.
(b) The revised model initially eliminates the three variables with no statistical significance from the previous model, that is, Schedule A deductions, Schedule C income, and home office. After computing the corresponding regression, we discover that the coefficient corresponding to the variable Schedule C % is not statistically significant. After we also eliminate this variable, we obtain a model where the intercept is not statistically significant either. The final model is predicted taxes = 0.31 x gross income. The R2 value for this model is 0.82, which is close enough to 1 to validate the model. The coefficient associated with gross income also passes Student's t-test.
(c) The residual plot, shown below, gives no signs of heteroscedasticity in the model. A histogram of the residuals also shows that the normality condition is approximately satisfied. A 95% confidence interval for the coefficient associated with gross income is [0.24, 0.37].
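A sketch of the diagnostics described in part (c), assuming the final model from part (b) has been fit with statsmodels and stored in a variable model (the variable name is ours):

import matplotlib.pyplot as plt

resid = model.resid
fitted = model.fittedvalues

# Residuals versus fitted values: a fan shape would suggest heteroscedasticity.
plt.scatter(fitted, resid)
plt.axhline(0, linestyle="--")
plt.xlabel("Fitted taxes")
plt.ylabel("Residual")
plt.show()

# Histogram of residuals: should look roughly bell-shaped if normality holds.
plt.hist(resid, bins=10)
plt.xlabel("Residual")
plt.show()

# 95% confidence intervals for the coefficients; the gross income interval
# was reported above as roughly [0.24, 0.37].
print(model.conf_int(alpha=0.05))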
6.4
(a) We use the following model: predicted month number of next earthquake = b0 + b1 x time since most recent earthquake + b2 x time since second most recent earthquake. After running the regression model, we obtain the following report:
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.73
R Square 0.53
Adjusted R Square 0.42
Standard Error 16.73
Observations 12
ANOVA
df SS MS F Significance F
Regression 2 2797.75 1398.88 5.00 0.03
Residual 9 2519.16 279.91
Total 11 5316.92
(b) The R2 value of this regression model is 0.53. Taking into consideration the imperfect nature of earthquake prediction, this is a rather high value.
(c) By looking at the significance F value, this model is statistically valid at the 5% and 10% levels of significance. However, it is not valid at the 1% level of significance. The value of R2 is close to 0.5, so the model is barely valid. The coefficient associated with the variable time since the most recent earthquake is not statistically significant, since its corresponding 95% confidence interval contains the value of zero. The other coefficient is significant. In the chart below, where we plot residuals versus time (month of earthquake), there is no clear pattern, so we conclude that there is no evidence of auto-correlation in the model.
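A sketch of the auto-correlation check in part (c), assuming the regression has been fit as model and that month holds the month number of each earthquake (both names are ours):

import matplotlib.pyplot as plt
from statsmodels.stats.stattools import durbin_watson

# Residuals against time: a systematic pattern would suggest auto-correlation.
plt.scatter(month, model.resid)
plt.axhline(0, linestyle="--")
plt.xlabel("Month of earthquake")
plt.ylabel("Residual")
plt.show()

# The Durbin-Watson statistic complements the plot: values near 2 indicate
# no first-order auto-correlation.
print(durbin_watson(model.resid))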
(d) In the revised model, we eliminate the variable corresponding to the time since the most recent earthquake. The resulting regression formula is predicted month number of next earthquake = 170 - 9.7 x time since second most recent earthquake. The R2 value is 0.51, but the significance F value is 0.009, indicating that the model is statistically valid. The coefficients pass Student's t-test and there is no evidence of auto-correlation.
6.5
(a) We use the following regression model: Predicted number of defective
shafts = b0 + b1 x batch size. After running the regression on Excel, we get
the following summary report:
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.98
R Square 0.95
Adjusted R Square 0.95
Standard Error 7.56
Observations 30
ANOVA
df SS MS F Significance F
Regression 1 32744.46 32744.46 572.90 0.00
Residual 28 1600.34 57.16
Total 29 34344.80
(b) The R2 value is 0.95, which is very close to one, validating the model. The t-statistics are also large, indicating that the coefficients are statistically significant. The linear model is a good fit to the data, but it is not the best fit. By looking at the scatter plot of the two variables, as shown below, there seems to be a quadratic relation between the two variables.
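A sketch of the quadratic alternative suggested in part (b), assuming the data sit in a DataFrame shafts with columns batch_size and defective (the names are ours):

import statsmodels.formula.api as smf

linear = smf.ols("defective ~ batch_size", data=shafts).fit()
quadratic = smf.ols("defective ~ batch_size + I(batch_size ** 2)", data=shafts).fit()

# If the scatter plot really is curved, the quadratic fit should show a higher
# adjusted R Square and a statistically significant squared term.
print(linear.rsquared_adj, quadratic.rsquared_adj)
print(quadratic.params)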
6.6
(a) The regression equation produced by Jack's regression model is predicted
sales = 13,707.14 + 37.34 x hops + 1,319.27 x malt + 0.05 x advertising -
63.17 x bitterness + 53.23 x investment.
(b) The degrees of freedom are 50 - 5 - 1 = 44. Since this is more than 30, we
use the normal distribution to find a Z-factor of 1.96, corresponding to
95% confidence. The confidence intervals for each coefficient are shown
in the table below.
From the table it follows that the intervals corresponding to the variables hops,
bitterness, and initial investment contain zero, and so, those coefficients
are not statistically significant.
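As a reminder of how each interval in part (b) is built (our notation, using the normal approximation mentioned above):

b_i \pm z_{0.975} \cdot \mathrm{SE}(b_i) \;=\; b_i \pm 1.96 \cdot \mathrm{SE}(b_i),

and a coefficient is judged statistically significant at the 5% level exactly when its interval excludes zero.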
(c) Since the R2 value is 0.88, we can safely say that the model is valid. As indicated before, the coefficients for the variables hops, bitterness, and initial investment are not significant. We can eliminate those variables to obtain a better model.
(d) The new regression model only has the independent variables malt and
annual advertising. The new regression equation is predicted sales =
-14,162 + 1,401.13 x malt + 0.05 x advertising. The R2 value for the new
model is 0.88 and all of the coefficients are significant.
(e) The predictions of the annual sales of each new beer are summarized in
the table below.
(f) Since the amounts given for malt and annual advertising are within the range of the data used to build the model, I would recommend using the regression model to predict sales of the new beer Final Excalibur. The sales forecast is -14,162 + 1,401.13 x 7 + 0.05 x 150,000 = 3,145.91 thousands of dollars.
6.7
(a) The degrees of freedom are 67 - 4 - 1 = 62.
(b) Since the degrees of freedom are more than 30, then we use the normal
distribution to find the Z-factor, which in this case is 1.96. The
corresponding 95% confidence intervals are shown below.
(e) I would look at a plot of the residuals as a function of time, from January 1989 to July 1994. If the resulting chart shows an apparent pattern, then there is an auto-correlation problem in the model.
6.8
(a) We use the following model: predicted market value = b0 + b1 x total
assets + b2 x total sales + b3 x number of employees. The following is the
summary output from running the regression model:
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.8
R Square 0.6
Adjusted R Square 0.5
Standard Error 637.1
Observations 15
ANOVA
df SS MS F Significance F
Regression 3 6488609.4 2162869.8 5.3 0.0
Residual 11 4464258.2 405841.7
Total 14 10952867.6
6.9
(a) For given data x1, …, xn and y1, …, yn, let f(b0, b1) be the residual sum of squares. Setting the gradient of f equal to zero yields the normal equations, and solving them gives the formula for b1.
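A sketch of the standard computation, in our notation:

f(b_0, b_1) = \sum_{i=1}^n (y_i - b_0 - b_1 x_i)^2,

\frac{\partial f}{\partial b_0} = -2 \sum_{i=1}^n (y_i - b_0 - b_1 x_i), \qquad
\frac{\partial f}{\partial b_1} = -2 \sum_{i=1}^n x_i (y_i - b_0 - b_1 x_i).

Setting both partial derivatives to zero gives b_0 = \bar{y} - b_1 \bar{x} and

b_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}.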
(b) It follows from the following argument:
(a) Using the data in the file OILPLUS.XLS, we ran a regression in Excel and obtained the summary report shown below. Based on this report, we obtain predicted heating oil consumption for next December = 109 - 1.24 x 35.2 = 65.352, that is, about 65,352 gallons (consumption is measured in thousands of gallons). (A better model can be obtained by using regression to find a formula for oil consumption in terms of temperature and temperature^2.)
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.83
R Square 0.69
Adjusted R Square 0.68
Standard Error 13.52
Observations 55
ANOVA
df SS MS F Significance F
Regression 1 21386.84 21386.84 117.02 5.01324E-15
Residual 53 9686.03 182.76
Total 54 31072.87
(b) The forecast based on regression accounts for the variation in the data due to changes in temperature. It is clear that oil consumption depends on the temperature, so it is hard to accept that consumption will always be the same (75.60) regardless of the temperature.
(c) The value of R2 is 0.69, which is reasonably close to 1, indicating that the model is empirically valid, though not overwhelmingly so. Using the standard error, we find a 95% confidence interval for the temperature coefficient of [-1.46, -1.01]. Since this interval does not contain zero, the coefficient corresponding to temperature is clearly significant. The scatter plot of the residuals shows more residual dispersion at lower temperatures and less dispersion at higher temperatures, so there might be a problem of heteroscedasticity.
(d) The R2 value is not bad, but there may be a better model with a higher value. I would recommend exploring other independent variables or trying nonlinear models, as sketched below.
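A sketch of the nonlinear alternative mentioned in parts (a) and (d), assuming OILPLUS.XLS is loaded into a DataFrame with columns temperature and consumption (the column names are ours):

import pandas as pd
import statsmodels.formula.api as smf

oil = pd.read_excel("OILPLUS.XLS")    # the 55 monthly observations

linear = smf.ols("consumption ~ temperature", data=oil).fit()
quadratic = smf.ols("consumption ~ temperature + I(temperature ** 2)", data=oil).fit()

# A clearly higher adjusted R Square and a significant squared term would
# support the temperature + temperature^2 model over the linear one.
print(linear.rsquared_adj, quadratic.rsquared_adj)
print(quadratic.summary())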
EXECUTIVE COMPENSATION
(a) The variables change in stock price and change in sales do not seem very relevant for determining the compensation of a CEO. They may affect bonuses, stock options, and indirect compensation, but the main portion of the salary is probably not determined by changes in these two variables. On the other hand, the model does not consider other variables such as years of experience prior to the current position (inside and outside the company), education other than whether or not the CEO holds an MBA, knowledge or experience of the industry immediately related to the company, the average CEO compensation in the industry, and so on.
(b) After running the regression in Excel, we obtain the following summary output:
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.87
R Square 0.75
Adjusted R Square 0.73
Standard Error 422.40
Observations 50
ANOVA
df SS MS F Significance F
Regression 4 23896133.23 5974033.31 33.48 5.79962E-13
Residual 45 8029082.79 178424.06
Total 49 31925216.02
(c) The R2 value is 0.75, indicating a good model. The coefficients corresponding to
the variables stock change, sales change, and the intercept are not statistically
significant. The following is the correlation matrix:
There is high correlation between the independent variables stock change and
years in position, indicating a possible multicollinearity problem.
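A sketch of the multicollinearity check in part (c), assuming the predictors sit in a DataFrame ceo_data with the column names used below (all names are ours):

import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = ceo_data[["stock_change", "sales_change", "mba", "years_in_position"]]
print(X.corr())                        # pairwise correlations, as in the matrix above

# Variance inflation factors: values well above 5-10 flag multicollinearity.
Xc = sm.add_constant(X)
vif = pd.Series(
    [variance_inflation_factor(Xc.values, i) for i in range(1, Xc.shape[1])],
    index=X.columns,
)
print(vif)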
(d) The best model we found has the variables years in current position and MBA as independent variables. The R2 of this model is 0.74. Both variables are significant at the 95% level. Since the correlation between these two variables is small (0.43), we can safely say that there are no multicollinearity problems. Furthermore, the residuals follow an approximately normal distribution, and there are no apparent heteroscedasticity problems. Only the intercept is not statistically significant in our model. To summarize, we propose the model predicted CEO compensation = 207.5 x years in current position + 307.2 x MBA.
(e) As mentioned in part (a), there are other factors that are critical in determining the compensation of a CEO. According to the model presented in the previous item, having an MBA adds 307.2 (in the units in which compensation is measured) to the predicted CEO compensation. Therefore, we think that having an MBA has an effect on CEO compensation.
(a) After analyzing the summary output provided in this case, we notice that the
variables EMPL, total, P25, P35, P45, P55, COMP, NCOMP, and CLI are not
statistically significant. Furthermore, by looking at the correlation matrix
provided below, we notice high correlation between pairs of variables taken from
total, P15, P25, P35, P45, and P55, indicating multicollinearity problems. Also
notice that even though the variable PRICE is statistically significant, it has a very
small correlation with the variable EARN.
EARN SIZE EMPL total P15 P25 P35 P45 P55 INC COMP NCOMP NREST PRICE CLI
EARN 1.00
SIZE 0.44 1.00
EMPL -0.11 0.05 1.00
total 0.59 -0.02 -0.10 1.00
P15 0.63 -0.05 -0.10 0.96 1.00
P25 0.23 -0.08 -0.02 0.58 0.42 1.00
P35 0.63 -0.03 -0.12 0.96 0.98 0.43 1.00
P45 0.63 -0.02 -0.11 0.96 0.98 0.41 0.99 1.00
P55 0.40 0.06 -0.09 0.77 0.68 0.29 0.67 0.65 1.00
INC 0.46 0.18 0.09 0.11 0.15 0.02 0.14 0.14 0.01 1.00
COMP -0.14 -0.17 0.12 -0.14 -0.11 -0.01 -0.12 -0.13 -0.20 -0.08 1.00
NCOMP 0.11 -0.02 0.11 0.07 0.07 0.10 0.07 0.08 0.01 0.17 0.16 1.00
NREST 0.34 -0.10 -0.16 0.05 0.07 0.01 0.10 0.09 -0.02 -0.06 0.11 0.01 1.00
PRICE -0.18 0.07 0.08 0.04 -0.03 0.08 -0.01 -0.01 0.15 0.00 -0.30 -0.20 -0.06 1.00
CLI 0.04 0.05 0.14 0.21 0.21 0.09 0.20 0.23 0.15 0.10 0.02 -0.01 -0.29 0.26 1.00
Combining all of these ideas, we end up with a regression model with only four
independent variables: SIZE, P15, INC, and NREST. After running the regression
based on this model, we obtain the following summary report.
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.91
R Square 0.83
Adjusted R Square 0.81
Standard Error 39.47
Observations 60
ANOVA
df SS MS F Significance F
Regression 4 406495.50 101623.88 65.25 3.12098E-20
Residual 55 85666.22 1557.57
Total 59 492161.72
Notice that the R2 of this model is 0.83, and all of the coefficients are significant.
There are no multicollinearity problems and no apparent heteroscedasticity
problems.
(b) Using the target performance ratio of 26%, our model would recommend opening two of the three stores that actually attained the target in 1994. Also,
our model would not recommend opening the stores that did not actually attain the target performance ratio.
(c) The results of applying our model to the new stores are shown below. According to these results, the recommendation based strictly on the 26% target would be to open only the store located in Toulouse. However, the store located in Dijon has a predicted performance ratio very close to the target, so we also recommend opening the store in Dijon.
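A sketch of how part (c) could be scored programmatically, assuming the fitted EARN regression is stored in model, the candidate stores are in a DataFrame new_stores containing the predictor columns plus the capital invested in a column K, and the performance ratio is operating earnings divided by capital invested; all of these names and the ratio definition are our assumptions.

import pandas as pd

# Predicted operating earnings for each candidate store.
predicted_earn = model.predict(new_stores)

# Implied performance ratio and comparison with the 26% target.
performance_ratio = predicted_earn / new_stores["K"]
recommend = performance_ratio >= 0.26
print(pd.DataFrame({"predicted_ratio": performance_ratio, "open": recommend}))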
(d) The relative strength of our model is that it is a very simple model in terms of the number of independent variables required to predict operating earnings. It is statistically sound, based on the data provided through 1994, and it might be improved by adding the data from 1995. The major weakness of our regression model is that it underestimates the real performance ratio when this ratio is close to the target. It is also subject to the standard criticisms of regression models; for instance, the model could produce poor predictions if used to extrapolate beyond the range of the data.
Our analysis in this case considers the separate regression models for the GM data
and IBM data using the 12 independent variables under consideration. The
summary output for the regression model of the GM data is:
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.38
R Square 0.15
Adjusted R Square -0.03
Standard Error 0.08
Observations 72
ANOVA
df SS MS F Significance F
Regression 12 0.07 0.01 0.85 0.60
Residual 59 0.41 0.01
Total 71 0.48
The summary output for the regression model of the IBM data is:
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.66
R Square 0.43
Adjusted R Square 0.31
Standard Error 0.06
Observations 72
ANOVA
df SS MS F Significance F
Regression 12 0.18 0.02 3.70 0.00
Residual 59 0.24 0.00
Total 71 0.42
We also take into consideration the correlation matrices for both sets of data (not
shown).
Based on our analysis, we conclude that the variables ROE, previous 6-month return, S.D. of stock returns, and V/MC are statistically significant for at least one of the two sets of data. The other variables are not significant. Furthermore, there is high correlation between S.D. of stock returns and V/MC; to avoid multicollinearity, and because V/MC has the higher correlation with the dependent variable (returns), we eliminate S.D. of stock returns. Therefore, our final model has only three independent variables: ROE, previous 6-month return, and V/MC.
To make the predictions, we use two regression equations, one for each set of
data. For the GM data we use the model predicted return = -0.01 - 0.02 x ROE -
0.06 x 6-month + 0.01 x V/MC. For the IBM data we use the model predicted
return = -0.04 + 0.36 x ROE - 0.06 x 6-month + 0.05 x V/MC.
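A sketch of how the two company-specific equations could be produced, assuming each company's history is a DataFrame with columns ret, roe, ret6m, and v_mc (all names are ours):

import statsmodels.formula.api as smf

models = {}
for name, df in {"GM": gm_data, "IBM": ibm_data}.items():
    # One regression per company on the three retained predictors.
    models[name] = smf.ols("ret ~ roe + ret6m + v_mc", data=df).fit()
    print(name, models[name].params)

# The monthly predictions below can then be generated with, for example,
# models["GM"].predict(new_gm_rows) and models["IBM"].predict(new_ibm_rows).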
The final return predictions for the two companies in the requested months are:
GM IBM
Date Return Date Return
960131 -0.53% 960131 4.78%
960228 0.01% 960228 7.38%
960331 0.27% 960331 3.68%
960428 -0.44% 960428 2.68%
960531 -1.09% 960531 4.05%
960630 -0.48% 960630 1.33%