Chapter 6
I Chapter Outline
Multicollinearity
Checking for multicollinearity using spreadsheet software
II Teaching Tips
2. Continuing with the exercise explained in the previous item, you could also play
the following game with your students to illustrate regression. Run a regression
model trying to predict weight in terms of height and/or age. Then, ask for
volunteers from both genders to submit their height and/or age and you will guess
their weight. This is usually a very engaging exercise, and you can enhance it by explaining why inaccurate predictions are sometimes the result of not having a good model. You can also use the example to explain the need for prediction intervals to cope with the uncertainty in the estimation.
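If you want to automate the classroom demonstration instead of using a spreadsheet, a minimal sketch in Python with statsmodels is given below; the variable names and the data values are purely hypothetical stand-ins for whatever the volunteers report.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical volunteer data; replace with the values collected in class.
data = pd.DataFrame({
    "height": [170, 165, 180, 175, 160, 185, 172, 168],  # cm
    "age":    [20, 22, 21, 23, 20, 24, 22, 21],           # years
    "weight": [65, 58, 80, 74, 55, 88, 70, 62],           # kg
})

# Fit weight = b0 + b1 x height + b2 x age by ordinary least squares.
model = smf.ols("weight ~ height + age", data=data).fit()
print(model.summary())

# Predict a new volunteer's weight with a 95% prediction interval, which is
# wider than the confidence interval for the mean response and illustrates
# the estimation uncertainty discussed above.
new = pd.DataFrame({"height": [178], "age": [22]})
pred = model.get_prediction(new).summary_frame(alpha=0.05)
print(pred[["mean", "obs_ci_lower", "obs_ci_upper"]])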
6.1
(a) The corresponding multiple regression model is predicted price = b0 + b1 x area + b2 x neighborhood rating + b3 x general rating. This is the result from running the regression:
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.90
R Square 0.81
Adjusted R Square 0.77
Standard Error 49.07
Observations 20
ANOVA
df SS MS F Significance F
Regression 3 163167.7802 54389.3 22.59 5.39018E-06
Residual 16 38525.96977 2407.9
Total 19 201693.75
(b) Since the coefficient of determination R2 (0.81) is fairly close to one, the regression model is acceptable in general. Also notice that the 95% confidence intervals for the regression coefficients do not contain the value of zero, so we are 95% confident that each of the coefficients is different from zero. Therefore, this model is recommended.
(c) Predicted price = -166.69 + 0.09 x area + 39.92 x neighborhood rating + 42.70
x general rating.
(d) Predicted price = -166.69 + 0.09 x 3,000 + 39.92 x 5 + 42.70 x 4 = 473.71, that is, about $473,710 (prices are measured in thousands of dollars).
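For instructors who prefer Python to Excel, a sketch of exercise 6.1 is given below; the file name and the column names (price, area, nbhd_rating, gen_rating) are assumptions about how the data might be stored, not the textbook's layout.

import pandas as pd
import statsmodels.formula.api as smf

homes = pd.read_excel("homes.xlsx")   # hypothetical file with the 20 observations

# Fit price = b0 + b1 x area + b2 x neighborhood rating + b3 x general rating.
model = smf.ols("price ~ area + nbhd_rating + gen_rating", data=homes).fit()
print(model.summary())                # R Square, coefficients, and 95% confidence intervals

# Part (d): a 3,000 square-foot house with neighborhood rating 5 and general rating 4.
new_house = pd.DataFrame({"area": [3000], "nbhd_rating": [5], "gen_rating": [4]})
print(model.predict(new_house))       # about 473.71, i.e. roughly $473,710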
6.2
(a) We use the following model: predicted taxes = b0 + b1 x labor hours + b2 x computer hours. After running the regression, we obtain the following output:
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.97
R Square 0.93
Adjusted R Square 0.91
Standard Error 1.07
Observations 10
ANOVA
df SS MS F Significance F
Regression 2 112.38 56.19 49.02 0.00
Residual 7 8.02 1.15
Total 9 120.40
(d) The regression equation produced by the model is as follows: predicted taxes = -101.82 + 2.56 x labor hours + 1.10 x computer hours. Comparing the regression coefficients associated with labor time and computer time (2.56 > 1.10), it is clear that increasing the field-audit time by one hour would have a bigger impact on uncovering unpaid taxes than increasing the computer time by one hour.
6.3
(a) We use the following model: predicted taxes = b0 + b1 x gross income + b2 x
schedule A + b3 x schedule C income + b4 x schedule C % + b5 x home office.
After running the regression model, we obtain the following summary report:
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.94
R Square 0.88
Adjusted R Square 0.84
Standard Error 3572.44
Observations 24
ANOVA
df SS MS F Significance F
Regression 5 1653944963.89 330788992.78 25.92 0.00
Residual 18 229721457.07 12762303.17
Total 23 1883666420.96
Since the R2 value is close to one, we can safely say that the model is valid. However, a few of the coefficients are not statistically significant. In particular, the 95% confidence intervals for the coefficients corresponding to the variables Schedule A deductions, Schedule C income, and home office contain the value of zero, and so they do not pass Student's t-test.
(b) The revised model initially eliminates the three variables with no statistical significance from the previous model, that is, Schedule A deductions, Schedule C income, and home office. After computing the corresponding regression, we discover that the coefficient corresponding to the variable Schedule C % is not statistically significant. After we also eliminate this variable, we obtain a model where the intercept is not statistically significant either. The final model is predicted taxes = 0.31 x gross income. The R2 value for this model is 0.82, which is close enough to 1 to validate the model. The coefficient associated with gross income also passes Student's t-test.
(c) The residual plot, shown below, gives no signs of heteroscedasticity in the model. A histogram of the residuals also shows that the normality condition is approximately satisfied. A 95% confidence interval for the coefficient associated with gross income is [0.24, 0.37].
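A sketch of the diagnostics described in part (c), assuming the final model from part (b) has been fit with statsmodels and stored in a variable model (the variable name is ours):

import matplotlib.pyplot as plt

resid = model.resid
fitted = model.fittedvalues

# Residuals versus fitted values: a fan shape would suggest heteroscedasticity.
plt.scatter(fitted, resid)
plt.axhline(0, linestyle="--")
plt.xlabel("Fitted taxes")
plt.ylabel("Residual")
plt.show()

# Histogram of residuals: should look roughly bell-shaped if normality holds.
plt.hist(resid, bins=10)
plt.xlabel("Residual")
plt.show()

# 95% confidence intervals for the coefficients; the gross income interval
# was reported above as roughly [0.24, 0.37].
print(model.conf_int(alpha=0.05))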
6.4
(a) We use the following model: predicted month number of next earthquake = b0 + b1 x time since most recent earthquake + b2 x time since second most recent earthquake. After running the regression model, we obtain the following report:
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.73
R Square 0.53
Adjusted R Square 0.42
Standard Error 16.73
Observations 12
ANOVA
df SS MS F Significance F
Regression 2 2797.75 1398.88 5.00 0.03
Residual 9 2519.16 279.91
Total 11 5316.92
(b) The R2 value of this regression model is 0.53. Taking into consideration the imperfect nature of earthquake prediction, this is a rather high value.
(c) By looking at the significance F value, this model is statistically valid at the 5% and 10% levels of significance. However, it is not valid at the 1% level of significance. The value of R2 is close to 0.5, so the model is barely valid. The coefficient associated with the variable time since the most recent earthquake is not statistically significant, since its corresponding 95% confidence interval contains the value of zero. The other coefficient is significant. In the chart below, where we plot residuals versus time (month of earthquake), there is no clear pattern, so we conclude that there is no evidence of auto-correlation in the model.
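A sketch of the auto-correlation check in part (c), assuming the regression has been fit as model and that month holds the month number of each earthquake (both names are ours):

import matplotlib.pyplot as plt
from statsmodels.stats.stattools import durbin_watson

# Residuals against time: a systematic pattern would suggest auto-correlation.
plt.scatter(month, model.resid)
plt.axhline(0, linestyle="--")
plt.xlabel("Month of earthquake")
plt.ylabel("Residual")
plt.show()

# The Durbin-Watson statistic complements the plot: values near 2 indicate
# no first-order auto-correlation.
print(durbin_watson(model.resid))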
(d) In the revised model, we eliminate the variable corresponding to the time since the most recent earthquake. The resulting regression formula is predicted month number of next earthquake = 170 - 9.7 x time since second most recent earthquake. The R2 value is 0.51, but the significance F value is 0.009, indicating that the model is statistically valid. The coefficients pass Student's t-test and there is no evidence of auto-correlation.
6.5
(a) We use the following regression model: Predicted number of defective
shafts = b0 + b1 x batch size. After running the regression on Excel, we get
the following summary report:
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.98
R Square 0.95
Adjusted R Square 0.95
Standard Error 7.56
Observations 30
ANOVA
df SS MS F Significance F
Regression 1 32744.46 32744.46 572.90 0.00
Residual 28 1600.34 57.16
Total 29 34344.80
(b) The R2 value is 0.95, which is very close to one, validating the model. The t-statistics are also large, indicating that the coefficients are statistically significant. The linear model is a good fit to the data, but it is not the best fit. By looking at the scatter plot of the two variables, as shown below, there seems to be a quadratic relation between the two variables.
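A sketch of the quadratic alternative suggested in part (b), assuming the data sit in a DataFrame shafts with columns batch_size and defective (the names are ours):

import statsmodels.formula.api as smf

linear = smf.ols("defective ~ batch_size", data=shafts).fit()
quadratic = smf.ols("defective ~ batch_size + I(batch_size ** 2)", data=shafts).fit()

# If the scatter plot really is curved, the quadratic fit should show a higher
# adjusted R Square and a statistically significant squared term.
print(linear.rsquared_adj, quadratic.rsquared_adj)
print(quadratic.params)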
6.6
(a) The regression equation produced by Jack's regression model is predicted
sales = 13,707.14 + 37.34 x hops + 1,319.27 x malt + 0.05 x advertising -
63.17 x bitterness + 53.23 x investment.
(b) The degrees of freedom are 50 - 5 - 1 = 44. Since this is more than 30, we
use the normal distribution to find a Z-factor of 1.96, corresponding to
95% confidence. The confidence intervals for each coefficient are shown
in the table below.
From the table it follows that the intervals corresponding to the variables hops,
bitterness, and initial investment contain zero, and so, those coefficients
are not statistically significant.
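As a reminder of how each interval in part (b) is built (our notation, using the normal approximation mentioned above):

b_i \pm z_{0.975} \cdot \mathrm{SE}(b_i) \;=\; b_i \pm 1.96 \cdot \mathrm{SE}(b_i),

and a coefficient is judged statistically significant at the 5% level exactly when its interval excludes zero.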
(c) Since the R2 value is 0.88, we can safely say that the model is valid. As indicated before, the coefficients for the variables hops, bitterness, and initial investment are not significant. We can eliminate those variables to obtain a better model.
(d) The new regression model only has the independent variables malt and
annual advertising. The new regression equation is predicted sales =
-14,162 + 1,401.13 x malt + 0.05 x advertising. The R2 value for the new
model is 0.88 and all of the coefficients are significant.
(e) The predictions of the annual sales of each new beer are summarized in
the table below.
(f) Since the amounts given for malt and annual advertising are within the range of the data used to build the model, I would recommend using the regression model to predict sales of the new beer Final Excalibur. The sales forecast is -14,162 + 1,401.13 x 7 + 0.05 x 150,000 = 3,145.91 thousands of dollars.
6.7
(a) The degrees of freedom are 67 - 4 - 1 = 62.
(b) Since the degrees of freedom are more than 30, then we use the normal
distribution to find the Z-factor, which in this case is 1.96. The
corresponding 95% confidence intervals are shown below.
(e) I would look at a plot of the residuals as a function of time, from January 1989 to July 1994. If the resulting chart shows an apparent pattern, then there is an auto-correlation problem in the model.
6.8
(a) We use the following model: predicted market value = b0 + b1 x total
assets + b2 x total sales + b3 x number of employees. The following is the
summary output from running the regression model:
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.8
R Square 0.6
Adjusted R Square 0.5
Standard Error 637.1
Observations 15
ANOVA
df SS MS F Significance F
Regression 3 6488609.4 2162869.8 5.3 0.0
Residual 11 4464258.2 405841.7
Total 14 10952867.6
6.9
(a) For given data x1, …, xn and y1, …, yn, let f(b0, b1) be the residual sum of squares. Setting the gradient of f equal to zero yields the normal equations, and solving them gives the formula for b1.
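A sketch of the standard computation, in our notation:

f(b_0, b_1) = \sum_{i=1}^n (y_i - b_0 - b_1 x_i)^2,

\frac{\partial f}{\partial b_0} = -2 \sum_{i=1}^n (y_i - b_0 - b_1 x_i), \qquad
\frac{\partial f}{\partial b_1} = -2 \sum_{i=1}^n x_i (y_i - b_0 - b_1 x_i).

Setting both partial derivatives to zero gives b_0 = \bar{y} - b_1 \bar{x} and

b_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}.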
(b) It follows from the following argument:
(a) Using the data in the file OILPLUS.XLS, we ran a regression in Excel and obtained the summary report shown below. Based on this report, we obtain predicted heating oil consumption for next December = 109 - 1.24 x 35.2 = 65.352, that is, about 65,352 gallons (consumption is measured in thousands of gallons). (A better model can be obtained by using regression to find a formula for oil consumption in terms of temperature and temperature^2.)
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.83
R Square 0.69
Adjusted R Square 0.68
Standard Error 13.52
Observations 55
ANOVA
df SS MS F Significance F
Regression 1 21386.84 21386.84 117.02 5.01324E-15
Residual 53 9686.03 182.76
Total 54 31072.87
(b) The forecast based on regression accounts for the variation in the data due to changes in temperature. It is clear that oil consumption depends on the temperature, so it is hard to accept that consumption will always be the same (75.60) regardless of the temperature.
(c) The value of R2 is 0.69, which is reasonably close to 1, indicating that the model is empirically valid, though not overwhelmingly so. Using the standard error, we find a 95% confidence interval for the temperature coefficient of [-1.46, -1.01]. Since this interval does not contain zero, the coefficient corresponding to temperature is clearly significant. The scatter plot of the residuals shows more residual dispersion at lower temperatures and less dispersion at higher temperatures, so there might be a problem of heteroscedasticity.
(d) The R2 value is not bad, but there may be a better model with a higher value. I would recommend exploring other independent variables or trying nonlinear models, as sketched below.
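A sketch of the nonlinear alternative mentioned in parts (a) and (d), assuming OILPLUS.XLS is loaded into a DataFrame with columns temperature and consumption (the column names are ours):

import pandas as pd
import statsmodels.formula.api as smf

oil = pd.read_excel("OILPLUS.XLS")    # the 55 monthly observations

linear = smf.ols("consumption ~ temperature", data=oil).fit()
quadratic = smf.ols("consumption ~ temperature + I(temperature ** 2)", data=oil).fit()

# A clearly higher adjusted R Square and a significant squared term would
# support the temperature + temperature^2 model over the linear one.
print(linear.rsquared_adj, quadratic.rsquared_adj)
print(quadratic.summary())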
EXECUTIVE COMPENSATION
(a) The variables change in stock price and change in sales do not seem very relevant for determining the compensation of a CEO. They may affect bonuses, stock options, and indirect compensation, but the main portion of the salary is probably not determined by changes in these two variables. On the other hand, the model does not consider other variables such as years of experience prior to the current position (inside and outside the company), education other than whether or not the CEO holds an MBA, knowledge or experience of the industry immediately related to the company, the average CEO compensation in the industry, and so on.
(b) After running the regression in Excel, we obtain the following summary output:
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.87
R Square 0.75
Adjusted R Square 0.73
Standard Error 422.40
Observations 50
ANOVA
df SS MS F Significance F
Regression 4 23896133.23 5974033.31 33.48 5.79962E-13
Residual 45 8029082.79 178424.06
Total 49 31925216.02
(c) The R2 value is 0.75, indicating a good model. The coefficients corresponding to
the variables stock change, sales change, and the intercept are not statistically
significant. The following is the correlation matrix:
There is high correlation between the independent variables stock change and
years in position, indicating a possible multicollinearity problem.
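A sketch of the multicollinearity check in part (c), assuming the predictors sit in a DataFrame ceo_data with the column names used below (all names are ours):

import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = ceo_data[["stock_change", "sales_change", "mba", "years_in_position"]]
print(X.corr())                        # pairwise correlations, as in the matrix above

# Variance inflation factors: values well above 5-10 flag multicollinearity.
Xc = sm.add_constant(X)
vif = pd.Series(
    [variance_inflation_factor(Xc.values, i) for i in range(1, Xc.shape[1])],
    index=X.columns,
)
print(vif)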
(d) The best model we found has the variables years in current position and MBA as independent variables. The R2 of this model is 0.74. Both variables are significant at the 95% level. Since the correlation between these two variables is small (0.43), we can safely say that there are no multicollinearity problems. Furthermore, the residuals follow an approximately normal distribution, and there are no apparent heteroscedasticity problems. Only the intercept is not statistically significant in our model. To summarize, we propose the model predicted CEO compensation = 207.5 x years in current position + 307.2 x MBA.
(e) As mentioned in part (a), there are other factors that are critical in determining the compensation of a CEO. According to the model presented in the previous item, having an MBA adds 307.2 (in the units in which compensation is measured) to the predicted CEO compensation. Therefore, we think that having an MBA has an effect on CEO compensation.
(a) After analyzing the summary output provided in this case, we notice that the
variables EMPL, total, P25, P35, P45, P55, COMP, NCOMP, and CLI are not
statistically significant. Furthermore, by looking at the correlation matrix
provided below, we notice high correlation between pairs of variables taken from
total, P15, P25, P35, P45, and P55, indicating multicollinearity problems. Also
notice that even though the variable PRICE is statistically significant, it has a very
small correlation with the variable EARN.
EARN SIZE EMPL total P15 P25 P35 P45 P55 INC COMP NCOMP NREST PRICE CLI
EARN 1.00
SIZE 0.44 1.00
EMPL -0.11 0.05 1.00
total 0.59 -0.02 -0.10 1.00
P15 0.63 -0.05 -0.10 0.96 1.00
P25 0.23 -0.08 -0.02 0.58 0.42 1.00
P35 0.63 -0.03 -0.12 0.96 0.98 0.43 1.00
P45 0.63 -0.02 -0.11 0.96 0.98 0.41 0.99 1.00
P55 0.40 0.06 -0.09 0.77 0.68 0.29 0.67 0.65 1.00
INC 0.46 0.18 0.09 0.11 0.15 0.02 0.14 0.14 0.01 1.00
COMP -0.14 -0.17 0.12 -0.14 -0.11 -0.01 -0.12 -0.13 -0.20 -0.08 1.00
NCOMP 0.11 -0.02 0.11 0.07 0.07 0.10 0.07 0.08 0.01 0.17 0.16 1.00
NREST 0.34 -0.10 -0.16 0.05 0.07 0.01 0.10 0.09 -0.02 -0.06 0.11 0.01 1.00
PRICE -0.18 0.07 0.08 0.04 -0.03 0.08 -0.01 -0.01 0.15 0.00 -0.30 -0.20 -0.06 1.00
CLI 0.04 0.05 0.14 0.21 0.21 0.09 0.20 0.23 0.15 0.10 0.02 -0.01 -0.29 0.26 1.00
Combining all of these ideas, we end up with a regression model with only four
independent variables: SIZE, P15, INC, and NREST. After running the regression
based on this model, we obtain the following summary report.
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.91
R Square 0.83
Adjusted R Square 0.81
Standard Error 39.47
Observations 60
ANOVA
df SS MS F Significance F
Regression 4 406495.50 101623.88 65.25 3.12098E-20
Residual 55 85666.22 1557.57
Total 59 492161.72
Notice that the R2 of this model is 0.83, and all of the coefficients are significant.
There are no multicollinearity problems and no apparent heteroscedasticity
problems.
(b) Using the target performance ratio of 26%, our model would recommend opening two of the three stores that actually attained the target in 1994. Also,
our model would not recommend opening the stores that did not actually attain the target performance ratio.
(c) The results of applying our model to the new stores are shown below. According to these results, the recommendation based strictly on the 26% target would be to open only the store located in Toulouse. However, the store located in Dijon has a predicted performance ratio very close to the target, so we also recommend opening the store in Dijon.
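A sketch of how part (c) could be scored programmatically, assuming the fitted EARN regression is stored in model, the candidate stores are in a DataFrame new_stores containing the predictor columns plus the capital invested in a column K, and the performance ratio is operating earnings divided by capital invested; all of these names and the ratio definition are our assumptions.

import pandas as pd

# Predicted operating earnings for each candidate store.
predicted_earn = model.predict(new_stores)

# Implied performance ratio and comparison with the 26% target.
performance_ratio = predicted_earn / new_stores["K"]
recommend = performance_ratio >= 0.26
print(pd.DataFrame({"predicted_ratio": performance_ratio, "open": recommend}))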
(d) The relative strength of our model is that it is a very simple model in terms of the number of independent variables required to predict operating earnings. It is statistically sound, based on the data provided through 1994, and it might be improved by adding the data from 1995. The major weakness of our regression model is that it underestimates the real performance ratio when this ratio is close to the target. It is also subject to the standard criticisms of regression models; for instance, the model could produce poor predictions if used to extrapolate beyond the range of the data.
Our analysis in this case considers the separate regression models for the GM data
and IBM data using the 12 independent variables under consideration. The
summary output for the regression model of the GM data is:
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.38
R Square 0.15
Adjusted R Square -0.03
Standard Error 0.08
Observations 72
ANOVA
df SS MS F Significance F
Regression 12 0.07 0.01 0.85 0.60
Residual 59 0.41 0.01
Total 71 0.48
The summary output for the regression model of the IBM data is:
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.66
R Square 0.43
Adjusted R Square 0.31
Standard Error 0.06
Observations 72
ANOVA
df SS MS F Significance F
Regression 12 0.18 0.02 3.70 0.00
Residual 59 0.24 0.00
Total 71 0.42
We also take into consideration the correlation matrices for both sets of data (not
shown).
Based on our analysis, we conclude that the variables ROE, previous 6-month return, S.D. of stock returns, and V/MC are statistically significant for at least one of the two sets of data. The other variables are not significant. Furthermore, there is high correlation between S.D. of stock returns and V/MC; to avoid multicollinearity, and because V/MC has the higher correlation with the dependent variable (returns), we eliminate S.D. of stock returns. Therefore, our final model has only three independent variables: ROE, previous 6-month return, and V/MC.
To make the predictions, we use two regression equations, one for each set of
data. For the GM data we use the model predicted return = -0.01 - 0.02 x ROE -
0.06 x 6-month + 0.01 x V/MC. For the IBM data we use the model predicted
return = -0.04 + 0.36 x ROE - 0.06 x 6-month + 0.05 x V/MC.
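A sketch of how the two company-specific equations could be produced, assuming each company's history is a DataFrame with columns ret, roe, ret6m, and v_mc (all names are ours):

import statsmodels.formula.api as smf

models = {}
for name, df in {"GM": gm_data, "IBM": ibm_data}.items():
    # One regression per company on the three retained predictors.
    models[name] = smf.ols("ret ~ roe + ret6m + v_mc", data=df).fit()
    print(name, models[name].params)

# The monthly predictions below can then be generated with, for example,
# models["GM"].predict(new_gm_rows) and models["IBM"].predict(new_ibm_rows).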
The final return predictions for the two companies in the requested months are:
GM IBM
Date Return Date Return
960131 -0.53% 960131 4.78%
960228 0.01% 960228 7.38%
960331 0.27% 960331 3.68%
960428 -0.44% 960428 2.68%
960531 -1.09% 960531 4.05%
960630 -0.48% 960630 1.33%