MULTIPLE REGRESSION
1. A good predictor variable is related to the dependent variable but not too highly related to other predictor variables.
3. The net regression coefficient measures the average change in the dependent variable per
unit change in the relevant independent variable, holding the other independent variables
constant.
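A minimal sketch (not part of the original answer, using made-up data) of what "holding the other independent variables constant" means in practice: the fitted coefficient on x1 is read as the change in y per unit change in x1 with x2 fixed.

import numpy as np

# Made-up data: y is built from x1 (slope 2.0) and x2 (slope -0.5) plus noise.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = 3.0 + 2.0 * x1 - 0.5 * x2 + np.array([0.1, -0.1, 0.05, -0.05, 0.02, -0.02])

X = np.column_stack([np.ones_like(x1), x1, x2])  # intercept, x1, x2
b, *_ = np.linalg.lstsq(X, y, rcond=None)        # least squares coefficients
print(b)  # b[1] is the net regression coefficient for x1 (close to 2.0)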
b. The bottom half of a correlation matrix is the same as the top half.
f. Models that include variables 4 and 6 or variables 2 and 5 should be best. The
predictor variables in these models are related to the dependent variable and not
too highly related to each other.
g. Variable 5.
9.
a. Both predictor variables are significantly related to the dependent variable. The predictor variables are also highly correlated with each other, indicating potential multicollinearity.
b. When income is increased by one thousand dollars, holding family size constant, the average increase in annual food expenditures is 228 dollars. When family size is increased by one person, holding income constant, the average decrease in annual food expenditures is 41 dollars. Since family size is positively correlated with food expenditures (r = .737), a decrease does not make sense.
11. a. Scatter diagram below. Female drivers indicated by solid circles, male drivers by diamonds.
b. The regression equation is: Y = 25.5 - 1.04 X1 + 1.21 X2
For a given age of car, female drivers can expect to get about 1.2 more miles per gallon than male drivers.
d.
Correlation matrix (lower triangle; columns are Sales, Outlets, Auto):
          Sales   Outlets   Auto
Outlets   0.739
Auto      0.548   0.670
Income    0.936   0.556    0.281
Analysis of Variance
Source DF SS MS F P
Regression 3 1843.40 614.47 86.32 0.000
Error 7 49.83 7.12
Total 10 1893.23
Source DF Seq SS
Outlets 1 1033.84
Auto 1 9.83
Income 1 799.74
c. The standard error of estimate has been reduced to 2.67 and the R2 has increased
to 97.4%. The estimate made in part (b) should be quite accurate.
d. The number of retail outlets (X3) and personal income (X4) would be included in the best model. The equation is Y = -4.02690 + .62092X3 + .43017X4 and would explain 96.5% of the variance in sales. The number of automobiles registered would be dropped to eliminate the collinearity.
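As a hedged illustration of the collinearity check behind dropping a predictor (the column names below are hypothetical placeholders, not the problem's data), variance inflation factors flag the redundant variable:

import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vif_table(predictors: pd.DataFrame) -> pd.Series:
    # Add an intercept column, then compute a VIF for each predictor.
    X = np.column_stack([np.ones(len(predictors)), predictors.values])
    return pd.Series(
        [variance_inflation_factor(X, i) for i in range(1, X.shape[1])],
        index=predictors.columns,
    )

# Usage (hypothetical names): vif_table(df[["autos", "outlets", "income"]])
# A large VIF for "autos" would support dropping it from the model.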
15. a. Scatter plot for cash purchases versus number of items (rectangles) and credit card purchases versus number of items (solid circles).
b. Minitab regression output follows
Notice that for a given number of items, sales from cash purchases are estimated to be
about $18.60 less than gross sales from credit card purchases.
c. The regression in part b is significant. The number of items sold and
whether the purchases were cash or credit card explains approximately
83% of the variation in gross sales. The predictor variable Items is clearly
significant. The coefficient of the dummy variable X2 is significantly
different from 0 at the 10% level but not at the 5% level. From the
residual plots below we see that there are a few large residuals (see, in
particular, cash sales for day 25 and credit card sales for day 1); but
overall, the plots do not indicate any serious departures from the usual
regression assumptions.
e. The fitted function in part b is effectively two parallel straight lines given by the equations:
Cash purchases: Y = 13.61 + 5.99Items – 18.6(1) = -4.98 + 5.99Items
Credit card purchases: Y = 13.61 + 5.99Items
Fitting a separate equation gives, for credit card purchases: Y = 10.02 + 6.46Items, R2 = 66.0%
Predictions for cash sales and credit card sales will not be too much different
for the two procedures (one prediction equation or two individual equations).
In terms of R2, the single equation model falls between the fits of the separate
models for cash purchases and credit card purchases but closer to the higher
number for cash purchases. For convenience and overall good fit, prefer the
single equation with the dummy variable.
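A short sketch of a dummy-variable fit like the one in part b, using invented stand-in numbers rather than the problem's sales data (statsmodels assumed available). The indicator shifts only the intercept, which is exactly what produces the two parallel lines of part e.

import pandas as pd
import statsmodels.formula.api as smf

# Invented stand-in data: gross sales, items sold, cash indicator (1 = cash).
df = pd.DataFrame({
    "sales": [52.1, 40.3, 77.8, 66.4, 31.2, 58.9],
    "items": [10, 7, 14, 12, 6, 11],
    "cash":  [1, 0, 1, 0, 1, 0],
})
fit = smf.ols("sales ~ items + cash", data=df).fit()
print(fit.params)  # the "cash" coefficient shifts the intercept only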
19. Stepwise regression results, with significance level .05 to enter and leave the
regression function, follow.
Step 1
Constant -26.24
X3 31.4
T-Value 3.30
P-Value 0.004
S 14.6
R-Sq 37.71
R-Sq(adj) 34.25
The “best” regression model relates final exam score to the single predictor
variable grade point average.
Predictor Variables      R2
X1                      .295
X2                      .301
X3                      .377
X1, X2                  .404
X1, X3                  .452
X2, X3                  .460
X1, X2, X3              .498
The R2 criterion would suggest using all three predictor variables. However, the results in Problem 7.17 suggest there is a multicollinearity problem with three predictors. The best two-predictor model uses X2 and X3. When this model is fit, X2 is not required. We end up with a model involving the single predictor X3, the model selected by the stepwise procedure.
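A sketch of the all-subsets comparison that produces a table like the one above; X and y are placeholders for the problem's predictor frame and the exam scores.

from itertools import combinations
import statsmodels.api as sm

def all_subsets_r2(X, y):
    # Fit every nonempty subset of predictors and record its R-squared.
    results = {}
    cols = list(X.columns)
    for k in range(1, len(cols) + 1):
        for subset in combinations(cols, k):
            Xs = sm.add_constant(X[list(subset)])
            results[subset] = sm.OLS(y, Xs).fit().rsquared
    return results

Since R2 never decreases when a predictor is added, the full model always tops such a table; that is why the stepwise procedure's significance checks (or adjusted R2) are needed to pick a sensible model.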
CHAPTER 8
ANSWERS TO ODD NUMBERED PROBLEMS
1. The residuals are not independent from one observation to the next. So knowledge of
the error in one time period helps an analyst anticipate the error in the next time period.
5. Reject H0 if DW < 1.10. Since 1.0 < 1.10, reject H0 and conclude that the residuals are positively autocorrelated.
7. Serial correlation can be eliminated by correct specification of the equation (using the
best predictor variables). Using the regression of percentage changes, autoregressive
models, first differencing, and the iterative approach can also eliminate serial
correlation.
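For reference, the Durbin-Watson statistic used throughout this chapter is easy to compute from a residual series; a minimal sketch (statsmodels.stats.stattools.durbin_watson computes the same quantity):

import numpy as np

def durbin_watson(residuals):
    # DW = sum of squared successive differences / sum of squared residuals.
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

Values near 2 suggest no first order serial correlation; values well below 2 point toward positive autocorrelation.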
Analysis of Variance
SOURCE DF SS MS F p
Regression 2 340.29 170.15 32.87 0.000
Error 13 67.29 5.18
Total 15 407.58
SOURCE DF SEQ SS
PRICE 1 307.49
POP 1 32.81
Using the .05 significance level for a sample of 16 with 2 predictor variables, the
decision rule is:
Reject the null hypothesis if the calculated Durbin-Watson statistic is less than .98; fail to reject if it is greater than 1.54; if it lies between .98 and 1.54, the test is inconclusive.
Since the test statistic computed from the sample data is above the upper critical value from the table (2.37 > 1.54), fail to reject the null hypothesis. Serial correlation is not a problem.
11. Serial correlation is not a problem. However, it is interesting to see whether the students
realize that collinearity is a problem.
Analysis of Variance
Source DF SS MS F P
Regression 3 404256 134752 1.88 0.161
Error 24 1724481 71853
Total 27 2128736
Source DF Seq SS
Use 1 1195
Charge 1 372102
Customer 1 30959
Predictor Coef SE Coef T P
Use 0.003284 0.001039 3.16 0.004
Charge 32.7488 0.8472 38.66 0.000
Analysis of Variance
Source DF SS MS F P
Regression 2 76938 38469 774.66 0.000
Error 25 1241 50
Total 27 78180
Source DF Seq SS
Use 1 2734
Charge 1 74204
13. a.
b. No
c. Using the exponential model
d.
e. No
f. Autocorrelation
g. Yt = 22.6459(1.08655)^t
15.
ROW C1 C2 C3 C4
1 16.3 0 0 0
2 17.7 1 0 0
3 28.1 0 1 0
4 34.3 0 0 1
1996 (3rd quarter): Y = 19.3 - 1.43(0) + 11.2(1) + 33.3(0) = 30.5
The model explains 80.1% of the sales variable variance. Autocorrelation does not
appear to be a problem.
Analysis of Variance
Source DF SS MS F P
Regression 1 1164598 1164598 20.27 0.000
Residual Error 18 1034389 57466
Total 19 2198987
The results of the regression involving simple differences are close to the results obtained by the method of generalized differences in Example 8.5. The estimated slope coefficient is 9.16 versus an estimated slope coefficient of 9.26 obtained with generalized differences. The intercept coefficient 149 is also somewhat consistent with the intercept coefficient 54483(1 - .997) = 163 for the generalized differences procedure. We would expect the two methods to produce similar results.
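A sketch of the simple-differences approach compared here; y and x stand in for the problem's series, and the slope from this fit is the quantity (9.16 above) compared with the generalized differences slope (9.26).

import numpy as np
import statsmodels.api as sm

def fit_first_differences(y, x):
    # Regress the first differences of y on the first differences of x.
    dy = np.diff(np.asarray(y, dtype=float))
    dx = np.diff(np.asarray(x, dtype=float))
    return sm.OLS(dy, sm.add_constant(dx)).fit()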
regression, autoregressive models, and ARIMA models.
d.
e.
May 31 (2004) Y = 421.4 + .85273(2118) = 2227.5 compared to 2150
Aug 31 (2004) Y = 421.4 + .85273(2221) = 2315.3 compared to 2350
Nov 30 (2004) Y = 421.4 + .85273(2422) = 2486.7 compared to 2600
Feb 28 (2005) Y = 421.4 + .85273(3239) = 3183.3 compared to 3400
CHAPTER 9
BOX-JENKINS (ARIMA) METHODOLOGY
1. a. 0 ± .196
b. The series is random.
3. b. Ŷ52 = 76.55, Ŷ53 = 84.45
c. 75.65
5. a. MA(2)
b. AR(1)
c. ARIMA(1,0,1)
If an ARIMA model is fit to the demand data, the autocorrelations and plots of the original series and of the first differenced series suggest that an ARIMA(0,1,1) model with a constant term might be a good starting point. The first order moving average term is suggested by the significant autocorrelation at lag 1 for the first differenced series.
The least squares estimate of the constant term, .7127, is virtually the same as the
least squares slope coefficient in the straight line fit shown in part a. Also, the
first order moving average coefficient is essentially 1. These two results
are consistent with a straight line time trend regression model for the original
data.
Suppose Yt is demand in time period t. The straight line time trend regression model is Yt = β0 + β1t + εt. Thus Yt-1 = β0 + β1(t - 1) + εt-1 and Yt - Yt-1 = β1 + εt - εt-1. The latter is an ARIMA(0,1,1) model with a constant term (the slope coefficient in the straight line model) and a first order moving average coefficient of 1.
There is some residual autocorrelation (particularly at lag 2) for both the straight
line fit and the ARIMA(0,1,1) fit, but the usual residual plots indicate no other
problems.
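A hedged sketch of the equivalence just derived, fitting an MA(1) with a constant to the first differences of a stand-in trend-plus-noise series (the demand data are not reproduced here). Note that statsmodels writes the MA term with the opposite sign convention (εt + θεt-1), so an estimate near -1 here corresponds to the coefficient of 1 in the text's εt - θεt-1 form.

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(1)
t = np.arange(52.0)
demand = 10 + 0.7 * t + rng.normal(size=52)  # stand-in: straight line trend plus noise

# ARIMA(0,1,1) with a constant, written as MA(1) plus constant on the first
# differences, matching Yt - Yt-1 = beta1 + et - et-1.
fit = ARIMA(np.diff(demand), order=(0, 0, 1), trend="c").fit()
print(fit.params)  # const near 0.7 (the trend slope); ma.L1 near -1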
d. The forecasts for the next four periods from forecast origin t = 52 for the
ARIMA model follow.
9. Since the autocorrelation coefficients trail off and the partial autocorrelation
coefficients drop off to zero after one time lag, an AR(1) model should be
adequate. The best model is
Ŷt = 109.628 - 0.9377Yt-1
Ŷ81 = 109.628 - 0.9377Y80
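The forecasts are generated recursively from this equation, each forecast feeding back in as the previous value; taking Y80 = 85 reproduces the forecast table shown below.

def ar1_forecasts(y_last, steps, b0=109.628, b1=-0.9377):
    out = []
    for _ in range(steps):
        y_last = b0 + b1 * y_last  # the forecast becomes the next lagged value
        out.append(round(y_last, 2))
    return out

print(ar1_forecasts(85.0, 3))  # about [29.92, 81.57, 33.14]; oscillates since b1 < 0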
[Plot: Autocorrelation Function for Yt]
[Plot: Partial Autocorrelation Function for Yt]
Number of observations: 80
Residuals: SS = 2325.19 (backforecasts excluded)
MS = 29.81 DF = 78
Modified Box-Pierce (Ljung-Box) Chi-Square statistic
Lag 12 24 36 48
Chi-Square 24.8(DF=11) 39.4(DF=23) 74.0(DF=35) 83.9(DF=47)
95 Percent Limits
Period Forecast Lower Upper Actual
81 29.9234 19.2199 40.6269
82 81.5688 66.8957 96.2419
83 33.1408 15.7088 50.5728
The critical chi-square for 11 df's is 19.68. Since the calculated chi-square for
the residual autocorrelations equals 24.8, the model is deemed inadequate.
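A sketch of this adequacy check with statsmodels; resid below is a random placeholder for the fitted model's residual series. acorr_ljungbox returns the modified Box-Pierce (Ljung-Box) statistic quoted above.

import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

resid = np.random.default_rng(0).normal(size=80)  # placeholder residuals
# lags=[12] gives the statistic through lag 12; model_df=1 subtracts one df
# for the estimated AR(1) coefficient, leaving 11 df as in the text.
print(acorr_ljungbox(resid, lags=[12], model_df=1))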
11. The slow decline in the early, nonseasonal lags indicates the need for regular
differencing.
[Plot: Autocorrelation Function for the original series]
Lag Corr T LBQ Lag Corr T LBQ Lag Corr T LBQ Lag Corr T LBQ
1 0.71 6.92 49.43 8 0.54 2.07 309.89 15 0.40 1.22 506.38 22 0.23 0.64 602.29
2 0.63 4.34 88.66 9 0.50 1.85 337.18 16 0.40 1.20 525.09 23 0.26 0.73 610.88
3 0.63 3.69 128.66 10 0.45 1.61 359.38 17 0.42 1.26 546.38 24 0.42 1.18 634.13
[Plot: Autocorrelation Function for Regular (regularly differenced series)]
Lag Corr T LBQ Lag Corr T LBQ Lag Corr T LBQ Lag Corr T LBQ
1 -0.35 -3.44 12.19 8 -0.01 -0.11 25.78 15 -0.03 -0.19 90.84 22 -0.13 -0.72 109.87
2 -0.17 -1.49 15.08 9 0.05 0.40 26.06 16 -0.05 -0.28 91.10 23 -0.25 -1.43 118.13
3 0.01 0.07 15.09 10 -0.17 -1.35 29.20 17 0.25 1.47 98.33 24 0.54 2.98 156.06
4 -0.03 -0.23 15.16 11 -0.29 -2.22 38.14 18 -0.24 -1.38 105.13 25 -0.14 -0.71 158.67
The nonseasonal part of the series now seems to be stationary, and the peaks at lags 12 and 24 are apparent. The seasonal autocorrelation coefficients seem to be decaying slowly, so seasonal differencing is necessary. The autocorrelation coefficient and partial autocorrelation coefficient plots for the seasonally differenced data are shown below.
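The two differencing steps described here are one-liners in pandas; a minimal sketch with a placeholder series y:

import pandas as pd

y = pd.Series(range(48), dtype=float)  # placeholder for the monthly data
regular = y.diff(1)                    # regular (first) differencing
seasonal = regular.diff(12)            # seasonal differencing, period 12
print(seasonal.dropna().head())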
[Plot: Autocorrelation Function for Seasonal (seasonally differenced series)]
Lag Corr T LBQ Lag Corr T LBQ Lag Corr T LBQ Lag Corr T LBQ
1 -0.49 -4.44 20.42 8 0.07 0.48 23.42 15 -0.03 -0.15 61.85 22 0.02 0.12 70.43
2 -0.03 -0.19 20.47 9 0.01 0.08 23.43 16 -0.11 -0.66 63.13 23 -0.02 -0.12 70.48
3 0.04 0.30 20.61 10 -0.07 -0.50 23.89 17 0.21 1.22 67.63 24 0.02 0.10 70.51
4 0.03 0.23 20.70 11 0.27 2.00 31.19 18 -0.13 -0.78 69.58 25 0.03 0.20 70.65
[Plot: Partial Autocorrelation Function for the seasonally differenced series]
Concentrating on the nonseasonal lags, the autocorrelation coefficients drop off after one time lag and the partial autocorrelation coefficients trail off, so an MA(1) term should be adequate. Concentrating on the seasonal lags (12 and 24), the autocorrelation coefficients drop off from 12 to 24 and the partial autocorrelation coefficients trail off, so a seasonal MA(1) term should be adequate. The best model should be an ARIMA(0,1,1)(0,1,1)12.
[ARIMA model output for Yt]
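A hedged sketch of fitting the identified model with statsmodels (y below is a random placeholder for the monthly series; the output quoted in this solution came from Minitab):

import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

y = np.random.default_rng(0).normal(size=96).cumsum() + 20  # placeholder series

# ARIMA(0,1,1)(0,1,1)12: one regular and one seasonal difference with
# nonseasonal and seasonal MA(1) terms.
fit = SARIMAX(y, order=(0, 1, 1), seasonal_order=(0, 1, 1, 12)).fit(disp=False)
print(fit.get_forecast(12).summary_frame(alpha=0.05))  # 95 percent limits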
95 Percent Limits
Period Forecast Lower Upper Actual
97 163500 146991 180009
98 158300 141277 175322
99 177084 159562 194606
100 178792 160785 196798
101 188706 170227 207185
102 184846 165907 203785
103 191921 172532 211310
104 188746 168918 208574
105 185194 164936 205451
106 187669 166991 208348
107 188084 166993 209175
108 221521 200025 243016
The critical chi-square for 10 df's is 18.3. Since the calculated chi-square for the
residual autocorrelations equals 3, the model is deemed adequate.
13. One question that might arise is whether the student should use the first 145 observations or all 150 observations. In the actual circumstance, it will not make much difference.
The autocorrelation coefficient plot indicates that the data are nonstationary.
Therefore, the data should be first differenced.
[Plot: Autocorrelation Function for the original series]
Lag Corr T LBQ Lag Corr T LBQ Lag Corr T LBQ Lag Corr T LBQ
1 0.53 6.43 42.20 13 0.19 1.13 269.28 25 0.17 0.95 322.51 37 -0.09 -0.49 346.51
2 0.51 4.97 81.55 14 0.25 1.44 279.58 26 0.16 0.89 327.38
3 0.42 3.59 109.07 15 0.13 0.76 282.53 27 0.15 0.84 331.77
4 0.41 3.21 134.96 16 0.20 1.13 289.20 28 0.02 0.13 331.88
5 0.37 2.75 156.75 17 0.21 1.16 296.43 29 0.09 0.48 333.36
6 0.38 2.65 179.11 18 0.20 1.15 303.67 30 0.01 0.05 333.37
7 0.43 2.88 208.20 19 0.10 0.53 305.26 31 0.10 0.56 335.40
8 0.36 2.27 228.45 20 0.14 0.75 308.48 32 0.16 0.87 340.46
9 0.32 1.95 244.61 21 0.04 0.22 308.75 33 0.11 0.60 342.88
10 0.24 1.43 253.73 22 0.11 0.60 310.84 34 0.05 0.26 343.35
11 0.18 1.05 258.87 23 0.10 0.56 312.67 35 -0.08 -0.45 344.76
12 0.16 0.95 263.15 24 0.16 0.86 317.10 36 -0.02 -0.10 344.84
The autocorrelation coefficient and partial autocorrelation coefficient plots for the first differenced data are shown below.
[Plot: Autocorrelation Function for Diff. (first differenced series)]
Lag Corr T LBQ Lag Corr T LBQ Lag Corr T LBQ Lag Corr T LBQ
1 -0.48 -5.84 34.76 13 -0.03 -0.27 41.25 25 0.03 0.23 70.52 37 -0.09 -0.74 109.44
2 0.07 0.70 35.49 14 0.18 1.76 46.60 26 -0.00 -0.02 70.52
3 -0.07 -0.72 36.27 15 -0.19 -1.80 52.47 27 0.13 1.12 73.43
4 0.02 0.17 36.32 16 0.06 0.57 53.09 28 -0.20 -1.78 80.98
5 -0.04 -0.42 36.60 17 0.01 0.08 53.10 29 0.15 1.29 85.15
6 -0.05 -0.51 37.00 18 0.11 1.08 55.36 30 -0.18 -1.57 91.51
7 0.13 1.33 39.78 19 -0.15 -1.42 59.36 31 0.03 0.29 91.74
It appears that the autocorrelations drop off after lag one and that the partial
autocorrelations trail off to zero. This suggests an MA(1) model. If 145
observations are used, the model is
Ŷt = Yt-1 - 0.7179εt-1
The critical chi-square for 23 df's is 36.4 at the 5% significance level. Since the calculated chi-square for the residual autocorrelations equals 29.5, the model is deemed adequate.
15.
[Plot: Autocorrelation Function for Price]
Lag Corr T LBQ Lag Corr T LBQ Lag Corr T LBQ Lag Corr T LBQ
1 0.96 10.56 114.31 10 0.63 1.93 800.80 19 0.29 0.76 1048.79 28 -0.06 -0.16 1068.56
2 0.92 5.98 220.03 11 0.59 1.76 847.89 20 0.24 0.64 1057.31 29 -0.08 -0.22 1069.71
3 0.88 4.50 316.31 12 0.56 1.61 889.73 21 0.19 0.50 1062.65 30 -0.11 -0.29 1071.65
4 0.83 3.69 403.83 13 0.51 1.45 925.67 22 0.14 0.37 1065.69
The autocorrelation coefficient plot indicates that the data are nonstationary.
Therefore, the data should be first differenced.
[Plot: Autocorrelation Function for Diff. (first differenced series)]
Lag Corr T LBQ Lag Corr T LBQ Lag Corr T LBQ Lag Corr T LBQ
1 0.07 0.76 0.59 10 0.03 0.33 6.87 19 0.04 0.40 20.74 28 -0.01 -0.13 27.68
2 0.10 1.08 1.81 11 0.01 0.15 6.90 20 0.05 0.46 21.08 29 0.02 0.14 27.72
The autocorrelations for the first differenced series indicate a random series. Box-Jenkins is not the appropriate technique for forecasting this series.
17. The variation in Disney sales increases with the level, so a log transformation seems appropriate. Let Yt be the natural log of sales and Wt = Yt - Yt-4 be the seasonally differenced series. Two ARIMA models that represent the data reasonably well are ARIMA(1,0,0)(0,1,1)4 and ARIMA(0,1,1)(0,1,1)4. The former model contains a constant. The results for the ARIMA(1,0,0)(0,1,1)4 process are displayed below.
Fitted model: Wt = .50Wt-1 + .089 + εt - .49εt-4
Final Estimates of Parameters
Type Coef SE Coef T P
AR 1 0.4991 0.1164 4.29 0.000
SMA 4 0.4863 0.1196 4.07 0.000
Constant 0.0886 0.0063 14.07 0.000
Forecasts:
Date      ForecastLnSales   ForecastSales
Q4 1995 8.25008 3828
Q1 1996 8.12423 3375
Q2 1996 8.11642 3349
Q3 1996 8.24372 3804
Q4 1996 8.43698 4615
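The sales forecasts in the right-hand column are just the exponentiated log forecasts; a quick check:

import numpy as np

ln_forecasts = np.array([8.25008, 8.12423, 8.11642, 8.24372, 8.43698])
print(np.exp(ln_forecasts).round(0))  # 3828, 3375, 3349, 3804, 4615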
CHAPTER 10
1. The Delphi method can be used in any forecasting situation where there is little or no
historical data and there is expert opinion (experience) available. Two examples might
be:
CHAPTER 11
1.
a. One response: Forecasts may not be right, but they improve the odds of being close to right. More importantly, if there is no agreed-upon set of forecasts to drive planning, then different groups may develop their own procedures to guide planning, with potential chaos as the result.
b. One response: Analogy: if you think education is expensive, try ignorance. Having a good set of forecasts is like walking while looking ahead instead of at your shoes. Planning without forecasts will lead to inefficient operations, suboptimal returns on investment, poor customer service, and so forth.
c. One response: Good forecasts require not only good quantitative skills but also an in-depth understanding of the business and, ultimately, good communication skills to sell the forecasts to management.