ITLS5050 Data Set 2 v7 Multiple Regression
[Figure: scatter plot, Workers (x-axis, 45 to 80) versus Production (y-axis, 200 to 400)]
Scatter plot of machines versus production
[Figure: scatter plot, Machines (x-axis, 15 to 45) versus Production (y-axis, 200 to 500)]
We can see a strong positive relationship between machines and production. We expected this based on the positive correlation coefficient. This means we should take machines into account in our model to forecast production.
Scatter plot of afternoon versus production
[Figure: scatter plot, Afternoon (x-axis; 0 = morning and evening shifts, 1 = afternoon shifts) versus Production (y-axis, 200 to 500)]
Production appears to be higher in the afternoon. We should include this information in our forecasts.
Scatter plot of day of week versus production
[Figure: scatter plot, Day of week (x-axis; 0 = weekdays, 1 = weekends) versus Production (y-axis, 200 to 500)]
The day of the ...
Scatter plot of breakdown versus production
[Figure: scatter plot, Breakdown (x-axis; 0 = no breakdown, 1 = breakdown) versus Production (y-axis, 200 to 500)]
There are few breakdowns so it is hard to see a pattern.
Scatter plot of delivery versus production
[Figure: scatter plot, Delivery (x-axis; 0 = no delivery, 1 = delivery) versus Production (y-axis, 200 to 500)]
Deliveries do not appear to affect production.
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.62
R Square 0.39 Variation in the number of workers explains 39% of the variance in production.
Adjusted R Square 0.38
Standard Error 36.3 We expect that 95% of the time production will be within two standard errors
Observations 150.0
ANOVA
df SS MS F Significance F
Regression 1 122248.62 122248.62 92.92 0.00
Residual 148 194706.78 1315.59
Total 149 316955.39
If Significance F is less than 0.05 then we are at least 95% confident that workers does help to explain production in the population.
Upper 95%
Intercept 128.4 Our forecast is that production will be 75.3 plus 4 times the number of workers on duty.
Workers 4.8 In the sample, adding one worker added 4 to production. We do not know what the real relation is in the population, but we are 95% sure that adding one worker will add between 3.2 and 4.8 to production.
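As a check on the workers-only output, R Square and F can be recomputed from the ANOVA sums of squares. The following is a minimal sketch in Python rather than anything taken from the workbook: the function name `r_square_and_f` is illustrative, and the inputs are the rounded values printed in the ANOVA table, so small rounding differences are possible.

```python
def r_square_and_f(ss_regression, ss_residual, ss_total, df_regression, df_residual):
    """Return (R Square, F) from the entries of an ANOVA table."""
    r_square = ss_regression / ss_total              # share of variance explained
    ms_regression = ss_regression / df_regression    # mean square, regression
    ms_residual = ss_residual / df_residual          # mean square, residual
    f_stat = ms_regression / ms_residual
    return r_square, f_stat


# Workers-only model, values copied from the ANOVA table above.
r2, f = r_square_and_f(122248.62, 194706.78, 316955.39, 1, 148)
print(f"R Square = {r2:.2f}, F = {f:.2f}")
```

The printed values should agree with the 0.39 and 92.92 reported in the summary output.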
Regression Statistics
Multiple R 0.94
R Square 0.88 Our model explains 88% of the variance in production. This is much better than
Adjusted R Square 0.88 The Adjusted R Square takes into account the number of variables in our mode
Standard Error 16.0 We expect that 95% of the time production will be within two standard errors
Observations 150.0
ANOVA
df SS MS F Significance F
Regression 3 279810.60 93270.20 366.60 0.00
Residual 146 37144.80 254.42
Total 149 316955.39
If Significance F is less than 0.05 then we are at least 95% confident that our model does help to explain production in the population.
Upper 95%
Intercept -28.0
Workers 3.4 In the sample, adding one worker added 3 units to production, accounting for machines in operation and time of day. We are 95% confident the true impact of an extra worker is between 2.6 and 3.4.
Machines 6.6 In the sample, every extra machine in operation added 6.1 units to production, accounting for the number of workers and time of day.
Afternoon 33.9 In the sample, production averaged 28.4 units more in the afternoon, holding all else constant.
... + 28.4 × Afternoon
The P-values are all less than 0.05, which means that each of the variables in our model helps to explain production.
... more accurate forecasts.
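The gain in forecast accuracy from adding machines and time of day can be seen by recomputing the Standard Error of each model from its ANOVA table and doubling it, following the "within two standard errors" rule of thumb quoted above. This is a sketch under that rule of thumb only; the helper name `residual_standard_error` and the model labels are illustrative.

```python
import math


def residual_standard_error(ss_residual, df_residual):
    """Standard Error of the regression: sqrt(SS residual / df residual)."""
    return math.sqrt(ss_residual / df_residual)


# (SS residual, df residual) copied from the two ANOVA tables above.
models = {
    "Workers only": (194706.78, 148),
    "Workers + Machines + Afternoon": (37144.80, 146),
}

half_widths = {}
for name, (ss_res, df_res) in models.items():
    se = residual_standard_error(ss_res, df_res)
    half_widths[name] = 2 * se  # rough 95% band: forecast +/- 2 standard errors
    print(f"{name}: Standard Error = {se:.1f}, band = +/- {2 * se:.1f}")
```

The band for the three-variable model comes out at less than half the width of the workers-only band, which is what makes its forecasts more useful.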
SUMMARY OUTPUT In this model, we test if the other variables weekend, breakd
Regression Statistics
Multiple R 0.94
R Square 0.89
Adjusted R Square 0.88 Adding the extra variables into the model does not improve our Adjusted R Squ
Standard Error 15.8
Observations 150.0
ANOVA
df SS MS F Significance F
Regression 6 281279.79 46879.96 187.91 0.00
Residual 143 35675.61 249.48
Total 149 316955.39
Upper 95%
Intercept -27.9
Workers 3.4
Machines 6.6
Afternoon 33.8
Weekend 4.9 The P-value for weekend is well above 0.05. The day of the week does not seem to affect production.
Breakdown -0.6 The P-value is less than 0.05. Breakdowns do seem to affect production when we account for the number of workers, machines, time of day, etc.
Delivery 13.5 The P-value for delivery is well above 0.05. Deliveries do not seem to affect production.
We eliminate Weekend and Delivery from our model and run the regression again.
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.94
R Square 0.89 Our model explains 89% of the variance in production.
Adjusted R Square 0.88 The adjusted R Square improved a bit when we removed weekend and delivery
Standard Error 15.8 We expect that 95% of the time production will be within two standard errors
Observations 150.0
ANOVA
df SS MS F Significance F
Regression 4 280870.96 70217.74 282.16 0.00
Residual 145 36084.43 248.86
Total 149 316955.39
Overall, our model helps to predict production because the Significance F is less than 0.05.
Upper 95%
Intercept -28.0
Workers 3.4 In the sample, adding one worker added 3 units to production, accounting for machines in operation, time of day and breakdowns. We are 95% confident the true impact of an extra worker is between 2.6 and 3.4.
Machines 6.6 In the sample, every extra machine in operation added 6.1 units to production, holding all else constant.
Afternoon 33.6 In the sample, production averaged 28.2 units more in the afternoon, holding all else constant.
Breakdown -0.5 In the sample, a breakdown occurring was associated with 11.9 units of lost production, holding all else constant.
... + 28.2 × Afternoon - 11.9 × Breakdown
The P-values are less than 0.05, so we are confident that each variable helps to explain production.
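The Adjusted R Square figures quoted for the four models can be reproduced from their ANOVA tables, which shows why dropping Weekend and Delivery nudges the value up even though R Square itself falls slightly. The sketch below uses the standard adjusted R Square formula; the function name and model labels are illustrative, and the inputs are the rounded sums of squares shown above.

```python
def adjusted_r_square(ss_residual, ss_total, n_obs, n_predictors):
    """Adjusted R Square = 1 - (SS_res / (n - k - 1)) / (SS_total / (n - 1))."""
    ms_residual = ss_residual / (n_obs - n_predictors - 1)
    ms_total = ss_total / (n_obs - 1)
    return 1 - ms_residual / ms_total


SS_TOTAL = 316955.39
N_OBS = 150

# (SS residual, number of predictors) from the four ANOVA tables above.
models = {
    "Workers": (194706.78, 1),
    "Workers, Machines, Afternoon": (37144.80, 3),
    "All six variables": (35675.61, 6),
    "Workers, Machines, Afternoon, Breakdown": (36084.43, 4),
}

adjusted = {
    name: adjusted_r_square(ss_res, SS_TOTAL, N_OBS, k)
    for name, (ss_res, k) in models.items()
}
for name, value in adjusted.items():
    print(f"{name}: Adjusted R Square = {value:.4f}")
```

The penalty for each extra variable enters through the residual degrees of freedom, so a variable that barely reduces the residual sum of squares can lower Adjusted R Square.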
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.942042
R Square 0.887443
Adjusted R Square 0.88272
Standard Error 15.79493
Observations 150
ANOVA
df SS MS F Significance F
Regression 6 281279.8 46879.96 187.9109 3.14E-65
Residual 143 35675.61 249.4798
Total 149 316955.4
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept -53.22231 12.79926 -4.158232 5.5E-05 -78.52251 -27.9221 -78.52251 -27.9221
Workers 3.002343 0.187607 16.00333 1.11E-33 2.631501 3.373185 2.631501 3.373185
Machines 6.102226 0.262839 23.21662 2.31E-50 5.582675 6.621777 5.582675 6.621777
Afternoon 28.41152 2.746312 10.34534 4.51E-19 22.9829 33.84013 22.9829 33.84013
Breakdown -11.95323 5.750932 -2.078486 0.039452 -23.32105 -0.585407 -23.32105 -0.585407
Weekend -0.654579 2.823923 -0.231798 0.817026 -6.236606 4.927448 -6.236606 4.927448
Delivery 5.242324 4.194367 1.249849 0.213396 -3.048649 13.5333 -3.048649 13.5333
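Two things in this table can be checked mechanically: each t Stat is the coefficient divided by its standard error, and the variables to consider dropping are those whose P-value is not below 0.05. A minimal sketch, with the rows typed in from the table above; the names `rows`, `ALPHA` and `not_significant` are illustrative.

```python
# Rows copied from the coefficient table above:
# (coefficient, standard error, reported t Stat, P-value).
rows = {
    "Intercept": (-53.22231, 12.79926, -4.158232, 5.5e-05),
    "Workers": (3.002343, 0.187607, 16.00333, 1.11e-33),
    "Machines": (6.102226, 0.262839, 23.21662, 2.31e-50),
    "Afternoon": (28.41152, 2.746312, 10.34534, 4.51e-19),
    "Breakdown": (-11.95323, 5.750932, -2.078486, 0.039452),
    "Weekend": (-0.654579, 2.823923, -0.231798, 0.817026),
    "Delivery": (5.242324, 4.194367, 1.249849, 0.213396),
}

ALPHA = 0.05  # significance threshold used throughout the commentary

# t Stat = coefficient / standard error
t_stats = {name: coef / se for name, (coef, se, _, _) in rows.items()}

# Variables whose P-value does not fall below the threshold
not_significant = [name for name, (_, _, _, p) in rows.items() if p >= ALPHA]

for name, (coef, se, t_reported, p) in rows.items():
    print(f"{name}: t = {t_stats[name]:.3f} (reported {t_reported:.3f}), P-value = {p:.3g}")
print("Candidates to drop:", not_significant)
```

Only Weekend and Delivery are flagged, which matches the decision to remove them and rerun the regression.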
Forecasts
This page shows how our forecast for production varies as we build more complex models.
We will predict production for a weekday afternoon shift with 60 workers on duty and 30 production machines in operation wi
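A forecast for such a shift can be sketched by plugging the values into the six-variable coefficient table, the only model whose full coefficients appear in this export. The scenario sentence is cut off, so the breakdown and delivery settings below are assumptions (both set to 0), and the function name `forecast_production` is illustrative. The range uses the "two standard errors" rule of thumb from the commentary.

```python
# Coefficients from the six-variable SUMMARY OUTPUT coefficient table.
COEFFICIENTS = {
    "Intercept": -53.22231,
    "Workers": 3.002343,
    "Machines": 6.102226,
    "Afternoon": 28.41152,
    "Breakdown": -11.95323,
    "Weekend": -0.654579,
    "Delivery": 5.242324,
}
STANDARD_ERROR = 15.79493  # Standard Error of the same model


def forecast_production(shift):
    """Intercept plus each coefficient times the matching value in `shift`."""
    return COEFFICIENTS["Intercept"] + sum(
        COEFFICIENTS[name] * value for name, value in shift.items()
    )


# Weekday afternoon shift, 60 workers, 30 machines.
# ASSUMPTION: no breakdown and no delivery (the source sentence is cut off).
shift = {"Workers": 60, "Machines": 30, "Afternoon": 1,
         "Breakdown": 0, "Weekend": 0, "Delivery": 0}

point = forecast_production(shift)
low, high = point - 2 * STANDARD_ERROR, point + 2 * STANDARD_ERROR
print(f"Forecast: {point:.1f} (rough 95% range {low:.1f} to {high:.1f})")
```

Under these assumptions the point forecast is about 338 units. Setting `Breakdown` to 1 instead would lower it by roughly 12 units.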
Regression Statistics
Multiple R 0.942660065406762
R Square 0.888607998912681
Adjusted R Square 0.883901294641385
Standard Error 15.7629069330634
Observations 149
ANOVA
df SS MS F Significance F
Regression 6 281460.321652915 46910.05 188.7962 4.417661E-65
Residual 142 35282.6313672195 248.4692
Total 148 316742.953020134
Regression Statistics
Multiple R 0.9420418321722
R Square 0.8874428135624
Adjusted R Square 0.8827201344112
Standard Error 15.794928658512
Observations 150
ANOVA
df SS MS F Significance F
Regression 6 281279.786033505 46879.96 187.910884 3.13748343E-65
Residual 143 35675.6072998286 249.4798
Total 149 316955.393333333
Regression Statistics
Multiple R 0.94135698039898
R Square 0.88615296454588
Adjusted R Square 0.88301235667128
Standard Error 15.775238541766
Observations 150
ANOVA
df SS MS F Significance F
Regression 4 280870.961431139 70217.74 282.1597 0.00
Residual 145 36084.4319021947 248.8582
Total 149 316955.393333333