Multiple Linear Regression
Input variables: CRIM, CHAS, RM
Output variable: MEDV
Average error: -4.10783E-15
Regression Model

Input Variable   Coefficient   Std. Error   t Statistic   P-Value   CI Lower   CI Upper   RSS Reduction
Intercept           -28.3135       3.2925       -8.5993    0        -34.7929   -21.8341        153409.9
CRIM                 -0.285        0.044        -6.4755    0         -0.3716    -0.1984          3128.863
CHAS                  3.6893       1.4977        2.4634    0.0143     0.742      6.6365           773.5671
RM                    8.2114       0.5195       15.805     0          7.189      9.2338          9791.29
>> Write the equation for predicting the median house price from
the predictors in the model.
>> What median house price is predicted for a tract in the Boston area that
does not bound the Charles River, has a crime rate of 0.1, and where the
average number of rooms per house is 6? What is the prediction error?
MEDV = -28.3135 + (-0.285*CRIM) + (3.6893*CHAS) + (8.2114*RM)

MEDV = -28.3135 + (-0.285*0.1) + (3.6893*0) + (8.2114*6)
     = -28.3135 - 0.0285 + 0 + 49.2684
MEDV = 20.9264

Since MEDV is measured in $1000s, the predicted median house price is $20,926.40.
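The same arithmetic can be checked in a few lines of Python. This is a minimal sketch: the coefficients are copied from the regression table above, and the function name `predict_medv` is hypothetical. MEDV is in units of $1000, so the result is multiplied by 1000 to get dollars.

```python
# Coefficients copied from the fitted regression model above
INTERCEPT = -28.3135
COEF = {"CRIM": -0.285, "CHAS": 3.6893, "RM": 8.2114}


def predict_medv(crim: float, chas: int, rm: float) -> float:
    """Predicted median house value (MEDV, in $1000s) from the fitted equation."""
    return (INTERCEPT
            + COEF["CRIM"] * crim
            + COEF["CHAS"] * chas
            + COEF["RM"] * rm)


# Tract that does not bound the Charles River (CHAS = 0),
# crime rate 0.1, 6 rooms per house on average
medv = predict_medv(crim=0.1, chas=0, rm=6)
print(f"MEDV = {medv:.4f} -> ${medv * 1000:,.2f}")
```

Keeping the coefficients in one dictionary makes it easy to swap in a different fitted model without touching the prediction logic.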
>> Correlation table
Which predictors are likely to be measuring the same thing among the 14 predictors? Discuss the
relationships among INDUS, NOX, and TAX.
Correlation values:
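As an illustration of how a correlation table is screened for predictors that measure the same thing, the sketch below computes Pearson correlations with the standard library and flags predictor pairs whose absolute correlation exceeds a cutoff. The 0.7 cutoff and the small data columns are made-up assumptions for illustration, not the Boston values, and `highly_correlated` is a hypothetical helper.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def highly_correlated(columns, cutoff=0.7):
    """Return (name1, name2, r) for every pair of columns with |r| above the cutoff."""
    pairs = []
    for (n1, c1), (n2, c2) in combinations(columns.items(), 2):
        r = pearson(c1, c2)
        if abs(r) > cutoff:
            pairs.append((n1, n2, round(r, 3)))
    return pairs


# Hypothetical toy columns (NOT the Boston data), chosen so that
# INDUS, NOX and TAX move together while CHAS does not
toy = {
    "INDUS": [2.3, 7.1, 8.1, 18.1, 19.6, 21.9],
    "NOX": [0.46, 0.50, 0.52, 0.68, 0.71, 0.74],
    "TAX": [242, 296, 307, 403, 666, 711],
    "CHAS": [0, 0, 1, 0, 1, 0],
}
print(highly_correlated(toy))
```

Pairs flagged this way are candidates for dropping one member, since keeping both adds little information and inflates coefficient standard errors.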
Model 1
Total sum of squared errors: 4616.353
RMS error: 4.780506
Average error: -0.300841374

Model 2
Total sum of squared errors: 4686.579
RMS error: 4.81673
Average error: -0.22067477
[Figure: lift chart (cumulative MEDV when sorted using predicted values vs. cumulative MEDV using average; x-axis: # Cases) and decile-wise lift chart (x-axis: Deciles)]
Model 3
Total sum of squared errors: 4828.086
RMS error: 4.888908
Average error: -0.18015984
[Figure: lift chart (cumulative MEDV when sorted using predicted values vs. cumulative MEDV using average; x-axis: # Cases) and decile-wise lift chart (x-axis: Deciles)]
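The three error measures reported for each model are related. A minimal sketch, assuming the usual definitions: residual = actual - predicted (the sign convention is an assumption), total SSE = sum of squared residuals, RMS error = sqrt(SSE / n), and average error = mean residual. The reported totals are consistent with n = 202 scored records (SSE / RMS² ≈ 202 for all three models); that count is inferred from the numbers, not stated in the output.

```python
from math import sqrt


def error_summary(actual, predicted):
    """Total SSE, RMS error and average error, using residual = actual - predicted."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    n = len(residuals)
    sse = sum(r * r for r in residuals)
    return {"SSE": sse, "RMS": sqrt(sse / n), "AvgError": sum(residuals) / n}


# Reported (total SSE, RMS error) pairs for the three models above
reported = {
    "Model 1": (4616.353, 4.780506),
    "Model 2": (4686.579, 4.81673),
    "Model 3": (4828.086, 4.888908),
}
for name, (sse, rms) in reported.items():
    # Number of records implied by RMS = sqrt(SSE / n)
    print(name, round(sse / rms ** 2))
```

Because RMS error is a monotone function of SSE for a fixed n, ranking the models by either measure gives the same order; the average error instead shows whether a model systematically over- or under-predicts.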
Summary
Comparing the total sum of squared errors, RMS error, and average error, along with the lift charts, we conclude that Model 1 (CRIM, ZN, CHAS, NOX, RM, DIS, RAD, PTRATIO, B, LSTAT) is the best model for predicting Boston housing prices; it has the lowest total sum of squared errors (4616.353) and the lowest RMS error (4.780506) of the three models.
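The lift charts referred to in the summary can be computed directly from actual and predicted values. A minimal sketch of a decile-wise lift chart, assuming the common construction: sort records by predicted value (highest first), split them into ten nearly equal groups, and divide each group's mean actual value by the overall mean. The function name `decile_lift` and the toy data are hypothetical.

```python
def decile_lift(actual, predicted, groups=10):
    """Decile-wise lift: mean actual value in each group / overall mean actual value.

    Records are sorted by predicted value, highest first, then split into
    `groups` nearly equal groups.
    """
    n = len(actual)
    if n < groups:
        raise ValueError("need at least one record per group")
    order = sorted(range(n), key=lambda i: predicted[i], reverse=True)
    overall_mean = sum(actual) / n
    lifts = []
    for g in range(groups):
        # Group boundaries; any remainder is spread across the groups
        start, end = g * n // groups, (g + 1) * n // groups
        members = [actual[i] for i in order[start:end]]
        lifts.append(sum(members) / len(members) / overall_mean)
    return lifts


# Hypothetical example: a perfect model (predictions equal the actual values)
actual = list(range(1, 21))
print([round(x, 3) for x in decile_lift(actual, predicted=actual)])
```

For this perfect model the lift starts at about 1.86 in the top decile and falls steadily; a model with no predictive value would give bars near 1 in every decile.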
Stepwise Regression
The model with the highest adjusted R² is taken as the estimated best model.
Model   Total sum of squared errors   RMS error      Average error
1       787222.7657                   35.12679152     6.62863E-12
2       793812.5792                   35.27350767    -2.3941E-12
3       807460.9178                   35.57545114    -3.2331E-12
Exhaustive Search
The exhaustive search gives three models based on the adjusted R² value. From these three models, we analyze the lift charts and the RMS error values and select the best-fitting model. I chose Model 2 after evaluating the RMS error in the following analysis questions.
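The exhaustive search itself is simple to sketch: fit ordinary least squares on every non-empty subset of predictors and rank the subsets by adjusted R². This minimal version uses NumPy's `numpy.linalg.lstsq` on synthetic data; the function name `exhaustive_search` and the data are hypothetical, and the software used in this report may rank or report subsets differently.

```python
from itertools import combinations

import numpy as np


def exhaustive_search(X, y, top=3):
    """Rank every non-empty predictor subset by adjusted R-squared (OLS with intercept)."""
    n, k = X.shape
    sst = float(np.sum((y - y.mean()) ** 2))
    results = []
    for p in range(1, k + 1):
        for subset in combinations(range(k), p):
            # Design matrix: intercept column plus the chosen predictors
            A = np.column_stack([np.ones(n), X[:, list(subset)]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            sse = float(np.sum((y - A @ coef) ** 2))
            r2 = 1 - sse / sst
            adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
            results.append((adj, subset))
    results.sort(reverse=True)
    return results[:top]


# Synthetic data: only columns 0 and 1 actually drive y
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 5 + 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=200)

for adj, subset in exhaustive_search(X, y):
    print(f"adjusted R2 = {adj:.4f}  predictors = {subset}")
```

The number of fits grows as 2^k - 1, so this brute-force approach is practical only for a modest number of predictors; stepwise selection trades that guarantee of finding the best subset for far fewer fits.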
Model 1:
[Figure: decile-wise lift chart and lift chart (training dataset): cumulative FARE when sorted using predicted values vs. cumulative FARE using average (x-axis: # Cases); decile-wise lift by decile]
Total sum of squared errors: 787222.8
RMS error: 35.12679
Average error: -5.5E-12
Model 2:
[Figure: lift chart (training dataset): cumulative FARE (x-axis: # Cases); decile-wise lift chart (x-axis: Deciles)]
Total sum of squared errors: 785120.4
RMS error: 35.07986
Average error: 7.32E-09
Model 3:
[Figure: decile-wise lift chart (training dataset) and lift chart: cumulative FARE (x-axis: # Cases); decile-wise lift (x-axis: Deciles)]
Total sum of squared errors: 785001.1
RMS error: 35.07719
Average error: 2.4E-09
Factors not available for predicting the average fare from a new airport:
Slot
Gate
SW
Exhaustive Search
The model with R² = 0.7139 is considered the best-fitting model. For the given route characteristics, this model predicts an average fare of 195.157. With Model 3, the predicted average fare was 222.14; the difference between the two predictions is 26.983 (222.14 - 195.157), so the model that considers all the factors is the best fit.