Multiple Linear Regression

The document discusses multiple linear regression analysis conducted on housing prices in Boston. It provides details on: - Using a training dataset to build a regression model and validation dataset to evaluate the model's accuracy. - Fitting a multiple linear regression model with median house price (MEDV) as the target variable and crime rate (CRIM), proximity to river (CHAS), and average rooms per house (RM) as predictor variables. - Evaluating three regression models on the validation dataset and determining that model 1, using CRIM, ZN, CHAS, NOX, RM, DIS, RAD, PTRATIO, B, LSTAT as predictors, had the best fit based on total error

Multiple Linear Regression

By: Shruthi Reddy,Gadampalli

005927160

Training vs Validation Data Set

The training dataset is used to train (build) the model. To test the accuracy of
the estimates calculated from the training data, we set aside part of the
original data, called the validation dataset.
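The partitioning step can be sketched in Python as follows. This is only an illustration: the 60/40 split fraction, the random seed, and the helper name `split_train_validation` are assumptions rather than values stated on the slides, and the list of integers is a stand-in for the 506 Boston housing records.

```python
import random

def split_train_validation(rows, train_fraction=0.6, seed=12345):
    """Shuffle rows reproducibly and split them into training and validation sets."""
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # assumed seed, for reproducibility only
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical stand-in for the 506 Boston housing records.
records = list(range(506))
train, validation = split_train_validation(records)
print(len(train), len(validation))
```

The model is then fitted on `train` only, and `validation` is held back for scoring.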

>> Fit a multiple linear regression model to the median house price (MEDV) as a function of CRIM, CHAS, and RM

Inputs: CRIM, CHAS, RM
Output: MEDV

Training Data Scoring Summary Report

Total sum of squared errors    RMS Error    Average Error
11759.04                       6.219409     -4.10783E-15

Validation Data Scoring Summary Report

Total sum of squared errors    RMS Error    Average Error
7371.003                       6.040705     0.039038718
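The three quantities in these scoring reports can be computed as in the sketch below. It assumes the error is defined as actual minus predicted (the slides do not state the sign convention), and the sample values are hypothetical, not taken from the Boston data.

```python
import math

def scoring_summary(actual, predicted):
    """Return (total sum of squared errors, RMS error, average error)."""
    # Assumed convention: error = actual - predicted.
    errors = [a - p for a, p in zip(actual, predicted)]
    sse = sum(e * e for e in errors)
    rms = math.sqrt(sse / len(errors))
    avg = sum(errors) / len(errors)
    return sse, rms, avg

# Hypothetical values for illustration only.
actual = [24.0, 21.6, 34.7, 33.4]
predicted = [25.0, 20.6, 32.7, 35.4]
sse, rms, avg = scoring_summary(actual, predicted)
print(f"SSE={sse:.3f}  RMS={rms:.3f}  average error={avg:.3f}")
```

An average error near zero only says that over- and under-predictions cancel out; the RMS error is the better measure of typical prediction size.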

Regression Model

Input Variable   Coefficient   Std. Error   t-Statistic   P-Value   CI Lower   CI Upper   RSS Reduction
Intercept        -28.3135      3.2925       -8.5993       0         -34.7929   -21.8341   153409.9
CRIM             -0.285        0.044        -6.4755       0         -0.3716    -0.1984    3128.863
CHAS             3.6893        1.4977       2.4634        0.0143    0.742      6.6365     773.5671
RM               8.2114        0.5195       15.805        0         7.189      9.2338     9791.29
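As a quick consistency check on the regression output, each t-statistic should equal the coefficient divided by its standard error. The sketch below recomputes them from the printed (rounded) values, so small differences in the last digits are expected; the names `table` and `t_statistic` are illustrative.

```python
# Coefficients and standard errors as printed in the regression output.
table = {
    "Intercept": (-28.3135, 3.2925),
    "CRIM": (-0.285, 0.044),
    "CHAS": (3.6893, 1.4977),
    "RM": (8.2114, 0.5195),
}

def t_statistic(coefficient, std_error):
    """t-statistic for testing whether a coefficient differs from zero."""
    return coefficient / std_error

t_stats = {name: t_statistic(c, se) for name, (c, se) in table.items()}
for name, t in t_stats.items():
    print(f"{name:9s} t = {t:.3f}")
```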

>> Write the equation for predicting the median house price from
the predictors in the model.

MEDV = -28.3135 + (-0.285*CRIM) + (3.6893*CHAS) + (8.2114*RM)

>> What median house price is predicted for a tract in the Boston area that
does not bound the Charles River, has a crime rate of 0.1, and where the
average number of rooms per house is 6? What is the prediction error?

MEDV = -28.3135 + (-0.285*0.1) + (3.6893*0) + (8.2114*6)
     = -28.3135 - 0.0285 + 0 + 49.2684
     = 20.9264

MEDV is recorded in thousands of dollars, so the predicted median house price is $20,926.40.
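The prediction can be reproduced with a few lines of Python. The function name `predict_medv` is illustrative; the coefficients are the ones reported in the regression model, and CHAS = 0 encodes a tract that does not bound the Charles River.

```python
def predict_medv(crim, chas, rm):
    """Predicted median house value (in $1000s) from the fitted equation."""
    return -28.3135 + (-0.285 * crim) + (3.6893 * chas) + (8.2114 * rm)

medv = predict_medv(crim=0.1, chas=0, rm=6)
print(round(medv, 4))  # 20.9264
```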

>> Correlation table
Which predictors are likely to be measuring the same thing among the 14 predictors? Discuss the
relationships among INDUS, NOX, and TAX.

Correlation values:

INDUS & NOX     0.763
AGE & NOX       0.731
TAX & RAD       0.910
TAX & INDUS     0.7208
NOX & DIS      -0.769
DIS & AGE      -0.747

INDUS is strongly positively correlated with both NOX (0.763) and TAX (0.7208), so these
variables carry overlapping information. Considering the strongest positive and negative
correlations, we can eliminate INDUS, AGE, and TAX.
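Screening for redundant predictors can be sketched as below: compute the Pearson correlation for every pair of columns and flag pairs whose absolute correlation exceeds a cutoff. The 0.7 cutoff, the helper names, and the toy columns are assumptions for illustration; they are not the Boston data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def strongly_correlated(columns, threshold=0.7):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs

# Hypothetical toy columns.
toy = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [2.1, 3.9, 6.2, 8.0, 9.8],  # nearly proportional to x1
    "x3": [5.0, 1.0, 4.0, 2.0, 3.0],  # only weakly related to x1
}
flagged = strongly_correlated(toy)
print(flagged)
```

From each flagged pair, one variable is a candidate for removal, which is the reasoning behind dropping INDUS, AGE, and TAX.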

Model 1

Total sum of squared errors    RMS Error    Average Error
4616.353                       4.780506     -0.300841374

Model 2

Total sum of squared errors    RMS Error    Average Error
4686.579                       4.81673      -0.22067477

[Figure: Lift chart (validation dataset). Cumulative MEDV when sorted using predicted values vs. cumulative MEDV using average, plotted against # cases.]
[Figure: Decile-wise lift chart (validation dataset). Decile mean / global mean, by decile.]
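The quantity plotted in a decile-wise lift chart (decile mean / global mean) can be computed as in this sketch. It assumes cases are ranked by predicted value from highest to lowest and cut into ten equal-sized groups; the function name and the toy data are hypothetical, and the input needs at least as many cases as bins.

```python
def decile_lift(actual, predicted, n_bins=10):
    """Mean actual value per bin (ranked by prediction, highest first) / global mean."""
    ranked = [a for _, a in sorted(zip(predicted, actual),
                                   key=lambda pair: pair[0], reverse=True)]
    global_mean = sum(ranked) / len(ranked)
    size = len(ranked) / n_bins
    lifts = []
    for i in range(n_bins):
        chunk = ranked[round(i * size):round((i + 1) * size)]
        lifts.append(sum(chunk) / len(chunk) / global_mean)
    return lifts

# Toy example: a model whose predictions rank the cases perfectly.
actual = [float(v) for v in range(1, 21)]
lifts = decile_lift(actual, predicted=actual)
print([round(v, 2) for v in lifts])
```

A useful model shows lift well above 1 in the first deciles and below 1 in the last ones.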

Model 3

Total sum of squared errors    RMS Error    Average Error
4828.086                       4.888908     -0.18015984

[Figure: Lift chart (validation dataset). Cumulative MEDV when sorted using predicted values vs. cumulative MEDV using average, plotted against # cases.]
[Figure: Decile-wise lift chart (validation dataset). Decile mean / global mean, by decile.]

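Summary

Comparing the total sum of squared errors, RMS error, and average error along
with the lift charts, we conclude that Model 1, with CRIM, ZN, CHAS, NOX, RM,
DIS, RAD, PTRATIO, B, and LSTAT as predictors, is the best model for predicting
Boston housing prices: it has the lowest total sum of squared errors (4616.353)
and the lowest RMS error (4.780506) of the three models.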
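The comparison can be made explicit with a small sketch that ranks the three models by the reported error measures. The dictionary layout and variable names are illustrative; the numbers are the ones reported for Models 1 to 3.

```python
# Reported results: (total sum of squared errors, RMS error, average error).
reported = {
    "Model 1": (4616.353, 4.780506, -0.300841374),
    "Model 2": (4686.579, 4.81673, -0.22067477),
    "Model 3": (4828.086, 4.888908, -0.18015984),
}

best_by_rms = min(reported, key=lambda name: reported[name][1])
closest_to_zero_bias = min(reported, key=lambda name: abs(reported[name][2]))

print("Lowest RMS error:", best_by_rms)
print("Average error closest to zero:", closest_to_zero_bias)
```

The two criteria do not agree here: Model 1 has the smallest RMS error, while Model 3 has the average error closest to zero. Choosing Model 1 therefore amounts to prioritising RMS error.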

Correlation table for airfare

Distance is the best predictor for fare

Pivot table with the average fare

Converting categorical variables (e.g., SW) into dummy variables
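Dummy coding can be sketched as follows. The helper `to_dummies`, the "Yes"/"No" level labels, and the four sample values are assumptions for illustration. One level is dropped so that the dummy columns are not perfectly collinear with the intercept.

```python
def to_dummies(values, prefix, drop_first=True):
    """Encode a categorical column as 0/1 dummy columns, one per retained level."""
    levels = sorted(set(values))
    if drop_first:
        levels = levels[1:]  # drop one level to avoid perfect collinearity
    return {f"{prefix}_{level}": [1 if v == level else 0 for v in values]
            for level in levels}

sw = ["Yes", "No", "No", "Yes"]  # hypothetical sample of the SW column
dummies = to_dummies(sw, "SW")
print(dummies)
```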

Stepwise Regression
The model with the highest adjusted R2 value is taken as the estimated best model.
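For reference, adjusted R2 penalises R2 for the number of predictors, using the standard formula 1 - (1 - R2)(n - 1)/(n - p - 1). The sketch below uses hypothetical inputs; the slides do not list the R2 values or predictor counts of the candidate models.

```python
def adjusted_r2(r2, n_rows, n_predictors):
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_rows - 1) / (n_rows - n_predictors - 1)

# Hypothetical candidates: (R2, number of predictors), all fitted on 100 rows.
candidates = {"A": (0.700, 5), "B": (0.702, 12)}
scores = {name: adjusted_r2(r2, 100, p) for name, (r2, p) in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

In this toy case the larger model has the higher R2 but the lower adjusted R2, so the smaller model is preferred.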

Model      Total sum of squared errors    RMS Error      Average Error
Model 1    787222.7657                    35.12679152    6.62863E-12
Model 2    793812.5792                    35.27350767    -2.3941E-12
Model 3    807460.9178                    35.57545114    -3.2331E-12

Exhaustive Search
We have 3 models based on the adjusted R2 value.
From these 3 models, we analyze the lift charts and the RMS error values and select the best-fitting model.

I decided to go with Model 2 after evaluating the RMS error in the analysis questions that follow.

Model 1:

Total sum of squared errors    RMS Error    Average Error
787222.8                       35.12679     -5.5E-12

[Figure: Lift chart (training dataset). Cumulative FARE when sorted using predicted values vs. cumulative FARE using average, plotted against # cases.]
[Figure: Decile-wise lift chart (training dataset). Decile mean / global mean, by decile.]

Model 2:

Total sum of squared errors    RMS Error    Average Error
785120.4                       35.07986     7.32E-09

[Figure: Lift chart (training dataset). Cumulative FARE when sorted using predicted values vs. cumulative FARE using average, plotted against # cases.]
[Figure: Decile-wise lift chart (training dataset). Decile mean / global mean, by decile.]

Model 3:

Total sum of squared errors    RMS Error    Average Error
785001.1                       35.07719     2.4E-09

[Figure: Lift chart (training dataset). Cumulative FARE when sorted using predicted values vs. cumulative FARE using average, plotted against # cases.]
[Figure: Decile-wise lift chart (training dataset). Decile mean / global mean, by decile.]

Average fare on route by the given characteristics
Using the formula
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \beta_k x_k$
we calculate the average fare on the route for the values given in the question:
Y = 17.9917 + (NEW*2.4219) + (HI*0.0083) + (S_Income*0.0012) + (E_Income*0.0014) + (S_POP*0) + (E_POP*0) + (Distance*0.0758) + (PAX*-0.0009) + (Vacation*-35.7079) + (SW_ORD*-41.074) + (Slot_ORD*-16.3576) + (Gate_ORD*-20.6157)
Y = 17.9917 + (3*2.4219) + (4442.141*0.0083) + (28,760*0.0012) + (27,664*0.0014) + (4,557,004*0) + (3,195,503*0) + (1976*0.0758) + (12782*-0.0009) + (0*-35.7079) + (0*-41.074) + (1*-16.3576) + (1*-20.6157)
= 222.14

If Southwest decides to cover this route:

We set the value of the SW dummy variable to 1 if Southwest decides to cover the
route:
Y = 17.9917 + (3*2.4219) + (4442.141*0.0083) + (28,760*0.0012) + (27,664*0.0014) + (4,557,004*0) + (3,195,503*0) + (1976*0.0758) + (12782*-0.0009) + (0*-35.7079) + (1*-41.074) + (1*-16.3576) + (1*-20.6157)
The average fare if Southwest decides to cover the route would then be
181.067.
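The effect of Southwest entering the route can be isolated with the sketch below. Two caveats apply. First, the coefficients are the rounded values displayed on the slide (S_POP and E_POP show as 0), so the absolute total computed here should not be expected to match the reported 222.14. Second, the negative signs for Slot and Gate follow the substituted expression and are treated as an assumption. The difference between the two predictions depends only on the SW coefficient.

```python
# Coefficients as displayed on the slide (rounded).
INTERCEPT = 17.9917
coefficients = {
    "NEW": 2.4219, "HI": 0.0083, "S_Income": 0.0012, "E_Income": 0.0014,
    "S_POP": 0.0, "E_POP": 0.0, "Distance": 0.0758, "PAX": -0.0009,
    "Vacation": -35.7079, "SW": -41.074,
    "Slot": -16.3576, "Gate": -20.6157,  # assumed signs, see note above
}

def predict_fare(route):
    """Linear prediction: intercept plus the sum of coefficient * value."""
    return INTERCEPT + sum(coefficients[k] * route[k] for k in coefficients)

route = {
    "NEW": 3, "HI": 4442.141, "S_Income": 28760, "E_Income": 27664,
    "S_POP": 4557004, "E_POP": 3195503, "Distance": 1976, "PAX": 12782,
    "Vacation": 0, "SW": 0, "Slot": 1, "Gate": 1,
}
with_southwest = dict(route, SW=1)

sw_effect = predict_fare(with_southwest) - predict_fare(route)
print(round(sw_effect, 3))  # -41.074
```

Applying this change to the reported fare gives 222.14 - 41.074 = 181.066, which agrees with the 181.067 figure up to rounding.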

Factors not available when predicting the average fare for a route from a new airport:
Slot
Gate
SW

Exhaustive Search
The model with an R2 value of 0.7139 is considered to be the best-fit model.
The average fare predicted by this model for the given characteristics is 195.157.
With Model 3, the predicted average fare was 222.14, a difference of 26.983
between the two models, so the model considering all the factors is the best fit.
