BAUDM Assignment Predicting Boston Housing Prices

- The document describes a project to predict Boston housing prices using a regression model.
- It discusses partitioning the data into training and validation sets, with the training set used to build the model and the validation set used to evaluate it.
- The best-fitting regression model included the variables CRIM, CHAS, RM, DIS, PTRATIO, B, LSTAT, and CAT.MEDV, achieving an R-squared of 0.8257 on the training set and 0.8332 on the validation set.


BAUDM Assignment

Predicting Boston Housing Prices

By:
Suraj Dhende (14PGP015)
Saranya Panicker (14PGP108)
Bharadwaj Sista (14PGP118)
Ashis Tripathy (14PGP121)
Yadvendra Yadav (14PGP123)

Ans a:

The data should be partitioned into training and validation sets because we need two
sets of data: one to build the regression model and the other to test it.
The regression model describes the relationship between the dependent and the
independent variables, whereas running the model on the validation set determines the
accuracy of the model that was built on the training data set.
The validation data is used to validate, or test, the model. In this process, the model
(built using the training data set) is used to make predictions on the validation data,
that is, data that were not used to fit the model. In this way we get an unbiased estimate of
how well the model performs. We compute measures of error that reflect the
prediction accuracy.
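
The partitioning described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used for the SPSS output in this assignment: the function name `partition`, the seed, and the case count of 506 (the size of the standard Boston Housing data, not stated in this document) are all assumptions. Each case is assigned to the training set with probability 0.6, which mirrors the "Approximately 60% of the cases" selection.

```python
import random


def partition(n_cases, train_fraction=0.6, seed=0):
    """Randomly assign case indices to a training or validation set.

    Each case goes to the training set with probability `train_fraction`,
    so the split is approximately (not exactly) 60/40.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, valid = [], []
    for i in range(n_cases):
        if rng.random() < train_fraction:
            train.append(i)
        else:
            valid.append(i)
    return train, valid


# 506 is an assumed case count (standard Boston Housing data).
train_idx, valid_idx = partition(506)
print(len(train_idx), len(valid_idx))
```

The model is then fitted on the `train_idx` cases only, and the `valid_idx` cases are held back for measuring prediction error.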
Ans b: The equation is:

MEDV = -34.867 - 0.218*CRIM + 3.824*CHAS + 9.20*RM

Model Summary

| R, Approximately 60% of the cases (SAMPLE) = 1 (Selected) | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|
| .747 (a) | .557 | .553 | 6.161 |

a. Predictors: (Constant), RM, CHAS, CRIM

ANOVA (a, b)

| | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 14051.438 | 3 | 4683.813 | 123.388 | .000 (c) |
| Residual | 11160.292 | 294 | 37.960 | | |
| Total | 25211.730 | 297 | | | |

a. Dependent Variable: MEDV
b. Selecting only cases for which Approximately 60% of the cases (SAMPLE) = 1
c. Predictors: (Constant), RM, CHAS, CRIM
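
As a consistency check on the Ans b output, the reported summary statistics can be recomputed from the ANOVA sums of squares. The sketch below is illustrative only. The regression degrees of freedom (3) are not printed in the extracted output and are inferred here from the three predictors (equivalently, total df 297 minus residual df 294). The variable names are arbitrary.

```python
import math

# Values taken from the ANOVA output for the three-predictor model.
ss_regression = 14051.438
ss_residual = 11160.292
df_regression = 3    # inferred: three predictors (RM, CHAS, CRIM)
df_residual = 294

ss_total = ss_regression + ss_residual
ms_regression = ss_regression / df_regression   # mean square, regression
ms_residual = ss_residual / df_residual         # mean square, residual

f_stat = ms_regression / ms_residual            # F statistic
r_square = ss_regression / ss_total             # share of variance explained
n_cases = df_regression + df_residual + 1       # training cases
adj_r_square = 1 - (1 - r_square) * (n_cases - 1) / df_residual
std_error = math.sqrt(ms_residual)              # std. error of the estimate

print(round(f_stat, 3), round(r_square, 3),
      round(adj_r_square, 3), round(std_error, 3))
```

The recomputed values agree with the reported F of 123.388, R Square of .557, Adjusted R Square of .553 and Std. Error of 6.161 to the precision shown in the output.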

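Ans c:
The predicted median house price is MEDV = -7.288. Since MEDV is measured in $1000s, this corresponds to about -$7,288. A negative price is not meaningful, so this value should be read as an artifact of the linear model rather than a realistic prediction.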
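
The Ans c figure can be reproduced from the Ans b equation. The input values are not stated in this document, so the ones used below (CRIM = 0.1, CHAS = 0, RM = 3) are an assumption, chosen because they give the reported number up to rounding in the third decimal. The function name `predict_medv` is hypothetical.

```python
def predict_medv(crim, chas, rm):
    """Evaluate the three-predictor equation from Ans b.

    MEDV is in $1000s. Coefficients are copied from the fitted equation.
    """
    return -34.867 - 0.218 * crim + 3.824 * chas + 9.20 * rm


# Assumed inputs (not given in the document): a tract that does not bound
# the Charles River, crime rate 0.1, and an average of 3 rooms per house.
prediction = predict_medv(crim=0.1, chas=0, rm=3)
print(prediction)
```

With RM = 3 the term 9.20*RM is only 27.6, which is too small to offset the intercept of -34.867, and that is why the prediction turns negative.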
Ans d:
i. Certain variables measure the level of development and industrialization, and
these are likely to be positively correlated. The correlation matrix shows
that INDUS, NOX and TAX are highly correlated. This is because areas that have a high
proportion of non-retail businesses tend to have higher taxes and more pollution.
INDUS is the proportion of non-retail business, while NOX is the nitric oxide
concentration.
ii. The highly correlated pairs of variables are as follows:
1) NOX and INDUS: Correlation coefficient = 0.764
2) TAX and INDUS: Correlation coefficient = 0.688
3) AGE and NOX: Correlation coefficient = 0.724
4) DIS and NOX: Correlation coefficient = -0.765
5) DIS and AGE: Correlation coefficient = -0.745
6) TAX and RAD: Correlation coefficient = 0.891
The variables INDUS, TAX and NOX all capture the same thing, namely development and
urbanization. So we can keep one of them at a time and drop the other two when searching for the best-fit model.
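
To make the list above easier to scan, the reported pairs can be ranked by the absolute value of their correlation, and each variable can be counted by how many high-correlation pairs it appears in. This sketch uses only the six coefficients reported in this answer. It does not recompute them from the data, and the variable names are arbitrary.

```python
from collections import Counter

# Correlation coefficients exactly as reported in Ans d (ii).
reported = {
    ("NOX", "INDUS"): 0.764,
    ("TAX", "INDUS"): 0.688,
    ("AGE", "NOX"): 0.724,
    ("DIS", "NOX"): -0.765,
    ("DIS", "AGE"): -0.745,
    ("TAX", "RAD"): 0.891,
}

# Strength of association is |r|, so negative pairs are ranked by magnitude.
ranked = sorted(reported.items(), key=lambda item: abs(item[1]), reverse=True)

# How often each variable appears among the highly correlated pairs.
involvement = Counter(var for pair in reported for var in pair)

for (a, b), r in ranked:
    print(f"{a:6s} {b:6s} {r:+.3f}")
print(involvement.most_common())
```

NOX appears in three of the six pairs, more than any other variable, which is one way to see why it overlaps with INDUS, AGE and DIS.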
iii.
Model 1: We have chosen to keep NOX

Variables Entered/Removed (a, b)

| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | LSTAT, B, PTRATIO, CRIM, ZN, RM, NOX, DIS (c) | . | Enter |

a. Dependent Variable: MEDV
b. Models are based only on cases for which Approximately 60% of the cases (SAMPLE) = 1
c. All requested variables entered.

Model Summary

| R, Approximately 60% of the cases (SAMPLE) = 1 (Selected) | R Square | Adjusted R Square | Std. Error of the Estimate | R Square Change | F Change | df1 | df2 | Sig. F Change |
|---|---|---|---|---|---|---|---|---|
| .846 (a) | .716 | .709 | 4.973 | .716 | 91.286 | 8 | 289 | .000 |

a. Predictors: (Constant), LSTAT, B, PTRATIO, CRIM, ZN, RM, NOX, DIS

Model 2: Keeping INDUS

Variables Entered/Removed (a, b)

| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | INDUS, B, PTRATIO, CRIM, RM, ZN, LSTAT, DIS (c) | . | Enter |

a. Dependent Variable: MEDV
b. Models are based only on cases for which Approximately 60% of the cases (SAMPLE) = 1
c. All requested variables entered.

Model Summary

| Model | R, Approximately 60% of the cases (SAMPLE) = 1 (Selected) | R Square | Adjusted R Square | Std. Error of the Estimate | R Square Change | F Change | df1 | df2 | Sig. F Change |
|---|---|---|---|---|---|---|---|---|---|
| 1 | .837 (a) | .701 | .693 | 5.104 | .701 | 84.833 | 8 | 289 | .000 |

a. Predictors: (Constant), INDUS, B, PTRATIO, CRIM, RM, ZN, LSTAT, DIS

Model 3: Keeping TAX

Variables Entered/Removed (a, b)

| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | TAX, RM, B, ZN, PTRATIO, CRIM, DIS, LSTAT (c) | . | Enter |

a. Dependent Variable: MEDV
b. Models are based only on cases for which Approximately 60% of the cases (SAMPLE) = 1
c. All requested variables entered.

Model Summary

| Model | R, Approximately 60% of the cases (SAMPLE) = 1 (Selected) | R Square | Adjusted R Square | Std. Error of the Estimate | R Square Change | F Change | df1 | df2 | Sig. F Change |
|---|---|---|---|---|---|---|---|---|---|
| 1 | .837 (a) | .700 | .692 | 5.114 | .700 | 84.361 | 8 | 289 | .000 |

a. Predictors: (Constant), TAX, RM, B, ZN, PTRATIO, CRIM, DIS, LSTAT
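
The three candidate models can be compared on adjusted R square, which penalizes R square for the number of predictors. The sketch below recomputes it from the reported R square values using the standard formula 1 - (1 - R^2)(n - 1)/(n - p - 1). The training-set size n = 298 is inferred from the ANOVA total df of 297 rather than stated directly, and the dictionary keys are just labels for which of NOX, INDUS or TAX each model keeps.

```python
def adjusted_r_square(r_square, n_cases, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r_square) * (n_cases - 1) / (n_cases - n_predictors - 1)


N_TRAIN = 298        # inferred: ANOVA total df (297) + 1
N_PREDICTORS = 8     # each candidate model enters eight predictors

# R Square as reported in the three Model Summary outputs.
reported_r_square = {"NOX": 0.716, "INDUS": 0.701, "TAX": 0.700}

adjusted = {
    kept: adjusted_r_square(r2, N_TRAIN, N_PREDICTORS)
    for kept, r2 in reported_r_square.items()
}
best = max(adjusted, key=adjusted.get)

for kept, value in adjusted.items():
    print(f"keep {kept:5s} adjusted R square = {value:.3f}")
print("highest:", best)
```

Because the inputs are rounded to three decimals, the recomputed values can differ from the SPSS output in the last digit. The ordering is unaffected: the model that keeps NOX scores highest of the three.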

We find that the adjusted R square value is highest for the following model:

Model Summary (b, c)

| Model | R, Approximately 60% of the cases (SAMPLE) = 1 (Selected) | R, Approximately 60% of the cases (SAMPLE) ~= 1 (Unselected) | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|---|
| 1 | .909 (a) | .913 | .8257330890 | .821 | 3.899 |

a. Predictors: (Constant), CAT. MEDV, CHAS, CRIM, B, DIS, PTRATIO, RM, LSTAT
b. Unless noted otherwise, statistics are based only on cases for which Approximately 60% of the cases (SAMPLE) = 1.
c. Dependent Variable: MEDV

Change statistics: df2 = 289, Sig. F Change = .000

Coefficients (a, b)

| Model 1 | B (Unstandardized) | Std. Error | Beta (Standardized) | t | Sig. | Tolerance | VIF |
|---|---|---|---|---|---|---|---|
| (Constant) | 21.8196835416 | 4.650 | | 4.693 | .000 | | |
| CRIM | -.0916326951 | .029 | -.088 | -3.113 | .002 | .747 | 1.339 |
| CHAS | 2.5602912363 | .963 | .068 | 2.658 | .008 | .921 | 1.086 |
| RM | 1.5674889438 | .534 | .112 | 2.938 | .004 | .418 | 2.390 |
| DIS | -.3863479004 | .133 | -.086 | -2.905 | .004 | .684 | 1.462 |
| PTRATIO | -.3669881155 | .124 | -.087 | -2.967 | .003 | .695 | 1.438 |
| B | .0072449467 | .003 | .073 | 2.653 | .008 | .790 | 1.266 |
| LSTAT | -.4361544592 | .049 | -.346 | -8.820 | .000 | .391 | 2.559 |
| CAT. MEDV | 12.6070216351 | .825 | .512 | 15.274 | .000 | .536 | 1.865 |

Tolerance and VIF are the collinearity statistics.
a. Dependent Variable: MEDV
b. Selecting only cases for which Approximately 60% of the cases (SAMPLE) = 1
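
The collinearity statistics in the coefficients output are linked by VIF = 1 / Tolerance, so the two columns can be checked against each other. The sketch below does that for the eight predictors. The assignment of the unlabeled rows to RM and B follows the predictor order given for the final model and is therefore an assumption about the extracted output. The threshold of 5 is a common rule of thumb, not something stated in this assignment.

```python
# Tolerance values read from the coefficients output. The RM and B rows are
# unlabeled in the extracted text and are placed by predictor order (assumed).
tolerance = {
    "CRIM": 0.747,
    "CHAS": 0.921,
    "RM": 0.418,
    "DIS": 0.684,
    "PTRATIO": 0.695,
    "B": 0.790,
    "LSTAT": 0.391,
    "CAT.MEDV": 0.536,
}

# Variance inflation factor is the reciprocal of tolerance.
vif = {name: 1 / tol for name, tol in tolerance.items()}

RULE_OF_THUMB = 5  # common cut-off for worrying collinearity (assumption)
flagged = [name for name, value in vif.items() if value > RULE_OF_THUMB]

for name, value in vif.items():
    print(f"{name:9s} VIF = {value:.3f}")
print("above threshold:", flagged)
```

All recomputed VIFs match the reported column to within rounding, and none comes close to the threshold, which is consistent with the collinear variables (INDUS, NOX, TAX) having been left out of the final model.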

Final model: The variables are CRIM, CHAS, RM, DIS, PTRATIO, B, LSTAT &
CAT.MEDV

FOR TRAINING DATA SET:

Z1 = Σ (MEDV actual - MEDV mean)^2 = 25211.73023
Z2 = Σ (MEDV actual - MEDV calculated)^2 = 4393.570348
R square = 1 - (Z2/Z1) = 0.825733089

FOR TEST (VALIDATION) DATA SET:

Z1 = Σ (MEDV actual - MEDV mean)^2 = 17486.93188
Z2 = Σ (MEDV actual - MEDV calculated)^2 = 2916.163235
R square = 1 - (Z2/Z1) = 0.833237571
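
The R square calculation above can be written as a short Python sketch. The first function works from the two sums of squares reported here. The second shows how those sums would be obtained from actual and calculated MEDV values. The function names are hypothetical, and the underlying case-level data are not reproduced in this document.

```python
def r_square(ss_total, ss_error):
    """R^2 = 1 - Z2/Z1, with Z1 the total and Z2 the error sum of squares."""
    return 1 - ss_error / ss_total


def r_square_from_predictions(actual, predicted):
    """Compute R^2 directly from actual and calculated values."""
    mean = sum(actual) / len(actual)
    ss_total = sum((y - mean) ** 2 for y in actual)                   # Z1
    ss_error = sum((y - p) ** 2 for y, p in zip(actual, predicted))   # Z2
    return 1 - ss_error / ss_total


# Sums of squares as reported for the two partitions.
train_r2 = r_square(25211.73023, 4393.570348)
valid_r2 = r_square(17486.93188, 2916.163235)

print(f"training   R square = {train_r2:.6f}")
print(f"validation R square = {valid_r2:.6f}")
```

The validation R square is slightly higher than the training value, so on this measure the model shows no sign of overfitting to the training partition.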
