
House Prices Predictive Model Summary Report

This document summarizes a predictive model for house prices using a dataset of historical home sales in Ames, Iowa. The dataset contains details on 1460 homes with 81 features describing attributes like the lot, interior details, and condition of the home. Data wrangling steps included handling missing and inconsistent data. Machine learning techniques explored were regression, ridge regression, and lasso regression. The best model was selected using cross validation and model performance scores to predict house prices accurately and aid real estate industry clients.

Uploaded by

sk3146


Springboard DSC Capstone Project I

House Prices – Predictive Model

NIKHILA THOTA
12/2017

Table of Contents
1. Introduction
2. Client
3. Dataset
4. Data Wrangling
   a. Handling missing data
   b. Handling inconsistent data
5. New Dataset
6. Data Exploration
   a. Multicollinearity
   b. Some interesting questions
7. Data Standardization
8. Encoding Categorical Data
9. Train and Test Sets
10. Machine Learning
   a. Regression
11. Regularization
   a. Ridge Regression
   b. Lasso Regression
12. Cross Validation
13. Scores
14. Summary
15. Further Analysis
16. Recommendations
17. Figures
   a. Figure 6.1
   b. Figure 6.2
   c. Figure 6.3
   d. Figure 6.4
   e. Figure 6.5
   f. Figure 6.6
   g. Figure 6.7
   h. Figure 6.8
   i. Figure 8.1
   j. Figure 9.1
   k. Figure 10.1
   l. Figure 10.2
   m. Figure 10.3
   n. Figure 11.1
   o. Figure 11.2
   p. Figure 11.3
   q. Figure 11.4
   r. Figure 11.5
   s. Figure 11.6
   t. Figure 13.1

1. Introduction
An accurate prediction of house prices is important to prospective homeowners, developers, investors, appraisers, tax assessors and other real estate market participants, such as mortgage lenders and insurers. Traditional house price prediction is based on comparisons of cost and sale price, and it lacks an accepted standard and certification process. A house price prediction model therefore helps fill an important information gap and improves the efficiency of the real estate market.

The real estate market in the United States is booming, and many people dream of owning the perfect house. With the housing market thriving, price becomes a crucial factor for home seekers. Research shows that important factors influencing house prices include the housing site, housing quality, geographical location and the environment.

2. Client
This analysis may be of interest to real estate companies, real estate investors, mortgage lenders and home insurers. It can help these businesses, as well as home seekers, make better-informed decisions.

3. Dataset
The dataset consists of historical sale prices of residential homes in Ames, Iowa. It contains 1460 observations and 81 columns: 79 explanatory features, an Id column and the target variable, SalePrice. The dataset was obtained from Kaggle:
https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data

The data set describes each house in fine detail. Some of the major features in this data set are:

1. Lot Area
2. Neighborhood
3. House Style
4. Quality of the house
5. Overall condition of the house
6. Year built
7. Year remodeled
8. Foundation
9. Basement Condition
10. Total basement square feet
11. 1st floor square feet
12. 2nd floor square feet
13. Above ground living area in square feet
14. Full bathrooms above ground
15. Bedrooms above grade
16. Total rooms above grade
17. Garage size in square feet
18. Garage quality

It is worth exploring the data set on Kaggle to get a better understanding of the data.

4. Data Wrangling
Data wrangling is an essential step in any data analysis, because the data must be well organized before it can be analyzed. This process typically involves converting or mapping data from its raw form into another format that is more convenient to use and organize.

Data Cleaning steps carried out in this project are:

i. Handling missing data

ii. Handling inconsistent data in a few variables

House Prices data set information:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 81 columns):
Id 1460 non-null int64
MSSubClass 1460 non-null int64
MSZoning 1460 non-null object
LotFrontage 1201 non-null float64
LotArea 1460 non-null int64
Street 1460 non-null object
Alley 91 non-null object
LotShape 1460 non-null object
LandContour 1460 non-null object
Utilities 1460 non-null object
LotConfig 1460 non-null object
LandSlope 1460 non-null object
Neighborhood 1460 non-null object
Condition1 1460 non-null object
Condition2 1460 non-null object
BldgType 1460 non-null object
HouseStyle 1460 non-null object
OverallQual 1460 non-null int64
OverallCond 1460 non-null int64
YearBuilt 1460 non-null int64
YearRemodAdd 1460 non-null int64
RoofStyle 1460 non-null object
RoofMatl 1460 non-null object
Exterior1st 1460 non-null object
Exterior2nd 1460 non-null object
MasVnrType 1452 non-null object
MasVnrArea 1452 non-null float64
ExterQual 1460 non-null object
ExterCond 1460 non-null object
Foundation 1460 non-null object
BsmtQual 1423 non-null object
BsmtCond 1423 non-null object
BsmtExposure 1422 non-null object
BsmtFinType1 1423 non-null object
BsmtFinSF1 1460 non-null int64
BsmtFinType2 1422 non-null object
BsmtFinSF2 1460 non-null int64
BsmtUnfSF 1460 non-null int64
TotalBsmtSF 1460 non-null int64
Heating 1460 non-null object
HeatingQC 1460 non-null object
CentralAir 1460 non-null object
Electrical 1459 non-null object
1stFlrSF 1460 non-null int64
2ndFlrSF 1460 non-null int64
LowQualFinSF 1460 non-null int64
GrLivArea 1460 non-null int64
BsmtFullBath 1460 non-null int64
BsmtHalfBath 1460 non-null int64
FullBath 1460 non-null int64
HalfBath 1460 non-null int64
BedroomAbvGr 1460 non-null int64
KitchenAbvGr 1460 non-null int64
KitchenQual 1460 non-null object
TotRmsAbvGrd 1460 non-null int64
Functional 1460 non-null object
Fireplaces 1460 non-null int64
FireplaceQu 770 non-null object
GarageType 1379 non-null object
GarageYrBlt 1379 non-null float64
GarageFinish 1379 non-null object
GarageCars 1460 non-null int64
GarageArea 1460 non-null int64
GarageQual 1379 non-null object
GarageCond 1379 non-null object
PavedDrive 1460 non-null object
WoodDeckSF 1460 non-null int64
OpenPorchSF 1460 non-null int64
EnclosedPorch 1460 non-null int64
3SsnPorch 1460 non-null int64
ScreenPorch 1460 non-null int64
PoolArea 1460 non-null int64
PoolQC 7 non-null object
Fence 281 non-null object
MiscFeature 54 non-null object
MiscVal 1460 non-null int64
MoSold 1460 non-null int64
YrSold 1460 non-null int64
SaleType 1460 non-null object
SaleCondition 1460 non-null object
SalePrice 1460 non-null int64
dtypes: float64(3), int64(35), object(43)

The output above was produced by the pandas info() method. Several categorical and numerical variables have missing values.
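As a compact alternative to scanning the info() output, the columns with missing values can be listed directly with pandas. This is a minimal sketch: the function name `missing_summary` and the three-row `demo` frame are hypothetical stand-ins, and the file name `train.csv` in the comment is an assumption about the Kaggle download.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return the null count and dtype of every column that has at least one null."""
    counts = df.isnull().sum()
    counts = counts[counts > 0].sort_values(ascending=False)
    return pd.DataFrame({
        "missing": counts,
        "dtype": df.dtypes[counts.index].astype(str),
    })


# Hypothetical stand-in for the Kaggle data; with the real file one might use
# df = pd.read_csv("train.csv")  (file name assumed)
demo = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 68.0],
    "Alley": [np.nan, np.nan, "Grvl"],
    "SalePrice": [208500, 181500, 223500],
})

print(missing_summary(demo))
```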

a. Handling Missing Data:
• Categorical Data: The categorical variables with genuinely missing values are ‘MasVnrType’ and ‘Electrical’ (the other categorical nulls are handled as inconsistent data below). pandas provides several methods for handling missing data, such as fillna, forward/backward filling and dropna. I assigned all of these null values to a new category called ‘missing’. This retains the original information in the data without guessing any values.
• Numerical Data: A common method for handling missing numerical data is mean imputation, which I applied to the numerical variables. In mean imputation, each missing value of a variable is replaced by the mean of the available cases. The method is simple and keeps every observation, but it shrinks the variable's variance and can weaken its correlations with other variables.
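The two imputation rules above can be sketched in pandas as follows. This is a minimal illustration, not the original project code: the function name `impute` and the demo values are hypothetical, and the column lists would need to match the real data.

```python
import numpy as np
import pandas as pd


def impute(df, categorical_cols, numeric_cols):
    """Fill categorical nulls with a 'missing' category and numeric nulls with the column mean."""
    out = df.copy()
    for col in categorical_cols:
        # A new explicit category keeps the information that the value was absent.
        out[col] = out[col].fillna("missing")
    for col in numeric_cols:
        # Mean imputation: replace nulls with the mean of the observed values.
        out[col] = out[col].fillna(out[col].mean())
    return out


demo = pd.DataFrame({
    "MasVnrType": ["BrkFace", np.nan, "None"],
    "Electrical": ["SBrkr", "SBrkr", np.nan],
    "LotFrontage": [60.0, np.nan, 80.0],
})

clean = impute(demo, ["MasVnrType", "Electrical"], ["LotFrontage"])
print(clean)
```

In practice the mean is often computed on the training split only, so that test rows do not influence the imputed values.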

b. Handling inconsistent data:

Some values in the data set appear as nulls but are not actually missing. According to the data description file from Kaggle (data_description.txt), ‘NA’ means that a feature is not present in the house (for example, no alley access or no pool), but these NA entries are read as NaN when the .csv file is loaded. I recoded these misinterpreted values as ‘No feature_name’, where feature_name is the name of the feature that is absent from the house.
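A sketch of the recoding step, assuming a hand-written mapping from column name to its "No feature" label. Only a few columns are shown, the label strings are illustrative choices that follow the report's ‘No feature_name’ convention, and the full list of columns whose NA means "absent" should be taken from data_description.txt. The function name `decode_absent` is hypothetical.

```python
import numpy as np
import pandas as pd

# Illustrative subset: columns where data_description.txt uses NA to mean
# "feature not present". The label strings are assumptions.
ABSENT_LABELS = {
    "Alley": "No Alley",
    "BsmtQual": "No Basement",
    "FireplaceQu": "No Fireplace",
    "GarageType": "No Garage",
    "PoolQC": "No Pool",
    "Fence": "No Fence",
}


def decode_absent(df, labels=ABSENT_LABELS):
    """Replace NaN with a 'No <feature>' label in the listed columns that exist in df."""
    out = df.copy()
    for col, label in labels.items():
        if col in out.columns:
            out[col] = out[col].fillna(label)
    return out


demo = pd.DataFrame({
    "Alley": [np.nan, "Grvl", np.nan],
    "PoolQC": [np.nan, np.nan, "Ex"],
    "LotArea": [8450, 9600, 11250],
})
decoded = decode_absent(demo)
print(decoded)
# The cleaned frame could then be saved, e.g.:
# decoded.to_csv("house_prices_cleaned.csv", index=False)
```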

5. New Data Set

The data is now clean, with no null or inconsistent values. I saved it to a new CSV file, ‘house_prices_cleaned.csv’, and use this data set for data exploration.

6. Data Exploration
Data exploration is the first step in data analysis and typically involves summarizing the main characteristics of a dataset. It is commonly conducted using visual analytics tools. Data visualization is an effective way to explore data because it lets users quickly and simply view most of the relevant features of the dataset. By displaying the data graphically, for example as scatter plots or bar charts, users can identify variables that are likely to show interesting patterns and judge whether they are worth further in-depth analysis.

I used the Python seaborn library for my visualizations. For ease of analysis, I divided the data frame into numerical and categorical parts, containing quantitative and qualitative data respectively.

a. Multicollinearity: Multicollinearity exists when two or more of the predictors are highly correlated. This can increase the variance of the coefficient estimates and make the estimates very sensitive to minor changes in the model.
I used a heat map to find highly correlated independent variables. The graph shows that the following pairs of features are highly correlated with each other:
- 'GarageCars' and 'GarageArea'
- 'Total Basement square footage' and '1st floor square footage'
- 'Above grade (ground) area' and 'Total no. of rooms above grade (ground)'

Regularized regression algorithms such as Ridge and Lasso regression can mitigate the effects of multicollinearity.

Apart from these, the independent variables most highly correlated with the target variable, Sale Price, are Overall Quality, Above Ground Living Area and Garage Cars.

Figure 6.1
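The heat map can be complemented by a short listing of strongly correlated pairs. This sketch assumes a threshold of 0.8 on the absolute Pearson correlation (an arbitrary illustrative choice) and uses synthetic data in which a hypothetical GarageArea column is built almost linearly from GarageCars; `correlated_pairs` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def correlated_pairs(df, threshold=0.8):
    """List numeric column pairs whose absolute Pearson correlation exceeds threshold."""
    corr = df.select_dtypes(include="number").corr().abs()
    # Keep only the upper triangle (k=1 drops the diagonal) so each pair appears once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    return pairs[pairs > threshold].sort_values(ascending=False)


rng = np.random.default_rng(0)
cars = rng.integers(0, 4, size=200)
demo = pd.DataFrame({
    "GarageCars": cars,
    "GarageArea": cars * 250 + rng.normal(0, 20, size=200),  # synthetic, nearly collinear
    "LotArea": rng.normal(10_000, 2_000, size=200),          # synthetic, unrelated
})

print(correlated_pairs(demo))
```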

b. Some interesting questions:
1. What type of lots tend to have higher prices?

Figure 6.2

Cul-de-sac lots tend to have the highest prices, followed by lots with frontage on three sides of the property. Cul-de-sac houses usually have a larger lot area, which may explain the higher prices of cul-de-sac sites.

2. Which neighborhoods are most and least expensive?

Figure 6.3

Northridge Heights and Stone Brook have the most expensive houses, while Old Town, Brook Side, Sawyer, North Ames, Edwards, Iowa DOT and Rail Road, Meadow Village and Briardale have the least expensive houses among all the neighborhoods.

3. Does the exterior of the house affect Sale Price?

Figure 6.4

The exterior of the house appears to be as important as the interior: the better the exterior quality, the higher the house price.

4. What effect does Basement Condition have on house price?

Figure 6.5

Basement condition appears to have a linear effect on Sale Price: the better the basement condition, the higher the price of the house.

5. What is the relationship between the HVAC system and Sale Price?

Figure 6.6

Figure 6.7

The HVAC system is one of the major components every buyer should consider before buying a house. HVAC quality is positively correlated with Sale Price.

6. How does Kitchen Quality affect the final Sale Price of a house?

Figure 6.8

The kitchen is the heart of the house. The graph makes it evident that an improved or remodeled kitchen doesn’t come cheap.
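Each of the questions above compares Sale Price across the levels of one categorical feature. A minimal pandas sketch of that comparison follows; it summarizes each group by its median (an assumption, since the figures may show means or full distributions), `price_by_group` is a hypothetical helper, and the five sales are made up.

```python
import pandas as pd


def price_by_group(df, column, target="SalePrice"):
    """Median target value for each category of column, most expensive first."""
    return df.groupby(column)[target].median().sort_values(ascending=False)


# Hypothetical sales; neighborhood codes follow the Kaggle data's abbreviations.
demo = pd.DataFrame({
    "Neighborhood": ["NridgHt", "NridgHt", "OldTown", "OldTown", "NAmes"],
    "SalePrice": [315_000, 340_000, 120_000, 110_000, 140_000],
})

print(price_by_group(demo, "Neighborhood"))
```

The same call with `LotConfig`, `ExterQual`, `BsmtCond`, `HeatingQC` or `KitchenQual` in place of `Neighborhood` gives the corresponding comparison for the other questions.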

7. Data Standardization
Before applying the machine learning algorithms, it is important to standardize the data. Standardization puts all the features on the same scale, so that their coefficients can be compared when analyzing results, and so that the Ridge and Lasso penalties, which depend on coefficient size, treat all features alike. Data standardization (or Z-score normalization) rescales each feature as z = (x - μ)/σ, where μ is the feature's mean (average) and σ is its standard deviation. The rescaled feature has mean 0 and standard deviation 1, like a standard normal distribution, although the shape of its distribution does not change. I used functions from the scikit-learn library (a widely used Python machine learning library) to standardize the data.
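A minimal check, on a small made-up matrix, that scikit-learn's StandardScaler computes z = (x - mean) / std column by column using the population standard deviation. The two columns (living area and bedroom count) are hypothetical values chosen for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up rows: [living area in sq ft, bedrooms]
X = np.array([[1200.0, 2.0],
              [1500.0, 3.0],
              [1800.0, 3.0],
              [2400.0, 4.0]])

scaler = StandardScaler()
Z = scaler.fit_transform(X)

# The same result by hand: subtract the column mean, divide by the
# population standard deviation (ddof=0, which StandardScaler also uses).
Z_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(Z.round(3))
```

A common refinement is to fit the scaler on the training split only and reuse it, via `scaler.transform`, on the test split, so that test-set statistics do not leak into training.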

8. Encoding Categorical Data

Regression analysis only accepts numerical input: a least squares line cannot be fitted to non-numerical data, so categorical data cannot be used directly. It is therefore common practice in machine learning to transform categorical data into numerical data. Scikit-learn offers two methods for this task: label encoding and one-hot encoding.

I used one-hot encoding to convert the categorical data into a binary representation. This greatly increased the number of features, from 81 to 306 columns in the resulting matrix. With the data fully prepared, the next step is to apply machine learning algorithms to it.

The data frame after performing Standardization and One Hot Encoding is below.

Figure 8.1
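A minimal illustration of one-hot encoding with pandas `get_dummies` (scikit-learn's `OneHotEncoder` is an equivalent option). The three rows are hypothetical; each category of `Street` and `LotConfig` becomes its own 0/1 column, which is why the feature count grows quickly on the full data set.

```python
import pandas as pd

demo = pd.DataFrame({
    "LotArea": [8450, 9600, 11250],
    "Street": ["Pave", "Pave", "Grvl"],
    "LotConfig": ["Inside", "FR2", "Inside"],
})

# One 0/1 indicator column per category; numeric columns pass through unchanged.
encoded = pd.get_dummies(demo, columns=["Street", "LotConfig"], dtype=int)
print(encoded)
```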

9. Train and Test Sets

Before applying a machine learning algorithm, it is essential to split the data into train and test sets, so that an untouched data set is available to assess the performance of the model. I split the data into a train set (70% of the entire data) and a test set (30% of the entire data).
X_train: all the predictors in the train set
Y_train: the target variable in the train set
X_test: all the predictors in the test set
Y_test: the target variable in the test set
Note: Target variable: ‘SalePrice’

Figure 9.1

Notice that X_train and X_test are matrices containing all the predictors, while Y_train and Y_test are vectors containing only the target variable.
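The 70/30 split described above can be sketched with scikit-learn's `train_test_split`. Random stand-in arrays replace the real predictors and SalePrice here, and `random_state=42` is an arbitrary seed chosen so the split is reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))  # stand-in predictor matrix (100 houses, 5 features)
y = rng.normal(size=100)       # stand-in target vector (SalePrice)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```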

10. Machine Learning

a. Regression:
I performed multiple linear regression first and then moved on to more advanced algorithms. A regression plot of actual against predicted prices shows that a straight line fits the data well.

Figure 10.1
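A minimal multiple linear regression sketch with scikit-learn. The data are synthetic, generated from an assumed linear price rule with noise so that the fit can be checked against known coefficients; it does not reproduce the report's model or figures.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))  # three synthetic standardized features
# Assumed "true" price rule: base price plus two effects plus noise.
y = 200_000 + 30_000 * X[:, 0] - 15_000 * X[:, 1] + rng.normal(0, 5_000, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)  # the values an actual-vs-predicted plot would show

print(model.coef_.round(0), round(r2_score(y_test, y_pred), 3))
```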

The analysis can be strengthened further with residual plots. There are three residual plots: one for the train set, one for the test set and one for both together. In all three, the residuals are scattered around the reference line, and most lie between -$50,000 and +$50,000, a reasonable error margin for this real estate market. A house that has most of the positively correlated features will be priced at least $50,000 to $80,000 higher than houses with the negatively correlated features.

Figure 10.2
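Residuals are the differences between actual and predicted prices. The hypothetical helper below returns them together with the share that falls inside a ±$50,000 band, the range the report reads off its residual plots; the four prices are made up.

```python
import numpy as np


def residual_summary(y_true, y_pred, band=50_000):
    """Return residuals (actual minus predicted) and the share with |residual| <= band."""
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    share_within = float(np.mean(np.abs(residuals) <= band))
    return residuals, share_within


y_true = [200_000, 150_000, 310_000, 120_000]
y_pred = [190_000, 165_000, 240_000, 118_000]

res, share = residual_summary(y_true, y_pred)
print(res, share)  # three of the four residuals lie within ±50,000
```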

The regression analysis produced a set of positive and negative coefficients with respect to Sale Price. The top ten positive and negative coefficients are:

Positive Coefficients Negative Coefficients

Figure 10.3
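Ranking fitted coefficients is short work in pandas once they are paired with feature names. A sketch with made-up coefficients and hypothetical feature names; on standardized numeric features, each coefficient is the change in predicted price per one standard deviation of that feature, which makes magnitudes comparable.

```python
import pandas as pd


def top_coefficients(coefs, feature_names, n=10):
    """Return the n largest positive and the n most negative coefficients."""
    s = pd.Series(coefs, index=feature_names).sort_values()
    positive = s.tail(n)[::-1]  # largest first
    negative = s.head(n)        # most negative first
    return positive, negative


# Made-up coefficients for five hypothetical features.
names = ["feature_a", "feature_b", "feature_c", "feature_d", "feature_e"]
coefs = [12_000.0, -1_500.0, 9_000.0, -4_000.0, 2_500.0]

pos, neg = top_coefficients(coefs, names, n=2)
print(pos, neg, sep="\n")
```

With a fitted scikit-learn model, `model.coef_` and the encoded column names would take the place of the made-up lists.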

11. Regularization
Regularization is used to overcome the problem of ‘overfitting’, which usually occurs when the model learns the training data, including its noise, too closely. Regularization shrinks the coefficients toward zero by introducing a tuning parameter, 'lambda' or 'alpha'. This:

• Shrinks the parameters, which stabilizes the estimates when predictors are multicollinear.

• Reduces the model complexity through coefficient shrinkage.

Ridge and Lasso regression are the regularization techniques used in this analysis.
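For reference, the two penalized objectives can be written as follows, with n observations, p coefficients β_j, an unpenalized intercept β_0 and tuning parameter α (the report's lambda/alpha). The scaling of the squared-error term follows scikit-learn's documented Ridge and Lasso objectives; other texts scale it differently, which only rescales α.

```latex
\hat{\beta}^{\text{ridge}} = \arg\min_{\beta_0,\,\beta}\;
  \sum_{i=1}^{n} \bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2
  + \alpha \sum_{j=1}^{p} \beta_j^2

\hat{\beta}^{\text{lasso}} = \arg\min_{\beta_0,\,\beta}\;
  \frac{1}{2n} \sum_{i=1}^{n} \bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2
  + \alpha \sum_{j=1}^{p} \lvert \beta_j \rvert
```

The squared (L2) penalty shrinks all coefficients smoothly toward zero, while the absolute-value (L1) penalty can set some of them exactly to zero. Larger α means stronger shrinkage, and α = 0 recovers ordinary least squares.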

a. Ridge Regression:

- Regression Plot:

Figure 11.1

The least squares line appears to fit the data well.

- Residual Plots:

Figure 11.2

The graphs look similar to the residual plots from multiple regression. The plot of the train and test data together indicates that the model also performs well on the test data.

- Coefficients:
Positive Coefficients Negative Coefficients

Figure 11.3

The coefficients of the features have changed: a few features now have larger positive or negative coefficients than in multiple regression.

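b. Lasso Regression: Lasso penalizes the absolute values of the coefficients, which can shrink some coefficients exactly to zero and thereby drop those features from the model. This is an advantage over Ridge regression, which shrinks coefficients toward zero but does not set them exactly to zero.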

- Regression Plot:

Figure 11.4

From the plot above, the regression line is a good fit for the data.

- Residual Plots:

Figure 11.5

Most of the residuals lie within -$50,000 and +$50,000 of the reference line. In the third plot, the test data is spread similarly to the train data, a sign that the model works well outside of the train data.

- Coefficients
Positive Coefficients Negative Coefficients

Figure 11.6

The coefficients change again when Lasso regression is applied to the data.
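The practical difference between the two penalties can be checked on synthetic data in which only three of ten features matter. In this sketch the alpha values are arbitrary illustrative choices: Lasso sets the irrelevant coefficients exactly to zero, while Ridge only shrinks them.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
# Only the first three features influence y; the other seven are pure noise.
true_coef = np.array([5.0, 3.0, -4.0] + [0.0] * 7)
y = X @ true_coef + rng.normal(0, 1.0, size=200)

ridge = Ridge(alpha=10.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)

print("ridge zero coefficients:", int(np.sum(ridge.coef_ == 0)))
print("lasso zero coefficients:", int(np.sum(lasso.coef_ == 0)))
```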

12. Cross Validation
When evaluating different hyperparameters for an estimator, such as the alpha setting that must be set manually for Ridge, there is still a risk of overfitting on the test set, because the parameters can be tweaked until the estimator performs optimally on it. To solve this problem, yet another part of the dataset can be held out as a so-called “validation set”: training proceeds on the training set, evaluation is done on the validation set, and when the experiment seems successful, final evaluation is done on the test set. Cross-validation refines this idea by rotating the validation role across k folds of the training data. GridSearchCV in the scikit-learn library performs this task: for each candidate hyperparameter value it runs k-fold cross-validation on the training data and keeps the value with the best average score.

- GridSearchCV with Ridge: alpha values considered are alphas = [1e-15, 1e-10, 1e-8, 1e-5, 1e-4, 1e-3, 1e-2, 1, 5, 10]
- GridSearchCV selected alpha = 10 as the best value, with R2 = 0.873 and RMSE = 25777.429.
- GridSearchCV with Lasso: the same alpha values are considered, alphas = [1e-15, 1e-10, 1e-8, 1e-5, 1e-4, 1e-3, 1e-2, 1, 5, 10]
- GridSearchCV selected alpha = 10 as the best value, with R2 = 0.812 and RMSE = 33835.537.
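The grid search above can be sketched with scikit-learn's GridSearchCV and the report's alpha grid. The data are synthetic, and `cv=5` and the RMSE scorer are assumptions, since the report does not state its fold count or scoring metric.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

alphas = [1e-15, 1e-10, 1e-8, 1e-5, 1e-4, 1e-3, 1e-2, 1, 5, 10]

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))                         # synthetic training predictors
y = X @ rng.normal(size=8) + rng.normal(0, 0.5, 300)  # synthetic target

search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": alphas},
    cv=5,                                   # 5-fold cross-validation (assumed)
    scoring="neg_root_mean_squared_error",  # scikit-learn maximizes scores, hence the sign
)
search.fit(X, y)

print(search.best_params_, round(-search.best_score_, 3))
```

Swapping `Ridge()` for `Lasso()` gives the Lasso search, although very small alphas may then trigger convergence warnings. When the selected alpha sits at the edge of the grid, as with the report's alpha = 10, it is common to extend the grid to check whether a larger value scores better.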

13. Scores
With this many models applied to the data, how do we choose the best one? To decide, I compared the R2 and RMSE scores produced by all the models.

Table 13.1

Comparing the train and test scores (R2 and RMSE), Ridge regression with cross-validation (Ridge_GridSearchCV) seems best suited to the data, because its train and test scores differ little. The regression and residual plots from Ridge regression with cross-validation also show a good fit to the data.
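The comparison in Table 13.1 rests on two metrics. A minimal helper (hypothetical name) that computes both for one set of predictions follows; RMSE is taken as the square root of scikit-learn's `mean_squared_error` to stay compatible across library versions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score


def regression_scores(y_true, y_pred):
    """Return R^2 and RMSE for a set of predictions."""
    rmse = float(np.sqrt(mean_squared_error(y_true, y_pred)))
    return {"R2": float(r2_score(y_true, y_pred)), "RMSE": rmse}


# Toy check: one of four predictions is off by 1.
scores = regression_scores([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
print(scores)
```

Calling it once on the train predictions and once on the test predictions of each model gives the rows being compared; a large gap between the two suggests overfitting.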

14. Summary
Houses with a full bathroom in the basement, good condition, more low-quality finished area (sq ft), a bigger garage area (sq ft), more square footage on the 2nd floor, lower age, more fireplaces, recent remodeling and more half bathrooms above the basement (or above ground level for houses without a basement) are priced higher.

Conversely, a bigger front yard (relative to the back yard), a bigger lot area, more rooms above the basement (or above ground level for houses without a basement), a kitchen above ground floor, a bigger finished area in the second basement (sq ft), a bigger 1st floor area (sq ft), garage capacity, a bigger wood deck area, the year sold and an enclosed porch are associated with lower house prices.

15. Further Analysis

While conducting my research, I felt that more information in a few areas would have allowed a more accurate analysis and a less error-prone model.

• Schools: Families with children generally want to move to locations with good school districts, so the quality of local schools is likely to influence the sale price of a house. This information was missing from the acquired data set.
• Employment and Shopping centers: Working people prefer to live near their workplaces to avoid spending hours in traffic, so locations near employment centers tend to have higher property prices. The same goes for shopping centers (grocery stores, malls, theaters, etc.), since people prefer easy access to stores and entertainment. This information was not provided in the data set.
• Crime Rate: A house in an otherwise perfect location with all the amenities may sell for less if the local crime rate is high. The data set contains no information about crime rates.

16. Recommendations
• Businesses and homeowners can quote a price based on some of the important features, such as the overall condition of the house and basement (if any), a bigger garage, extra square footage on the 2nd floor, a recently built or remodeled house, and more bathrooms.
• Since 96% of the predicted values fall within ±$50,000 of the actual values, a price estimated from the quality and features of a house in this location (Ames, Iowa) can be expected to be within about $50,000 of its actual sale price.
• Interested parties should decide the price of a house based on the features identified in this analysis as increasing or decreasing the house price.

4 pages
MRA Project Milestone 1 PDF
No ratings yet
MRA Project Milestone 1 PDF
1 page
Machine Learning Notes
No ratings yet
Machine Learning Notes
96 pages
FRA Assignment
100% (1)
FRA Assignment
31 pages
Milestone 1
No ratings yet
Milestone 1
2 pages
Essay: Common Pitfalls in Experimental Design
No ratings yet
Essay: Common Pitfalls in Experimental Design
25 pages
Machine Learning
No ratings yet
Machine Learning
63 pages
Mid Exam Midterm Exam of Applied Machine Learning
100% (1)
Mid Exam Midterm Exam of Applied Machine Learning
6 pages
2023 AJRCSugcapproved June 15 TH
No ratings yet
2023 AJRCSugcapproved June 15 TH
9 pages
Enhancing Neural Network Models For MNIST Digit Recognition
No ratings yet
Enhancing Neural Network Models For MNIST Digit Recognition
6 pages
CSIC 6132 排版870 878
No ratings yet
CSIC 6132 排版870 878
9 pages
Kumar Ramaiah (S6628) Synopsis-Final
No ratings yet
Kumar Ramaiah (S6628) Synopsis-Final
55 pages
Implementation of ML Model For Image Classification
No ratings yet
Implementation of ML Model For Image Classification
19 pages
House Price Prediction Using AI
No ratings yet
House Price Prediction Using AI
12 pages
Vic Bfra (Bike Form Recognition Analysis) Documentation
No ratings yet
Vic Bfra (Bike Form Recognition Analysis) Documentation
4 pages
ML Unit 01
No ratings yet
ML Unit 01
4 pages
House Price Pridiction Using Machine Learning
No ratings yet
House Price Pridiction Using Machine Learning
3 pages
Duplichecker Plagiarism Report
No ratings yet
Duplichecker Plagiarism Report
2 pages
Aiml QB
No ratings yet
Aiml QB
16 pages
Unit 1
No ratings yet
Unit 1
3 pages
Intership Report
No ratings yet
Intership Report
20 pages
Techniques For Developing and Refining Datasets
No ratings yet
Techniques For Developing and Refining Datasets
2 pages


10. Machine Learning ......................................................................................................................... 13

12. Cross Validation ............................................................................................................................ 19

16. Recommendations ........................................................................................................................ 20

17. Figures ..............................................................................................................................................

i. Figure 8.1 .................................................................................................................................. 13

j. Figure 9.1 .................................................................................................................................. 13

k. Figure 10.1 ................................................................................................................................ 14

l. Figure 10.2 ................................................................................................................................ 14

m. Figure 10.2 ................................................................................................................................ 14

n. Figure 10.3 ................................................................................................................................ 15

o. Figure 11.1 ................................................................................................................................ 16

p. Figure 11.2 ................................................................................................................................ 16

q. Figure 11.3 ................................................................................................................................ 17

r. Figure 11.4 ................................................................................................................................ 17

t. Figure 11.6 ................................................................................................................................ 18

u. Figure 13.1 ................................................................................................................................ 19

Data Cleaning steps carried out in this project are:

i. Handling missing data
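A common first pass at missing data fills numeric columns with the column median and categorical columns with the most frequent value. The sketch below is one way to do that, not necessarily the method used in this report. It assumes each row is a plain dictionary and that None marks a missing value. The column names come from the Ames data, but the values and the function name `impute` are hypothetical.

```python
from statistics import median, mode

# Hypothetical rows for illustration; None marks a missing value.
# LotFrontage and Electrical are Ames column names, but the values are made up.
rows = [
    {"LotFrontage": 65.0, "Electrical": "SBrkr"},
    {"LotFrontage": None, "Electrical": "FuseA"},
    {"LotFrontage": 80.0, "Electrical": None},
    {"LotFrontage": 68.0, "Electrical": "SBrkr"},
]


def impute(rows, numeric, categorical):
    """Return new rows with numeric gaps filled by the column median and
    categorical gaps filled by the column's most frequent value."""
    fills = {}
    for col in numeric:
        fills[col] = median(r[col] for r in rows if r[col] is not None)
    for col in categorical:
        fills[col] = mode(r[col] for r in rows if r[col] is not None)
    # Build new dictionaries so the original rows are left untouched.
    return [
        {col: fills[col] if val is None and col in fills else val for col, val in r.items()}
        for r in rows
    ]


clean = impute(rows, numeric=["LotFrontage"], categorical=["Electrical"])
```

Some NA values in the Ames data are not really missing. For columns such as PoolQC or GarageType, the data description uses NA to mean the house has no pool or no garage. For those columns, a constant category such as "None" is usually more faithful than the mode.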

House Prices data set information:

b. Handling inconsistent data:

5. New Data Set

a. Multicollinearity: Multicollinearity exists when two or more of the predictors are highly correlated with each other.
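A quick way to spot multicollinearity is to compute the Pearson correlation between every pair of numeric predictors and flag the pairs whose absolute correlation exceeds a cutoff. The sketch below uses pure Python. The 0.8 threshold, the sample values and the helper names are assumptions chosen for illustration, not figures from this report.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlated_pairs(features, threshold=0.8):
    """Return (name_a, name_b, r) for every predictor pair with |r| >= threshold."""
    flagged = []
    for a, b in combinations(sorted(features), 2):
        r = pearson(features[a], features[b])
        if abs(r) >= threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


# Hypothetical predictor values; only the column names come from the Ames data.
features = {
    "GarageCars": [1, 2, 2, 3, 1],
    "GarageArea": [300, 480, 500, 720, 280],
    "YrSold": [2008, 2006, 2010, 2007, 2009],
}
pairs = correlated_pairs(features)
```

Pairwise correlations only catch two-way overlap. Variance inflation factors are a more thorough alternative, because they also flag a predictor that is largely explained by a combination of several others.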

2. Which neighborhoods are most and least expensive?

3. Does the external look of the house affect Sale Price?

4. What effect does Basement Condition have on house price?

5. What is the relationship between the HVAC system and Sale Price?
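Each of these questions comes down to grouping sales by one categorical column and comparing a summary of SalePrice across the groups. The sketch below uses the median, which a handful of very expensive sales affect less than they would the mean. The neighborhood codes appear in the Ames data, but the prices and the function name `median_by_group` are hypothetical.

```python
from collections import defaultdict
from statistics import median


def median_by_group(records):
    """Given (group, sale_price) pairs, return [(group, median_price)]
    sorted from the most to the least expensive group."""
    groups = defaultdict(list)
    for group, price in records:
        groups[group].append(price)
    medians = {group: median(prices) for group, prices in groups.items()}
    return sorted(medians.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sales: the neighborhood codes appear in the Ames data,
# but these prices are invented for illustration.
sales = [
    ("NoRidge", 335000), ("NoRidge", 301000),
    ("MeadowV", 98000), ("MeadowV", 88000),
    ("NAmes", 140000), ("NAmes", 150000), ("NAmes", 128000),
]
ranking = median_by_group(sales)
most_expensive, least_expensive = ranking[0][0], ranking[-1][0]
```

Replacing the neighborhood with another column gives the same kind of comparison for questions 3 to 5. Examples include ExterQual, BsmtCond or HeatingQC.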

8. Encoding Categorical Data
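Regression models need numeric inputs, so categorical columns are usually converted to 0/1 indicator (dummy) columns. The sketch below shows minimal one-hot encoding, which is one common scheme and not necessarily the one used in this report. It assumes each row is a dictionary. The function name `one_hot` and the sample values are illustrative assumptions.

```python
def one_hot(rows, column, drop_first=False):
    """Replace a categorical column with 0/1 indicator columns, one per level.

    With drop_first=True the alphabetically first level is dropped and becomes
    the baseline, which avoids perfectly collinear dummies in a linear model
    that also has an intercept.
    """
    levels = sorted({r[column] for r in rows})
    if drop_first:
        levels = levels[1:]
    encoded = []
    for r in rows:
        # Copy every other column, then add one indicator per kept level.
        new = {k: v for k, v in r.items() if k != column}
        for level in levels:
            new[f"{column}_{level}"] = 1 if r[column] == level else 0
        encoded.append(new)
    return encoded


# Hypothetical rows using the Ames CentralAir (Y/N) column.
rows = [
    {"CentralAir": "Y", "GrLivArea": 1710},
    {"CentralAir": "N", "GrLivArea": 1262},
    {"CentralAir": "Y", "GrLivArea": 1786},
]
full = one_hot(rows, "CentralAir")
reduced = one_hot(rows, "CentralAir", drop_first=True)
```

pandas provides the same transformation through `pandas.get_dummies`, which also has a `drop_first` parameter.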

9. Train and Test Sets
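Holding out a test set lets the final model be scored on homes it never saw during fitting. The sketch below performs a reproducible random split using only the standard library. The 80/20 ratio, the seed and the name `split_train_test` are assumptions, not values from the report. scikit-learn's `model_selection.train_test_split` implements the same idea through its `test_size` and `random_state` arguments.

```python
import random


def split_train_test(rows, test_size=0.2, seed=42):
    """Shuffle a copy of rows with a fixed seed and return (train, test).

    The first round(len(rows) * test_size) shuffled rows form the test set.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


# Stand-in for the 1460 homes: just their row indices.
train, test = split_train_test(range(1460))
```

Fixing the seed means every model in the comparison is trained and scored on exactly the same homes.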

10. Machine Learning

Positive Coefficients Negative Coefficients

• Shrinking of parameters: the coefficients are pulled toward zero, which is why it is mostly used to limit the effect of multicollinearity.
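To see what shrinking the parameters means, consider a single predictor. The ridge penalty λβ² pulls the slope toward zero as λ grows, but for finite λ it never reaches exactly zero. With many correlated predictors, the same penalty adds λ to the diagonal of XᵀX. That keeps the system well conditioned and damps the large, unstable coefficients that multicollinearity produces. The sketch below is pure Python. The data and the names `center` and `ridge_slope` are hypothetical.

```python
def center(values):
    """Subtract the mean so the intercept drops out of the fit."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def ridge_slope(x, y, lam):
    """Ridge slope for a single predictor.

    Minimises sum((yc - b * xc) ** 2) + lam * b ** 2 on centered data,
    whose closed-form solution is b = sum(xc * yc) / (sum(xc ** 2) + lam).
    lam = 0 gives the ordinary least-squares slope.
    """
    xc, yc = center(x), center(y)
    sxy = sum(a * b for a, b in zip(xc, yc))
    sxx = sum(a * a for a in xc)
    return sxy / (sxx + lam)


# Hypothetical data: y is roughly 2 * x plus noise.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slopes = {lam: ridge_slope(x, y, lam) for lam in (0.0, 1.0, 10.0, 100.0)}
```

On this data the slope falls from about 1.99 at λ = 0 to about 0.18 at λ = 100, but it never becomes exactly zero.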

Ridge and Lasso regression are the techniques used for regularization.
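Lasso differs from ridge in its penalty: λ|β| instead of λβ². That change lets it set a coefficient to exactly zero, which is why lasso doubles as a feature selector. For a single centered predictor, the lasso solution is a soft-thresholding of Σxy, as sketched below. The data and the function names are hypothetical.

```python
def center(values):
    """Subtract the mean so the intercept drops out of the fit."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def soft_threshold(z, lam):
    """Shrink z toward zero by lam, returning exactly 0.0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_slope(x, y, lam):
    """Lasso slope for a single predictor.

    Minimises 0.5 * sum((yc - b * xc) ** 2) + lam * abs(b) on centered data,
    whose solution is b = soft_threshold(sum(xc * yc), lam) / sum(xc ** 2).
    """
    xc, yc = center(x), center(y)
    sxy = sum(a * b for a, b in zip(xc, yc))
    sxx = sum(a * a for a in xc)
    return soft_threshold(sxy, lam) / sxx


# Hypothetical data: y is roughly 2 * x plus noise.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slopes = {lam: lasso_slope(x, y, lam) for lam in (0.0, 5.0, 20.0)}
```

Once λ reaches |Σxy| (about 19.9 here), the slope is exactly zero and the predictor drops out of the model. scikit-learn's `Lasso` scales the squared-error term by 1/(2n), so its `alpha` is not numerically the same as λ in this sketch.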

15. Further Analysis
