
House Price Prediction using Machine Learning in Python
We have all experienced a time when we had to look for a new house to buy. But then the journey begins, with plenty of fraud, deal negotiation, research into the local areas, and so on.

To deal with these issues, today we will build a machine-learning model trained on the House Price Prediction Dataset.
You can download the dataset from this link.
The dataset contains 13 features:

1. Id – To count the records.
2. MSSubClass – Identifies the type of dwelling involved in the sale.
3. MSZoning – Identifies the general zoning classification of the sale.
4. LotArea – Lot size in square feet.
5. LotConfig – Configuration of the lot.
6. BldgType – Type of dwelling.
7. OverallCond – Rates the overall condition of the house.
8. YearBuilt – Original construction year.
9. YearRemodAdd – Remodel date (same as construction date if no remodeling or additions).
10. Exterior1st – Exterior covering on the house.
11. BsmtFinSF2 – Type 2 finished square feet.
12. TotalBsmtSF – Total square feet of basement area.
13. SalePrice – The target to be predicted.

Importing Libraries and Dataset


Here we are using:
 Pandas – to load the DataFrame
 Matplotlib – to visualize the data features, e.g. with barplots
 Seaborn – to see the correlation between features using a heatmap
 Python3

import pandas as pd

import matplotlib.pyplot as plt

import seaborn as sns

dataset = pd.read_excel("HousePricePrediction.xlsx")

# Printing first 5 records of the dataset

print(dataset.head(5))

Output:
 

Now that the data is imported, the shape attribute will show us the dimensions of the
dataset.

 Python3

dataset.shape

Output: 
(2919, 13)

Data Preprocessing
Now we categorize the features by their datatype (int, float, object) and count the
columns of each type.

 Python3

obj = (dataset.dtypes == 'object')

object_cols = list(obj[obj].index)

print("Categorical variables:",len(object_cols))
int_ = (dataset.dtypes == 'int')

num_cols = list(int_[int_].index)

print("Integer variables:",len(num_cols))

fl = (dataset.dtypes == 'float')

fl_cols = list(fl[fl].index)

print("Float variables:",len(fl_cols))

Output: 
Categorical variables: 4
Integer variables: 6
Float variables: 3

Exploratory Data Analysis


EDA refers to the in-depth analysis of data to discover patterns and spot
anomalies. Before making inferences from data, it is essential to examine all of your
variables.
So let's make a heatmap using the seaborn library.
 Python3

plt.figure(figsize=(12, 6))

# Correlate numeric columns only; object columns
# cannot go into a correlation matrix
sns.heatmap(dataset.corr(numeric_only=True),
            cmap='BrBG',
            fmt='.2f',
            linewidths=2,
            annot=True)

Output:

To analyze the different categorical features, let's draw a barplot of their unique-value counts.


 Python3

unique_values = []

for col in object_cols:

  unique_values.append(dataset[col].unique().size)

plt.figure(figsize=(10,6))

plt.title('Number of Unique Values of Categorical Features')

plt.xticks(rotation=90)

sns.barplot(x=object_cols,y=unique_values)

Output:
 

The plot shows that Exterior1st has around 16 unique categories while the other features have
around 6. To find the actual count of each category, we can plot a bar graph for
each of the four features separately.

 Python3

plt.figure(figsize=(18, 36))

plt.title('Categorical Features: Distribution')

plt.xticks(rotation=90)

index = 1

for col in object_cols:

    y = dataset[col].value_counts()
    plt.subplot(11, 4, index)

    plt.xticks(rotation=90)

    sns.barplot(x=list(y.index), y=y)

    index += 1

Output:

Data Cleaning
Data cleaning is the process of improving the data by removing incorrect, corrupted, or
irrelevant records.
In our dataset, some columns are not important or relevant for model training, so we can
drop them before training. There are two approaches to dealing with empty/null values
(a small fill example follows this list):
 We can simply delete the column/row (if the feature or record is not very important).
 We can fill the empty slots with the mean/mode/0/NA/etc. (depending on the dataset's
requirements).
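As a quick illustration of the second approach, here is a minimal sketch on a made-up toy frame (the column names and values are hypothetical, just for the example):

 Python3

import pandas as pd

# Hypothetical frame with missing values
df = pd.DataFrame({'price': [100.0, None, 300.0],
                   'zone': ['RL', None, 'RM']})

# Numeric column: fill with the mean
df['price'] = df['price'].fillna(df['price'].mean())

# Categorical column: fill with the mode (most frequent value)
df['zone'] = df['zone'].fillna(df['zone'].mode()[0])

print(df)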
Since the Id column will not participate in any prediction, we can drop it.

 Python3

dataset.drop(['Id'],

             axis=1,

             inplace=True)
Next, we replace the empty SalePrice values with the column's mean so that the data
distribution stays roughly symmetric.

 Python3

dataset['SalePrice'] = dataset['SalePrice'].fillna(

  dataset['SalePrice'].mean())

Then we drop the records with null values (as there are very few such empty records).

 Python3

new_dataset = dataset.dropna()

Finally, we check which features still have null values in the new dataframe (if any remain).

 Python3

new_dataset.isnull().sum()

Output:

 
OneHotEncoder – For Encoding Categorical Features
One-hot encoding is a simple way to convert categorical data into binary vectors: each
category becomes its own 0/1 indicator column rather than a single integer code. By using
OneHotEncoder, we can easily convert object data into numeric form. To do so, we first
collect all the features that have the object datatype with a short loop (after the quick
illustration below).
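As a minimal, hypothetical illustration of what one-hot encoding produces (the column and values are made up for the example):

 Python3

import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy column with three categories
toy = pd.DataFrame({'BldgType': ['1Fam', 'Twnhs', '1Fam', 'Duplex']})

encoder = OneHotEncoder(sparse_output=False)
encoded = encoder.fit_transform(toy)

# Each category becomes its own 0/1 indicator column
print(encoder.get_feature_names_out())
print(encoded)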
 Python3

from sklearn.preprocessing import OneHotEncoder

s = (new_dataset.dtypes == 'object')

object_cols = list(s[s].index)

print("Categorical variables:")

print(object_cols)

print('No. of. categorical features: ',

      len(object_cols))

Output:

Then, once we have the list of all the object features, we can apply one-hot encoding to the
whole list.

 Python3

# Note: scikit-learn >= 1.2 renamed sparse to sparse_output and
# replaced get_feature_names() with get_feature_names_out()
OH_encoder = OneHotEncoder(sparse_output=False)

OH_cols = pd.DataFrame(
    OH_encoder.fit_transform(new_dataset[object_cols]))

OH_cols.index = new_dataset.index

OH_cols.columns = OH_encoder.get_feature_names_out()

df_final = new_dataset.drop(object_cols, axis=1)

df_final = pd.concat([df_final, OH_cols], axis=1)

Splitting Dataset into Training and Testing


X and Y splitting (i.e. Y is the SalePrice column and the rest of the columns form X)

 Python3

from sklearn.metrics import mean_absolute_error

from sklearn.model_selection import train_test_split

X = df_final.drop(['SalePrice'], axis=1)

Y = df_final['SalePrice']

# Split the data into
# training and validation sets

X_train, X_valid, Y_train, Y_valid = train_test_split(
    X, Y, train_size=0.8, test_size=0.2, random_state=0)

Model and Accuracy


Since we have to predict continuous values, we will train the following regression
models:
 SVM – Support Vector Machine
 Random Forest Regressor
 Linear Regression
To calculate the loss we will use the mean_absolute_percentage_error function, which can
easily be imported from the sklearn library. The formula for Mean Absolute Percentage
Error (computed by hand in the sketch below) is:

$$\text{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$$
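To make the metric concrete, here is a minimal sketch (with made-up numbers) that computes MAPE by hand and checks it against scikit-learn:

 Python3

import numpy as np
from sklearn.metrics import mean_absolute_percentage_error

# Hypothetical true and predicted prices
y_true = np.array([200000.0, 150000.0, 320000.0])
y_pred = np.array([210000.0, 140000.0, 300000.0])

# MAPE: mean of |(y_true - y_pred) / y_true|
mape_manual = np.mean(np.abs((y_true - y_pred) / y_true))

print(mape_manual)
print(mean_absolute_percentage_error(y_true, y_pred))  # should match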

SVM – Support Vector Machine


SVM can be used for both regression and classification. It works by finding a hyperplane in
the n-dimensional feature space. To read more about SVM, refer to this.
 Python3

from sklearn import svm

from sklearn.metrics import mean_absolute_percentage_error

# Train a support vector regressor on the training set
model_SVR = svm.SVR()

model_SVR.fit(X_train, Y_train)

Y_pred = model_SVR.predict(X_valid)

print(mean_absolute_percentage_error(Y_valid, Y_pred))

Output : 
0.18705129
Random Forest Regression
Random Forest is an ensemble technique that uses multiple decision trees and can be
used for both regression and classification tasks. To read more about random forests, refer to
this.
 Python3

from sklearn.ensemble import RandomForestRegressor

model_RFR = RandomForestRegressor(n_estimators=10)

model_RFR.fit(X_train, Y_train)

Y_pred = model_RFR.predict(X_valid)

print(mean_absolute_percentage_error(Y_valid, Y_pred))

Output : 
0.1929469
Linear Regression
Linear Regression predicts the dependent output value from the given
independent features. Here, for example, we predict SalePrice from features like
MSSubClass, YearBuilt, BldgType, Exterior1st, etc. To read more about Linear
Regression, refer to this.
 Python3
from sklearn.linear_model import LinearRegression

model_LR = LinearRegression()

model_LR.fit(X_train, Y_train)

Y_pred = model_LR.predict(X_valid)

print(mean_absolute_percentage_error(Y_valid, Y_pred))

Output : 
0.187416838

Conclusion
Clearly, the SVM model gives the best accuracy here, since its mean absolute percentage
error of approximately 0.18 is the lowest among the three regressors. To get even better
results, ensemble learning techniques like bagging and boosting can also be used (a small
boosting sketch follows).
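As a starting point for boosting, here is a minimal sketch on the same train/validation split, using scikit-learn's GradientBoostingRegressor (the hyperparameters are illustrative, not tuned values):

 Python3

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_percentage_error

# Gradient boosting: an ensemble of shallow trees, each one
# correcting the residual errors of the trees before it
model_GBR = GradientBoostingRegressor(n_estimators=100,
                                      learning_rate=0.1,
                                      random_state=0)
model_GBR.fit(X_train, Y_train)

Y_pred = model_GBR.predict(X_valid)
print(mean_absolute_percentage_error(Y_valid, Y_pred))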
