House-Price-Prediction-Using-Regression-Techniques Retouch - Removed

This document discusses building a machine learning model to predict house prices in Bangalore, India. Data are scraped from a real estate website, then explored and visualized. Multiple linear regression is identified as the best model based on mean squared error. The model is deployed through a Python web app. The goal is to create an accurate price prediction tool to help buyers and sellers compare prices.

lOMoARcPSD|38545651

TABLE OF CONTENTS

Chapter Number    Contents

1. Abstract
2. INTRODUCTION
   2.1 AIM and IMPORTANCE
   2.2 Need and Motivation
3. DATASET
   Steps in Preparing Data for Model
   3.1 Data Exploration
   3.2 Data Visualization
   3.3 Data Selection
4. LANGUAGE AND MODELS USED
5. Python
   Jupyter Notebook
   NumPy
   Pandas
   Seaborn
   Matplotlib
6. Models Used
   Multiple Linear Regression
7. RESULTS AND DISCUSSIONS
8. Deployment App
9. Conclusion
10. Repository link

Downloaded by krushnapalsinh vaghela ([email protected])


Abstract

House price forecasting is an important topic in real estate. The literature
attempts to derive useful knowledge from historical property market data. In
this work, machine learning techniques are applied to historical property
transactions in India to discover models that are useful to house buyers and
sellers. The analysis reveals a large discrepancy between house prices in the
most expensive and the most affordable suburbs of Bangalore. Moreover,
experiments demonstrate that Multiple Linear Regression, evaluated by mean
squared error, is a competitive approach.


INTRODUCTION

AIM and IMPORTANCE

Aim
These are the parameters on which we will evaluate ourselves:

• Create an effective price prediction model

• Validate the model’s prediction accuracy

• Identify the important home price attributes that feed the model’s predictive power.


Need and Motivation

Having lived in India for many years, one thing I had taken for granted is that
housing and rental prices continue to rise. Since the housing crisis of 2008,
housing prices have recovered remarkably well, especially in major housing
markets. However, in the 4th quarter of 2016, I was surprised to read that
Bombay housing prices had fallen the most in the last 4 years. In fact, median
resale prices for condos and co-ops fell 6.3%, marking the first decline in
several years. The decline has been partly attributed to political uncertainty
domestically and abroad and to the 2014 election.

A price prediction model can help maintain transparency for customers and make
price comparison easy. If a customer finds that the price of a house on a given
website is higher than the price predicted by the model, they can reject that
house.


DATASET

We scraped the data from 99acres.com, one of the leading real estate websites
operating in India.

Our data covers houses in Bangalore only.

The dataset looks as follows:

Data Exploration

Data exploration is the first step in data analysis and typically involves summarizing the main
characteristics of a data set, including its size, accuracy, initial patterns in the data and other
attributes. It is commonly conducted by data analysts using visual analytics tools, but it can
also be done in more advanced statistical software or in a language such as Python. Before it
can analyse data collected from multiple sources and stored in data warehouses, an
organization must know how many cases are in a data set, what variables are included, how
many missing values there are and what general hypotheses the data is likely to support. An
initial exploration of the data set can help answer these questions by familiarizing analysts
with the data with which they are working.

We split the data 9:1 into training and test sets, respectively.
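
The exploration questions above (number of cases, variables, missing values) and the 9:1 split can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the column names, the values, and the random seed are made up and are not taken from the scraped dataset.

```python
import random

# Hypothetical listings; column names and values are made up for illustration.
rows = [
    {"area_sqft": 600 + 50 * i,
     "bathrooms": None if i % 7 == 0 else 1 + i % 3,   # some values are missing
     "price": 30.0 + 2.5 * i}
    for i in range(20)
]

# Basic exploration: number of cases, variables, and missing values per variable.
n_cases = len(rows)
variables = list(rows[0])
missing = {v: sum(r[v] is None for r in rows) for v in variables}
print(n_cases, variables, missing)

# 9:1 split: shuffle reproducibly, then cut at 90% of the rows.
shuffled = rows[:]                      # copy so the original order is kept
random.Random(42).shuffle(shuffled)
cut = len(shuffled) * 9 // 10
train, test = shuffled[:cut], shuffled[cut:]
print(len(train), len(test))  # 18 2
```

With scikit-learn, which the report lists among its libraries, `train_test_split(X, y, test_size=0.1)` produces the same kind of split.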


Data Visualization

Data visualization is the graphical representation of information and data.
By using visual elements like charts, graphs, and maps, data visualization
tools provide an accessible way to see and understand trends, outliers, and
patterns in data. In the world of Big Data, data visualization tools and
technologies are essential to analyse massive amounts of information and
make data-driven decisions.


Data Selection

Data selection is defined as the process of determining the appropriate data
type and source, as well as suitable instruments to collect data. Data
selection precedes the actual practice of data collection. This definition
distinguishes data selection from selective data reporting (selectively
excluding data that is not supportive of a research hypothesis) and
interactive/active data selection (using collected data for monitoring
activities/events, or conducting secondary data analyses). The process of
selecting suitable data for a research project can impact data integrity.

The primary objective of data selection is the determination of appropriate
data type, source, and instrument(s) that allow investigators to adequately
answer research questions. This determination is often discipline-specific
and is primarily driven by the nature of the investigation, existing literature,
and accessibility to necessary data sources.

Data Transformation


The log transformation can be used to make highly skewed distributions less
skewed. This can be valuable both for making patterns in the data more
interpretable and for helping to meet the assumptions of inferential statistics.

It is hard to discern a pattern in the upper panel, whereas the strong
relationship is shown clearly in the lower panel. The comparison of the means of
log-transformed data is actually a comparison of geometric means. This occurs
because the anti-log of the arithmetic mean of log-transformed values is the
geometric mean.

[Figure panels: Price in sq. ft; Bathrooms]
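
The relationship between log-transformed means and the geometric mean can be checked numerically. The sketch below uses only the Python standard library; the price values are made up to be right-skewed and do not come from the report's dataset.

```python
import math
import statistics

# Made-up, right-skewed prices (not taken from the report's dataset).
prices = [35.0, 42.0, 48.0, 55.0, 60.0, 75.0, 120.0, 400.0]

# Log-transform every value.
log_prices = [math.log(p) for p in prices]

# Anti-log of the arithmetic mean of the logs ...
back_transformed = math.exp(statistics.fmean(log_prices))
# ... agrees with the geometric mean computed directly.
geometric = statistics.geometric_mean(prices)

print(math.isclose(back_transformed, geometric))   # True
# The arithmetic mean is pulled upward by the large outlier.
print(statistics.fmean(prices) > geometric)        # True
```

Because the geometric mean is less sensitive to a few very large values, comparing means on the log scale is less dominated by the most expensive listings.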


LANGUAGE AND MODELS USED

Python

Python is widely used in scientific and numeric computing:

• SciPy is a collection of packages for mathematics, science, and engineering.
• Pandas is a data analysis and modelling library.
• IPython is a powerful interactive shell that features easy editing and
recording of a work session, and supports visualizations and parallel
computing.
• The Software Carpentry Course teaches basic skills for scientific
computing, running bootcamps and providing open-access teaching
materials.

Libraries used for this project include:

• Pandas
• NumPy
• Matplotlib
• Seaborn
• scikit-learn
• XGBoost

MODELS USED

Regression Model

• Linear Regression is a machine learning algorithm based on supervised learning.

• It performs a regression task: it models a target prediction value based on
independent variables. With more than one independent variable, it is called
Multiple Linear Regression.

• It is mostly used for finding the relationship between variables and for
forecasting.
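
As an illustration of the points above, a multiple linear regression can be fitted by ordinary least squares with NumPy. This is a minimal sketch on synthetic data, not the report's actual training code: the feature names (`area`, `bathrooms`), the coefficients used to generate the data, and the random seed are all assumptions.

```python
import numpy as np

# Synthetic data; the feature names are assumptions, not the report's columns.
rng = np.random.default_rng(0)
area = rng.uniform(500, 3000, size=200)          # square feet
bathrooms = rng.integers(1, 5, size=200)         # values 1 to 4
noise = rng.normal(0, 1.0, size=200)
price = 10.0 + 0.05 * area + 8.0 * bathrooms + noise   # known generating relationship

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones_like(area), area, bathrooms])

# Ordinary least squares: minimise the sum of squared residuals of X @ coef - price.
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
intercept, w_area, w_bath = coef

# In-sample mean squared error of the fitted model.
predicted = X @ coef
mse = float(np.mean((price - predicted) ** 2))
print(f"intercept={intercept:.2f}, area={w_area:.3f}, bathrooms={w_bath:.2f}, mse={mse:.2f}")
```

The fitted coefficients should land close to the values used to generate the data. In scikit-learn, `LinearRegression().fit(X, y)` fits the same kind of model.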

RESULTS
Best Suited Model
Our study showed that Linear Regression displayed the best performance on this
dataset and can be used for deployment.
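
The abstract states that models are judged by mean squared error. A hedged sketch of that selection step follows; the held-out prices, the predictions, and the model names are made up for illustration and are not the report's results.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Made-up held-out prices and predictions from two hypothetical models.
actual = [50.0, 72.0, 95.0, 120.0]
predictions = {
    "linear_regression": [52.0, 70.0, 96.0, 117.0],
    "mean_baseline": [84.25, 84.25, 84.25, 84.25],   # always predicts the mean price
}

# Score every candidate and keep the one with the lowest error.
scores = {name: mean_squared_error(actual, pred) for name, pred in predictions.items()}
best = min(scores, key=scores.get)
print(scores)
print(best)  # linear_regression
```

Comparing against a trivial baseline such as the mean price is a cheap sanity check that the chosen model has learned something from the features.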

Deployment App
The model is deployed through a Python web server built with Flask, in
combination with HTML and CSS, and the API is accessed using Postman.
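
A minimal sketch of what such a Flask prediction endpoint might look like is given below. The route name `/predict`, the JSON field names, and the placeholder coefficients are assumptions for illustration; the actual app in the repository may be structured differently.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Placeholder coefficients; a real app would load the trained model instead.
COEFFICIENTS = {"intercept": 10.0, "area_sqft": 0.05, "bathrooms": 8.0}

@app.route("/predict", methods=["POST"])
def predict():
    """Return a price estimate for the JSON features in the request body."""
    data = request.get_json(silent=True) or {}
    try:
        area = float(data["area_sqft"])
        baths = float(data["bathrooms"])
    except (KeyError, TypeError, ValueError):
        return jsonify(error="area_sqft and bathrooms are required numbers"), 400
    price = (COEFFICIENTS["intercept"]
             + COEFFICIENTS["area_sqft"] * area
             + COEFFICIENTS["bathrooms"] * baths)
    return jsonify(predicted_price=price)

# Exercise the endpoint in-process, much as Postman would over HTTP.
with app.test_client() as client:
    response = client.post("/predict", json={"area_sqft": 1200, "bathrooms": 2})
    print(response.status_code, response.get_json())
```

To serve real requests, the app would be started with `flask run` or `app.run()`, and an HTML form could post to the same route.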


Conclusion

Our aim is achieved, as we have met all the parameters listed in the Aim section. Circle rate
is seen to be the most effective attribute in predicting the house price, and Linear
Regression is the most effective model for our dataset.


GitHub Repository Link:

https://github.com/anujyadav73/Realstate_Price_Prediction.git

THANK YOU
****************
