Project Report Gr-12

This document presents a project report for developing a house price predictor model. It includes the names and roll numbers of the 5 students working on the project under the guidance of Prof. Bidisha Patra. The aim is to create an effective price prediction model using machine learning algorithms and validate the model's accuracy. Multiple regression models will be tested and evaluated based on their RMSE to identify the best performing model. The document outlines the methodology, data processing steps, evaluation metrics and computer specifications used for the project.

Uploaded by

Samik Dey


HOUSE PRICE PREDICTOR

A Project Report for Major Project

NAME UNIVERSITY ROLL NO.

I. Samik Dey 29401219023
II. Swapnomoy Ghosh 29401219035
III. Abhijit Chakraborty 29401219055
IV. Sattick Nag 29401219057
V. Piyali Mondal 29401219051

Under the guidance of

Prof. Bidisha Patra

FUTURE INSTITUTE OF
ENGINEERING AND MANAGEMENT

CERTIFICATE OF THE PROJECT WORK

We hereby certify that the work which is being presented in the Major Project Report entitled

“HOUSE PRICE PREDICTOR”, in partial fulfilment of the requirements for the award of the

Bachelor of Computer Application and submitted to the Department of BCA of Future Institute

of Engineering and Management, Kolkata, WB is an authentic record of our own work carried out

during the period from 15.05.2022 to 14.06.2022 under the supervision of Prof. Bidisha Patra. The

matter presented in this thesis has not been submitted by us for the award of any other degree

elsewhere.

Name & Signature of the Candidate(s)

29401219023

29401219035

29401219055

29401219057

29401219051
This is to certify that the above statement made by the candidates is correct to the best of my
knowledge.

Signature of the Supervisor

Name of Supervisor
[Designation of Supervisor]
Date

[Signature of the Head of the Department]

[Signature of the Panel members]

Acknowledgement

I take this opportunity to express my deep gratitude and sincerest thanks to my project mentor, Prof. Bidisha Patra, for giving most valuable suggestions, helpful guidance and encouragement in the execution of this project work.

I would like to give special recognition to my colleagues. Last but not least, I am grateful to all the faculty members of our department for their support.

TABLE OF CONTENTS

Chapter Number   Contents

Abstract
1 INTRODUCTION
1.1 AIM and IMPORTANCE
1.1.1 Aim
1.1.2 Need and Motivation
1.1.3 Methodology
1.1.4 Evaluation metrics, Computer specifications

2 DATASET
2.1.1 Data Exploration
2.1.2 Data Visualization
2.1.3 Data Selection
2.1.4 Data Transformation

3 LANGUAGE AND MODELS USED
3.1 Python
3.1.1 Jupyter Notebook
3.1.2 NumPy
3.1.3 Pandas
3.1.4 Matplotlib
3.2 Models Used
3.2.1 Multiple Linear Regression
3.2.2 Decision Tree Regressor
3.2.3 Random Forest Regressor

4 RESULTS AND DISCUSSIONS
4.1 BEST SUITED MODEL

5 SCREENSHOTS OF THE PROJECT

6 Future scope and further enhancement of the Project, Conclusion, Bibliography

Abstract

House Price Index (HPI) is commonly used to estimate changes in housing prices. Since housing price is strongly correlated with other factors such as location, area and population, information beyond the HPI is required to predict individual housing prices. A considerable number of papers have adopted traditional machine learning approaches to predict housing prices accurately, but they rarely examine the performance of individual models and they neglect the less popular yet complex models. Therefore, to explore the impact of features on prediction methods, this paper applies both traditional and advanced machine learning approaches to investigate the differences among several advanced models. This paper also comprehensively validates multiple techniques in model implementation on regression and provides an optimistic result for housing price prediction.

INTRODUCTION

AIM and IMPORTANCE

Aim

These are the parameters on which we will evaluate ourselves:

• Create an effective price prediction model

• Validate the model’s prediction accuracy

• Identify the important home price attributes which feed the model’s predictive power.

Need and Motivation

Having lived in India for so many years, if there is one thing that I had been taking for granted, it is that housing and rental prices continue to rise. Since the housing crisis of 2008, housing prices have recovered remarkably well, especially in major housing markets. However, in the 4th quarter of 2016, I was surprised to read that Kolkata housing prices had fallen the most in the last 4 years. In fact, median resale prices for condos and coops fell 6.3%, marking the first time there was a decline since Q1 of 2017. The decline has been partly attributed to political uncertainty domestically and abroad and the 2014 election.

This model is therefore intended to maintain transparency for customers and to make price comparison easy. If a customer finds that the price of a house on a given website is higher than the price predicted by the model, they can reject that house.

Methodology

The experiment is done to pre-process the data and evaluate the prediction accuracy of the

models. The experiment has multiple stages that are required to get the prediction results. These

stages can be defined as:

• Pre-processing: both datasets will be checked and pre-processed. There are various ways of handling the data, so the pre-processing is done over multiple iterations, and each time the accuracy is evaluated with the combination used.

• Data splitting: dividing the dataset into two parts is essential, so that the model is trained on one part and evaluated on the other. The dataset will be split 75% for training and 25% for testing.

• Evaluation: the accuracy on both datasets will be evaluated by measuring the R2 and RMSE when training the model, alongside a comparison of the actual prices in the test dataset with the prices predicted by the model.

• Performance: alongside the evaluation metrics, the time required to train each model will be measured to show how the algorithms vary in terms of time.

• Correlation: the correlation between the available features and the house price will be evaluated using the Pearson correlation coefficient, to identify whether each feature has a negative, positive or zero correlation with the house price.
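The data-splitting stage can be sketched as follows. This is a minimal illustration using only the Python standard library. The function name `train_test_split_rows`, the seed and the toy rows are assumptions made for this example, and the project itself may well have used scikit-learn's `train_test_split` instead.

```python
import random

def train_test_split_rows(rows, test_fraction=0.25, seed=42):
    """Shuffle rows and split them into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical toy data: 100 (feature, price) pairs, not the project's dataset.
rows = [(i, 2.0 * i + 1.0) for i in range(100)]
train, test = train_test_split_rows(rows, test_fraction=0.25)
print(len(train), len(test))  # prints: 75 25
```

Shuffling before splitting avoids a test set made up only of the last rows of the file, and fixing the seed makes the RMSE of different models comparable on the same split.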

Evaluation Metrics

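The prediction accuracy will be evaluated by measuring the Root Mean Square Error (RMSE) of the model used in training. RMSE measures the typical size of the error between the actual and predicted values, in this case the house prices. It is expressed in the same units as the price rather than as a percentage, and a lower value means a better fit.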
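As a worked illustration of the metric, the sketch below computes RMSE together with the R2 score named in the Methodology. The function names and the four price values are hypothetical and are not taken from the project's dataset. scikit-learn offers equivalent metrics, but plain Python is used here so that the arithmetic is visible.

```python
import math

def rmse(actual, predicted):
    """Root mean square error: sqrt of the mean squared difference."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def r2_score(actual, predicted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical prices, for illustration only.
actual = [20.0, 25.0, 30.0, 35.0]
predicted = [22.0, 24.0, 29.0, 38.0]

# Squared errors are 4, 1, 1 and 9, so the mean is 3.75 and RMSE = sqrt(3.75).
print(rmse(actual, predicted))
# Total sum of squares is 125, so R2 = 1 - 15/125.
print(r2_score(actual, predicted))
```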

Computer Specifications

The time needed to train the model depends on the capability of the system used during the experiment. Some libraries use GPU resources rather than the CPU to shorten the time taken to train a model.

Component          Client Machine                  Server Machine

HDD                1 TB                            1 TB
Processor          2.30 GHz Dual-Core processor    1.6 GHz Quad-Core processor
Memory             4 GB                            8 GB
Operating System   Windows 10                      Windows 10

DATASET

Here we have web-scraped the data from the “UCI Machine Learning Repository” website, which is a collection of databases, domain theories, and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms.

The dataset looks as follows:

Data Exploration

Data exploration is the first step in data analysis and typically involves summarizing the main

characteristics of a data set, including its size, accuracy, initial patterns in the data and other

attributes. It is commonly conducted by data analysts using visual analytics tools, but it can

also be done in more advanced statistical software or in Python. Before it can conduct analysis on

data collected by multiple data sources and stored in data warehouses, an organization must

know how many cases are in a data set, what variables are included, how many missing

values there are and what general hypotheses the data is likely to support. An initial

exploration of the data set can help answer these questions by familiarizing analysts with the

data with which they are working.
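The questions listed above (how many cases, which variables, how many missing values) can be answered with a few lines of code. The sketch below uses only the standard library. The CSV text, its column names and the `summarize` helper are hypothetical and stand in for the real dataset. With Pandas, which the project lists among its libraries, similar information is available from a DataFrame's `shape`, `columns` and `isnull().sum()`.

```python
import csv
import io

# Hypothetical sample; the column names are illustrative only.
SAMPLE_CSV = """rooms,age,price
6.5,65.2,24.0
6.4,,21.6
7.1,61.1,34.7
,45.8,33.4
"""

def summarize(csv_text):
    """Return (number of cases, variable names, missing-value count per variable)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    variables = list(reader.fieldnames or [])
    missing = {name: 0 for name in variables}
    n_cases = 0
    for row in reader:
        n_cases += 1
        for name in variables:
            value = row[name]
            # An empty or absent field counts as a missing value.
            if value is None or value.strip() == "":
                missing[name] += 1
    return n_cases, variables, missing

n_cases, variables, missing = summarize(SAMPLE_CSV)
print(n_cases)    # prints: 4
print(variables)  # prints: ['rooms', 'age', 'price']
print(missing)    # prints: {'rooms': 1, 'age': 1, 'price': 0}
```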

We divided the data 8:2 for training and testing purposes respectively.

Data Visualization

Data visualization is the graphical representation of information and data. By using

visual elements like charts, graphs, and maps, data visualization tools provide an

accessible way to see and understand trends, outliers, and patterns in data. In the

world of Big Data, data visualization tools and technologies are essential to analyse

massive amounts of information and make data-driven decisions.

Data Selection

Data selection is defined as the process of determining the appropriate data type and

source, as well as suitable instruments to collect data. Data selection precedes the

actual practice of data collection. This definition distinguishes data selection from

selective data reporting (selectively excluding data that is not supportive of a research

hypothesis) and interactive/active data selection (using collected data for monitoring

activities/events, or conducting secondary data analyses). The process of selecting

suitable data for a research project can impact data integrity.

The primary objective of data selection is the determination of appropriate data type,

source, and instrument(s) that allow investigators to adequately answer research

questions. This determination is often discipline-specific and is primarily driven by

the nature of the investigation, existing literature, and accessibility to necessary data

sources.

Correlation Scatter Matrix
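Each cell of a correlation matrix like this one is a Pearson correlation coefficient, the measure named in the Methodology. The sketch below shows how one such coefficient is computed. The `pearson` function and the `area`, `price` and `discount` lists are hypothetical examples, not values from the project's dataset.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values: price rises linearly with area, discount falls with it.
area = [50.0, 60.0, 80.0, 100.0]
price = [10.0, 12.0, 16.0, 20.0]
discount = [-p for p in price]

r_positive = pearson(area, price)     # close to +1: positive correlation
r_negative = pearson(area, discount)  # close to -1: negative correlation
print(r_positive, r_negative)
```

A value near +1 or -1 indicates a strong linear relationship with the price, while a value near 0 indicates that the feature has little linear relationship with it.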

Data Transformation

The log transformation can be used to make highly skewed distributions less skewed. This

can be valuable both for making patterns in the data more interpretable and for helping to

meet the assumptions of inferential statistics.

It is hard to discern a pattern in the upper panel whereas the strong relationship is shown

clearly in the lower panel. The comparison of the means of log-transformed data is actually a

comparison of geometric means. This occurs because, as shown below, the anti-log of the

arithmetic mean of log-transformed values is the geometric mean.
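The relationship between the log transformation and the geometric mean can be checked numerically. The sketch below uses hypothetical, right-skewed price values chosen so that the result is easy to verify by hand, and it relies on `statistics.fmean` and `statistics.geometric_mean` from the Python standard library (Python 3.8 or later).

```python
import math
import statistics

# Hypothetical right-skewed values: one large price dominates the plain mean.
prices = [100.0, 200.0, 400.0, 3200.0]

# Step 1: log-transform every value.
log_prices = [math.log(p) for p in prices]

# Step 2: take the arithmetic mean of the logs.
mean_of_logs = statistics.fmean(log_prices)

# Step 3: the anti-log (exp) of that mean is the geometric mean.
back_transformed = math.exp(mean_of_logs)

geometric = statistics.geometric_mean(prices)  # fourth root of the product = 400
arithmetic = statistics.fmean(prices)          # 3900 / 4 = 975

print(back_transformed, geometric, arithmetic)
```

The arithmetic mean (975) is pulled upwards by the single large value, whereas the geometric mean (400) stays closer to the bulk of the data. This is why comparisons on the log scale are less sensitive to skew.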

LANGUAGE AND MODELS USED

Python

Python is widely used in scientific and numeric computing:

• SciPy is a collection of packages for mathematics, science, and engineering.

• Pandas is a data analysis and modelling library.

• IPython is a powerful interactive shell that features easy editing and recording of a work session, and supports visualizations and parallel computing.

• The Software Carpentry Course teaches basic skills for scientific computing, running bootcamps and providing open-access teaching materials.

Libraries and Software Used for this Project include:

• Pandas
• NumPy
• Matplotlib
• scikit-learn
• Anaconda
• Jupyter Notebook

MODELS USED

Regression Model

• Linear Regression is a machine learning algorithm based on supervised learning.

• It performs a regression task. Regression models a target prediction value based on

independent variables.

• It is mostly used for finding out the relationship between variables and forecasting.
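To make the idea concrete, the sketch below fits a straight line by ordinary least squares. It is deliberately simplified to a single independent variable, whereas the project lists multiple linear regression, and the `fit_line` helper and the data points are hypothetical. In the project itself a library implementation such as scikit-learn's `LinearRegression` would be the natural choice.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical points that lie exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
forecast = slope * 5.0 + intercept  # prediction for an unseen x = 5
print(slope, intercept, forecast)
```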

Decision Tree Regressor Model

• Decision tree regression observes the features of an object and trains a model in the structure of a tree, which then predicts a meaningful continuous output for future data.

• For example, when a decision tree is fitted to a sine curve with added noisy observations, it learns a piecewise-constant approximation of the sine curve.
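The smallest possible regression tree has a single split (a "stump"). The sketch below is only an illustration of how a tree chooses a split: it tries every threshold on one feature, keeps the one with the lowest squared error, and predicts the mean of each side. The function names and data are hypothetical. A full tree, such as scikit-learn's `DecisionTreeRegressor`, repeats this step recursively over many features.

```python
def fit_stump(xs, ys):
    """Return (threshold, left_mean, right_mean) minimizing squared error."""
    pairs = sorted(zip(xs, ys))
    best = None  # (sse, threshold, left_mean, right_mean)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between two equal feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint of the gap
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]

def predict_stump(stump, x):
    """Predict the mean of whichever side of the threshold x falls on."""
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

# Hypothetical data with two clear groups of prices.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]

stump = fit_stump(xs, ys)
print(stump)                       # prints: (6.5, 6.0, 21.0)
print(predict_stump(stump, 2.5))   # prints: 6.0
print(predict_stump(stump, 11.5))  # prints: 21.0
```

Because each leaf predicts a constant, the output of a tree is a step function. This is the piecewise-constant behaviour seen when a tree is fitted to a smooth curve.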

Random Forest Regression Model

• A Random Forest is an ensemble technique capable of performing both regression and

classification tasks with the use of multiple decision trees and a technique called
Bootstrap Aggregation, commonly known as bagging.

• Bagging, in the Random Forest method, involves training each decision tree on a
different data sample where sampling is done with replacement.

• The basic idea behind this is to combine multiple decision trees in determining the
final output rather than relying on individual decision trees.
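Bagging as described above can be sketched in a few lines. This is not a full Random Forest: it omits the random feature selection that a Random Forest adds, and it uses one-split trees as the base model to stay short. All names, the seed and the data are hypothetical. In the project a library class such as scikit-learn's `RandomForestRegressor` would normally be used.

```python
import random

def fit_stump(points):
    """Fit a one-split regression tree to (x, y) points; return a predict function."""
    points = sorted(points)
    ys = [y for _, y in points]
    overall_mean = sum(ys) / len(ys)
    best = None  # (sse, threshold, left_mean, right_mean)
    for i in range(1, len(points)):
        if points[i - 1][0] == points[i][0]:
            continue  # cannot split between equal feature values
        left, right = ys[:i], ys[i:]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, (points[i - 1][0] + points[i][0]) / 2, lm, rm)
    if best is None:
        return lambda x: overall_mean  # sample had a single distinct x value
    _, threshold, lm, rm = best
    return lambda x: lm if x <= threshold else rm

def fit_bagged(points, n_estimators=25, seed=0):
    """Train each tree on a bootstrap sample and average their predictions."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        sample = rng.choices(points, k=len(points))  # sampling with replacement
        models.append(fit_stump(sample))
    return lambda x: sum(model(x) for model in models) / len(models)

# Hypothetical (feature, price) pairs.
points = [(1.0, 5.0), (2.0, 6.0), (3.0, 7.0),
          (10.0, 20.0), (11.0, 21.0), (12.0, 22.0)]

predict = fit_bagged(points)
low, high = predict(2.0), predict(11.0)
print(low, high)
```

Averaging over trees trained on different bootstrap samples reduces the variance of any single tree. This is the motivation for relying on the combined output rather than on one tree.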

RESULTS AND DISCUSSIONS

Best Suited Model

Our study showed that the Random Forest Regression Model displayed the best performance for this dataset, with the lowest RMSE, and can be used for deployment.

The Decision Tree Regressor Model and Linear Regression are far behind, so they cannot be recommended for deployment.
[Figure: RMSE bar graph for LR, DTR and RFR. Values shown on the bars: 4.14796932816945, 4.14512470542556 and 2.90346586503298 (RFR).]

SCREENSHOTS OF THE PROJECT

Train-Test splitting

Selecting a desired model

Testing the model on test data

Predicting the price

Future scope and further enhancement of the Project

Since this project has been done using Machine Learning, it can be further enhanced using more advanced Machine Learning and Data Analysis technologies.

Conclusion

Our aim is achieved, as we have successfully met all the parameters listed in the Aim section. It is seen that circle rate is the most effective attribute in predicting the house price, and that the Random Forest Regression Model is the most effective model for our dataset, with a final RMSE score of 2.9034658650329894.

References/Bibliography

• UCI Machine Learning Repository

• https://scikit-learn.org/

• Yuxi (Hayden) Liu, Python Machine Learning By Example

• http://stackoverflow.com/
