House Price Estimates Based On Machine Learning Algorithm

Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5 | Issue-4 , June 2021, URL: https://www.ijtsrd.compapers/ijtsrd42367.pdf Paper URL: https://www.ijtsrd.comcomputer-science/other/42367/house-price-estimates-based-on-machine-learning-algorithm/jakir-khan

Uploaded by

Editor IJTSRD

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

80 views

House Price Estimates Based On Machine Learning Algorithm

Uploaded by

Editor IJTSRD

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 5

International Journal of Trend in Scientific Research and Development (IJTSRD)

Volume 5 Issue 4, May-June 2021 Available Online: www.ijtsrd.com e-ISSN: 2456 – 6470

House Price Estimates Based on Machine Learning Algorithm

Jakir Khan1, Dr. Ganesh D2
1Student, 2Assistant Professor,
1,2Department of MCA, School of CS & IT, Jain University, Bangalore, Karnataka, India

ABSTRACT How to cite this paper: Jakir Khan | Dr.

Housing prices are increasing every year, necessitating the creation of a long- Ganesh D "House Price Estimates Based
term housing price strategy. Predicting a home's price will assist a developer on Machine Learning
in determining a home's purchase price, as well as a consumer in determining Algorithm" Published
the best time to buy a home. The sale price of real estate in major cities in International
depends on the specific circumstances. Housing prices are constantly changing Journal of Trend in
from day to day and are sometimes fired rather than based on estimates. Scientific Research
Predicting real estate prices by real factors is a key element as part of our and Development
analysis. We want to make our test dependent on all of the simple metrics that (ijtsrd), ISSN: 2456- IJTSRD42367
are taken into account when deciding the significance. In this research we use 6470, Volume-5 |
linear regression techniques pathway and our results are not self-inflicted Issue-4, June 2021, pp.795-799, URL:
process rather is a weighted method of various techniques to give the most www.ijtsrd.com/papers/ijtsrd42367.pdf
accurate results. There are fifteen features in the data collection. In this
research. There has been an effort to build a forecasting model for Copyright © 2021 by author (s) and
determining the price based on the variables that influence the price.The International Journal of Trend in Scientific
results have proven to be effective lower error and higher accuracy than Research and Development Journal. This
individual algorithms are used. is an Open Access article distributed
under the terms of
KEYWORDS: Machine Learning, Linear Regression algorithm the Creative
Commons Attribution
License (CC BY 4.0)
(http://creativecommons.org/licenses/by/4.0)

I. INTRODUCTION
Identification House prices continue to change for the day following is a review of the text related to the study of house
and going out for a day and sometimes smoking rather than price prediction.
based on estimates. Machine learning algorithms are used for
In paper [1] 4,000 unprocessed datasets from eight counties
modeling, in which the machine learns data and uses what it
were collected from real estate agents' Services with
has learned to predict new data. The most well-known
Multiple Listings (MLS) in Washington, DC. Three separate
paradigm of forecasting research is backstage. The proposed
learning algorithms was put to the test the hedonic
model for accurately predicting future outcomes has
hypothesis (PCR, SVR and K-NN). For component analysis
economic, financial, banking, health, commercial,
and decomposition, PCA was used. The Chi-Square Quantile-
recreational, sport and other sectors. Many variables are used
Quantile plot and Henze-Multivariate Zirkler's Normality
in a single way to predict house prices. Forecasting houses
Test were used to perform the normality test. Model
prices with real features are our main crux research project.
performance has shown that PCR has side effects for SVR
We use various how to get back on track, too our results are
and K-NN. The suitability and replacement of PCR, SVR, and
not self-determined process rather is a weighty method of
K-NN in the application of the hedonic pricing policy was
various techniques to give the most accurate results. The
also confirmed in this report.
results have proven to be effective lower error and higher
accuracy than individual algorithms are used. The aim of this In paper [2] investigating price redistribution a combination
research is to gain a useful understanding of the housing of Sale Price for each variety and present many new
market in the United States by analyzing the actual historical variables For example, Fig. 1 shows the log conversion
data of the transaction. Looking for models that can estimate Individual price distribution neighbors. The purpose of the
the worth of a house based on a set of features. Practical engineering features is to improve data familiarity and
models can help home buyers and for real estate brokers to equity, while setting the highest parameter Iteration times
make informed decisions. In addition, it can assist in are used to improve data consistency.
predicting home rates in the future and the process of
In paper [3] to improve interpretation and improve
formulating housing market policy. In comparison to other
performance of predictive models, data reduction strategies
traditional methods, our work can achieve a better
like Stepwise and Boosting are exploited to get more
performance by experimenting with real-world property
important predictions. In addition, PCA, data conversion
transaction data. With our research project, consider USA as
technique, used to find things that are important to combine
our main and predictable location prices of real estate in
with SVM. There are 34,857 observations and 21 variables in
various locations around USA.
the original dataset. From 2016 to 2018, each observation
II. LITERATURE REVIEW depicts a real sold house transaction in Melbourne. The aim
A large number of scholars have participated to this work on of this research is to analyze real history a transaction
the analysis of house price prediction in the past. The database to gain valuable insights into real estate at a market

@ IJTSRD | Unique Paper ID – IJTSRD42367 | Volume – 5 | Issue – 4 | May-June 2021 Page 795
International Journal of Trend in Scientific Research and Development (IJTSRD) @ www.ijtsrd.com eISSN: 2456-6470
in downtown Melbourne. Looking for useful models to In paper [10] more than 300,000 data points are included in
predict that a house's worth is determined by its location this dataset, which includes 26 variables that reflect housing
given a set of its features. prices exchanged between 2009 and 2018.Analysis of
experimental data is an important step before building a
In paper [4] estimate each parameter according to its value
retrospective model. In this way, the researcher scans to find
in determining the pricing of the system and this has led us
data patterns, which helps to select appropriate methods for
to increase the value saved by each parameter system. We
machine learning. Three types of Machine Learning methods
rated 3 different machines algorithms to learn and test our
include Random Forest, XGBoost, and LightGBM and two
system with a different combination that can guarantee the
machine learning techniques including. Three types of
best possibly the reliability of our results. A program that
Machine Learning methods include Random Forest, XGBoost,
aims to provide accurate predictions housing prices have
and LightGBM and two machine learning techniques
been built. The program makes good use of Linear
including proven to be effective in a variety of computer-
Regression, Forest retreat, magnified magnitude. Efficiency
aided applications (He et al., 2016)
of algorithm added more with the use of Neural Networks.
consider it Mumbai as our main and predictable location III. EXISTING MODEL
prices of real estate in various locations around Mumbai Our findings show how multiple geo-data sources can be
consider a certified database . used in machine learning applications to demonstrate
housing pricing patterns and policy makers to assist in
In paper [5] develop the industrial revolution's fourth stage
understanding human settlements. The sale price of real
over time with the growing popularity of in big data
estate in major cities depends on the specific circumstances.
technology, it had moved importance and rapid development
Although a macro viewpoint of the real estate appreciation
in the field of data science. The dataset contain 171,155
rate could be more useful, consistency of performance is not
property transaction record from 2013 to 2017. Because
the only metric to consider when determining the right
there are missing data of features. The proposed in-
model.
convolution learning model provides a better solution to real
estate price forecasts with higher accuracy and faster IV. PROPOSED MODEL
integration rate, and that the proposed prediction process The proposed model for accurately predicting future
can increase the robustness of the convolutional neural outcomes has applications in real state. The aim of this
network model. statistical study is to aid in our understanding of the the
house-to-house partnership features and how these
In paper [6] There are five standard machine learning
variables are used to forecast house price. To develop the
processes in place used in this study namely Random Forest
proposed method for calculating rates of increase in house
Regressor, Decision
prices generalisation and replicability. The aim of this
Tree Director, Ridge, Lasso and Regular Linear. Before research is to through analyzing a real historical
confirming the predictable results of each, appropriate transactional dataset to derive valuable insight into the
algorithm suspensions are identified first based on training housing market in USA. Our dataset contains a wide range of
database by to call the best estimator method. The dataset critical parameters, and data mining is at the heart of our
contain 15 columns of 4066 records, which 3252 for training system. We started by cleaning up the entire dataset and
and 814 for validation. truncating the outlier values. Similarly, we weighted each
parameter based entirely on its importance in determining
In paper [7] from June 1996 to August 2014, the data
the system's pricing, which resulted in an increase in the
preparation process resulted in 39,554 housing transactions.
weight that each parameter retains in the system. We choose
This collection program can be done inside a single element
three outstanding system learning algorithms and tested our
or by combining features in a very high-end aircraft. Noble
system with excellent combinations that will ensure the
(2006) suggests that the details are more easily separated
accuracy and consistency of our effects [5].Even after that,
from high-resolution spaces because there is a kernel
we used a totally new approach to improve precision. Even
function that can be applied to any data set divided equally.
after that, we took a novel approach to improve the quality
However, locating the kernel is a difficult challenge function. of our survey, which revealed that the real estate fee is often
influenced by local facilities such as a train station, grocery
In paper [8] study provides insight into machine learning
store, college, pharmacy, temple, parks, and so on. And now
skills as well geographical methods of molding in complex
we'd like to present our specific approach to addressing this
urban areas using a huge volume of information from many
need. We use the Google Maps API to narrow down a
geospatial sources. There are21,928 homes, 125,000 house
distance of 0.5 km based on locality seek. Now, if we discover
portraits, and approximately 470,000 street view pictures.
certain public places in the circle, the value of the belongings
In paper [9] with a simple drop in line we try to minimize rises in proportion. We tested this with manual scenarios,
error, and in SVR we try to match his error within something and the results were excellent in terms of prediction
the limit. It is a retrieval algorithm and uses the same accuracy [8].
Support Vector Machines (SVM) retrieval method analysis.
The train data set consists of 11200 records with 9
explanatory variables. In test data set, there were around
1480 records with 9 variables.

@ IJTSRD | Unique Paper ID – IJTSRD42367 | Volume – 5 | Issue – 4 | May-June 2021 Page 796
International Journal of Trend in Scientific Research and Development (IJTSRD) @ www.ijtsrd.com eISSN: 2456-6470
V. SYSTEM ARCHITECTURE A.1 Training vs Validation Accuracy Curve

Figure: A.1.1
Figure:V.I A.2 Training vs Validation Loss Curve
The structure of the structure in this way, which includes
several important elements as an algorithm. Line reversal is
a basic method of prediction. A linear regression algorithm
will be used to estimate house price based on available data
obtained from image use.
VI. METHODOLOGY
In this research paper method is available in the Python
Scikit Library-Learn to do a grid search efficient use of
hyper-parameter tuning in a given machine learning
algorithm. This is a useful way for an inexperienced data
scientist to get suggestions for configuring parameters in
selected algorithms.
A. Dataset Exploration
The data set is split into two sections, The separation has
been aligned to 75:25 ratio of training and testing. Includes
converting raw data to data loader which is then used for Figure: A.2.1
training of the model and getting insight into our dataset. In A.3 Dataset collection
this process, various steps involved like converting the data The dataset is collected from
to data frame, visualizing the dataset, splitting of data frame https://www.kaggle.com/harlfoxem/housesalesprediction .
into training and validation. Between May 2014 and May 2015, homes were sold in the
A.1 Features Description Table area. A good collection of data to test the basic concept of
Name Type Description simple regression models. Before preparing the data we
Id Numerical Uniq Id need to explore the data first. The dataset has 16204 number
Date Numerical Sold Time which is use to train the model. It has number 5404 which is
House Price used for testing.
Price Numerical
(prediction outcome) A.4Data cleansing and exploratory analysis: Check is there
Bedroom Numerical No. of bedroom any null value.
Bathroom Numerical No. of bathroom
#check for nulls in the data
Sqft_living Numerical Living size
houses.isnull().sum()
Sqft_lot Numerical Lot size
Floors Numerical No. of floor
Waterfront Numerical Waterfront size
Condition Categorical Type of house
grade Numerical Rating of house
Sqft_above Numerical Size
Sqft_basement Numerical Size
Yr_built Numerical Built Year(2015-16)
Table: A.1.1

@ IJTSRD | Unique Paper ID – IJTSRD42367 | Volume – 5 | Issue – 4 | May-June 2021 Page 797
International Journal of Trend in Scientific Research and Development (IJTSRD) @ www.ijtsrd.com eISSN: 2456-6470
The errors (or residuals) are the vertical distances
between the discovered values (the real statistics) and
the pleasant suit lane.
the blue line's slope is 1; the intercept (the value of y
when x=0) is 0The most straightforward method of
estimation is linear regression. It uses two variables as
variables: a predictor variable and a variable that is the
most important one first, if the predictor variable and
su. This regression estimates are used to explain the
relationship between one known variable and one or
more unobservable variables. The components describe
the equation of the regression equation with one
dependent and one unbiased variable [6].b = y + x*a,
where a = ranking on the independent variable, y =
normal, x = regression coefficient, and b = approximate
existing variable score.

Figure: A.4.1
B. Model defining
Creating a Linear Regression model, training function and
optimizing the hyper parameters like learning rate, number
of epochs. simple linear regression is a mathematical method
of modeling the correlation between the predictor X and the
Y response variable. It assumes that there may be a direct
correlation between these two variables and we use that to Figure: B.1 Linear regression scatter plot
expect a limited output. simple line deceleration is a
C. Implementation
completely simple way to do supervised mastering. but,
The implementation is done in Python 3. To develop and
while it may be the end of the road and the most
train the Linear Regression model, the Ker as library, which
straightforward, it is an important starting point for all the
operates on top of Tensor flow, is used. The Anaconda
regression strategies, so it is important to fully understand
package manager is used to import all dependencies.
what it is about for miles. It is also widely used and smooth
translation: it helps to gain a better understanding of the D. Classification
connection between feedback and prediction. Includes prediction on the test dataset, visualizing the
Mathematically, we are able to document these specific performance of our model and seeing how our model is
relationships as: y = -0 + β1x + e; therey output flexibility performing.
(also called response, targeting or systematic flexibility). e.g.
housing costsx input variables (also called active, descriptive D.1 Any correlations between variables:
or non-descriptive) e.g. the height of the house is square
meters β0 capture (price y when x = 0)β1 is the coefficient x
and the slope of the return line (“normal boom in Y
associated with the increase of one unit in X”)e is a time of
error when using linear regression, the algorithm produces
the best live path using the coefficients version zero and β1,
such as miles as close as possible to the actual facts (reduces
the ratio of distances twice between all data and road). as
soon as we find ero zero and β1 we will use the model to
wait for a response.

The blue line — line of first-class stable — is the line Figure:D.1.1

that minimises the number of squared defects.
The black circles are the found values of x and y (actual
facts).

@ IJTSRD | Unique Paper ID – IJTSRD42367 | Volume – 5 | Issue – 4 | May-June 2021 Page 798
International Journal of Trend in Scientific Research and Development (IJTSRD) @ www.ijtsrd.com eISSN: 2456-6470
VII. RESULT prediction. 2017 IEEE International Conference on
Industrial Engineering and Engineering Management
(IEEM),.
[3] Phan, T., 2018. Housing Price Prediction Using
Machine Learning Algorithms: The Case of Melbourne
City, Australia. 2018 International Conference on
Machine Learning and Data Engineering (iCMLDE),.
[4] Varma, A., Sarma, A., Doshi, S. and Nair, R., 2018.
Figure: VII.I Final outcome House Price Prediction Using Machine Learning and
VIII. CONCLUSION Neural Networks. 2018 Second International
A system has been developed that seeks to provide reliable Conference on Inventive Communication and
Computational Technologies (ICICCT),.
forecasts for housing prices, and allows for the optimal use of
the Linear Regression algorithm. The use of neural networks [5] Piao, Y., Chen, A. and Shang, Z., 2019. Housing Price
has improved algorithm performance even more. Customers Prediction Based on CNN. 2019 9th International
will be content because the device have the right results and Conference on Information Science and Technology
eliminate the chance of finding the wrong structure. In asset (ICIST),.
analysis, new mechanical calculation methods can be used.
Compared to a simple model, we can use one to predict house [6] [6]Masrom, S., Mohd, T., Jamil, N., Rahman, A. and
price. Baharun, N., 2019. Automated Machine Learning
based on Genetic Programming: a case study on a real
IX. FUTURE WORK house pricing dataset. 2019 1st International
The machine's accuracy should be improved. As the gadget's Conference on Artificial Intelligence and Data Sciences
scale and computing power grow, it would be possible to add (AiDAS),.
even more cites. Furthermore, we will incorporate
exceptional UI/UX techniques for improved simulation of the [7] Ho, W., Tang, B. and Wong, S., 2020. Predicting
outcomes in a more interactive way by the use of Augmented property prices with machine learning
truth [eleven]. Also, a mastering device may be developed to algorithms. Journal of Property Research, 38(1), pp.48-
collect user feedback and documents so that the gadget 70.
would display the most suitable effects to the user based on [8] Kang, Y., Zhang, F., Peng, W., Gao, S., Rao, J., Duarte, F.
his choices. and Ratti, C., 2020. Understanding house price
Acknowledgment appreciation using multi-source big geo-data and
I should to pass on my genuine propensity and commitment machine learning. Land Use Policy, p.104919.
to Dr MN. Nachappa and Asst. Prof: Dr Ganesh D and [9] Manasa, J., Gupta, R. and Narahari, N., 2020. Machine
undertaking facilitators for their compelling steerage and Learning based Predicting House Prices using
steady motivations all through my evaluation work. Their Regression Techniques. 2020 2nd International
optimal bearing, total co-activity and second insight have Conference on Innovative Mechanisms for Industry
made my work productive. Applications (ICIMIA),.
References [10] Truong, Q., Nguyen, M., Dang, H. and Mei, B., 2020.
[1] Oladunni, T. and Sharma, S., 2016. Hedonic Housing Housing Price Prediction via Improved Machine
Theory — A Machine Learning Investigation. 2016 Learning Techniques. Procedia Computer Science, 174,
15th IEEE International Conference on Machine pp.433-442
Learning and Applications (ICMLA),.
[2] Lu, S., Li, Z., Qin, Z., Yang, X. and Goh, R., 2017. A
hybrid regression technique for house prices