House Resale Price Prediction Using Classification Algorithms
Abstract: Nowadays, house resale is seen mainly in metro cities. The market demand for housing increases every year because of population growth and migration to other cities for financial reasons. Predicting the house resale price for long-term temporary stays is important especially for people who will stay for a long period but not permanently, and for people who do not want to take any risk during house construction. In this paper, the resale price of a house is predicted using different classification algorithms, namely Logistic Regression, Decision Tree, Naive Bayes, and Random Forest, and the AdaBoost algorithm is used to boost weak learners into strong learners. The factors that affect the house resale price include physical attributes, location, and several economic factors prevailing at the time. We use accuracy as the performance metric on different datasets, and the algorithms are applied and compared to find the most appropriate method that sellers can use as a reference for determining the resale price.

Index Terms: House Resale Price, Prediction, Random Forest, AdaBoost, Naïve Bayes.

I. INTRODUCTION

The value of a home is well known to depend on a combination of a large number of features. The prediction of home prices therefore presents a distinct set of challenges. Although a large body of work is dedicated to this task, its performance and applications are restricted by long delays in data handling, the lack of real-world settings, and the inadequacy of the available housing features. Our aim in this paper is to predict the selling price of a house using classification techniques. Most existing studies have concentrated on solving the problem of house price prediction. Several theories have emerged from the research carried out by different researchers around the world. This paper draws on the most recent prediction research so that economic forecasters can make use of it. It provides an overview of the prediction markets and of the current markets, which together make it easier to predict the market.

II. RELATED WORK

Sifei Lu et al. [1] introduced a hybrid Lasso and Gradient Boosting regression model to predict the price of an individual home. This approach has recently been used as the key kernel for the Kaggle challenge "House Prices: Advanced Regression Techniques". Muhammad Fahmi Mukhlishin et al. [2] use several methods to predict the value of land and houses. Their paper compares Fuzzy Logic, Artificial Neural Network, and K-Nearest Neighbor to find the most appropriate method for determining the seller's price. Atharva Chogle et al. [3] introduce house price forecasting using data mining techniques. Their work describes the prediction markets and the current markets, which helps in making useful predictions and in understanding the market. It is therefore necessary to predict efficient prices for real estate customers according to their budgets and priorities.

Ruth Ema Febrita et al. [4] present data-driven fuzzy rule extraction for the prediction of housing prices in Malang, East Java. In this work, the K-Means clustering method is used to extract initial values that form the fuzzy membership functions and inference rules for several residential groups. The main objective of Lim WT et al. [5] is to compare the predictive performance of artificial neural networks with autoregressive integrated moving average and multiple regression analysis for condominium prices in Singapore. Tan F, Cheng C et al. [6] present a time-aware latent hierarchical model for predicting house prices. They introduce a latent hierarchical time-aware model to capture the underlying spatiotemporal interactions behind the evolution of house prices. Banerjee D et al. [7] predict housing prices using machine learning techniques. The performance of the machine learning techniques is measured by four parameters: accuracy, precision, specificity, and sensitivity. The work treats the two discrete values 0 and 1 as separate classes. Wang JJ et al. [8] predict house prices with a memristor-based artificial neural network. They designed an artificial neural network based on memristors to learn a multivariable regression model with the back-propagation algorithm. Febrita RE et al. [9] present data-driven fuzzy rule extraction for housing price prediction in Malang, East Java. They use the K-Means clustering technique to extract initial values that form the fuzzy membership functions and inference rules for several residential groups.
The house resale price data were taken from the Kaggle website. The data set contains various parameters such as house price, storage range, and floor size details.
TABLE I
SPECIFICATIONS OF DATA SET
Fig. 1. Structure of Proposed Methodology
The proposed methodology consists of three phases.
• Pre-processing
• Modeling
• Resale price prediction
Phase 1: Pre-processing
In this phase, we encode variables, handle outliers, impute missing values, and take every feasible action to eliminate inconsistencies in the data set. The data set is then partitioned into training data and test data.
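The pre-processing steps can be sketched as follows. This is only a minimal illustration in plain Python. The column names (`floor_size`, `location`), the toy values, and the specific choices of median imputation, clipping to 1.5 IQR fences, integer label encoding, and an 80/20 split are assumptions made for the example; the paper does not state which variants were used.

```python
import random
import statistics

def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def clip_outliers(values, k=1.5):
    """Clip values that fall outside [Q1 - k*IQR, Q3 + k*IQR] to the nearest fence."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]

def label_encode(values):
    """Map each distinct category to an integer code (alphabetical order)."""
    codes = {category: i for i, category in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]

def train_test_split(rows, test_ratio=0.2, seed=0):
    """Shuffle the rows reproducibly and split them into (train, test)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(round(len(rows) * test_ratio))
    return rows[n_test:], rows[:n_test]

# Hypothetical toy columns: two missing floor sizes and one extreme value.
floor_size = [70, 85, None, 90, 1000, 75, 80, None, 95, 88]
location = ["north", "south", "south", "east", "north",
            "east", "south", "north", "east", "south"]

floor_size = clip_outliers(impute_median(floor_size))
location_code = label_encode(location)

rows = list(zip(floor_size, location_code))
train, test = train_test_split(rows)
print(len(train), "training rows,", len(test), "test rows")
```

Imputation is done before clipping here so that the quartiles are computed on a complete column; other orderings are equally defensible.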
Phase 2: Modeling
This phase applies different classification algorithms, including Decision Tree, Random Forest, AdaBoost, Naïve Bayes, and Logistic Regression. These algorithms are well suited to classification tasks.
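Among the listed algorithms, AdaBoost is the one used to turn weak learners into a strong learner. A minimal sketch of that idea is given below, assuming binary labels coded as +1 and -1, one-level decision stumps as the weak learners, and a hypothetical one-feature toy data set; none of these details are taken from the paper.

```python
import math

def fit_stump(X, y, w):
    """Return the threshold rule (error, feature, threshold, sign) with the
    lowest weighted error. The rule predicts `sign` when x[feature] <= threshold."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for sign in (1, -1):
                pred = [sign if row[f] <= thr else -sign for row in X]
                err = sum(wi for wi, p, t in zip(w, pred, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, f, thr, sign)
    return best

def stump_predict(stump, row):
    _, f, thr, sign = stump
    return sign if row[f] <= thr else -sign

def adaboost_fit(X, y, rounds=3):
    """Discrete AdaBoost: refit a stump on reweighted data in every round."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        err = max(stump[0], 1e-10)       # avoid division by zero
        if err >= 0.5:                   # no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified samples, decrease the rest.
        w = [wi * math.exp(-alpha * t * stump_predict(stump, row))
             for wi, row, t in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, row):
    """Weighted vote of all stumps."""
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical data: +1 = "high resale price class", -1 = "low".
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]

model = adaboost_fit(X, y, rounds=3)
print([adaboost_predict(model, row) for row in X])
```

On this toy set no single stump classifies all six points correctly, while the weighted vote of three stumps does, which is the behaviour the boosting step relies on.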
Phase 3: Resale price prediction
After obtaining the classification results, we predict the house sale prices and analyze the results.
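The result analysis in this paper is reported in terms of accuracy, true positive rate (TPR, sensitivity), and true negative rate (TNR, specificity), partly as a function of the decision threshold. A small sketch of how these quantities can be computed from predicted probabilities is given below; the labels and probabilities are hypothetical values chosen for illustration, not results from the paper.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Count TP, TN, FP, FN after thresholding the predicted probabilities."""
    tp = tn = fp = fn = 0
    for truth, prob in zip(y_true, y_prob):
        pred = 1 if prob >= threshold else 0
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 0 and truth == 0:
            tn += 1
        elif pred == 1 and truth == 0:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn

def summarize(y_true, y_prob, threshold=0.5):
    """Return accuracy, TPR (sensitivity), and TNR (specificity)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_prob, threshold)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "tpr": tp / (tp + fn) if tp + fn else 0.0,
        "tnr": tn / (tn + fp) if tn + fp else 0.0,
    }

# Hypothetical true classes and predicted probabilities of class 1.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]

for threshold in (0.45, 0.50, 0.55):
    print(threshold, summarize(y_true, y_prob, threshold))
```

Because no predicted probability in this toy example lies between 0.45 and 0.55, the three thresholds give identical metrics, which illustrates how accuracy can stay constant over a threshold range.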
Step 1: Convert the data set into a frequency table.
Step 2: Create a likelihood table by finding the probabilities.
Step 3: Use the Naive Bayes equation to calculate the posterior probability for each class. The class with the highest posterior probability is the outcome of the prediction.
AdaBoost Classification
Step 1: It fits a sequence of weak learners on differently weighted training data.
Step 3: Each node always splits into two other nodes. The depth of the tree (number of levels) can be specified.
Step 4: A minimum number of samples must be present in each node of the tree.
Logistic Regression: accuracy is constant for thresholds between 0.45 and 0.55 (accuracy: 80.5-81.4%).
Naïve Bayes: leans toward high sensitivity.
Decision Tree C5.0: concentrates on generating rules and gives the best results (accuracy, TNR, and TPR above 92%).
AdaBoost: leans toward high sensitivity.
Fig. 3. Comparing the resale price with a target
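The three Naive Bayes steps listed above (frequency table, likelihood table, posterior probability) can be sketched for categorical features as follows. The feature values, the class names, and the use of add-one (Laplace) smoothing are assumptions made for this illustration; the paper does not specify them. The sketch also assumes that every feature value seen at prediction time occurred in the training data.

```python
from collections import Counter, defaultdict

def fit_naive_bayes(rows, labels):
    """Steps 1 and 2: build frequency tables, then a likelihood table."""
    class_counts = Counter(labels)
    freq = defaultdict(lambda: defaultdict(Counter))  # freq[feature][class][value]
    values = defaultdict(set)                         # distinct values per feature
    for row, label in zip(rows, labels):
        for i, v in enumerate(row):
            freq[i][label][v] += 1
            values[i].add(v)
    priors = {c: n / len(labels) for c, n in class_counts.items()}
    likelihood = {}
    for i, seen in values.items():
        for c, n_c in class_counts.items():
            for v in seen:
                # Add-one smoothing (an assumption) avoids zero probabilities.
                likelihood[(i, v, c)] = (freq[i][c][v] + 1) / (n_c + len(seen))
    return priors, likelihood

def posterior(priors, likelihood, row):
    """Step 3: Bayes rule, normalized so the class posteriors sum to one."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for i, v in enumerate(row):
            score *= likelihood[(i, v, c)]
        scores[c] = score
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

def predict(priors, likelihood, row):
    """Return the class with the highest posterior probability."""
    post = posterior(priors, likelihood, row)
    return max(post, key=post.get)

# Hypothetical records: (location, size band) -> resale price class.
rows = [("city", "large"), ("city", "small"), ("city", "large"),
        ("suburb", "small"), ("suburb", "large"), ("suburb", "small")]
labels = ["high", "high", "high", "low", "high", "low"]

priors, likelihood = fit_naive_bayes(rows, labels)
print(predict(priors, likelihood, ("suburb", "small")))
print(predict(priors, likelihood, ("city", "large")))
```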
REFERENCES
[1] Lu, Sifei, et al. "A hybrid regression technique for house prices prediction." Industrial Engineering and Engineering Management (IEEM), 2017 IEEE International Conference on. IEEE, 2017.
[2] Mukhlishin, Muhammad Fahmi, Ragil Saputra, and Adi Wibowo. "Predicting house sale price using fuzzy logic and K-Nearest Neighbor." Informatics and Computational Sciences (ICICoS), 2017 1st International Conference on. IEEE, 2017.
[3] Chogle, Atharva, Priyanka Khaire, Akshata Gaud, and Jinal Jain. "House Price Forecasting using Data Mining Techniques." International Journal of Advanced Research in Computer and Communication Engineering, Vol. 6, December 2017.
[4] Febrita, Ruth Ema, et al. "Data-driven fuzzy rule extraction for housing price prediction in Malang, East Java." Advanced Computer Science and Information Systems (ICACSIS), 2017 International Conference on. IEEE, 2017.
[5] Lim, W. T., L. Wang, Y. Wang, and Q. Chang. "Housing price prediction using neural networks." Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), 2016 12th International Conference on. IEEE, Aug. 2016, pp. 518-522.