XG Boost

Download as pdf or txt
Download as pdf or txt
You are on page 1of 5

www.ijcrt.

org © 2023 IJCRT | Volume 11, Issue 4 April 2023 | ISSN: 2320-2882

Property Price Prediction Engine Using XGBoost


Regression
1Himanshu Wankar, 2Kalpesh Dimble, 3Pratiksha Dasgaonkar, 4Vaishnavi Chavan, 5Ayesha
Sayyad
(1234 UG Students,5 Guide, Department Of Information Technology, Trinity College Of Engineering And Research
Pune, India - 411046)
ABSTRACT: Real estate is a significant investment, INDEXED TERMS: Real estate, investment, machine
and it is essential to know the value of a property before learning techniques, predicting, housing prices,
investing hard-earned money. Machine learning XGBoost regression algorithm, ensemble method,
techniques have been increasingly used to predict real decision trees, non-linear relationships, , mean
estate prices in megacities like Mumbai, Chennai, absolute error, mean squared error, R-squared.
Bangalore, and Pune. This paper focuses on the
XGBoost regression algorithm, a powerful technique
that can be used to predict housing prices with high I. INTRODUCTION
accuracy. The XGBoost algorithm is an ensemble
method that combines multiple decision trees to create The real estate sector plays a significant part in the
a more robust and accurate model. It is particularly profitable development of any country. The property
useful in handling non-linear relationships between prices have a substantial impact on the Gross Domestic
input variables and real estate prices. The algorithm can Product( GDP) of a nation. In India, the real estate
efficiently process large datasets with multiple request has witnessed a tremendous growth over the
variables, making it an excellent tool for real estate times, and the property prices in major metropolises
prediction. This study highlights the importance of the like Mumbai, Chennai, Bangalore, and Pune have been
XGBoost algorithm in predicting real estate prices constantly adding . still, the high property prices make
accurately. It examines various input variables, it delicate for investors to make informed opinions
including carpet area, number of bedrooms and baths, while investing in the real estate request. Inaccurate
balcony, amenities, and area type, to build a reliable prognostications of property prices can affect in
model. The XGBoost algorithm is evaluated based on significant losses for investors, affecting the growth of
various performance metrics, such as mean absolute the real estate request and, in turn, impacting the Gross
error, mean squared error, and R-squared, to assess its Domestic Product( GDP) of the country. The accurate
accuracy and effectiveness. The results show that the vaticination of property prices is pivotal for investors,
XGBoost algorithm outperforms other machine as it enables them to make informed opinions while
learning techniques such as linear regression, decision investing in the real estate request. Traditional styles of
trees, random forest, and neural networks, in predicting prognosticating property prices, like retrogression
real estate prices. It can efficiently handle complex analysis, have limitations in dealing with complex and
non-linear relationships and accurately predict the non-linear connections between input variables and
prices of properties in megacities like Mumbai, property prices. Hence, machine literacy algorithms
Chennai, Bangalore, and Pune. In conclusion, this like XGBoost retrogression have been decreasingly
study demonstrates the effectiveness of the XGBoost used to prognosticate property prices directly.
algorithm in predicting real estate prices. The XGBoost is an ensemble system that combines
algorithm can help investors make informed decisions multiple decision trees to produce a more robust and
by providing accurate predictions based on various accurate model. It's particularly useful in handling on-
input variables. The study emphasizes the importance linear connections between input variables and
of using advanced machine learning techniques like property prices, making it an excellent tool for real
XGBoost for real estate investments, especially in estate vaticination. The XGBoost algorithm can
megacities where property prices are highly volatile.
IJCRT2304920 International Journal of Creative Research Thoughts (IJCRT) h190
www.ijcrt.org © 2023 IJCRT | Volume 11, Issue 4 April 2023 | ISSN: 2320-2882
efficiently reuse large datasets with multiple variables, II. LITERATURE REVIEW:
making it a important fashion for prognosticating
The real estate industry is rapidly growing, and the
property prices.
evaluation and prediction of property prices using
This paper aims to punctuate the impact of property mathematical modelling and scientific methods have
prices on the GDP of India and how machine literacy become an urgent need for decision-making by all
algorithm XGBoost retrogression can play a pivotal involved stakeholders. In this context, the application
part in prognosticating property prices directly. We of machine learning algorithms for property price
estimate colorful input variables, including carpet area, prediction has gained significant attention in recent
number of bedrooms and cataracts, deck, amenities, years. This literature review examines various studies
and area type, to make a dependable model. We'll also that have used regression analysis and machine
examine the performance of the XGBoost algorithm learning algorithms, such as linear regression, Random
compared to other machine learning algorithms like Forest, and CNN, to predict housing prices in different
direct retrogression, decision trees, arbitrary timber, cities worldwide. The review also highlights the
and neural networks, to assess its delicacy and importance of considering various factors, such as
effectiveness. We'll estimate the XGBoost algorithm physical conditions, locations, and topographical
grounded on colorful performance criteria like mean features, when developing property price prediction
absolute error, mean squared error, and R- squared to models. The findings of these studies can guide
dissect its effectiveness in prognosticating property stakeholders in making informed decisions in the real
prices. The study aims to emphasize the significance of estate industry.
using advanced machine literacy ways like XGBoost
[1] Nyan Chen's research article "House Price
for real estate investments, especially in megacities
Prediction Model of Zhaoqing City Based on
where property prices are largely unpredictable.
Correlation Analysis and Multiple Linear Regression
Accurate vaticination of property prices can help
Analysis" discusses the use of multiple linear
investors make informed opinions and contribute to the
regression and Pearson coefficient correlation analyses
growth of the real estate request, which, in turn, can
to predict house prices in Zhaoqing City from 2010 to
have a positive impact on India's GDP.
2018. The study identified variables that have a strong
Impact of Property Prices on Gross Domestic Product( relationship with house prices, which were then
GDP) of India: The real estate sector has a significant examined using a multiple linear regression model to
impact on the Gross Domestic Product( GDP) of India. calculate the goodness of fit R2. The difference
It's one of the swift- growing sectors in the country and between the projected and actual house prices for the
has contributed to the growth of the Indian frugality years 2019 and 2020 was calculated using the multiple
significantly. The real estate sector contributes linear regression model formula, yielding |D|. The
around7.7 to India's GDP and is anticipated to reach$ 1 goodness-of-fit R2 value and the difference between
trillion by 2030. The growth of the real estate sector has the projected and actual house prices of house prices
a multiplier effect on frugality, as it generates |D| were combined to observe the prediction effect.
employment openings and promotes the growth of This approach of using linear regression models to
other sectors like cement, sword, and construction. The predict housing prices has been used in different cities
increase in property prices in major metropolises like around the world by scholars in recent years.[4] N.
Mumbai, Chennai, Bangalore, and Pune has redounded Ghosalkar and S. N. Dhage, “Real estate value
in the growth of the real estate sector. still, the high prediction using linear regression,”: Ghosalkar and S.
property prices have also made it delicate for investors N. Dhage analyzed the impact of three key factors-
to make informed opinions while investing in the real physical conditions, ideas, and locations- on home
estate request. The accurate vaticination of property prices in Mumbai, India. The researchers opted to
prices is pivotal for investors, as it enables them to employ linear regression to predict house prices for the
make informed opinions while investing in the real selected region. Notably, the researchers did not
estate request. consider market price or cost growth in their analysis.
By ignoring these variables, Ghosalkar and Dhage's
Machine Learning Algorithm XGBoost Regression:
model focuses solely on the impact of the three
Machine learning algorithms like XGBoost regression
identified factors, providing insights into their relative
have been increasingly used to prognosticate property
importance in determining property values in
prices directly. XGBoost is an ensemble system that
Mumbai.[2] Aditi Mahale, “Housing Price Prediction
combines multiple decision trees to produce a more
Using Supervised Learning”: explained that it is
robust and accurate model. It's particularly useful in
possible to predict the buying and selling prices of real
handling non-linear data.
estate properties by considering factors such as
location, living space, number of rooms, and other
related factors. She also mentioned that topographical
features, including the nearest police and fire station,
IJCRT2304920 International Journal of Creative Research Thoughts (IJCRT) h191
www.ijcrt.org © 2023 IJCRT | Volume 11, Issue 4 April 2023 | ISSN: 2320-2882
were taken into account. To achieve this prediction, property price prediction model. The implementation
Mahale used an amalgamation of Random Forest and of the XGBoost regression model involves training the
CNN methods. [13] Nihar Bhagat, Ankit Mohorkar, model using the selected dataset and evaluating its
and Shreyas Mane, "House Price Forecasting using performance using appropriate metrics such as RMSE,
Data Mining": They used the linear regression MAE, and R-squared. The model is then tested on a
algorithm to predict property values and identify the separate test set to ensure its generalizability.
factors that influence them. In order to make accurate
The results of the study are analyzed and presented in
predictions, they analyzed real-time data. [10] R
a way that is easily understandable to the stakeholders
Manjula,“Real estate value prediction using
in the real estate industry. The findings of the study
multivariate regression models”: suggests that the
have implications for the real estate industry, as they
prices of homes are influenced by multiple variables.
can be used to inform decision-making processes, such
To construct a reliable prediction model, various
as pricing strategies and property investments. Overall,
features can be used, and these features can be derived
the research design and implementation of this study
from various sources. One notable study on feature
involves selecting a suitable dataset, identifying the
extraction used visual features to forecast house values.
variables to be used in the model, pre-processing the
This involved grouping houses with similar features
data, using the XGBoost regression algorithm to build
and pricing through clustering. [15] V.Sampathkumar,
the model, and evaluating its performance. The
“Forecasting the land price using statistical and neural
findings of the study have the potential to contribute to
network software” historical trends are used to predict
the development of more accurate and reliable property
future land prices. These trends are analyzed to
price prediction models in the real estate industry. And
determine the rate of growth or decline. Additionally,
the whole model is imported into the flask server and
economic factors may be incorporated into the analysis
then converted into an interactive web page using
to establish a more accurate relationship. A survey
Html, CSS , javascript and Nginx for interactive
conducted by 99acres.com was also referenced in the
interaction with end consumers.
study. [7] S. Raheel. "Choosing the right encoding
method-Label vs One hot encoder" it is explained that XGBoost is an ensemble learning algorithm that uses
one hot encoding is a method of dividing a column with decision trees to make predictions. It can be used for
categorical data that has already been label encoded regression problems by minimizing a loss function that
into multiple columns. The values in these columns are measures the difference between the predicted and
then converted to 1s or 0s, depending on which column actual target values. The mathematical model for
contains the value. This article was published in XGBoost regression can be expressed as:
Toward Data Science.
y = f(x)
where y is the predicted property price, x is the vector
III. METHODOLOGY of input features (such as square footage, number of
bedrooms, etc.), and f(x) is the XGBoost model that
The research design for the property price prediction
predicts y based on x.
engine using the XGBoost regression algorithm
involves selecting a suitable dataset for the study, To compute f(x), XGBoost builds an ensemble of
identifying the variables to be used in the model, and decision trees that are trained to minimize the mean
explaining the steps involved in data pre-processing squared error (MSE) loss function. The model
and feature selection. In this study, a dataset containing combines the predictions from multiple decision trees
information about various properties, such as location, to arrive at a final prediction.
size, age, and amenities, is used. The variables used in
the model include the property location, size, number The general form of the XGBoost regression model can
of bedrooms and bathrooms, age, and amenities. The be expressed as:
data pre-processing step involves cleaning the data for y = Σ(k=1 to K) fk(x)
further analysis and performing exploratory data
analysis on the data set. This involves handling missing where fk(x) is the prediction of the k-th decision tree,
values, handling outliers, and converting categorical and K is the total number of decision trees in the
variables into numerical form. Feature selection is done ensemble. The prediction of each tree is a weighted
to identify the most important variables that have the sum of the leaf values of the tree, which are learned
most significant impact on property prices. The during training.
XGBoost regression algorithm is a powerful and
The prediction of the XGBoost model for a given input
widely used algorithm for regression problems. It is a
x is obtained by summing the predictions of all the
variant of gradient boosting that uses a combination of
decision trees in the ensemble. The weights used in the
decision trees to make predictions. In this study, the
weighted sum are determined during training by
XGBoost regression algorithm is used to build the
minimizing the MSE loss function.
IJCRT2304920 International Journal of Creative Research Thoughts (IJCRT) h192
www.ijcrt.org © 2023 IJCRT | Volume 11, Issue 4 April 2023 | ISSN: 2320-2882
Mathematical model Based on these results, we can see that XGBoost
regression algorithm outperforms the other three
Let X be a matrix of size n x 5, where n is the number
algorithms with the highest score of 0.9119623 and the
of training examples, and each row corresponds to a
same RMSE of 47.093442 as Random Forest. This
single training example. The five columns of X
suggests that XGBoost is better suited for predicting
correspond to the features: carpet area (in square feet),
property prices using the given features than Linear
total BHK, number of balconies, number of bathrooms,
Regression ,Lasso Regression,SVM and Random
and area type (categorical variable encoded as a
Forest.
numerical value). Let y be a vector of size n, where
each element corresponds to the target property price
for a single training example. The XGBoost regression
model for predicting property prices can be defined as
follows:
i. Define a base prediction for each training example
as the mean property price of the entire training set:
base_prediction_i = mean(y) for i in range(n)
ii. Define the objective function for the XGBoost
regression model as the sum of squared errors
(SSE) between the true property prices and the
predicted property prices:
Fig 1: indicates the comparison of different Ml
objective = SSE = sum((y - base_prediction)^2)
algorithms
iii. Train the XGBoost model by iteratively adding
The machine learning model is converted into web site
decision trees to the base prediction. At each
using HTML, CSS, Javascript, and Nginx. On entering
iteration, the objective function is optimized by
the Area(Square foot), the number of BHK, Total Bath,
fitting a new decision tree to the residual errors
and Location are taken as input to estimate the price of
(i.e., the difference between the true property
the property. The below image shows the implemented
prices and the current predicted property prices):
system
prediction_i = base_prediction_i + learning_rate *
sum(tree_j(x_i)) for i in range(n), j in
range(num_trees)
where prediction_i is the predicted property price for
the i-th training example, tree_j(x_i) is the prediction
of the j-th decision tree for the i-th training example,
and learning_rate is a hyperparameter that controls the
contribution of each decision tree to the final
prediction.
iv. The prediction for a new input feature vector x is
Fig 2: showing the implementation and results of the
the sum of the base prediction and the predictions
proposed model.
of all the decision trees:
ŷ = mean(y) + learning_rate * sum(tree_j(x)) for j in
range(num_trees) V. CONCLUSION :
The optimal values for the hyperparameters (e.g., XGBoost regression algorithm has proven to be an
learning rate, number of trees, depth of each tree) can effective tool for property price prediction. Through
be found using techniques such as grid search, random the use of advanced optimization techniques, it can
search, or Bayesian optimization. handle large datasets and complex feature interactions,
making it ideal for real estate applications. By training
on historical data and predicting future property prices,
IV. RESULTS: this algorithm can provide valuable insights to real
estate professionals, investors, and homeowners.
To compare the performance of XGBoost regression However, it's important to note that accurate property
algorithm with other machine learning algorithms, we price prediction depends on several factors, such as the
need to train and evaluate the models on the same quality and quantity of data, the choice of features, and
dataset using the same evaluation metric. the model's hyperparameters. Therefore, it's crucial to
IJCRT2304920 International Journal of Creative Research Thoughts (IJCRT) h193
www.ijcrt.org © 2023 IJCRT | Volume 11, Issue 4 April 2023 | ISSN: 2320-2882
use best practices and continually monitor the model's Conference on Industrial Engineering and Engineering
performance to ensure reliable predictions. Overall, Management: 2017
XGBoost regression algorithm is a powerful tool for
[13] Nihar Bhagat, Ankit Mohorkar and Shreyas
property price prediction that can aid decision-making
Mane,” House Price Forecasting using Data Mining”
in the real estate industry.
International Journal of Computer Applications, 2016
[14] R.Victor” Machine learning project: Predicting
REFERENCES: Boston house prices with regression”
[1] Ningyan Chen, Research Article “House Price [15] V.Sampathkumar,” Forecasting the land price
Prediction Model of Zhaoqing City Based on using statistical and neural network software” 3rd
Correlation Analysis and Multiple Linear Regression International Conference on Recent Trends in
Analysis” Business School, University of Aberdeen, Computing 2015 (ICRTC-2015)
Aberdeen, UK 2022
[2] Aditi Mahale” House price prediction using
supervised learning” 3rd International Conference on
Advances in Engineering, Technology & Business
Management (ICAETBM-2022)
[3] Siddhant Burse and DhritiAnjaria, “Housing Price
Prediction Using Linear Regression” 2021 JETIR
October 2021, Volume 8, Issue 10
[4] Ghosalkar and Dhage, “Real estate value prediction
using linear regression,” in 2018 Fourth International
Conference on Computing Communication Control
and Automation (ICCUBEA), pp. 1–5, Pune, India,
2018.
[5] S.Neelam and G. Kiran,” 5 Valuation of house
prices using predictive techniques, Internal Journal of
Advances in Electronics and Computer Sciences.”
[6] S.Abhishek, “Ridge regression vs Lasso, How these
two popular ML Regression techniques work”
Analytics India magazine,2018.
[7] S.Raheel,“Choosing the right encoding method-
Label vs One hot encoder.” Towards data science,2018
[8] R.E.Febrita, A. N. Alfiyatin, H. Taufiq, and W. F.
ahmudy,” Data-driven fuzzy rule extraction for
housing price prediction in Malang, East Java,” 2017
Int. Conf. Adv. Comput. Sci. Inf. Syst. ICACSIS 2017,
vol. 2018-Janua, pp. 351–358, 2018, DOI:
10.1109/ICACSIS.2017.8355058.
[9] G.Gao et al, “Location-Centred House Price
Prediction: A Multi-Task Learning Approach” pp. 1–
14, 2019
[10] R Manjula,” Real estate value prediction using
multivariate regression models”, IOP Conf. Series:
Materials Science and Engineering 263 (2017)
[11] Rawat T, Khemchandani V, “Feature Engineering
(FE) Tools and Techniques for Better classification
Performance” International Journal of Innovations in
Engineering and Technology (IJIET). 2017 April; 8(2)
[12] Lu.Sifei et al,” A hybrid regression technique for
house prices prediction” In Proceedings of IEEE
IJCRT2304920 International Journal of Creative Research Thoughts (IJCRT) h194

You might also like