Survey Paper Updated

This document discusses using machine learning algorithms to predict real estate prices in Gurgaon, India. It reviews different machine learning models and their performance on housing price prediction tasks. The study aims to develop an accurate predictive model for Gurgaon property prices based on features like location, size, and amenities by applying regression techniques such as linear regression, Lasso regression, and XGBoost regression.


“GURGAON REAL ESTATE PRICE PREDICTIONS

AND OPTIMAL FLAT RECOMMENDATIONS”


1st SANTOSH RAI, 2nd SAURAV SHARMA, 3rd SOMIL SINGH, 4th VIKAS SINGH
Computer Science and Engineering
Greater Noida Institute of Technology
Greater Noida, India
[email protected]

5th SHRUTI PATHAK
Assistant Professor
Computer Science and Engineering
Greater Noida Institute of Technology
Greater Noida, India
[email protected]

ABSTRACT
This paper explores the application of machine learning algorithms for predicting real
estate/house prices in Gurgaon. Predictive systems for house prices play a major role in helping
individuals find homes that align with their preferences and requirements. The primary objective
of house price prediction is to present a curated selection of properties tailored to the customer's
criteria, which may include factors such as location, amenities, and technological features.

This study reviews the utilization of various machine learning algorithms on diverse datasets and
aims to implement a prediction engine for real-life applications. Real estate investment, along
with other avenues such as gold and shares, is a popular choice due to its potential for significant
returns. Understanding housing price trends is essential for both sellers and buyers, providing
insights into current market conditions.

Various factors influence housing prices, including the number of bedrooms, location, and
square footage. Proximity to essential amenities such as schools, parks, and employment centres
can also affect house prices. This study focuses on Gurgaon as a case study location and seeks to
develop a model for real-time house price prediction in the surrounding areas. The analysis
utilizes datasets from prominent real estate websites, including 99acres.com, magicbricks.com,
and nobroker.com, incorporating features such as area, bedrooms, and bathrooms. Regression
methods, including MLR (OLS), Lasso, and XGBoost, are employed to develop predictive
models, with the most accurate model selected based on a comparative evaluation of their
performance.

Keywords: Real Estate, House Price Prediction, OLS Regression, Gradient Boosting, Deep
Learning
1. INTRODUCTION
Machine Learning (ML) plays a pivotal role in modern business and research, continually
enhancing the performance of computer systems through the utilization of algorithms and neural
network models. ML algorithms autonomously construct mathematical models using sample
data, known as "training data," to make decisions without explicit programming for those
decisions. Whether for personal or business purposes, individuals and real estate agencies engage
in buying and selling houses, aiming to fulfil specific needs and expectations. However, the
challenge of accurately valuing properties persists, with inadequate measures for detecting
overvaluation or undervaluation in housing markets.

Traditional metrics, such as house/real estate price-to-rent ratios, offer initial assessments.
Nonetheless, a comprehensive analysis and informed judgment are required to address this issue
effectively. Machine Learning emerges as a solution by training models with extensive datasets,
enabling accurate price predictions tailored to diverse requirements. Machine learning algorithms
have demonstrated efficacy across various domains, including medical imaging, spam and fraud
detection, automotive advancements, safety protocols, and business analytics.

This study focuses on leveraging machine learning techniques to analyse housing prices, taking
into account the dynamics of real estate markets and consumer demand. Data serves as the
cornerstone of problem analysis, providing valuable insights in a digestible format. While
owning a home is a lifelong aspiration for many, common pitfalls such as overpaying for
properties can occur. Developing precise models for determining property prices in urban areas
like Gurugram remains challenging, given the complexity of factors such as location, size, and
features.

This research, centred on Gurugram, aims to construct a predictive model for house prices in the
region, utilizing public datasets from realtor websites like 99acres.com. The dataset comprises
nine features, including 'area', 'bedrooms', and 'bathrooms'. The study endeavours to develop a
model that forecasts prices based on these influencing factors. Regression techniques, including
multiple linear regression (OLS), Ridge/Lasso regression, and Extreme Gradient Boosting
regression (XGBoost), are employed and compared for accuracy, and the model with the lowest
error is selected.

2. LITERATURE SURVEY

1. In the proposed machine learning model, the dataset is divided into two parts, namely
Training and Testing: 80% of the data is used for training and 20% for testing. The training set
includes the target variable. The model is trained using various machine learning algorithms, of
which random forest regression predicts the best results. The algorithms are implemented with
the Python libraries NumPy and Pandas. [1]
2. Manasa and Gupta [2] take Bengaluru as the city for their case study. The property size in
square feet, its location, and its facilities are all crucial aspects affecting cost. Nine
different attributes are used. Multiple linear regression (Least Squares), Lasso/Ridge
regression, SVM, and XGBoost are used for the experimental work.

3. Sawant and Jangid [3] indicate that in the coming decade, India’s housing market is expected
to grow at a rate of 30-35 percent. It is second only to the agriculture sector in terms of job
creation. Pune is an excellent place for people to invest in property. Inconsistency in housing
valuation is a challenge for a house buyer. The estimated price must be a win-win midpoint for
both the seller and the buyer, which makes it possible to confirm whether a price is undervalued
or overvalued. To do this, various features from the feature set are picked as input, using
algorithms such as Decision Tree and bagging techniques such as Random Forest.

4. Madhuri et al. [4] focused on predicting house prices for people looking for their first
potential house, based on their financial capacity and objectives. Prospective prices are
estimated by assessing previous market offerings, price ranges, and forthcoming developments.
Multiple linear, Ridge, LASSO, Elastic Net, gradient boosting, and AdaBoost regression are
among the regression techniques employed in the work. Physical conditions, concept, and
location were duly considered while estimating.

5. Lim et al. [5] compared the prediction performance of an ANN model, i.e., the multilayer
perceptron, with that of the ARIMA model in predicting the Singapore housing market. The
superior model is then applied to forecast potential condominium price indices (CPI). The ANN
model’s lower mean square error (MSE) highlighted its superiority over the other prediction
models.

6. Nowadays, real estate has become more than a necessity, not only for people looking to buy
property but also for the companies that sell it. According to [6], property is not only a basic
human need but also represents the wealth and prestige of a person. Investing in property can be
profitable because its value tends to increase over time.

7. Fluctuations in real estate prices have significant implications for a variety of stakeholders,
including household investors, bankers, and policymakers. As such, the real estate market
remains a compelling investment opportunity. Consequently, forecasting real estate values serves
as a crucial economic indicator. [7] suggests that every organization in today’s real estate
business is working to achieve a competitive edge over its competitors.

8. Simplifying the process for the average person while achieving the best results is crucial. [8]
proposed using machine learning and AI to develop an algorithm that predicts housing prices
based on specific inputs. This algorithm can be used by classified websites to estimate property
prices accurately, bypassing the need for customer-provided prices and reducing errors in the
system.

9. [9] used Google Colab/Jupyter Notebook. Jupyter Notebook is an open-source web application
for creating and sharing documents that contain live code, visualizations, equations, and
narrative text. It supports data cleaning, data transformation, numerical simulation,
statistical modelling, data visualization, and machine learning.

10. [10] developed a system to help users find accurate real estate prices. By inputting their
requirements, users receive price estimates for their desired homes. Additionally, the system
provides sample house plans for reference.

11. In [11], housing values in a Boston suburb are analyzed and forecasted using SVM, LSSVM,
and PLS methods along with their corresponding characteristics. After eliminating missing
samples from the original dataset, 400 samples are used as training data and 52 samples as test
data to predict housing values.

12. According to [12], the Random Forest Regressor achieved the highest accuracy, followed by
the Decision Tree Regressor. Ridge and Linear Regression produced similar results, with only a
slight decrease in accuracy for Lasso. Across different feature selection groups, the differences in
performance were minimal, indicating that buying prices alone can effectively predict selling
prices without overfitting the model. However, a noticeable drop in accuracy was observed with
very weak features. This trend was also reflected in the Root Mean Square Error (RMSE) for all
feature selections.

13. [13] noted that preparing their dataset took more than a day. Instead of performing
computations sequentially, using multiple processors to parallelize the computations could
significantly reduce both preparation and prediction times. Additionally, by enhancing the
model's functionalities, users could have the option to select a region to generate heat maps,
rather than manually entering data.

14. [14] utilized a dataset of 100 houses with various parameters, splitting it evenly with 50% for
training and 50% for testing. The results were highly accurate, even when tested with different
parameters. By not using PSO, the training process for complex problems was simplified,
favoring regression techniques instead. Additionally, [13] experimented with basic machine
learning algorithms such as the decision tree classifier.
3. TECHNIQUES:

Linear Model: The multiple regression model represents a straight-line (linear) relationship, where:
• The error terms follow a normal distribution.
• The variance of the error terms remains constant.
• The dependent variable and the predictors are linked by a linear relationship.
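
As an illustration of the linear model, the following is a minimal sketch of multiple linear regression fitted by ordinary least squares through the normal equations. It uses only the Python standard library; the helper names (`solve`, `fit_ols`, `predict`) and the noise-free toy data are assumptions made for this example and are not taken from the study's dataset.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(X, y):
    """Return [intercept, coef_1, ...] minimising the sum of squared errors."""
    rows = [[1.0] + list(r) for r in X]  # prepend a constant column for the intercept
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(beta, row):
    """Predict the target for one feature row."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


# Hypothetical toy data: (area in sq ft, bedrooms), with
# price = 10 + 0.05 * area + 20 * bedrooms and no noise.
X = [(500, 1), (800, 2), (1200, 3), (1500, 3), (2000, 4)]
y = [10 + 0.05 * area + 20 * beds for area, beds in X]
beta = fit_ols(X, y)
```

Because the toy prices are generated without noise, `beta` recovers the generating coefficients up to floating-point error. On real listings the residuals would also need to be checked against the assumptions listed above.
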
LASSO: Lasso regression extends linear regression with an L1 "penalty" that pushes
coefficients towards zero and can set some of them exactly to zero. It mitigates overfitting,
especially in datasets with multicollinearity. Lasso thus performs both variable selection and
regularization, penalizing less relevant features and resulting in simpler and more practical
models.
Ridge Regression: This employs L2 regularization, updating feature weights by incorporating
an additional squared term in the loss function compared to traditional linear regression. It
addresses overfitting by reducing the weights during optimization.
Elastic Net Regression: While individual variables may contribute to predictions, traditional
regression methods like Lasso may not entirely eliminate irrelevant variables. Elastic Net
Regression (ENR) combines aspects of both Lasso and Ridge regularization, retaining only
significant and informative features, thereby enhancing prediction accuracy.
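
For reference, the objectives of the four linear techniques can be written side by side. The notation is an assumption introduced here rather than the paper's own: $y_i$ is the price of listing $i$, $x_{ij}$ its $j$-th feature, $\beta$ the coefficient vector, and $\lambda, \lambda_1, \lambda_2 \ge 0$ the regularization strengths.

```latex
\begin{aligned}
\mathrm{RSS}(\beta) &= \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j \Bigr)^{2} \\
\text{OLS:} \quad & \min_{\beta}\ \mathrm{RSS}(\beta) \\
\text{Lasso (L1):} \quad & \min_{\beta}\ \mathrm{RSS}(\beta) + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \\
\text{Ridge (L2):} \quad & \min_{\beta}\ \mathrm{RSS}(\beta) + \lambda \sum_{j=1}^{p} \beta_j^{2} \\
\text{Elastic Net:} \quad & \min_{\beta}\ \mathrm{RSS}(\beta) + \lambda_1 \sum_{j=1}^{p} \lvert \beta_j \rvert + \lambda_2 \sum_{j=1}^{p} \beta_j^{2}
\end{aligned}
```

Setting $\lambda_2 = 0$ in the Elastic Net objective gives Lasso and setting $\lambda_1 = 0$ gives Ridge, which is the sense in which Elastic Net combines the two.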

Random Forest: This ensemble technique aggregates predictions from multiple decision trees to
produce a more robust final prediction. By combining numerous decision trees, it effectively
mitigates the overfitting observed in individual trees. The algorithm operates in several stages.
First, it randomly draws a bootstrap sample, with replacement, from the training dataset. Then a
decision tree is grown from each bootstrap sample: at each node, a subset of features is randomly
selected without replacement, and the node is split on the feature that provides the maximum
information gain according to the objective function. This process is repeated for every tree,
and the final prediction is determined by aggregating the predictions from each tree, which for
regression means averaging them.
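
The stages described above can be sketched in a few lines of standard-library Python. To stay short, this sketch makes a strong simplifying assumption: each "tree" is a depth-one regression stump, so the random feature subset is drawn once per tree (a stump has a single node). The function names and toy listings are hypothetical, and a practical model would use full trees from an established library.

```python
import random
from statistics import mean


def best_stump(rows, targets, features):
    """Find the (feature, threshold) split with the lowest squared error."""
    best = None
    for f in features:
        # Candidate thresholds: every observed value except the largest,
        # so that both sides of the split are non-empty.
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, targets) if r[f] <= t]
            right = [y for r, y in zip(rows, targets) if r[f] > t]
            ml, mr = mean(left), mean(right)
            sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, ml, mr)
    return best


def fit_forest(rows, targets, n_trees=25, n_features=1, seed=0):
    """Train n_trees stumps, each on its own bootstrap sample and feature subset."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # sample rows with replacement
        sample = [rows[i] for i in idx]
        ys = [targets[i] for i in idx]
        feats = rng.sample(range(p), n_features)     # features without replacement
        stump = best_stump(sample, ys, feats)
        if stump is None:                            # no valid split: constant leaf
            forest.append((0, float("inf"), mean(ys), mean(ys)))
        else:
            forest.append(stump[1:])
    return forest


def predict_forest(forest, row):
    """Average the predictions of all stumps (regression aggregation)."""
    return mean(ml if row[f] <= t else mr for f, t, ml, mr in forest)


# Hypothetical toy listings: (area in sq ft, bedrooms) -> price
rows = [(500, 1), (600, 1), (900, 2), (1000, 2),
        (1500, 3), (1800, 3), (2200, 4), (2500, 4)]
prices = [40, 45, 70, 80, 120, 140, 180, 200]
forest = fit_forest(rows, prices)
```

Calling `predict_forest(forest, (550, 1))` and `predict_forest(forest, (2300, 4))` shows the averaged ensemble assigning a lower price to the smaller flat than to the larger one.
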

4. METHODOLOGY:

The objective of this system is to predict house prices based on various input features provided
by the user. These features are fed into a machine learning (ML) model, which then produces a
prediction based on their impact on the target variable. The process involves:
- Dataset selection: Identifying an appropriate dataset that meets the requirements of both
developers and users.
- Data cleaning: Removing unnecessary data and converting raw data into a structured format.
- Data preprocessing: Handling missing values, encoding categorical variables, and converting
data types to prepare it for model training.
- Model training: Employing various machine learning algorithms to train the model and
selecting the most accurate algorithm based on error metrics.
- Model evaluation: Assessing the performance of the trained model and finalizing the
algorithm that yields the most accurate predictions.
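
The training and evaluation steps above can be sketched as a small model-selection loop: hold out part of the data, fit each candidate on the rest, and keep the candidate with the lowest error. This is a minimal sketch using the standard library only; the 80/20 split, the two candidate models, the RMSE metric, and the synthetic listings are assumptions chosen for illustration, not the study's actual configuration.

```python
import math
import random


def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split it into train and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_mean(train):
    """Baseline: always predict the mean training price."""
    m = sum(price for _, price in train) / len(train)
    return lambda area: m


def fit_line(train):
    """Simple least-squares line: price = intercept + slope * area."""
    n = len(train)
    mx = sum(a for a, _ in train) / n
    my = sum(p for _, p in train) / n
    slope = sum((a - mx) * (p - my) for a, p in train) / sum((a - mx) ** 2 for a, _ in train)
    intercept = my - slope * mx
    return lambda area: intercept + slope * area


# Hypothetical listings: (area in sq ft, price) with a small deterministic wiggle.
noise = [2, -2, 1, -1, 2, -2, 1, -1, 2, -2]
data = [(600 + 200 * i, 0.08 * (600 + 200 * i) + noise[i]) for i in range(10)]

train, test = train_test_split(data)
candidates = {"mean baseline": fit_mean, "simple linear": fit_line}

scores = {}
for name, fit in candidates.items():
    model = fit(train)
    scores[name] = rmse([p for _, p in test], [model(a) for a, _ in test])

best = min(scores, key=scores.get)  # candidate with the lowest test RMSE
```

In the study the candidates would be the regression algorithms named earlier (OLS, Lasso, XGBoost and so on) rather than these two toy models; the selection logic stays the same.
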

Block Diagram of the System:

► Data Cleaning
► Statistical Analysis
► Feature Construction
► Identifying Outliers
► Data Conversion
► Collinearity Problem
► Data Visualization

The block diagram represents the traditional Machine Learning Approach, comprising five
sections: Data collection, Pre-processing, Data analysis, Application of algorithm, and Model
evaluation. The Data Collection phase involves web scraping data from the 99acres website
using machine learning algorithms. The Pre-processing section includes tasks such as data
cleaning, statistical analysis, feature construction, outlier identification, data conversion,
collinearity management, and data visualization.

Phase 1, the Pre-processing stage, focuses on enhancing dataset quality and preparing it for
analysis. This involves encoding variables, handling missing values through imputation,
standardizing or normalizing numerical variables, and dividing the dataset into training and test
sets for model evaluation and validation. By systematically executing these steps, the pre-
processing phase establishes a robust and reliable dataset, laying the groundwork for subsequent
analytical tasks.
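
A minimal sketch of three of the pre-processing steps named above (imputation, encoding, and standardization) is given below, using only the Python standard library. Median imputation, one-hot encoding, and z-score scaling are assumed choices, since the text does not state which variants were used, and the locality labels are made up.

```python
from statistics import mean, median, pstdev


def impute_median(values):
    """Replace missing entries (None) with the median of the known values."""
    known = [v for v in values if v is not None]
    m = median(known)
    return [m if v is None else v for v in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


def one_hot(values):
    """Encode a categorical column as 0/1 indicator columns, one per category."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]


# Hypothetical raw columns with one missing area value.
area = [1000, None, 1400, 1200]
locality = ["Locality A", "Locality B", "Locality A", "Locality C"]

area_filled = impute_median(area)       # the missing value becomes 1200
area_scaled = standardize(area_filled)
categories, locality_encoded = one_hot(locality)
```

To avoid leaking information from the test set, the median, mean, and standard deviation would normally be computed on the training split only and then reused to transform the test split.
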

5. IMPLEMENTATION
In this paper, we utilized datasets containing house sale prices for Gurgaon, India, sourced from
99acres.com. The dataset comprises 21 major attributes known to influence housing prices,
providing a comprehensive foundation for price prediction.

5.1 Correlation and Data Visualization


Analyzing the correlation among features is crucial for understanding their impact on housing
prices. The correlation diagram visually represents the relationship between various housing
features and price fluctuations. Each feature is represented along both the X and Y axes, creating
a matrix that illustrates the correlation between pairs of features.

For instance, features such as carpet area, number of bedrooms, location desirability, and
proximity to amenities are analyzed to assess their correlation with housing prices. The intensity
of color or numerical value within each cell indicates the strength and direction of correlation
between the corresponding features.
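
The matrix described here can be computed with the Pearson correlation coefficient. The sketch below uses only the Python standard library; the feature names and values are hypothetical and merely stand in for columns of the scraped dataset.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Return a nested dict: matrix[a][b] is the correlation of columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical feature columns for five listings.
columns = {
    "area": [600, 900, 1200, 1500, 2000],
    "bedrooms": [1, 2, 2, 3, 4],
    "distance_km": [9, 7, 6, 3, 1],
    "price": [50, 72, 95, 125, 170],
}
matrix = correlation_matrix(columns)
```

Each cell lies between -1 and 1: in this toy data `matrix["area"]["price"]` is positive, while `matrix["distance_km"]["price"]` is negative, which is the kind of sign and strength information a heat map would colour-code.
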

Positive correlations between features like larger square footage or prime location and housing
prices suggest areas for potential investment or improvement. Conversely, negative correlations,
such as those observed with higher crime rates or proximity to industrial areas, indicate factors
that may lower housing prices.

By visually analyzing the correlation matrix, stakeholders can identify key features that
significantly influence pricing dynamics. This insight enables informed decision-making
regarding investment strategies and property value enhancement initiatives.
6. CONCLUSION

The study concludes that the XGBoost Regression machine learning algorithm yields the highest
accuracy in predicting housing prices, followed closely by the Random Forest regression
algorithm. Extra trees and Linear Gradient boosting algorithms also demonstrate significant
predictive capabilities, albeit with varying degrees of accuracy. Cross-validation techniques were
employed to ensure the models' accuracy and guard against overfitting, resulting in highly
reliable predictions.
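
The cross-validation step can be sketched as follows, assuming plain k-fold cross-validation with contiguous folds (the text does not specify the variant or the number of folds). The helper names and the single-feature linear model are hypothetical and serve only to show the mechanics.

```python
import math


def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) for k contiguous folds over n samples."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n) if i < start or i >= stop]
        yield train, test
        start = stop


def fit_line(xs, ys):
    """Least-squares line through the points; returns a prediction function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def cross_val_rmse(xs, ys, fit, k=5):
    """Average test RMSE over k folds, refitting the model on each training part."""
    fold_scores = []
    for train, test in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors = [(ys[i] - model(xs[i])) ** 2 for i in test]
        fold_scores.append(math.sqrt(sum(errors) / len(errors)))
    return sum(fold_scores) / len(fold_scores)


# Hypothetical, perfectly linear toy data: price = 5 + 0.08 * area.
areas = [600 + 200 * i for i in range(10)]
prices = [5 + 0.08 * a for a in areas]
score = cross_val_rmse(areas, prices, fit_line, k=5)
```

Because every sample is used for testing exactly once, the averaged score is less dependent on one lucky or unlucky split than a single hold-out set. In practice the data would usually be shuffled before the folds are formed.
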

The exploration of housing price prediction has garnered substantial interest, with key factors
such as the number of rooms, carpet area, locality, and floor number identified as significant
influencers on residential property prices. This paper surveys various Machine Learning (ML)
and Deep Learning (DL) techniques employed by researchers, emphasizing the effectiveness of
Random Forest in producing accurate results.
As a continuation of this research, the proposal involves augmenting and hybridizing the dataset
by amalgamating data from multiple real estate websites, specifically focusing on residential
properties in Gurugram city. The objective is to construct a model utilizing advanced ML and
DL techniques for more precise predictions of flat prices based on given configurations.
Additionally, data collection will extend to rental properties with similar configurations listed
within the same date range. A separate model will be developed to forecast rental income for
flats of specified configurations. For users contemplating flat purchases in a designated
Gurugram locality, the model will provide a recommendation (yes/no) based on the financial
viability of such an investment.
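
One possible shape for the proposed yes/no recommendation is sketched below. The decision rule is an assumption made purely for illustration: it combines a check that the asking price does not exceed the predicted price with a minimum gross rental yield. The function names and the 3% threshold are hypothetical, and the text does not define the actual viability criterion.

```python
def gross_rental_yield(price, monthly_rent):
    """Annual rent as a fraction of the purchase price."""
    return 12 * monthly_rent / price


def recommend(predicted_price, predicted_monthly_rent, asking_price, min_yield=0.03):
    """Return "yes" if the flat looks fairly priced and the yield clears the threshold.

    predicted_price and predicted_monthly_rent stand for the outputs of the
    price model and the separate rental model; min_yield is a hypothetical cutoff.
    """
    fairly_priced = asking_price <= predicted_price
    yield_ok = gross_rental_yield(asking_price, predicted_monthly_rent) >= min_yield
    return "yes" if fairly_priced and yield_ok else "no"


# Hypothetical figures in rupees.
decision = recommend(predicted_price=10_000_000,
                     predicted_monthly_rent=30_000,
                     asking_price=9_500_000)
```

With these figures the gross yield is about 3.8% and the asking price is below the predicted price, so `decision` is "yes"; lowering the predicted rent to 20,000 drops the yield to about 2.5% and flips the result to "no".
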

6.1 Price Predicting Website

The price predicting website provides recommendations for apartments related to users' search
criteria, enhancing the user experience and facilitating informed decision-making.
7. REFERENCES
1. Anand G. Rawool, Dattatray V. Rogye, Sainath G. Rane, Dr. Vinayk A, "House Price
Prediction Using Machine Learning", 2021.
2. J. Manasa, R. Gupta, and N. Narahari, "Machine Learning Based Predicting House Prices
Using Regression Techniques", 2020 2nd International Conference on Innovative
Mechanisms for Industry Applications (ICIMIA), IEEE, pp. 624-630.
3. D. De Cock, "Ames, Iowa: Alternative to the Boston Housing Data as an End of Semester
Regression Project", Journal of Statistics Education, Volume 19, 2011.

4. Kaggle. House Sales in King County, USA.

5. Rico-Juan and Taltavull de La Paz (2021) evaluated the predictive and explanatory power
of machine learning and spatial hedonics tools in the Alicante housing market. Random
forest models outperformed hedonic regressions in accuracy, while the latter provided
insights into attribute-price relationships.
6. Neelam Shinde and Kiran Gawande, "Valuation of House Prices Using Predictive
Techniques", International Journal of Advances in Electronics and Computer Science,
Volume 5, Issue 6, 2018.

7. Alisha Kuvalekar, Shivani Manchewar, Sidhika Mahadik, and Shila Jawale, "Predicting
house prices with machine learning", Proceedings of the 3rd International Conference on
Advances in Science & Technology (ICAST) 2020, April 8, 2020.

8. Sayan Putatunda, "PropTech for Proactive Pricing of Houses in Classified
Advertisements in the Indian Real Estate Market".

9. Bindu Sivasankar, Arun P. Ashok, Gouri Madhu, and Fousiya S, "House Price
Prediction", International Journal of Computer Science and Engineering (IJCSE),
Volume 8, Issue 7, 2020.

10. Mr. Rushikesh Naikare, Mr. Girish Gahandule, Mr. Akash Dumbre, Mr. Kaushal Agrawal,
and Prof. Chaitanya Manka, "House Planning and Price Prediction System using Machine
Learning", International Engineering Research Journal, Volume 3, Issue 3, 2019.

11. Jingyi Mu, Fang Wu, and Aihua Zhang, "Housing Value Forecasting Based on Machine
Learning Methods", Abstract and Applied Analysis, Hindawi Publishing Corporation,
Volume 2014.

12. Thuraiya Mohd, Suraya Masrom, and Noraini Johari, "Machine Learning Housing Price
Prediction in Petaling Jaya, Selangor, Malaysia", International Journal of Recent
Technology and Engineering (IJRTE), Volume 8, Issue 2S11, 2019.

13. G. Naga Satish, Ch. V. Raghavendran, M. D. Sugnana Rao, and Ch. Srinivasulu, "House
Price Prediction Using Machine Learning", International Journal of Innovative Technology
and Exploring Engineering (IJITEE), Volume 8, Issue 9, 2019.

14. Atharva Chouthai, Mohammed Athar Rangila, Sanved Amate, Prayag Adhikari, and Vijay
Kukre, "House Price Prediction Using Machine Learning", International Research
Journal of Engineering and Technology (IRJET), Volume 6, Issue 3, 2019.
