Week1 2 (Research)
FA22-BBD-005
Paper 1: House Price Prediction using Machine Learning Algorithm - The Case of Karachi City, Pakistan
Introduction
The study addresses the challenge of forecasting house prices in Karachi, Pakistan, using machine
learning, specifically the XGBoost algorithm. The local real estate market lacks advanced tools for
accurate price prediction, and such tools would aid buyers, sellers, and realtors. The dataset comprises
38,961 records describing property attributes such as location, bedrooms, bathrooms, and price. The aim
is to improve predictive accuracy for the Karachi housing market, which has received limited research
attention compared to other global cities.
Problem Statement
The lack of a robust and accurate method to predict house prices in Karachi, a city experiencing rapid
urbanization and housing demand, leads to inefficiencies for stakeholders. Existing manual or traditional
approaches fail to account for diverse influencing factors like physical attributes and market trends,
causing imprecise predictions.
Methodology
The study uses the XGBoost regression model because of its efficiency and high accuracy on tabular
datasets. After preprocessing to handle missing values and remove irrelevant features, the dataset
was split into training and testing subsets at several ratios, including 70/30 and 60/40. The features
most correlated with house price, such as location, bedrooms, and bathrooms, were selected. The model
was trained using Python libraries and validated using multiple metrics, including mean absolute error (MAE).
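The split-and-evaluate procedure described above can be sketched in plain Python. This is only an illustration under stated assumptions: the records are hypothetical toy values (not the Karachi dataset), the helper names are made up for this note, and a mean-price baseline stands in for the XGBoost model, which the summary does not describe in enough detail to reproduce.

```python
import random

def train_test_split(rows, test_ratio, seed=0):
    """Shuffle rows reproducibly and split them into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical toy records: (bedrooms, bathrooms, price). Not from the paper.
records = [(2, 1, 100), (3, 2, 150), (3, 2, 160), (4, 3, 220), (5, 4, 300),
           (2, 2, 120), (4, 2, 200), (3, 1, 140), (5, 3, 280), (1, 1, 80)]

# Evaluate the same baseline under the 70/30 and 60/40 splits.
for test_ratio in (0.30, 0.40):
    train, test = train_test_split(records, test_ratio)
    # Stand-in predictor (assumption): always predict the mean training price.
    baseline = sum(price for _, _, price in train) / len(train)
    actual = [price for _, _, price in test]
    mae = mean_absolute_error(actual, [baseline] * len(test))
    print(f"test ratio {test_ratio:.0%}: {len(train)} train, {len(test)} test, MAE {mae:.1f}")
```

Comparing MAE across splits, as the study does, is a quick check that the error does not depend strongly on how much data is held out.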
Results
The XGBoost model achieved a reported accuracy of 98% across the different train-test splits. The mean
absolute error remained consistent at approximately 22,502, which the authors take as evidence of the
model's reliability. They conclude that XGBoost outperforms other regression techniques for house price
prediction in Karachi.
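An "accuracy" percentage for a regression model is often the coefficient of determination (R²) expressed as a percentage. Whether the paper uses R² is an assumption here, since the summary does not name the metric. A minimal sketch of how R² is computed, using hypothetical values:

```python
def r2_score(actual, predicted):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Hypothetical prices and predictions, for illustration only.
actual = [100, 150, 200, 250]
predicted = [110, 140, 210, 240]
print(f"R^2 = {r2_score(actual, predicted):.3f}")
```

R² measures the share of price variance the model explains, so it complements MAE, which reports the typical error in price units.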
Paper 2
Introduction
This research explores the use of machine learning to predict house prices based on various influencing
factors. The study leverages algorithms like Linear Regression, Decision Tree, K-Means, and Random
Forest Regression to develop an efficient prediction model. Housing price trends are analyzed to help
buyers and sellers make informed decisions without relying on brokers.
Problem Statement
Traditional methods for predicting house prices are error-prone, with manual approaches often producing
errors of up to 25%. These inaccuracies cause financial losses for buyers and sellers. The research
aims to create an automated, data-driven model that provides accurate and reliable price predictions.
Methodology
The study utilized datasets collected from online real estate platforms. After preprocessing, including
handling missing values and feature selection, the data was split into 80% for training and 20% for
testing. Various machine learning algorithms were applied, with Random Forest emerging as the best fit.
Techniques like feature engineering and correlation analysis were used to refine the model further.
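The correlation analysis mentioned above can be illustrated with a small Pearson-correlation ranking. This is a sketch under assumptions: the feature names and values are hypothetical, and the summary does not say which correlation measure the authors used.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical feature columns and target prices, for illustration only.
features = {
    "area_sqft": [800, 1200, 1500, 2000, 2600],
    "bedrooms": [2, 3, 3, 4, 5],
    "listing_day": [3, 1, 5, 2, 4],
}
price = [90, 140, 170, 230, 300]

# Rank features by the strength (absolute value) of their correlation with price.
ranked = sorted(features, key=lambda name: abs(pearson(features[name], price)), reverse=True)
for name in ranked:
    print(f"{name}: r = {pearson(features[name], price):+.2f}")
```

Features near the bottom of such a ranking are candidates for removal during feature selection, although a low linear correlation does not rule out a nonlinear relationship that a tree-based model could still exploit.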
Results
Among the algorithms tested, Random Forest Regression provided the highest accuracy, with a root
mean squared error (RMSE) of 2.91. This model proved superior in predicting housing prices compared
to others like Linear Regression and Decision Tree, validating its suitability for real estate applications.
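For reference, RMSE is the square root of the mean squared prediction error. A minimal sketch with hypothetical values follows. Note that RMSE is expressed in the units of the target, so a value such as 2.91 can only be interpreted once the price unit or scaling is known, which this summary does not state.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical prices and predictions, for illustration only.
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
print(f"RMSE = {rmse(actual, predicted):.2f}")
```

Because errors are squared before averaging, RMSE penalizes large misses more heavily than MAE does, which is one reason the two metrics can rank models differently.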