Ay-Sem8-Internship Report

This internship report by Amit Yadav details a project on House Price Prediction using Machine Learning, conducted at Grras Solutions Pvt. Ltd. The project involved data collection, preprocessing, and the implementation of several regression models to predict house prices from features such as location and size. In the reported results, Linear Regression achieved the best scores (R² = 0.6529), ahead of Random Forest, XGBoost, and a single Decision Tree.

HOUSE PRICE PREDICTION MODEL

AN INTERNSHIP REPORT

Submitted by
AMIT YADAV

210180107001

In partial fulfillment for the award of the degree of

BACHELOR OF ENGINEERING

in

Department of Computer Engineering

Government Engineering College, Dahod

Gujarat Technological University, Ahmedabad

April, 2025

Government Engineering College, Dahod
V64R+7QP, Jhalod Road, Dahod, Usarvan Part, Gujarat 389151

CERTIFICATE
This is to certify that the internship project entitled House Price Prediction,
and the report submitted along with it, have been carried out by Amit Yadav under
my guidance in partial fulfillment of the requirements for the degree of Bachelor
of Engineering in Computer Engineering, 8th Semester, of Gujarat Technological
University, Ahmedabad, during the academic year 2024-25.

Prof. Viren Patel
Internal Guide

Prof. Viren Patel
Head of the Department

Company Certificate


DECLARATION

I hereby declare that the internship report entitled House Price Prediction,
submitted in partial fulfillment of the requirements for the degree of
Bachelor of Engineering in Computer Engineering to Gujarat Technological
University, Ahmedabad, is a bona fide record of original project work carried
out by me at the Department of Computer Engineering, Government Engineering
College, Dahod, under the supervision of Prof. Viren Patel, and that no part
of this report has been directly copied from any student's report or taken
from any other source without due reference.

Name of the Student Sign of Student

Amit Yadav

Acknowledgement

I would like to express my heartfelt gratitude to Grras Solutions Private Limited,
Ahmedabad, for providing me with the opportunity to undertake my internship in
Machine Learning with Python.

I sincerely thank my HOD and Internal Guide, Prof. Viren Patel, for his valuable
guidance and continuous support throughout this internship.

I extend my gratitude to my Industrial Supervisor, Mr. Rajesh Shah, for his expert
insights and technical guidance, which were instrumental in the successful completion of
my project.

I also thank Ms. Suman Vairagi, my Reporting Manager, for providing me with the
necessary resources, mentorship, and a conducive learning environment during my
internship.

A special thanks to Government Engineering College, Dahod, for facilitating this
internship program and Gujarat Technological University (GTU) for providing the
framework for industry-academia collaboration.

Lastly, I am grateful to my family and friends for their unwavering support and
encouragement throughout my academic journey.

Abstract

Machine learning has revolutionized data analysis and predictive modeling, enabling
the development of highly accurate forecasting systems. This report presents an
internship project on House Price Prediction using Machine Learning, conducted at
Grras Solutions Private Limited, Ahmedabad as part of a 12-week internship
program.

The objective of this project was to design and implement a machine learning model
capable of predicting house prices based on various features such as location, size,
number of rooms, and other factors. The project involved:
• Data Collection & Preprocessing: Cleaning, handling missing values, and feature
selection.
• Exploratory Data Analysis (EDA): Understanding patterns and relationships in
housing data.
• Model Selection & Training: Using regression algorithms (Linear Regression,
Decision Tree, Random Forest, XGBoost).
• Evaluation & Optimization: Hyperparameter tuning and performance analysis.

The report details the step-by-step implementation of the project, the challenges
faced, and the key insights gained. On this dataset, Linear Regression achieved the
best scores (R² = 0.6529), followed by Random Forest and XGBoost, with the single
Decision Tree performing worst.

This internship provided hands-on experience in machine learning, data science
methodologies, and model deployment, equipping the student with essential skills for
real-world applications.

List of Figures

Fig 1.1.1 ML in Real Estate
Fig 1.4.1 Technologies Used
Fig 2.1.1 Company Logo
Fig 3.1.1 ML in Real Estate
Fig 3.2.1 Dataset Overview
Fig 3.3.1 Flowchart of the project
Fig 4.2.1 Loading of dataset
Fig 4.3.1 Handling of missing data
Fig 4.4.1 Histogram and Box Plot
Fig 4.4.2 Correlation Matrix
Fig 5.1 Splitting of Dataset into training and test data
Fig 5.1.1 Linear Regression
Fig 5.2.1 Decision Tree Regression
Fig 5.3.1 Random Forest Regression
Fig 5.4.1 XGBoost Regression
Fig 6.1.1 Comparison of Regression Model Performance Metrics
Fig 6.2.1 Saving and Loading the Model
Fig 6.3.1 Creation of Flask API
Fig 6.4.1 User View
Fig 7.3.1.1 Scatter Plot for various regression models
Fig 7.3.2.1 Feature Importance Plot

Abbreviations

AI Artificial Intelligence
ML Machine Learning
CSV Comma-Separated Values
RMSE Root Mean Square Error
MSE Mean Squared Error
MAE Mean Absolute Error
R² Coefficient of Determination
ANN Artificial Neural Network
SVM Support Vector Machine
KNN K-Nearest Neighbors
API Application Programming Interface
GUI Graphical User Interface
CPU Central Processing Unit
GPU Graphics Processing Unit

Table of Contents

Acknowledgement
Abstract
List of Figures
List of Abbreviations
Table of Contents

Chapter 1: Introduction
1.1 Introduction
1.2 Objectives of the Project
1.3 Scope of the Project
1.4 Technologies Used

Chapter 2: Company Profile
2.1 About Company
2.2 Mission and Vision
2.3 Internship Program
2.4 Company Culture

Chapter 3: Project Overview
3.1 Problem Statement
3.2 Dataset Description
3.3 Methodology

Chapter 4: Implementation
4.1 Introduction
4.2 Data Collection
4.3 Data Preprocessing
4.4 Exploratory Data Analysis

Chapter 5: Model Selection and Training
5.1 Linear Regression
5.2 Decision Tree Regressor
5.3 Random Forest Regressor
5.4 XGBoost Regressor

Chapter 6: Model Evaluation and Deployment
6.1 Model Evaluation
6.2 Model Deployment
6.3 Deployment Using Flask
6.4 Frontend Development

Chapter 7: Results and Discussion
7.1 Introduction
7.2 Model Performance Analysis
7.3 Graphical Representation of Model Performance
7.3.1 Actual Vs Predicted Price Plot
7.3.2 Feature Importance Plot
7.3.3 Real World Implications

Chapter 8: Conclusion and Future Work
8.1 Conclusion
8.2 Challenges Faced
8.3 Future Scope and Improvements

References
Appendix

CHAPTER 1: INTRODUCTION

1.1 Introduction

The real estate market has always been dynamic and influenced by multiple factors such as
location, size, amenities, economic trends, and market demand. Accurate house price
prediction is crucial for buyers, sellers, and investors to make informed decisions. With
advancements in Machine Learning (ML), predictive models can analyze large datasets
and uncover hidden patterns, leading to better price estimations.
This project, House Price Prediction using Machine Learning, was developed as part of the
internship program at Grras Solutions Pvt. Ltd., Ahmedabad. The project explores various
supervised learning algorithms to build a robust price prediction model based on historical
housing data.

1.2 Objectives of the Project

The primary objectives of this project are:


• To analyze and preprocess real estate data for better model performance.
• To implement and compare different regression models for price prediction.
• To evaluate the accuracy and efficiency of each model using performance metrics.
• To provide insights and recommendations for real estate stakeholders.

1.3 Scope of the Project


• The model considers historical sales data, house features, and market trends.
• It applies Machine Learning techniques to predict prices.
• The project is developed using Python, Scikit-Learn, Pandas, and Matplotlib.
• The final model is evaluated using Mean Absolute Error (MAE), Mean Squared Error
(MSE), Root Mean Squared Error (RMSE), and R² Score.

1.4 Technologies Used
• Programming Language: Python
• Libraries & Frameworks: Pandas, NumPy, Matplotlib, Seaborn, Scikit-Learn, XGBoost,
Flask, joblib
• Machine Learning Algorithms: Linear Regression, Decision Tree, Random Forest,
XGBoost
• Tools: Jupyter Notebook, Google Colab

CHAPTER 2: COMPANY PROFILE

2.1 ABOUT COMPANY

Fig. 2.1.1 Company Logo

Grras Solutions Pvt. Ltd. is a leading IT training and development company headquartered
in Ahmedabad, India. The company specializes in:

• Machine Learning & Data Science
• Cloud Computing & DevOps
• Cybersecurity & Ethical Hacking
• Full-Stack Web Development

Grras Solutions provides industry-relevant training and collaborates with students and
professionals to bridge the gap between academic knowledge and corporate requirements.

2.2 MISSION AND VISION

Mission: To equip students with the latest technical skills through hands-on training and
industry-based projects.

Vision: To be recognized as a center of excellence in IT education and skill development.

2.3 INTERNSHIP PROGRAM AT GRRAS SOLUTIONS

The company offers structured internship programs for engineering and IT students in
collaboration with colleges and universities. Key features of the internship include:

• Live project work on real-world datasets.
• Mentorship from industry experts.
• Practical exposure to emerging technologies.

2.4 COMPANY CULTURE

At Grras Solutions, the culture places individual satisfaction and peace of mind
first, on the view that these lead to consistent excellence. The company aims to
provide a family-like environment for every team member, rooted in Indian culture,
to foster a positive core team and drive excellence across its products and solutions.

CHAPTER 3: PROJECT OVERVIEW

3.1 PROBLEM STATEMENT:

House pricing is influenced by several factors, and manual estimation is often inaccurate.
Traditional methods fail to capture hidden patterns in real estate data, leading to mispriced
properties.

This project aims to develop a Machine Learning model to predict house prices based on
key attributes such as location, size, number of rooms, and amenities.

3.2 DATASET DESCRIPTION:


The dataset used in this project consists of thousands of housing records with features
including:
• Location: City, neighborhood, and ZIP code
• Size: Square footage of the house
• Rooms: Number of bedrooms and bathrooms
• Market Data: Price trends over time

3.3 METHODOLOGY:

The project follows a structured approach:


1. Data Collection: Gathering historical real estate data.
2. Data Preprocessing: Handling missing values, feature selection, and scaling.
3. Exploratory Data Analysis (EDA): Understanding correlations and trends.
4. Model Training: Implementing regression models.
5. Model Evaluation: Comparing models using performance metrics.

CHAPTER 4: IMPLEMENTATION

4.1 INTRODUCTION

This chapter covers the step-by-step implementation of the House Price Prediction
System using Machine Learning.
The implementation includes data collection, preprocessing, model selection, training,
evaluation, and deployment.

4.2 DATA COLLECTION

The dataset used in this project is obtained from Kaggle, containing various features
such as:

• Lot Size
• Number of Bedrooms & Bathrooms
• Square Footage
• Location & Zip Code
• Year Built
• House Condition & Grade

The dataset is stored in CSV format and is loaded using Pandas in Python.

Code Snippet: Loading the Dataset
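
A minimal sketch of this loading step follows. The column names and the three sample rows are illustrative assumptions, not the report's data; an in-memory CSV stands in for the Kaggle file so that the example runs on its own. With the real file, `pd.read_csv` would be given the file path instead.

```python
import io

import pandas as pd

# Illustrative stand-in for the Kaggle CSV file (hypothetical columns and values).
SAMPLE_CSV = """price,area,bedrooms,bathrooms,location
4500000,3200,3,2,Urban
3100000,2100,2,1,Suburban
5200000,4000,4,2,Urban
"""

# pd.read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

print(df.shape)   # (number of rows, number of columns)
print(df.head())  # first few records
print(df.dtypes)  # column types inferred by pandas
```

Checking the shape and the inferred types right after loading is a cheap way to catch a wrong delimiter or a numeric column that was read as text.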

4.3 DATA PREPROCESSING

Data preprocessing ensures the dataset is clean and ready for model training. The steps
include:

1. Handling Missing Values
2. Feature Engineering
3. Encoding Categorical Variables
4. Scaling & Normalization

Code Snippet: Handling Missing Data
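
The sketch below illustrates steps 1, 3, and 4 on a small made-up table. The column names, the values, and the choice of median/mode imputation with standard scaling are assumptions for illustration; the report does not state which strategies were actually used.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical raw data with gaps (values are illustrative only).
raw = pd.DataFrame({
    "area": [3200.0, np.nan, 4000.0, 2500.0],
    "bedrooms": [3.0, 2.0, np.nan, 3.0],
    "location": ["Urban", None, "Urban", "Suburban"],
    "price": [4500000.0, 3100000.0, 5200000.0, 3600000.0],
})

clean = raw.copy()

# Step 1: fill numeric gaps with the column median and
# categorical gaps with the most frequent value.
for col in ["area", "bedrooms"]:
    clean[col] = clean[col].fillna(clean[col].median())
clean["location"] = clean["location"].fillna(clean["location"].mode()[0])

# Step 3: encode the categorical column as indicator (one-hot) columns.
clean = pd.get_dummies(clean, columns=["location"], dtype=int)

# Step 4: scale numeric features to zero mean and unit variance.
# The target column (price) is left unscaled.
scaler = StandardScaler()
clean[["area", "bedrooms"]] = scaler.fit_transform(clean[["area", "bedrooms"]])

print(clean.isna().sum().sum())  # number of missing values remaining
print(clean.columns.tolist())
```

In a full pipeline the scaler would normally be fitted on the training split only and then applied to the test split, so that no information from the test data leaks into training.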

4.4 EXPLORATORY DATA ANALYSIS (EDA)

EDA helps visualize trends and relationships between variables using graphs such
as histograms, scatter plots, and heatmaps.

Graphical Representation:

Code Snippet: (Histogram and Boxplot)

Code Snippet: Correlation Heatmap
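
A possible version of the correlation heatmap is sketched below on synthetic data. The data-generating formula and column names are assumptions, and plain Matplotlib is used here instead of Seaborn only to keep the example's dependencies small.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render off-screen so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data (an assumption for illustration): price rises with area plus noise.
rng = np.random.default_rng(0)
area = rng.uniform(1000, 5000, size=200)
bedrooms = rng.integers(1, 6, size=200)
price = 1000 * area + 50000 * bedrooms + rng.normal(0, 300000, size=200)
df = pd.DataFrame({"area": area, "bedrooms": bedrooms, "price": price})

# Pairwise Pearson correlations between the numeric columns.
corr = df.corr()

fig, ax = plt.subplots(figsize=(4, 4))
image = ax.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=45)
ax.set_yticks(range(len(corr.columns)))
ax.set_yticklabels(corr.columns)
fig.colorbar(image, ax=ax, label="Pearson correlation")
fig.tight_layout()

heatmap_path = os.path.join(tempfile.gettempdir(), "correlation_heatmap.png")
fig.savefig(heatmap_path)
print(corr.round(2))
```

Features whose correlation with the price column is close to zero are candidates for removal, while pairs of features that are strongly correlated with each other may carry redundant information.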

CHAPTER 5: MODEL SELECTION AND TRAINING

Once the data preprocessing step was completed, several machine learning models
were considered for house price prediction:

➢ Linear Regression
➢ Decision Tree Regressor
➢ Random Forest Regressor
➢ XGBoost Regressor

Each model was trained and evaluated to compare their performance in predicting
house prices.

5.1 Linear Regression

Overview:
Linear Regression assumes a linear relationship between input features and the
target variable. It fits a line (a hyperplane when there are several features) to
the data by minimizing the sum of squared differences between actual and
predicted values.

Mathematical Representation:
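
The standard textbook form of the model and its least-squares objective is given below. The notation is an assumption of this sketch: $x_{ij}$ is the value of feature $j$ for house $i$, $y_i$ is its actual price, $\hat{y}_i$ is the predicted price, and $\beta_0, \dots, \beta_p$ are the coefficients to be learned from $n$ training examples.

```latex
\hat{y}_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij},
\qquad
\min_{\beta_0, \dots, \beta_p} \; \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```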

Implementation:
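
A minimal sketch of training a linear regression model with Scikit-Learn follows. The data is synthetic and the 80/20 split is an assumption; the report's own features, split ratio, and settings are not reproduced here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic housing data (illustrative assumption): price depends linearly on
# area and bedroom count, plus random noise.
rng = np.random.default_rng(42)
area = rng.uniform(1000, 5000, size=300)
bedrooms = rng.integers(1, 6, size=300)
X = np.column_stack([area, bedrooms])
y = 1000 * area + 50000 * bedrooms + rng.normal(0, 200000, size=300)

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

print("coefficients:", model.coef_)
print("intercept:", model.intercept_)
print("R^2 on test data:", model.score(X_test, y_test))
```

The fitted coefficients can be read directly as the estimated price change per unit of each feature, which is one reason linear regression is a useful baseline.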

5.2 Decision Tree Regressor:

Overview:
Decision Tree Regressor splits data into branches based on feature values. It
captures non-linear relationships but may overfit.

How it works:
• The dataset is recursively split into branches, choosing at each step the
feature and threshold that minimize the mean squared error (MSE) of the
resulting groups.
• At each node, a decision rule is applied to split the data.
• The tree grows until a stopping condition is met (e.g., minimum samples per
leaf, maximum depth).

Mathematical Formulation:
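
The usual CART regression criterion is sketched below. The notation is an assumption of this sketch: $S$ is the set of training samples reaching a node, $\bar{y}_S$ is their mean price, and a split on feature $j$ at threshold $t$ divides $S$ into $S_L = \{ i \in S : x_{ij} \le t \}$ and $S_R = S \setminus S_L$.

```latex
\operatorname{MSE}(S) = \frac{1}{|S|} \sum_{i \in S} \left( y_i - \bar{y}_S \right)^2,
\qquad
(j^*, t^*) = \arg\min_{j,\,t}
\left[ \frac{|S_L|}{|S|} \operatorname{MSE}(S_L) + \frac{|S_R|}{|S|} \operatorname{MSE}(S_R) \right]
```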

Implementation:
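
The sketch below trains two trees on synthetic data: one without any stopping condition and one limited by maximum depth and minimum samples per leaf. The data and the specific limits (depth 4, 10 samples per leaf) are assumptions chosen for illustration, not the report's settings.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic housing data (illustrative assumption) with fairly strong noise.
rng = np.random.default_rng(0)
area = rng.uniform(1000, 5000, size=300)
bedrooms = rng.integers(1, 6, size=300)
X = np.column_stack([area, bedrooms])
y = 1000 * area + 50000 * bedrooms + rng.normal(0, 400000, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# An unrestricted tree keeps splitting until it can fit the training data exactly.
deep_tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

# Stopping conditions (maximum depth, minimum samples per leaf) limit tree growth.
shallow_tree = DecisionTreeRegressor(
    max_depth=4, min_samples_leaf=10, random_state=0
).fit(X_train, y_train)

for name, tree in [("unrestricted", deep_tree), ("restricted", shallow_tree)]:
    print(
        name,
        "depth:", tree.get_depth(),
        "train R^2:", tree.score(X_train, y_train),
        "test R^2:", tree.score(X_test, y_test),
    )
```

A large gap between the training score and the test score of the unrestricted tree is the typical sign of the overfitting mentioned in the overview.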

5.3 Random Forest Regressor:

Overview:
Random Forest is an ensemble learning method that combines multiple Decision
Trees to improve accuracy and reduce overfitting.

How it Works:
• Constructs multiple decision trees using random subsets of the training data
(Bootstrap Aggregation or Bagging).
• Each tree makes a prediction, and the final prediction is the average of all tree
outputs.

Mathematical Formulation:
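
In the usual notation (an assumption of this sketch), with $B$ trees $T_1, \dots, T_B$, each fitted to a bootstrap sample of the training data, the forest's prediction for an input $x$ is the average of the individual tree predictions:

```latex
\hat{y}(x) = \frac{1}{B} \sum_{b=1}^{B} T_b(x)
```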

Implementation:
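
A minimal sketch with Scikit-Learn's `RandomForestRegressor` follows. The data is synthetic and the choice of 100 trees is an assumption. The example also recomputes the forest's output as the mean of its individual trees, to make the averaging step visible.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic housing data (illustrative assumption).
rng = np.random.default_rng(0)
area = rng.uniform(1000, 5000, size=300)
bedrooms = rng.integers(1, 6, size=300)
X = np.column_stack([area, bedrooms])
y = 1000 * area + 50000 * bedrooms + rng.normal(0, 400000, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Each of the 100 trees is trained on a bootstrap sample of the training rows.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Recompute the ensemble output by averaging the individual trees' predictions.
per_tree = np.array([tree.predict(X_test) for tree in forest.estimators_])
manual_average = per_tree.mean(axis=0)

print("matches forest.predict:", np.allclose(manual_average, forest.predict(X_test)))
print("test R^2:", forest.score(X_test, y_test))
```

Averaging many trees trained on different samples reduces the variance of a single deep tree, which is how the ensemble limits overfitting.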

5.4 XGBoost Regressor (Extreme Gradient Boosting)

Overview:
XGBoost is a powerful gradient boosting algorithm optimized for speed and
performance. It builds an ensemble of weak learners (decision trees) in a sequential
manner, where each tree corrects the errors of the previous one.

How it works:
• Uses boosting, meaning trees are added iteratively.
• Each new tree corrects the residual errors of the previous trees.
• Uses a regularized objective function to prevent overfitting.

Mathematical Formulation:
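
The formulation below follows the one commonly given for XGBoost; the notation is an assumption of this sketch. At round $t$ a new tree $f_t$ is chosen to minimize a regularized objective, and its output is added to the running prediction after shrinking by the learning rate $\eta$. Here $l$ is the loss function, $T$ is the number of leaves in the tree, $w_j$ is the score of leaf $j$, and $\gamma$ and $\lambda$ are regularization strengths.

```latex
\mathcal{L}^{(t)} = \sum_{i=1}^{n} l\left( y_i,\; \hat{y}_i^{(t-1)} + f_t(x_i) \right) + \Omega(f_t),
\qquad
\Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2,
\qquad
\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + \eta \, f_t(x_i)
```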

Implementation:
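
The sketch below is not XGBoost itself. It is a simplified gradient-boosting loop for squared error, built from Scikit-Learn decision trees, that shows the idea described above: each new tree is fitted to the residuals of the current ensemble and added with a learning rate. The regularization terms and second-order optimization that XGBoost adds are left out, and the data, learning rate, and number of rounds are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic housing data (illustrative assumption).
rng = np.random.default_rng(1)
area = rng.uniform(1000, 5000, size=300)
bedrooms = rng.integers(1, 6, size=300)
X = np.column_stack([area, bedrooms])
y = 1000 * area + 50000 * bedrooms + rng.normal(0, 200000, size=300)

learning_rate = 0.1
n_rounds = 50

# Start from a constant prediction: the mean of the target.
prediction = np.full(len(y), y.mean())
trees = []
mse_per_round = [np.mean((y - prediction) ** 2)]

for _ in range(n_rounds):
    residuals = y - prediction                     # errors of the current ensemble
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)  # add a shrunken correction
    trees.append(tree)
    mse_per_round.append(np.mean((y - prediction) ** 2))

print("training MSE before boosting:", mse_per_round[0])
print("training MSE after boosting: ", mse_per_round[-1])
```

With the `xgboost` package installed, `xgboost.XGBRegressor` offers the same `fit` and `predict` interface as the Scikit-Learn estimators, so it can replace this loop in practice.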

CHAPTER 6: MODEL EVALUATION AND DEPLOYMENT

6.1 MODEL EVALUATION

After training multiple models, we evaluate them based on Mean Absolute Error
(MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R²
Score to determine the best-performing model.

Performance Metrics Used:

Mean Absolute Error (MAE): Measures the average absolute difference between
actual and predicted values.

Mean Squared Error (MSE): Measures the average squared difference between
actual and predicted values.

Root Mean Squared Error (RMSE): The square root of MSE, providing a
measure of error in the same units as the target variable.

R² Score (R-squared): Indicates how well the model explains the variance in the
data. A value close to 1 means a better fit.
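
The four metrics can be computed directly from their definitions, as in the plain-Python sketch below. The function name `regression_metrics` and the four sample prices are hypothetical and used only for illustration.

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE, MSE, RMSE and R² for two equal-length sequences."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_actual = sum(actual) / n
    total_ss = sum((a - mean_actual) ** 2 for a in actual)  # total variance (unnormalized)
    r2 = 1 - (mse * n) / total_ss                           # 1 - residual SS / total SS
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Hypothetical prices, for illustration only.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 320.0, 380.0]

metrics = regression_metrics(actual, predicted)
print(metrics)
```

Scikit-Learn provides the same quantities through `mean_absolute_error`, `mean_squared_error`, and `r2_score` in `sklearn.metrics`.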

Observation:
Model                        MAE                    MSE           RMSE   R² Score
Linear Regression     970,043.40   1,754,318,687,330.66   1,324,595.81   0.6529
Decision Tree       1,195,266.06   2,642,802,637,614.68   1,625,646.29   0.4771
Random Forest       1,021,546.04   1,961,585,044,320.34   1,400,565.80   0.6119
XGBoost             1,054,208.88   2,075,875,606,528.00   1,441,639.98   0.5893

Note:
RMSE (Root Mean Squared Error) is calculated as RMSE= sqrt(MSE).
Higher R² Score indicates a better fit.
Lower MAE, MSE, and RMSE indicate better performance in error reduction.

Result:
• Linear Regression has the best R² score, meaning it explains the most variance in
the data.
• Decision Tree has the highest MSE and RMSE, indicating it has the worst prediction
accuracy.
• Random Forest and XGBoost have similar performance, but Random Forest slightly
outperforms XGBoost in terms of MSE and RMSE.

6.2 MODEL DEPLOYMENT

Once the best-performing model is selected, we deploy it using Flask or Streamlit
to create a user-friendly interface where users can input property details and get
predicted prices.

Steps in Deployment:

1. Save the trained model using joblib.
2. Develop a Flask-based web application.
3. Create an HTML frontend to take user input.
4. Integrate the model to return price predictions.

Code Snippet: Saving & Loading the Model
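
A minimal sketch of saving and reloading a fitted model with joblib follows. The tiny training set and the file name `house_price_model.joblib` are assumptions for illustration; a temporary directory is used so that the example leaves no files in the project folder.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Tiny illustrative model: price = 1000 * area (hypothetical numbers).
X = np.array([[1000.0], [2000.0], [3000.0]])
y = np.array([1000000.0, 2000000.0, 3000000.0])
model = LinearRegression().fit(X, y)

# Save the fitted model to disk, then load it back as a new object.
model_path = os.path.join(tempfile.mkdtemp(), "house_price_model.joblib")
joblib.dump(model, model_path)
loaded_model = joblib.load(model_path)

# The reloaded model gives the same predictions as the original.
print(loaded_model.predict([[2500.0]]))
```

Saving the model once after training means the web application can load it at startup instead of retraining on every request.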

6.3 DEPLOYMENT USING FLASK


(This section includes the Flask-based API creation.)
Code Snippet: Creating a Flask API
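
One way such an API could look is sketched below. The `/predict` route, the three feature names, and the `predict_price` function are all hypothetical: `predict_price` is a made-up formula standing in for a call to the trained model's `predict` method, so that the example runs without a saved model file.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical feature names; a real service would use the model's training columns.
FEATURES = ["area", "bedrooms", "bathrooms"]


def predict_price(features):
    """Stand-in for model.predict: a made-up linear formula, not the trained model."""
    return (
        1000.0 * features["area"]
        + 50000.0 * features["bedrooms"]
        + 30000.0 * features["bathrooms"]
    )


@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(silent=True) or {}
    missing = [name for name in FEATURES if name not in payload]
    if missing:
        return jsonify({"error": f"missing fields: {missing}"}), 400
    try:
        features = {name: float(payload[name]) for name in FEATURES}
    except (TypeError, ValueError):
        return jsonify({"error": "all fields must be numeric"}), 400
    return jsonify({"predicted_price": predict_price(features)})


# Exercise the endpoint in-process with Flask's test client (no server needed).
with app.test_client() as client:
    response = client.post(
        "/predict", json={"area": 2000, "bedrooms": 3, "bathrooms": 2}
    )
    print(response.status_code, response.get_json())
```

To serve real requests, the application would be started with `app.run()` or the `flask run` command, and `predict_price` would be replaced by a call to the model loaded with joblib.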

6.4 FRONTEND DEVELOPMENT

The user-friendly frontend is developed using HTML, CSS, and JavaScript to take
user input and display predicted house prices.

CHAPTER 7: RESULTS AND DISCUSSION

7.1 INTRODUCTION

This chapter presents the findings of the House Price Prediction Model, analyzing
its performance, accuracy, and potential real-world applications. The evaluation
metrics provide insights into the efficiency of the model and its usability for
predicting property prices.

7.2 MODEL PERFORMANCE ANALYSIS

Comparison of Model Accuracy

The table below summarizes the evaluation metrics of different machine learning
models:

Model                        MAE                    MSE           RMSE   R² Score
Linear Regression     970,043.40   1,754,318,687,330.66   1,324,595.81   0.6529
Decision Tree       1,195,266.06   2,642,802,637,614.68   1,625,646.29   0.4771
Random Forest       1,021,546.04   1,961,585,044,320.34   1,400,565.80   0.6119
XGBoost             1,054,208.88   2,075,875,606,528.00   1,441,639.98   0.5893

7.3 GRAPHICAL REPRESENTATION OF MODEL PERFORMANCE

7.3.1 Actual Vs Predicted Price Plot

A scatter plot is used to compare actual vs. predicted house prices.
The closer the points are to the 45-degree line, the better the model performance.

Code Snippet: (Include a scatter plot of actual vs. predicted house prices.)
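
A sketch of such a plot is given below. The actual and predicted prices are synthetic numbers generated for illustration, not the report's results; with a trained model, `predicted` would come from `model.predict(X_test)`.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render off-screen so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical actual and predicted prices (synthetic, for illustration).
rng = np.random.default_rng(7)
actual = rng.uniform(2_000_000, 10_000_000, size=100)
predicted = actual + rng.normal(0, 800_000, size=100)

fig, ax = plt.subplots(figsize=(5, 5))
ax.scatter(actual, predicted, alpha=0.6, label="houses")

# 45-degree reference line: points on it are predicted exactly.
low = min(actual.min(), predicted.min())
high = max(actual.max(), predicted.max())
ax.plot([low, high], [low, high], color="red", linestyle="--", label="perfect prediction")

ax.set_xlabel("Actual price")
ax.set_ylabel("Predicted price")
ax.set_title("Actual vs. predicted house prices")
ax.legend()
fig.tight_layout()

plot_path = os.path.join(tempfile.gettempdir(), "actual_vs_predicted.png")
fig.savefig(plot_path)
```

Points above the dashed line are houses the model overpriced, and points below it are houses the model underpriced.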

7.3.2 Feature Importance Plot (XGBoost Model)

Feature importance helps in understanding which attributes have the most
significant impact on the house price prediction.

Code Snippet: (Include a feature importance plot from the XGBoost model)

7.3.3 Real World Implications:

The trained machine learning model has several potential real-world applications.
It could support real estate market analysis by helping buyers and sellers make
informed decisions. Banks could use such a model when assessing mortgage risk for
loan approvals. Investors could use it to compare locations when planning property
investments. Governments could use it in urban planning to analyze housing trends
and forecast development needs.

CHAPTER 8: CONCLUSION AND FUTURE WORK

8.1 CONCLUSION

The performance evaluation of the four machine learning models for house price
prediction gives a clear ranking. Among the tested models, Linear Regression
achieved the best performance with an R² score of 0.6529, meaning it explains
about 65% of the variance in house prices. It also had the lowest MAE
(970,043.40) and MSE (about 1.75 trillion), making it the most reliable choice in
this study.

The Random Forest model followed closely with an R² score of 0.6119, though its
slightly higher error metrics suggest reduced accuracy compared to Linear Regression.
XGBoost, while often excelling in predictive tasks, did not outperform Linear
Regression in this case, achieving an R² of 0.5893. The Decision Tree model had the
lowest R² score (0.4771) and the highest MAE and MSE, making it the least effective
option for this dataset.

Overall, while Linear Regression demonstrated the best performance, further tuning of
ensemble methods like Random Forest and XGBoost may improve accuracy. Future
enhancements could include feature engineering, hyperparameter optimization, and
exploring deep learning models for more robust predictions.

8.2 CHALLENGES FACED

While working on this project, a few challenges were encountered:

• Data Quality Issues – Missing values and inconsistencies in the dataset required
significant preprocessing.
• Overfitting in Decision Tree Models – Complex models tended to memorize the
training data, leading to poor generalization.
• Computational Complexity – Training advanced models like XGBoost required
high processing power and optimization.

Each challenge was addressed through appropriate data cleaning, feature
engineering, and hyperparameter tuning techniques.

8.3 FUTURE SCOPE AND IMPROVEMENTS:

To enhance the accuracy and applicability of this project, the following future
improvements can be considered:

• Deep Learning Integration – Implementing neural networks for better prediction
accuracy.
• Live Market Trends – Incorporating real-time pricing data from online property
listings.
• Geospatial Analysis – Using GPS coordinates and satellite imagery for precise
location-based predictions.
• User-Friendly Web Application – Deploying the model as an interactive web tool
for real estate professionals.

With these enhancements, the system could evolve into a more capable AI-driven
property valuation tool for the real estate industry.

REFERENCES

• Kaggle. https://www.kaggle.com/datasets/yasserh/housing-prices-dataset

• GeeksforGeeks. House Price Prediction using Machine Learning in Python.
https://www.geeksforgeeks.org/house-price-prediction-using-machine-learning-in-python/

• Scikit-Learn Documentation. https://scikit-learn.org/stable/

• XGBoost Documentation. https://xgboost.readthedocs.io/en/stable/

• Pandas Documentation. https://pandas.pydata.org/

• Matplotlib Documentation. https://matplotlib.org/stable/contents.html

Weekly report scanned copy
