Ay-Sem8-Internship Report
AN INTERNSHIP REPORT
Submitted by
AMIT YADAV
210180107001
BACHELOR OF ENGINEERING
in Computer Engineering
April, 2025
Government Engineering College, Dahod
V64R+7QP, Jhalod Road, Dahod, Usarvan Part, Gujarat 389151
CERTIFICATE
This is to certify that the project report submitted along with the project
entitled House Price Prediction has been carried out by Amit Yadav under
my guidance in partial fulfillment for the degree of Bachelor of Engineering
in Computer Engineering, 8th Semester of Gujarat Technological University,
Ahmedabad during the academic year 2024-25.
Company Certificate
DECLARATION
I hereby declare that the internship report submitted along with the
internship entitled House Price Prediction, in partial fulfillment
for the degree of Bachelor of Engineering in Computer Engineering to
Gujarat Technological University, Ahmedabad, is a bonafide record of
original project work carried out by me at the Department of Computer
Engineering, Government Engineering College, Dahod, under the supervision
of Prof. Viren Patel, and that no part of this report has been directly copied
from any other student's report or taken from any other source without providing
due reference.
Amit Yadav
Acknowledgement
I sincerely thank my HOD and Internal Guide, Prof. Viren Patel, for his valuable guidance.
I extend my gratitude to my Industrial Supervisor, Mr. Rajesh Shah, for his expert
insights and technical guidance, which were instrumental in the successful completion of
my project.
I also thank Ms. Suman Vairagi, my Reporting Manager, for providing me with the
internship.
internship program and Gujarat Technological University (GTU) for providing the
Lastly, I am grateful to my family and friends for their unwavering support.
Abstract
Machine learning has revolutionized data analysis and predictive modeling, enabling
the development of highly accurate forecasting systems. This report presents an
internship project on House Price Prediction using Machine Learning, conducted at
Grras Solutions Private Limited, Ahmedabad as part of a 12-week internship
program.
The objective of this project was to design and implement a machine learning model
capable of predicting house prices based on various features such as location, size,
number of rooms, and other factors. The project involved:
• Data Collection & Preprocessing: Cleaning, handling missing values, and feature
selection.
• Exploratory Data Analysis (EDA): Understanding patterns and relationships in
housing data.
• Model Selection & Training: Using regression algorithms (Linear Regression,
Decision Tree, Random Forest, XGBoost).
• Evaluation & Optimization: Hyperparameter tuning and performance analysis.
The report details the step-by-step implementation of the project, the challenges faced,
and the key insights gained. On this dataset, Linear Regression achieved the highest R²
score (0.6529), ahead of Random Forest (0.6119), XGBoost (0.5893) and Decision Tree
(0.4771), showing that ensemble models do not automatically outperform a simpler
linear model.
List of Figures
Abbreviations
AI Artificial Intelligence
ML Machine Learning
CSV Comma-Separated Values
RMSE Root Mean Squared Error
MSE Mean Squared Error
MAE Mean Absolute Error
R² Coefficient of Determination
ANN Artificial Neural Network
SVM Support Vector Machine
KNN K-Nearest Neighbors
API Application Programming Interface
GUI Graphical User Interface
CPU Central Processing Unit
GPU Graphics Processing Unit
Table of Contents
Acknowledgement
Abstract
List of Figures
List of Abbreviations
Table of Contents
Chapter 1: Introduction
1.1 Introduction
1.2 Objectives of the Project
1.3 Scope of the Project
1.4 Technologies Used
Chapter 2: Company Profile
2.1 About Company
2.2 Mission and Vision
2.3 Internship Program
2.4 Company Culture
Chapter 3: Project Overview
3.1 Problem Statement
3.2 Dataset Description
3.3 Methodology
Chapter 4: Implementation
4.1 Introduction
4.2 Data Collection
4.3 Data Preprocessing
4.4 Exploratory Data Analysis
Chapter 5: Model Selection and Training
5.1 Linear Regression
5.2 Decision Tree Regressor
5.3 Random Forest Regressor
5.4 XGBoost Regressor
Chapter 6: Model Evaluation and Deployment
6.1 Model Evaluation
6.2 Model Deployment
6.3 Deployment Using Flask
6.4 Frontend Development
Chapter 7: Results and Discussion
7.1 Introduction
Chapter 8: Conclusion and Future Work
8.1 Conclusion
8.2 Challenges Faced
8.3 Future Scope and Improvements
References
Appendix
CHAPTER 1: INTRODUCTION
1.1 Introduction
The real estate market has always been dynamic and influenced by multiple factors such as
location, size, amenities, economic trends, and market demand. Accurate house price
prediction is crucial for buyers, sellers, and investors to make informed decisions. With
advancements in Machine Learning (ML), predictive models can analyze large datasets
and uncover hidden patterns, leading to better price estimations.
This project, House Price Prediction using Machine Learning, was developed as part of the
internship program at Grras Solutions Pvt. Ltd., Ahmedabad. The project explores various
supervised learning algorithms to build a robust price prediction model based on historical
housing data.
1.4 Technologies Used
• Programming Language: Python
• Libraries & Frameworks: Pandas, NumPy, Matplotlib, Seaborn, Scikit-Learn, XGBoost, Flask
• Machine Learning Algorithms: Linear Regression, Decision Tree, Random Forest,
XGBoost
• Tools: Jupyter Notebook, Google Colab
CHAPTER 2: COMPANY PROFILE
Grras Solutions Pvt. Ltd. is a leading IT training and development company headquartered
in Ahmedabad, India. The company specializes in:
Grras Solutions provides industry-relevant training and collaborates with students and
professionals to bridge the gap between academic knowledge and corporate requirements.
Mission: To equip students with the latest technical skills through hands-on training and
industry-based projects.
2.3 INTERNSHIP PROGRAM AT GRRAS SOLUTIONS
The company offers structured internship programs for engineering and IT students in
collaboration with colleges and universities. Key features of the internship include:
2.4 COMPANY CULTURE
At Grras Solutions, individual satisfaction and peace of mind are considered paramount
and are seen as the basis of consistent excellence. The company provides a family
environment for every team member, rooted in Indian culture, which fosters a positive
core team and drives excellence across its products and solutions.
CHAPTER 3: PROJECT OVERVIEW
House pricing is influenced by several factors, and manual estimation is often inaccurate.
Traditional methods fail to capture hidden patterns in real estate data, leading to mispriced
properties.
This project aims to develop a Machine Learning model to predict house prices based on
key attributes such as location, size, number of rooms, and amenities.
3.3 METHODOLOGY:
CHAPTER 4: IMPLEMENTATION
4.1 INTRODUCTION
This chapter covers the step-by-step implementation of the House Price Prediction
System using Machine Learning.
The implementation includes data collection, preprocessing, model selection, training,
evaluation, and deployment.
4.2 DATA COLLECTION
The dataset used in this project was obtained from Kaggle and contains features
such as:
• Lot Size
• Number of Bedrooms & Bathrooms
• Square Footage
• Location & Zip Code
• Year Built
• House Condition & Grade
The dataset is stored in CSV format and is loaded using Pandas in Python.
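A minimal loading sketch with pandas is shown below. The inline sample rows and the column names (price, area, bedrooms, bathrooms) are hypothetical stand-ins for the Kaggle file, whose exact name and layout are not given in this report.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for the Kaggle CSV file.
SAMPLE_CSV = """price,area,bedrooms,bathrooms
4200000,5500,3,2
3500000,4000,2,1
6100000,7420,4,2
"""


def load_housing_data(source) -> pd.DataFrame:
    """Read the housing CSV from a file path or a file-like object."""
    return pd.read_csv(source)


df = load_housing_data(io.StringIO(SAMPLE_CSV))
print(df.shape)   # (rows, columns)
print(df.head())  # first rows, to confirm the columns parsed correctly
df.info()         # data type and non-null count for every column
```

With the real dataset the call would look like load_housing_data("Housing.csv"), where the file name is an assumption.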
4.3 DATA PREPROCESSING
Data preprocessing ensures the dataset is clean and ready for model training. The steps
include cleaning the data, handling missing values, and selecting relevant features.
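The report does not include the preprocessing code itself, so the following is only a sketch of a typical pipeline for this stage, under stated assumptions: duplicate rows are dropped, rows without a price are removed, missing numeric values are filled with the column median, missing categories are filled with the most frequent value, and categorical columns are one-hot encoded. The function name preprocess, the target column price and the example column furnishing are hypothetical.

```python
from __future__ import annotations

import pandas as pd


def preprocess(df: pd.DataFrame, target: str = "price") -> tuple[pd.DataFrame, pd.Series]:
    """Clean the raw table and split it into features X and target y."""
    # Remove exact duplicate rows, then rows that have no target price.
    df = df.drop_duplicates().dropna(subset=[target]).copy()

    # Numeric features: fill gaps with the column median (robust to outliers).
    numeric_cols = df.select_dtypes(include="number").columns.drop(target, errors="ignore")
    df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

    # Categorical features: fill gaps with the most frequent value, then one-hot encode.
    categorical_cols = list(df.select_dtypes(exclude="number").columns)
    for col in categorical_cols:
        modes = df[col].mode()
        if not modes.empty:
            df[col] = df[col].fillna(modes.iloc[0])
    df = pd.get_dummies(df, columns=categorical_cols, drop_first=True, dtype=int)

    X = df.drop(columns=[target])
    y = df[target]
    return X, y
```

Median imputation is chosen here because price-related columns are often skewed, and the median is less affected by outliers than the mean.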
4.4 EXPLORATORY DATA ANALYSIS
Exploratory Data Analysis (EDA) helps visualize trends and relationships between variables
using graphs such as histograms, scatter plots, and heatmaps.
Graphical Representation:
Code Snippet: Correlation Heatmap
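A correlation heatmap of this kind can be produced with pandas and Seaborn, as in the sketch below. The plotting libraries are imported inside the plotting function so that the correlation step itself needs only pandas; the function names and the output file name are assumptions.

```python
import pandas as pd


def correlation_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation between every pair of numeric columns."""
    return df.select_dtypes(include="number").corr()


def plot_correlation_heatmap(corr: pd.DataFrame, path: str = "correlation_heatmap.png") -> None:
    """Draw the matrix as an annotated heatmap and save it as an image."""
    # Imported here so correlation_matrix() needs only pandas.
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots(figsize=(8, 6))
    # vmin/vmax pin the colour scale to the full [-1, 1] correlation range.
    sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
    ax.set_title("Correlation Heatmap")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
```

Typical use is plot_correlation_heatmap(correlation_matrix(df)) on the preprocessed DataFrame; the row of the matrix for the price column shows which features move most strongly with price.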
CHAPTER 5: MODEL SELECTION AND TRAINING
Once the data preprocessing step was completed, several machine learning models
were considered for house price prediction:
➢ Linear Regression
➢ Decision Tree Regressor
➢ Random Forest Regressor
➢ XGBoost Regressor
Each model was trained and evaluated to compare their performance in predicting
house prices.
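The comparison of the four models can be sketched with scikit-learn as follows. The 80/20 train/test split, the random_state values and the model hyperparameters are assumptions rather than the settings used in the project, and XGBoost is included only if the separate xgboost package is installed.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def build_models(seed: int = 42) -> dict:
    """The regressors compared in this chapter (hyperparameters are assumptions)."""
    models = {
        "Linear Regression": LinearRegression(),
        "Decision Tree": DecisionTreeRegressor(random_state=seed),
        "Random Forest": RandomForestRegressor(n_estimators=200, random_state=seed),
    }
    try:  # XGBoost ships as a separate package: pip install xgboost
        from xgboost import XGBRegressor

        models["XGBoost"] = XGBRegressor(
            n_estimators=300, learning_rate=0.05, random_state=seed
        )
    except ImportError:
        pass
    return models


def compare_models(X, y, test_size: float = 0.2, seed: int = 42) -> dict:
    """Train every model on one split and report test-set metrics for each."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    results = {}
    for name, model in build_models(seed).items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        mse = mean_squared_error(y_test, pred)
        results[name] = {
            "MAE": mean_absolute_error(y_test, pred),
            "MSE": mse,
            "RMSE": float(np.sqrt(mse)),
            "R2": r2_score(y_test, pred),
        }
    return results
```

Passing the X and y produced during preprocessing to compare_models() yields one MAE/MSE/RMSE/R² entry per model. Using the same split for every model keeps the comparison fair.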
5.1 Linear Regression:
Overview:
Linear Regression assumes a linear relationship between input features and the
target variable. It fits a line (a hyperplane when there are several features) to the
data by minimizing the sum of squared differences between actual and predicted values.
Mathematical Representation:
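In the standard ordinary least squares form (assumed here to be the formulation the project relied on), the predicted price for a house with features x1, ..., xn, and the objective minimized over m training houses, are:

```latex
\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n

\min_{\beta_0, \ldots, \beta_n} \sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2
```

Here β0 is the intercept, β1 to βn are the learned coefficients, and yi is the actual price of the i-th training house.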
Implementation:
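To show how the coefficients are obtained, the sketch below solves the least-squares problem directly with NumPy. The function names and the toy numbers are hypothetical; in practice scikit-learn's LinearRegression performs the same ordinary least squares fit and exposes the results as intercept_ and coef_.

```python
import numpy as np


def fit_linear_regression(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least squares; returns [intercept, b1, ..., bn]."""
    # A leading column of ones lets the first coefficient act as the intercept b0.
    X_design = np.column_stack([np.ones(len(X)), X])
    # lstsq minimizes the sum of squared residuals ||X_design @ b - y||^2.
    coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    return coef


def predict_linear(coef: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Apply the fitted intercept and coefficients to a 2-D feature matrix."""
    return coef[0] + X @ coef[1:]


# Toy data (not from the dataset): price = 50 + 2 * area + 10 * bedrooms exactly.
X = np.array([[100.0, 2.0], [150.0, 3.0], [200.0, 3.0], [250.0, 4.0]])
y = 50 + 2 * X[:, 0] + 10 * X[:, 1]
coef = fit_linear_regression(X, y)
print(np.round(coef, 6))
```

Because the toy prices follow an exact linear rule, the recovered coefficients match it; on real housing data the fit minimizes, rather than eliminates, the squared error.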
5.2 Decision Tree Regressor:
Overview:
Decision Tree Regressor splits data into branches based on feature values. It
captures non-linear relationships but may overfit.
How it works:
The dataset is recursively split into branches, each time choosing the feature and
threshold that minimize the weighted mean squared error (MSE) of the resulting child nodes.
At each node, a decision rule is applied to split the data.
The tree grows until a stopping condition is met (e.g., minimum samples per
leaf, maximum depth).
Mathematical Formulation:
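The standard CART-style split criterion for regression, written for a node holding the set S of training houses, is sketched below; each leaf then predicts the mean price of the training houses that reach it.

```latex
S_L(j, t) = \{\, i \in S : x_{ij} \le t \,\}, \qquad S_R(j, t) = \{\, i \in S : x_{ij} > t \,\}

(j^{*}, t^{*}) = \arg\min_{j,\, t} \left[ \frac{|S_L|}{|S|}\, \mathrm{MSE}(S_L) + \frac{|S_R|}{|S|}\, \mathrm{MSE}(S_R) \right],
\qquad \mathrm{MSE}(S) = \frac{1}{|S|} \sum_{i \in S} \left( y_i - \bar{y}_S \right)^2

\hat{y}(x) = \bar{y}_{\mathrm{leaf}(x)}
```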
Implementation:
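The greedy split search can be sketched from scratch in NumPy. This is an illustrative, unoptimized version with hypothetical function names; scikit-learn's DecisionTreeRegressor applies the same idea (squared-error criterion, thresholds placed between sorted feature values) far more efficiently, with parameters such as max_depth and min_samples_leaf as stopping conditions.

```python
import numpy as np


def weighted_child_mse(y_left: np.ndarray, y_right: np.ndarray) -> float:
    """Size-weighted MSE of two child nodes (each predicts its own mean)."""
    n = len(y_left) + len(y_right)
    # np.var is the mean squared deviation from the mean, i.e. the node's MSE.
    return (len(y_left) * np.var(y_left) + len(y_right) * np.var(y_right)) / n


def best_split(X: np.ndarray, y: np.ndarray, min_samples_leaf: int = 1):
    """Try every feature/threshold pair and return the lowest-MSE split."""
    best = None  # (feature index, threshold, weighted MSE)
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        # Candidate thresholds: midpoints between consecutive distinct values.
        for t in (values[:-1] + values[1:]) / 2:
            left = X[:, j] <= t
            if left.sum() < min_samples_leaf or (~left).sum() < min_samples_leaf:
                continue
            score = weighted_child_mse(y[left], y[~left])
            if best is None or score < best[2]:
                best = (j, float(t), float(score))
    return best


def build_tree(X, y, depth=0, max_depth=3, min_samples_leaf=1):
    """Grow the tree recursively until a stopping condition is met."""
    split = None
    if depth < max_depth and len(y) >= 2 * min_samples_leaf and np.var(y) > 0:
        split = best_split(X, y, min_samples_leaf)
    if split is None:
        return {"value": float(np.mean(y))}  # leaf: predict the mean target
    j, t, _ = split
    left = X[:, j] <= t
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree(X[left], y[left], depth + 1, max_depth, min_samples_leaf),
        "right": build_tree(X[~left], y[~left], depth + 1, max_depth, min_samples_leaf),
    }


def predict_tree(node, x):
    """Follow the decision rules from the root down to a leaf."""
    while "value" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["value"]


X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y = np.array([100.0, 100.0, 100.0, 300.0, 300.0, 300.0])
tree = build_tree(X, y, max_depth=2)
print(tree)
```

Without max_depth or min_samples_leaf the recursion keeps splitting until every leaf is pure, which is exactly the overfitting risk noted above.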
5.3 Random Forest Regressor:
Overview:
Random Forest is an ensemble learning method that combines multiple Decision
Trees to improve accuracy and reduce overfitting.
How it works:
Constructs multiple decision trees, each trained on a random sample of the training data
drawn with replacement (Bootstrap Aggregation or Bagging), and considers only a random
subset of features at each split.
Each tree makes a prediction, and the final prediction is the average of all tree
outputs.
Mathematical Formulation:
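In standard notation, with B trees where tree T_b is trained on a bootstrap sample D_b of m rows drawn with replacement from the training set, the forest's prediction for a house x is the plain average:

```latex
\hat{y}_{\mathrm{RF}}(x) = \frac{1}{B} \sum_{b=1}^{B} T_b(x)
```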
Implementation:
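The bagging-and-averaging mechanism can be sketched in NumPy as below. To keep the example short, each tree is a depth-1 stump (a single split) and all names are hypothetical; a real Random Forest, such as scikit-learn's RandomForestRegressor, grows much deeper trees and re-draws the feature subset at every split.

```python
import numpy as np


def fit_stump(X, y, features):
    """Best single split over the given feature indices (squared-error criterion)."""
    mean = float(np.mean(y))
    best = (None, None, mean, mean, np.inf)  # (feature, threshold, left, right, SSE)
    for j in features:
        values = np.unique(X[:, j])
        for t in (values[:-1] + values[1:]) / 2:
            left = X[:, j] <= t
            yl, yr = y[left], y[~left]
            sse = ((yl - yl.mean()) ** 2).sum() + ((yr - yr.mean()) ** 2).sum()
            if sse < best[4]:
                best = (j, t, float(yl.mean()), float(yr.mean()), sse)
    return best[:4]


def predict_stump(stump, X):
    j, t, left_value, right_value = stump
    if j is None:  # no split was possible: constant prediction
        return np.full(len(X), left_value)
    return np.where(X[:, j] <= t, left_value, right_value)


def fit_forest(X, y, n_trees=100, max_features=1, seed=0):
    """Bagging: each stump sees a bootstrap sample and a random feature subset."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: n_samples row indices drawn with replacement.
        rows = rng.integers(0, n_samples, size=n_samples)
        # Random subset of candidate features for this stump's split.
        features = rng.choice(n_features, size=max_features, replace=False)
        forest.append(fit_stump(X[rows], y[rows], features))
    return forest


def predict_forest(forest, X):
    # Final prediction: the average of all individual tree outputs.
    return np.mean([predict_stump(stump, X) for stump in forest], axis=0)
```

Averaging many trees that each saw slightly different data is what lowers the variance of a single, easily overfitted tree.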
5.4 XGBoost Regressor (Extreme Gradient Boosting)
Overview:
XGBoost is a powerful gradient boosting algorithm optimized for speed and
performance. It builds an ensemble of weak learners (decision trees) in a sequential
manner, where each tree corrects the errors of the previous one.
How it works:
Uses boosting, meaning trees are added iteratively.
Each new tree corrects the residual errors of the previous trees.
Uses a regularized objective function to prevent overfitting.
Mathematical Formulation:
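The regularized objective described by the XGBoost authors, with K trees f_1, ..., f_K added one at a time, can be written as follows. Here l is a differentiable loss such as the squared error, T is the number of leaves of a tree, w is its vector of leaf weights, γ and λ are regularization parameters, and η is the learning rate (shrinkage).

```latex
\mathcal{L} = \sum_{i=1}^{m} l\left( y_i, \hat{y}_i \right) + \sum_{k=1}^{K} \Omega(f_k),
\qquad \Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^2

\hat{y}_i^{(k)} = \hat{y}_i^{(k-1)} + \eta \, f_k(x_i)
```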
Implementation:
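The "each new tree corrects the residual errors" idea can be sketched as plain gradient boosting with squared error and depth-1 stumps. This is a deliberately simplified stand-in, not XGBoost itself: XGBoost additionally uses second-order gradients, the regularization term above, and heavy performance optimizations. All names are hypothetical; in practice the project would use XGBRegressor from the xgboost package.

```python
import numpy as np


def fit_stump(X, y):
    """Depth-1 regression tree: the single split with the lowest squared error."""
    mean = float(np.mean(y))
    best = (None, None, mean, mean, np.inf)  # (feature, threshold, left, right, SSE)
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        for t in (values[:-1] + values[1:]) / 2:
            left = X[:, j] <= t
            yl, yr = y[left], y[~left]
            sse = ((yl - yl.mean()) ** 2).sum() + ((yr - yr.mean()) ** 2).sum()
            if sse < best[4]:
                best = (j, t, float(yl.mean()), float(yr.mean()), sse)
    return best[:4]


def predict_stump(stump, X):
    j, t, left_value, right_value = stump
    if j is None:  # no split was possible: constant prediction
        return np.full(len(X), left_value)
    return np.where(X[:, j] <= t, left_value, right_value)


def fit_boosting(X, y, n_rounds=100, learning_rate=0.1):
    """Sequentially add stumps, each fitted to the current residual errors."""
    base = float(np.mean(y))  # start from a constant prediction
    prediction = np.full(len(y), base)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the residual y - prediction is the negative gradient.
        residual = y - prediction
        stump = fit_stump(X, residual)
        # Shrink each stump's contribution so later stumps can keep refining.
        prediction = prediction + learning_rate * predict_stump(stump, X)
        stumps.append(stump)
    return base, learning_rate, stumps


def predict_boosting(model, X):
    base, learning_rate, stumps = model
    total = np.full(len(X), base)
    for stump in stumps:
        total = total + learning_rate * predict_stump(stump, X)
    return total
```

The learning rate trades speed for stability: smaller values need more rounds but make each correction gentler.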
CHAPTER 6: MODEL EVALUATION AND DEPLOYMENT
After training multiple models, we evaluate them based on Mean Absolute Error
(MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R²
Score to determine the best-performing model.
Mean Absolute Error (MAE): Measures the average absolute difference between
actual and predicted values.
Mean Squared Error (MSE): Measures the average squared difference between
actual and predicted values.
Root Mean Squared Error (RMSE): The square root of MSE, providing a
measure of error in the same units as the target variable.
R² Score (R-squared): Indicates how well the model explains the variance in the
data. A value close to 1 means a better fit.
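The four metrics can be computed directly from their definitions, as in this NumPy sketch. The function name is hypothetical; scikit-learn's mean_absolute_error, mean_squared_error and r2_score should give the same values.

```python
import numpy as np


def regression_metrics(y_true, y_pred) -> dict:
    """MAE, MSE, RMSE and R² for one set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    mae = np.mean(np.abs(errors))
    mse = np.mean(errors ** 2)
    rmse = np.sqrt(mse)  # back in the target's units (e.g. price)
    # R² = 1 - SS_res / SS_tot: share of the target's variance the model explains.
    ss_res = np.sum(errors ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": float(mae), "MSE": float(mse), "RMSE": float(rmse), "R2": float(r2)}


metrics = regression_metrics([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
print(metrics)
```

Because MSE squares each error, one badly mispriced house raises MSE and RMSE much more than MAE, which is why the metrics are reported together.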
Observation:
Model MAE MSE RMSE R² Score
Note:
RMSE (Root Mean Squared Error) is calculated as RMSE = sqrt(MSE).
Higher R² Score indicates a better fit.
Lower MAE, MSE, and RMSE indicate better performance in error reduction.
Result:
Linear Regression has the best R² score, meaning it explains the most variance in the
data.
Decision Tree has the highest MSE and RMSE, indicating it has the worst prediction
accuracy.
Random Forest and XGBoost have similar performance, but Random Forest slightly
outperforms XGBoost in terms of MSE and RMSE.
6.2 MODEL DEPLOYMENT
Steps in Deployment:
6.4 FRONTEND DEVELOPMENT
The user-friendly frontend is developed using HTML, CSS, and JavaScript to take
user input and display predicted house prices.
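On the server side, the frontend needs an endpoint that accepts the form values and returns a prediction. The following is a minimal sketch of such a backend with Flask and pickle, under assumptions: the model is stored as a plain dictionary of linear-regression coefficients, and the /predict route, the JSON field names and the feature list are hypothetical. Flask is imported inside create_app() so the helper functions work without it.

```python
import pickle

import numpy as np

FEATURE_NAMES = ["area", "bedrooms", "bathrooms"]  # hypothetical form fields


def save_model(model: dict, path: str) -> None:
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path: str) -> dict:
    # Only unpickle files you created yourself: pickle can execute arbitrary code.
    with open(path, "rb") as f:
        return pickle.load(f)


def predict_price(model: dict, payload: dict) -> float:
    """Order the submitted fields as during training and apply the linear model."""
    x = np.array([float(payload[name]) for name in model["feature_names"]])
    return float(model["intercept"] + x @ np.asarray(model["coef"], dtype=float))


def create_app(model: dict):
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(force=True)
        try:
            price = predict_price(model, payload)
        except (KeyError, TypeError, ValueError) as exc:
            return jsonify({"error": f"invalid input: {exc}"}), 400
        return jsonify({"predicted_price": price})

    return app
```

The server would be started with create_app(load_model("model.pkl")).run(), where the file name is an assumption; the JavaScript frontend then sends the form values as JSON in a POST request and displays the returned predicted_price.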
CHAPTER 7: RESULTS AND DISCUSSION
7.1 INTRODUCTION
This chapter presents the findings of the House Price Prediction Model, analyzing
its performance, accuracy, and potential real-world applications. The evaluation
metrics provide insights into the efficiency of the model and its usability for
predicting property prices.
The table below summarizes the evaluation metrics of different machine learning
models:
Code Snippet: Scatter Plot of Actual vs. Predicted House Prices
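A scatter plot of this kind can be drawn with Matplotlib as in the following sketch. The function name and labels are assumptions; y_true and y_pred would be the test-set prices and a trained model's predictions for them.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_actual_vs_predicted(y_true, y_pred, path=None):
    """Scatter actual against predicted prices with a y = x reference line."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    fig, ax = plt.subplots(figsize=(6, 6))
    ax.scatter(y_true, y_pred, alpha=0.6)
    # Points on the dashed line are perfect predictions; vertical distance is the error.
    low = min(y_true.min(), y_pred.min())
    high = max(y_true.max(), y_pred.max())
    ax.plot([low, high], [low, high], "r--", label="Perfect prediction")
    ax.set_xlabel("Actual price")
    ax.set_ylabel("Predicted price")
    ax.set_title("Actual vs. Predicted House Prices")
    ax.legend()
    fig.tight_layout()
    if path is not None:
        fig.savefig(path, dpi=150)
    return fig
```

Points that fan out from the line at high actual prices would indicate that the model struggles most with expensive houses.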
7.3.2 Feature Importance Plot (XGBoost Model)
Code Snippet: Feature Importance Plot from the XGBoost Model
The trained machine learning model has several potential real-world applications. It can
support real estate market analysis by helping buyers and sellers make informed decisions.
Banks could use similar models during loan approval to assess mortgage risk. Investors
could use them to shape property investment strategies by identifying profitable locations,
and governments could apply them in urban planning to analyze housing trends and forecast
development needs.
CHAPTER 8: CONCLUSION AND FUTURE WORK
8.1 CONCLUSION
The performance evaluation of various machine learning models for house price
prediction yields several clear findings. Among the tested models, Linear Regression
achieved the best performance with an R² score of 0.6529, meaning it explains about 65%
of the variance in house prices. It also had the lowest MAE (970,043.40) and MSE
(1.75 trillion), making it the most reliable choice in this study.
The Random Forest model followed closely with an R² score of 0.6119, though its
slightly higher error metrics suggest reduced accuracy compared to Linear Regression.
XGBoost, while often excelling in predictive tasks, did not outperform Linear
Regression in this case, achieving an R² of 0.5893. The Decision Tree model had the
lowest R² score (0.4771) and the highest MAE and MSE, making it the least effective
option for this dataset.
Overall, while Linear Regression demonstrated the best performance, further tuning of
ensemble methods like Random Forest and XGBoost may improve accuracy. Future
enhancements could include feature engineering, hyperparameter optimization, and
exploring deep learning models for more robust predictions.
8.2 CHALLENGES FACED
Data Quality Issues – Missing values and inconsistencies in the dataset required
significant preprocessing.
Overfitting in Decision Tree Models – Complex models tended to memorize the
training data, leading to poor generalization.
Computational Complexity – Training advanced models like XGBoost required
high processing power and optimization.
8.3 FUTURE SCOPE AND IMPROVEMENTS:
To enhance the accuracy and applicability of this project, the following future
improvements can be considered:
By implementing these enhancements, the system could evolve into a more capable
AI-driven property valuation tool for the real estate industry.
REFERENCES
• Kaggle: https://www.kaggle.com/datasets/yasserh/housing-prices-dataset
Weekly report scanned copy