PROJECT REPORT
ON
HOUSE PRICE PREDICTION USING PYTHON
SUBMITTED BY
Suchismita Sahoo (2221304049), Suraj Sahoo (2221304051),
Tusar Kanta Dhal (2221304052), Dibya Rashmi Bhanja (2231304002)
GITAM, BHUBANESWAR
ACADEMIC YEAR 2024-25
GITAM
CERTIFICATE
This is to certify that the work in this Project Report entitled
“HOUSE PRICE PREDICTION USING PYTHON” by Suchismita Sahoo (2221304049),
Suraj Sahoo (2221304051), Tusar Kanta Dhal (2221304052), and
Dibya Rashmi Bhanja (2231304002) has been carried out under my supervision
in partial fulfillment of the requirements for the B.Tech in Computer
Science & Engineering during the session 2024-2025 in the Department of
Computer Science & Engineering of GITAM, and that this work is the original
work of the above students.
ACKNOWLEDGMENT
This project was carried out as a semester project, as part of the course
titled “MINOR PROJECT”.
ABSTRACT
We propose to implement a house price prediction model for Bangalore,
India. Buyers focus on finding a suitable home or flat within their budget
and expect their investment in a house to appreciate over time. Sellers, on
the other hand, aim to sell their homes at the best possible price. Since
house prices are subject to fluctuations, customers often face difficulties
in purchasing a house at the right time, before prices change in the near
future. To address this major issue in the real estate market, we are
designing a machine learning model for predicting house prices. Machine
learning techniques play a vital role in this project by providing more
precise house price estimates based on user preferences such as location,
number of rooms, and air quality, among others. Housing prices fluctuate
daily and are sometimes exaggerated rather than based on actual worth. The
major focus of this project is on predicting home prices using genuine
factors; we intend to base the evaluation on every basic criterion that is
taken into account when establishing a price. A further goal of this
project is to learn Python and gain experience in Data Analytics and
Machine Learning.
LIST OF TABLES
Table 1 Application
LIST OF FIGURES
TABLE OF CONTENTS
Page Number
Certificate ii
Acknowledgement iii
Abstract iv
List of Tables v
List of Figures v
PHASE-I
1. Introduction
1.1 Introduction 1
1.2 Motivation 2
2. Literature Survey
2.1 Literature Survey 3
3. Proposed Work
3.1 Objective of proposed work 9
3.2 Methodology 9
3.2.1 Introduction to machine learning 9
3.2.2 How does Machine Learning work? 11
3.2.3 Need for Machine Learning 12
3.2.4 Applications of Machine learning: - 12
3.2.5 Machine Learning Classifications 15
3.2.5.1 Supervised Learning 15
3.2.5.2 Unsupervised Machine Learning 28
3.2.5.3 Reinforcement Learning: 30
PHASE-II
4. Implementation
4.1 Code 31
5. Result Analysis
5.1 Visualization Insights 33
5.2 Advantages 36
5.3 Disadvantages 37
5.4 Maintenance 39
5.5 Application 41
6. Conclusion and Future Development 42
Reference 43
CHAPTER 1
INTRODUCTION
1.1 INTRODUCTION:
1.2 Motivation
We are highly interested in anything related to Machine Learning, and this
independent project provided us with the opportunity to study the subject
and reaffirm our passion for it. The capacity to generate guesses and
forecasts, and to give machines the ability to learn on their own, is both
powerful and practically unlimited in its application possibilities. Machine
Learning may be applied in finance, medicine, and virtually any other field.
That is why we opted to base our idea on Machine Learning.
CHAPTER 2
LITERATURE SURVEY
situation. Regardless, we do not have accurate, standardized approaches for
estimating real estate property values.
The price of a house is determined by many variables, such as people’s
income level, marital status, the industrialization of society, the
agricultural employment rate, interest rates, and population growth and
migration. Since changes in housing prices affect both socio-economic
conditions and national economic conditions, housing is an important issue
that concerns governments and individuals alike (Kim and Park, 2005).
Housing demand arises for different purposes such as consumption,
investment, and wealth accumulation. In this part of the literature review,
some studies that estimate housing prices are cited. Predicting house
prices from genuine factors is important for such studies. Developments in
artificial intelligence methods now allow the solution of many problems in
daily life, such as purchasing a house. The competitive nature of the
housing sector encourages the use of data mining in this industry to
process its data and predict future trends.
Regression is a machine learning tool that builds predictions from
available measurable information by modelling the relationships between the
target parameter and many different independent parameters. The cost of a
house depends on several such parameters, and machine learning is one of
the most important ways to predict that cost with high accuracy. Machine
learning is also one of the more recent methods used for prediction and is
used to interpret and analyze highly complex data structures and patterns
(Ngiam and Khor, 2019).
Machine learning is the idea that computers can learn and behave like
humans (Feggella, 2019). It means providing a valid dataset on which
predictions are based: the machine learns how important a particular event
might be for the whole system from the pre-loaded data and predicts the
outcome accordingly. Various modern applications of this
technique include predicting stock prices, predicting the probability of an
earthquake, and predicting company sales, and the list has infinite
possibilities (Shiller, 2007). Unlike traditional econometrics models, machine
learning algorithms do not require the training data to be normally distributed.
Many statistical tests rely on the assumption of normality; if the data are
not normally distributed, these statistical tests will fail and become
invalid. Such processes used to take a long time, but today they can be
completed quickly with the high-speed computing power of modern computers,
and therefore this technique is less costly and less time-consuming to use.
Rafiei and Adeli
(2016) used SVR to determine whether a property developer should build a
new development or stop the construction at the beginning of a project based
on the prediction of future house prices. The study, in which data from 350
apartment houses built in Tehran (Iran) between 1993 and 2008 were used,
had 26 features such as zip code, gross floor area, land area, estimated cost of
construction, construction time, and property prices. Its results revealed that
SVR was a suitable method for making home price predictions since the loss
of prediction (error) was as low as 3.6% of the test data. Therefore, the
prediction results provide valuable input to the property developer’s decision-
making process. Cechin et al. (2000) analyzed the data of buildings for sale
and rental in Porto Alegre, Brazil, using linear regression and artificial neural
network methods. They used parameters such as the size of the house, district,
geographical location, environmental arrangement, number of rooms,
building construction date and total area of use. According to the study, they
reported that the artificial neural network method was more useful compared
to linear regression. Yu and Wu (2016) used classification and regression
algorithms. According to their analysis, living area (in square metres),
roof material, and neighborhood have the greatest statistical significance
in predicting the selling price of a house, and the prediction analysis can
be improved by the Principal Component Analysis (PCA) technique, because
the value of a particular property is closely associated with the
infrastructure facilities surrounding it. Koktashev et al. (2019) attempted
to predict house values in the city of Krasnoyarsk using 1,970 housing transaction
records. The number of rooms, total area, floor, parking lot, type of repair,
number of balconies, type of bathroom, number of elevators, garbage
disposal, year of construction and accident rate of the house were discussed
as the features in that study. They applied random forest, ridge regression,
and linear regression to predict the property prices. Their study concluded
that the random forest outperformed the other two algorithms, as evaluated
by the Mean Absolute Error (MAE). Park and Bae (2015) developed a house
price prediction model with machine learning algorithms in real estate
research and compared their performance in terms of classification accuracy.
Their study aimed at helping real estate sellers or real estate agents to make
rational decisions in real estate transactions. The tests showed that the
accuracy-based Repeated Incremental Pruning to Produce Error Reduction
(RIPPER) consistently outperformed other models in house price prediction
performance. Bhagat et al. (2016) studied on linear regression algorithms for
house prediction. The aim of the study was to predict the effective price of
the real estate for clients based on their budget and priorities. They indicated
that the linear regression technique of the analysis of past market trends and
price ranges could be used to determine future house prices. In their study,
Mora-Esperanza and Gallego (2004) analyzed house prices in Madrid using
12 parameters. The parameters they used were the distance to the city center,
road, size of the district, construction class, age of the building, renovation
status, housing area, terrace area, location within the district, housing design,
the floor and the presence of outbuildings. The dataset was created assuming
that the sales values of 100 houses for sale in the region were the real values.
The researchers, who used the ANN and linear regression analysis
techniques, reported that the ANN technique was more successful, achieving
an average agreement of 95% and an accuracy of 86%. Wang and Wu (2018)
used 27,649 home appraisal price records from Arlington County, Virginia,
USA in 2015 and suggested that Random Forest outperformed linear
regression in terms of accuracy. In their study on the case of Mumbai, India,
Varma et al. (2018) attempted to predict the price of a house by using
various regression techniques (Linear Regression, Forest regression, boosted
regression) and artificial neural network technique based on the features of
the house (usage area, number of rooms, number of bathrooms, parking lot,
elevator, furniture). In conclusion, they determined that the efficiency of the
algorithm with the use of artificial neural networks was higher compared to
other regression techniques. They also revealed that the system prevented the
risk of investing in the wrong house by providing the right output. Thamarai
and Malarvizhi (2020) attempted to predict the prices of houses from real-
time data after the large fluctuation in house price increases in 2018 at the
Tadepalligudem location of West Godavari District in Andhra Pradesh, India
using the features of the number of bedrooms, age of the house, transportation
facilities, nearby schools, and shopping opportunities. They applied
decision tree regression and multiple linear regression, which are among
the machine learning techniques. They suggested that the
performance of multiple linear regression was better than decision tree
regression in predicting the house prices.
Zhao et al. [1] applied deep learning in combination with extreme Gradient
Boosting (XGBoost) for real estate price prediction by analyzing historical
property sale records. The dataset was extracted from an online real estate
website, and the data were split into 80% as the training set and 20% as
the testing set. According to Satish et al. [2], regression deals with
specifying the relationship between the dependent variable (also called the
response or outcome) and the independent variables (predictors). Their
study aimed to predict future house prices with the help of machine
learning algorithms.
CHAPTER 3
PROPOSED WORK
3.1 Objective of proposed work
3.2 Methodology
3.2.1 Introduction to Machine Learning
We are surrounded by humans who can learn everything from their experiences
with their learning capability, and we have computers or machines which
work on our instructions. But can a machine also learn from experience or
past data like a human does? This is where Machine Learning comes in. It is
a science that will improve further in the future, and the reason behind
this development is the difficulty of analyzing and processing rapidly
increasing data. Machine learning is based on the principle of finding the
best model for new data among the previous data, thanks to this growth in
data. Therefore, machine learning research will continue in parallel with
the growth of data, covering the methods used in machine learning, its
application fields, and the research carried out in this field. The aim of
this study is to convey knowledge of machine learning, which has become
very popular nowadays, and of its applications to researchers. There is no
error margin in the operations carried out by computers based on an
algorithm, where the operation follows certain fixed steps. Different from
commands that are written to produce an output from a given input, there
are situations in which computers make decisions based upon the available
sample data; in those situations, computers may make mistakes, just like
people, in the decision-making process. That is, machine learning is the
process of equipping computers with the ability to learn by using data and
experience like a human brain. The main aim of machine learning is to
create models which can train themselves to improve, perceive complex
patterns, and find solutions to new problems by using previous data.
3.2.2 How does Machine Learning work?
A Machine Learning system learns from historical data, builds prediction
models, and, whenever it receives new data, predicts the output for it. The
accuracy of the predicted output depends upon the amount of data: a large
amount of data helps to build a better model which predicts the output more
accurately. Suppose we have a complex problem where we need to perform some
predictions; instead of writing code for it, we just need to feed the data
to generic algorithms, and with the help of these algorithms the machine
builds the logic from the data and predicts the output. Machine learning
has changed our way of thinking about such problems.
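A minimal sketch of this train-then-predict workflow, assuming scikit-learn
is available (the tiny dataset below is illustrative only, not the
project's real data):

from sklearn.linear_model import LinearRegression

# Historical data: [total_sqft, number_of_bedrooms] -> price (illustrative numbers)
X_train = [[1000, 2], [1500, 3], [800, 2], [2000, 4]]
y_train = [50, 75, 42, 110]

model = LinearRegression()
model.fit(X_train, y_train)            # learn from historical data

# When new data arrives, the trained model predicts the output for it
new_house = [[1200, 2]]
print(model.predict(new_house))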
3.2.3 Need for Machine Learning
The need for machine learning is increasing day by day. The reason behind
the need for machine learning is that it is capable of doing tasks that are too
complex for a person to implement directly. As a human, we have some
limitations as we cannot access the huge amount of data manually, so for this,
we need some computer systems and here comes the machine learning to
make things easy for us. We can train machine learning algorithms by
providing them the huge amount of data and let them explore the data,
construct the models, and predict the required output automatically. The
performance of the machine learning algorithm depends on the amount of
data, and it can be determined by the cost function. With the help of machine
learning, we can save both time and money. The importance of machine
learning can be easily understood by its use cases. Currently, machine
learning is used in self-driving cars, cyber fraud detection, face
recognition, friend suggestions on Facebook, and more. Top companies such
as Netflix and Amazon have built machine learning models that use vast
amounts of data to analyze user interests and recommend products
accordingly.
The machine learning life cycle is the process of building an efficient
machine learning project. The main purpose of the life cycle is to find a
solution to the problem or project. The machine learning life cycle
involves seven major steps, which are given below:
Gathering Data
Data preparation
Data Wrangling
Analyze Data
Train the model
Test the model
Deployment
Gathering Data:
Data gathering is the first step of the machine learning life cycle. The
goal of this step is to identify and obtain all the data related to the
problem. Here we need to identify the different data sources, as data can
be collected from various sources such as files, databases, the internet,
or mobile devices. It is one of the most important steps of the life cycle,
because the quantity and quality of the collected data determine the
efficiency of the output: the more data there is, the more accurate the
prediction will be. This step includes the tasks listed below (a brief
illustrative sketch follows the list):
Identify various data sources
Collect data
Integrate the data obtained from different sources
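A small illustrative sketch of gathering and integrating data with pandas
(the file names are hypothetical, not the project's actual sources):

import pandas as pd

# Hypothetical file names used purely for illustration
listings = pd.read_csv("listings.csv")             # data from a file
portal_data = pd.read_json("portal_export.json")   # data from a web export

# Integrate the data obtained from different sources into one DataFrame
raw_data = pd.concat([listings, portal_data], ignore_index=True)
print(raw_data.shape)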
Data preparation:
After collecting the data, we need to prepare it for the further steps.
Data preparation is the step where we put our data into a suitable place
and prepare it for use in machine learning training. In this step we first
put all the data together and then randomize its ordering. This step can be
further divided into two processes:
Data exploration:
It is used to understand the nature of data that we have to work with. We need
to understand the characteristics, format and quality of data.
A better understanding of data leads to an effective outcome. In this, we find
Correlations, general trends, and outliers.
Data Wrangling:
Data wrangling is the process of cleaning and converting raw data into a
usable format. It involves cleaning the data, selecting the variables to
use, and transforming the data into a proper format to make it more
suitable for analysis in the next step. It is one of the most important
steps of the complete process, as cleaning is required to address data
quality issues. The data we collect is not always usable as-is, since some
of it may not be useful. In real-world applications, collected data may
have various issues (a short cleaning sketch follows this list), including:
Missing Values
Duplicate data
Invalid data
Noise
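A brief sketch of such cleaning steps with pandas (the column names are
assumptions for illustration, not the project's exact schema):

import pandas as pd

def clean_housing_data(df: pd.DataFrame) -> pd.DataFrame:
    # Illustrative wrangling; column names "location" and "price" are assumed
    df = df.drop_duplicates()                       # duplicate data
    df = df.dropna(subset=["location", "price"])    # missing values in key columns
    df = df[df["price"] > 0]                        # invalid data (non-positive prices)
    df["location"] = df["location"].str.strip()     # noisy text fields
    return df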
Data Analysis
Now the cleaned and prepared data is passed on to the analysis step.
This step involves:
Selection of analytical techniques
Building models
Review the result
The aim of this step is to build a machine learning model that analyzes the
data using various analytical techniques, and to review the outcome. It
starts with determining the type of problem, after which we select a
machine learning technique such as Classification, Regression, Cluster
analysis, or Association, then build the model using the prepared data and
evaluate it.
Deployment
The last step of machine learning life cycle is deployment, where we deploy
the model in the real-world system. If the above-prepared model is producing
an accurate result as per our requirement with acceptable speed, then we
deploy the model in the real system. But before deploying the project, we will
check whether it is improving its performance using available data or not. The
deployment phase is similar to making the final report for a project.
3.2.5 Machine Learning Classifications
3.2.5.1 Supervised Learning
In supervised learning, machines are trained using labelled data so that
they understand the datasets and learn about each type of data; once
training and processing are done, we test the model by providing sample
data to check whether it predicts the correct output. The goal of
supervised learning is to map input data to output data. Supervised
learning is based on supervision, much like a student learning under the
supervision of a teacher, and a classic example is spam filtering.
Supervised learning is the process of providing input data as well as the
correct output data to the machine learning model; the aim of the algorithm
is to find a mapping function that maps the input variable (x) to the
output variable (y). In the real world, supervised learning can be used for
risk assessment, image classification, fraud detection, spam filtering, and
more. Models are trained on a labelled dataset, and once the training
process is completed, the model is tested on test data (a separate subset
of the data) and predicts the output. The working of supervised learning
can be easily understood by the following example:
If the given shape has four sides, and all the sides are equal, then it
will be labelled as a square.
If the given shape has three sides, then it will be labelled as a triangle.
If the given shape has six equal sides, then it will be labelled as a
hexagon.
Now, after training, we test our model using the test set, and the task of the
model is to identify the shape. The machine is already trained on all types of
shapes, and when it finds a new shape, it classifies the shape on the basis
of the number of sides and predicts the output.
Fig. 5 Types of supervised Machine learning
Regression
Linear Regression
Regression Trees
Non-Linear Regression
Bayesian Linear Regression
Polynomial Regression
Linear regression is one of the simplest and most popular Machine Learning
algorithms. It is a statistical method used for predictive analysis: linear
regression makes predictions for continuous/real or numeric variables such
as sales, salary, age, and product price. The linear regression algorithm
models a linear relationship between a dependent variable (y) and one or
more independent variables (x), hence the name linear regression. Because
it models a linear relationship, it finds how the value of the dependent
variable changes according to the value of the independent variable. The
linear regression model provides a sloped straight line representing the
relationship between the variables.
Simple Linear Regression: If a single independent variable is
used to predict the value of a numerical dependent variable, then such
a Linear Regression algorithm is called Simple Linear Regression.
Multiple Linear Regression: If more than one independent variable
is used to predict the value of a numerical dependent variable, then
such a Linear Regression algorithm is called Multiple Linear
Regression.
Fig 8 Negative Linear Relationship
Cost function
Different values of the weights or line coefficients (a0, a1) give
different regression lines, and the cost function is used to estimate the
values of the coefficients for the best-fit line.
The cost function is used to optimize the regression coefficients or
weights; it measures how well a linear regression model is performing.
We can use the cost function to find the accuracy of the mapping function,
which maps the input variable to the output variable. This mapping function
is also known as the hypothesis function.
For Linear Regression, we use the Mean Squared Error (MSE) cost function,
which is the average of the squared errors between the predicted values and
the actual values. For the linear equation y = a1x + a0, the MSE can be
written as:
MSE = (1/N) * Σ (Yi - (a1xi + a0))^2
Where,
N = Total number of observations
Yi = Actual value
(a1xi + a0) = Predicted value
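A short numeric sketch of this cost function (the data points and
coefficients below are made up purely for illustration):

import numpy as np

# Illustrative data and a candidate line y = a1*x + a0
x = np.array([1.0, 2.0, 3.0, 4.0])
y_actual = np.array([2.1, 4.2, 5.9, 8.1])
a0, a1 = 0.1, 2.0

y_pred = a1 * x + a0                       # predicted values from the line
mse = np.mean((y_actual - y_pred) ** 2)    # average of squared errors
print(mse)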
Classification
Classification algorithms are used when the output variable is
categorical, which means there are two classes, such as Yes/No,
Male/Female, or True/False.
Random Forest
Decision Trees
Logistic Regression
Support vector Machines
problem of overfitting.
Land Use: We can identify the areas of similar land use by
this algorithm.
Marketing: Marketing trends can be identified using this
algorithm.
Decision Tree Classification Algorithm:
Why use Decision Trees?
Step-1: Begin the tree with the root node, say S, which contains the
complete dataset.
Step-2: Find the best attribute in the dataset using an Attribute Selection
Measure (ASM).
Step-3: Divide S into subsets that contain the possible values of the best
attribute.
Step-4: Generate the decision tree node, which contains the best attribute.
Step-5: Recursively make new decision trees using the subsets of the
dataset created in Step-3. Continue this process until a stage is reached
where the nodes cannot be classified further; such a final node is called a
leaf node. In a job-offer example, the final decision node splits into two
leaf nodes (Accepted offers and Declined offers). A small scikit-learn
sketch of this idea follows.
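A minimal sketch of training a decision tree classifier with scikit-learn
(the features, labels, and numbers below are invented for illustration and
are not the offer dataset referred to above):

from sklearn.tree import DecisionTreeClassifier

# Toy dataset (invented): [salary_in_lakhs, commute_km] -> 1 = offer accepted, 0 = declined
X = [[12, 5], [8, 30], [15, 10], [6, 25], [20, 8], [7, 40]]
y = [1, 0, 1, 0, 1, 0]

# The tree repeatedly picks the best attribute to split on (its attribute
# selection measure defaults to Gini impurity) until it reaches leaf nodes.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

print(tree.predict([[10, 12]]))  # classify a new candidate offer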
Logistic Regression
Logistic Regression is a significant machine learning algorithm
because it has the ability to provide probabilities and classify new data
using continuous and discrete datasets.
Logistic Regression can be used to classify observations using different
types of data and can easily determine the most effective variables for the
classification; its predictions are produced by the S-shaped logistic
(sigmoid) function.
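A small sketch of logistic regression producing class probabilities with
scikit-learn (the toy data are invented for illustration):

from sklearn.linear_model import LogisticRegression

# Toy binary classification data (invented): hours studied -> pass (1) / fail (0)
X = [[1], [2], [3], [4], [5], [6]]
y = [0, 0, 0, 1, 1, 1]

clf = LogisticRegression()
clf.fit(X, y)

print(clf.predict([[3.5]]))         # predicted class
print(clf.predict_proba([[3.5]]))   # class probabilities from the logistic function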
3.2.5.2 Unsupervised Machine Learning:
Unsupervised learning cannot be directly applied to a regression or
classification problem because, unlike supervised learning, we have input
data but no corresponding output data. The goal of unsupervised learning is
to find the underlying structure of the dataset, group the data according
to similarities, and represent the dataset in a compressed format. Commonly
used unsupervised algorithms are listed below (a short clustering sketch
follows the list):
K-means clustering
KNN (k-nearest neighbors)
Hierarchical clustering
Anomaly detection
Neural Networks
Principal Component Analysis
Independent Component Analysis
Apriori algorithm
Singular value decomposition
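As a minimal sketch of one of these techniques, K-means clustering groups
unlabelled points by similarity (the two-dimensional points below are
invented for illustration):

import numpy as np
from sklearn.cluster import KMeans

# Unlabelled points (invented); no output labels are provided
X = np.array([[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans.fit(X)

print(kmeans.labels_)            # group assigned to each point
print(kmeans.cluster_centers_)   # centre of each discovered group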
Apart from these general usages, clustering is used by Amazon in its
recommendation system to provide recommendations based on a user's past
product searches, and Netflix uses the same technique to recommend movies
and web series to its users based on their watch history. A typical
illustration of a clustering algorithm shows different fruits being divided
into several groups with similar properties.
CHAPTER 4
4. IMPLEMENTATION
4.1 CODE
import numpy as np
import matplotlib.pyplot as plt

# data3 is the cleaned DataFrame prepared in earlier steps, with
# location, BHK, total_sqft, price and price_per_sqft columns.

def plot_scatter_chart(df, location):
    # Compare prices of 2 BHK and 3 BHK flats in a given location
    bhk2 = df[(df.location == location) & (df.BHK == 2)]
    bhk3 = df[(df.location == location) & (df.BHK == 3)]
    plt.rcParams['figure.figsize'] = (15, 10)
    plt.scatter(bhk2.total_sqft, bhk2.price, color='blue', label='2 BHK', s=50)
    plt.scatter(bhk3.total_sqft, bhk3.price, color='green', marker='+', label='3 BHK', s=50)
    plt.xlabel('Total Square Foot')
    plt.ylabel('Price')
    plt.title(location)
    plt.legend()

plot_scatter_chart(data3, "Rajaji Nagar")

def remove_bhk_outliers(df):
    # For each location, drop flats whose price per sqft is below the mean
    # price per sqft of flats with one bedroom fewer in the same location.
    exclude_indices = np.array([])
    for location, location_df in df.groupby('location'):
        bhk_stats = {}
        for BHK, BHK_df in location_df.groupby('BHK'):
            bhk_stats[BHK] = {
                'mean': np.mean(BHK_df.price_per_sqft),
                'std': np.std(BHK_df.price_per_sqft),
                'count': BHK_df.shape[0]
            }
        for BHK, BHK_df in location_df.groupby('BHK'):
            stats = bhk_stats.get(BHK - 1)
            if stats and stats['count'] > 5:
                exclude_indices = np.append(
                    exclude_indices,
                    BHK_df[BHK_df.price_per_sqft < stats['mean']].index.values
                )
    return df.drop(exclude_indices, axis='index')

data4 = remove_bhk_outliers(data3)
data4.shape
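The code above assumes that data3 has already been built in earlier
notebook steps that are not reproduced in this report. A hedged sketch of
how such a frame might be prepared (the file name, the 'size' column, and
the price units are assumptions, not taken from the report):

import pandas as pd

# Assumed input file and column layout; the project's actual preprocessing may differ
data = pd.read_csv("bengaluru_house_prices.csv")

# Assuming a 'size' column with values like "2 BHK", derive a numeric BHK count
data["BHK"] = data["size"].str.split().str.get(0).astype(float)

# Coerce the area to numeric and drop rows that cannot be used
data["total_sqft"] = pd.to_numeric(data["total_sqft"], errors="coerce")
data = data.dropna(subset=["location", "total_sqft", "price", "BHK"])

# Price is assumed to be quoted in lakhs of rupees
data["price_per_sqft"] = data["price"] * 100000 / data["total_sqft"]

data3 = data  # stands in for the cleaned frame used in the code above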
CHAPTER 5
5 RESULT ANALYSIS
5.1 VISUALIZATION INSIGHTS:
2BHK Preference:
The observation that most houses sold are 2BHK suggests that buyers may
prefer smaller-sized homes, possibly due to factors such as affordability,
family size, or lifestyle preferences.
Location Diversity:
With houses from 255 different locations, 'Whitefield' and 'Sarjapur Road'
emerge as popular areas. This information is valuable for understanding
market demand and can aid in targeted marketing or investment decisions.
Distribution Plots:
The distribution plots for 'bath', 'bhk', 'price', and 'total_sqft' provide insights
into the spread and variability of these features. Understanding their
distributions can help in identifying outliers, understanding central
tendencies, and assessing data quality.
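A short sketch of how such distribution plots might be produced (assuming a
cleaned DataFrame df with the column names used in the text, and that
matplotlib and seaborn are installed):

import matplotlib.pyplot as plt
import seaborn as sns

# Distribution plots for the four numeric features discussed above
features = ["bath", "bhk", "price", "total_sqft"]
fig, axes = plt.subplots(2, 2, figsize=(12, 8))
for ax, col in zip(axes.ravel(), features):
    sns.histplot(df[col], kde=True, ax=ax)  # spread and central tendency
    ax.set_title(col)
plt.tight_layout()
plt.show()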
Train-Test Split and Model Building:
Data Splitting:
The dataset is split into training and testing sets, with 80% of the data used
for training and 20% for testing. This ensures that the model's performance is
evaluated on unseen data, providing a more accurate assessment of its
generalization ability.
Model Selection:
Three regression models - Linear Regression, Lasso Regression, and Ridge
Regression - are chosen for predicting house prices. These models offer
different approaches to regression and can capture different aspects of the
data's underlying relationships.
Preprocessing:
One-hot encoding is used to handle the categorical feature 'location', while
standard scaling ensures that all features are on a similar scale, preventing
any particular feature from dominating the model training process.
Evaluation Metric:
R2 score, also known as the coefficient of determination, is employed as the
evaluation metric. It represents the proportion of the variance in the
dependent variable (house prices) that is predictable from the independent
variables.
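A condensed sketch of the split, preprocessing, and models described here
(assuming scikit-learn and a DataFrame df with the features named in the
text; the regularization strengths are assumptions, since the report does
not give hyperparameters):

from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.metrics import r2_score

# df is assumed to hold the cleaned data with these columns
X = df[["location", "total_sqft", "bath", "bhk"]]
y = df["price"]

# 80% training / 20% testing split, so evaluation is on unseen data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# One-hot encode 'location', standard-scale the numeric features
preprocess = ColumnTransformer([
    ("onehot", OneHotEncoder(handle_unknown="ignore"), ["location"]),
    ("scale", StandardScaler(), ["total_sqft", "bath", "bhk"]),
])

for name, model in [("Linear", LinearRegression()),
                    ("Lasso", Lasso(alpha=0.001)),
                    ("Ridge", Ridge(alpha=1.0))]:
    pipe = make_pipeline(preprocess, model)
    pipe.fit(X_train, y_train)
    print(name, r2_score(y_test, pipe.predict(X_test)))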
Fig. 17 Evaluation Metric
Result Analysis:
Model Performance:
Linear Regression and Ridge Regression exhibit similar performance, with
R2 scores of around 0.82. This indicates that approximately 82% of the
variance in house prices is captured by these models.
Impact of Regularization:
Lasso Regression, which applies L1 regularization, slightly underperforms
compared to the other two models. The negligible difference in performance
between Ridge and Linear Regression suggests that regularization might not
significantly affect model performance in this scenario.
Overall, the results demonstrate the effectiveness of the chosen regression
models in predicting house prices. The insights gleaned from data
visualization aid in understanding market dynamics, while model evaluation
provides valuable feedback for refining model selection and preprocessing
techniques.
5.2 ADVANTAGES
Integration: Python integrates well with databases, web frameworks, and
cloud services, allowing for end-to-end development and deployment of
predictive models.
Machine Learning Ecosystem: Python's machine learning
ecosystem is well-established and constantly evolving. It offers state-
of-the-art algorithms, techniques, and methodologies for solving
predictive modelling problems, including house price prediction.
Interpretability: Python-based machine learning models are often
highly interpretable, allowing stakeholders to understand the factors
driving predictions. This transparency is crucial, especially in real
estate, where buyers, sellers, and agents seek to understand the
rationale behind house price estimates.
Open Source: Python is open source and free to use, making it
accessible to everyone. This democratization of technology enables
individuals and organizations of all sizes to leverage machine learning
for various applications, including house price prediction.
5.3 DISADVANTAGES
While Python offers numerous advantages for house price
prediction, there are also some potential disadvantages to consider:
GIL Limitation: Python's Global Interpreter Lock (GIL) can hinder
multithreaded performance, particularly in CPU-bound tasks. While
libraries like NumPy and Pandas can offload computation to
optimized C or Fortran code, certain operations may still be affected
by the GIL, impacting parallel processing performance.
Dependency Management: Python's dependency management
system, particularly with respect to package versions and
compatibility, can sometimes be challenging. Dependency conflicts
or version mismatches between libraries may arise, requiring careful
management and potentially causing issues with model
reproducibility.
Debugging Complexity: Python's dynamic typing and flexible
syntax, while advantageous for development speed, can sometimes
lead to more challenging debugging processes. Errors may not be
caught until runtime, and troubleshooting issues in complex machine
learning pipelines may require significant effort.
Limited Deployment Options: While Python excels in model
development and experimentation, deploying Python-based machine
learning models into production environments may present
challenges.
Interpretability: While Python-based machine learning models can
offer interpretability, certain advanced techniques such as deep
learning may produce less interpretable models. Understanding and
explaining the predictions of complex models may require additional
effort and expertise.
Security Risks: Python's open-source nature and extensive library
ecosystem can introduce security risks, particularly when using third-
party packages or dependencies. Ensuring the security of machine
learning pipelines and protecting against vulnerabilities requires
careful attention and proactive measures.
Learning Curve: While Python's syntax is relatively easy to learn,
mastering the full spectrum of machine learning techniques and
libraries can be challenging. Beginners may face a steep learning
curve, requiring time and dedication to gain proficiency in data
preprocessing, model selection, and evaluation.
5.4 MAINTENANCE
Model Versioning: Maintain versions of the predictive models, along with
the associated data preprocessing and feature engineering pipelines.
Security Measures: Implement security measures to protect the
integrity and confidentiality of the data used in the prediction system.
Use encryption, access controls, and secure communication protocols
to safeguard sensitive information.
Scalability: Monitor system performance and scalability as the
volume of data and user traffic grows. Optimize code and
infrastructure to handle increasing workloads efficiently and ensure
timely responses to user queries.
Documentation: Maintain comprehensive documentation for the prediction
system, including model specifications, data sources, preprocessing steps,
and evaluation metrics. Document any changes made to the system over time.
5.5 APPLICATION
Table 1 Application
CHAPTER 6
CONCLUSION AND FUTURE DEVELOPMENT
Using several property characteristics, the suggested method predicts
property prices in Bangalore. We experimented with different Machine
Learning algorithms to find the best model; compared with all the other
algorithms, the Decision Tree algorithm achieved the lowest loss and the
greatest R-squared. Flask was used to create the website.
To see how the project works, run the app.py file in the backend and open
the HTML web page we generated. Enter the property's square footage, the
number of bedrooms, the number of bathrooms, and the location, then click
'ESTIMATE PRICE'; the model forecasts the cost of what may be someone's
ideal home.
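A hedged sketch of what the app.py backend might look like; the route name,
form fields, and pickled model file ("model.pkl") are assumptions for
illustration, since the report does not reproduce this code:

import pickle
import pandas as pd
from flask import Flask, request, jsonify

app = Flask(__name__)

# Assumes the trained preprocessing + regression pipeline was pickled to model.pkl
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Field names are assumed to match the HTML form
    features = pd.DataFrame([{
        "location": request.form["location"],
        "total_sqft": float(request.form["total_sqft"]),
        "bath": int(request.form["bath"]),
        "bhk": int(request.form["bhk"]),
    }])
    price = model.predict(features)[0]
    return jsonify({"estimated_price": round(float(price), 2)})

if __name__ == "__main__":
    app.run(debug=True)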
The goal of the project "House Price Prediction Using Machine Learning" is
to forecast house prices based on various features in the provided data. Our
best accuracy was around 90% after we trained and tested the model. To make
this model distinct from other prediction systems, we must include more
parameters like tax and air quality. People can purchase houses on a budget
and minimize financial loss. Numerous algorithms are used to determine
house values. The selling price was determined with greater precision and
accuracy. People will benefit greatly from this. Numerous elements that
influence housing prices must be taken into account and handled.
REFERENCE
[1] Model: “BANGALORE HOUSE PRICE PREDICTION MODEL”.
[2] Heroku: Documentation.
[3] Repository: “Web Application”, https://github.com/msatmod/Bangalore-House-Price-Prediction
[4] Repository: “Web Application”, https://github.com/Amey-Thakur/BANGALORE-HOUSE-PRICE-PREDICTION
[5] Pickle: Documentation.
[6] A. Varma, A. Sarma, S. Doshi and R. Nair, "House Price Prediction Using Machine Learning and Neural Networks," 2018 Second International Conference on Inventive Communication and Computational Technologies (ICICCT), 2018, pp. 1936-1939, doi: 10.1109/ICICCT.2018.8473231.
[7] Furia, Palak, and Anand Khandare. "Real Estate Price Prediction Using Machine Learning Algorithm." e-Conference on Data Science and Intelligent Computing. 2020.
[8] Musciano, Chuck, and Bill Kennedy. HTML & XHTML: The Definitive Guide. O'Reilly Media, Inc., 2002.
[9] Aggarwal, Shalabh. Flask Framework Cookbook. Packt Publishing Ltd, 2014.
[10] Grinberg, Miguel. Flask Web Development: Developing Web Applications with Python. O'Reilly Media, Inc., 2018.
[11] Middleton, Neil, and Richard Schneeman. Heroku: Up and Running: Effortless Application Deployment and Scaling. O'Reilly Media, Inc., 2013.
[12] Available: https://www.researchgate.net/publication/347584803_House_Price_Prediction_using_a_Machine_Learning_Model_A_Survey_of_Literature
[13] Limsombunchai, Christopher Gan, and Minsoo Lee. House price prediction using a hedonic price model vs an artificial neural network. American Journal of Applied Sciences. 3:193–201.
[14] Joep Steegmans and Wolter Hassink. An empirical investigation of how wealth and income affect one's financial status and ability to purchase a home. Journal of Housing Economics. 2017;36:8–24.
[15] Ankit Mohokar, Nihar Baghat, and Shreyash Mane. House Price Forecasting Using Data Mining. International Journal of Computer Applications. 152:23–26.
[16] Luis Torgo and Joao Gama. Regression using Classification Algorithms. Intelligent Data Analysis. 4:275–292.
[17] Fabian Pedregosa et al. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research. 12:2825–2830.
[18] Bork M. and Moller V.S. House Price Forecast Ability: A Factor Analysis. Real Estate Economics. 46:582–611.
[19] Hy Dang, Minh Nguyen, Bo Mei, and Quang Troung. Improvements to home price prediction methods using machine learning. Procedia Engineering. 174:433–442.
[20] Atharva Chogle, Priyanka Khaire, Akshata Gaud, and Jinal Jain. House Price Forecasting Using Data Mining Techniques. International Journal of Advanced Research in Computer and Communication Engineering. 6:24–28.
[21] Kai-Hsuan Chu, Li, Li. Prediction of real estate price variation based on economic parameters. International Conference on Applied System Innovation (ICASI), IEEE; 2017.
[22] Subhani Shaik, Uppu Ravibabu. Classification of EMG Signal Analysis based on Curvelet Transform and Random Forest Tree Method. Journal of Theoretical and Applied Information Technology (JATIT). 95.