House Price Prediction Using Machine Learning
DOI: 10.54254/2755-2721/53/20241426
Chenxi Li
School of International Education, Guangdong University of Technology, No. 11, Guangzhou, China
Abstract. The real estate industry's role in economic development and social progress reflects the economic well-being of individuals and regions. As people's income levels rise, so does the demand for housing. More accurate house price forecasts therefore help people choose the best strategy when they need to buy a house. This study focuses on house price prediction in King County, Washington, a diverse real estate market. It applies supervised machine learning models, namely linear regression, random forest, neural networks, and XGBoost, to house price forecasting. Random forest and XGBoost are implemented using Scikit-Learn tools, and the feedforward neural network includes a dropout layer to reduce overfitting. The findings reveal that XGBoost achieves the highest accuracy, making it well-suited for precise price predictions. Additionally, the research identifies grade, sqft_living, and latitude as the three features that most strongly influence house prices in the dataset.
Keywords: House price prediction, linear regression, random forest, neural networks, XGBoost.
1. Introduction
The real estate industry is crucial for economic development and societal progress, reflecting the
aspirations of individuals and families and the overall economic health of a region. A. H. Maslow states
in A Theory of Human Motivation: “Undoubtedly these physiological needs are the most pre-potent of
all needs” (1943, p.374) [1]. Among physiological needs, shelter (house), as a necessity, is essential for
people. Hence, it is important for stakeholders such as policymakers, real estate professionals, and homeowners to comprehend the factors that influence housing prices. In this context, the study of house price prediction
has gained significant attention, given its potential to offer insights into the factors that drive housing
market fluctuations and their importance for various stakeholders.
This paper focuses on the task of house price prediction in King County, Washington. King County,
situated in the heart of the Pacific Northwest, represents a diverse and dynamic real estate market,
characterized by a mix of urban, suburban, and rural areas. With its vibrant economy, cultural attractions,
and natural beauty, King County has drawn a diverse population, contributing to the complexity of its
housing market. It is the most populous county in Washington and the 13th most populous county in the US [2]. According to Maslow's hierarchy of needs, the base level of the pyramid is physiological
needs, which include shelter. Thus, there is currently a significant need for residential properties in King
County.
© 2024 The Authors. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0
(https://creativecommons.org/licenses/by/4.0/).
Proceedings of the 4th International Conference on Signal Processing and Machine Learning
The importance of precise price predictions in the real estate market is profound, as these forecasts
have the potential to significantly influence the decisions of a multitude of stakeholders, including
prospective homebuyers, sellers, real estate agents, investors, and policymakers. An accurate and
reliable predictive model is, therefore, a cornerstone of informed decision-making in the housing
industry.
This research goes beyond previous economic analyses and applies well-founded machine learning models to this problem. Four supervised learning models (linear regression, RF, ANN, and XGBoost) are used to predict the relationship between the different features of a house and its price. The results identify the important features that influence house prices, providing a guide for future house purchases or investments.
The paper is organized as follows: Section 2 covers data processing, including data selection and pre-processing. Section 3 presents the methodology, including the four models used to process the data. Section 4 presents further experiments based on the results produced by the models. Finally, Section 5 concludes.
2. Data Processing
After processing the dataset and its features, the final dataset contains 21,611 examples with 19 features (including price), 6 of which are categorical and 12 of which are numerical. Examples 12 and 19 were removed due to the absence of the sqft_above feature. Additionally, the features id and date were excluded on account of their subjectivity. Each property attribute is described in Table 1.
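These cleaning steps can be sketched with pandas; the function below is an illustrative reconstruction, not the study's code:

```python
import pandas as pd

def clean_kc_data(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning steps described above to the raw King County data."""
    # Drop examples missing the sqft_above feature.
    df = df.dropna(subset=["sqft_above"])
    # Exclude id and date on account of their subjectivity.
    return df.drop(columns=["id", "date"])
```

Applied to the raw Kaggle file [3], this leaves the 21,611 examples and 19 features used in the study.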
Table 1. List of Attributes (selected important attributes)

Attribute Name | Data Type | Description
bedrooms | int64 | Number of bedrooms
bathrooms | float64 | Number of bathrooms
sqft_living | int64 | Size of the apartment's internal living area in square feet
grade | int64 | A scale from 1 to 13, with 1-3 representing poor building construction and design, 7-11 representing average, and 11-13 representing excellent building construction and design
sqft_above | int64 | The area of interior housing that is above ground, measured in square feet
yr_built | int64 | The year the house was initially built
lat | float64 | Latitude
long | float64 | Longitude
sqft_living15 | int64 | The interior living area of the 15 closest neighbors' dwellings
3. Methodology
Model Prediction: We use the test dataset to make predictions with the trained model, obtaining
predicted house prices.
3.4. XGBoost
XGBoost is an implementation of gradient boosting machines (GBM), known as one of the top-performing supervised learning algorithms. It is applicable to both regression and classification problems [9]. The structure of XGBoost is shown in Figure 6 [10]. XGBoost works as follows:
⚫ Initializing the model:
Initialize a model, typically a decision tree with only one leaf node. The initial leaf node's prediction is set to the average of all target values in the training data:
F0(x) = (1/m) Σ_{i=1}^{m} y_i, where m is the number of training samples (seven tenths of the 21,611 examples in our research).
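This initialization simply predicts the mean of the training prices; a toy numeric illustration (the prices below are made-up stand-ins, not the study's data):

```python
import numpy as np

# Toy target values standing in for the training prices
# (the real training split holds seven tenths of the 21,611 examples).
y_train = np.array([300_000.0, 450_000.0, 600_000.0])

# F0(x): the single initial leaf predicts the mean of all training targets.
F0 = y_train.sum() / len(y_train)
print(F0)  # 450000.0
```

Each subsequent tree is then fitted to the residuals of this running prediction.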
The final model output is used for making predictions in regression problems.
4.1.1. The Result of Linear Regression. Before running the linear regression model, the relationships between the different factors were analyzed (Figure 7). Most of the relationships between price and the
area of different parts of the house are linear, which satisfies the prerequisite for running a linear regression model.
Following the analysis of the relationship between price and area, a simple LR model relating price to the house's living area was built, using gradient descent to minimize the cost function (Figure 8).
Figure 8. Linear regression model for price and living area (Original)
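A minimal sketch of such a gradient-descent fit on a single standardized feature (illustrative; not the study's code):

```python
import numpy as np

def fit_simple_lr(x, y, lr=0.05, epochs=2000):
    """Fit price = w * x + b by batch gradient descent on the MSE cost.

    The feature is standardized first so a single learning rate works
    across feature scales.
    """
    x = (x - x.mean()) / x.std()
    w, b = 0.0, 0.0
    m = len(x)
    for _ in range(epochs):
        err = w * x + b - y            # residuals of the current fit
        w -= lr * (err * x).sum() / m  # dJ/dw for J = (1/2m) * sum(err^2)
        b -= lr * err.sum() / m        # dJ/db
    return w, b
```

On the standardized feature, w approximates the slope of price against living area and b the mean price.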
Subsequently, after evaluating the performance of the simple linear regression model, we proceeded
to develop a multiple linear regression model based on our hypothesis using the Ordinary Least Squares
(OLS) regression test. The resulting model demonstrated a commendable R-squared value of 0.706,
affirming the appropriateness of the linear regression model for this research.
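The multiple-regression step can be sketched with scikit-learn, where `model.score` returns the R-squared statistic reported above; the toy data below are stand-ins for the real feature matrix and prices:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy stand-ins: in the study, X holds the house features and y the prices.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([3.0, -1.0, 2.0]) + rng.normal(scale=0.5, size=200)

model = LinearRegression().fit(X, y)
r2 = model.score(X, y)  # R-squared, the statistic reported above (0.706)
```

The fitted coefficients in `model.coef_` play the role of the OLS coefficients discussed later for feature importance.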
4.1.2. The Result of Random Forest and XGBoost. For hyperparameter optimization of the Random Forest and XGBoost models, we used the RandomizedSearchCV function from Python's scikit-learn library to achieve better model performance. The following two graphs (Figure 9 and Figure 10) show the performance improvement of the two algorithms over 100 iterations. Their RMSLE declines significantly and stabilizes at about 80 iterations, at which point the hyperparameters reach their optimal level.
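A sketch of this tuning step for the random forest; the search space and toy data below are illustrative assumptions, not the study's exact settings, and the same call works for an XGBoost estimator:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Toy data standing in for the training split.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Illustrative search space (an assumption, not the study's grid).
param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions,
    n_iter=5,        # the study ran about 100 iterations
    cv=3,            # cross-validation folds
    random_state=0,
)
search.fit(X, y)
best = search.best_params_  # the hyperparameters of the best sampled model
```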
Our experiments yield an R-squared value of 0.878 for the random forest algorithm and 0.888 for the XGBoost algorithm.
4.1.3. The Result of the Artificial Neural Network. In the ANN, we use the ReLU activation function because its characteristics help in training deep neural networks: ReLU is simpler and more efficient to compute than other activation functions, and because it does not suffer from the vanishing gradient problem, the network can be trained more easily. The TensorFlow library is used for hyperparameter optimization.
Figure 11 below shows the training of the ANN at different learning rates to find the best one. When the learning rate is 0.1, the MSE quickly reaches a relatively stable value, but it is easy to miss the local minimum and the best parameter combination. When the learning rate is 0.0001, it takes a long time for the MSE to stabilize, and the time cost is high. Therefore, we chose 0.001, with a moderate rate of decline, as the learning rate for the hyperparameter configuration.
The ANN with a 0.001 learning rate gives an R-squared value of 0.846.
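A minimal TensorFlow/Keras sketch of a feedforward network with a dropout layer and the chosen 0.001 learning rate; the layer widths and dropout rate are illustrative assumptions, since the exact architecture is not reported here:

```python
import numpy as np
import tensorflow as tf

# Three-layer feedforward network over the 18 predictor features.
# Layer widths and the 0.2 dropout rate are illustrative assumptions.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(18,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.2),   # dropout layer to curb overfitting
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),       # single output: predicted price
])

# Adam with the chosen learning rate of 0.001, minimizing MSE.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="mse")
```

Training would then call `model.fit(X_train, y_train, ...)` on the 70% training split.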
Table 2 above compares the algorithms used in this research. XGBoost gives the highest accuracy, 88.8 percent, while linear regression is the lowest at 70.6 percent, with the neural network at 84.6 percent and random forest at 87.8 percent. The degree of fit of the four algorithms is also shown below in Figure 12.
Overall, XGBoost and RF are the more complex, nonlinear models; they can better capture complex relationships and nonlinear patterns in the data (the relationship between house prices and features is generally complex and nonlinear). In contrast, linear regression performs poorly when dealing with nonlinear relationships. The ANN sits somewhere in between: it can learn nonlinear relationships, but its performance depends heavily on the choice of network structure and parameter tuning (the neural network in our study is a relatively simple three-layer feedforward network with default parameters, and the amount of data is limited).
Adaptive advantage of ensemble methods: XGBoost and RF are both ensemble learning methods that improve performance by combining multiple weak learners. This makes them more robust and predictive, able to deal effectively with noise and outliers.
Properties of the dataset itself: given the complex nonlinear relationships, interactions, and feature importance patterns in house price data, XGBoost and RF are often better able to fit these properties. The ANN can also learn nonlinear relationships, but it requires more data and tuning to reach its full performance, and the dataset here may not be large enough for the ANN to learn further and improve.
While running the RF, LR, and XGBoost models (the ANN model generally does not provide feature importances), we assessed the importance of the features in our dataset. For linear regression, the coefficients trained by the multiple linear regression were used as feature importances; for RF and XGBoost, the feature_importances_ attribute gives each feature's contribution, which we used to rank the features. The resulting comparison is shown below (Table 3).
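Obtaining these contributions from a fitted model can be sketched as follows; the toy data stand in for the house features, and `xgboost.XGBRegressor` exposes the same `feature_importances_` attribute:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Toy data; in the study, X holds the house features and y the prices.
X, y = make_regression(n_samples=300, n_features=4, random_state=0)

rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Per-feature contribution scores; they sum to 1 for a random forest.
importances = rf.feature_importances_
ranking = np.argsort(importances)[::-1]  # feature indices, most important first
```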
Table 3. Feature Importance

Feature | Linear regression | Random forest | XGBoost
Bedrooms | 0.06 | 0.0037 | 0.0038
Bathrooms | 0.07 | 0.0071 | 0.0160
Sqft_living | 0 | 0.2731 | 0.2197
Sqft_lot | 0 | 0.0144 | 0.0072
Floors | 0.01 | 0.0020 | 0.0039
Waterfront | 0.97 | 0.0317 | 0.1151
View | 0.09 | 0.0136 | 0.0756
Condition | 0.04 | 0.0031 | 0.0088
Grade | 0.16 | 0.2959 | 0.3139
Sqft_above | 0 | 0.0238 | 0.0173
Table 3. (continued).
5. Conclusion
This article examines four different house price prediction models and compares their accuracy: linear regression, random forest, neural networks, and XGBoost. It also explores the impact of each feature on price.
In the experiments, XGBoost has the highest accuracy and linear regression the lowest. Therefore, when accuracy matters most, the XGBoost method is the more suitable choice for predicting house prices. Although linear regression has the lowest accuracy compared with neural networks, random forest, and XGBoost, its accuracy is relatively close to that of the other three methods. Since linear regression is simple and straightforward, it remains an option when time is limited and the accuracy requirements are not as high. More research is needed on this point.
Regarding feature importance, the three most influential features affecting house prices in our dataset are lat, waterfront, and grade. In other words, for the houses we studied, latitude, waterfront condition, and house grade affect prices the most. This result may hold true for other houses and be helpful in their price predictions, but more research is needed to confirm this.
Future research should focus on comparing the time and space complexity of random forest and linear regression, and on how the features of houses in other areas affect their prices.
References
[1] Maslow, A. H. (1943). A theory of human motivation. Psychological Review, 50 (4), 370-96.
[2] “King County, Washington”, Wikipedia, 31 October 2023, https://en.wikipedia.org/wiki/King_County,_Washington
[3] House Sales in King County, USA,
https://www.kaggle.com/datasets/harlfoxem/housesalesprediction/data
[4] Visualization-on-a-Map https://www.kaggle.com/code/chrisbronner/regression-r2-0-82-and-
map-visualization#3.
[5] N. N. Ghosalkar and S. N. Dhage, "Real Estate Value Prediction Using Linear Regression," 2018
Fourth International Conference on Computing Communication Control and Automation
(ICCUBEA), Pune, India, 2018, pp. 1-5, doi:10.1109/ICCUBEA.2018.8697639
[6] Breiman L. Random Forests. SpringerLink. https://doi.org/10.1023/A:1010933404324 (accessed
September 11, 2019).
[7] Raschka S, Mirjalili V. Python machine learning: Machine learning and deep learning with
Python, scikit-learn, and TensorFlow. 2nd ed. Birmingham: Packt Publishing; 2017.
[8] Cireşan, D.C.; Meier, U.; Gambardella, L.M.; Schmidhuber, J. Deep, big, simple neural nets for
handwritten digit recognition. Neural Comput. 2010, 22, 3207–3220.
[9] T. Chen, C. Guestrin XGBoost: A scalable tree boosting system, Association for Computing
Machinery (2016), pp. 785-794
[10] W. Dong, Y. Huang, B. Lehane, G. Ma, XGBoost algorithm-based prediction of concrete electrical resistivity for structural health monitoring, Automation in Construction, 114 (2020), 103155.