COMPARISON OF PERFORMANCE OF SOME MACHINE LEARNING MODELS FOR HOUSE PRICE PREDICTION
B. Sc. (Hons)
in
Mathematics & Computing
by
PRANAB KUMAR PANDA
(7473U196013)
date
Supervisor's Certificate
This is to certify that the work presented in this project entitled "COMPARISON OF PERFORMANCE OF SOME MACHINE LEARNING MODELS FOR HOUSE PRICE PREDICTION" by PRANAB KUMAR PANDA, 7473U196013, is a record of original research/review work carried out by him/her under my supervision and guidance in partial fulfilment of the requirements of the B. Sc. (Hons) in Mathematics and Computing. Neither this project nor any part of it has been submitted for any degree or diploma to any institute or university in India or abroad.
Acknowledgements
This project supports my candidature for the academic degree of Bachelor of Science (B.Sc.); it contains a detailed explanation of my research and findings on the topic COMPARISON OF PERFORMANCE OF SOME MACHINE LEARNING MODELS FOR HOUSE PRICE PREDICTION. The study was carried out at the Institute of Mathematics and Applications, Bhubaneswar, under Utkal University, India. I would like to thank Prof. Sudarsan Padhy, who supervised the academic and professional development of this investigation into the implementation of various machine learning algorithms and the comparison of their accuracy. I also thank my family and friends, especially Suman Sourav Biswal and Prateek Kumar, for their enthusiastic support and constant encouragement, and my seniors Sampa Mondal and Swati Sucharita for their constructive comments that improved the presentation of these findings.
Abstract
This project presents a performance comparison of several machine learning algorithms. The models used in this study are multiple linear regression, ridge regression, the Least Absolute Shrinkage and Selection Operator (Lasso), XGBoost, Random Forest, SVR (Support Vector Regression) and ANN (Artificial Neural Network). Moreover, this study analyses the correlation between variables to determine the most important factors that affect house prices, using a dataset taken from kaggle.com containing data on residential homes in Ames, Iowa.
The accuracy of the predictions is evaluated by checking the Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Square Error (RMSE) and R² score of all the applied models. The tests are performed after applying the required pre-processing methods and splitting the data into two parts: one part is used in the training phase and the other in the test phase. The results show that the Random Forest regressor gives the best score among the applied algorithms. The correlation graphs show the variables' level of dependency. In addition, the empirical results show that 'EnclosedPorch', 'KitchenAbvGr' and 'OverallCond' influence house prices negatively, whereas 'GarageCars', 'GarageArea', 'GrLivArea' and 'OverallQual' impact house prices positively to a great extent. At the end of the project, we compare the accuracy of prediction of the machine learning models mentioned earlier.
Contents
1 Introduction
2 Background
3 Experiment
   3.4.2 Outliers
   3.4.3 Train-Test Split of the Dataset
4 Results and Discussion
5 Conclusion
   5.1 Ethics
A Features
B Python Code
C Github Link
Bibliography
Chapter 1
Introduction
“ Data science is the art of extracting meaningful insights from various data sources and
deriving useful information from them! ”
In this project, we compare some well-known machine learning models, namely multiple linear regression, ridge regression, Lasso, XGBoost, Random Forest, SVR (Support Vector Regression) and ANN (Artificial Neural Network), by applying them to a dataset imported from the Kaggle competition "House Prices - Advanced Regression Techniques". Our aim is to find the model that best fits our problem: predicting the price of a house. We begin with EDA (Exploratory Data Analysis), where we visualise various parameters by analysing the dataset and summarising its main characteristics. This gives us an initial impression of the data before we make any assumptions. In fact, it can help identify obvious errors, better understand patterns within the data, detect outliers or anomalous events, and find interesting relations among the variables. Then we prepare the data for modelling by cleaning it, processing the missing data, selecting the relevant variables, deriving some features, running statistical tests, defining the regression models, and finally choosing the best price-prediction model for the test set. For the comparison between models we use four common evaluation metrics, i.e., Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Square Error (RMSE) and the R² score, and we also cross-validate our findings. We have used the Python programming language in a Jupyter notebook for implementing all the tasks. The code and the respective outputs are attached at the end of this report.
Chapter 2
Background
Machine learning is a branch of artificial intelligence through which software programmes can predict outcomes more accurately without having to be explicitly instructed to do so. In order to forecast new output values, machine learning algorithms use historical data as input. Machine learning is broadly classified into three types:
1. SUPERVISED LEARNING
2. UNSUPERVISED LEARNING
3. REINFORCEMENT LEARNING
1. Supervised Learning
Supervised comes from the word supervise, which means to observe and direct the execution of a task. In supervised learning, a supervisor is required for training the machine to learn from the data. Supervised learning is mainly used for prediction and classification (for example, image classification and audio classification).
2. Unsupervised Learning
Unsupervised learning is a type of machine learning that involves algorithms that train on unlabelled data. With it we can do clustering and recommendation. Clustering means grouping similar data points in one group and dissimilar data points in others. Common unsupervised techniques include t-SNE and PCA (for dimensionality reduction) and factor analysis, and they also underlie recommendation systems.
3. Reinforcement Learning
Reinforcement learning helps us make decisions. It works towards a goal with a prescribed set of rules for accomplishing that goal. In reinforcement learning we train a robot (agent) to take decisions (actions) so that it earns the maximum reward. We teach the robot to take decisions by giving it rewards; a reward is, in general, a token of appreciation for doing the correct thing, used to judge the performance.
Multiple Linear Regression
Multiple linear regression comes under supervised machine learning and is used to estimate the relationship between one dependent variable and more than one independent variable. Identifying the correlation and its cause-effect structure helps to make predictions using these relations. To estimate these relationships well, the prediction accuracy of the model is essential, rather than the complexity of the model. However, multiple linear regression is prone to many problems, such as multicollinearity, noise and overfitting, which affect the prediction accuracy.
Ridge Regression
Ridge regression is an L2-norm regularised regression technique that was introduced by Hoerl in 1962. It is an estimation procedure to manage collinearity without removing variables from the regression model. In multiple linear regression, multicollinearity is a common problem: the least squares estimates remain unbiased, but their variances are large, so they may be far from the true values. Therefore, by adding a degree of bias to the regression model, ridge regression reduces the standard errors and shrinks the least squares coefficients towards the origin of the parameter space.
When least squares determines the values of the parameters, it minimises the sum of squared residuals. When ridge regression determines the values of the parameters, it instead minimises the sum of squared residuals plus a penalty term α · slope², where α determines the severity of the penalty and the flatness of the slope. Increasing α makes the slope asymptotically close to zero. Like Lasso, α is determined by applying the cross-validation method. Therefore, ridge regression helps to reduce variance by shrinking parameters, making the prediction less sensitive to them.
Lasso Regression
Lasso introduces a bias term like ridge regression, but instead of squaring the slope, the absolute value of the slope is added as a penalty term. The quantity minimised is

min( ∑ (yᵢ − ŷᵢ)² ) + α · |slope|

where min( ∑ (yᵢ − ŷᵢ)² ) is the least squared error and α · |slope| is the penalty term. Here α is the tuning parameter which controls the strength of the penalty term; in other words, α sets the amount of shrinkage. |slope| is the sum of the absolute values of the coefficients.
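As a concrete illustration of how α can be chosen by cross-validation, the following is a minimal sketch using scikit-learn's RidgeCV and LassoCV (which Appendix B also imports); X_train and y_train are assumed to be the training data defined in Chapter 3:

import numpy as np
from sklearn.linear_model import RidgeCV, LassoCV

# candidate penalty strengths; cross-validation picks the best one
alphas = np.logspace(-3, 3, 50)

ridge = RidgeCV(alphas=alphas, cv=5).fit(X_train, y_train)
lasso = LassoCV(alphas=alphas, cv=5, max_iter=10000).fit(X_train, y_train)

print("Best ridge alpha:", ridge.alpha_)
print("Best lasso alpha:", lasso.alpha_)
# unlike ridge, lasso can shrink some coefficients exactly to zero
print("Coefficients set to zero by lasso:", np.sum(lasso.coef_ == 0))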
XGBoost
XGBoost is an efficient implementation of gradient boosting, an ensemble technique in which new models are created that predict the residuals or errors of prior models and are then combined to make the final prediction. It is called gradient boosting because it employs a gradient descent algorithm to reduce the loss while adding new models. [7]
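To make the residual-fitting idea concrete, here is a small hand-rolled sketch of two boosting stages (an illustration only, not the report's code); X_train and y_train are assumed:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# stage 0: start from a constant prediction, the mean of the targets
pred = np.full(len(y_train), float(np.mean(y_train)))

# each stage fits a shallow tree to the current residuals and adds a
# damped version of its prediction to the running ensemble
learning_rate = 0.1
for stage in range(2):
    residuals = y_train - pred
    tree = DecisionTreeRegressor(max_depth=3).fit(X_train, residuals)
    pred = pred + learning_rate * tree.predict(X_train)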
Support Vector Machine (SVM)
SVM is a supervised machine learning algorithm used for both classification and regression. It is very effective in high-dimensional cases. In a classification problem, the objective of the SVM algorithm is to find a hyperplane in an N-dimensional space that distinctly classifies the data points. The dimension of the hyperplane depends upon the number of features: if the number of input features is two, the hyperplane is just a line; if the number of input features is three, the hyperplane becomes a 2-D plane. It becomes difficult to imagine when the number of features exceeds three.
In SVR, two decision-boundary lines are drawn on either side of the hyperplane. The objective of SVR is basically to consider the points that lie within the decision boundary lines; the best-fit line is the hyperplane that contains the maximum number of points. [6]
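A minimal sketch of fitting an SVR model with scikit-learn follows; the hyperparameter values here are illustrative, not the ones used in the project, and X_train, y_train, X_test are assumed:

from sklearn.svm import SVR

# epsilon sets the width of the tube around the fit inside which points
# incur no loss; C trades off flatness against boundary violations
svr = SVR(kernel="rbf", C=100.0, epsilon=0.1)
svr.fit(X_train, y_train)
price_pred = svr.predict(X_test)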
Random Forest
A random forest is an ensemble of decision trees; that is to say, many trees constructed in a certain "random" way form a random forest. [5] In a random forest regressor, each tree is created from a different sample of rows, and at each node a different sample of features is selected for splitting. Each of the trees makes its own individual prediction, and these predictions are then averaged to produce a single result. The averaging makes a random forest better than a single decision tree: it improves accuracy and reduces overfitting.
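A minimal sketch of the corresponding scikit-learn regressor (parameter values are illustrative; X_train, y_train, X_test are assumed):

from sklearn.ensemble import RandomForestRegressor

# 100 trees, each grown on a bootstrap sample of the rows, with a random
# subset of features considered at every split
rf = RandomForestRegressor(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
# the forest's prediction is the average of the individual tree predictions
price_pred = rf.predict(X_test)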
Artificial Neural Network (ANN)
The data held in each neuron is called its activation; the activation value ranges from 0 to 1. Each neuron is linked to all the neurons in the previous layer. Together, the activations from the previous layer decide whether a neuron will be triggered, which is done by taking all the activations from that layer and computing their weighted sum

w₁a₁ + w₂a₂ + w₃a₃ + ... + wₙaₙ    (2.3.3)
However, this weighted sum can be any number, although the activation should lie only between 0 and 1, so the output must be mapped into the accepted range. This can be done by using the sigmoid function, which puts the output in the range 0 to 1. A bias is then added so that the neuron becomes active only when the weighted sum is meaningfully large:

σ(w₁a₁ + w₂a₂ + ... + wₙaₙ + b)

where aᵢ is the activation, wᵢ represents the weight, b is the bias and σ is the sigmoid function.
Nevertheless, after getting the final activation, its predicted value needs to be compared with the actual value. The difference between these values is considered an error, and it is calculated with the cost function. The cost function measures the error in the model, which needs to be reduced. Applying back-propagation to the model reduces the error by running the procedure backwards to check how the weights and biases affect the cost function. Back-propagation is simply the process of reversing the whole transfer of activations among the neurons: the method calculates the gradient of the cost function with respect to the weights. It is performed in the training stage of a feed-forward network for supervised learning.
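A tiny numeric sketch of equation (2.3.3) with the bias and sigmoid applied; the activations, weights and bias below are made-up illustrative values:

import numpy as np

def sigmoid(z):
    # squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

a = np.array([0.3, 0.8, 0.5])   # activations from the previous layer
w = np.array([0.9, -0.4, 0.2])  # weights w1..wn
b = -0.1                        # bias

activation = sigmoid(np.dot(w, a) + b)  # sigma(w1*a1 + ... + wn*an + b)
print(activation)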
Chapter 3
Experiment
The dataset used in this project is taken from the Kaggle competition "House Prices - Advanced Regression Techniques": https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques/data
Mean Absolute Error (MAE) is the mean of the absolute values of the errors:

MAE = (1/n) ∑ᵢ₌₁ⁿ |yᵢ − ŷᵢ|

Root Mean Squared Error (RMSE) is the square root of the mean of the squared errors:

RMSE = √( (1/n) ∑ᵢ₌₁ⁿ (yᵢ − ŷᵢ)² )
NOTE: Here yᵢ represents the actual value (SalePrice) and ŷᵢ represents the predicted value.
The R² score is the proportion of the variation in the dependent variable that is predictable from the independent variable(s). It is also known as the coefficient of determination. If R² = 0, the dependent variable cannot be predicted from the independent variable(s); if R² = 1, the dependent variable can be predicted without error from the independent variable(s).
MSE is more popular than MAE because MSE "punishes" larger errors, which tends to be useful in the real world. RMSE is even more popular than MSE because RMSE is interpretable in the units of the target ("y"). The first three are loss functions, so we want to minimise them; the higher the R² score, the more accurate our predictions will be.
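For instance, the metrics can be computed directly with scikit-learn; the sale prices below are made-up illustrative values:

import numpy as np
from sklearn import metrics

y_true = np.array([208500, 181500, 223500, 140000])  # actual SalePrice
y_pred = np.array([200000, 190000, 215000, 150000])  # predicted SalePrice

print("MAE :", metrics.mean_absolute_error(y_true, y_pred))
print("MSE :", metrics.mean_squared_error(y_true, y_pred))
print("RMSE:", np.sqrt(metrics.mean_squared_error(y_true, y_pred)))
print("R2  :", metrics.r2_score(y_true, y_pred))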
The main motive of EDA is to get an overview of the numerical and categorical features of a dataset. In this project, Python libraries such as pandas, matplotlib and seaborn are imported, and their predefined functions are used for graphical representation of the various features and to see the relationships between them. We have also used a heat map to see the correlation between the numerical features. In the program we use graphs such as scatter plots and box plots to give a pictorial description of the dataset and to compare the accuracy of the different models.
3.4.2 Outliers
Outliers are data points which differ significantly from the other observations in a given dataset. They can occur because of variability in measurement or due to errors made while recording data points. [4] They can be natural, or caused by entry errors or measurement errors. In this project we have used box plots for visualising the outliers. A box plot is a graphical display for describing the distribution of the data.
A box plot uses the median and the lower and upper quartiles to showcase the distribution of the data points.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
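The report's box-plot code is only partially preserved; below is a minimal sketch of how outliers in a feature such as 'GrLivArea' might be visualised and then removed using the interquartile range, assuming the dataset is loaded in a DataFrame named ghar (the name used in Appendix B):

# visualise the distribution of 'GrLivArea' with a box plot
sns.boxplot(x=ghar['GrLivArea'])
plt.show()

# drop points outside 1.5 * IQR, a common rule of thumb
q1, q3 = ghar['GrLivArea'].quantile([0.25, 0.75])
iqr = q3 - q1
mask = ghar['GrLivArea'].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
ghar = ghar[mask]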
3.4.3 Train-Test Split of the Dataset
The procedure of splitting a dataset involves taking the dataset and dividing it into two subsets. The first subset is used to fit the model and is referred to as the training dataset. The second subset is not used to train the model; instead, the input element of this dataset is provided to the model, predictions are made, and they are compared to the expected values. This second dataset is referred to as the test dataset.
Chapter 4
Results and Discussion
In the beginning, we saw that the dataset we were going to work with had 81 columns, including 'Id', which is used for identifying houses, and 'SalePrice', which gives the price of a house. On evaluation, we found that the dataset has 38 numerical features and 43 categorical features.
We then drop all the columns having categorical features, since we wish to work on numerical features to predict the sale price of a house. Now we have 38 columns, which means 37 features (including Id) for predicting the house price. But we cannot use them without pre-processing, since on checking we see that three columns (LotFrontage, GarageYrBlt, MasVnrArea) have missing (null) values.
As we can see, 'LotFrontage' has 259 missing (null) entries out of 1460, which is around 17.7% missing. Similarly, 'GarageYrBlt' and 'MasVnrArea' have 81 and 8 missing entries respectively, so we drop these three columns. Finally, we have 34 features left to predict the sale price of a house.
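A minimal sketch of this step, assuming the DataFrame name ghar from Appendix B:

# count null entries per column, then drop the three offending columns
null_counts = ghar.isnull().sum()
print(null_counts[null_counts > 0])   # LotFrontage, GarageYrBlt, MasVnrArea

ghar = ghar.drop(columns=['LotFrontage', 'GarageYrBlt', 'MasVnrArea'])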
The code and its output given above show the features that we have retained for our work. Now we can see that we have 1460 non-null entries for each of them.
During EDA, we plot several graphs to see the correlations between 'SalePrice' and the other features. Scatter plots were produced for the relationships of '1stFlrSF', 'GrLivArea', 'LotArea' and 'YearBuilt' with 'SalePrice'.
A heat map shows the correlation between the features in the dataset: the deeper the red colour, the more positive the correlation between two features, and the deeper the blue colour, the more negative the correlation.
Since we are interested in predicting the sale price of houses, we also plotted a bar graph to see the dependence of 'SalePrice' on the other features. We then removed outliers from certain features, namely 'GarageArea', 'GrLivArea', '1stFlrSF' and 'TotalBsmtSF', and visualised them with the help of box plots. After this, we have 1347 entries to work with.
Then we split the dataset into two parts for training and testing the ML models in the ratio 3:1. Thus, 75% of the data is used for training the ML models and 25% for testing the accuracy of their predictions.
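A minimal sketch of this split with scikit-learn, assuming X holds the 34 retained predictors and y the target 'SalePrice' (the random_state value is illustrative):

from sklearn.model_selection import train_test_split

# test_size=0.25 gives the 3:1 train-test ratio described above
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)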
Next, we define some functions for our evaluation metrics using predefined functions from the sklearn library. This helps us avoid repeating the same block of code again and again in the program. The evaluate() function is defined to compute all four metrics that we use to compare the accuracy of the ML models, and print_evaluate() is defined to print the results.
import numpy as np
from sklearn import metrics
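Only the opening imports of these definitions survive in the listing; a plausible reconstruction, consistent with how evaluate() is unpacked into results_df in Appendix B, might look like this:

def evaluate(true, predicted):
    # return the four metrics in the order used in results_df
    mae = metrics.mean_absolute_error(true, predicted)
    mse = metrics.mean_squared_error(true, predicted)
    rmse = np.sqrt(mse)
    r2_square = metrics.r2_score(true, predicted)
    return mae, mse, rmse, r2_square

def print_evaluate(true, predicted):
    mae, mse, rmse, r2_square = evaluate(true, predicted)
    print('MAE:', mae)
    print('MSE:', mse)
    print('RMSE:', rmse)
    print('R2 Square:', r2_square)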
ML MODELS IMPLEMENTATION
We begin with multiple linear regression. The code below fits a multiple linear regression model on the training data (X_train and y_train); the model is then used to predict the sale price of the houses in the test set (X_test). Finally, with the help of our user-defined functions, we evaluate the accuracy of the multiple linear regression model and store the result in a dataframe, results_df, so that we can compare all the results together later.
from sklearn.linear_model import LinearRegression
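The body of this snippet is only partially preserved; a minimal reconstruction (the results_df line is taken from Appendix B, while the fit/predict steps are assumed) could be:

lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)
test_pred = lin_reg.predict(X_test)

print_evaluate(y_test, test_pred)
results_df = pd.DataFrame(
    data=[["Linear Regression", *evaluate(y_test, test_pred)]],
    columns=['Model', 'MAE', 'MSE', 'RMSE', 'R2 Square'])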
Then we applied ridge regression after importing the predefined functions from sklearn.linear_model. Again we fit it to the training data and check the accuracy by applying our user-defined functions to the test data.
from sklearn.linear_model import Ridge, RidgeCV
Lasso regression and the XGBoost regressor were applied in the same way:
from xgboost import XGBRegressor

model = XGBRegressor()
model.fit(X_train, y_train)
The Random Forest regressor and the SVR model were fitted and evaluated in the same manner. Finally, we built an artificial neural network:
from tensorflow.keras.models import Sequential

model = Sequential()
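Only the first line of the network definition survives; a minimal sketch of a regression ANN of the kind described in Chapter 2, with hypothetical layer sizes and training settings, could be:

from tensorflow.keras.layers import Dense

model.add(Dense(64, activation='relu', input_shape=(X_train.shape[1],)))
model.add(Dense(32, activation='relu'))
model.add(Dense(1))                      # single output: predicted SalePrice

model.compile(optimizer='adam', loss='mse', metrics=['mae'])
model.fit(X_train, y_train, epochs=100, batch_size=32, verbose=0)
test_pred = model.predict(X_test).ravel()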
After applying these ML models, we represent the values of the evaluation metrics pictorially. Comparing them, we can see that the Random Forest regressor has the maximum R² score and the minimum Mean Absolute Error (MAE), Mean Squared Error (MSE) and Root Mean Square Error (RMSE).
From the bar graphs, we can see that the Random Forest regressor performs better than the other ML models used in the project.
Chapter 5
Conclusion
After running the Python program, we compared the ML models that we had taken. The final results are as follows:
From the results, we get an MAE of about 15921 for the Random Forest regressor, the least among all the models used in this project. This means that, on average, the predicted house prices differ from the actual prices by about $15,921. The highest MAE, about 24120, came from the ANN, which makes it the poorest ML model for prediction, at least under the conditions of this project. Usually, an ANN requires a lot of training data to reach good accuracy during testing.
The final results of this project show that the Random Forest regressor makes better predictions than the other algorithms used, since it gives the least MAE, MSE and RMSE and the highest R² score.
Although this study has shown that the Random Forest regressor makes the best prediction here, one cannot guarantee that it will perform the same when used for purposes other than the ones presented in this study. In fact, if we used different parameters for models like SVM and ANN, their performance might improve too. Also, instead of deleting the features that have some null entries, we could fill them with the mean or mode of the observed values so that we retain more predictors; this could also boost the accuracy of some ML models.
5.1 Ethics
This project is taking into consideration the ethical part, where the dataset is down-
loaded from a public website called Kaggle. Kaggle allows users to find and publish data
sets, explore and build models in a web-based data-science environment, and work with
other data scientists.In this project, the algorithms are public and open source. In addition,
the algorithms are trained and tested on the same public dataset.
Some directions for future work are:
• Changing the way we handled missing values: we could fill them with the mean or mode of the observations present in that column (see the sketch after this list). This would give us more predictors, which could also boost the accuracy of some ML models.
• Making use of the categorical features, which have not been used in this project.
• Tweaking the ANN by using different parameters that could help it train better.
• The pre-processing methods used do help the prediction accuracy; however, one could experiment with different combinations of pre-processing methods to achieve better accuracy.
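As a hedged illustration of the first point, assuming the raw dataframe is named ghar as in Appendix B:

# instead of dropping columns with nulls, impute them:
# mean for a continuous feature, mode for a discrete one
ghar['LotFrontage'] = ghar['LotFrontage'].fillna(ghar['LotFrontage'].mean())
ghar['GarageYrBlt'] = ghar['GarageYrBlt'].fillna(ghar['GarageYrBlt'].mode()[0])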
Appendix A
Features
FEATURE DESCRIPTION
SalePrice the property’s sale price in dollars
MSSubClass The building class
MSZoning The general zoning classification
LotFrontage Linear feet of street connected to property
LotArea Lot size in square feet
Street Type of road access
Alley Type of alley access
LotShape General shape of property
LandContour Flatness of the property
Utilities Type of utilities available
LotConfig Lot configuration
LandSlope Slope of property
Neighborhood Physical locations within Ames city limits
Condition1 Proximity to main road or railroad
Condition2 Proximity to main road or railroad (if a second is present)
BldgType Type of dwelling
HouseStyle Style of dwelling
OverallQual Overall material and finish quality
OverallCond Overall condition rating
YearBuilt Original construction date
YearRemodAdd Remodel date
RoofStyle Type of roof
RoofMatl Roof material
Exterior1st Exterior covering on house
Exterior2nd Exterior covering on house (if more than one material)
MasVnrType Masonry veneer type
MasVnrArea Masonry veneer area in square feet
ExterQual Exterior material quality
ExterCond Present condition of the material on the exterior
Foundation Type of foundation
BsmtQual Height of the basement
BsmtCond General condition of the basement
BsmtExposure Walkout or garden level basement walls
BsmtFinType1 Quality of basement finished area
BsmtFinSF1 Type 1 finished square feet
BsmtFinType2 Quality of second finished area (if present)
BsmtFinSF2 Type 2 finished square feet
BsmtUnfSF Unfinished square feet of basement area
TotalBsmtSF Total square feet of basement area
Heating Type of heating
HeatingQC Heating quality and condition
CentralAir Central air conditioning
Electrical Electrical system
1stFlrSF First Floor square feet
2ndFlrSF Second floor square feet
LowQualFinSF Low quality finished square feet (all floors)
GrLivArea Above grade (ground) living area square feet
BsmtFullBath Basement full bathrooms
BsmtHalfBath Basement half bathrooms
FullBath Full bathrooms above grade
HalfBath Half baths above grade
Bedroom Number of bedrooms above basement level
Kitchen Number of kitchens
KitchenQual Kitchen quality
TotRmsAbvGrd Total rooms above grade (does not include bathrooms)
Functional Home functionality rating
Fireplaces Number of fireplaces
FireplaceQu Fireplace quality
GarageType Garage location
GarageYrBlt Year garage was built
GarageFinish Interior finish of the garage
GarageCars Size of garage in car capacity
GarageArea Size of garage in square feet
GarageQual Garage quality
GarageCond Garage condition
PavedDrive Paved driveway
WoodDeckSF Wood deck area in square feet
OpenPorchSF Open porch area in square feet
EnclosedPorch Enclosed porch area in square feet
3SsnPorch Three season porch area in square feet
ScreenPorch Screen porch area in square feet
PoolArea Pool area in square feet
PoolQC Pool quality
Fence Fence quality
MiscFeature Miscellaneous feature not covered in other categories
MiscVal Value of miscellaneous feature
MoSold Month Sold
YrSold Year Sold
SaleType Type of sale
SaleCondition Condition of sale
Appendix B
Python Code
Below is the Python code that we wrote to implement and evaluate the accuracy of the various ML models.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

ghar.info()

ghar.describe()

ghar.columns

categorical_feats = ghar.dtypes[ghar.dtypes == "object"].index
print("Number of Categorical features: ", len(categorical_feats))
ghar.columns

ghar.shape

ghar.info()
# EDA

features = []
for i in ghar.columns:
    features.append(i)

import warnings
warnings.filterwarnings("ignore")
# MODELS
# LINEAR REGRESSION
from sklearn.linear_model import LinearRegression

results_df = pd.DataFrame(data=[["Linear Regression", *evaluate(y_test, test_pred)]],
                          columns=['Model', 'MAE', 'MSE', 'RMSE', 'R2 Square'])

# RIDGE
from sklearn.linear_model import Ridge, RidgeCV
results_df_2 = pd.DataFrame(data=[["Ridge", *evaluate(y_test, test_pred)]],
                            columns=['Model', 'MAE', 'MSE', 'RMSE', 'R2 Square'])
results_df = results_df.append(results_df_2, ignore_index=True)

# LASSO
from sklearn.linear_model import Lasso, LassoCV
results_df_3 = pd.DataFrame(data=[["Lasso", *evaluate(y_test, test_pred)]],
                            columns=['Model', 'MAE', 'MSE', 'RMSE', 'R2 Square'])
results_df = results_df.append(results_df_3, ignore_index=True)

# XGB
from numpy import loadtxt
from xgboost import XGBRegressor

results_df_4 = pd.DataFrame(data=[["XGBRegressor", *evaluate(y_test, test_pred)]],
                            columns=['Model', 'MAE', 'MSE', 'RMSE', 'R2 Square'])
results_df = results_df.append(results_df_4, ignore_index=True)
results_df_5 = pd.DataFrame(data=[["Random Forest Regressor", *evaluate(y_test, test_pred)]],
                            columns=['Model', 'MAE', 'MSE', 'RMSE', 'R2 Square'])
results_df = results_df.append(results_df_5, ignore_index=True)

# SVM
from sklearn.svm import SVR

results_df_6 = pd.DataFrame(data=[["SVM Regressor", *evaluate(y_test, test_pred)]],
                            columns=['Model', 'MAE', 'MSE', 'RMSE', 'R2 Square'])
results_df = results_df.append(results_df_6, ignore_index=True)

# ANN
results_df_7 = pd.DataFrame(data=[["Artificial Neural Network", *evaluate(y_test, test_pred)]],
                            columns=['Model', 'MAE', 'MSE', 'RMSE', 'R2 Square'])
results_df = results_df.append(results_df_7, ignore_index=True)
results_df
Appendix C
Github Link
Bibliography
[2] Oracle. "What is AI? Learn about artificial intelligence." https://www.oracle.com/in/artificial-intelligence/what-is-ai/
[5] Neptune.ai. "Random forest regression: When does it fail and why?" https://neptune.ai/blog/random-forest-regression-when-does-it-fail-and-why