Performance Comparison of Simple Regression, Random Forest and XGBoost Algorithms for Forecasting Electricity Demand
2 authors, including:
Erkan Duman
Firat University
All content following this page was uploaded by Erkan Duman on 02 January 2023.
Abstract: Electrical energy is the locomotive of a country's economy, industry, and development. To meet demand when it peaks and to keep market participants from incurring economic losses when it is at its lowest, demand must be predicted as closely as possible. Load forecasting is very important in planning the generation, transmission, and management of energy and in pricing electricity appropriately. Regional, demographic and meteorological variables influence the energy production plan, and these factors affect every aspect of the electricity market run by the system operator. An energy forecasting plan is needed to keep energy supply and demand in balance. Today, the availability of large data sets benefits the training of machine learning models and artificial neural networks, and models built on such data achieve very high performance. In this study, Turkey's electricity consumption between 2018 and 2021 was modeled using Linear Regression, Random Forest and XGBoost, all supervised machine learning techniques. Short-term hourly load forecasts were produced taking into account meteorological factors and national public holidays, and the forecasting performance of the three algorithms was compared. Putting the results of this study into practice is expected to help eliminate the energy supply-demand imbalance.

Keywords: Electricity Consumption, Regression Techniques, Machine Learning, Supervised Learning.

I. INTRODUCTION

Energy is one of the main needs of social and economic development in the world and is used especially in industry, residences and workplaces. Electrical energy is an important indicator of the level of development and welfare of countries, because it is used in almost every area of daily life and is easily converted into other types of energy [1]. Electrical energy is easy to use and does not create environmental waste; it is therefore used more than other energy sources. For electrical energy to be used without interruption and efficiently, production and consumption must take place at the same time [2].

Population growth, the need for development, industrialization, urbanization, globalization and the trade opportunities that grow with them increase the demand for natural energy resources and for energy. This leads countries to concentrate even more on energy and to conduct extensive research in this field. Since electrical energy cannot be stored in large quantities, energy facilities and production must be planned so that demand is met when it occurs [3]. The first condition for planning energy production is to produce energy demand forecasts. With realistic forecasts, investments in energy systems and service to end users can be brought to the best levels.

II. RELATED WORKS

Pence et al. trained artificial neural networks on industrial electricity consumption data for the period 1970-2016. Their model aimed to predict electricity consumption for 2017-2023. The neural network was tested with the single-output cross-validation method. As a result, electricity consumption for the years 2017-2023 was estimated with high performance.

Wang et al. used an approach based on the Long Short-Term Memory (LSTM) method to predict periodic energy consumption. Correlation and mechanism analysis were used to find secondary variables. In addition, a time variable was defined to capture the full periodicity, and an LSTM network was built to model the sequential data. The results showed that the LSTM method achieved higher forecasting performance than traditional forecasting methods such as the Autoregressive Moving Average (ARMA) model, the Autoregressive Fractionally Integrated Moving Average (ARFIMA) model and the Back Propagation Neural Network (BPNN) [4].

In their experiments, Ayvaz et al. used deep learning methods for electricity consumption forecasting. They predicted daily consumption from a time series dataset containing one year of hourly electricity consumption. After the necessary data preparation, the LSTM, GRU and ARIMA methods were tested on the data set for time series analysis. Hyperparameter tuning was performed for accurate estimation: they searched for the parameters that produced the best results for each model type and compared the results obtained with those parameters. When the test results of the models were compared, the LSTM model made the most accurate prediction with 13 percent. The GRU method predicted with a lower error rate than the LSTM. The ARIMA method was found to be the least successful estimation method when compared to the two deep learning methods [5].

In their thesis study, Tokgöz et al. used time series forecasting methods based on Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU) to forecast Turkey's electricity consumption. As a result of their experiments, they obtained a better average
Authorized licensed use limited to: ULAKBIM UASL - Firat Universitesi. Downloaded on January 02,2023 at 09:47:24 UTC from IEEE Xplore. Restrictions apply.
absolute error percentage compared to the ARIMA and artificial neural network studies applied in the past [6].

Nespoli et al., in their thesis study, examined how the accuracy of day-ahead load forecasting depends on the data typology used in LSTM training. A real case of an Italian industrial energy load was examined, with data recorded every 15 minutes for the years 2017 and 2018 [7].

Le et al. proposed an electrical energy consumption forecasting model that combines a Convolutional Neural Network (CNN) with Bidirectional Long Short-Term Memory (Bi-LSTM). In their experimental results, they found that this combination outperformed state-of-the-art approaches on various performance measures for real-time, short-term, medium-term and long-term variations of the individual household electrical power consumption dataset [8].

Bedi et al. applied their work to the power system of the Union Territory of Chandigarh, India. Electricity demand forecasts were made with the LSTM method based on seasonal, day and interval data. In this study, the concept of window-based active learning was added to improve the prediction results [9].

Zheng et al. investigated a Long Short-Term Memory (LSTM)-based Recurrent Neural Network (RNN) to overcome the difficulties of short-term electric load forecasting that arise because the time series is nonlinear and non-stationary. They accurately predicted complex electric load time series by exploiting the long-term forecasting capability of the LSTM-based RNN [10].

Ozkurt et al. used the Long Short-Term Memory (LSTM) deep learning method to predict electricity consumption accurately and thereby minimize the energy imbalance cost. Hourly forecasts were made using hourly electricity generation data from the official website of Energy Markets Operations Inc. They tested historical windows of 24, 48 and 72 hours and concluded that a single-layer LSTM network makes successful predictions [11].

Kurt et al. examined the performance of the XGBoost and Random Forest algorithms in detecting and classifying network-based attacks. When the machine learning methods were compared, the sensitivity and precision values of the XGBoost algorithm were higher than those of the Random Forest algorithm.

Abbasi et al. converted Australian Energy Market Operator load data into weekly time series to predict future load for use in smart grids. They used the XGBoost algorithm to extract features from the data and to predict the electrical load for a single time lag. XGBoost was observed to perform extremely well in terms of processing time and memory use [13].

In our study, the forecasting performance of the Linear Regression, Random Forest and XGBoost machine learning algorithms was examined using Turkey's hourly total consumption data for the years 2018-2021, the temperature values of the provinces with the highest consumption, and the electrical load values from 24 hours earlier.

III. MATERIAL AND METHODS

A. Linear Regression

Regression analysis is a method used to analyze the relationship between one or more independent variables (x) and a continuous dependent variable (y) [14]. Linear regression assumes that the output values can be approximated by a linear function of the input values. In other words, it corresponds to the situation where the data set used and other unknown values lie close to a common hyperplane. A linear regression approach is based on lines, planes, or hyperplanes. Therefore, it cannot adapt well to datasets with high dispersion [15]. A simple linear regression uses two continuous variables and predicts the dependent variable y from the independent variable x as shown in Equation 1 [16].

y = a + bx + ε    (1)

In the equation, y is the dependent variable, x is the independent variable, a is the intercept (how far the line is shifted), b is the slope of the line, and ε represents the amount of error.

B. Ensemble Learning

The concept of ensemble learning, also called collective learning, covers a set of machine learning methods developed to improve the prediction performance of algorithms by combining predictions from multiple models [17]. Ensemble learning also means that several weak learners come together to form a strong learner [18]. Ensemble learning is used in many areas of real life. Physicians reaching a diagnosis through a joint decision, or a buyer reading reviews of a product and deciding on that basis before purchasing it, can be given as examples of this type of learning [19].

The stacking method works with a set of weak learners such as SVM (Support Vector Machine), logistic regression and decision trees. Each classifier is trained independently of the others. To produce the final result, majority voting, the average of all outputs, or an auxiliary classifier trained on the intermediate predictions is used.

The blending method follows the same approach as the stacking method. However, only a validation set held out from the training set is used to make predictions [20].

In the blending method, the training set is first divided into a new training set and a validation set. The predictions of the base classifiers on these data form the meta-training data. Blending is simpler than stacking and can therefore eliminate the problem of data leakage. However, blending may cause overfitting [21].

The term bagging is a combination of the words Bootstrap and Aggregating. The bagging technique is based on combining the predictions obtained from multiple estimators (decision trees) built with the bootstrap technique. In tests on data sets, bagging gave successful results with linear regression and with classification and regression trees [22].

A Random Forest is a combination of tree estimators. Each tree depends on the values of a random vector that has the same distribution for all trees [23].
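Equation 1 in Section III-A can be fitted by ordinary least squares. The following is a minimal sketch of that fit using only the Python standard library; it is an illustration and not the implementation used in the study. The function name `fit_simple_linear_regression` and the toy temperature and load values are assumptions made up for the example.

```python
from statistics import mean

def fit_simple_linear_regression(xs, ys):
    """Return (a, b): the intercept and slope that minimize the squared error."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept
    return a, b

# Hypothetical toy data (not from the paper): the points lie exactly on y = 26 + 0.5 * x.
temps = [10.0, 15.0, 20.0, 25.0, 30.0]
loads = [31.0, 33.5, 36.0, 38.5, 41.0]

a, b = fit_simple_linear_regression(temps, loads)

# The residuals play the role of the error term in Equation 1.
residuals = [y - (a + b * x) for x, y in zip(temps, loads)]
```

Because the toy points are exactly linear, the residuals vanish; on real load data they would not, which is the dispersion limitation noted in Section III-A.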
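The bagging idea described in Section III-B (bootstrap resampling followed by aggregation) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the procedure of [22] or of the study: the base estimator is a one-split regression tree (a stump) on a single feature, and the names `fit_stump`, `bagging_fit` and `bagging_predict` as well as the step-shaped toy data are hypothetical.

```python
import random
from statistics import mean

def fit_stump(xs, ys):
    """Fit a one-split regression tree on 1-D data and return a predictor."""
    best = None
    for t in sorted(set(xs))[:-1]:  # candidate thresholds; the right side is never empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        l_mean, r_mean = mean(left), mean(right)
        sse = (sum((y - l_mean) ** 2 for y in left)
               + sum((y - r_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, t, l_mean, r_mean)
    if best is None:  # every x is identical: fall back to the mean
        m = mean(ys)
        return lambda x: m
    _, t, l_mean, r_mean = best
    return lambda x: l_mean if x <= t else r_mean

def bagging_fit(xs, ys, n_estimators=25, seed=0):
    """Train one stump per bootstrap sample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def bagging_predict(models, x):
    """Aggregate by averaging the predictions of all estimators."""
    return mean(m(x) for m in models)

# Made-up step-shaped data: y is 10 for x < 10 and 20 otherwise.
xs = list(range(20))
ys = [10.0 if x < 10 else 20.0 for x in xs]
models = bagging_fit(xs, ys)
pred_low = bagging_predict(models, 2)
pred_high = bagging_predict(models, 17)
```

A Random Forest adds one more source of randomness on top of this scheme: each split considers only a random subset of the features.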
The Random Forest algorithm is an ensemble machine learning algorithm inspired by the bagging technique. The base predictors of this technique are decision trees. The model first generates random subsets of the original dataset; this is called bootstrapping. A decision tree is then fitted to each resulting subset. At each node of a tree, a randomly selected subset of the features is considered and the best split among them is chosen. Finally, the predictions of all decision trees are averaged to obtain the final estimate. Random Forests are a very robust ensemble learning method that, similar to boosting, can reduce both bias and variance. In addition, the nature of the algorithm allows it to be fully parallelized both during training and during prediction. This is a significant advantage over boosting methods, especially for large datasets. Random Forests also require less hyperparameter tuning than boosting techniques, especially XGBoost.

The main weaknesses of Random Forests are their susceptibility to class imbalance and their difficulty with training sets that contain a low proportion of relevant features relative to irrelevant ones. Also, Random Forests are generally outperformed by deep neural networks when the data contains low-level nonlinear patterns (for example, in raw, high-resolution image recognition). Finally, Random Forests can be computationally costly when very large datasets are used with unlimited tree depths [24].

In boosting, a subset of the dataset is taken and a model is built on that subset. However, each base model (weak learner) built afterwards is not independent of the first; rather, it works to improve on it.

Boosting can be used to reduce bias and variance. The algorithm aims to transform weak learners into strong learners. A weak learner is a classifier that is only weakly correlated with the true classification, whereas a strong learner is strongly correlated with it. Many algorithms learn weak classifiers iteratively and add them to a strong classifier. After each addition the data are reweighted, so that correctly classified examples lose weight and incorrectly classified examples gain weight. Algorithms that work in this way are called boosting algorithms [25].

XGBoost (Extreme Gradient Boosting) is a machine learning technique based on decision trees and the Gradient Boosting algorithm. The gradient boosting method combines weak classifiers in order to create a more powerful classifier. Starting from a base learner, the strong learner is trained iteratively. Gradient Boosting and XGBoost follow basically the same approach but differ in practice. By controlling tree complexity, XGBoost works better than the Gradient Boosting method [26].

One of the most important factors for good performance when training an XGBoost model is parameter tuning. First, a baseline model is developed to observe the overall performance of the model. The results obtained with this baseline are then used as the reference for parameter tuning. The tuning procedure is close to that of the general Gradient Boosting algorithm. In the first stage, a learning rate is set in order to increase the training speed of the model. In the second stage, the tree parameters are tuned. Finally, the learning rate is reduced and the model is retrained to find the most suitable number of estimators [27].

K-Fold Cross-Validation is a validation method in which the data set is divided into k different subsets in order to mitigate overfitting when training models. In this method, each subset is used for both training and testing, which minimizes the errors caused by dispersion and fragmentation in the data set.

GridSearchCV is a method developed for hyperparameter optimization. All possible combinations of the hyperparameter values specified for the machine learning model are tested in order to obtain the best performance of the model. The most successful hyperparameters are then selected according to the specified evaluation metric [28].

C. Data Set

In this study, Turkey's total hourly electricity consumption data, which follow a 24-hour cycle, were tested with several machine learning algorithms together with the hourly temperature data of the 16 provinces with the highest electricity consumption. The electricity consumption data were obtained from the Real Time Consumption Data of the Energy Exchange Istanbul (EXIST) Transparency Platform [29], and the meteorological data were obtained from the climatological reports of the POWER Data Access Viewer web application [30]. The data set consists of a total of 35064 records between 01.01.2018 and 31.12.2021.

The data set was divided into approximately 80% training data and 20% test data. The training data were checked and no missing values were found. Month, weekday and hour were used as categorical features, official national holidays as logical features, and temperature and consumption values as numerical features. In addition, the consumption value from 24 hours earlier was added to the dataset as a lag feature. Fig. 1 shows the distribution of the load values in the data set in MWh.

Fig. 1. Display of Training and Test Data

In the data preparation pipeline, One-Hot Encoding is used for the categorical features (time), StandardScaler for the numerical features (consumption data and temperature) and FunctionTransformer for the logical features (holiday data). The pipeline we prepared was fitted on the training data and then applied to both the training and test datasets.

IV. EXPERIMENTAL WORKS AND RESULTS

In the study, learning curves were drawn to observe how much the three machine learning models learned from the training data and how well they generalized to the validation data. The learning curves in Fig. 2 (RMSE versus number of observations) show that the performance of the Linear Model improved up to a certain point, after which it showed no significant improvement. The training error of the Random Forest Model increases gradually, and its validation error is almost constant after a certain point. The training error of the XGBoost Model increases slowly, and its validation error decreases gradually after a certain point.
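A learning curve of the kind discussed above is obtained by refitting a model on progressively larger portions of the training data and recording the training and validation RMSE at each size. The sketch below illustrates the procedure with a simple least-squares line on synthetic data. It is not the code behind Fig. 2; the helper names, the training sizes and the data-generating formula are assumptions chosen for the example.

```python
import math
import random
from statistics import mean

def fit_line(xs, ys):
    """Least-squares intercept and slope for y = a + b * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    return y_bar - b * x_bar, b

def rmse(actual, predicted):
    return math.sqrt(mean((a - p) ** 2 for a, p in zip(actual, predicted)))

def learning_curve(train, valid, sizes):
    """Return [(n, train_rmse, valid_rmse)] for models fit on the first n training rows."""
    curve = []
    for n in sizes:
        xs, ys = zip(*train[:n])
        a, b = fit_line(xs, ys)
        train_err = rmse(ys, [a + b * x for x in xs])
        valid_err = rmse([y for _, y in valid], [a + b * x for x, _ in valid])
        curve.append((n, train_err, valid_err))
    return curve

rng = random.Random(42)
# Synthetic, made-up data: y = 3 + 2x plus Gaussian noise with unit variance.
data = [(x, 3 + 2 * x + rng.gauss(0, 1))
        for x in (rng.uniform(0, 10) for _ in range(250))]
train, valid = data[:200], data[200:]

curve = learning_curve(train, valid, sizes=[10, 25, 50, 100, 200])
for n, train_err, valid_err in curve:
    print(f"n={n:3d}  train RMSE={train_err:.3f}  validation RMSE={valid_err:.3f}")
```

Plotting the two RMSE columns against n gives the curve; a validation error that stops improving as n grows, as reported for the Linear Model, suggests that more data alone will not help that model.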
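The feature construction described in Section III-C can be illustrated as follows. The sketch builds the hourly index for 01.01.2018 to 31.12.2021, which confirms the 35064 records stated there, and derives the calendar, holiday and 24-hour lag features. Several parts are assumptions and not taken from the paper: the load values are placeholders, the holiday set lists only fixed-date national holidays, the first 24 rows are dropped because they have no lag value, and the 80/20 split is taken chronologically.

```python
from datetime import datetime, timedelta

start = datetime(2018, 1, 1, 0)
end = datetime(2021, 12, 31, 23)
n_hours = int((end - start).total_seconds() // 3600) + 1   # 1461 days * 24 hours
timestamps = [start + timedelta(hours=h) for h in range(n_hours)]

# Placeholder load series with a daily shape (made-up values, NOT the EXIST data).
load = [30000.0 + 5000.0 * ((ts.hour - 12) ** 2 / 144.0) for ts in timestamps]

# Assumption: fixed-date national holidays only, as (month, day) pairs.
HOLIDAYS = {(1, 1), (4, 23), (5, 1), (5, 19), (7, 15), (8, 30), (10, 29)}

rows = []
for i in range(24, n_hours):            # the first 24 hours have no lag value
    ts = timestamps[i]
    rows.append({
        "month": ts.month,              # categorical
        "weekday": ts.weekday(),        # categorical, Monday = 0
        "hour": ts.hour,                # categorical
        "is_holiday": (ts.month, ts.day) in HOLIDAYS,   # logical
        "load_lag_24": load[i - 24],    # numerical lag feature
        "load": load[i],                # target
    })

# Assumed chronological split: first 80% for training, last 20% for testing.
split = len(rows) * 8 // 10
train_rows, test_rows = rows[:split], rows[split:]
```

Keeping the split chronological avoids training on hours that come after the test period, although the paper does not state how its split was drawn.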
The parameters used in the training of the Random Forest Model are given in TABLE II.

TABLE II. PARAMETERS USED IN RANDOM FOREST MODEL
Model                  N_Estimators  Criterion  Min_Samples_Leaf  Random_State
RandomForestRegressor  1000          mse        0.001             42

The XGBoost Model parameters specified in TABLE III were searched with the GridSearchCV algorithm using 4-fold cross-validation over 18 possible combinations. In total, 72 models were trained and tested, the error values (best_score) were obtained, and the values giving the best model performance were selected for training.

Fig. 3, 18 possible combinations were tested with 4-fold cross-validation using the GridSearchCV method. The model built with the optimum hyperparameters found in these tests had the lowest RMSE value (best_score: 2038.54), and these hyperparameter values were used as the final values.

The percentage errors between the values predicted by the Linear Model, Random Forest and XGBoost and the actual load values were calculated, and the values for the three models are shown in Fig. 4.
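The count of 72 trained models follows from 18 hyperparameter combinations multiplied by 4 folds. The sketch below shows this exhaustive search with k-fold cross-validation using only the standard library. It is an illustration, not the study's code: a small k-nearest-neighbour regressor stands in for XGBoost, and the grid values are hypothetical because the values of TABLE III are not reproduced in this text.

```python
import itertools
import math
import random
from statistics import mean

def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for k contiguous, non-shuffled folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop

def knn_predict(train_X, train_y, x, n_neighbors, p, weights):
    """Stand-in model: k-nearest-neighbour regression with Minkowski distance."""
    dists = sorted(
        (sum(abs(a - b) ** p for a, b in zip(row, x)) ** (1.0 / p), y)
        for row, y in zip(train_X, train_y)
    )[:n_neighbors]
    if weights == "distance":
        w = [1.0 / (d + 1e-9) for d, _ in dists]
    else:
        w = [1.0] * len(dists)
    return sum(wi * y for wi, (_, y) in zip(w, dists)) / sum(w)

def grid_search_cv(X, y, param_grid, k):
    """Score every combination by its mean RMSE over k folds; lowest wins."""
    results, n_fits = [], 0
    for combo in itertools.product(*param_grid.values()):
        params = dict(zip(param_grid, combo))
        fold_rmse = []
        for train_idx, test_idx in k_fold_indices(len(X), k):
            tr_X = [X[i] for i in train_idx]
            tr_y = [y[i] for i in train_idx]
            sq = [(y[i] - knn_predict(tr_X, tr_y, X[i], **params)) ** 2
                  for i in test_idx]
            fold_rmse.append(math.sqrt(mean(sq)))
            n_fits += 1
        results.append((mean(fold_rmse), params))
    best_score, best_params = min(results, key=lambda r: r[0])
    return best_params, best_score, n_fits, len(results)

# Hypothetical grid (3 x 3 x 2 = 18 combinations); NOT the values of TABLE III.
param_grid = {"n_neighbors": [1, 3, 5], "p": [1, 2, 3],
              "weights": ["uniform", "distance"]}

rng = random.Random(0)
X = [(rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(40)]   # made-up features
y = [2 * a + b + rng.gauss(0, 0.1) for a, b in X]                 # made-up target

best_params, best_score, n_fits, n_combos = grid_search_cv(X, y, param_grid, k=4)
print(n_combos, "combinations,", n_fits, "fits, best RMSE", round(best_score, 4))
```

The cost of an exhaustive search grows multiplicatively with each added hyperparameter, which is why the grid in the study is kept to 18 combinations.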
Fig. 5. Representation of Absolute Error, Squared Error and Percent Error Sums of All Models in Bar Graph
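The three quantities summarized in Fig. 5 can be computed from paired actual and predicted load values as sketched below. The exact definitions used in the study are not given in this text, so the formulas are assumptions: the percent error is taken as the absolute error divided by the actual value, and the function name `error_summary` and the sample values are hypothetical.

```python
import math

def error_summary(actual, predicted):
    """Sums of absolute, squared and percent errors, plus their per-sample averages."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    abs_sum = sum(abs(e) for e in errors)
    sq_sum = sum(e * e for e in errors)
    # Assumes no actual value is zero, otherwise the percent error is undefined.
    pct_sum = sum(100.0 * abs(e) / abs(a) for e, a in zip(errors, actual))
    return {
        "abs_error_sum": abs_sum,
        "squared_error_sum": sq_sum,
        "percent_error_sum": pct_sum,
        "MAE": abs_sum / n,
        "RMSE": math.sqrt(sq_sum / n),
        "MAPE": pct_sum / n,
    }

# Made-up load values in MWh, for illustration only.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
summary = error_summary(actual, predicted)
```

The squared-error terms weight large misses more heavily than the absolute or percent terms, so the three bars in a chart like Fig. 5 can rank models differently.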
[23] Kyriakides, G., & Margaritis, K. G. (2019). Hands-On Ensemble Learning with Python: Build highly optimized ensemble machine learning models using scikit-learn and Keras. Packt Publishing Ltd.
[24] Dhaliwal, S. S., Nahid, A. A., & Abbas, R. (2018). Effective intrusion detection system using XGBoost. Information, 9(7), 149.
[25] Chen, T., & Guestrin, C. (2016, August). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785-794).
[26] Salam Patrous, Z. (2018). Evaluating XGBoost for user classification by using behavioral features extracted from smartphone sensors.
[27] Hsu, C. W., Chang, C. C., & Lin, C. J. (2003). A practical guide to support vector classification.
[29] URL-3, https://seffaflik.epias.com.tr/transparency/tuketim/gerceklesen-tuketim/gercek-zamanli-tuketim.xhtml. 2022.
[30] URL-4, https://power.larc.nasa.gov/data-access-viewer/. 2022.