Electronics 2023, 12, 1007
Article
Meteorological Variables Forecasting System Using Machine
Learning and Open-Source Software
Jenny Aracely Segovia *, Jonathan Fernando Toaquiza *, Jacqueline Rosario Llanos *
and David Raimundo Rivas
Department of Electrical and Electronic Engineering, Universidad de las Fuerzas Armadas (ESPE),
Sangolquí 171103, Ecuador
* Correspondence: [email protected] (J.A.S.); [email protected] (J.F.T.); [email protected] (J.R.L.)
Abstract: The techniques for forecasting meteorological variables are widely studied, since prior
knowledge of them allows for the efficient management of renewable energies and supports other
applications of science such as agriculture, health, engineering, and energy. In this research, the
design, implementation, and comparison of forecasting models for meteorological variables have
been performed using different Machine Learning techniques as part of Python open-source software.
The techniques implemented include multiple linear regression, polynomial regression, random
forest, decision tree, XGBoost, and the multilayer perceptron neural network (MLP). To identify the
best technique, the root mean square error (RMSE), mean absolute percentage error (MAPE), mean
absolute error (MAE), and coefficient of determination (R2) are used as evaluation metrics. The most
efficient technique depends on the variable to be forecast; however, for most of them, the random
forest and XGBoost techniques present the best performance. For temperature, the best performing
technique was Random Forest with an R2 of 0.8631, MAE of 0.4728 °C, MAPE of 2.73%, and RMSE
of 0.6621 °C; for relative humidity, Random Forest with an R2 of 0.8583, MAE of 2.1380 RH, MAPE
of 2.50%, and RMSE of 2.9003 RH; for solar radiation, Random Forest with an R2 of 0.7333, MAE of
65.8105 W/m2, and RMSE of 105.9141 W/m2; and for wind speed, Random Forest with an R2 of
0.3660, MAE of 0.1097 m/s, and RMSE of 0.2136 m/s.
makes it impossible to optimally manage renewable energies and obtain a greater benefit
from them.
There are multiple scientific studies of modeling and prediction in order to forecast
future conditions of phenomena in various fields; among the most prominent are ARIMA,
Chaos Theory, and Neural Networks [6]. Forecasting models have evolved in recent
decades, from smart systems with formal rules and logical theories, to the emergence of
artificial intelligence techniques that allow us to propose alternatives in the treatment of
information [7].
Currently, forecasting models have a high impact and are used in several applications, such as
the management of energy units for renewable-resource microgrids [8,9]; load estimation methods
for isolated communities that do not receive energy, or only receive it for a limited time each
day [10,11]; the operation of energy systems [12,13]; in agriculture, to predict the water consumption
of plants and plan the irrigation depth [14]; in agriculture 4.0, for the prediction of variables that
affect the quality of crops, for micronutrient analysis and prediction of soil chemical parameters [15],
and for the optimization of agricultural procedures and increased productivity in the field; in the
forecasting of the SPI and meteorological drought based on artificial neural networks and the M5P
model tree [16]; and in controllers based on forecasting models and predictive controllers. They are
also used in the health field to predict the solar radiation index and to obtain a correct assessment in
people with skin cancer [17]. Therefore, all the applications mentioned above need forecasting models
with the lowest possible error rate for their effective operation.
Having a forecasting model system can be costly because commercial computer packages with
significant licensing costs are often used. Free software, on the other hand, is an option to reduce
costs. This research proposes a system based on free software (Python), which is currently used at
an industrial level for its reliability, for example in applications such as the following: advanced time
series analysis applying neural networks for time series forecasting [18]; machine learning in Python,
including the main developments and technological trends in data science, machine learning, and
artificial intelligence [19]; and the development of a smart tool based on artificial vision and neural
networks for weed recognition in rice plantations using the Python programming language [20].
In this research, different prediction techniques were evaluated and compared—among
them, multiple linear regression, polynomial regression, random forest, decision tree, XG-
Boost, and multilayer perceptron neural network—in order to identify the best performing
strategy, using evaluation metrics such as the root mean square error (RMSE) and the
coefficient of determination (R2 ). The variables to be predicted are temperature, relative
humidity, solar radiation, and wind speed, from data taken from the weather station located
in Ecuador, Tungurahua province, Baños. The predicted variables will be the inputs for a
smart irrigation system and used for an energy management system of a microgrid based
on predictive control, therefore, models with high approximation to online measurements
are required.
The contributions of this work are as follows: (i) to design, validate, and compare different
machine learning techniques and, with them, select the technique that best adapts to climate variables
for agriculture and energy applications; (ii) to develop a low-cost forecasting system for climate
variables based on free software (Python); and (iii) to generate forecasting models that can be
replicated for other types of variables applied to smart control systems based on forecasting models.
network—multilayer perceptron. To obtain the forecast of meteorological variables, the design
methodology shown in Figure 1 is implemented.
Figure 1. Flowchart of the methodology used to obtain forecasting models for meteorological variables.
2.1. Obtaining the Database
For the implementation of the forecasting models, information was obtained from the page of
the Tungurahua hydrometeorological network, where there are several meteorological stations,
including the Baños family park, located in Ecuador, Tungurahua province, Baños, coordinates
X = 9,845,439, Y = 791,471, which records the parameters of precipitation (mm), temperature (°C),
relative humidity (%), wind speed (m/s), wind direction (°), solar radiation (W/m2), and
evapotranspiration (mm). For the design of the models, only the values of temperature, solar
radiation, relative humidity, and wind speed were taken, since, after a previous analysis of
correlation between meteorological variables, the variables with lower correlation with the variable
to be predicted are discarded. It is important to note that the values of temperature, solar radiation
(net solar radiation at surface), and relative humidity were measured at a height of 2 m, while the
wind speed was measured at 10 m.
y = a + b1X1 + b2X2 + · · · + bnXn (1)
y = a + b1Xi + b2Xi^2 + b3Xi^3 + · · · + bnXi^n (2)
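As a minimal sketch, Equations (1) and (2) can be fit with scikit-learn; the data below is synthetic and the column roles (predictors and target) are illustrative stand-ins for the station variables, not the actual database. Note that PolynomialFeatures expands cross terms as well as the pure powers written in Equation (2).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

# Hypothetical stand-ins for three predictors (e.g., solar radiation,
# relative humidity, wind speed) and a target (e.g., temperature).
X = rng.uniform(0.0, 1.0, size=(500, 3))
y = 2.0 + 1.5 * X[:, 0] - 0.8 * X[:, 1] + 0.3 * X[:, 2]

# Equation (1): multiple linear regression y = a + b1*X1 + ... + bn*Xn.
linear = LinearRegression().fit(X, y)

# Equation (2): polynomial terms of the inputs (degree 4, the value
# Table 2 lists for temperature).
poly = make_pipeline(PolynomialFeatures(degree=4), LinearRegression()).fit(X, y)
```

On this noiseless synthetic data the linear model recovers the intercept a = 2.0 and the coefficients exactly, which is a quick sanity check of the fitting step.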
Polynomial Regression
Predicted Variable    Input Variables                                     Degree of the Polynomial
Temperature           Solar radiation, relative humidity, wind speed      4
Solar radiation       Temperature, relative humidity, wind speed          5
Wind speed            Temperature, solar radiation, relative humidity     6
Relative Humidity     Temperature, solar radiation, wind speed            4
where Pi,k is the ratio of class k instances among the training instances in the ith node, m is the
number of class labels, and Gi (Gini impurity) represents the measure used for constructing
decision trees.
After performing different heuristic tests and using sensitivity analysis for this forecast
technique, it is deduced that the best parameters for tuning are those described in Table 3.
Table 3. Tuning parameters for the decision tree technique.

Decision Tree
Predicted Variable    Input Variables                                     Max_Depth    Min_Samples_Leaf
Temperature           Solar radiation, relative humidity, wind speed      10           18
Solar radiation       Temperature, relative humidity, wind speed          10           7
Wind speed            Temperature, solar radiation, relative humidity     19           6
Relative Humidity     Temperature, solar radiation, wind speed            9            16
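A sketch of the decision tree regressor with the Table 3 hyperparameters for the temperature model; the training data here is synthetic and illustrative, not the station database.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(400, 3))   # stand-in predictors
y = np.sin(3.0 * X[:, 0]) + X[:, 1]        # stand-in target

# Table 3, temperature row: max_depth = 10, min_samples_leaf = 18.
tree = DecisionTreeRegressor(max_depth=10, min_samples_leaf=18).fit(X, y)
pred = tree.predict(X[:5])
```

The max_depth bound caps how deep the tree can grow, while min_samples_leaf prevents leaves from memorizing a handful of samples; both act against overfitting.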
Figure 2. Algorithm for making predictions using random forest.
In general, deep decision trees tend to overfit, while random forests avoid this by generating
random subsets of features and using those subsets to build smaller trees. The generalization error
for random forests is based on the strength of the constructed trees and their correlation [24].
This technique has several parameters that can be configured, such as the following. N° estimators:
the number of trees in the forest. Max leaf nodes: the maximum number of leaf nodes; this
hyperparameter sets a condition for splitting the tree nodes and thus restricts the growth of the tree.
If after splitting there are more terminal nodes than the specified number, the splitting stops and the
tree does not continue to grow, which helps to avoid overfitting. Max features: the maximum number
of features that are evaluated for splitting at each node; increasing max_features generally improves
model performance, since each node now has a greater number of options to consider [23].
After performing different heuristic tests and using sensitivity analysis for this forecast
technique, it is deduced that the best parameters for tuning are those described in Table 4.

Table 4. Tuning parameters for the random forest technique.

Random Forest
Predicted Variable    Input Variables                                     N° Estimators    Max Leaf Nodes    Max Features
Temperature           Solar radiation, relative humidity, wind speed      100              3000              0.1
Solar radiation       Temperature, relative humidity, wind speed          100              3000              0.1
Wind speed            Temperature, solar radiation, relative humidity     100              2000              0.3
Relative Humidity     Temperature, solar radiation, wind speed            100              2000              0.2

2.4.5. Extreme Gradient Boosting (XGBoost)
The XGBoost algorithm is a scalable tree-boosting system that can be used for both classification
and regression tasks. It performs a second-order Taylor expansion on the loss function and can
automatically use multiple threads of the central processing unit (CPU) for parallel computing. In
addition, XGBoost uses a variety of methods to avoid overfitting [25].
Figure 3 shows the XGBoost algorithm; decision trees are created sequentially (Decision Tree-1,
Decision Tree-2, Decision Tree-N) and weights play an important role in XGBoost. Weights are
assigned to all independent variables, which are then entered into the decision tree that predicts the
outcomes (Result-1, Result-2, Result-N). The weights of variables incorrectly predicted by the tree
are increased, and these variables are then fed into the second decision tree (Residual error). These
individual predictors are then grouped (Average) to give a strong and more accurate model
(Prediction).
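The random forest configuration of Table 4 can be sketched with scikit-learn as follows; the data is synthetic and illustrative. With a float value, max_features is interpreted as the fraction of features considered at each split (at least one).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, size=(300, 3))            # stand-in predictors
y = 2.0 * X[:, 0] + rng.normal(0.0, 0.1, size=300)  # stand-in target

# Table 4, temperature row: 100 estimators, max leaf nodes 3000,
# max features 0.1 (a fraction of the available features per split).
forest = RandomForestRegressor(
    n_estimators=100, max_leaf_nodes=3000, max_features=0.1, random_state=0
).fit(X, y)
pred = forest.predict(X[:5])
```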
Figure 3. Structure of an XGBoost algorithm for regression.
After performing different heuristic tests and using sensitivity analysis for this forecast
technique, it is deduced that the best parameters for its tuning are those described in Table 5.

Table 5. Tuning parameters for the XGBoost technique.
XGBoost
Predicted Variable    Input Variables                                     Max Depth    N° Estimators
Temperature           Solar radiation, relative humidity, wind speed      2            100
Solar radiation       Temperature, relative humidity, wind speed          2            20
Wind speed            Temperature, solar radiation, relative humidity     5            19
Relative Humidity     Temperature, solar radiation, wind speed            7            19
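As a sketch of the boosted-tree setup in Table 5, the example below uses scikit-learn's GradientBoostingRegressor as a stand-in for the XGBoost library (same sequential tree-boosting idea, different implementation), with the temperature-row hyperparameters; the data is synthetic and illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(3)
X = rng.uniform(0.0, 1.0, size=(300, 3))                        # stand-in predictors
y = X[:, 0] - 0.5 * X[:, 2] + rng.normal(0.0, 0.05, size=300)   # stand-in target

# Table 5, temperature row: max depth 2, 100 estimators. Each new tree
# is fit to the residual error of the ensemble built so far.
booster = GradientBoostingRegressor(
    max_depth=2, n_estimators=100, random_state=0
).fit(X, y)
pred = booster.predict(X[:5])
```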
2.4.6. Neural Network—Multilayer Perceptron
It is an effective and widely used model for modeling many real situations. The multilayer
perceptron is a hierarchical structure consisting of several layers of fully interconnected neurons,
whose input neurons are the outputs of the previous layer. Figure 4 shows the structure of a
multilayer perceptron neural network; the input layer is made up of r units (where r is the number
of external inputs) that merely distribute the input signals to the next layer; the hidden layer is made
up of neurons that have no physical contact with the outside; the number of hidden layers is variable
(u); and the output layer is made up of l neurons (where l is the number of external outputs) whose
outputs constitute the vector of external outputs of the multilayer perceptron [26].
The training of the neural network consists of calculating the linear combination of a set of
input variables, with a bias term, applying an activation function, generally the threshold or sign
function, giving rise to the network output. Thus, the weights of the network are adjusted by the
method of supervised learning by error correction (backpropagation), in such a way that the
expected output is compared with the value of the output variable to be obtained, the difference
being the error or residual. Each neuron behaves independently of the others: each neuron receives
a set of input values (an input vector), calculates the scalar product of this vector and the vector of
weights, adds its own bias, applies an activation function to the result, and returns the final result
obtained [26].
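The per-neuron computation just described (scalar product of inputs and weights, minus a bias, passed through an activation function) can be transcribed in a few lines of numpy. The weights below are random placeholders for illustration, not trained values; the ReLU/sigmoid pairing follows the activations listed in Table 6.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W_hidden, theta_hidden, W_out, theta_out):
    # Hidden layer: y'_j = f'_j(sum_i x_i W_ij - theta_j)
    y_hidden = relu(x @ W_hidden - theta_hidden)
    # Output layer: y_k = f_k(sum_j y'_j W'_jk - theta'_k)
    return sigmoid(y_hidden @ W_out - theta_out)

# Illustrative sizes: r = 3 external inputs, 4 hidden neurons, l = 1 output.
rng = np.random.default_rng(0)
out = mlp_forward(rng.normal(size=3),
                  rng.normal(size=(3, 4)), rng.normal(size=4),
                  rng.normal(size=(4, 1)), rng.normal(size=1))
```

Because the output unit is a sigmoid, the result is always strictly between 0 and 1, which is why targets are typically scaled before training such a network.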
In general, all weights and biases will be different. The output of the multilayer perceptron
neural network is defined by Equation (4), where yk is the output, fk the activation function of the
output layer, θ′k the bias of the output layer, Wij the hidden layer weights, y′j the output of the
hidden layer, f′j the activation function of the hidden layer, Xi the neuron inputs, W′jk the output
layer weights, θj the bias of the hidden layer, r the number of inputs for the neuron j from the hidden
layer, and u the number of inputs for the neuron k from the output layer [27]:

y′j = f′j (∑i=1..r Xi Wij − θj)
yk = fk (∑j=1..u y′j W′jk − θ′k) (4)

For this research, backpropagation was used as a training technique. After performing different
heuristic tests and using sensitivity analysis for this forecasting technique, it is deduced that the best
parameters for its tuning are those described in Table 6.

Table 6. Tuning parameters for the multilayer perceptron neural network technique.
Predicted Variable    Input Variables                                     Tuning Values        Activation Functions
Temperature           Solar radiation, relative humidity, wind speed      3, 5000, 128, 32     Hidden: ReLU; Out: Sigmoid
Solar radiation       Temperature, relative humidity, wind speed          3, 5000, 128, 32     Hidden: ReLU; Out: Sigmoid
Wind speed            Temperature, solar radiation, relative humidity     3, 3000, 128, 32     Hidden: ReLU; Out: Sigmoid
Relative Humidity     Temperature, solar radiation, wind speed            3, 5000, 128, 32     Hidden: ReLU; Out: Sigmoid
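A sketch of an MLP regressor with scikit-learn on synthetic data. Reading the 128 and 32 in Table 6 as hidden-layer widths is an assumption on our part, and MLPRegressor uses an identity output unit rather than the sigmoid output described above, so this is an approximation of the configuration, not a reproduction of it.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(4)
X = rng.uniform(0.0, 1.0, size=(200, 3))   # stand-in predictors
y = X.sum(axis=1)                          # stand-in target

# Assumed reading of Table 6: two ReLU hidden layers of 128 and 32 units,
# trained by backpropagation (scikit-learn's default adam solver).
mlp = MLPRegressor(hidden_layer_sizes=(128, 32), activation="relu",
                   max_iter=500, random_state=0).fit(X, y)
pred = mlp.predict(X[:4])
```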
3. Results
3.1. Indicators for Assessing the Performance of Weather Forecasting Models
To measure the performance of the forecast techniques for each of the variables described
above, two types of metrics were used: to evaluate the forecast accuracy, the root mean square error
(RMSE) is used, which allows comparing the results and defining the technique with the lowest
error, and therefore the best method for each variable to be predicted; in addition, to determine
whether the implemented models perform well in their training and to define their predictive
ability, the coefficient of determination R2 is used.
R2 = 1 − ∑c=1..o (yc − ŷc)^2 / ∑c=1..o (yc − ȳ)^2 (5)

where yc are the values taken by the target variable, ŷc are the values of the prediction, and ȳ is the
mean value of the values taken by the target variable.

RMSE = sqrt((1/o) ∑c=1..o (yc − ŷc)^2) (6)

where yc are the values taken by the target variable, ŷc are the values of the prediction, and o is the
sample size.
MAPE = (1/o) ∑c=1..o |(yc − ŷc) / yc| ∗ 100% (7)

where yc are the values taken by the target variable, ŷc are the values of the prediction, and o is the
sample size.
Equation (7) helps to understand one of the important caveats when using MAPE: to calculate
this metric, the difference must be divided by the actual value. This means that if the actual values
are close to or equal to 0, the MAPE score will suffer a division-by-zero error or will be extremely
high. Therefore, it is recommended not to use MAPE when the actual values are close to 0 [30].
The MAE measures the average magnitude of the difference between the true value and the
predicted value for each instance [16,31]. It is given by Equation (8):

MAE = (1/o) ∑c=1..o |yc − ŷc| (8)
where yc are the values taken by the target variable, ŷc are the values of the prediction, and o is the
sample size.
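Equations (5)-(8) can be computed directly in numpy; the small arrays below are illustrative numbers, not station data.

```python
import numpy as np

def rmse(y, y_hat):
    # Equation (6): root mean square error.
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mape(y, y_hat):
    # Equation (7): undefined when y contains zeros (see the caveat above).
    return np.mean(np.abs((y - y_hat) / y)) * 100.0

def mae(y, y_hat):
    # Equation (8): mean absolute error.
    return np.mean(np.abs(y - y_hat))

def r2(y, y_hat):
    # Equation (5): coefficient of determination.
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

y = np.array([10.0, 12.0, 14.0, 16.0])      # illustrative true values
y_hat = np.array([11.0, 12.0, 13.0, 17.0])  # illustrative predictions
print(round(r2(y, y_hat), 2))               # 0.85
```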
Table 7 shows that R2 obtained from the implemented algorithms converge to appro-
priate values, i.e., there is a correct approximation between the real temperature and the
predicted temperature, thus guaranteeing the good performance of the algorithm, which
allows a comparison of the performance in terms of forecast error. Comparison of the root
mean square errors (RMSE), mean absolute percentage errors (MAPE), and mean absolute
errors (MAE), and analysis of the coefficient of determination R2 of the different techniques
implemented show that the best performing technique for forecasting the temperature
variable is Random Forest, with an R2 of 0.8631, MAE of 0.4728 ◦ C, MAPE of 2.73%, and
RMSE of 0.6621 ◦ C. This is followed by XGBoost, with an R2 of 0.8599, MAE of 0.5335 ◦ C,
MAPE of 3.09%, and RMSE of 0.7565 ◦ C.
Figure 5 shows the real (red) and prediction (blue) profiles using the different Machine
Learning techniques to predict the temperature variable: (a) Multiple linear regression
technique, (b) Polynomial regression technique, (c) Decision tree technique, (d) Random
Forest technique, (e) XGboost technique, (f) Multilayer perceptron neural network tech-
nique. Figure 5c,d validate that the best performance corresponds to the Decision tree and
Random forest techniques.
Table 8 shows that R2 obtained from the implemented algorithms converge to appro-
priate values, i.e., there is a correct approximation between the real relative humidity and
the predicted relative humidity, thus guaranteeing the good performance of the algorithm,
which allows a comparison of the performance in terms of forecast error. Comparison
of the root mean square errors (RMSE), mean absolute percentage errors (MAPE), and
mean absolute errors (MAE), and analysis of the coefficient of determination R2 of the
different techniques implemented show that the best performing techniques for forecasting
the relative humidity variable are Random Forest, with an R2 of 0.8583, MAE of 2.1380 RH,
MAPE of 2.50%, and RMSE of 2.9003 RH; and XGBoost, with an R2 of 0.8597, MAE of
2.2907 RH, MAPE of 2.67%, and RMSE of 3.1444 RH.
Figure 6 shows the real (red) and prediction (blue) profiles using the different Machine
Learning techniques to predict the relative humidity variable: (a) Multiple linear regression
technique, (b) Polynomial regression technique, (c) Decision tree technique, (d) Random
forest technique, (e) XGboost technique, (f) Multilayer perceptron neural network technique.
Figure 6d and Figure 6c validate that the best performance corresponds to the Random
forest and Decision tree techniques.
Technique                     R2        MAE [°C]    MAPE [%]    RMSE [°C]
Polynomial regression         0.8406    0.6097      3.51        0.8146
Decision tree                 0.8593    0.5097      2.95        0.7333
Random forest                 0.8631    0.4728      2.73        0.6621
XGboost                       0.8599    0.5335      3.09        0.7565
Multilayer perceptron         0.8226    0.9124      5.51        1.2498
Figure 5. Temperature forecast techniques: (a) Multiple linear regression, (b) Polynomial regression,
(c) Decision tree, (d) Random forest, (e) XGboost, (f) Multilayer perceptron neural network.
3.2.2. Relative Humidity Forecasting
Table 8 shows the results of the evaluation metrics: root mean square error (RMSE), mean
absolute percentage error (MAPE), mean absolute error (MAE), and coefficient of determination (R2)
for each of the techniques used for relative humidity forecasting. The calculation of the root mean
square error, mean absolute percentage error, and mean absolute error was obtained by averaging
the errors of the validation data (576 data), while the calculation of the coefficient of determination
(R2) used the data from the training set and the test set (93,780 data).
Technique                     R2        MAE [RH]    MAPE [%]    RMSE [RH]
Multiple linear regression    0.7815    3.0900      3.56        3.7475
Polynomial regression         0.8420    2.2816      2.68        3.0163
Decision tree                 0.8547    2.2685      2.65        3.2083
Random forest                 0.8583    2.1380      2.50        2.9003
XGboost                       0.8597    2.2907      2.67        3.1444
Multilayer perceptron         0.8013    4.6055      5.64        5.5759
Figure 6. Techniques for relative humidity forecasting: (a) Multiple linear regression, (b) Polynomial
regression, (c) Decision tree, (d) Random forest, (e) XGboost, (f) Multilayer perceptron neural network.
Technique                     Coefficient of Determination (R2)    Mean Absolute Error (MAE) [W/m2]    Root Mean Square Error (RMSE) [W/m2]
Multiple linear regression    0.6689                               106.9741                            164.7435
Polynomial regression         0.7394                               76.6667                             129.1836
Decision tree                 0.7253                               75.8177                             127.3530
Random forest                 0.7333                               65.8105                             105.9141
XGboost                       0.7075                               87.6137                             145.0170
Multilayer perceptron         0.7423                               88.5897                             140.0681
Table 9 shows that R2 obtained from the implemented algorithms converge to appro-
priate values, i.e., there is a correct approximation between the real solar radiation and
the predicted solar radiation, thus guaranteeing the good performance of the algorithm,
which allows a comparison of the performance in terms of forecast error. Comparison of
the root mean square errors (RMSE), and mean absolute errors (MAE), and analysis of the
coefficient of determination R2 of the different techniques implemented show that the best
performing techniques for forecasting the solar radiation variable are Random Forest with
an R2 of 0.7333, MAE of 65.8105 W/m2 , and RMSE of 105.9141 W/m2 ; and Decision Tree
with an R2 of 0.7253, MAE of 75.8177 W/m2 , and RMSE of 127.3530 W/m2 .
Figure 7 shows the real (red) and prediction (blue) profiles using the different Machine
Learning techniques to predict the variable solar radiation: (a) Multiple linear regression
technique, (b) Polynomial regression technique, (c) Decision tree technique, (d) Random
forest technique, (e) XGboost technique, (f) Multilayer perceptron neural network technique.
Figure 7d validates that the best performance corresponds to the Random forest technique.
Figure 7. Solar radiation forecast techniques: (a) Multiple linear regression, (b) Polynomial regression,
(c) Decision tree, (d) Random forest, (e) XGboost, (f) Multilayer perceptron neural network.
Figure 8 shows the real (red) and prediction (blue) profiles using the different Machine
Learning techniques to predict the wind speed variable: (a) Multiple linear regression
technique, (b) Polynomial regression technique, (c) Decision tree technique, (d) Random
forest technique, (e) XGboost technique, (f) Multilayer perceptron neural network technique.
Figure 8d validates that the best performance corresponds to the Random forest technique.
Figure 8. Techniques for wind speed forecast: (a) Multiple linear regression, (b) Polynomial regression,
(c) Decision tree, (d) Random forest, (e) XGboost, (f) Multilayer perceptron neural network.
4. Conclusions
For the forecasting of meteorological variables in this research, information obtained
from the Parque de la Familia Baños meteorological station located in Ecuador was used
and the following prediction techniques were tested: multiple linear regression, polynomial
regression, decision tree, random forest, XGBoost, and multilayer perceptron neural net-
work. For forecasting the temperature variable, a better result is obtained by using Random
Forest with an R2 of 0.8631, MAE of 0.4728 ◦ C, MAPE of 2.73%, and RMSE of 0.6621 ◦ C.
In addition, XGBoost also performed well with an R2 of 0.8599, MAE of 0.5335 ◦ C, MAPE
of 3.09%, and RMSE of 0.7565 ◦ C. For forecasting the relative humidity variable, a better
result is obtained by using Random Forest with an R2 of 0.8583, MAE of 2.1380 RH, MAPE
of 2.50%, and RMSE of 2.9003 RH. In addition, XGBoost also performed well with an R2
of 0.8597, MAE of 2.2907 RH, MAPE of 2.67%, and RMSE of 3.1444 RH. For forecasting
the solar radiation variable, a better result is obtained by using Random Forest with an
R2 of 0.7333, MAE of 65.8105 W/m2 , and RMSE of 105.9141 W/m2 . In addition, Deci-
sion Tree also performed well with an R2 of 0.7253, MAE of 75.8177 W/m2 , and RMSE
of 127.3530 W/m2 . For forecasting the wind speed variable, a better result is obtained by
using Random Forest, with an R2 of 0.3660, MAE of 0.1097 m/s, and RMSE of 0.2136 m/s.
In addition, XGBoost also performed well, with an R2 of 0.3866, MAE of 0.1439 m/s, and
RMSE of 0.3131 m/s.
It can be observed that wind speed has the highest variability among the predicted
variables; consequently, all of the implemented techniques yield a lower coefficient of
determination R² for this variable. This is inherent to the type of signal being predicted;
nevertheless, acceptable predictions were obtained.
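The effect noted above can be demonstrated with a toy experiment (synthetic data, not the station records; the noise levels are assumptions chosen for illustration): the same model explains a smaller share of the variance, and hence scores a lower R², when the target is noisier.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(600, 3))
signal = 2.0 * X[:, 0] + X[:, 1]  # learnable component shared by both targets

scores = {}
for label, noise_std in [("smooth target", 0.1), ("noisy target", 1.0)]:
    # Same predictable signal, different amounts of irreducible noise.
    y = signal + rng.normal(0.0, noise_std, len(signal))
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                              random_state=0)
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X_tr, y_tr)
    scores[label] = r2_score(y_te, model.predict(X_te))

print(scores)  # R² drops sharply as the unpredictable component grows
```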
The prediction of meteorological variables (temperature, solar radiation, wind speed,
and relative humidity) will enable future projects in the study area, such as intelligent
agriculture to address food-supply problems, and the implementation of a microgrid based
on renewable resources, in which the prediction models will support real-time planning
and operation. This would bring clean energy to the locality and contribute to reducing
the use of fossil resources, a goal that different countries have adopted as part of
their policies.
Author Contributions: Conceptualization, J.A.S., J.F.T., J.R.L. and D.R.R.; methodology, J.A.S., J.F.T.,
J.R.L. and D.R.R.; software J.A.S. and J.F.T.; validation, J.A.S. and J.F.T.; formal analysis, J.A.S., J.F.T.,
J.R.L. and D.R.R.; investigation, J.A.S., J.F.T. and J.R.L.; resources, J.A.S. and J.F.T.; data curation,
J.A.S. and J.F.T.; writing—original draft preparation, J.A.S., J.F.T., J.R.L. and D.R.R.; writing—review
and editing, J.A.S., J.F.T., J.R.L. and D.R.R.; visualization, J.A.S., J.F.T., J.R.L. and D.R.R.; supervision,
J.R.L. and D.R.R.; project administration, J.R.L.; funding acquisition, J.R.L. All authors have read and
agreed to the published version of the manuscript.
Funding: This research received no external funding.
Data Availability Statement: Not applicable.
Acknowledgments: This work was supported in part by the Universidad de las Fuerzas Armadas
ESPE through the Project “Optimal energy management systems for hybrid generation systems”,
under Project 2023-pis-03. In addition, the authors would like to thank the projects
EE-GNP-0043-2021-ESPE, REDTPI4.0-CYTED, Conv-2022-05-UNACH, and “SISMO-ROSAS”–UPS.
Conflicts of Interest: The authors declare no conflict of interest.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.