
A New Hybrid Method For Predicting Univariate and Multivariate Time Series Based On Pattern Forecasting

This paper proposes a new hybrid method for predicting univariate and multivariate time series based on clustering patterns in the time series data, classifying new data into clusters, and using separate forecasting models trained on each cluster. The method is tested on electricity consumption and air pollution time series data, comparing results to other common forecasting techniques. The hybrid approach combines clustering, classification, and regression for time series forecasting in a flexible framework.


Information Sciences 586 (2022) 611–627

Contents lists available at ScienceDirect

Information Sciences
journal homepage: www.elsevier.com/locate/ins

A new hybrid method for predicting univariate and multivariate time series based on pattern forecasting

M.A. Castán-Lascorz a, P. Jiménez-Herrera b, A. Troncoso b, G. Asencio-Cortés b,⇑

a Department of Digital Industry & Industry and Energy Area, Fundación CIRCE, Parque Empresarial Dinamiza, ES-50018 Zaragoza, Spain
b Data Science & Big Data Lab, Universidad Pablo de Olavide, ES-41013 Seville, Spain

Article info

Article history:
Received 9 July 2021
Received in revised form 29 November 2021
Accepted 1 December 2021
Available online 08 December 2021

Keywords:
Time series forecasting
Machine learning
Hybrid model

Abstract

Time series forecasting has become indispensable for multiple applications and industrial processes. Currently, a large number of algorithms have been developed to forecast time series, each of which is suitable depending on the characteristics and patterns to be inferred in each case. In this work, a new algorithm is proposed to predict both univariate and multivariate time series based on a combination of clustering, classification and forecasting techniques. The main goal of the proposed algorithm is first to group windows of time series values with similar patterns by applying a clustering process. Then, a specific forecasting model for each pattern is built and training is only conducted with the time windows corresponding to that pattern. The new algorithm has been designed using a flexible framework that allows the model to be generated using any combination of approaches within multiple machine learning techniques. To evaluate the model, several experiments are carried out using different configurations of the clustering, classification and forecasting methods that the model consists of. The results are analyzed and compared to classical prediction models, such as autoregressive integrated moving average and Holt-Winters models, to very recent forecasting methods, including deep long short-term memory neural networks, and to well-known methods in the literature, such as k nearest neighbors, classification and regression trees, as well as random forest.

© 2021 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Abbreviations: ACF, Autocorrelation Function of the Time Series; ARFIMA, Autoregressive Fractionally Integrated Moving Average; ARIMA, Autoregressive Integrated Moving Average; CART, Classification And Regression Trees; CNN, Convolutional Neural Networks; ETS, Exponential Smoothing; FTS, Fuzzy Time Series; KNN, K-Nearest Neighbors; LSTM, Long Short-Term Memory; MAPE, Mean Absolute Percentage Error; MSE, Mean Squared Error; RF, Random Forest; RNN, Recurrent Neural Networks; SARIMA, Seasonal Autoregressive Integrated Moving Average; SES, Simple Exponential Smoothing; SVM, Support Vector Machines; VAR, Vector Autoregression; WNN, Wavelet Neural Networks.
⇑ Corresponding author.
E-mail addresses: [email protected] (M.A. Castán-Lascorz), [email protected] (P. Jiménez-Herrera), [email protected] (A. Troncoso), [email protected] (G. Asencio-Cortés).

https://doi.org/10.1016/j.ins.2021.12.001
0020-0255/© 2021 The Author(s). Published by Elsevier Inc.
This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
M.A. Castán-Lascorz, P. Jiménez-Herrera, A. Troncoso et al. Information Sciences 586 (2022) 611–627

1. Introduction

Knowing the future evolution of a time series can offer considerable advantages in many applications, as this enables decisions to be made in advance and adapted appropriately to obtain better results. If the estimates are sufficiently precise and reliable, substantial improvements are possible in many areas, such as savings in costs and consumption, reduction of emissions or logistics optimization [1]. Currently, there is a wide variety of forecasting methods that analyze time series using different approaches, such as statistical models, nearest neighbors or neural networks, to identify relevant patterns and obtain precise estimates of their future evolution [2–4]. Therefore, it is necessary to choose the most suitable method for each application depending on the characteristics of the time series to be analyzed and the requirements of the model used, such as the training time or the prediction horizon.
In recent years, numerous works have been published analyzing the behavior and quality of the predictions made on time series in different fields, such as in air pollution [5] or electricity consumption [6]. New models or variants of classic algorithms have also been proposed, which normally exploit the increase in the volume of available data and current computational capacities, such as long short-term memory (LSTM) recurrent neural networks [7].
One of the most relevant properties when selecting a prediction method is the number of variables to be considered to predict the time series, since most series are influenced by external factors [8]. For example, electricity consumption will depend on solar radiation if the installation contains photovoltaic panels. There are specific methods for univariate series, which only consider the previous values of the variable itself to estimate its evolution. For example, the evolution of the sales of a product over a year can be estimated from previous sales alone. Other forecasting algorithms are suitable for multivariate time series, as they analyze dependencies and interactions with other variables to predict future values [9]. These are the most common in real applications, as most variables can be influenced by external factors, such as car rentals depending on weather conditions and petrol prices. However, these methods usually require longer computation and training times, and they may not always be the most suitable solution, as the possible improvement in prediction accuracy may not compensate for the increase in the complexity of the model used.
For time series, open global competitions, known as the M competitions, are organized with the intention of evaluating and comparing the accuracy of different prediction methods proposed by researchers and companies. The competition held in 2018, the M4 competition, consisted of the prediction of one hundred thousand time series with different measurement frequencies and different time horizons. The method that obtained the best predictions was a hybrid between a neural network and a statistical model proposed by Uber, and the second best method was a combination of a machine learning model and a statistical model proposed by a researcher from the University of La Coruña. These results were obtained over a large number of time series from many different applications. Combined with the ability to learn hidden relationships in the data that machine learning methods have demonstrated, they indicate that hybrid methods work well in practice and motivate the development of new hybrid models combining different machine learning approaches to obtain more effective predictive models.
In this paper, we propose a hybrid method for predicting multivariate and univariate time series based on clustering, classification and forecasting algorithms. In particular, during the training phase, a forecasting model is first generated for each cluster obtained by the previously applied clustering technique. Then, in the prediction phase, a classification method is used to assign the new instance to a specific cluster and to use the proper forecasting model corresponding to that cluster. The proposed model is applied to six different time series, five univariate and one multivariate, to evaluate the results obtained and compare them with other commonly used prediction models.
In summary, the main contributions of this work are:

1. We propose a new hybrid approach based on clustering, classification and regression for time series forecasting.
2. We develop a hybrid algorithm to forecast both multivariate and univariate time series and for any prediction horizon.
3. We conduct extensive experiments to evaluate the sensitivity of the proposed hybrid algorithm with respect to the classification and regression methods that it consists of.
4. We carry out a comprehensive evaluation using real electricity data from the Spanish electricity market and surface ozone data from Spain, measured every 15 min and every hour for ten years, respectively. The results show the potential of the proposed hybrid model for forecasting. The proposed model outperforms the forecasting algorithms applied to the whole time series without any type of clustering.
5. We evaluate and compare the prediction accuracy of the proposed algorithm when using forecasting approaches of different natures, such as the autoregressive integrated moving average (ARIMA), Holt-Winters and a deep LSTM neural network, as well as different classifiers, such as Naïve Bayes, support vector machines (SVMs) and a random baseline.

The rest of the paper is structured as follows: In Section 2, a general overview of the current state-of-the-art prediction models for time series is described. In Section 3 the proposed methodology is presented. In Section 4, the datasets used to evaluate the model are analyzed along with the experimental setting and the results. Finally, in Section 5, the main conclusions and future work are presented.

2. Related work

Traditional methods of time series forecasting are mainly based on statistical fundamentals, such as simple exponential smoothing (SES) and exponential smoothing (ETS) [10], which are suitable for predicting time series without a trend. Conversely, Holt and damped exponential smoothing [11] are usually more appropriate for time series with trends. In addition, in [12] a set of 30 models based on the multiple seasonal Holt–Winters method was trained and tested for the very short-term electricity demand of the Spanish national electricity market. The performance of the methodology was validated by out-of-sample forecast comparisons using real data from the operator of the Spanish electricity system, achieving a forecast accuracy comparable to the best methods in the competitions.
Moreover, more accurate classic models were developed, including the ARIMA [13], which combines autoregressive, integrated and moving average parts. The quality of the results depends largely on the characteristics of the time series to be analyzed. In a stationary series with very marked patterns and few variations, it is possible to obtain very small prediction errors using traditional methods, such as the ARIMA for univariate time series or vector autoregression (VAR) for multivariate time series [14]. The Seasonal Autoregressive Integrated Moving Average (SARIMA) is an ARIMA model that is commonly applied to data that contain seasonal factors. The work published in [15] conducted an empirical study concluding that the SARIMA model achieved accurate forecasts, as did other more complicated prediction methods. In addition to the SARIMA, another method that can be used to predict long-term time series is the autoregressive fractionally integrated moving average (ARFIMA) method. The work published in [16] used the ARFIMA to successfully predict different types of time series with long memory patterns. Recently, [17] proposed a hybrid model that incorporates an empirical decomposition of the model based on a family of ARIMA models and a Taylor expansion for financial time series forecasting. Cappelli et al. [18] added a mechanism to detect and give more importance to multiple breakpoints in time series values when forecasting.
Stationarity is the initial requirement of a time series analysis using statistical approaches. However, in real-world cases, most time series data are not stationary, and the use of machine learning algorithms usually helps to improve forecasting accuracy [19]. Machine learning approaches for time series forecasting are able to analyze the behavior of data evolution over time without previous assumptions regarding the statistical distribution of the data, extracting complex nonlinear patterns. The SVM is a technique suitable for producing both classification and regression models based on the optimization of a separation surface within a transformed space given a user-defined kernel function. Several SVM models have been used for predicting time series [20].
Fuzzy logic is a logic paradigm where the truth value of variables may be any real number between 0 and 1 instead of a discrete value (0 or 1). Fuzzy logic is flexible in the sense that it is able to adjust to the changes and uncertainties of the problem and to model complex nonlinear functions. Fuzzy logic methods for time series forecasting include neural fuzzy systems and fuzzy time series (FTS). For example, [21] proposed a new hybrid fuzzy system with neural networks to improve forecasting accuracy by combining the benefits of neural networks and fuzzy logic. Furthermore, [22] proposed the FTS for forecasting different types of time series. Another relevant fuzzy logic-based neural network methodology for time series forecasting was proposed in [23]. Specifically, two fuzzy integrators (type-1 and interval type-2 Mamdani) were assembled and adapted to forecast a chaotic time series. Moreover, the membership function parameters of the fuzzy inference systems in each integrator were optimized by genetic algorithms. The proposal was validated with the Mackey–Glass time series, and its results were promising. Related to that work, in [24] the authors proposed an ensemble of interval type-2 fuzzy neural network models, integrating their outputs to forecast the time series using a fuzzy integrator. Genetic algorithms and particle swarm optimization were used to optimize the parameter values in the membership functions of the fuzzy integrator. The methodology was validated on time series such as Mackey–Glass, Mexican Stock Exchange, Dow Jones and NASDAQ, achieving very competitive results. In [25] a method based on modular neural networks was proposed for multiple time series prediction using many-input many-output fuzzy aggregation models. Several representative approaches were proposed for fuzzy aggregation models involving adaptive neuro-fuzzy inference systems and type-1 and interval type-2 fuzzy inference systems. The method was validated with the time series of the Mexican Stock Exchange, National Association of Securities Dealers Automated Quotation and Taiwan Stock Exchange, obtaining promising results.
A comprehensive comparison between statistical models and machine learning approaches was performed in [26]. In
addition, another comparative analysis of both statistical and machine learning-based forecasting algorithms was published
in [27] for a univariate agrometeorological time series. A comparative study specific for pattern similarity-based machine
learning methods for electricity load forecasting was also published in [28].
Deep learning has gained popularity in recent years, especially in the fields of computer vision and natural language processing. Deep neural networks are able to learn complex data representations [29], decreasing the need for manual feature engineering because of their internal embedding representation. On the one hand, convolutional neural networks (CNNs), traditionally designed for image datasets, can extract local relationships that are invariant across spatial dimensions. To adapt a CNN to time series data, multiple layers of causal convolutions can be used so that only past information is used to predict future values [30]. On the other hand, recurrent neural networks (RNNs) have been designed for sequence modeling. Given the natural interpretation of time series data as sequences of inputs and targets, many RNN-based architectures have been developed for applications of time series forecasting [31]. Because of the infinite lookback window, the first RNN architectures had limitations in learning long-range dependencies in the data, owing to issues related to exploding and vanishing gradients. Therefore, LSTM networks were developed to address these limitations by improving the gradient flow in the network. Li et al. [32] proposed an ensemble of the ARIMA with the LSTM that improved the forecasting accuracy for high-frequency financial time series. Niu et al. [33] proposed a two-stage deep learning framework focusing on feature selection for multivariate financial time series. Attention mechanisms for deep neural networks [34] are a recent technique to effectively learn long-term dependencies. Attention layers aggregate temporal features using dynamically generated weights, allowing the network to focus directly on crucial time steps in the past. Recent works have used attention mechanisms for different time series forecasting applications, improving the prediction accuracy with respect to other RNN architectures [35]. A recent survey of deep learning methods for time series forecasting can be found in [36].

There is a wide range of single models proposed for time series forecasting, but no single model delivers the desired performance in all situations. There are situations where hybrid methods are an appropriate alternative that can achieve a higher performance compared to individual models [37]. The majority of hybrid models proposed for time series forecasting can be classified into three main types: data preprocessing-based hybrid models, parameter optimization-based hybrid models and component combination-based hybrid models. Data preprocessing-based hybrid models are the best-known type of hybridization, consisting of the transformation of the time series into simpler data or the division of the data into several subsets. Nguyen and Novák [38] developed a preprocessing-based hybrid model to forecast seasonal time series. In this work, the time series is decomposed into three main parts: trend-cycle, seasonal and irregular fluctuation parts. Then, each part is modeled by fuzzy natural logic techniques and a Box & Jenkins model. Parameter optimization-based hybrid models determine the best parameters of different forecasting models through an optimization process, such as metaheuristic algorithms. Metaheuristic algorithms have been successfully adopted due to their ability to address vast search spaces [39]. Ojha et al. [40] provided a broad overview of hybrid models based on optimization using neural network models. Finally, component combination-based hybrid models are mainly known as ensemble models. Recently, different ensemble models have been proposed and widely used in numerous practical fields [41]. The objective of ensemble models is to increase the accuracy and reduce the variance. Ribeiro et al. [42] proposed an ensemble hybrid model using wavelet neural networks (WNNs), where wavelet functions were utilized in a hidden layer as an activation function, for short-term load forecasting. Galicia et al. [43] presented an ensemble framework composed of three models, a decision tree, a gradient boosted tree and a random forest, for big data time series. A recent survey of hybrid methodologies for time series forecasting can be read in [37].
To summarize the main characteristics of the works cited in this section, Table 1 shows the forecasting algorithm family
and type used in each work, as well as the year in which they were proposed, along with their reference.

3. A hybrid model based prediction

In this section, the proposed algorithm for time series forecasting is described along with the tested prediction methods that it consists of.

3.1. Description of the methodology

The main goal is to predict the next H values (hereinafter called the prediction horizon) of a time series, expressed as $[x_1, \ldots, x_t]$, from the W previous values (hereinafter called the historical data window). For that, each variable of the multivariate time series (Var 1, Var 2, ...) is transformed into a data matrix composed of instances and features, where the features are the past W values and the next H values, as shown in Fig. 1. Note that consecutive windows are separated by an interval of H steps. However, the algorithm can be extended using a general separation interval of S steps, simply by adding a final step in the prediction phase to resolve the overlap among the predictions obtained by the proposed algorithm.
Thus, the data for each variable Var of the time series can be expressed as follows:

Table 1
Summary of related work including the forecasting algorithm family and type used, as well as the year in which they were proposed, along with their reference.

Family          Type                                             Year          Ref.
Statistical     Exponential smoothing                            2002          [10]
                Holt and damped exponential smoothing            2006, 2011    [11]
                Multiple seasonal Holt–Winters                   2016          [12]
                ARIMA                                            2008          [13]
                SARIMA                                           2018          [15]
                ARFIMA                                           2018          [16]
                ARIMA and Taylor                                 2021          [17]
                ARIMA with breakpoints                           2021          [18]
KNN             Nearest neighbors                                2018          [19]
SVM             Support vector machines                          2011, 2015    [20]
Fuzzy logic     Fuzzy time series                                2016–2018     [21,22]
                Fuzzy neural networks and genetic search         2014          [23]
                Fuzzy neural networks, genetic and PSO search    2018          [24]
                Modular fuzzy neural networks                    2019          [25]
Deep learning   Convolutional neural networks                    2017, 2018    [30]
                Recurrent neural networks                        2018–2020     [31]
                LSTM                                             2020          [32,33]
                Attention mechanisms                             2019–2020     [35]
Hybrid          Data preprocessing-based                         2019          [38]
                Parameter optimization-based                     2017, 2019    [39,40]
                Component combination-based (ensembles)          2017–2019     [41–43]

Fig. 1. Transformation for each variable of the time series.

$$D_{Var} = \{(W_1, H_1), (W_2, H_2), \ldots, (W_N, H_N)\} \tag{1}$$

where $N$ is the number of instances of the matrix and $W_i$ and $H_i$ are vectors composed of the past W and future H values, respectively.
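The transformation in Eq. (1) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `make_windows`, the `step` argument (standing for the separation interval S, with H as the default) and the toy series are all assumptions made for the example.

```python
from typing import List, Tuple

Window = Tuple[List[float], List[float]]  # one instance (W_i, H_i)


def make_windows(series: List[float], w: int, h: int, step: int = 0) -> List[Window]:
    """Split a series into (past W values, next H values) pairs.

    `step` is the separation between consecutive windows; 0 means
    "use h", the non-overlapping default described in the text.
    """
    if w <= 0 or h <= 0:
        raise ValueError("w and h must be positive")
    s = step if step > 0 else h
    windows: List[Window] = []
    start = 0
    # Keep only complete windows: w past values followed by h future values.
    while start + w + h <= len(series):
        past = series[start:start + w]
        future = series[start + w:start + w + h]
        windows.append((past, future))
        start += s
    return windows


# Hypothetical toy series 0..9 with W = 4 and H = 2.
toy = [float(v) for v in range(10)]
d_var = make_windows(toy, w=4, h=2)
```

With these toy values, `d_var` holds three instances; choosing a `step` smaller than `h` produces overlapping windows, which is the case that later requires reconciling overlapping predictions.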
The methodology proposed for univariate and multivariate time series forecasting consists of splitting the time series into windows of the same length and then grouping these fragments into $k$ clusters, $G_1, \ldots, G_k$, according to their characteristics. Then, a training set for each cluster $G_j$ with $j = 1, \ldots, k$ is defined as follows:

$$TS_j = \{(W_i, H_i) \in G_j,\ \forall i = 1, \ldots, N\} \tag{2}$$


A specific forecasting model, $F_j$, is then trained for each set $TS_j$, using only the windows of the group itself. The goal is for the time fragments of the time series of each group to present similar characteristics so that the learning process is specialized for the patterns of each set, improving the accuracy. Therefore, in the prediction phase, a multiclass classifier $C$ is also required to assign the corresponding cluster for each window, according to its characteristics. Thus, classifier $C$ is trained during the previous stage, along with the forecasting models, but using the training set as follows:

$$TS_C = \{(W_i, l_j) \text{ such that } (W_i, H_i) \in G_j,\ \forall i = 1, \ldots, N,\ \forall j = 1, \ldots, k\} \tag{3}$$

where $l_j$ is the label associated with cluster $G_j$.
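As an illustration of Eqs. (2) and (3), the sketch below builds the per-cluster forecasting sets and the classifier set from a list of windows and their cluster labels. The labels are assumed to come from some clustering step that is not shown; `build_training_sets` and the toy data are hypothetical names and values chosen for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Window = Tuple[List[float], List[float]]  # one instance (W_i, H_i)


def build_training_sets(
    windows: List[Window], labels: List[int]
) -> Tuple[Dict[int, List[Window]], List[Tuple[List[float], int]]]:
    """Return (TS_j for every cluster j, TS_C for the classifier)."""
    if len(windows) != len(labels):
        raise ValueError("one cluster label per window is required")
    ts_forecast: Dict[int, List[Window]] = defaultdict(list)
    ts_classifier: List[Tuple[List[float], int]] = []
    for (past, future), label in zip(windows, labels):
        ts_forecast[label].append((past, future))  # Eq. (2): windows of G_j
        ts_classifier.append((past, label))        # Eq. (3): pairs (W_i, l_j)
    return dict(ts_forecast), ts_classifier


windows = [([1.0, 2.0], [3.0]), ([10.0, 9.0], [8.0]), ([2.0, 3.0], [4.0])]
labels = [0, 1, 0]  # hypothetical output of a clustering step
ts_j, ts_c = build_training_sets(windows, labels)
```

Each forecasting model would then see only `ts_j[j]`, while the classifier sees every past window paired with its cluster label.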


Once the forecasting models, $F_j$, and the classifier, $C$, have been trained, the final prediction from the past values $W_i$, denoted $P(W_i)$, is obtained as the prediction of the forecasting model $F_j$ associated with the cluster $G_j$ predicted by the classifier $C$, that is, $F_{C(W_i)}$. Therefore, the prediction is computed as follows:

$$P(W_i) = F_{C(W_i)}(W_i) \tag{4}$$
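Eq. (4) amounts to a dispatch: classify the window, then call the forecaster of the predicted cluster. The following sketch shows that dispatch under loud assumptions: a nearest-centroid rule stands in for the classifier $C$, and two trivial functions stand in for the trained models $F_j$. None of these choices comes from the paper.

```python
from typing import Callable, Dict, List, Sequence

Forecaster = Callable[[Sequence[float]], List[float]]
Classifier = Callable[[Sequence[float]], int]


def nearest_centroid(centroids: Dict[int, Sequence[float]]) -> Classifier:
    """Stand-in classifier C: label of the closest centroid (squared Euclidean)."""
    def classify(window: Sequence[float]) -> int:
        def dist(label: int) -> float:
            return sum((a - b) ** 2 for a, b in zip(window, centroids[label]))
        return min(centroids, key=dist)
    return classify


def hybrid_predict(
    window: Sequence[float], classifier: Classifier, forecasters: Dict[int, Forecaster]
) -> List[float]:
    """Eq. (4): P(W_i) = F_{C(W_i)}(W_i)."""
    return forecasters[classifier(window)](window)


# Two hypothetical clusters: "low" windows and "high" windows.
classifier = nearest_centroid({0: [1.0, 1.0], 1: [10.0, 10.0]})
forecasters: Dict[int, Forecaster] = {
    0: lambda w: [w[-1]],            # persistence: repeat the last value
    1: lambda w: [sum(w) / len(w)],  # mean of the window
}
low = hybrid_predict([1.0, 2.0], classifier, forecasters)
high = hybrid_predict([9.0, 11.0], classifier, forecasters)
```

Keeping the classifier and the forecasters as plain callables mirrors the flexibility described in the text: any classifier and any per-cluster model can be swapped in without touching the dispatch.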
The training process of the proposed algorithm is illustrated in Fig. 2. The training of the hybrid algorithm consists of four steps. First, the variables of the time series are divided into windows of size W and H with a window spacing interval of S steps, where H is the prediction horizon and W is an input parameter of the proposed model. Windows of size H are used to train the forecasting models in a supervised way. In the figure, a window of three times the prediction horizon and a step of half the prediction horizon are considered. Next, the clustering algorithm is applied to the W-size windows to create k groups, where k is the number of clusters. An optional step can be included before applying the clustering algorithm, preprocessing the windows to obtain the most appropriate characteristics to establish the different groups. Using the assigned labels as a class, a classifier is trained to predict the most adequate group for each time window. Finally, univariate prediction models are trained separately for each cluster obtained by the previous clustering and, in the multivariate case, for each variable to be predicted.
It should be noted that the clustering, classification and prediction methods are configured using their own parameters. Furthermore, since a specific model is trained for each group, it is also possible to combine different prediction methods or variations of the same algorithm using different configuration parameters. In particular, the ARIMA and Holt-Winters models use the same parameters but have different coefficients for each cluster, and a combination of different ARIMA models is tested in this work.
The proposed algorithm is designed to deal with multivariate time series. All of the variables of a time window are considered by the clustering. However, the models used for the prediction of the time series are univariate and are trained and applied separately for each variable. The proposed hybrid algorithm is defined in this way because there is a wider variety of univariate than multivariate forecasting algorithms with which to evaluate the proposed approach.
Fig. 2. Steps of the training for a multivariate time series composed of two variables.

Fig. 3 shows the prediction phase divided into four steps. First, the test set is divided into windows of size W and H, as in the training phase. Then, the classifier is used to predict the label of the cluster to which each time window belongs so that each window is assigned to a group based on its characteristics. In the next step, the univariate prediction model corresponding to that cluster is applied to forecast each of the target variables. Finally, the predictions are reconstructed, resolving the overlaps, if they exist, with a specific method, such as the average of the values. Overlap between predictions appears only when choosing a step S smaller than the prediction horizon H. Furthermore, preprocessing should also be applied in case it was done before clustering during the training phase. Thus, the classification process is carried out using the same features.
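The reconstruction step can be illustrated with the averaging rule mentioned above. The sketch assumes that every prediction has the same length H and that prediction i starts i·S positions after the first one; `merge_predictions` is a hypothetical helper, and averaging is only one of the possible ways to resolve the overlaps.

```python
from typing import List, Sequence


def merge_predictions(preds: Sequence[Sequence[float]], step: int) -> List[float]:
    """Average overlapping forecasts; prediction i starts at offset i * step."""
    if not preds:
        return []
    h = len(preds[0])
    if step <= 0:
        raise ValueError("step must be positive")
    if step > h:
        raise ValueError("a step larger than the horizon would leave gaps")
    length = (len(preds) - 1) * step + h
    sums = [0.0] * length
    counts = [0] * length
    for i, pred in enumerate(preds):
        for k, value in enumerate(pred):
            sums[i * step + k] += value
            counts[i * step + k] += 1
    # Since step <= h, every position is covered at least once.
    return [s / c for s, c in zip(sums, counts)]


# Horizon H = 4 and step S = 2: the two forecasts share two positions.
merged = merge_predictions([[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]], step=2)
```

When `step` equals the horizon, nothing overlaps and the function simply concatenates the forecasts.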

3.2. Forecasting methods

The univariate forecasting algorithms forming the hybrid model and used to evaluate the model are described in this section. Any forecasting method can be used within the hybrid model, but some of them cannot be applied directly. This is because the input of the proposed hybrid algorithm is not a time series but a set of time windows that may have discontinuities in the time sequence. This makes it necessary to adapt the algorithms used to learn a model from a set of time windows considering how temporal discontinuities may affect each particular algorithm and application.
The forecasting methods developed to evaluate the performance of the hybrid forecasting method are an ARIMA model, a Holt-Winters model and an RNN, in particular an LSTM. These methods have been adapted to work with sets of time windows rather than time series.
The ARIMA model on time windows is trained on each group and variable using all of the windows belonging to the group to search for the set of parameters that obtains the best predictions for that cluster. The parameters of the algorithm are, on the one hand, the orders $(p, q, d)$ of the different components of the ARIMA model and, on the other hand, the coefficients of the model. Additionally, a function to compare the performance of the different combinations of parameters must be chosen, since they can be evaluated according to different criteria, applying the most appropriate type of error. Once the model has been trained, the coefficients are kept fixed and are not updated in the prediction phase.
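To make the idea of fitting one model on a set of non-contiguous windows concrete, the sketch below estimates a single coefficient pooled over all windows of a cluster. It deliberately swaps the ARIMA for a much simpler AR(1) model without intercept, differencing or moving average terms, so it is an assumption-laden stand-in rather than the model used in the paper; `fit_ar1` and `forecast_ar1` are hypothetical names.

```python
from typing import List, Sequence


def fit_ar1(windows: Sequence[Sequence[float]]) -> float:
    """Least-squares AR(1) coefficient pooled over the windows of one cluster.

    Lagged pairs are formed inside each window only, so a temporal gap
    between two windows never produces a spurious (x[t-1], x[t]) pair.
    """
    num = 0.0
    den = 0.0
    for window in windows:
        for prev, curr in zip(window[:-1], window[1:]):
            num += prev * curr
            den += prev * prev
    if den == 0.0:
        raise ValueError("not enough variation to estimate the coefficient")
    return num / den


def forecast_ar1(window: Sequence[float], phi: float, horizon: int) -> List[float]:
    """Iterate x[t+1] = phi * x[t] for `horizon` steps; phi stays fixed."""
    out: List[float] = []
    last = window[-1]
    for _ in range(horizon):
        last = phi * last
        out.append(last)
    return out


cluster_windows = [[8.0, 4.0, 2.0], [16.0, 8.0, 4.0]]  # not contiguous in time
phi = fit_ar1(cluster_windows)
prediction = forecast_ar1([4.0, 2.0], phi, horizon=2)
```

The coefficient is estimated once from the training windows and reused unchanged at prediction time, which matches the statement that the coefficients are kept fixed after training.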
Holt-Winters on time windows is based on using all of the windows of a group to obtain the model that provides the best results in that cluster. The necessary parameters to define the model are the parameters for trend and seasonality modeling. In addition, the function to evaluate the results obtained using a validation set must also be provided to choose the best combination of parameters.
The LSTM layers, which are applied in an RNN, allow the identification of long-term dependencies in a time series [44].
However, the time windows belonging to a particular cluster may have temporal discontinuities of different durations. Thus,
it is not appropriate to train the LSTM network by applying all of them at once, concatenated in a single sequence.
The solution proposed consists of obtaining larger time windows than the windows used in the ARIMA and Holt-Winters methods described above to take advantage of the capacities of these layers. The LSTM is trained in different phases using one time window at a time and accumulating the learning. Another possible option would be to concatenate all of the time windows and include a variable with temporal instant information for each window to enable the network to infer the duration of the discontinuities and adapt to them. However, this approach would involve training a multivariate LSTM network, and it would not be comparable to the other forecasting algorithms assessed. In addition to the architecture of the neural network, the inputs of the proposed LSTM are the necessary parameters for its training, such as the loss function and the optimization method.

Fig. 3. Steps of the prediction for a multivariate time series composed of two variables.
In addition to the ARIMA, Holt-Winters and LSTM, our method has also been applied to and tested with three well-known
machine learning-based regressors as forecasters: k nearest neighbors, classification and regression trees and random
forests.
The k nearest neighbors algorithm (KNN) [45] belongs to the family of case-based reasoning approaches and is based on
the similarity in the feature space between the test sample and the samples that form the training set. The algorithm simply
determines the k closest training instances to the test sample and infers its class using the average or mode from those clos-
est instances.
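For the regression case used in forecasting, the idea can be sketched as follows. This is a minimal illustration with hypothetical names (`knn_predict`, `train_X`, `train_y`), assuming each training sample is a past window paired with the values that followed it.

```python
import math


def knn_predict(train_X, train_y, query, k=5):
    """Average the targets of the k training windows closest to `query`.

    Uses the Euclidean distance and gives every neighbor the same weight.
    """
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    nearest = order[:k]
    horizon = len(train_y[0])
    return [sum(train_y[i][j] for i in nearest) / len(nearest) for j in range(horizon)]


# Toy data: two low-valued windows and two high-valued windows.
train_X = [[1.0, 2.0], [2.0, 3.0], [10.0, 11.0], [11.0, 12.0]]
train_y = [[3.0], [4.0], [12.0], [13.0]]
prediction = knn_predict(train_X, train_y, [1.5, 2.5], k=2)
```

With k = 2, the query window is matched to the two low-valued windows, so the forecast is the mean of their targets.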
In contrast, the classification and regression trees algorithm (CART) [46] builds a decision or regression tree using a feature
goodness metric, such as Gini's impurity index, as the splitting criterion. CART produces a binary tree by repeatedly splitting
each node into two child nodes. The algorithm finds the best feature and split point according to the goodness metric for each
split. For an ordered feature with m different values, there exist m − 1 candidate split points. For regression tasks, the leaf
nodes contain a numeric value that represents the average of the samples covered by the node. Once the tree is built, it
can be used to estimate the class or value of the test sample by navigating from the root of the tree to a certain leaf node.
The random forest algorithm (RF) [47] is an ensemble of decision trees based on bagging (bootstrap aggregation). The aim is
to avoid overfitting by considering multiple trees and using all of them to perform predictions. Each tree inside the RF model
is trained to fit a random sample of rows and columns of the original data. For regression tasks, the mean of the tree
responses is used. RF is nonlinear and robust to noisy data. Moreover, the algorithm was designed to reduce the variance of
errors with a minimal increase in its bias.

4. Results

This section presents the results obtained by the proposed hybrid algorithm using the electricity demand of two different
datasets, one univariate and one multivariate, which includes temperature, and four univariate datasets containing O3 and
PM10 measurements at 2 different sites. First, these time series are described in Section 4.1. Next, Section 4.2 presents the
experimental setting of the forecasting algorithms described in Section 3.2. Finally, Section 4.3 shows the performance of the
proposed algorithm when using different combinations of classification and prediction algorithms in addition to a compar-
ison with other prediction algorithms, which are applied to the whole time series instead of the different clusters.

4.1. Datasets description

Six time series are used to evaluate the effectiveness of the proposed algorithm, five univariate and one multivariate.
None of the series presents missing data, outliers or noisy values, and thus, it is not necessary to apply preprocessing
techniques to guarantee the quality and integrity of the data. For all of the time series, normalization was applied to
establish the range of the variables in the interval [0, 1]. This does not affect the results obtained, since after making the
predictions, the inverse process is applied to enable the variables to return to the original scale.
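The normalization and its inverse can be sketched as a min-max scaling. The function names below are hypothetical, the demand values are invented, and computing the minimum and maximum on the training data only is an assumption, since the text does not state which samples are used.

```python
def fit_min_max(values):
    """Return the (minimum, maximum) pair used for scaling."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant series cannot be scaled to [0, 1]")
    return lo, hi


def scale(values, lo, hi):
    """Map values linearly so that lo -> 0 and hi -> 1."""
    return [(v - lo) / (hi - lo) for v in values]


def unscale(values, lo, hi):
    """Inverse of `scale`: return values to the original units."""
    return [v * (hi - lo) + lo for v in values]


train = [20000.0, 25000.0, 30000.0, 40000.0]  # invented demand values
lo, hi = fit_min_max(train)
scaled = scale(train, lo, hi)
restored = unscale(scaled, lo, hi)
```

Forecasts produced on the scaled series would be passed through `unscale` before computing errors in the original units.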

4.1.1. Univariate time series


The first univariate time series used contains ten-minute measurements of Spanish electricity demand from January 2007
to June 2016 and was obtained from the Spanish electricity network website [48].
Fig. 4B) shows an extract of the time series, specifically the month of January 2007. It shows a clear pattern in daily
consumption, which is repeated during the week. The dependence on the type of day (working days versus weekends, when
consumption decreases) is also observed. Therefore, there is daily and weekly seasonality, which is reflected in the
autocorrelation function (ACF) of the time series shown in Figs. 4D) and 4C), respectively. Note that one day is equivalent to
144 samples and one week to 1008 samples.
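With ten-minute data, one day spans 24 × 6 = 144 samples and one week 7 × 144 = 1008 samples, so seasonality appears as ACF peaks at those lags. The following sketch computes a sample autocorrelation on a synthetic sine wave. The function `acf` and the synthetic series are illustrative assumptions, and a period of 24 samples is used instead of 144 to keep the example small.

```python
import math


def acf(series, lag):
    """Sample autocorrelation of `series` at the given non-negative lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return cov / var


period = 24  # samples per cycle in this synthetic example
series = [math.sin(2 * math.pi * t / period) for t in range(period * 20)]

at_full_period = acf(series, period)        # close to +1: values repeat
at_half_period = acf(series, period // 2)   # close to -1: values are opposite
```

A strongly positive value at a lag of one period and a strongly negative value at half a period is the signature of a seasonal pattern of that length.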
To evaluate the model, the time series was separated into a training set from January 1, 2007, to December 31, 2012, and a
test set from January 1, 2013, to June 30, 2016. Thus, the ratio between the two sets is approximately 63% to 37%, and
therefore, the results of the models are evaluated with respect to the last three and a half years of the time series.
In addition to that time series, another four univariate series of air quality measurements are also used to validate our
proposal. Specifically, ozone (O3) and PM10 time series data are taken from [49]. The data used in this work were collected at
two urban sites (Asomadilla and Torneo) from air quality monitoring networks in Andalusia, Spain. Therefore, four univariate
series are used (O3 and PM10 in both Asomadilla and Torneo). To validate our method, each of the last three years (2013, 2014
and 2015) of these series is used as a test set, and the rest of the samples are used for training. The data were collected from
the Institute of Statistics and Cartography of Andalusia (last accessed 2017). More details about the data can be found in [49].

4.1.2. Multivariate time series


This time series contains hourly data on electricity demand and temperature between January 2007 and December 2014
from Uruguay [50].
Figs. 5B) and 6B) show the consumption and temperature variables of the time series in January 2007. It is clear that both
demand and temperature show daily patterns, as in the case of the univariate time series. From the ACF values in Figs. 5C)
and 5D), it is evident that there is a high correlation between daily and weekly values of the electricity demand. The
temperature also has a high correlation with the values of previous days, although slightly lower, but there seems to be no
weekly seasonality, as shown in Figs. 6C) and 6D).
This time series is divided into training and test sets on December 31, 2012. Thus, the ratio is 75% for training and 25% for
testing.

Fig. 4. Univariate electricity demand time series: A) Complete time series; B) January 2007; C) ACF for an interval of approximately 18 days; D) ACF for
30 h.

Fig. 5. Consumption variable of the multivariate time series: A) Complete series; B) January 2007; C) ACF for an 18-day interval; D) ACF for 30 h.

4.2. Experimental setting

In this section, the values of the parameters for each forecasting method described in Section 3.2 are presented. It is worth
noting that statistical methods, such as ARIMA and Holt-Winters, have been adapted to train using time windows instead
of the whole time series.

1. The ARIMA on time windows: The prediction model based on the ARIMA is applied using different combinations of values
for the components (p, q, d). In particular, models with fixed values for q and d have been chosen, namely, 0 and 1,
respectively, and values equal to 3, 5 and 7 for p. The coefficients of the model are optimized for each cluster. The function
used to select the optimal coefficients of the models is the mean absolute percentage error (MAPE) when predicting a
validation set randomly composed of 33% of the training set.
2. Holt-Winters on time windows: In this case, two Holt-Winters models are analyzed, both modeling the seasonal compo-
nent with a multiplicative model, which offers better results, but considering a daily or weekly period. The coefficients of
the model are optimized for each cluster. As in the case of the ARIMA models, the function MAPE, when predicting a val-
idation set randomly composed of 33% of the training set, is minimized to obtain the optimal coefficients of the models.
3. LSTM: In this case, the LSTM network is trained using the different time windows of a group and accumulating the learn-
ing obtained for each group. The designed network architecture is composed of a 200-neuron LSTM layer with a rectified
linear unit (ReLU) activation function, then a dense 100-neuron layer with a hyperbolic tangent activation function, and
finally, the output layer with a sigmoid activation function with a size equal to the number of values to be predicted, the H
values. A dropout factor of 30% is also added between the LSTM layer and the dense layer to reduce the overfitting of the
model. To train the model, the Adam optimization function is applied, using the mean squared error (MSE) as the loss
function. In addition, an early stopping strategy has also been applied to stop the training if the results do not improve.
4. KNN: The k nearest neighbors algorithm is used with k = 5, the Euclidean distance and without neighbor weighting.
5. CART: The algorithm is set to minimize the mean absolute error when building the tree model. Moreover, the minimum
number of samples to split nodes is set to 2. The maximum depth of the tree is limited in a way that causes nodes to
expand until all leaves are pure or until all contain fewer than 2 samples. Both settings are intended to avoid model
overfitting.
6. RF: For the random forest algorithm, 10 trees are used. As in CART, the algorithm is set to minimize the mean absolute
error when building the tree models. The minimum number of samples to split nodes is set to 2. The maximum depth of
the tree is limited in such a way that nodes are expanded until all leaves are pure or until all contain fewer than 2
samples.

All combinations of clustering, classifiers and prediction methods making up the hybrid model have been applied in a
systematic way, and therefore, the results obtained do not necessarily have to be optimal for each combination of models.
Better results would probably be obtained if a joint hyperparameter search were carried out. However, the W and H
parameters to divide the time series into windows have been adapted depending on the characteristics of each prediction
algorithm. In addition, S = H is chosen to avoid overlaps in the estimations.

Fig. 6. Temperature variable of the multivariate time series: A) Complete series; B) January 2007; C) ACF for an 18-day interval; D) ACF for 30 h.
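The effect of choosing S = H when dividing the series into windows can be sketched as follows, assuming that W is the number of past samples in a window, H the number of samples to predict and S the shift between consecutive windows. The function name `make_windows` and the toy series are hypothetical.

```python
def make_windows(series, W, H, S):
    """Return (input, target) pairs: W past samples followed by the next H samples.

    Consecutive pairs start S samples apart.
    """
    pairs = []
    start = 0
    while start + W + H <= len(series):
        past = series[start:start + W]
        future = series[start + W:start + W + H]
        pairs.append((past, future))
        start += S
    return pairs


series = list(range(20))
pairs = make_windows(series, W=6, H=3, S=3)

# With S equal to H, the targets tile the series without overlapping.
targets = [value for _, future in pairs for value in future]
```

Here the four targets cover samples 6 to 17 exactly once each, so every value is predicted a single time.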

4.3. Analysis of the results

This section reports the experimental results obtained by the proposed hybrid model when k-means is used as the ref-
erence clustering technique, and different classifiers and forecasting algorithms are selected to build the model.

4.3.1. Clustering
First, an analysis of the results obtained by k-means is carried out to determine the optimal number of clusters for both
univariate and multivariate time series. Given that k-means results may vary for different executions, since the positions of
the centroids are randomly initialized, an analysis is performed to choose the number of clusters providing more stable
results. For this analysis, the hybrid algorithm uses a classifier based on the distance to the nearest centroid and the
ARIMA(5,0,1) model as the reference classifier and prediction method, respectively.
Figs. 7 and 8 show the variations of the MAPE obtained by the proposed hybrid algorithm over 10 executions for each value
of k from 2 to 10 in both the univariate and multivariate electricity demand time series. As can be observed in these figures,
k = 7 and k = 5 are chosen for the univariate and multivariate time series, respectively, because they provide the most stable
results in each case. For the multivariate time series, the colors associated with the two variables in the series are:
temperature (orange) and electricity demand (blue).
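One way to make the notion of stability concrete is to pick the k whose MAPE varies least across executions. The sketch below does this with the standard deviation as the spread measure, which is an assumption since the text selects k by inspecting the figures. The numbers are invented for illustration and are not the values plotted in Figs. 7 and 8.

```python
from statistics import pstdev


def most_stable_k(mape_runs):
    """Return the k whose MAPE values have the smallest standard deviation across runs."""
    return min(mape_runs, key=lambda k: pstdev(mape_runs[k]))


# Invented MAPE values (in %) for a few executions of k-means per value of k.
mape_runs = {
    2: [4.9, 5.6, 4.7, 5.2],
    5: [4.4, 4.5, 4.4, 4.6],
    7: [4.3, 4.3, 4.4, 4.3],
}
chosen = most_stable_k(mape_runs)
```

Other spread measures, such as the range or the interquartile range, could be substituted without changing the structure.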
Fig. 9 displays the different groups identified by clustering for the month of January 2007 for the univariate time series.
Analyzing this figure, it can be observed that the time windows of each group present common characteristics. Therefore, the
training of specific predictors that exploit the patterns of each group allows us to obtain models that provide better results.
Fig. 10 shows the MAPE obtained by the proposed hybrid algorithm for each group when the reference classifier and the
ARIMA(5,0,1) are trained using only the time windows assigned to each group. It can be observed that the quality of the
results varies depending on the cluster. The best results are obtained for Cluster 2, as its instances have less variability
because most of them belong to weekdays, which have more regular electricity consumption, as shown in Fig. 11.
Fig. 12 shows the points composed of the mean and standard deviation of the time windows in each cluster obtained by
k-means for the univariate electricity demand time series. The mean of the instances is plotted on the x-axis, and the standard
deviation is plotted on the y-axis. In addition, the centroid of each cluster is shown in the same color as its instances, with a
larger marker and a wider border. It can be observed that the instances in Cluster 3, most of which start during the
weekend, are clearly separated from the instances in the other clusters.
To evaluate the sensitivity of the proposed hybrid algorithm to the clustering algorithm, other clustering techniques are
applied, obtaining similar results. However, most of the data is concentrated in only one cluster using spectral clustering.
Thus, in this case, the results are very similar to those when applying the prediction methods directly to the whole time
series. Consequently, in real-world applications, it is necessary to analyze in detail the results obtained by the clustering before
applying the proposed model, as it can greatly influence the results.

Fig. 7. MAPE for each number of clusters for the univariate electricity demand time series.

Fig. 8. MAPE for each number of clusters for the multivariate time series.

Fig. 9. Time windows grouped by clusters for the month of January 2007.

Fig. 10. MAPE obtained by the proposed hybrid algorithm for each cluster for the univariate time series.

Fig. 11. Number of instances (time windows) belonging to each cluster depending on the day of the week (weekdays, Saturdays and Sundays).

Fig. 12. Points composed of the mean and standard deviation for the time windows belonging to each cluster.

4.3.2. Comparison between different prediction algorithms


The results obtained using different prediction methods in the proposed hybrid model are compared in this section. The
k-means algorithm with the optimal values for the number of clusters (k = 7 and k = 5 for univariate electricity demand and
multivariate time series, respectively) and a classifier based on the distance to the nearest centroid are used.
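The prediction step with a nearest-centroid classifier can be sketched as follows: the incoming window is assigned to the closest centroid, and the forecaster trained for that cluster produces the forecast. All names are hypothetical, the centroids are invented, and the two toy forecasters stand in for the per-cluster models.

```python
import math


def nearest_centroid(window, centroids):
    """Return the index of the centroid closest to `window` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(window, centroids[c]))


def hybrid_forecast(window, centroids, models, horizon):
    """Classify the window, then forecast with the model of the assigned cluster."""
    label = nearest_centroid(window, centroids)
    return label, models[label](window, horizon)


# Invented centroids for two clusters of normalized windows of length 3.
centroids = [[0.2, 0.2, 0.3], [0.7, 0.8, 0.9]]

# Toy per-cluster forecasters (stand-ins for trained models).
models = [
    lambda w, h: [w[-1]] * h,            # cluster 0: repeat the last value
    lambda w, h: [sum(w) / len(w)] * h,  # cluster 1: repeat the window mean
]

label, forecast = hybrid_forecast([0.6, 0.8, 0.8], centroids, models, horizon=2)
```

A window assigned to the wrong cluster is forecast with a model trained on a different pattern, which is why classifier errors propagate directly to the forecast.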
Tables 2 and 3 show the MAPE when predicting the test set by the proposed model using different prediction methods for
the univariate and multivariate time series, respectively. The MAPE has been used because it does not depend on the range of
the absolute values. The ARIMA combination method uses, for each cluster, the best model from among the ARIMA(3,0,1), the
ARIMA(5,0,1) and the ARIMA(7,0,1), as evaluated using a validation set. The average of all of the models is
also shown, without considering the method of the ARIMA combinations, since it would be equivalent to one of the three
ARIMA models when applied to the whole time series.
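As a quick check of how the averages are formed, the following sketch recomputes the last row of Table 2 from the per-model MAPE values reported there, leaving out the ARIMA combinations row as described above. The helper name `average` is hypothetical.

```python
# MAPE (%) values copied from Table 2.
hybrid = {
    "ARIMA(3, 0, 1)": 4.6,
    "ARIMA(5, 0, 1)": 4.3,
    "ARIMA(7, 0, 1)": 4.7,
    "ARIMA combinations": 4.1,
    "Holt-Winters daily": 3.5,
    "Holt-Winters weekly": 3.3,
    "LSTM": 3.0,
}
whole_series = {
    "ARIMA(3, 0, 1)": 4.6,
    "ARIMA(5, 0, 1)": 4.4,
    "ARIMA(7, 0, 1)": 4.8,
    "Holt-Winters daily": 3.8,
    "Holt-Winters weekly": 3.5,
    "LSTM": 3.1,
}


def average(scores, exclude=()):
    """Mean of the scores, ignoring the models listed in `exclude`."""
    kept = [value for name, value in scores.items() if name not in exclude]
    return sum(kept) / len(kept)


hybrid_avg = round(average(hybrid, exclude=("ARIMA combinations",)), 2)
whole_avg = round(average(whole_series), 2)
```

The two results match the averages of 3.9 and 4.03 given in the table.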
The proposed method reduces the MAPE in most cases, although the improvement is not substantial. Nevertheless, the error
obtained by the Holt-Winters weekly method for the univariate time series is similar to that of the LSTM neural network,
despite using a longer forecast horizon (24 h versus 4 h); this model is particularly suitable for seasonal series. However,
the trained LSTM network is relatively simple, and other architectures with a larger number of layers could obtain better results.
Due to the seasonal nature of the time series, it is possible to define a baseline that replicates the values of the previous
day to evaluate the performance of the proposed algorithm. A MAPE equal to 6.54% is obtained by the baseline method
for the univariate electricity demand time series and 6.8% and 21.3% for consumption and temperature, respectively, for the
multivariate time series.
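The previous-day baseline can be sketched as follows. The function names and the toy series are hypothetical, and the toy series uses 4 samples per day instead of the 144 (ten-minute data) or 24 (hourly data) of the series studied here.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error in percent (y_true must not contain zeros)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def previous_day_baseline(series, samples_per_day):
    """Pair each sample from the second day onwards with the value one day earlier."""
    actual = series[samples_per_day:]
    predicted = series[:-samples_per_day]
    return actual, predicted


# Two invented "days" of 4 samples each.
series = [100.0, 200.0, 100.0, 200.0, 110.0, 180.0, 110.0, 180.0]
actual, predicted = previous_day_baseline(series, samples_per_day=4)
score = mape(actual, predicted)
```

Any forecasting model worth using on a seasonal series should obtain a lower error than this replication.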
With respect to the baseline, a reduction in MAPE ranging from 2 to 3.5 percentage points is obtained using the proposed
hybrid model for the univariate time series. For the multivariate time series, the Holt-Winters models and the LSTM reduce the
MAPE by 3.4 to 4 percentage points for the consumption variable and by 12.3 to 12.6 percentage points for the temperature.
However, the results obtained by the proposed algorithm when the ARIMA models are selected as forecasting algorithms do not
improve on the baseline method for the consumption variable, although they do for temperature.
For example, Fig. 13 shows the best and worst predictions made by the proposed algorithm when selecting the
ARIMA(5,0,1) model as the forecasting method using time windows of 192 samples, equivalent to 32 h, and a forecast horizon
of 24 samples, i.e., 4 h. For the worst prediction, it is likely that the time window has been incorrectly assigned to a cluster
by the classifier used.
To compare the performance of the hybrid algorithm against other forecasting methods, it has been applied to the air quality
measurement datasets with a prediction horizon of 24 h, as considered in [49]. Table 4 shows the comparison between
the results reported in [49] and those obtained from the hybrid algorithm when using Holt-Winters, RF, KNN and CART at the
Torneo and Asomadilla sites. For the clustering algorithm, the number of clusters based on a validation consistency analysis
is k ¼ 2 and k ¼ 3 for the Torneo and Asomadilla sites, respectively. The classifier is based on the distance to the nearest cen-
troid, as in the previous tables. In this case, the prediction errors are computed using MAE and RMSE as they were the ones
applied in [49].
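Both error measures are expressed in the units of the variable, unlike the MAPE. A minimal sketch of their computation follows; the function names and the sample values are illustrative assumptions.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error; penalizes large errors more than the MAE does."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Invented hourly concentrations and forecasts.
y_true = [50.0, 60.0, 70.0, 80.0]
y_pred = [48.0, 63.0, 70.0, 90.0]

mae_value = mae(y_true, y_pred)
rmse_value = rmse(y_true, y_pred)
```

Because the squared term weights the single error of 10 units heavily, the RMSE is noticeably larger than the MAE in this example.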
The results show that it is possible to improve the PSF results by applying any prediction method in the hybrid algorithm,
except ARIMA(3, 0, 1). The prediction errors obtained by RF and the algorithm proposed in Losada et al. 2018 are
similar for the Torneo dataset, with RF obtaining better results for some years and for the average RMSE. For the Asomadilla
dataset, the best results of the hybrid algorithm are obtained when using KNN and Holt-Winters, but the performance is not as
strong as that reported in [49].
Table 5 shows the performance comparison between different prediction methods when they are used in the hybrid algorithm
to predict the PM10 variable. This prediction is not reported in Losada et al. 2018, and therefore, a comparison cannot
be established. As shown in Table 4, the best results for the hybrid algorithm are obtained when using RF in Torneo and KNN
in Asomadilla.

Table 2
Comparison of different prediction methods for the univariate electricity demand time series.

                        Time window         Prediction horizon    MAPE (%)
Prediction model        hours    samples    hours    samples      Hybrid model (k = 7)    Whole series
ARIMA(3, 0, 1)          32       192        4        24           4.6                     4.6
ARIMA(5, 0, 1)          32       192        4        24           4.3                     4.4
ARIMA(7, 0, 1)          32       192        4        24           4.7                     4.8
ARIMA combinations      32       192        4        24           4.1                     –
Holt-Winters daily      336      2016       24       144          3.5                     3.8
Holt-Winters weekly     336      2016       24       144          3.3                     3.5
LSTM                    840      5040       4        24           3.0                     3.1
Average                 –        –          –        –            3.9                     4.03

Table 3
Comparison of different prediction methods for the multivariate time series.

                        Time window         Prediction horizon    MAPE (%)
                                                                  Hybrid model (k = 5)    Whole series
Prediction model        hours    samples    hours    samples      Cons.     Temp.         Cons.     Temp.
ARIMA(3, 0, 1)          32       32         4        4            6.8       13.1          6.7       13.0
ARIMA(5, 0, 1)          32       32         4        4            6.6       13.6          6.6       13.9
ARIMA(7, 0, 1)          32       32         4        4            6.9       13.5          7.0       13.4
ARIMA combinations      32       32         4        4            6.6       13.7          –         –
Holt-Winters daily      336      336        24       24           3.0       9.0           3.4       8.9
Holt-Winters weekly     336      336        24       24           2.8       8.7           3.0       8.8
LSTM                    840      840        4        4            3.2       8.7           3.3       9.0
Average                 –        –          –        –            4.88      11.10         5.0       11.17

Fig. 13. Best and worst prediction for next 4 h by the proposed algorithm.

Table 4
Performance comparison of PSF, Losada et al. 2018 and the proposed algorithm with different prediction methods to forecast hourly O3 concentration (in µg/m3) at
the two studied sites from 2013 to 2015.

                                              MAE                                 RMSE
Site                 Method                   2013   2014   2015   Average        2013   2014   2015   Average
Torneo (k = 2)       PSF                      13.1   13.9   13.3   13.4           17.1   18.1   17.2   17.5
                     Losada et al. 2018       8.1    12.3   12.5   11.0           10.7   15.7   16.0   14.1
                     ARIMA(3, 0, 1)           18.5   18.0   18.9   18.5           21.8   21.4   22.7   22.0
                     Holt-Winters daily       12.7   12.7   13.9   13.1           15.2   15.2   16.6   15.6
                     RF                       11.4   11.6   11.9   11.6           13.7   14.1   14.3   14.0
                     KNN                      12.7   12.9   13.2   12.9           15.2   15.5   15.7   15.5
                     CART                     12.2   12.6   12.5   12.4           14.7   15.3   15.1   15.1
Asomadilla (k = 3)   PSF                      14.8   15.4   14.8   15.0           18.9   19.7   18.9   19.2
                     Losada et al. 2018       12.8   14.2   13.4   13.5           16.4   17.7   17.0   17.0
                     ARIMA(3, 0, 1)           20.4   19.8   22.2   20.8           24.4   23.6   26.6   24.9
                     Holt-Winters daily       14.0   15.1   14.9   14.7           16.4   17.9   17.6   17.3
                     RF                       15.5   16.0   16.2   15.9           18.3   18.8   19.0   18.7
                     KNN                      14.2   14.8   14.8   14.6           17.0   17.8   17.7   17.5
                     CART                     15.5   16.0   16.2   15.9           18.3   18.8   19.0   18.7

4.3.3. Comparison between different classifiers


The results obtained using different classification algorithms in the proposed hybrid model are compared in this section.
The k-means algorithm with the optimal values for the number of clusters (k = 7 and k = 5 for the univariate electricity
demand and multivariate time series, respectively) is used.
Tables 6 and 7 show the MAPE when predicting the test set by the proposed model using different prediction and clas-
sification methods for the univariate and multivariate time series. In particular, the classifiers used are naive Bayes and SVM
with a linear kernel. In addition, a random classifier is also applied, which assigns the time windows to a group at random
without considering the characteristics of the time windows.
The results show that the proposed hybrid model is robust with respect to the classifier, as the MAPE obtained with these
classifiers, even though they are not very complex, is similar to that of the classifier based on the distance to the nearest
centroid. However, it can be observed that the results of the random classifier are consistently worse. Therefore, it can be
concluded that the models generated in the hybrid method adapt to the characteristics of the time windows belonging to each
group, improving the learning of the temporal patterns.

Table 5
Performance comparison of different prediction methods with the proposed algorithm to forecast hourly PM10 (in µm) concentration at the two studied sites from
2013 to 2015.

                                              MAE                                 RMSE
Site                 Method                   2013   2014   2015   Average        2013   2014   2015   Average
Torneo (k = 2)       ARIMA(3, 0, 1)           9.0    9.9    10.1   9.7            11.0   11.8   12.3   11.7
                     Holt-Winters daily       8.0    7.8    8.6    8.2            9.7    9.4    10.2   9.8
                     RF                       7.9    7.6    8.1    7.9            9.6    9.2    9.8    9.5
                     KNN                      8.7    8.6    9.0    8.7            10.5   10.3   10.8   10.5
                     CART                     8.2    7.9    8.4    8.2            10.0   9.6    10.1   9.9
Asomadilla (k = 3)   ARIMA(3, 0, 1)           13.3   5.0    5.7    8.0            26.4   5.8    6.6    12.9
                     Holt-Winters daily       24.8   5.5    4.8    11.7           46.5   6.4    5.6    19.5
                     RF                       14.3   10.3   7.1    10.6           26.6   20.3   7.9    18.3
                     KNN                      12.9   8.5    6.2    9.2            25.3   18.4   7.1    16.9
                     CART                     14.3   10.5   7.4    10.7           26.6   20.7   8.2    18.5

Table 6
Comparison of different classifiers and prediction methods for the univariate electricity demand time series.

                        Time window         Prediction horizon    MAPE (%)
Prediction model        hours    samples    hours    samples      Naive Bayes    SVM     Random
ARIMA(3, 0, 1)          32       192        4        24           4.6            4.5     5.9
ARIMA(5, 0, 1)          32       192        4        24           4.1            4.0     5.8
ARIMA(7, 0, 1)          32       192        4        24           4.8            4.75    5.8
ARIMA combinations      32       192        4        24           4.2            4.0     6.0
Holt-Winters daily      336      2016       24       144          3.4            3.6     5.1
Holt-Winters weekly     336      2016       24       144          3.5            3.3     5.0
LSTM                    840      5040       4        24           3.0            3.2     4.5
Average                 –        –          –        –            3.94           3.91    5.44

Table 7
Comparison of different classifiers and prediction methods for the multivariate time series.

                        Time window         Prediction horizon    MAPE (%)
                                                                  Naive Bayes        SVM                Random
Prediction model        hours    samples    hours    samples      Cons.    Temp.     Cons.    Temp.     Cons.    Temp.
ARIMA(3, 0, 1)          32       32         4        4            6.8      13.6      6.8      13.1      7.3      14.0
ARIMA(5, 0, 1)          32       32         4        4            6.7      13.7      6.6      13.6      7.1      13.9
ARIMA(7, 0, 1)          32       32         4        4            7.0      13.3      6.9      13.5      7.2      13.8
ARIMA combinations      32       32         4        4            6.8      13.4      6.6      13.7      7.0      13.7
Holt-Winters daily      336      336        24       24           2.9      9.0       3.0      9.0       4.8      13.8
Holt-Winters weekly     336      336        24       24           2.7      8.8       2.8      8.7       4.5      14.3
LSTM                    840      840        4        4            3.2      8.6       3.2      8.7       4.4      13.6
Average                 –        –          –        –            5.48     11.97     5.45     11.93     6.32     13.92

5. Conclusions

In this work, a hybrid method composed of three components, namely clustering, classification and prediction, is proposed
for time series forecasting. The first component is a clustering technique to group time windows of similar patterns. The
second component is a multiclass classifier, which is trained using the labels obtained by clustering as a
class. The classifier is used to assign a cluster in the prediction phase. Finally, the prediction component generates a forecast-
ing algorithm for each cluster. An experimental analysis is carried out to evaluate different aspects of the algorithm and its
characteristics, rather than aiming to find the optimal prediction model. In particular, different classifiers and forecasting
algorithms have been tested to show their influence on the proposed hybrid model. The results using six time series, five
univariate and one multivariate, have been reported, improving in several cases the prediction accuracy when compared
to predictions obtained by different forecasting algorithms that are applied to the whole time series without any type of
grouping. In addition, it has also been observed that the proposed algorithm is robust regarding the classifier used as a
component. It can be concluded that the models generated to compose the proposed hybrid method adapt well to the
characteristics of the time windows belonging to each group, improving the learning for the specific temporal patterns of the
time series.

In future work, new features that allow us to capture dependencies and interactions among the variables of multivariate
time series will be included as input to discover new pattern groups that could improve the prediction accuracy. Moreover,
transfer learning will be applied, beginning from a generic model for all clusters and then specializing it for each group,
with the groups acting as the target domains for the transferred knowledge. In addition, the hyperparameters of both the
classifier and the regression
algorithms of the hybrid method will be optimized using internal time-based validation over the training set and an opti-
mization algorithm based on metaheuristics.

CRediT authorship contribution statement

M.A. Castán-Lascorz: Methodology, Investigation, Data curation, Visualization, Software, Writing - original draft, Writing
- review & editing. P. Jiménez-Herrera: Visualization, Investigation, Conceptualization, Methodology, Writing - original
draft, Writing - review & editing. A. Troncoso: Conceptualization, Methodology, Writing - original draft, Investigation, Super-
vision, Writing - review & editing. G. Asencio-Cortés: Conceptualization, Methodology, Writing - original draft, Investigation,
Supervision, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have
appeared to influence the work reported in this paper.

Acknowledgements

The authors would like to thank the Spanish Ministry of Science, Innovation and Universities for the support under pro-
ject TIN2017-8888209C2-1-R. Funding for open access publishing: Universidad Pablo de Olavide/CBUA.

References

[1] P. Chatigny, J.M. Patenaude, S. Wang, Spatiotemporal adaptive neural network for long-term forecasting of financial time series, Int. J. Approxim.
Reason. 132 (2021) 70–85.
[2] R.L. Talavera-Llames, R. Pérez-Chacón, M. Martínez-Ballesteros, A. Troncoso, F. Martínez-Álvarez, A nearest neighbours-based algorithm for big
time series data forecasting, in: Hybrid Artificial Intelligent Systems, 2016, pp. 174–185.
[3] J.F. Torres, A. Troncoso, I. Koprinska, Z. Wang, F. Martínez-Álvarez, Deep learning for big data time series forecasting applied to solar power, in:
International Conference on Soft Computing Models in Industrial and Environmental Applications, 2019, pp. 123–133.
[4] A. Galicia, J.F. Torres, F. Martínez-Álvarez, A. Troncoso, A novel spark-based multi-step forecasting algorithm for big data time series, Inf. Sci. 467 (2018)
800–818.
[5] J. Aznarte-Mellado, J. Benitez-Sanchez, D. Nieto-Lugilde, et al, Forecasting airborne pollen concentration time series with neural and neuro-fuzzy
models, Expert Syst. Appl. 32 (4) (2007) 1218–1225.
[6] J.F. Torres, D. Hadjout, A. Sebaa, F. Martínez-Álvarez, A. Troncoso, Deep learning for time series forecasting: a survey, Big Data 9 (1) (2021).
[7] Z. Huang, W. Xu, K. Yu, Bidirectional LSTM-CRF models for sequence tagging, Computing Research Repository abs/1508.01991 (2015).
[8] J.F. Torres, A. Troncoso, I. Koprinska, Z. Wang, F. Martínez-Álvarez, Big data solar power forecasting based on deep learning and multiple data sources,
Exp. Syst. 36 (4) (2019) e12394.
[9] R. Talavera-Llames, R. Pérez-Chacón, A. Troncoso, F. Martínez-Álvarez, Mv-kwnn: a novel multivariate and multi-output weighted nearest neighbours
algorithm for big data time series forecasting, Neurocomputing 353 (2019) 56–73.
[10] R.J. Hyndman, A.B. Koehler, R.D. Snyder, S. Grose, A state space framework for automatic forecasting using exponential smoothing methods, Int. J.
Forecast. 18 (3) (2002) 439–454.
[11] A.M. De Livera, R.J. Hyndman, R.D. Snyder, Forecasting time series with complex seasonal patterns using exponential smoothing, J. Am. Stat. Assoc. 106
(496) (2011) 1513–1527.
[12] J.C. García-Díaz, O. Trull, Competitive models for the spanish short-term electricity demand forecasting, in: I. Rojas, H. Pomares (Eds.), Time Series
Analysis and Forecasting: Selected Contributions from the ITISE Conference, Springer International Publishing, Cham, 2016, pp. 217–231.
[13] G. Box, G. Jenkins, Time series analysis: Forecasting and control, John Wiley and Sons, Hoboken, NJ, USA, 2008.
[14] H.Y. Toda, P.C. Phillips, Vector autoregression and causality: a theoretical overview and simulation study, Econom. Rev. 13 (2) (1994) 259–285.
[15] L. Zhang, J. Lin, R. Qiu, X. Hu, H. Zhang, Q. Chen, H. Tan, D. Lin, J. Wang, Trend analysis and forecast of pm2.5 in fuzhou, china using the arima model,
Ecol. Ind. 95 (2018) 702–710.
[16] P. Arumugam, R. Saranya, Outlier detection and missing value in seasonal arima model using rainfall data, Mater. Today: Proc. 5 (2018) 1791–1799.
[17] Z. Luo, W. Guo, Q. Liu, Z. Zhang, A hybrid model for financial time-series forecasting based on mixed methodologies, Exp. Syst. 38 (2) (2021) e12633.
[18] C. Cappelli, R. Cerqueti, P. D’Urso, F. Di Iorio, Multiple breaks detection in financial interval-valued time series, Expert Syst. Appl. 164 (113775) (2021)
1–9.
[19] R.L. Talavera-Llames, R. Pérez-Chacón, A. Troncoso, F. Martínez-Álvarez, Big data time series forecasting based on nearest neighbours distributed
computing with spark, Knowl.-Based Syst. 161 (2018) 12–25.
[20] M.W. Li, D.F. Han, W.L. Wang, Vessel traffic flow forecasting by rsvr with chaotic cloud simulated annealing genetic algorithm and kpca,
Neurocomputing 157 (2015) 243–255.
[21] B. Sarıca, E. Eğrioğlu, B. Aşıkgil, A new hybrid method for time series forecasting: Ar-anfis, Neural Comput. Appl. 29 (3) (2018) 749–760.
[22] O. Cagcag Yolcu, H.K. Lam, A combined robust fuzzy time series method for prediction of time series, Neurocomputing 247 (2017) 87–101.
[23] J. Soto, P. Melin, O. Castillo, Time series prediction using ensembles of anfis models with genetic optimization of interval type-2 and type-1 fuzzy
integrators, Int. J. Hybrid Intell. Syst. 11 (3) (2014) 211–226.
[24] J. Soto, P. Melin, O. Castillo, A new approach for time series prediction using ensembles of it2fnn models with optimization of fuzzy integrators, Int. J.
Fuzzy Syst. 20 (3) (2018) 701–728.
[25] J. Soto, O. Castillo, P. Melin, W. Pedrycz, A new approach to multiple time series prediction using mimo fuzzy aggregation models with modular neural
networks, Int. J. Fuzzy Syst. 21 (5) (2019) 1629–1648.
[26] S. Makridakis, E. Spiliotis, V. Assimakopoulos, Statistical and machine learning forecasting methods: Concerns and ways forward, PLOS ONE 13 (3)
(2018) 1–26.

[27] S. Suradhaniwar, S. Kar, S.S. Durbha, A. Jagarlapudi, Time series forecasting of univariate agrometeorological data: A comparative performance
evaluation via one-step and multi-step ahead forecasting strategies, Sensors 21 (2430) (2021) 1–34.
[28] G. Dudek, P. Pełka, Pattern similarity-based machine learning methods for mid-term load forecasting: A comparative study, Appl. Soft Comput. 104
(107223) (2021) 1–14.
[29] Y. Bengio, A. Courville, P. Vincent, Representation learning: A review and new perspectives, IEEE Trans. Pattern Anal. Mach. Intell. 35 (8) (2013) 1798–
1828.
[30] S. Bai, J. Zico Kolter, V. Koltun, An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. CoRR, abs/1803.01271,
2018.
[31] B. Lim, S. Zohren, S. Roberts, Recurrent neural filters: Learning independent bayesian filtering steps for time series prediction, in: 2020 International
Joint Conference on Neural Networks (IJCNN), 2020, pp. 1–8.
[32] Z. Li, J. Han, Y. Song, On the forecasting of high-frequency financial time series based on arima model improved by deep learning, J. Forecasting 39 (7)
(2020) 1081–1097.
[33] T. Niu, J. Wang, H. Lu, W. Yang, P. Du, Developing a deep learning framework with two-stage feature selection for multivariate financial time series
forecasting, Expert Syst. Appl. 148 (2020) 113237.
[34] S. Garg, S. Peitz, U. Nallasamy, M. Paulik, Jointly learning to align and translate with transformer models, in: Proceedings of the 2019 Conference on
Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2019, pp. 4453–4462.
[35] B. Lim, S.O. Arik, N. Loeff, T. Pfister, Temporal fusion transformers for interpretable multi-horizon time series forecasting. arXiv, 1912.09363, 2020.
[36] B. Lim, S. Zohren, Time-series forecasting with deep learning: a survey, Philos. Trans. R. Soc. A: Math., Phys. Eng. Sci. 379 (2194) (2021) 20200209.
[37] Z. Hajirahimi, M. Khashei, Hybrid structures in time series modeling and forecasting: A review, Eng. Appl. Artif. Intell. 86 (2019) 83–106.
[38] L. Nguyen, V. Novák, Forecasting seasonal time series based on fuzzy techniques, Fuzzy Sets Syst. 361 (2019) 114–129.
[39] Z. Qian, Y. Pei, H. Zareipour, N. Chen, A review and discussion of decomposition-based hybrid models for wind energy forecasting applications, Appl.
Energy 235 (2019) 939–953.
[40] V.K. Ojha, A. Abraham, V. Snášel, Metaheuristic design of feedforward neural networks: A review of two decades of research, Eng. Appl. Artif. Intell. 60
(2017) 97–116.
[41] J. Chen, G.Q. Zeng, W. Zhou, W. Du, K.D. Lu, Wind speed forecasting using nonlinear-learning ensemble of deep learning time series prediction and
extremal optimization, Energy Convers. Manage. 165 (2018) 681–695.
[42] G. Trierweiler Ribeiro, V. Cocco Mariani, L. Dos Santos Coelho, Enhanced ensemble structures using wavelet neural networks applied to short-term load
forecasting, Eng. Appl. Artif. Intell. 82 (2019) 272–281.
[43] A. Galicia, R. Talavera-Llames, A. Troncoso, I. Koprinska, F. Martínez-Álvarez, Multi-step forecasting for big data time series based on ensemble
learning, Knowl.-Based Syst. 163 (2019) 830–841.
[44] F.A. Gers, J. Schmidhuber, F. Cummins, Learning precise timing with LSTM, J. Mach. Learn. Res. 3 (1) (2000) 115–143.
[45] D.W. Aha, D. Kibler, M.K. Albert, Instance-based learning algorithms, Mach. Learn. 6 (1) (1991) 37–66.
[46] L. Breiman, Classification and Regression Trees (The Wadsworth statistics/probability series), Wadsworth International Group, 1984.
[47] L. Breiman, Random forests, Mach. Learn. 45 (1) (2001) 5–32.
[48] Red Eléctrica de España. [Online]. Available: www.ree.es.
[49] A. Gómez-Losada, G. Asencio-Cortés, F. Martínez-Álvarez, J.C. Riquelme, A novel approach to forecast urban surface-level ozone considering
heterogeneous locations and limited information, Environ. Modell. Software 110 (2018) 52–61.
[50] F. Martínez-Álvarez, A. Schmutz, G. Asencio-Cortés, J. Jacques, A novel hybrid algorithm to forecast functional time series based on pattern sequence
similarity with application to electricity demand, Energies 12 (1) (2019) 94.
