A New Hybrid Method For Predicting Univariate and Multivariate Time Series Based On Pattern Forecasting
Information Sciences
journal homepage: www.elsevier.com/locate/ins
Article info
Article history: Received 9 July 2021; Received in revised form 29 November 2021; Accepted 1 December 2021; Available online 8 December 2021
Keywords: Time series forecasting; Machine learning; Hybrid model

Abstract
Time series forecasting has become indispensable for multiple applications and industrial processes. Currently, a large number of algorithms have been developed to forecast time series, all of which are suitable depending on the characteristics and patterns to be inferred in each case. In this work, a new algorithm is proposed to predict both univariate and multivariate time series based on a combination of clustering, classification and forecasting techniques. The main goal of the proposed algorithm is first to group windows of time series values with similar patterns by applying a clustering process. Then, a specific forecasting model for each pattern is built and training is only conducted with the time windows corresponding to that pattern. The new algorithm has been designed using a flexible framework that allows the model to be generated using any combination of approaches within multiple machine learning techniques. To evaluate the model, several experiments are carried out using different configurations of the clustering, classification and forecasting methods that the model consists of. The results are analyzed and compared to classical prediction models, such as autoregressive integrated moving average (ARIMA) and Holt-Winters models, to very recent forecasting methods, including deep long short-term memory (LSTM) neural networks, and to well-known methods in the literature, such as k nearest neighbors, classification and regression trees, as well as random forest.
© 2021 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Abbreviations: ACF, Autocorrelation function of the Time Series; ARFIMA, Autoregressive Fractionally Integrated Moving Average; ARIMA, Autoregressive Integrated Moving Average; CART, Classification And Regression Trees; CNN, Convolutional Neural Networks; ETS, Exponential Smoothing; FTS, Fuzzy Time Series; KNN, K-Nearest Neighbors; LSTM, Long Short-Term Memory; MAPE, Mean Absolute Percentage Error; MSE, Mean Squared Error; RF, Random Forest; RNN, Recurrent Neural Networks; SARIMA, Seasonal Autoregressive Integrated Moving Average; SES, Simple Exponential Smoothing; SVM, Support Vector Machines; VAR, Vector Autoregression; WNN, Wavelet Neural Networks.
⇑ Corresponding author.
E-mail addresses: [email protected] (M.A. Castán-Lascorz), [email protected] (P. Jiménez-Herrera), [email protected] (A. Troncoso), [email protected] (G. Asencio-Cortés).
https://doi.org/10.1016/j.ins.2021.12.001
0020-0255/© 2021 The Author(s). Published by Elsevier Inc.
M.A. Castán-Lascorz, P. Jiménez-Herrera, A. Troncoso et al. Information Sciences 586 (2022) 611–627
1. Introduction
Knowing the evolution of the future values of a time series can offer considerable advantages in many applications, as this enables decisions to be made in advance and adapted appropriately to obtain better results. If the estimates are sufficiently precise and reliable, it is possible to gain substantial improvements in many factors, such as savings in costs and consumption, reduction of emissions or logistics optimization [1]. Currently, there is a wide variety of forecasting methods that analyze time series using different approaches, such as statistical models, nearest neighbors or neural networks, to identify relevant patterns and obtain precise estimates of their future evolution [2–4]. Therefore, it is necessary to choose the most suitable one for each application depending on the characteristics of the time series to be analyzed and the requirements of the model used, such as the training time or the prediction horizon.
In recent years, numerous works have been published analyzing the behavior and quality of the predictions made on time
series in different fields, such as in air pollution [5] or electricity consumption [6]. New models or variants of classic algo-
rithms have also been proposed, which normally exploit the increase in the volume of available data and current computa-
tional capacities, such as long short-term memory (LSTM) recurrent neural networks [7].
One of the most relevant properties when selecting a prediction method is the number of variables to be considered to
predict the time series, since most of them are influenced by external factors [8]. For example, electricity consumption will
depend on solar radiation if the installation contains photovoltaic panels. There are specific methods for univariate series,
which only consider the previous values of the variable itself to estimate its evolution. For example, the evolution of sales
of a product for a year can be estimated. Other forecasting algorithms are suitable for multivariate time series, as they ana-
lyze dependencies and interactions with other variables to predict future values [9]. These are the most common in real
applications, as most variables can be influenced by external factors such as car rentals depending on weather conditions
and petrol prices. However, these methods usually require longer computation and training times, and they may not always be the most suitable solution, as the possible improvement in prediction accuracy may not compensate for the increase in the complexity of the model used.
For time series, open global competitions, known as the M competitions, are organized with the intention of evaluating and comparing the accuracy of different prediction methods proposed by researchers and companies. The competition held in 2018, the M4 competition, consisted of the prediction of one hundred thousand time series with different measurement frequencies and different time horizons. The method that obtained the best predictions was a hybrid of a neural network and a statistical model proposed by Uber, and the second best method was a combination of a machine learning model and a statistical model proposed by a researcher from the University of La Coruña. These results were obtained on a large number of time series from many different applications. Combined with the ability to learn hidden relationships in the data that machine learning methods have demonstrated, they indicate that hybrid methods work well in practice and motivate the development of new hybrid models that combine different machine learning approaches to obtain more effective predictive models.
In this paper, we propose a hybrid method for predicting multivariate and univariate time series based on clustering, clas-
sification and forecasting algorithms. In particular, during the training phase, a forecasting model is first generated for each
cluster obtained by the previously applied clustering technique. Then, in the prediction phase, a classification method is used
to assign the new instance to a specific cluster and to use the proper forecasting model corresponding to that cluster. The
proposed model is applied to six different time series, five univariate and one multivariate, to evaluate the results obtained
and compare them with other commonly used prediction models.
In summary, the main contributions of this work are:
1. We propose a new hybrid approach based on clustering, classification and regression for time series forecasting.
2. We develop a hybrid algorithm to forecast both multivariate and univariate time series and for any prediction horizon.
3. We conduct extensive experiments to evaluate the sensitivity of the proposed hybrid algorithm with respect to the classification and regression methods that it consists of.
4. We carry out a comprehensive evaluation using real electricity data from the Spanish electricity market and surface ozone
data from Spain, measured every 15 min and every hour for ten years, respectively. The results show the potential of the
proposed hybrid model for forecasting. The proposed model outperforms forecasting algorithms when applied to the
whole time series without any type of clustering.
5. We evaluate and compare the prediction accuracy of the proposed algorithm when using forecasting approaches of dif-
ferent natures, such as the autoregressive integrated moving average (ARIMA), Holt-Winters and a deep LSTM neural net-
work, as well as different classifiers, such as Naïve Bayes, support vector machines (SVMs) and a random baseline.
The rest of the paper is structured as follows: In Section 2, a general overview of the current state-of-the-art prediction
models for time series is described. In Section 3 the proposed methodology is presented. In Section 4, the datasets used to
evaluate the model are analyzed along with the experimental setting and the results. Finally, in Section 5, the main conclu-
sions and future work are presented.
2. Related work
Traditional methods of time series forecasting are mainly based on statistical fundamentals, such as simple exponential
smoothing (SES) and exponential smoothing (ETS) [10], which are suitable when predicting time series without a trend. Conversely, Holt and damped exponential smoothing [11] are generally more appropriate for time series with trends. In addition, in [12] a set of 30 models based on the multiple seasonal Holt–Winters method was trained and tested for the very
short-term electricity demand of the Spanish national electricity market. The performance of the methodology was validated
by out-of-sample forecast comparisons using real data from the operator of the Spanish electricity system, achieving a fore-
cast accuracy comparable to the best methods in the competitions.
Moreover, more accurate classic models were developed, including the ARIMA [13], which combines autoregressive, integrated and moving average parts. The quality of the results depends largely on the characteristics of the time series to be analyzed. In a stationary series with very marked patterns and few variations, it is possible to obtain very small prediction errors using traditional methods, such as the ARIMA for univariate time series or vector autoregression (VAR) for multivariate time series [14]. The Seasonal Autoregressive Integrated Moving Average (SARIMA) is an ARIMA model that is commonly applied to data that contains seasonal factors. The work published in [15] conducted an empirical study concluding that the SARIMA model achieved forecasts as accurate as those of other, more complicated prediction methods. In addition to the
SARIMA, another method that can be used to predict long-term time series is the autoregressive fractionally integrated mov-
ing average (ARFIMA) method. The work published in [16] used the ARFIMA to successfully predict different types of time
series with long memory patterns. Recently, [17] proposed a hybrid model that incorporates an empirical decomposition of
the model based on a family of the ARIMA models and a Taylor expansion for financial time series forecasting. Cappelli et al.
[18] added a mechanism to detect and give more importance to multiple breakpoints in time series values when forecasting.
Stationarity is the initial requirement of a time series analysis using statistical approaches. However, in real-world cases,
most time series data is not stationary, and the use of machine learning algorithms usually helps to improve forecasting
accuracy [19]. Machine learning approaches for time series forecasting are able to analyze the behavior of data evolution
over time without previous assumptions regarding the statistical distribution of data, extracting complex nonlinear patterns.
The SVM is a technique suitable to produce both classification and regression models based on the optimization of a sepa-
ration surface within a transformed space given a user-defined kernel function. Several SVM models have been used for pre-
dicting time series [20].
Fuzzy logic is a logic paradigm where the truth value of variables may be any real number between 0 and 1 instead of a
discrete value (0 or 1). Fuzzy logic is a flexible logic in the sense that it is able to adjust to the changes and uncertainties of
the problem and to model complex nonlinear functions. Fuzzy logic methods for time series forecasting include the neural
fuzzy systems and the fuzzy time series (FTS). For example, [21] proposed a new hybrid fuzzy system with neural networks
to improve forecasting accuracy by combining the benefits of neural networks and fuzzy logic. Furthermore, [22] proposed
the FTS for forecasting different types of time series. Another relevant fuzzy logic-based neural network methodology for
time series forecasting was proposed in [23]. Specifically, two fuzzy integrators (type-1 and interval type-2 Mamdani) were
assembled and adapted to forecast a chaotic time series. Moreover, the membership function parameters of the fuzzy infer-
ence systems in each integrator were optimized by genetic algorithms. The proposal was validated with the Mackey–Glass
time series, and its results were promising. Related to that work, in [24] the authors proposed an ensemble of interval type-2
fuzzy neural network models, integrating its outputs to forecast the time series using a fuzzy integrator. Genetic algorithms
and particle swarm optimization were used for the optimization of the parameter values in the membership functions of the
fuzzy integrator. The methodology was validated on time series such as Mackey–Glass, Mexican Stock Exchange, Dow Jones
and NASDAQ, achieving very competitive results. In [25] a method based on modular neural networks was proposed for mul-
tiple time series prediction using many-input many-output fuzzy aggregation models. Several representative approaches
were proposed for fuzzy aggregation models involving adaptive neuro-fuzzy inference systems and Type-1 and interval
Type-2 fuzzy inference systems. The method was validated with the time series of the Mexican Stock Exchange, National
Association of Securities Dealers Automated Quotation and Taiwan Stock Exchange, obtaining promising results.
A comprehensive comparison between statistical models and machine learning approaches was performed in [26]. In
addition, another comparative analysis of both statistical and machine learning-based forecasting algorithms was published
in [27] for a univariate agrometeorological time series. A comparative study specific for pattern similarity-based machine
learning methods for electricity load forecasting was also published in [28].
Deep learning has gained popularity in recent years, especially in the field of computer vision and natural language pro-
cessing. Deep neural networks are able to learn complex data representations [29], decreasing the need for manual feature
engineering because of their internal embedding representation. On the one hand, convolutional neural networks (CNNs),
traditionally designed for image datasets, can extract local relationships that are invariant across spatial dimensions. To
adapt a CNN to time series data, multiple layers of causal convolutions can be used so that only past information is used to predict future values [30]. On the other hand, recurrent neural networks (RNNs) have been designed for sequence modeling. Given the natural interpretation of time series data as sequences of inputs and targets, many RNN-based architectures have been developed for applications of time series forecasting [31]. Because of their infinite lookback window, the first RNN architectures had limitations in learning long-range dependencies in the data, owing to exploding and vanishing gradients.
Therefore, LSTM networks were developed to address these limitations by improving the gradient flow in the network. Li
et al. [32] proposed an ensemble of the ARIMA with the LSTM that improved the forecasting accuracy for high-frequency
financial time series. Niu et al. [33] proposed a two-stage deep learning framework focusing on feature selection for multi-
variate financial time series. Attention mechanisms for deep neural networks [34] are a recent technique to effectively learn
long-term dependencies. Attention layers aggregate temporal features using dynamically generated weights, allowing the
network to focus directly on crucial time steps in the past. Recent works have used attention mechanisms for different time
series forecasting applications, improving the prediction accuracy with respect to other RNN architectures [35]. A recent sur-
vey of deep learning methods for time series forecasting can be found in [36].
There is a wide range of single models proposed for time series forecasting, but none of them achieves the desired performance in all situations. There are situations where hybrid methods are an appropriate alternative that can
achieve a higher performance compared to individual models [37]. The majority of hybrid models proposed for time series
forecasting can be classified into three main types: data preprocessing-based hybrid models, parameter optimization-based
hybrid models and component combination-based hybrid models. Data preprocessing-based hybrid models are the best-
known type of hybridization, consisting of the transformation of the time series into simpler data or dividing data into sev-
eral subsets. Nguyen and Novák [38] developed a preprocessing-based hybrid model to forecast seasonal time series. In this
work, the time series is decomposed into three main parts: trend-cycle, seasonal and irregular fluctuation parts. Then, each
part is modified by fuzzy natural logic techniques and a Box & Jenkins model. Parameter optimization-based hybrid models
determine the best parameters of different forecasting models through an optimization process, such as metaheuristic algo-
rithms. Metaheuristic algorithms have been successfully adopted due to their ability to address vast search spaces [39]. Ojha
et al. [40] provided a broad overview of hybrid models based on optimization using neural network models. Finally, compo-
nent combination-based hybrid models are mainly known as ensemble models. Recently, different ensemble models have
been proposed and widely used in numerous practical fields [41]. The objective of ensemble models is to increase the accu-
racy and reduce the variance. Ribeiro et al. [42] proposed an ensemble hybrid model using the wavelet neural networks
(WNNs), where wavelet functions were utilized in a hidden layer as an activation function, for short-term load forecasting.
Galicia et al. [43] presented an ensemble framework composed of three models, a decision tree, a gradient boosted tree and a
random forest, for big data time series. A recent survey of hybrid methodologies for time series forecasting can be read in
[37].
To summarize the main characteristics of the works cited in this section, Table 1 shows the forecasting algorithm family
and type used in each work, as well as the year in which they were proposed, along with their reference.
3. Methodology
In this section, the proposed algorithm for time series forecasting is described along with the prediction methods tested within it.
The main goal is to predict the next H values (hereinafter called the prediction horizon) of a time series, expressed as
$[x_1, \ldots, x_t]$, from the W previous values (hereinafter called the historical data window). For that, each variable of the multivariate
time series (Var 1, Var 2,. . .) is transformed into a data matrix composed of instances and features, where the features are the
past W values and the next H values, as shown in Fig. 1. Note that the different windows present an interval of separation of
H steps. However, the algorithm can be extended using a general separation interval of S steps, simply adding a last step in
the prediction phase to solve the overlapping among the predictions obtained by the proposed algorithm.
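The windowing step described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `make_windows` and the toy series are hypothetical, and the step S defaults to H so that the predicted blocks do not overlap.

```python
def make_windows(series, W, H, S=None):
    """Split a univariate series into (past, future) pairs.

    Each instance holds W past values (features) and the next H values
    (targets). Consecutive windows start S steps apart; S defaults to H.
    """
    if S is None:
        S = H
    X, Y = [], []
    start = 0
    # Stop as soon as a full window of W + H values no longer fits
    while start + W + H <= len(series):
        X.append(series[start:start + W])
        Y.append(series[start + W:start + W + H])
        start += S
    return X, Y


# Toy series (hypothetical data): ten consecutive values
series = list(range(10))
X, Y = make_windows(series, W=4, H=2)
```

For a multivariate series, the same function would be applied to each variable separately, which matches the per-variable data matrices mentioned in the text.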
Thus, the data for each variable Var of the time series can be expressed as follows:
Table 1
Summary of related work including the forecasting algorithm family and type used, as well as the year in which they were proposed, along with their reference.
Fig. 2. Steps of the training for a multivariate time series composed of two variables.
ing to that cluster is applied to forecast each of the target variables. Finally, the predictions are reconstructed, solving the
overlaps, if they exist, with a specific method, such as the average of the values. Overlap between predictions appears only
when choosing a step S smaller than the prediction horizon H. Furthermore, preprocessing should also be applied in case it
was done before clustering during the training phase. Thus, the classification process is carried out using the same features.
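As a rough illustration of the reconstruction step, the following sketch averages the values that several predicted blocks assign to the same time position. The function name `reconstruct` and the example blocks are hypothetical; averaging is only one of the possible ways to resolve overlaps mentioned above.

```python
def reconstruct(predictions, S, H):
    """Merge H-step predictions issued every S steps into one sequence.

    predictions[i] is the block predicted for positions i*S .. i*S + H - 1.
    Positions covered by several blocks (only possible when S < H) are
    averaged.
    """
    length = (len(predictions) - 1) * S + H if predictions else 0
    sums = [0.0] * length
    counts = [0] * length
    for i, block in enumerate(predictions):
        for j, value in enumerate(block):
            sums[i * S + j] += value
            counts[i * S + j] += 1
    # counts[t] can only be 0 when S > H (gaps between blocks)
    return [s / c if c else None for s, c in zip(sums, counts)]


# Two blocks of H = 3 values issued S = 2 steps apart share one position
blocks = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
merged = reconstruct(blocks, S=2, H=3)
```

When S = H, every position is covered exactly once and the function simply concatenates the blocks.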
The univariate forecasting algorithms forming the hybrid model and used to evaluate the model are described in this sec-
tion. Any forecasting method can be used within the hybrid model, but some of them cannot be applied directly. This is
because the input of the proposed hybrid algorithm is not a time series but a set of time windows that may have disconti-
nuities in the time sequence. This makes it necessary to adapt the algorithms used to learn a model from a set of time win-
dows considering how temporal discontinuities may affect each particular algorithm and application.
The forecasting methods implemented to evaluate the performance of the hybrid forecasting method are an ARIMA model, a Holt-Winters model and an RNN, in particular an LSTM. These methods have been adapted to handle sets of time windows rather than complete time series.
The ARIMA model on time windows is trained on each group and variable using all of the windows belonging to the group
to search for the set of parameters that obtain the best predictions for that cluster. The parameters of the algorithm are, on the one hand, the orders of the different components $(p, q, d)$ of the ARIMA model and, on the other hand, the coefficients of
the model. Additionally, a function to compare the performance of the different combinations of parameters must be chosen,
since they can be evaluated according to different criteria, applying the most appropriate type of error. Once the model has
been trained, the coefficients are kept fixed and are not updated in the prediction phase.
Holt-Winters on time windows is based on using all of the windows of a group to obtain the model that provides the best
results in that cluster. The necessary parameters to define the model are the parameters for trend and seasonal modeling.
In addition, the function to evaluate the results obtained using a validation set must also be provided to choose the best com-
bination of parameters.
The LSTM layers, which are applied in an RNN, allow the identification of long-term dependencies in a time series [44].
However, the time windows belonging to a particular cluster may have temporal discontinuities of different durations. Thus,
it is not appropriate to train the LSTM network by applying all of them at once, concatenated in a single sequence.
The solution proposed consists of obtaining larger time windows than the windows used in the ARIMA and Holt-Winters
methods described above to take advantage of the capacities of these layers. The LSTM is trained in different phases using
one time window at a time and accumulating the learning. Another possible option would be to concatenate all of the time
windows and include a variable with temporal instant information for each window to enable the network to infer the dura-
tion of the discontinuities and adapt to them. However, this approach would involve training a multivariate LSTM network,
and the results would not be comparable to those of the other forecasting algorithms assessed. In addition to the architecture of the neural network, the inputs of the proposed LSTM are the necessary parameters for its training, such as the loss function and the optimization method.
Fig. 3. Steps of the prediction for a multivariate time series composed of two variables.
In addition to the ARIMA, Holt-Winters and LSTM, our method has also been applied to and tested with three well-known
machine learning-based regressors as forecasters: k nearest neighbors, classification and regression trees and random
forests.
The k nearest neighbors algorithm (KNN) [45] belongs to the family of case-based reasoning approaches and is based on
the similarity in the feature space between the test sample and the samples that form the training set. The algorithm simply
determines the k closest training instances to the test sample and infers its value or class using the average or mode, respectively, of those closest instances.
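Used as a forecaster on time windows, the KNN idea can be sketched as follows. The function name `knn_forecast` and the toy data are hypothetical; the sketch assumes the Euclidean distance and an unweighted average of the neighbors' target blocks.

```python
import math


def knn_forecast(train_X, train_Y, query, k=5):
    """Predict the next H values as the mean of the k nearest windows' targets."""
    # Rank training windows by Euclidean distance to the query window
    ranked = sorted(
        range(len(train_X)),
        key=lambda i: math.dist(train_X[i], query),
    )
    nearest = ranked[:k]
    H = len(train_Y[0])
    # Unweighted average of the neighbors' target blocks, position by position
    return [sum(train_Y[i][h] for i in nearest) / len(nearest) for h in range(H)]


# Toy example (hypothetical data): the two closest windows have targets 1 and 3
train_X = [[0.0, 0.0], [1.0, 1.0], [10.0, 10.0]]
train_Y = [[1.0], [3.0], [100.0]]
forecast = knn_forecast(train_X, train_Y, query=[0.4, 0.4], k=2)
```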
The classification and regression trees algorithm (CART) [46], in turn, builds a decision or regression tree using a feature goodness metric, such as Gini's impurity index, as the splitting criterion. CART produces a binary tree by repeatedly splitting its nodes into two child nodes. The algorithm finds the best feature and split point according to the goodness metric for each split. For a numeric feature with m different values, there exist m − 1 candidate split thresholds. For regression tasks, the leaf nodes contain a numeric value that represents the average of the samples covered by this node. Once the tree is built, it can be used to estimate the class or value of the test sample by navigating from the root of the tree to a certain leaf node.
The random forest algorithm (RF) [47] is an ensemble of decision trees based on bagging (bootstrap aggregation). The aim is to avoid overfitting by considering multiple trees and using all of them to perform predictions. Each tree inside the RF model is trained on a random sample of rows and columns of the original data. For regression tasks, the mean of the tree responses is used. RF is nonlinear and robust to noisy data. Moreover, the algorithm was designed to reduce the variance of errors with a minimal increase in its bias.
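To make the interplay of clustering, classification and forecasting concrete, the sketch below wires the three stages together using only the Python standard library. It is a simplified illustration, not the implementation evaluated in this paper: the k-means routine is a plain Lloyd's iteration, the classifier is a nearest-centroid rule, and `MeanForecaster` is a hypothetical stand-in for any forecaster (ARIMA, Holt-Winters, LSTM, KNN, CART or RF) trained on the windows of one cluster.

```python
import math
import random


def nearest_centroid(window, centroids):
    """Classifier used here: index of the closest centroid."""
    return min(range(len(centroids)), key=lambda c: math.dist(window, centroids[c]))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm with random initial centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = [nearest_centroid(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


class MeanForecaster:
    """Stand-in forecaster: predicts the mean target block of its cluster."""

    def fit(self, X, Y):
        self.block = [sum(col) / len(Y) for col in zip(*Y)]
        return self

    def predict(self, window):
        return self.block


def train_hybrid(X, Y, k):
    """Training phase: cluster the windows, then fit one model per cluster."""
    centroids = kmeans(X, k)
    labels = [nearest_centroid(x, centroids) for x in X]
    kept_centroids, models = [], []
    for c in sorted(set(labels)):  # clusters without windows are dropped
        Xc = [x for x, lab in zip(X, labels) if lab == c]
        Yc = [y for y, lab in zip(Y, labels) if lab == c]
        kept_centroids.append(centroids[c])
        models.append(MeanForecaster().fit(Xc, Yc))
    return kept_centroids, models


def predict_hybrid(window, centroids, models):
    """Prediction phase: classify the window, then use that cluster's model."""
    return models[nearest_centroid(window, centroids)].predict(window)


# Toy windows (hypothetical data) forming two clearly separated patterns
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
Y = [[1], [1], [1], [5], [5], [5]]
centroids, models = train_hybrid(X, Y, k=2)
low = predict_hybrid([0.2, 0.3], centroids, models)
high = predict_hybrid([10.5, 10.5], centroids, models)
```

Swapping `MeanForecaster` for another object with the same `fit` and `predict` methods is enough to change the forecasting stage, which reflects the flexibility of the framework described in this section.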
4. Results
This section presents the results obtained by the proposed hybrid algorithm using the electricity demand of two different
datasets, one univariate and one multivariate, which includes temperature, and four univariate datasets containing O3 and
PM10 measurements at 2 different sites. First, these time series are described in Section 4.1. Next, Section 4.2 presents the
experimental setting of the forecasting algorithms described in Section 3.2. Finally, Section 4.3 shows the performance of the
proposed algorithm when using different combinations of classification and prediction algorithms in addition to a compar-
ison with other prediction algorithms, which are applied to the whole time series instead of the different clusters.
Six time series are used to evaluate the effectiveness of the proposed algorithm, five univariate and one multivariate.
None of the series presents missing data, outliers or noisy values, and thus, it is not necessary to apply preprocessing techniques to guarantee the quality and integrity of the data. For both time series, normalization was applied to establish the range of the variables in the interval $[0, 1]$. This does not affect the results obtained, since after making the predictions, the
inverse process is applied to enable the variables to return to the original scale.
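A minimal sketch of this normalization and its inverse is given below. The helper names are hypothetical, and the sketch assumes that the minimum and maximum are taken from the data being scaled and that they differ.

```python
def fit_minmax(values):
    """Return the (minimum, maximum) pair needed to scale values into [0, 1]."""
    return min(values), max(values)


def scale(values, lo, hi):
    """Map values from [lo, hi] to [0, 1] (assumes hi > lo)."""
    return [(v - lo) / (hi - lo) for v in values]


def unscale(values, lo, hi):
    """Inverse transformation: map values from [0, 1] back to [lo, hi]."""
    return [v * (hi - lo) + lo for v in values]


# Hypothetical demand values
demand = [10.0, 20.0, 30.0]
lo, hi = fit_minmax(demand)
normalized = scale(demand, lo, hi)
restored = unscale(normalized, lo, hi)
```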
Fig. 4. Univariate electricity demand time series: A) Complete time series; B) January 2007; C) ACF for an interval of approximately 18 days; D) ACF for
30 h.
Fig. 5. Consumption variable of the multivariate time series: A) Complete series; B) January 2007; C) ACF for an 18-day interval; D) ACF for 30 h.
In this section, the values of the parameters for each forecasting method described in Section 3.2 are presented. It is worth
noting that statistical methods, such as the ARIMA and Holt-Winters, have been adapted to be trained on time windows instead
of the whole time series.
1. The ARIMA on time windows: The prediction model based on the ARIMA is applied using different combinations of values
for the components $(p, q, d)$. In particular, models with fixed values for q and d have been chosen, namely, 0 and 1, respectively, and
values equal to 3, 5 and 7 for p. The coefficients of the model are optimized for each cluster. The function to select the
optimal coefficients of the models is the mean absolute percentage error (MAPE) when predicting a validation set ran-
domly composed of 33% of the training set.
2. Holt-Winters on time windows: In this case, two Holt-Winters models are analyzed, both modeling the seasonal compo-
nent with a multiplicative model, which offers better results, but considering a daily or weekly period. The coefficients of
the model are optimized for each cluster. As in the case of the ARIMA models, the function MAPE, when predicting a val-
idation set randomly composed of 33% of the training set, is minimized to obtain the optimal coefficients of the models.
3. LSTM: In this case, the LSTM network is trained using the different time windows of a group and accumulating the learn-
ing obtained for each group. The designed network architecture is composed of a 200-neuron LSTM layer with a rectified
linear unit (ReLU) activation function, then a dense 100-neuron layer with a hyperbolic tangent activation function, and
finally, the output layer with a sigmoid activation function with a size equal to the number of values to be predicted, the H
values. A dropout factor of 30% is also added between the LSTM layer and the dense layer to reduce the overfitting of the
model. To train the model, the Adam optimization function is applied, using the mean squared error (MSE) as the loss
function. In addition, an early stopping strategy has also been applied to stop the training if the results do not improve.
4. KNN: The k nearest neighbors algorithm is used with k = 5, the Euclidean distance and without neighbor weighting.
5. CART: The algorithm is set to minimize the mean absolute error when building the tree model. Moreover, the minimum
number of samples to split nodes is set to 2. The maximum depth of the tree is set so that nodes are expanded until all leaves are pure or until all leaves contain fewer than 2 samples. Both settings are intended to avoid model overfitting.
6. RF: For the random forest algorithm, 10 trees are used. As in CART, the algorithm is set to minimize the mean absolute
error when building the tree models. The minimum number of samples to split nodes is set to 2. The maximum depth of
the tree is limited in such a way that nodes are expanded until all leaves are pure or until all contain fewer than 2
samples.
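The validation procedure used for the ARIMA and Holt-Winters configurations above can be sketched as follows. The helper names are hypothetical, and `evaluate` stands for any user-supplied function that fits a candidate configuration on the training windows and returns its MAPE on the validation windows.

```python
import random


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def split_validation(windows, fraction=0.33, seed=0):
    """Randomly hold out a fraction of the training windows for validation."""
    rng = random.Random(seed)
    indices = list(range(len(windows)))
    rng.shuffle(indices)
    n_val = round(fraction * len(windows))
    val = [windows[i] for i in indices[:n_val]]
    train = [windows[i] for i in indices[n_val:]]
    return train, val


def select_best(candidates, evaluate):
    """Pick the candidate configuration with the lowest validation error."""
    return min(candidates, key=evaluate)


# Hypothetical values: two predictions that are each 10% off
error = mape([100.0, 200.0], [110.0, 180.0])
train, val = split_validation(list(range(100)))
```

For instance, `select_best([3, 5, 7], evaluate)` would return the value of p with the lowest validation MAPE for a given cluster, provided that `evaluate` is defined accordingly.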
M.A. Castán-Lascorz, P. Jiménez-Herrera, A. Troncoso et al. Information Sciences 586 (2022) 611–627
Fig. 6. Temperature variable of the multivariate time series: A) Complete series; B) January 2007; C) ACF for an 18-day interval; D) ACF for 30 h.
All combinations of clustering, classifiers and prediction methods making up the hybrid model have been applied in a systematic way, and therefore, the results obtained do not necessarily have to be optimal for each combination of models. Better results would probably be obtained if a joint hyperparameter search were carried out. However, the W and H parameters to divide the time series into windows have been adapted depending on the characteristics of each prediction algorithm. In addition, S = H is chosen to avoid overlaps in the estimations.
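The role of the window parameters can be sketched as follows, assuming that W is the number of past values used as input, H is the number of values to be predicted and S is the step between consecutive windows. The function name `make_windows` is hypothetical and the sketch is only one possible reading of the windowing step.

```python
def make_windows(series, w, h, stride=None):
    """Split a series into (input, target) pairs: W past values -> next H values."""
    if stride is None:
        stride = h  # S = H, so consecutive target windows do not overlap
    pairs = []
    start = 0
    # Stop as soon as a full input window plus a full target no longer fits.
    while start + w + h <= len(series):
        inputs = series[start:start + w]
        target = series[start + w:start + w + h]
        pairs.append((inputs, target))
        start += stride
    return pairs


# Toy series 0..9 with W = 4 and H = 2 (values chosen for illustration only).
pairs = make_windows(list(range(10)), w=4, h=2)
```

With S = H, the targets of consecutive windows tile the series without overlapping, so each value is estimated at most once.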
This section reports the experimental results obtained by the proposed hybrid model when k-means is used as the ref-
erence clustering technique, and different classifiers and forecasting algorithms are selected to build the model.
4.3.1. Clustering
First, an analysis of the results obtained by k-means is carried out to determine the optimal number of clusters for both univariate and multivariate time series. Given that k-means results may vary between executions, since the positions of the centroids are randomly initialized, an analysis is performed to choose the number of clusters providing the most stable results. For this analysis, the hybrid algorithm uses a classifier based on the distance to the nearest centroid as the reference classifier and the ARIMA(5,0,1) model as the reference prediction method.
Figs. 7 and 8 show the variations of the MAPE obtained by the proposed hybrid algorithm for 10 executions for each value of k from 2 to 10 in both the univariate and multivariate electricity demand time series. As can be observed in these figures, k = 7 and k = 5 are chosen for the univariate and multivariate time series, respectively, because they provide the most stable results for both cases. For the multivariate time series, the colors associated with the two variables in the series are: temperature (orange) and electricity demand (blue).
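The selection of k by stability can be sketched as follows. The text does not state how stability is measured, so this sketch assumes that the most stable k is the one with the smallest standard deviation of the MAPE across executions. The function names and the MAPE values are made up for illustration.

```python
import statistics


def mape(actual, predicted):
    """Mean absolute percentage error in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def most_stable_k(mape_runs):
    """Return the k whose MAPE varies least across repeated executions.

    Assumption: stability is measured by the population standard deviation.
    """
    return min(mape_runs, key=lambda k: statistics.pstdev(mape_runs[k]))


# Made-up MAPE values for three executions per k (illustration only).
runs = {2: [5.0, 7.0, 6.0], 3: [4.1, 4.0, 4.2], 4: [3.5, 6.5, 5.0]}
best_k = most_stable_k(runs)
```

Other dispersion measures, such as the interquartile range, could be substituted in `most_stable_k` without changing the rest of the procedure.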
Fig. 9 displays the different groups identified by clustering for the month of January 2007 for the univariate time series.
Analyzing this figure, it can be observed that the time windows of each group present common characteristics. Therefore, the
training of specific predictors that exploit the patterns of each group allows us to obtain models that provide better results.
Fig. 10 shows the MAPE obtained by the proposed hybrid algorithm for each group when the reference classifier and the
ARIMA(5,0,1) are trained using only the time windows assigned to each group. It can be observed that the quality of the
results varies depending on the cluster. The best results are obtained for Cluster 2, as its instances have less variability
because most of them belong to weekdays, which have more regular electricity consumption, as shown in Fig. 11.
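The per-cluster training scheme can be sketched as follows. A new window is routed to the nearest centroid and is forecast by the model trained only on the windows of that cluster. To keep the sketch self-contained, the per-cluster model is a stand-in (the mean target of the cluster) and not the ARIMA(5,0,1) model used in the experiments; all function names are hypothetical.

```python
import math
from collections import defaultdict


def nearest_centroid(window, centroids):
    """Index of the centroid closest to the window (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(window, centroids[c]))


def fit_per_cluster(pairs, centroids):
    """Group (window, target) pairs by cluster and fit one stand-in model per group."""
    groups = defaultdict(list)
    for window, target in pairs:
        groups[nearest_centroid(window, centroids)].append(target)
    # Stand-in "model": element-wise mean of the targets seen in each cluster.
    return {
        c: [sum(col) / len(col) for col in zip(*targets)]
        for c, targets in groups.items()
    }


def predict(window, centroids, models):
    """Route the window to its cluster and return that cluster's forecast."""
    return models[nearest_centroid(window, centroids)]


# Toy centroids and training pairs (hypothetical values).
centroids = [[1.0, 1.0], [10.0, 10.0]]
pairs = [([1.0, 2.0], [3.0]), ([0.0, 1.0], [1.0]), ([9.0, 10.0], [11.0])]
models = fit_per_cluster(pairs, centroids)
forecast = predict([9.5, 10.5], centroids, models)
```

In this sketch, `predict` raises a `KeyError` for a cluster that received no training windows, which is one reason to inspect the cluster sizes before training.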
Fig. 12 represents each time window of the univariate electricity demand time series as a point defined by its mean and standard deviation, colored according to the cluster assigned by k-means. The mean of the instances is plotted on the x-axis, and the standard deviation is plotted on the y-axis. In addition, the centroids of the clusters are shown in the same color as their cluster, but with a larger marker and wider borders. It can be observed that the instances in Cluster 3, most of which start during the weekend, are clearly separated from the instances in the other clusters.
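The two coordinates of each point in this kind of plot can be computed as in the following sketch. The text does not specify whether the population or the sample standard deviation is used; the population version is assumed here, and the function name is hypothetical.

```python
import statistics


def window_summary(window):
    """Return the (mean, standard deviation) point that represents one time window."""
    # Assumption: population standard deviation (divides by n, not n - 1).
    return statistics.fmean(window), statistics.pstdev(window)


# Two toy windows: one variable, one perfectly flat.
points = [window_summary(w) for w in [[2.0, 4.0, 6.0], [5.0, 5.0, 5.0]]]
```

A flat window maps to a point on the x-axis, whereas windows with larger swings move upward in the plot.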
Fig. 7. MAPE for each number of clusters for the univariate electricity demand time series.
Fig. 8. MAPE for each number of clusters for the multivariate time series.
Fig. 9. Time windows grouped by clusters for the month of January 2007.
Fig. 10. MAPE obtained by the proposed hybrid algorithm for each cluster for the univariate time series.
Fig. 11. Number of instances (time windows) belonging to each cluster depending on the day of the week (weekdays, Saturdays and Sundays).
To evaluate the sensitivity of the proposed hybrid algorithm to the clustering algorithm, other clustering techniques are applied, obtaining similar results. However, most of the data is concentrated in only one cluster using spectral clustering. Thus, in this case, the results are very similar to those when applying the prediction methods directly to the whole time series. Consequently, in real-world applications, it is necessary to analyze in detail the results obtained by the clustering before applying the proposed model, as it can greatly influence the results.
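One simple check of a clustering result before applying the model is the share of time windows that fall in the largest cluster. The following sketch is an illustration of such a check, not a procedure described by the authors; the function name and the labels are hypothetical.

```python
from collections import Counter


def largest_cluster_share(labels):
    """Fraction of time windows assigned to the most populated cluster."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Hypothetical cluster labels for 10 time windows.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 2]
share = largest_cluster_share(labels)
```

A share close to 1 indicates that almost all windows are in one group, in which case the hybrid model behaves much like a single model trained on the whole series.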
Fig. 12. Points composed of the mean and standard deviation for the time windows belonging to each cluster.
Table 2
Comparison of different prediction methods for the univariate electricity demand time series.
Table 3
Comparison of different prediction methods for the multivariate time series.
Fig. 13. Best and worst prediction for next 4 h by the proposed algorithm.
Table 4
Performance comparison of PSF, Losada et al. 2018 and the proposed algorithm with different prediction methods to forecast hourly O3 concentration (in μg/m3) at two studied sites from 2013 to 2015.
Site               Method                MAE                           RMSE
                                         2013  2014  2015  Average     2013  2014  2015  Average
Torneo, k = 2      PSF                   13.1  13.9  13.3     13.4     17.1  18.1  17.2     17.5
                   Losada et al. 2018     8.1  12.3  12.5     11.0     10.7  15.7  16.0     14.1
                   ARIMA(3, 0, 1)        18.5  18.0  18.9     18.5     21.8  21.4  22.7     22.0
                   Holt-Winters daily    12.7  12.7  13.9     13.1     15.2  15.2  16.6     15.6
                   RF                    11.4  11.6  11.9     11.6     13.7  14.1  14.3     14.0
                   KNN                   12.7  12.9  13.2     12.9     15.2  15.5  15.7     15.5
                   CART                  12.2  12.6  12.5     12.4     14.7  15.3  15.1     15.1
Asomadilla, k = 3  PSF                   14.8  15.4  14.8     15.0     18.9  19.7  18.9     19.2
                   Losada et al. 2018    12.8  14.2  13.4     13.5     16.4  17.7  17.0     17.0
                   ARIMA(3, 0, 1)        20.4  19.8  22.2     20.8     24.4  23.6  26.6     24.9
                   Holt-Winters daily    14.0  15.1  14.9     14.7     16.4  17.9  17.6     17.3
                   RF                    15.5  16.0  16.2     15.9     18.3  18.8  19.0     18.7
                   KNN                   14.2  14.8  14.8     14.6     17.0  17.8  17.7     17.5
                   CART                  15.5  16.0  16.2     15.9     18.3  18.8  19.0     18.7
Table 5
Performance comparison of different prediction methods with the proposed algorithm to forecast hourly PM10 (in μm) concentration at two studied sites from 2013 to 2015.
Site               Method                MAE                           RMSE
                                         2013  2014  2015  Average     2013  2014  2015  Average
Torneo, k = 2      ARIMA(3, 0, 1)         9.0   9.9  10.1      9.7     11.0  11.8  12.3     11.7
                   Holt-Winters daily     8.0   7.8   8.6      8.2      9.7   9.4  10.2      9.8
                   RF                     7.9   7.6   8.1      7.9      9.6   9.2   9.8      9.5
                   KNN                    8.7   8.6   9.0      8.7     10.5  10.3  10.8     10.5
                   CART                   8.2   7.9   8.4      8.2     10.0   9.6  10.1      9.9
Asomadilla, k = 3  ARIMA(3, 0, 1)        13.3   5.0   5.7      8.0     26.4   5.8   6.6     12.9
                   Holt-Winters daily    24.8   5.5   4.8     11.7     46.5   6.4   5.6     19.5
                   RF                    14.3  10.3   7.1     10.6     26.6  20.3   7.9     18.3
                   KNN                   12.9   8.5   6.2      9.2     25.3  18.4   7.1     16.9
                   CART                  14.3  10.5   7.4     10.7     26.6  20.7   8.2     18.5
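For reference, the two error measures reported in Tables 4 and 5 can be computed as in the following sketch, which uses the standard definitions of MAE and RMSE. The function names and the toy values are illustrative and are not taken from the experiments.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average magnitude of the forecast errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: penalizes large errors more than MAE does."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Toy values (illustration only).
actual = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
errors = (mae(actual, predicted), rmse(actual, predicted))
```

Since the RMSE is never smaller than the MAE for the same forecasts, a large gap between the two, as in some rows of Table 5, points to a few large errors rather than a uniformly poor fit.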
Table 6
Comparison of different classifiers and prediction methods for the univariate electricity demand time series.
Table 7
Comparison of different classifiers and prediction methods for the multivariate time series.
5. Conclusions
In this work, a hybrid method composed of three components (clustering, classification and prediction) is proposed for time series forecasting. The first component is a clustering technique that groups time windows with similar patterns. The second component is a multiclass classifier, which is trained using the labels obtained by clustering as classes and is used to assign a cluster in the prediction phase. Finally, the prediction component builds a forecasting model for each cluster. An experimental analysis is carried out to evaluate different aspects of the algorithm and its characteristics, rather than aiming to find the optimal prediction model. In particular, different classifiers and forecasting algorithms have been tested to show their influence on the proposed hybrid model. The results using six time series, five univariate and one multivariate, have been reported, improving in several cases the prediction accuracy when compared to predictions obtained by different forecasting algorithms that are applied to the whole time series without any type of grouping. In addition, it has also been observed that the proposed algorithm is robust regarding the classifier used as a component. It can be concluded that the models generated to compose the proposed hybrid method adapt well to the characteristics of the time windows belonging to each group, improving the learning for the specific temporal patterns of the time series.
In future work, new features that allow us to capture dependencies and interactions among the variables of multivariate
time series will be included as input to discover new pattern groups that could improve the prediction accuracy. Moreover,
transfer learning will be applied, starting from a generic model for all clusters and then specializing it for each
group, with the groups acting as the domains to which the knowledge is transferred. In addition, the hyperparameters of both the classifier and the regression
algorithms of the hybrid method will be optimized using internal time-based validation over the training set and an opti-
mization algorithm based on metaheuristics.
M.A. Castán-Lascorz: Methodology, Investigation, Data curation, Visualization, Software, Writing - original draft, Writing
- review & editing. P. Jiménez-Herrera: Visualization, Investigation, Conceptualization, Methodology, Writing - original
draft, Writing - review & editing. A. Troncoso: Conceptualization, Methodology, Writing - original draft, Investigation, Super-
vision, Writing - review & editing. G. Asencio-Cortés: Conceptualization, Methodology, Writing - original draft, Investigation,
Supervision, Writing - review & editing.
The authors declare that they have no known competing financial interests or personal relationships that could have
appeared to influence the work reported in this paper.
Acknowledgements
The authors would like to thank the Spanish Ministry of Science, Innovation and Universities for the support under pro-
ject TIN2017-8888209C2-1-R. Funding for open access publishing: Universidad Pablo de Olavide/CBUA.
References
[1] P. Chatigny, J.M. Patenaude, S. Wang, Spatiotemporal adaptive neural network for long-term forecasting of financial time series, Int. J. Approxim.
Reason. 132 (2021) 70–85.
[2] R.L. Talavera-Llames, R. Pérez-Chacón, M. Martínez-Ballesteros, A. Troncoso, F. Martínez-Álvarez, A nearest neighbours-based algorithm for big
time series data forecasting, in: Hybrid Artificial Intelligent Systems, 2016, pp. 174–185.
[3] J.F. Torres, A. Troncoso, I. Koprinska, Z. Wang, F. Martínez-Álvarez, Deep learning for big data time series forecasting applied to solar power, in:
International Conference on Soft Computing Models in Industrial and Environmental Applications, 2019, pp. 123–133.
[4] A. Galicia, J.F. Torres, F. Martínez-Álvarez, A. Troncoso, A novel spark-based multi-step forecasting algorithm for big data time series, Inf. Sci. 467 (2018)
800–818.
[5] J. Aznarte-Mellado, J. Benitez-Sanchez, D. Nieto-Lugilde, et al, Forecasting airborne pollen concentration time series with neural and neuro-fuzzy
models, Expert Syst. Appl. 32 (4) (2007) 1218–1225.
[6] J.F. Torres, D. Hadjout, A. Sebaa, F. Martínez-Álvarez, A. Troncoso, Deep learning for time series forecasting: a survey, Big Data 9 (1) (2021).
[7] Z. Huang, W. Xu, K. Yu, Bidirectional LSTM-CRF models for sequence tagging, Computing Research Repository, abs/1508.01991, 2015.
[8] J.F. Torres, A. Troncoso, I. Koprinska, Z. Wang, F. Martínez-Álvarez, Big data solar power forecasting based on deep learning and multiple data sources,
Exp. Syst. 36 (4) (2019) e12394.
[9] R. Talavera-Llames, R. Pérez-Chacón, A. Troncoso, F. Martínez-Álvarez, Mv-kwnn: a novel multivariate and multi-output weighted nearest neighbours
algorithm for big data time series forecasting, Neurocomputing 353 (2019) 56–73.
[10] R.J. Hyndman, A.B. Koehler, R.D. Snyder, S. Grose, A state space framework for automatic forecasting using exponential smoothing methods, Int. J.
Forecast. 18 (3) (2002) 439–454.
[11] A.M. De Livera, R.J. Hyndman, R.D. Snyder, Forecasting time series with complex seasonal patterns using exponential smoothing, J. Am. Stat. Assoc. 106
(496) (2011) 1513–1527.
[12] J.C. García-Díaz, O. Trull, Competitive models for the spanish short-term electricity demand forecasting, in: I. Rojas, H. Pomares (Eds.), Time Series
Analysis and Forecasting: Selected Contributions from the ITISE Conference, Springer International Publishing, Cham, 2016, pp. 217–231.
[13] G. Box, G. Jenkins, Time series analysis: Forecasting and control, John Wiley and Sons, Hoboken, NJ, USA, 2008.
[14] H.Y. Toda, P.C. Phillips, Vector autoregression and causality: a theoretical overview and simulation study, Econom. Rev. 13 (2) (1994) 259–285.
[15] L. Zhang, J. Lin, R. Qiu, X. Hu, H. Zhang, Q. Chen, H. Tan, D. Lin, J. Wang, Trend analysis and forecast of pm2.5 in fuzhou, china using the arima model,
Ecol. Ind. 95 (2018) 702–710.
[16] P. Arumugam, R. Saranya, Outlier detection and missing value in seasonal arima model using rainfall data*, Mater. Today: Proc. 5 (2018) 1791–1799.
[17] Z. Luo, W. Guo, Q. Liu, Z. Zhang, A hybrid model for financial time-series forecasting based on mixed methodologies, Exp. Syst. 38 (2) (2021) e12633.
[18] C. Cappelli, R. Cerqueti, P. D’Urso, F. Di Iorio, Multiple breaks detection in financial interval-valued time series, Expert Syst. Appl. 164 (113775) (2021)
1–9.
[19] R.L. Talavera-Llames, R. Pérez-Chacón, A. Troncoso, F. Martínez-Álvarez, Big data time series forecasting based on nearest neighbours distributed
computing with spark, Knowl.-Based Syst. 161 (2018) 12–25.
[20] M.W. Li, D.F. Han, W.L. Wang, Vessel traffic flow forecasting by rsvr with chaotic cloud simulated annealing genetic algorithm and kpca,
Neurocomputing 157 (2015) 243–255.
[21] B. Sarıca, E. Eğrioğlu, B. Aşıkgil, A new hybrid method for time series forecasting: Ar-anfis, Neural Comput. Appl. 29 (3) (2018) 749–760.
[22] O. Cagcag Yolcu, H.K. Lam, A combined robust fuzzy time series method for prediction of time series, Neurocomputing 247 (2017) 87–101.
[23] J. Soto, P. Melin, O. Castillo, Time series prediction using ensembles of anfis models with genetic optimization of interval type-2 and type-1 fuzzy
integrators, Int. J. Hybrid Intell. Syst. 11 (3) (2014) 211–226.
[24] J. Soto, P. Melin, O. Castillo, A new approach for time series prediction using ensembles of it2fnn models with optimization of fuzzy integrators, Int. J.
Fuzzy Syst. 20 (3) (2018) 701–728.
[25] J. Soto, O. Castillo, P. Melin, W. Pedrycz, A new approach to multiple time series prediction using mimo fuzzy aggregation models with modular neural
networks, Int. J. Fuzzy Syst. 21 (5) (2019) 1629–1648.
[26] S. Makridakis, E. Spiliotis, V. Assimakopoulos, Statistical and machine learning forecasting methods: Concerns and ways forward, PLOS ONE 13 (3)
(2018) 1–26.
[27] S. Suradhaniwar, S. Kar, S.S. Durbha, A. Jagarlapudi, Time series forecasting of univariate agrometeorological data: A comparative performance
evaluation via one-step and multi-step ahead forecasting strategies, Sensors 21 (2430) (2021) 1–34.
[28] G. Dudek, P. Pełka, Pattern similarity-based machine learning methods for mid-term load forecasting: A comparative study, Appl. Soft Comput. 104
(107223) (2021) 1–14.
[29] Y. Bengio, A. Courville, P. Vincent, Representation learning: A review and new perspectives, IEEE Trans. Pattern Anal. Mach. Intell. 35 (8) (2013) 1798–
1828.
[30] S. Bai, J. Zico Kolter, V. Koltun, An empirical evaluation of generic convolutional and recurrent networks for sequence modeling, CoRR, abs/1803.01271,
2018.
[31] B. Lim, S. Zohren, S. Roberts, Recurrent neural filters: Learning independent bayesian filtering steps for time series prediction, in: 2020 International
Joint Conference on Neural Networks (IJCNN), 2020, pp. 1–8.
[32] Z. Li, J. Han, Y. Song, On the forecasting of high-frequency financial time series based on arima model improved by deep learning, J. Forecasting 39 (7)
(2020) 1081–1097.
[33] T. Niu, J. Wang, H. Lu, W. Yang, P. Du, Developing a deep learning framework with two-stage feature selection for multivariate financial time series
forecasting, Expert Syst. Appl. 148 (2020) 113237.
[34] S. Garg, S. Peitz, U. Nallasamy, M. Paulik, Jointly learning to align and translate with transformer models, in: Proceedings of the 2019 Conference on
Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2019, pp. 4453–4462.
[35] B. Lim, S.O. Arik, N. Loeff, T. Pfister, Temporal fusion transformers for interpretable multi-horizon time series forecasting, arXiv, 1912.09363, 2020.
[36] B. Lim, S. Zohren, Time-series forecasting with deep learning: a survey, Philos. Trans. R. Soc. A: Math., Phys. Eng. Sci. 379 (2194) (2021) 20200209.
[37] Z. Hajirahimi, M. Khashei, Hybrid structures in time series modeling and forecasting: A review, Eng. Appl. Artif. Intell. 86 (2019) 83–106.
[38] L. Nguyen, V. Novák, Forecasting seasonal time series based on fuzzy techniques, Fuzzy Sets Syst. 361 (2019) 114–129.
[39] Z. Qian, Y. Pei, H. Zareipour, N. Chen, A review and discussion of decomposition-based hybrid models for wind energy forecasting applications, Appl.
Energy 235 (2019) 939–953.
[40] V.K. Ojha, A. Abraham, V. Snášel, Metaheuristic design of feedforward neural networks: A review of two decades of research, Eng. Appl. Artif. Intell. 60
(2017) 97–116.
[41] J. Chen, G.Q. Zeng, W. Zhou, W. Du, K.D. Lu, Wind speed forecasting using nonlinear-learning ensemble of deep learning time series prediction and
extremal optimization, Energy Convers. Manage. 165 (2018) 681–695.
[42] G. Trierweiler Ribeiro, V. Cocco Mariani, L. Dos Santos Coelho, Enhanced ensemble structures using wavelet neural networks applied to short-term load
forecasting, Eng. Appl. Artif. Intell. 82 (2019) 272–281.
[43] A. Galicia, R. Talavera-Llames, A. Troncoso, I. Koprinska, F. Martínez-Álvarez, Multi-step forecasting for big data time series based on ensemble
learning, Knowl.-Based Syst. 163 (2019) 830–841.
[44] F.A. Gers, J. Schmidhuber, F. Cummins, Learning precise timing with LSTM, J. Mach. Learn. Res. 3 (1) (2000) 115–143.
[45] D.W. Aha, D. Kibler, M.K. Albert, Instance-based learning algorithms, Mach. Learn. 6 (1) (1991) 37–66.
[46] L. Breiman, Classification and Regression Trees (The Wadsworth statistics/probability series), Wadsworth International Group, 1984.
[47] L. Breiman, Random forests, Mach. Learn. 45 (1) (2001) 5–32.
[48] Red Eléctrica de España. [Online]. Available: www.ree.es.
[49] A. Gómez-Losada, G. Asencio-Cortés, F. Martínez-Álvarez, J.C. Riquelme, A novel approach to forecast urban surface-level ozone considering
heterogeneous locations and limited information, Environ. Modell. Software 110 (2018) 52–61.
[50] F. Martínez-Álvarez, A. Schmutz, G. Asencio-Cortés, J. Jacques, A novel hybrid algorithm to forecast functional time series based on pattern sequence
similarity with application to electricity demand, Energies 12 (1) (2019) 94.