1 Introduction
Data is recorded and stored over time in a wide range of domains. These observations form time-series data, a set of data points ordered in time [1, 2]. Time-series data analysis and forecasting are significant for many applications in business and industry, such as the stock market and exchange, weather forecasting, electricity management, and fuel consumption [3]. There are two types of time-series data: regular time series and irregular time series. Regular time series are the most common type and refer to observations that occur at regular intervals of time [4]. This kind of data is recorded at intervals of various granularity, like year, month, day, hour, or even minute. Conversely, irregular time series refer to observations made at irregular time intervals. An example of this type is the collection of data related to a patient's
medical tests. Such data may be obtained only if the patient visits a clinic and carries out a test, an event which may not happen at regular time intervals [4].
The analysis of time series has inherent complexity. First, most time series exhibit seasonality or elaborate cyclical patterns. Second, time-series data is often affected by external factors that should be considered during analysis. Moreover, the forecasting of time-series data usually relies on previous time points, so it is sensitive to variation over time. For these reasons, the analysis and forecasting of time-series data has become vital but challenging. Nevertheless, several methods for time-series data analysis have been proposed, such as ARIMA [5] and Prophet [7], as well as Deep Learning (DL) models. Specifically, Facebook released the source of Prophet, a forecasting tool for the Python and R languages, in 2017 [7]. It was developed to meet business-level forecasting demands [7]. Moreover, Deep Learning models such as Long Short-Term Memory (LSTM) [6] and Gated Recurrent Unit (GRU) [8] are also used for time-series prediction.
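For illustration, a minimal Prophet forecast in Python might look as follows; the file and column names are placeholders, while the ds/y columns and the 30-step horizon follow Prophet's documented API:

import pandas as pd
from prophet import Prophet  # packaged as 'fbprophet' in older releases

# Prophet expects a DataFrame with a datetime column 'ds' and a target column 'y'.
df = pd.read_csv("example_series.csv")        # placeholder dataset
df = df.rename(columns={"date": "ds", "value": "y"})

model = Prophet()                             # trend, seasonality, holidays handled internally
model.fit(df)
future = model.make_future_dataframe(periods=30)   # extend 30 steps ahead
forecast = model.predict(future)                   # columns include yhat, yhat_lower, yhat_upper
print(forecast[["ds", "yhat"]].tail())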
In order to perform forecasting with ML models, it is necessary to implement a type of model appropriate for the characteristics of the input datasets. However, this is a challenging task, since there is a wide variety of ML models to choose from and, furthermore, users may not know the characteristics of the input time-series dataset. Thus, the selection of the appropriate model can be time-consuming or even inaccurate.
Proposed Solution To solve the problem of model selection for time-series data forecasting, we explore the association between the suitability of models and the features of the input time-series datasets. Such an association can guide the design of techniques that select the appropriate model in an automated manner, given an estimation of the characteristics of the time-series data. Toward this end, we have designed a series of experiments that consider various models used for time-series data forecasting and have selected the most appropriate one based on the characteristics of the input datasets. Our exploratory experimental analysis leads to the formation of specific outcomes that can be used as guidelines for the appropriate selection of models. We showcase how the outcomes can be used toward the creation of an automated model-selection technique by designing a decision tree that employs them.
In the rest of this paper, Section 2 summarizes related work and Section 3 gives an overview of the methodology of our exploratory experimental study. Section 4 describes the setup for the evaluation of the ML/DL models, and Section 5 presents details of the implementation of the models. Section 6 presents the experimental results, and Section 7 describes the proposed outcomes and discusses their possible application, showcasing it in the design of a decision tree. Section 8 concludes the paper and discusses directions for future work.
2 Related Work
Model selection is a topic in ML research that has attracted a lot of interest, since it promises to provide both convenience and efficiency in using ML models. In 1994, Yumi Iwasaki and Alon Y. Levy proposed an algorithm for automatically selecting model fragments for simulation [9]. They designed the algorithm based on relevance reasoning, which is used to determine which phenomena can affect the query [9], and it helps in choosing the model fragments required for simulation [9]. In addition, glmulti, an R package aimed at selecting generalized linear models automatically, was designed and implemented by Vincent Calcagno in 2010 [10]. In 2016, Gustavo Malkomes et al. [11] employed Bayesian optimization for automated model selection. They constructed a novel kernel between models to explain a given dataset, which helped them find a model for the given dataset without human assistance [11]. Moreover, Lars Kotthoff et al. [12] released the source of Auto-WEKA, which adds automatic model selection to the original WEKA platform. They also used Bayesian optimization, to help users identify the best approach for their particular datasets. In recent years, Abdelhak Bentaleb et al. [13] proposed an automated prediction model for network bandwidth. In 2020, Yuanxiang Ying et al. [14] presented an automated model selection framework to find the most suitable model for time-series anomaly detection [14]. They achieved this by invoking a pre-trained model selector and a parameter estimator [14]. In 2022, Chunnan Wang et al. proposed AutoTS, an algorithm for designing a suitable forecasting model for a given time-series dataset [15]. First, they constructed a search space by decomposing time-series forecasting models into 7 modules. Then they employed two-stage pruning and a knowledge graph analysis method to prune the search space and understand each component [15]. In 2023, a framework for automated model selection in natural language processing was created by Shehan Saleem and Sapna Kumarapathirage [16]. They conducted trials on 2 models (BOWRF and FastText) to select the best-performing one and evaluated the performance by macro F1 and time [16]. Chengxin Gong and Jinwen Ma proposed a Bayesian Ying-Yang annealing learning algorithm for the parameter learning of the TMGPFR model with automated model selection [17]. In the same year, Amazon Web Services released AutoGluon-TimeSeries as part of the AutoGluon framework [18]. It combines classic statistical models and deep learning models through ensembling and helps users perform time-series forecasting more efficiently and simply.
Although several model selection techniques have been proposed, they target various areas and not specifically time-series data. Moreover, these techniques do not consider the characteristics of time-series datasets; they train the models without this information. In this work, we fill this gap by conducting a series of experiments and deriving several outcomes that can be applied to time-series data forecasting.
3 Methodology of Experiments
– Review of models. There are various models that can be used for time-series data forecasting. The Autoregressive Moving Average (ARMA) model is an important tool for studying time series [19]. Based on it, one of the most popular algorithms in time-series prediction was proposed: the Autoregressive Integrated Moving Average (ARIMA) [5]. Moreover, Prophet, a time-series forecasting tool developed to meet the demands of business-level data, was released by Facebook [7]. Prophet takes not only holidays and break intervals but also trends, outlier detection, and missing data into consideration [7]. One traditional method, Exponential Smoothing, created by Brown [20] in the 1950s to develop a tracking model for fire-control information on the location of submarines, can also be used for time-series data. Besides, Deep Learning models are available for time-series data forecasting as well. Deep Learning (DL) is a subset of ML that focuses on building models on neural network architectures [2]. Some DL models, such as LSTM, GRU, and the Convolutional Neural Network (CNN), are used on time series. LSTM and GRU are two sub-types of the Recurrent Neural Network (RNN). The special characteristic of an RNN is that it can keep information from past elements of a sequence and use it to process the next element by calculating a hidden state [2]. Differently from an RNN, a CNN applies a convolutional operation, which allows it to create a reduced set of features [2]. CNN is the main architecture behind some algorithms for image classification and segmentation, and it can also be used for time-series data. There are also variations of these models, like Bidirectional LSTM, Bidirectional GRU, and CNN-LSTM, which upgrade the traditional architectures; a sketch of these architectures is given below.
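As a hypothetical Keras sketch of how these architecture families might be instantiated (the layer sizes, window length, and feature count are illustrative and not the configuration used in our experiments):

import tensorflow as tf
from tensorflow.keras import layers

WINDOW, N_FEATURES = 24, 1   # illustrative input shape: 24 past steps, 1 feature

def build(kind: str) -> tf.keras.Model:
    # Return one of the five architecture families discussed above.
    model = tf.keras.Sequential([tf.keras.Input(shape=(WINDOW, N_FEATURES))])
    if kind == "lstm":
        model.add(layers.LSTM(64))
    elif kind == "gru":
        model.add(layers.GRU(64))
    elif kind == "bi-lstm":                  # reads the window forwards and backwards
        model.add(layers.Bidirectional(layers.LSTM(64)))
    elif kind == "bi-gru":
        model.add(layers.Bidirectional(layers.GRU(64)))
    elif kind == "cnn-lstm":                 # Conv1D extracts local patterns first
        model.add(layers.Conv1D(64, kernel_size=3, activation="relu"))
        model.add(layers.LSTM(64))
    model.add(layers.Dense(1))               # one-step-ahead forecast
    model.compile(optimizer="adam", loss="mse")
    return model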
– Collection and review of datasets. Beyond the models, we took into consideration the possible characteristics of time-series datasets. Besides characteristics that also occur in other types of datasets, such as linearity, time series have some special characteristics. A prevalent one is whether the dataset is stationary or not; some models, like ARIMA, can only be used on stationary datasets. Another important feature of time-series datasets is that they may exhibit seasonality. We collected different kinds of datasets. After a thorough search, we obtained 6 complete time-series datasets: AEP hourly [21], Air Passengers [2], Steel industry data [22], Canadian climate history [23], microdata [2], and DailyDelhiClimate [24]. AEP hourly contains hourly power consumption data from PJM Interconnection LLC's website, recorded from 1998 to 2001. Air Passengers reflects the number of passengers in each month from December 1948 to December 1960. Steel industry data records the energy consumption of a company which produces coils, steel plates, and iron plates; the related information on energy consumption is stored on the website of the Korea Electric Power Corporation. Canadian climate history and DailyDelhiClimate contain climate information for Canada and Delhi, respectively. The last one, microdata, is the real gross domestic product of the United States from 1959 to 2009 [2].
– Selection of models and datasets. The next step of our methodology was to choose models and datasets for an experimental evaluation. We chose several types of DL models, as these can be effective for processing non-stationary data. Compared to other models, DL ones have the advantage that they tend to perform better as more data becomes available, so they are appropriate for large, complex datasets [2]. We considered 3 categories of such models, based on their architecture: normal DL networks, bidirectional networks, and hybrid networks. Due to their architectural differences, these categories can have complementary performance on various time-series data. Concerning normal DL networks, we selected LSTM and GRU. LSTM is a powerful tool for modeling sequence data and is particularly good at tasks with long-term dependencies, which makes it suitable for time-series data. GRU is a modification of LSTM with a simpler structure. Furthermore, we selected the corresponding bidirectional networks of these 2 models, so that we can perform a one-to-one comparison of their performance. Finally, CNN-LSTM represents the hybrid neural network category. CNN-LSTM is one of the most popular hybrid networks, since it combines the capability of CNN for spatial feature extraction with that of LSTM for processing time-series data. Besides models, we also selected 2 of the 6 datasets that we had explored. Since we focus on time-series datasets, we needed the datasets to exhibit the basic and most important characteristics of time-series data, i.e., stationarity and seasonality. After conducting the ADF test (a test of stationarity) [2] and decomposition on all datasets (a sketch of this check is given below), we chose DailyDelhiClimate and Steel industry data. Beyond exhibiting the aforementioned characteristics, these 2 datasets are most appropriate because they involve data about climate and energy consumption, which may be affected by external factors. Thus, they give us the opportunity to explore whether training with more features related to external factors improves the performance of the models.
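For reference, the stationarity check can be run with statsmodels; a minimal sketch, with the file and column names as placeholders:

import pandas as pd
from statsmodels.tsa.stattools import adfuller

series = pd.read_csv("Steel_industry_data.csv")["Usage_kWh"]   # placeholder names
adf_stat, p_value, *rest = adfuller(series.dropna())
# A small p-value (e.g. < 0.05) rejects the unit-root null hypothesis,
# i.e. the series can be considered stationary.
print(f"ADF statistic = {adf_stat:.3f}, p-value = {p_value:.4f}")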
4 Experimental Setup

In this section we describe the datasets and the metrics we have used in our experimental evaluation, as well as the design of the experiments.
4.1 Datasets

There are 11 features in the Steel industry data dataset. The date is the continuous time starting from 01/01/2018 00:15, recorded every 15 minutes. Usage_kWh (Usage), Lagging Current reactive power (L_C_R.P1), Leading Current reactive power (L_C_R.P2), CO2, Lagging Current Power Factor (L_C_P_F1), and Leading Current Power Factor are all different elements of steel energy evaluation. NSM is the number of seconds from midnight. WeekStatus and Day_of_week both represent the date status. Finally, Load_Type is a category with the values Light Load, Medium Load, and Maximum Load. The Steel industry data dataset is stationary, since the ADF test [2] results in a p-value of 0.0, indicating stationarity.
The DailyDelhiClimateTrain dataset provides the climate in Delhi from January 1st, 2013 to April 24th, 2017. This dataset was collected from the Weather Underground API [24]. There are 5 features in the dataset, which are shown in Table 2.
From Table 2, we can see that this dataset records 4 elements of the climate in Delhi: mean temperature, humidity, wind speed, and mean pressure. This is a representative dataset that exhibits seasonality. The climate data of a city follows a roughly regular pattern each year, and the seasonal component of the STL decomposition [2] of the dataset, shown in Fig. 1, confirms this; a sketch of the decomposition is given below.
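A sketch of how the seasonal component of Fig. 1 can be reproduced with an STL decomposition; the column name and the yearly period for daily data are assumptions:

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import STL

df = pd.read_csv("DailyDelhiClimateTrain.csv", parse_dates=["date"], index_col="date")
result = STL(df["meantemp"], period=365).fit()    # yearly seasonality in daily data
result.seasonal.plot(title="Seasonal component")  # the component plotted in Fig. 1
plt.show()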
4.2 Metrics
In our study we used metrics to evaluate the models in terms of 2 aspects: time and accuracy. For accuracy, we use 2 metrics, defined below: the Mean Absolute Percentage Error (MAPE) and the Mean Squared Error (MSE). MAPE calculates the relative error between predicted and actual values to assess the accuracy of a model, and it is expressed as a percentage, which makes the result simple to interpret. However, the calculation of MAPE is problematic if there is a zero in the input data, since the denominator must be non-zero. MSE is the mean of squared differences, so it is always a non-negative number. One limitation is that it magnifies errors, especially large ones, because the differences are squared; for the same reason, MSE is sensitive to outliers. MAPE and MSE are employed in both the training and the test phases to measure the accuracy of the results and help us understand the suitability of the models for the input datasets. Beyond these 2 metrics, we also measured the number of epochs in each model's training process. The epoch count is a convenient measurement of training time, and it helps with assessing the efficiency of the models.
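For completeness, the two accuracy metrics are defined as follows, where $y_t$ is the actual value, $\hat{y}_t$ the predicted value, and $n$ the number of forecast points:

\mathrm{MAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right|,
\qquad
\mathrm{MSE} = \frac{1}{n} \sum_{t=1}^{n} \left( y_t - \hat{y}_t \right)^2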
Table 3. The five models and the 8 experiment configurations; each configuration is run for every model.

Models        Experiments
LSTM          1. 1 feature + horizon (1)
Bi-LSTM       2. 1 external feature + horizon (5)
GRU           3. 1 external feature + date features + horizon (1)
Bi-GRU        4. 1 external feature + date features + horizon (5)
CNN-LSTM      5. 2 external features + horizon (1)
              6. 2 external features + horizon (5)
              7. 2 external features + date features + horizon (1)
              8. 2 external features + date features + horizon (5)
In Table 3, the 'horizon' is the length of time for which forecasts are generated. We created experiments for a short and a long horizon so that we can see how this influences the results. Another factor we consider is the 'date' feature, which encapsulates the date and time of the data collection. The 'date' feature can give us important information about time patterns; for example, the consumption of electricity may present some kind of seasonality, with higher values in summer and lower values in spring. DateTime feature engineering is the process of creating new features from date and time information in order to improve the predictive accuracy of ML models [25]. In this work, the 'date' feature includes several subdivisions of time, like year, month, day, day of week, and day of year. The latter can signify special days in a year, like Christmas Day. We created the values for these subdivisions based on the original timestamps of the data, as sketched below. Moreover, employing 'external features' while training the model may have a positive impact on the prediction. For instance, in a factory, the consumption of energy may result in a change of the temperature inside, so the temperature data contributes to the prediction of the amount of energy. Therefore, our experiments also take either 1 or 2 'external features' into consideration. The 8 types of experiments are implemented for each of the 5 models shown in Table 3, in order to understand the impact of these features on training and prediction. All experiments for all models are run on both selected datasets.
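A minimal sketch of this datetime feature engineering with pandas; the file and column names are placeholders:

import pandas as pd

df = pd.read_csv("dataset.csv")          # placeholder file name
ts = pd.to_datetime(df["date"])          # the original timestamps

# Subdivisions of time derived from each timestamp.
df["year"] = ts.dt.year
df["month"] = ts.dt.month
df["day"] = ts.dt.day
df["day_of_week"] = ts.dt.dayofweek      # 0 = Monday
df["day_of_year"] = ts.dt.dayofyear      # can signify special days, e.g. Christmas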
5 Implementation of the Models

5.1 Scaling

A difference between the experiments with 1 and 2 features is the definition of several scalers: one for the target feature and others for the remaining features. In this way, inverting the scaled data remains correct and understandable. Normalization is usually performed on the training data in order to normalize the range of the independent variables or features of the data. After that, the scaled data must be converted back to the original data range for subsequent analysis and interpretation. This process is called inversing, and it usually happens after prediction and before the evaluation of the model. An important point is that the scaler used for the original scaling must also be used for inversing, to guarantee that the predictions are transformed back to the original range. In the first experiment, only one feature is scaled, so only one scaler can be used for inversing the predictions. However, the situation is different in the second experiment. There are 2 kinds of features that are scaled at the beginning: one scaler, called scaler_date, is for the date features, and another one, called scaler_target, is for the target values. Therefore, scaler_target should be used for inversing the predictions instead of scaler_date, as sketched below. After that, it is uncomplicated to finish the subsequent evaluation step.
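A sketch of the two-scaler setup with scikit-learn; the array shapes and dummy data are illustrative:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Dummy data standing in for the real features (shapes are illustrative).
X_date = np.random.rand(100, 5)          # date features
y = np.random.rand(100, 1)               # target values

scaler_date = MinMaxScaler()             # scales the date features
scaler_target = MinMaxScaler()           # scales the target values
X_scaled = scaler_date.fit_transform(X_date)   # fit on training data only
y_scaled = scaler_target.fit_transform(y)

# After predicting on scaled data, invert with the target scaler, never
# scaler_date, so the predictions return to the original value range.
preds_scaled = y_scaled[:5]                    # stand-in for model output
preds = scaler_target.inverse_transform(preds_scaled)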
5.2 Window
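Models are trained on fixed-length windows of past observations paired with the next horizon values to forecast. A minimal sketch of this standard sliding-window transformation (the window length and horizon below are illustrative, not our experimental settings):

import numpy as np

def make_windows(series: np.ndarray, window: int, horizon: int):
    # Split a 1-D series into (past window, next `horizon` values) pairs.
    X, y = [], []
    for i in range(len(series) - window - horizon + 1):
        X.append(series[i : i + window])
        y.append(series[i + window : i + window + horizon])
    return np.array(X), np.array(y)

X, y = make_windows(np.arange(20, dtype=float), window=5, horizon=1)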
6 Experimental Results
We present our experimental results for the two selected datasets, namely the DailyDelhiClimate and the Steel industry datasets.
The results for the first dataset, DailyDelhiClimate, are shown in Fig. 2. In the figure the results are marked with different colors, with red denoting the smallest values and green the largest. In general, the most accurate model is Bi-GRU, which achieves the lowest error rate in experiments 1, 4, and 8, and whose results contain more low error rates, indicated by red areas. In contrast, LSTM is not as precise as the other models, since there are more yellow cells in its results in the table. Conversely, LSTM has the fewest training epochs: from the table, we can see that the number of epochs is the smallest for LSTM in experiments 1, 6, and 8, which means it is efficient, as its training time is short. There are some general observations that we can make based on the results of all 8 experiments. All of the highest errors occurred in experiment 3, for which the horizon value is 1 and which uses date as an additional feature. The reason for this is that the models are more likely to rely on the recent observations when the horizon is 1, and date features may not provide direct information about upcoming changes but may introduce unwanted noise into the model, especially if these features have no obvious correlation with the target variable. Thus, we can see that in most of the experiments where the horizon is 1, the results are better if there are no date features. But if the horizon is 5, which means long-term forecasting, the results are better with date features. This is because long-term prediction can make use of date features, which may have a positive impact on the results. Additionally, in most of the experiments, the results are more accurate when the horizon is 1 than when the horizon is 5. Usually, it is easier and more accurate to predict points that are close in the future than those that are farther away, since near-term data is more reflective of the current patterns and trends; the farther a data point is in the future, the less it is affected by such patterns and trends and the more it may be affected by other factors. The more long-term a prediction is, the higher the risk of error, because each prediction of the model relies on the predictions of the previous time steps. Concerning the number of features, we observe that the higher the number of features, the more accurate a prediction is, since the model can learn the data in a more holistic manner. At the same time, the results of the experiments with date features are better than those without them, except for experiment 3. To a certain extent, date features help the models understand and capture the seasonality and trends of this dataset, and the models can learn the data more clearly by leveraging date features.
7.1 Outcomes
We have derived several Outcomes from the observed phenomena and proposed how these Outcomes can be used correctly.
We designed a decision tree that outputs a recommendation of the most suitable model based on the characteristics of the input dataset. It selects a suitable model through a series of conditions, as illustrated by the skeleton below. Moreover, we also proposed that these Outcomes can be used as meta-information in the training process to design an Automated Model Selection Technique.
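As a purely illustrative skeleton of such a decision tree (the conditions and recommendations below are placeholders, not our actual Outcomes):

def recommend_model(is_stationary: bool, is_seasonal: bool, horizon: int) -> str:
    # Placeholder logic only: the real branching conditions and the
    # recommended models are given by the Outcomes, not by this sketch.
    if is_seasonal and horizon > 1:
        return "model favoured by the corresponding Outcome"
    if not is_stationary:
        return "model suited to non-stationary data"
    return "default recommendation"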
We continue to work on the Outcomes we have obtained. We intend to employ more datasets to acquire more accurate Outcomes. At the same time, more models can be considered to increase the generality of our results.
References
1. Esling, P., & Agon, C. (2012). Time-series data mining. ACM Computing Surveys
(CSUR), 45(1), 1-34.
2. Peixeiro, M. (2022). Time series forecasting in python. Simon and Schuster.
3. Mahalakshmi, G., Sridevi, S., & Rajaram, S. (2016, January). A survey on forecast-
ing of time series data. In 2016 International Conference on Computing Technologies
and Intelligent Data Engineering (ICCTIDE’16) (pp. 1-8). IEEE.
4. Joseph, M. (2022). Modern Time Series Forecasting with Python: Explore industry-
ready time series forecasting using modern machine learning and deep learning.
Packt Publishing Ltd.
5. Box, G. E. P., & Tiao, G. C. (1975). Intervention Analysis with Applications to
Economic and Environmental Problems. Journal of the American Statistical Asso-
ciation, 70(349), 70–79.
6. Schuster, M., & Paliwal, K. K. (1997). Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11), 2673-2681.
7. Jha, B. K., & Pande, S. (2021, April). Time series forecasting model for supermar-
ket sales using FB-prophet. In 2021 5th International Conference on Computing
Methodologies and Communication (ICCMC) (pp. 547-554). IEEE.
8. Cho, K., van Merrienboer, B., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.
9. Iwasaki, Y., & Levy, A. Y. (1994, August). Automated model selection for simula-
tion. In AAAI (pp. 1183-1190).
10. Calcagno, V., & de Mazancourt, C. (2010). glmulti: an R package for easy auto-
mated model selection with (generalized) linear models. Journal of statistical soft-
ware, 34, 1-29.
11. Malkomes, G., Schaff, C., & Garnett, R. (2016). Bayesian optimization for auto-
mated model selection. Advances in neural information processing systems, 29.
12. Kotthoff, L., Thornton, C., Hoos, H. H., Hutter, F., & Leyton-Brown, K. (2017).
Auto-WEKA 2.0: Automatic model selection and hyperparameter optimization in
WEKA. Journal of Machine Learning Research, 18(25), 1-5.
13. Bentaleb, A., Begen, A. C., Harous, S., & Zimmermann, R. (2020). Data-driven
bandwidth prediction models and automated model selection for low latency. IEEE
Transactions on Multimedia, 23, 2588-2601.
14. Ying, Y., Duan, J., Wang, C., Wang, Y., Huang, C., & Xu, B. (2020). Automated
model selection for time-series anomaly detection. arXiv preprint arXiv:2009.04395.
15. Wang, C., Chen, X., Wu, C., Wang, H. (2022). AutoTS: Automatic time series fore-
casting model design based on two-stage pruning. arXiv preprint arXiv:2203.14169.
16. Saleem, S., & Kumarapathirage, S. (2023, June). AutoNLP: A Framework for
Automated Model Selection in Natural Language Processing. In 2023 18th Iberian
Conference on Information Systems and Technologies (CISTI) (pp. 1-4). IEEE.
17. Gong, C., & Ma, J. (2023). Automated Model Selection of the Two-Layer Mixtures
of Gaussian Process Functional Regressions for Curve Clustering and Prediction.
Mathematics, 11(12), 2592.
18. Shchur, O., Turkmen, A. C., Erickson, N., Shen, H., Shirkov, A., Hu, T., & Wang, B. (2023, December). AutoGluon-TimeSeries: AutoML for probabilistic time series forecasting. In International Conference on Automated Machine Learning (pp. 9-1). PMLR.
19. Mondal, P., Shit, L., & Goswami, S. (2014). Study of effectiveness of time series
modeling (ARIMA) in forecasting stock prices. International Journal of Computer
Science, Engineering and Applications, 4(2), 13.
20. Gardner Jr, E. S. (1985). Exponential smoothing: The state of the art. Journal of
forecasting, 4(1), 1-28.
21. Hourly Energy Consumption, https://www.kaggle.com/datasets/robikscube/hourly-energy-consumption, Last accessed 2023/12/20
22. Steel Industry Energy Consumption, https://www.kaggle.com/datasets/csafrit2/steel-industry-energy-consumption, Last accessed 2023/12/21
23. Medium-Ds-Unsupervised-Anomaly-Detection-Deepant-Lstmae, https://github.com/bmonikraj/medium-ds-unsupervised-anomaly-detection-deepant-lstmae, Last accessed 2023/12/21
24. Daily Climate Time Series Data, https://www.kaggle.com/datasets/sumanthvrao/daily-climate-time-series-data, Last accessed 2023/12/21
25. Practical Guide for Feature Engineering of Time Series Data, https://dotdata.com/blog/practical-guide-for-feature-engineering-of-time-series-data, Last accessed 2024/02/02
26. When to Perform a Feature Scaling, https://www.atoti.io/articles/when-to-perform-a-feature-scaling, Last accessed 2024/01/04
27. Why Scaling Your Data Is Important, https://medium.com/codex/why-scaling-your-data-is-important, Last accessed 2024/02/05