0% found this document useful (0 votes)

28 views8 pages

An Automated Forecasting Framework Based On Method Recommendation For Seasonal Time Series

This document proposes an automated forecasting framework based on method recommendation for seasonal time series. The framework (1) extracts characteristics from a time series, (2) selects the best machine learning forecasting method based on recommendation rules, and (3) performs the forecast. It aims to address the "No Free Lunch" theorem, which states that no single forecasting method works best for all time series. The framework decomposes time series and uses a statistical method to forecast trends while a machine learning method predicts complex patterns. An evaluation shows the framework achieves the best forecasting accuracy compared to relying on a single method.

Uploaded by

2707SAKSHI TONDE

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

28 views8 pages

An Automated Forecasting Framework Based On Method Recommendation For Seasonal Time Series

Uploaded by

2707SAKSHI TONDE

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 8

SESSION 2: Performance Learning ICPE '20, April 20–24, 2020, Edmonton, AB, Canada

An Automated Forecasting Framework based on Method

Recommendation for Seasonal Time Series
André Bauer, Marwin Züfle, Johannes Grohmann, Norbert Schmitt, Nikolas Herbst, Samuel Kounev
University of Würzburg, Germany
[email protected]

ABSTRACT forecasting into the decision-making process allows to proactively

Due to the fast-paced and changing demands of their users, com- face those changes that cause overheads for their execution.
puting systems require autonomic resource management. To enable Indeed, there are different methods, such as statistical methods
proactive and accurate decision-making for changes causing a par- or machine learning approaches, that support accurate forecasting
ticular overhead, reliable forecasts are needed. In fact, choosing the results. Due to the variety of methods in question, the choice and
best performing forecasting method for a given time series scenario configuration of the best performing method for a given time series
is a crucial task. Taking the "No-Free-Lunch Theorem" into account, remain to be a mandatory expert task to avoid trial and error. Thus,
there exists no forecasting method that performs best on all types the question arises if there is a single method that forecasts best
of time series. To this end, we propose an automated approach that for all time series. The "No-Free-Lunch Theorem" [19], initially
(i) extracts characteristics from a given time series, (ii) selects the formulated for optimization problems, denies the possibility of
best-suited machine learning method based on recommendation, such a method. It states that improving the performance of one
and finally, (iii) performs the forecast. Our approach offers the ben- aspect leads typically to a degradation in performance for some
efit of not relying on a single method with its possibly inaccurate other aspect.
forecasts. In an extensive evaluation, our approach achieves the In fact, various types of hybrid methods have been introduced in
best forecasting accuracy. recent years to tackle the "No-Free-Lunch Theorem". While statisti-
cal models have their difficulties with complex patterns, machine-
CCS CONCEPTS learning-based methods struggle with non-stationary data (i.e., high
variance and trend). To face these weaknesses, we pose ourselves the
• General and reference → Experimentation; Performance; •
research question RQ1: How to build a generic and hybrid forecasting
Theory of computation → Unsupervised learning and clus-
framework for seasonal time series that dynamically minimizes the
tering; • Applied computing → Forecasting.
disadvantages of each component?". The core idea is to decompose
the time series into multiple parts. The trend is forecast separately
KEYWORDS by a statistical method while a machine learning method predicts
Forecasting, Recommendation, Machine Learning, Feature Engi- the complex pattern and then assembles the time series. While
neering, Comparative studies developing the generic and hybrid forecasting method, we limit
ACM Reference Format:
ourselves to univariate time series. In fact, correlated/external data
André Bauer, Marwin Züfle, Johannes Grohmann, Norbert Schmitt, Nikolas can be used for each time series to improve the forecast. However,
Herbst, Samuel Kounev. 2020. An Automated Forecasting Framework based the selection and preprocessing of such additional information re-
on Method Recommendation for Seasonal Time Series. In Proceedings of the quires domain knowledge. Also, this knowledge of domain-specific
2020 ACM/SPEC International Conference on Performance Engineering (ICPE feature engineering cannot yet be fully automated. Consequently,
’20), April 20–24, 2020, Edmonton, AB, Canada. ACM, New York, NY, USA, our method would have to be tailored to a particular domain and
8 pages. https://doi.org/10.1145/3358960.3379123 contradict the goal of a more generic approach.
Keeping the "No-Free-Lunch Theorem" in mind, it is not rec-
1 INTRODUCTION ommended relying on a specific method. Thus, we target a recom-
mendation system that suggests the best-suited machine learning
Nowadays, computing systems are pushed to their limits by the
approach for a given time series. Therefore, we pose ourselves the
fast living and changing requirements of their users. To this end,
question RQ2: "What are suitable time series characteristics for the
the autonomic management of these systems is needed. Based on
recommendation?". Consequently, the question arises RQ3: "What
the collected information, these systems can only react to changing
are suitable approaches for the recommendation system?"
requirements with an inherent delay. Thus, integrating time series
Towards addressing the questions above, our contribution in
this paper is four-fold: (i) We propose an automated forecasting
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed framework for seasonal time series that decomposes a given time
for profit or commercial advantage and that copies bear this notice and the full citation series and extracts characteristics (see Section 3.3), recommends
on the first page. Copyrights for components of this work owned by others than the the best-suited machine learning method, which is selected on dy-
author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior specific permission namically learned rules, and finally, assembles the components and
and/or a fee. Request permissions from [email protected]. performs the forecast (see Section 3.1). (ii) To build the recommen-
ICPE ’20, April 20–24, 2020, Edmonton, AB, Canada
dation rules dynamically, a knowledge base is built upon a set of
© 2020 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-6991-6/20/04. . . $15.00 historical time series. (iii) For the recommendation, we introduce
https://doi.org/10.1145/3358960.3379123

48
SESSION 2: Performance Learning ICPE '20, April 20–24, 2020, Edmonton, AB, Canada

three different approaches and propose our own time series charac- a specific statistical noise distribution. It is also considered as the
teristics (see Section 3.2.1). (iv) In a broad evaluation (see Section 4), residual time series after all other components have been removed.
we analyze the different approaches, investigate the impact of the
time series generation, and compare our forecast framework with 2.3 Fourier Terms & Frequency Detection
state-of-the-art forecasting methods.
In many fields, especially for forecasting, it is helpful to know the
Without our framework, a simple and straight-forward approach
frequencies, i.e., the lengths of the seasonal patterns. For instance,
for choosing the best-suited method for a given time series would be
if the most dominant frequency is unknown for a given time series,
based on trial and error, or the consultation of an expert. However,
the time series cannot be decomposed by the method explained
both possibilities are expensive, time-consuming, or error-prone.
above. By dominant, we mean the most common period, i.e., the
That is, through the automation of choosing the best method in
seasonal pattern such as days in a year. An established approach
conjunction with the hybrid approaches leads to good forecasting
for frequency analysis is the Fourier transform, which allows to
results and helps saving time and costs.
determine the distribution of frequencies or the spectral density of
the time series. As a time series can be represented as a weighted
2 BACKGROUND sum of sinusoidal components, the found frequencies can be used
Before explaining our approach in detail, we outline some back- to retrieves these components, also referred to as Fourier terms.
ground concepts. Thus, Section 2.1 gives a short introduction to
time series. Afterward, the time series decomposition is explained. 3 APPROACH
Finally, the frequency detection and Fourier terms are outlined. As our approach is two-fold, we first introduce the automatic de-
composition, feature extraction, and forecasting of a time series. In
2.1 Time Series Section 3.2, we explain the recommendation system for selecting
the most suitable machine learning approach. Afterward, the con-
A univariate time series is an ordered collection of values of a quan-
sidered time series characteristics are presented. Finally, the used
tity obtained over a specific period or from a certain point in time.
machine learning methods are highlighted.
In general, observations are recorded in successive and equidistant
time steps (e.g., hours). Typically, internal patterns exist, such as
autocorrelation, trend, or seasonal variation. 3.1 Automatic Time Series Forecasting
One of the essential characteristics of a time series is the station- The assumption of data stationarity is an inherent limitation for
arity. Hence, most statistical forecasting methods have the assump- time series forecasting. Any time series property that eludes station-
tion that the time series is either stationary or can be “stationar- arity, such as non-constant mean (trend), seasonality, non-constant
ized” through a transformation. The statistical properties (such as variance, or multiplicative effect, poses a challenge for the proper
mean, variance, auto-correlation) of a stationary time series do not model building. Consequently, we design an automated time series
change over time. Therefore, a stationary time series is easier to forecasting method that addresses these issues. Figure 1 shows
model and forecast. In practice, however, time series are usually the work-flow of the automatic time series forecasting part. The
showing a mix of trend or/and seasonal patterns and are thus non- blue rectangle boxes reflect actions, the green trapezoids machine
stationary [1]. To this end, time series are transformed, seasonally learning features, the grey rounded boxes the target for the ma-
adjusted, made trend-stationary by removing the trend, or made chine learning, and the rounded white boxes everything else. The
difference-stationary by possibly repeated differencing. functioning can be grouped into four steps (dashed red boxes):
(i) preprocessing, (ii) recommendation, (iii) forecasting, and (iv)
postprocessing. Each part is described in the following.
2.2 Time Series Decomposition
As a time series consists of different components, a common ap- 3.1.1 Preprocessing. This step is responsible for preparing the time
proach is to break down the time series into its components. These series and extracting the intrinsic features for the machine learning
parts can either be used for modifying the data (e.g., removing the algorithm. The first step consists of the frequency estimation. If
trend or the seasonality), or they can be used as intrinsic features the time series has a certain frequency, this frequency is chosen.
(e.g., modeling different recurring patterns). Otherwise, the most dominant frequency is estimated. Next, if
A common method for decomposing a time series is STL (Sea- the time series has multiplicative effects, the logarithm is used to
sonal and Trend decomposition using Loess) [5]. STL can handle transform the time series. The Fourier terms (the sine and cosine
any type of seasonality, allows the seasonal pattern to change over pair) for the most dominant frequency are determined and used
time, and disassembles the given time series into the components as intrinsic features later on. Although most forecasting methods
trend T , season S, and irregular I (also called remainder). The long- assume stationary time series, many time series exhibit trend or/and
term development in a time series (i.e., upwards, downwards, or seasonal patterns. To tackle the non-stationarity, our approach
stagnate) is called trend. Usually, the trend is a monotone function decomposes the time series and then handles each part separately.
unless external events trigger a break and cause a change in the di- To this end, the time series is decomposed by STL (see Section 2.2)
rection. The presence of recurring patterns within a regular period into season, trend, and remainder. The seasonal component is used
in the time series is called seasonality. These patterns are caused as an intrinsic feature later on. The remainder is ignored since it is
by climate, customs, or traditional habits. The unpredictable part irregular and hard to predict and therefore correlated with a high
of a time series is called irregular component, possibly following error rate. Finally, the trend is removed from the time series to

49
SESSION 2: Performance Learning ICPE '20, April 20–24, 2020, Edmonton, AB, Canada

Forecasting
Legend Action Feature Target Artifacts Forecast Fourier Postprocessing
Terms
Machine Learning: Forecast
Repeat Pattern
Predict Detrended TS
Time Series Forecast Season

Fourier Series Fourier Terms Add Trend

Machine Learning:
Model &
Train
Re-Transformation
Season
Forecast Trend
Frequency
Estimation & Decomposition Trend Forecast Time
Transformation Series
Forecast
Remainder

Detrended Time Recommend Machine Learning

Detrend
Series Machine Learning Method

Preprocessing Recommendation

Figure 1: Overview of forecasting process of our framework.

make the time series trend-stationary. The detrended time series is 3.1.4 Postprocessing. In this last step, the forecast trend is ap-
the target value for model building. pended to the forecast detrended time series to assemble the fore-
cast time series. Moreover, if the time series was multiplicative, the
3.1.2 Recommendation. The detrended time series is passed from forecast time series is re-transformed with the exponential function.
the preprocessing step and is the basis for the recommendation. Finally, the forecast time series is returned.
The recommendation selects which machine learning algorithm is
best suited to model the detrended time series. Thus, time series 3.2 Machine Learning Recommendation
characteristics are extracted from the detrended time series. Based
To tackle the problem that arises with the "No-Free-Lunch Theo-
on these characteristics, a suitable machine learning method is
rem", we employ a recommendation system for machine learning
selected. The detailed recommendation is explained in Section 3.2.
approaches. The idea is to choose the best suitable method based on
3.1.3 Forecasting. To build a suitable forecast model that takes the the time series characteristics. Figure 2 shows the recommendation
features derived in the previous step into account, we use the ma- work-flow. The blue rectangle boxes reflect actions, the green trape-
chine learning algorithm recommended by the last step. To reduce zoids reflect machine learning features, the grey rounded boxes the
the model error and later the forecast error, we exclude the trend machine learning target, and the rounded white boxes everything
and the remainder as features. The trend was removed during the else. The functioning can be grouped into two phases (dashed red
first step to make the time series trend-stationary. The remainder boxes): (i) an offline phase and (ii) an online phase. Both phases are
of the time series is not explicitly considered a feature. That is, the described in the following.
machine learning method notices a difference that is missing to 3.2.1 Offline Phase. The offline phase learns the rules for recom-
fully recreate the target value. In other words, this difference is mendation a specific method based on time series characteristics,
the remainder and is learned implicitly as the machine learning during the start or if no forecast is currently conducted. To this end,
method tries to explain this difference. Consequently, the consid- our approach requires an initial set of time series that are stored in
ered features include the season and the Fourier terms, and the the associated storage. To have a broad training set independent
target value corresponds to the detrended time series. Although of the amount of original time series, the first step in this phase is
seasonality can also violate stationarity, time series models usually to create new time series based on the original time series in the
explicitly take seasonality into account. Also, machine learning storage. For this purpose, three different methods are used:
methods are suitable for pattern recognition. To this end, we keep (i) The first method splits time series into smaller parts to have
the seasonality as a feature. a more diverse set of time series with different lengths. The length
To forecast the time series, each feature and the trend has to be of a split is the maximum between a freely configurable length and
forecast separately. As the season and the Fourier terms are recur- 10% of the original length. (ii) The core idea of the second method is
ring patterns per definition, these features can merely be continued. to decompose the time series, modify one component, and assemble
Based on the trend component, an ARIMA1 model [11] without the modified component and the two remaining parts to a new time
seasonality is determined that forecasts the future trend of the series. More precisely, this method modifies each component one
time series. Simultaneously, the forecast patterns of the season and after the other and creates, therefore, three new time series. For the
Fourier terms, in combination with the model, are used to predict modification, the divisors of the frequency of the time series are
the detrended time series. determined. For each divisor, the components are modified with the
proportion of the frequency and the divisor differently: The trend
1 We select ARIMA as it is able to estimate the trend even from a few points, and we is getting steeper; the season is compressed, i.e., the period length
use an automatic version that selects the most suited model [10]. becomes shorter; the remainder is stretched. (iii) The third method

50
SESSION 2: Performance Learning ICPE '20, April 20–24, 2020, Edmonton, AB, Canada

Online Phase
Legend Action Feature Target Artifacts
Characteristics Characterics
Extraction
Ofﬂine Phase

Time Series Creating New Machine Learning Method

Best Methods
Storage Time Series Method Evaluation Recommendation
New Time Series
Learning Recommendation
Recommendation Rules Machine Learning
Extended Time Method
Series Storage
Time Series
Characteristics Characteristics
Extraction Forecasting Forecast Time
Approach Series

Figure 2: Overview of the recommendation process of our forecasting framework.

also decomposes the time series. More precisely, it combines each error. (ii) The core idea of the second approach AR is to learn how
component of each time series with each component of the other much each method is worse than the best method. In more detail,
time series. The length of the resulting time series is equal to the the approach calculates for each method how much worse this
shortest component that was used. method is compared to the method with the lowest forecast error
Due to the limitations of STL, which requires at least two full for given time series characteristics. Then, a random forest is used
periods, only new time series with a length greater than two times as a regressor for each machine learning method in question for the
the period plus one are considered valid. Created time series that do selection. In other words, the random forest tries to find a function
not fulfill this requirement are considered invalid and are discarded. that learns how much worse the method is in comparison to the
This method is able to create a huge training set (including the orig- best method based on the time series characteristics. After each
inal time series) with a high diversity of time series characteristics. method has estimated how worse the forecast will be for a new
The rough number of the training set is the number of original time time series, the method with the lowest value is chosen. (iii) The
series to the power of three. third method is a hybrid approach AH that combines the first two
After the training set is generated, the time series characteristics approaches. More specifically, a random forest regressor is used
(see Section 3.3) of each time series are extracted. As the machine for each machine learning method available to estimate how much
learning methods have to handle the detrended time series, the char- worse the method is in comparison to the method with the lowest
acteristics are also calculated on the detrended time series. At the error. Then, another random forest is used as a classifier to map the
same time, the machine learning method evaluation is conducted. estimation of how worse the forecast will be to the best method. The
During the evaluation, each method (see Section 3.4) performs a idea is to minimize the regression error of each method. For example,
forecast for each time series. To this end, the time series is split into if one method always claims to have the lowest degradation, but it
history (the first 80% of the time series) and in future (the remaining does not perform as well, the classification shall learn this behavior.
20%). For the forecasting, each method gets, as explained in Section
3.1, the Fourier terms, and the season as input while the detrended
time series is the target. Then, for each time series and each method, 3.2.2 Online Phase. This phase takes place when a forecast for a
the forecast error, in this case, the mean absolute error (MAPE), is given time series is conducted. First, the characteristics of the time
calculated: series are extracted. Then, the recommendation rules are applied
n
100% Õ yt − ft to the characteristics, and a machine learning method is selected.
MAPE := | |. (1) Afterward, the forecasting approach (see Section 3.1) performs the
n t =1 yt
forecast. Finally, the time series is saved within the time series
In this equation, n is the forecast horizon, yt the actual value, and ft storage, and new time series can be generated, as explained in
the forecast value. To have a comparable forecast measure among Section 3.2.1.
all time series, we normalize for each time series the forecast error
with the lowest error. This normalization results in values ≥ 1 for
each time series. Further, the best method has a value of 1. We
define these values as forecast accuracy degradation ϑ showing how 3.3 Time Series Characteristics
much worse the forecast accuracy is compared to the best method. To train a machine learning method for choosing the best method,
For instance, a forecast accuracy degradation of 1.05 means that the suitable features are required. Thus, we calculate for each time
method is 5% worse. Based on the forecast accuracy degradation, series a set of characteristics. These characteristics contain infor-
the best method for each time series is determined. mation about the time series, statistical measures, characteristics
Based on the time series characteristics and the best method for proposed by Wang et al. [18], characteristics proposed by Lemke
each time series, the recommendation rules can be learned. For this and Gabrys [13], and characteristics we propose in this work. The
purpose, we envision three different approaches: used time series characteristics and the associated calculation in-
(i) The first approach AC is a classification task. That is, a random structions are listed in Table 1. In contrast to the work of Wang et
forest is used to map the time series characteristics for the given al., we use the raw values of the characteristics to avoid arbitrary
time series to the machine learning method with the lowest forecast normalization factors.

51
SESSION 2: Performance Learning ICPE '20, April 20–24, 2020, Edmonton, AB, Canada

Table 1: Overview of the considered time series characteristics

Characteristic Description Formula

Frequency⋄ The frequency is the length of the most dominant recurring patterns within the time series. f = frequency(Y)
Length⋄ The total number of observations included in the time series. n = length(Y)
Standard deviation§ σ = var (Y )
p
The standard deviations measures the amount of variations within the time series.
1 Ín
Skewness§ The skewness measures the symmetry of the value distribution of the time series. (Yt − Y )3
n · σ 3 k=1
1 Ín
Kurtosis § Kurtosis is a measure of the tailedness of the value distribution of the time series. (Yt − Y )4
n · σ 4p k=1
Remainder SD§ The stand deviation of the remainder. σR = var (R)
1 Ín
Remainder skewness§ The skewness of the remainder. (R t − R)3
n · σR3 k =1
1 Ín
Remainder kurtosis§ The kurtosis of the remainder. (R t − R)4
n · σR4 k =1
QR(R)
Proportion remainder♯ This characteristic reflects how strongly the remainder is prominent in the time series.
QR(R) + QR(S)
QR(S)
Proportion season♯ This characteristic reflects how strongly the season is prominent in the time series.
QR(R) + QR(S)
This characteristic quantifies the regularity and unpredictability of fluctuations of the time series. 1 Í ⌊n/f ⌋
Mean period entropy♯ m= k =1
EA(pi )
p n
For this purpose, the approximate entropy of each period is calculated and then averaged.
Coefficient of entropy
The coefficient measures the standardized entropy distribution over all periods. var (EA(pi ))
variation♯ m
Í ⌈n/f ⌉ Í ⌈n/f ⌉ pi · p j
♯ This characteristic describes how similar all periods are. To this end, the cosine similarity of each 2 i=1 j=1;j,i ||p || · ||p ||
Mean cosine similarity i j
pair of periods is calculated. Then, the average similarity is determined.
Í(⌈n/f ⌉ − 1)2 + (⌈n/f ⌉ − 1)
n (e − e 2
This characteristic quantifies how well the seasonal pattern can be approximated by a sinus wave. t =2 t t −1 )
Durbinwatson♯ ÍT
t =1 e t
The Durbin-Watson test is used to check the auto-correlation of the fitter errors. 2
var (R)
Seasonality† This characteristic reflects the strength of the season within the time series. 1−
var (Y )
† This characteristic describes the correlation of the time series with itself to an earlier time. To Íf
Serial correlation n · k =1 r k (Y )2
determine the serial correlation, the Box-Pierce statistics is used.
Íf
Remainder serial corr.† The serial correlation of the remainder. n · k =1 r k (R)2
This characteristic describes how badly the time series can be written as a linear combination of
Non-linearity† R.t. Wang et al. [18, p. 18](∗)
unknown variables or functions.
Remainder non-lin.† The non-linearity of the remainder. R.t. Wang et al. [18, p. 18](∗)
Self-Similarity† This characteristic describes how similar an object is to a part of itself by the Hurst exponent. R.t. Wang et al. [18, p. 20](∗)
Chaos† The chaos calculates the randomness within the time series via the Lyapunov exponent. R.t. Wang et al. [18, p. 20](∗)
2nd freq‡ The second dominant frequency of the time series. –
3rd freq‡ The third dominant frequency of the time series. –
Max spec‡ The maximal spectral value of the spectral density. –
This characteristic reflects how many strong recurring patterns the time series has. To this end,
Num peaks‡ –
the number of peaks in the spectral density that have at least 60% of the maximum value.
Let Y be the time series without trend, S be the season component of the time series, R be the remainder of the time series, r (x)k be the auto-correlation
function with lag k, QR(x) := Q 0.95 (x) − Q 0.05 (x) be the range between the 95% percentile and 5% percentile, pi be the i-th period of Y , EA(x) be the
approximated entropy [14, p.3], et the fitting error at time t.
(∗) We modify the approach and use the time series without trend for the calculation.
⋄ time series information, § statistical measure, ♯ proposed by this work, † proposed by Wang et al. [18], ‡ proposed by Lemke and Gabrys [13]

3.4 Machine Learning Methods additional corrections as described by Quinlan [16]. (iii) Evtree im-
For the forecasting task, we only consider machine learning meth- plements an evolutionary algorithm for learning globally optimal
ods in this paper as statistical methods such as ARIMA can typically classification and regression trees [9]. (iv) NNetar is a feed-forward
only process the time series without additional information. This neural network is trained with lagged values of the time series [10].
means that the extracted features (see Section 3.1) cannot be used (v) Random Forest (RF) uses bagging for generating samples from
by such methods. In addition, ML methods can handle any num- the data set used for learning [2]. (vi) Rpart trains a regression
ber of features. That is, for a possible extension of our approach tree using recursive partitioning, based on the CART algorithm
with external information, these features can be added. The used by Breiman et al. [3]. (vii) Support Vector Regression (SVR) uses the
machine learning methods (see Section 3.1) are listed in the follow- same principles as SVM for classification [8]. (viii) XGBoost uses
ing: (i) Catboost applies gradient boosting of decision trees [15]. (ii) gradient tree boosting where trees are generated sequentially. That
Cubist is a regression model that combines the ideas of M5 with is, each tree is grown with knowledge from the last trained tree [4].

52
SESSION 2: Performance Learning ICPE '20, April 20–24, 2020, Edmonton, AB, Canada

4 EVALUATION method in split), (ii) has on average the lowest forecast accuracy
Before discussing the evaluation, we introduce the used data set degradation in each split (on avg. lowest error in split), and (iii)
in Section 4.1. Then, we explain the methodology and the evalua- is over all time series the best method (total best method). We
tion metrics in Section 4.2. Afterwards, we analyze how well the report the respective percentages in Table 2 showing these three
different machine learning methods perform on the data set. Based observations for the training data and test data for each method.
on this information, we evaluate our recommendation approaches While the distribution of percentages of which method is the best
in Section 4.4. In Section 4.5, we investigate how the diversity of over all time series is almost similar for the training and test data,
the data set is increased by the time series generation. Finally, we the distributions per split differ considerably. While Nnetar was in
compare our forecasting framework with state-of-the-art methods. every split the method achieving the best training forecast accuracy
the most often, it reaches only in 73% of the test data splits the same
4.1 Data Set performance. Cubist had in 55% of the training splits on average the
lowest forecast accuracy degradation. In the test data, Cubist has
To have a sound and broad evaluation of our approach, a highly
in only 17% of the splits on average the lowest forecast accuracy
heterogeneous data set that covers different domains and charac-
degradation.
teristics is required. Indeed, there are numerous data sets available
In a nutshell, we see from these results that the dynamic choice
online: competitions (e.g., NN32 , M33 , and M4), kaggle, R packages,
of the best performing method is a crucial task with significant po-
and many more. Although, for instance, the M4 competition set
tential. Even choosing a method based on straight-forward metrics
contains 100,000 time series, these time series have low frequencies
(for instance, choosing the method which was on average the best
(1, 4, 12, and 24) and short forecasting horizons (6 to 48 data points).
method in the training data) based on the training data may lead to
Further, the median length of a time series is 106. That is, we assume
a bad performance.
that if the data set is used alone, it is not suitable for benchmarking
forecasting methods for all kinds of domains.
To this end, our data set4 consists of 150 real-world and pub- 4.4 Evaluation of the Recommendation
licly available time series. The time series are collected from vari-
As the recommendation of the best suitable method is an essential
ous sources including Wikipedia Project-Counts, Internet Traffic
pillar of our forecasting framework, we examine the recommenda-
Archive, R packages, Kaggle, Datamarket, and many more. Further,
tion performance of our envisioned approaches (see Section 3.2.1).
the data set reflects different use cases, e.g., Internet accesses, sales
To have a ground truth for the competition, we define the following
volume, etc. Moreover, our data set covers the same frequencies
three method selecting strategies: (i) Selecting the best method for
as the M4 competition and additional frequencies (7, 48, 52, 60, 96,
each time series a-posteriori S ∗ . (ii) Selecting the method which had
144, 168, 365, 2160, and 6480). Further, our forecast horizons range
the lowest average forecast accuracy degradation in each training
from 8 to 7,304 data points, and the median length is 595.
split S L . (iii) Selecting the method, which was most often the best
method in each training split S B . Note that, based on our analysis
4.2 Evaluation Methodology
in Section 4.3, the method Nnetar will be chosen.
To evaluate our approach, we divide the original data set into 100 The results of the comparison between these six methods are
training time series and 50 validation time series. To avoid an ar- presented in Table 3. For each approach/strategy, this table lists the
bitrary split, we divide the data set in 100 unique splits. In other median, average and standard deviation of the accuracy degradation
words, we train and evaluate our approach on 100 different time ϑ over all 100 splits.
series train and test sets. We also made sure that all time series are The best values are shown by S ∗ . Indeed, this result is not surpris-
spread across all splits. ing as this strategy has a-posteriori knowledge. Thus, this method
As described in Section 3.2.1, our approach expands for each has the role of showing the theoretically best possible values. In
split the size of the training set to have a sound training set for the other words, S ∗ is the base-line for the recommendation. Conse-
recommendation. That is, our approach uses in each division the quently, only five methods remain for a fair competition. In terms
100 time series for the generation of new time series. In contrast to of the average forecast accuracy degradation, the regression-based
the description of the approach, we restrict the approach to use only approaches (AH being on average 15.9% worse than always choos-
10,000 instead of the roughly 1,000,000 time series. More precisely, ing the best method and AR with a value of 1.172) outperform the
the training data in each split contains the original 100 time series remaining approaches/strategies. While taking also the median and
and 9,900 new time series. the standard deviation of the forecast accuracy degradation into
account, it can be seen that the meta-learning layer of AH is able
4.3 Machine Learning Method Analysis to improve the performance of AR in all measures of the forecast
For reference, we investigate how each of the chosen machine accuracy degradation. The worst forecast accuracy degradation
learning methods performs in the forecasting process on the data is shown S L followed by AC . In contrast, AC exhibits the lowest
set, without recommendation, i.e., changing the method depending median followed by the regression-based approaches. The worst
on the input time series. To this end, we observe for each method median is shown by S B . While observing the standard deviation of
how often the method (i) is the best method in each split (best the forecast accuracy degradation, S L , AC , AR exhibit high values.
2 NN3 The lowest value is shown by AH .
competition: http://www.neural-forecasting-competition.com/NN3/
3 M3 competition: https://forecasters.org/resources/time-series-data/m3-competition/ The median and mean values can be better understood if the
4 Time series data set available at https://zenodo.org/record/3508552 distribution of the ranking of the recommended methods is taken

53
SESSION 2: Performance Learning ICPE '20, April 20–24, 2020, Edmonton, AB, Canada

Table 2: Investigation of the forecast performance of the different machine learning methods.

Catboost Cubist Evtree Nnetar RF Rpart SVR XGBoost

Best method in split (0% / 21%) (0% / 3%) (0% / 0%) (100% / 73%) (0% / 0%) (0% / 0%) (0% / 3%) (0% / 0%)
(Train / Test) On avg. lowest error in split (0% / 11%) (55% / 17%) (0% / 2%) (0% / 5%) (0% / 5%) (45% / 43%) (0% / 0%) (0% / 17%)
Total best method (7.2% / 18.3%) (13.5% / 13.1%) (5.9% / 9.2%) (38.1% / 23.9%) (4.5% / 4.2%) (12.3% / 8.1%) (9.2% /16.8%) (9.0% / 5.3%)

Table 3: Comparison of the recommendation methods. time series. Then, we normalize with a min-max-scaling for each
time series characteristic the data between 0 and 1 for a comparable
S∗ SL SB AC AR AH analysis. On top of this, we depict each characteristic in a spider
chart (see Figure 4). In this diagram, the maximal value of new data
Avg. ϑ 1.000 1.409 1.235 1.249 1.172 1.159
(grey) and original data (purple), and the minimum values of the
Median ϑ 1.000 1.045 1.076 1.016 1.035 1.032
new data (green) and original data (blue) are shown. Each edge of
SD ϑ 0.000 3.674 0.427 2.458 1.382 0.382
this chart represents a time series characteristics. For almost all
characteristics, the new generated time series expand the spectrum
into account. Figure 3 shows the distribution of the rankings. The of the data both in terms of the maximum value and minimum
ranks of S L are almost equally distributed. S B selects almost either value.
the best or the worst method. More precisely, it recommends in
51.4% of the time series the worst method. For all recommenda-
Max new Max orginal Min orginal Min new
tion approaches, the distribution of ranks two to five drops. The num peaks Length Frequency
1
regression-based approaches select in more than 30% the best or max spec Standard deviation

second-best method. However, choosing the worst method is al- 3rd freq 0.8 Remainder SD

most as likely as choosing the best method. In contrast to all other 2nd freq 0.6 Proportion remainder
methods, AC chooses with more than 50% the best, second, or third
0.4
best method, but also has almost 25% of choosing the worst method. Durbinwatson Proportion season
In fact, none of the methods show a proper distribution, which 0.2
Remainder kurtosis Mean period entropy
decreases with increasing rank. 0

Remainder skewness Coefficient of entropy

0.5
0.3
SL

Remainder non-linearity Mean cosine similarity

0.1
0.5
0.3 Remainder Serial correlation Seasonality
SB

0.1 Chaos Serial correlation

Probability

0.5 Self-Similarity Non-linearity

0.3 Kurtosis Skewness
AC

0.1
0.5 Figure 4: Time series generation result.
0.3
AR

0.1
0.5
0.3
AH

0.1
4.6 Evaluation of Forecast Accuracy
2 4 6 8 To investigate how well our forecasting framework performs, we
Ranks compare the forecasting error (i.e., MAPE) of our approach with
three state-of-the-art approaches that are briefly described in the
Figure 3: Distribution of the rankings. following: ETS [12] is a statistical method and builds an exponential
smoothing state space model consisting of trend, season, and error.
Each component can be combined in an additive or multiplicative
4.5 Evaluating the Time Series Generation manner, or it may be skipped. tBATs [7] extends ETS using a trigono-
One central problem of machine learning is the inherent limitation metric representation based on Fourier series for the season and an
to predict only what has been learned during the training phase. ARMA model for the error. Further, the data is transformed with a
In other words, machine learning methods have a limited ability Box-Cox transformation. sARIMA [11] determines the orders of the
for extrapolation. This also holds true for our recommendation. autoregressive model, the moving average model, and the differenti-
Consequently, we try to consider as many time series with differ- ation. sARIMA models one seasonal pattern, and each non-seasonal
ent characteristics as possible to improve the recommendation for component of the ARIMA model is extended with its seasonal coun-
unknown time series. Thus, we analyze in this section how the terpart. Table 4 lists the average, median, standard deviation of the
new time series generation affects the diversity of the time series forecast error for all 100 splits. Each of our approaches exhibits a
characteristics. To this end, we collect for each time series charac- lower average MAPE and standard deviation than the state-of-the-
teristic the values from the original data and the new generated art methods. The worst average MAPE (56.96%) is achieved by ETS.

54
SESSION 2: Performance Learning ICPE '20, April 20–24, 2020, Edmonton, AB, Canada

In contrast, tBATS has the lowest median MAPE (10.83%) followed forecasts. For the recommendation of the best-suited method, we
by AC (12.31%) while ETS again shows the highest median error. To introduce three different approaches, and in addition to time series
sum up, our approaches are equally accurate in terms of the median characteristics from the literature, we propose our own character-
forecast error, but having a lower average and standard deviation istics. In an extensive evaluation, we compare the three proposed
forecast error than the state-of-the-art methods. recommendation approaches, the impact of time series generation,
and compare the forecasting framework with state-of-the-art meth-
Table 4: Comparison of the forecast error.
ods. Although the proposed recommendation approaches perform
equally good, our approach achieves the best forecasting accuracy
MAPE AC AR AH ETS tBATS sARIMA in comparison with the state-of-the-art techniques.
Avg. 24.40 23.26 23.68 56.96 36.28 28.12
Median 12.31 13.07 13.18 14.47 10.83 13.00 ACKNOWLEDGEMENTS
SD 50.31 40.41 38.52 136.22 98.68 64.72 This work was co-funded by the German Research Foundation
(DFG) under grant No. (KO 3445/11-1) and the IHK (Industrie- und
Handelskammer) Würzburg-Schweinfurt.
5 RELATED WORK
REFERENCES
To face the "No-Free-Lunch Theorem", i.e., minimizing the variance [1] Ratnadip Adhikari and R. K. Agrawal. 2013. An Introductory Study on Time
of monolithic forecasting methods, many hybrid mechanisms and Series Modeling and Forecasting. CoRR abs/1302.6613 (2013).
forecast recommendation systems have been developed. The first [2] Leo Breiman. 2001. Random forests. Machine learning 45, 1 (2001), 5–32.
[3] Leo Breiman, Joseph H Friedman, R. A. Olshen, and C. J. Stone. 1983. Classification
idea of selecting a forecasting method based on rules was intro- and Regression Trees.
duced by Collopy and Armstrong in 1992 [6]. In their work, they [4] Tianqi Chen and Carlos Guestrin. 2016. Xgboost: A scalable tree boosting system.
In ACM SIGKDD 2016. ACM, 785–794.
manually created an expert system. The rules based on 18 time [5] Robert B Cleveland, William S Cleveland, Jean E McRae, and Irma Terpenning.
series characteristics and include four methods. However, this rule 1990. STL: A seasonal-trend decomposition procedure based on loess. Journal of
set was created by human experts, and each modification requires Official Statistics 6, 1 (1990), 3–73.
[6] Fred Collopy and J Scott Armstrong. 1992. Rule-based forecasting: Develop-
human interaction. In 2009, Wang et al. introduced two approaches ment and validation of an expert systems approach to combining time series
for forecasting method recommendation [18]. Firstly, they pro- extrapolations. Management Science 38, 10 (1992), 1394–1414.
pose hierarchical clustering and self-organizing maps; secondly, [7] Alysha M De Livera, Rob J Hyndman, and Ralph D Snyder. 2011. Forecasting time
series with complex seasonal patterns using exponential smoothing. J. Amer.
a decision tree technique is applied. The generate rules based on Statist. Assoc. 106, 496 (2011), 1513–1527.
13 time series characteristics and covers four methods. Unfortu- [8] Harris Drucker, Christopher JC Burges, Linda Kaufman, Alex J Smola, and
Vladimir Vapnik. 1997. Support vector regression machines. In Advances in
nately, the proposed rules were not evaluated. In 2010, Lemke and neural information processing systems. 155–161.
Gabrys investigated the applicability of different meta-learning [9] Thomas Grubinger, Achim Zeileis, and Karl-Peter Pfeiffer. 2014. evtree: Evolu-
approaches [13]. In their work, they use 17 time series and six error tionary Learning of Globally Optimal Classification and Regression Trees in R.
Journal of Statistical Software, Articles 61, 1 (2014), 1–29.
characteristics while using eighth methods and seven combina- [10] Rob Hyndman, George Athanasopoulos, Christoph Bergmeir, Gabriel Caceres,
tion approaches. In 2018, Talagala et al. propose in a techpaper a Leanne Chhay, Mitchell O’Hara-Wild, Fotios Petropoulos, Slava Razbash, Earo
feature-based forecast-model selection [17]. To this end, they simu- Wang, and Farah Yasmeen. 2018. forecast: Forecasting functions for time series and
linear models. http://pkg.robjhyndman.com/forecast R package version 8.4.
late time series that are generated by fitting exponential smoothing [11] Rob J Hyndman and George Athanasopoulos. 2014. Forecasting: principles and
and ARIMA models to the original data. A random forest classifier practice. OTexts, Melbourne, Australia.
[12] Rob J Hyndman, Anne B Koehler, Ralph D Snyder, and Simone Grose. 2002. A
is then used to map 25 to 30 time series characteristics (depending state space framework for automatic forecasting using exponential smoothing
on the time series) to the best forecast method. In their work, they methods. International Journal of forecasting 18, 3 (2002), 439–454.
consider seven methods. As the work of Wang et al. [18] were not [13] Christiane Lemke and Bogdan Gabrys. 2010. Meta-learning for time series
forecasting and forecast combination. Neurocomputing 73, 10-12 (2010), 2006–
evaluated, Züfle et al. investigate and compare these rules to two 2016.
proposed dynamic recommendation algorithms [20]. [14] Steven M Pincus, Igor M Gladstone, and Richard A Ehrenkranz. 1991. A regularity
In contrast to the related work that only introduce the selection of statistic for medical data analysis. Journal of clinical monitoring 7, 4 (1991), 335–
345.
the best forecasting method, we propose an overarching framework [15] Liudmila Prokhorenkova, Gleb Gusev, Aleksandr Vorobev, Anna Veronika Doro-
that combines the selection of the best method and the forecast itself. gush, and Andrey Gulin. 2018. CatBoost: unbiased boosting with categorical
features. In Advances in Neural Information Processing Systems. 6638–6648.
While the aforementioned works use solely statistical methods, the [16] J Ross Quinlan. 1993. Combining instance-based and model-based learning. In
focus in this work lies in machine learning-based regressor methods. Proceedings of the tenth international conference on machine learning. 236–243.
Further, for the evaluation, we use a highly diverse data set. Further, [17] Priyanga Talagala, Rob Hyndman, George Athanasopoulos, et al. 2018. Meta-
learning how to forecast time series. Technical Report. Monash University, De-
our selection mechanism creates also new time series by combining partment of Econometrics and Business Statistics.
actual time series to increase the diversity of the data set. [18] Xiaozhe Wang, Kate Smith-Miles, and Rob Hyndman. 2009. Rule induction for
forecasting method selection: Meta-learning the characteristics of univariate
time series. Neurocomputing 72, 10âĂŞ12 (2009), 2581 – 2594.
6 CONCLUSION [19] D. H. Wolpert and W. G. Macready. 1997. No free lunch theorems for optimization.
In this work, we propose an automated forecasting framework that IEEE Transactions on Evolutionary Computation 1, 1 (Apr 1997), 67–82.
[20] Marwin Züfle, André Bauer, Veronika Lesch, Christian Krupitzer, Nikolas Herbst,
(i) extracts characteristics from a given time series, (ii) selects the Samuel Kounev, and Valentin Curtef. 2019. Autonomic Forecasting Method Selec-
best-suited machine learning method based on recommendation, tion: Examination and Ways Ahead. In Proceedings of the 16th IEEE International
Conference on Autonomic Computing (ICAC). IEEE.
and finally, (iii) performs the forecast. Our approach offers the ben-
efit of not relying on a single method with its possibly inaccurate

Amit Konar, Diptendu Bhattacharya-Time-Series Prediction and Applications. A Machine Intelligence Approach-Springer (2017)
No ratings yet
Amit Konar, Diptendu Bhattacharya-Time-Series Prediction and Applications. A Machine Intelligence Approach-Springer (2017)
248 pages
NN5
No ratings yet
NN5
29 pages
Solutions To Chapter 10 Problems
No ratings yet
Solutions To Chapter 10 Problems
40 pages
Ribeiro Volatility 2012
100% (3)
Ribeiro Volatility 2012
16 pages
Paper3 MTP
No ratings yet
Paper3 MTP
43 pages
A New Hybrid Method For Predicting Univariate and Multivariate Time Series Based On Pattern Forecasting
No ratings yet
A New Hybrid Method For Predicting Univariate and Multivariate Time Series Based On Pattern Forecasting
17 pages
Peerj Cs 534
No ratings yet
Peerj Cs 534
29 pages
Wang3 1 PDF
No ratings yet
Wang3 1 PDF
34 pages
20618-Article Text-24631-1-2-20220628
No ratings yet
20618-Article Text-24631-1-2-20220628
9 pages
Autoforecast: Automatic Time-Series Forecasting Model Selection
No ratings yet
Autoforecast: Automatic Time-Series Forecasting Model Selection
10 pages
Algorithms 16 00248 v2
No ratings yet
Algorithms 16 00248 v2
16 pages
Self-Supervised Learning For Fast and Scalable Time Series Hyper-Parameter Tuning
No ratings yet
Self-Supervised Learning For Fast and Scalable Time Series Hyper-Parameter Tuning
10 pages
Smoothness Priors Analysis of Economic and Financial Time Series
No ratings yet
Smoothness Priors Analysis of Economic and Financial Time Series
193 pages
A Comparative Study and Analysis of Time
No ratings yet
A Comparative Study and Analysis of Time
7 pages
MA Daniel Berberich Hybrid Methods For Time Series Forecasting
No ratings yet
MA Daniel Berberich Hybrid Methods For Time Series Forecasting
118 pages
Meta-Learning How To Forecast Time Series
No ratings yet
Meta-Learning How To Forecast Time Series
38 pages
Distributed ARIMA Models For Ultra-Long Time Series
No ratings yet
Distributed ARIMA Models For Ultra-Long Time Series
44 pages
DeepAR - Probabilistic Forecasting With Autoregressive Recurrent Networks
No ratings yet
DeepAR - Probabilistic Forecasting With Autoregressive Recurrent Networks
11 pages
Natural Time Series Parameters Forecasting
No ratings yet
Natural Time Series Parameters Forecasting
19 pages
s3950476 TimeSeriesAnalysis Assignment 3
No ratings yet
s3950476 TimeSeriesAnalysis Assignment 3
13 pages
MixMamba Time Series Modeling With Adaptive Expertise
No ratings yet
MixMamba Time Series Modeling With Adaptive Expertise
13 pages
Double Level Optimal Fuzzy Association Rules Prediction - 2024 - Expert Systems
No ratings yet
Double Level Optimal Fuzzy Association Rules Prediction - 2024 - Expert Systems
19 pages
Deep Learning For Time Series Forecasting: A Survey
No ratings yet
Deep Learning For Time Series Forecasting: A Survey
34 pages
An Artificial Neural Network P D Q Model For Times
No ratings yet
An Artificial Neural Network P D Q Model For Times
12 pages
Deep Learning and Optimisation For Quality of Service Modelling
No ratings yet
Deep Learning and Optimisation For Quality of Service Modelling
10 pages
Puspita 2019 J. Phys. Conf. Ser. 1196 012073
No ratings yet
Puspita 2019 J. Phys. Conf. Ser. 1196 012073
8 pages
Change Point Detection in Time Series Data With Random Forests
No ratings yet
Change Point Detection in Time Series Data With Random Forests
13 pages
Modeling and Forecasting of Rice Production in Some Major States of India Using ARIMA
No ratings yet
Modeling and Forecasting of Rice Production in Some Major States of India Using ARIMA
11 pages
1.6 Machine Learning For Time Series Analysis and Forecasting
No ratings yet
1.6 Machine Learning For Time Series Analysis and Forecasting
54 pages
Algorithms: Variable Selection in Time Series Forecasting Using Random Forests
No ratings yet
Algorithms: Variable Selection in Time Series Forecasting Using Random Forests
25 pages
LSTM Paper
No ratings yet
LSTM Paper
10 pages
Foundations of Scheduling Algorithms: Definitive Reference for Developers and Engineers
From Everand
Foundations of Scheduling Algorithms: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Gratis
No ratings yet
Gratis
38 pages
Enhancing Time Series Forecasting Accuracy With Deep Learning Models: A Comparative Study
No ratings yet
Enhancing Time Series Forecasting Accuracy With Deep Learning Models: A Comparative Study
10 pages
The Realization of A Type of Supermarket Sales Forecast Model & System
No ratings yet
The Realization of A Type of Supermarket Sales Forecast Model & System
6 pages
TSA Chapter 1
No ratings yet
TSA Chapter 1
2 pages
A Hybrid Time Series Forecasting Method Based On Neutrosophic Log
No ratings yet
A Hybrid Time Series Forecasting Method Based On Neutrosophic Log
23 pages
FreDo - Frequency Domain-Based Long-Term Time Series Forecasting
No ratings yet
FreDo - Frequency Domain-Based Long-Term Time Series Forecasting
12 pages
TSA Chapters 1: Introduction To Time Series
No ratings yet
TSA Chapters 1: Introduction To Time Series
4 pages
Deep Learning Models For Time Series Forecasting A Review
No ratings yet
Deep Learning Models For Time Series Forecasting A Review
22 pages
Rojas-Time Series Analysis and Forecasting-Book16
100% (1)
Rojas-Time Series Analysis and Forecasting-Book16
384 pages
Business Forecasting Methods
No ratings yet
Business Forecasting Methods
5 pages
Schedule-Free Optimization of The Transformers-Based Time Series Forecasting Model
No ratings yet
Schedule-Free Optimization of The Transformers-Based Time Series Forecasting Model
10 pages
Timeseries Paper
No ratings yet
Timeseries Paper
1 page
Springer Lecture Notes in Computer Science
No ratings yet
Springer Lecture Notes in Computer Science
16 pages
1 s2.0 S0098135424001583 Main
No ratings yet
1 s2.0 S0098135424001583 Main
26 pages
陌陌陌陌莫迪
No ratings yet
陌陌陌陌莫迪
9 pages
Chandrasekaran, R., & Paramasivan, S. K. (2022) - A State-Of-The-Art Review of Time Series Forecasting Using Deep Learning Approaches.
No ratings yet
Chandrasekaran, R., & Paramasivan, S. K. (2022) - A State-Of-The-Art Review of Time Series Forecasting Using Deep Learning Approaches.
14 pages
A Joint Time-Frequency Domain Transformer For Multivariate Time Series Forecasting
No ratings yet
A Joint Time-Frequency Domain Transformer For Multivariate Time Series Forecasting
33 pages
Card: C A R B T - T S F: Hannel Ligned Obust Lend Rans Former For IME Eries Orecasting
No ratings yet
Card: C A R B T - T S F: Hannel Ligned Obust Lend Rans Former For IME Eries Orecasting
39 pages
Artificial Neural Networks in Time Series Forecasting: A Comparative Analysis
No ratings yet
Artificial Neural Networks in Time Series Forecasting: A Comparative Analysis
21 pages
A Review of Deep Learning Models For Time Series Prediction
No ratings yet
A Review of Deep Learning Models For Time Series Prediction
16 pages
A Comparison Between Machine and Deep Learning Models On High Stationarity Data
No ratings yet
A Comparison Between Machine and Deep Learning Models On High Stationarity Data
11 pages
Approaches and Applications of Early Classification
No ratings yet
Approaches and Applications of Early Classification
15 pages
Time Series
100% (1)
Time Series
91 pages
Neurocomputing: Ratnadip Adhikari
No ratings yet
Neurocomputing: Ratnadip Adhikari
12 pages
DLBDSTSA01 Course Book Time Series Analysis
No ratings yet
DLBDSTSA01 Course Book Time Series Analysis
244 pages
Advanced Multivariate Time Series Forecasting Mode
No ratings yet
Advanced Multivariate Time Series Forecasting Mode
8 pages
Introduction To Power Consumption Forecasting
No ratings yet
Introduction To Power Consumption Forecasting
15 pages
Journal of Computer Science Research - Vol.5, Iss.2 January 2023
No ratings yet
Journal of Computer Science Research - Vol.5, Iss.2 January 2023
82 pages
Ch07 - Forecast
No ratings yet
Ch07 - Forecast
20 pages
The Art of Controller Design
From Everand
The Art of Controller Design
Martin Braae
No ratings yet
MCM Assignment 2
No ratings yet
MCM Assignment 2
4 pages
Artificial Intelligence in Predicting Mechanical Properties of Composites
No ratings yet
Artificial Intelligence in Predicting Mechanical Properties of Composites
36 pages
Theory of Elasticity and Plasticity
No ratings yet
Theory of Elasticity and Plasticity
18 pages
Module 3 No Solutions
No ratings yet
Module 3 No Solutions
16 pages
Signature Redacted - Signature Redacted Signature Redacted
No ratings yet
Signature Redacted - Signature Redacted Signature Redacted
70 pages
Demand Forecasting at
No ratings yet
Demand Forecasting at
15 pages
Valid Benchmarks From Published Surveys of Forecast Accuracy - Foresight11
No ratings yet
Valid Benchmarks From Published Surveys of Forecast Accuracy - Foresight11
10 pages
The Research On Demand Forecasting of Supply Chain Based On ICCELMAN
No ratings yet
The Research On Demand Forecasting of Supply Chain Based On ICCELMAN
6 pages
Strategies Price Formation
No ratings yet
Strategies Price Formation
29 pages
AP7101-Advanced Digital Signal Processing
50% (2)
AP7101-Advanced Digital Signal Processing
8 pages
R08 Multiple Regression and Machine Learning
No ratings yet
R08 Multiple Regression and Machine Learning
24 pages
ARIMA AR MA ARMA Models
No ratings yet
ARIMA AR MA ARMA Models
46 pages
Anderson Distribution of The Correlation Coefficient
No ratings yet
Anderson Distribution of The Correlation Coefficient
14 pages
Brief Introduction To NLOGIT
No ratings yet
Brief Introduction To NLOGIT
26 pages
SPE-196861-MS Understanding Well Events With Machine Learning
No ratings yet
SPE-196861-MS Understanding Well Events With Machine Learning
12 pages
Cec332 Advanced Digital Signal Processing L T P C
No ratings yet
Cec332 Advanced Digital Signal Processing L T P C
1 page
Term Paper On FDI
67% (3)
Term Paper On FDI
42 pages
Time Series Analysis in Python With Statsmodels
No ratings yet
Time Series Analysis in Python With Statsmodels
8 pages
Solution Basic Econometrics
No ratings yet
Solution Basic Econometrics
10 pages
1 Overview of Areal Data Analysis
No ratings yet
1 Overview of Areal Data Analysis
6 pages
MLS PDF
No ratings yet
MLS PDF
3 pages
Econometric S
No ratings yet
Econometric S
26 pages
A Multivariate GARCH Model of International Transmissions of Stock Returns and Volatility
No ratings yet
A Multivariate GARCH Model of International Transmissions of Stock Returns and Volatility
16 pages
ECON2228 Notes 9: Christopher F Baum
No ratings yet
ECON2228 Notes 9: Christopher F Baum
50 pages
Stata Excel
No ratings yet
Stata Excel
25 pages
Applied Signal and Image Processing
No ratings yet
Applied Signal and Image Processing
207 pages
5 MSC App Stat III 0 IV Sem Syllabus 5 Units
No ratings yet
5 MSC App Stat III 0 IV Sem Syllabus 5 Units
29 pages
Relationship Between Math Teachers' Instructional Styles and Their Educational Philosophical Backgrounds
No ratings yet
Relationship Between Math Teachers' Instructional Styles and Their Educational Philosophical Backgrounds
15 pages
Arima Modeling With R Listendata
No ratings yet
Arima Modeling With R Listendata
12 pages
ADSP
No ratings yet
ADSP
2 pages
Minerals Engineering: C. Carrasco, L. Keeney, T.J. Napier-Munn, D. François-Bongarçon
No ratings yet
Minerals Engineering: C. Carrasco, L. Keeney, T.J. Napier-Munn, D. François-Bongarçon
7 pages
SAS Introduction To Time Series Forecasting-Libre
No ratings yet
SAS Introduction To Time Series Forecasting-Libre
34 pages
Sta 445 1 Stationarity and Non-Stationarity
No ratings yet
Sta 445 1 Stationarity and Non-Stationarity
15 pages
(25 - 36) Lela Ramadhani
No ratings yet
(25 - 36) Lela Ramadhani
12 pages
Auto Correlation and Partial Correlation
No ratings yet
Auto Correlation and Partial Correlation
2 pages
ESG - Indian Market
No ratings yet
ESG - Indian Market
6 pages

An Automated Forecasting Framework Based On Method Recommendation For Seasonal Time Series

Uploaded by

An Automated Forecasting Framework Based On Method Recommendation For Seasonal Time Series

Uploaded by

SESSION 2: Performance Learning ICPE '20, April 20–24, 2020, Edmonton, AB, Canada

An Automated Forecasting Framework based on Method

ABSTRACT forecasting into the decision-making process allows to proactively

Fourier Series Fourier Terms Add Trend

Detrended Time Recommend Machine Learning

Figure 1: Overview of forecasting process of our framework.

Time Series Creating New Machine Learning Method

Figure 2: Overview of the recommendation process of our forecasting framework.

Table 1: Overview of the considered time series characteristics

Characteristic Description Formula

Catboost Cubist Evtree Nnetar RF Rpart SVR XGBoost

Remainder skewness Coefficient of entropy

Remainder non-linearity Mean cosine similarity

0.1 Chaos Serial correlation

0.5 Self-Similarity Non-linearity

You might also like