
International Journal of Forecasting 32 (2016) 303–312


Bagging exponential smoothing methods using STL decomposition and Box–Cox transformation

Christoph Bergmeir (a,*), Rob J. Hyndman (b), José M. Benítez (c)

a Faculty of Information Technology, Monash University, Melbourne, Australia
b Department of Econometrics & Business Statistics, Monash University, Melbourne, Australia
c Department of Computer Science and Artificial Intelligence, E.T.S. de Ingenierías Informática y de Telecomunicación, University of Granada, Spain

Keywords: Bagging; Bootstrapping; Exponential smoothing; STL decomposition

Abstract: Exponential smoothing is one of the most popular forecasting methods. We present a technique for the bootstrap aggregation (bagging) of exponential smoothing methods, which results in significant improvements in the forecasts. The bagging uses a Box–Cox transformation followed by an STL decomposition to separate the time series into the trend, seasonal part, and remainder. The remainder is then bootstrapped using a moving block bootstrap, and a new series is assembled using this bootstrapped remainder. An ensemble of exponential smoothing models is then estimated on the bootstrapped series, and the resulting point forecasts are combined. We evaluate this new method on the M3 data set, and show that it outperforms the original exponential smoothing models consistently. On the monthly data, we achieve better results than any of the original M3 participants.

© 2015 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.

* Correspondence to: Faculty of Information Technology, P.O. Box 63, Monash University, Victoria 3800, Australia. Tel.: +61 3 990 59555. E-mail address: [email protected] (C. Bergmeir).
http://dx.doi.org/10.1016/j.ijforecast.2015.07.002

1. Introduction

After more than 50 years of widespread use, exponential smoothing is still one of the most practically relevant forecasting methods available (Goodwin, 2010). This is because of its simplicity and transparency, as well as its ability to adapt to many different situations. It also has a solid theoretical foundation in ETS state space models (Hyndman & Athanasopoulos, 2013; Hyndman, Koehler, Ord, & Snyder, 2008; Hyndman, Koehler, Snyder, & Grose, 2002). Here, the acronym ETS stands both for ExponenTial Smoothing and for Error, Trend, and Seasonality, which are the three components that define a model within the ETS family.

Exponential smoothing methods obtained competitive results in the M3 forecasting competition (Koning, Franses, Hibon, & Stekler, 2005; Makridakis & Hibon, 2000), and the forecast package (Hyndman, 2014; Hyndman & Khandakar, 2008) in the programming language R (R Core Team, 2014) means that fully automated software for fitting ETS models is available. Thus, ETS models are both usable and highly relevant in practice, and have a solid theoretical foundation, which makes any attempt to improve their forecast accuracy a worthwhile endeavour.

Bootstrap aggregating (bagging), as proposed by Breiman (1996), is a popular method in machine learning for improving the accuracy of predictors (Hastie, Tibshirani, & Friedman, 2009) by addressing potential instabilities. These instabilities typically stem from sources such as data uncertainty, parameter uncertainty, and model selection uncertainty. An ensemble of predictors is estimated on bootstrapped versions of the input data, and the output of the ensemble is calculated by combining (using the median, mean, trimmed mean, or weighted mean, for example), often yielding better point predictions. In this work, we propose a bagging methodology for exponential smoothing methods, and evaluate it on the M3 data. As our input data are non-stationary time series, both serial dependence and non-stationarity have to be taken into account. We resolve these issues by applying a seasonal-trend decomposition based on loess (STL, Cleveland, Cleveland, McRae, & Terpenning, 1990) and a moving block bootstrap (MBB, see, e.g., Lahiri, 2003) to the residuals of the decomposition.

Specifically, our proposed method of bagging is as follows. After applying a Box–Cox transformation to the data, the series is decomposed into trend, seasonal and remainder components. The remainder component is then bootstrapped using the MBB, the trend and seasonal components are added back in, and the Box–Cox transformation is inverted. In this way, we generate a random pool of similar bootstrapped time series. For each of these bootstrapped time series, we choose a model from among several exponential smoothing models, using the bias-corrected AIC. Then, point forecasts are calculated using each of the different models, and the resulting forecasts are combined using the median.

The only related work that we are aware of is the study by Cordeiro and Neves (2009), who use a sieve bootstrap to perform bagging with ETS models. They use ETS to decompose the data, then fit an AR model to the residuals, and generate new residuals from this AR process. Finally, they fit the ETS model that was used for the decomposition to all of the bootstrapped series. They also test their method on the M3 dataset, and have some success for quarterly and monthly data, but overall, the results are not promising. In fact, the bagged forecasts are often not as good as the original forecasts applied to the original time series. Our bootstrapping procedure works differently, and yields better results. We use STL for the time series decomposition, MBB to bootstrap the remainder, and choose an ETS model for each bootstrapped series. Using this procedure, we are able to outperform the original M3 methods for monthly data in particular.

The rest of the paper is organized as follows. In Section 2, we discuss the proposed methodology in detail. Section 3 presents the experimental setup and the results, and Section 4 concludes the paper.

2. Methods

In this section, we provide a detailed description of the different parts of our proposed methodology, namely exponential smoothing, and the novel bootstrapping procedure involving a Box–Cox transformation, STL decomposition, and the MBB. We illustrate the steps using series M495 from the M3 dataset, which is a monthly series.

2.1. Exponential smoothing

The general idea of exponential smoothing is that recent observations are more relevant for forecasting than older observations, meaning that they should be weighted more highly. Accordingly, simple exponential smoothing, for example, uses a weighted moving average with weights that decrease exponentially.

Starting from this basic idea, exponential smoothing has been expanded to the modelling of different components of a series, such as the trend, seasonality, and remainder components, where the trend captures the long-term direction of the series, the seasonal part captures repeating components of a series with a known periodicity, and the remainder captures unpredictable components. The trend component is a combination of a level term and a growth term. For example, the Holt–Winters purely additive model (i.e., with additive trend and additive seasonality) is defined by the following recursive equations:

ℓ_t = α (y_t − s_{t−m}) + (1 − α)(ℓ_{t−1} + b_{t−1})
b_t = β* (ℓ_t − ℓ_{t−1}) + (1 − β*) b_{t−1}
s_t = γ (y_t − ℓ_{t−1} − b_{t−1}) + (1 − γ) s_{t−m}
ŷ_{t+h|t} = ℓ_t + h b_t + s_{t−m+h_m^+}

Here, ℓ_t denotes the series level at time t, b_t denotes the slope at time t, s_t denotes the seasonal component of the series at time t, and m denotes the number of seasons in a year. The constants α, β*, and γ are smoothing parameters in the [0, 1]-interval, h is the forecast horizon, and h_m^+ = ((h − 1) mod m) + 1.

There is a whole family of ETS models, which can be distinguished by the type of error, trend, and seasonality each uses. In general, the trend can be non-existent, additive, multiplicative, damped additive, or damped multiplicative. The seasonality can be non-existent, additive, or multiplicative. The error can be additive or multiplicative; however, distinguishing between these two options is only relevant for prediction intervals, not point forecasts. Thus, there are a total of 30 models with different combinations of error, trend and seasonality. The different combinations of trend and seasonality are shown in Table 1. For more detailed descriptions, we refer to Hyndman and Athanasopoulos (2013), Hyndman et al. (2008), and Hyndman et al. (2002).

Table 1
The ETS model family, with different types of seasonality and trend.

Trend component            | Seasonal component
                           | N (None)   A (Additive)   M (Multiplicative)
N (None)                   | N, N       N, A           N, M
A (Additive)               | A, N       A, A           A, M
Ad (Additive damped)       | Ad, N      Ad, A          Ad, M
M (Multiplicative)         | M, N       M, A           M, M
Md (Multiplicative damped) | Md, N      Md, A          Md, M

In R, exponential smoothing is implemented in the ets function from the forecast package (Hyndman, 2014; Hyndman & Khandakar, 2008). The different models are fitted to the data automatically; i.e., the smoothing parameters and initial conditions are optimized using maximum likelihood with a simplex optimizer (Nelder & Mead, 1965). Then, the best model is chosen using the bias-corrected AIC. We note that, of the 30 possible models, 11 can lead to numerical instabilities, and are therefore not used by the ets function (see Hyndman & Athanasopoulos, 2013, Section 7.7, for details). Thus, ets, as it is used within our bagging procedure, chooses from among 19 different models.
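As an illustration of Section 2.1, the following minimal R sketch (ours, not taken from the paper; the AirPassengers series merely stands in for an M3 series) fits an ETS model with the ets function, either selected automatically via the bias-corrected AIC or fixed to the purely additive Holt–Winters form, and produces point forecasts:

library(forecast)
y <- AirPassengers                  # placeholder monthly series
fit <- ets(y)                       # automatic selection among the admissible ETS models
fit.hw <- ets(y, model = "AAA")     # force additive error, trend and seasonality
fc <- forecast(fit, h = 18)         # point forecasts, 18 steps ahead as for the M3 monthly data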

2.2. The Box–Cox transformation

This is a popular transformation for stabilizing the variance of a time series, and was originally proposed by Box and Cox (1964). It is defined as follows:

w_t = log(y_t) if λ = 0, and w_t = (y_t^λ − 1)/λ if λ ≠ 0.

Depending on the parameter λ, the transformation is essentially the identity (λ = 1), the logarithm (λ = 0), or a transformation somewhere between. One difficulty is the method of choosing the parameter λ. In this work, we restrict it to lie in the interval [0, 1], then use the method of Guerrero (1993) to choose its value in the following way. The series is divided into subseries of a length equal to the seasonality, or of length two if the series is not seasonal. Then, the sample mean m and standard deviation s are calculated for each of the subseries, and λ is chosen in such a way that the coefficient of variation of s/m^(1−λ) across the subseries is minimized.

For the example time series M495, this method gives λ = 6.61 × 10^-5. Fig. 1 shows the original series and the Box–Cox transformed version using this λ.
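A minimal sketch of this step (our illustration, assuming the BoxCox.lambda, BoxCox and InvBoxCox functions of the forecast package; y stands for any time series):

library(forecast)
y <- AirPassengers
lambda <- BoxCox.lambda(y, method = "guerrero", lower = 0, upper = 1)   # restrict lambda to [0, 1]
y.bc <- BoxCox(y, lambda)           # variance-stabilized series
y.rec <- InvBoxCox(y.bc, lambda)    # inverting the transformation recovers the original series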
2.3. Time series decomposition

For non-seasonal time series, we use the loess method (Cleveland, Grosse, & Shyu, 1992), a smoothing method based on local regressions, to decompose the time series into trend and remainder components. For seasonal time series, we use STL, as presented by Cleveland et al. (1990), to obtain the trend, seasonal and remainder components.

In loess, a neighborhood is defined for each data point, and the points in that neighborhood are then weighted (using so-called neighborhood weights) according to their distances from the respective data point. Finally, a polynomial of degree d is fitted to these points. Usually, d = 1 and d = 2 are used, i.e., linear or quadratic curves are fitted. The trend component is equal to the value of the polynomial at each data point. In R, loess smoothing is available through the function loess. For the non-seasonal data in our experiments, i.e., the yearly data from the M3 competition, we use the function with a degree of d = 1. In this function, the neighborhood size is defined by a parameter α, which is the proportion of the overall points to include in the neighborhood, with tricubic weighting. To get a constant neighborhood of six data points, we define this parameter to be six divided by the length of the time series under consideration.

In STL, loess is used to divide the time series into their trend, seasonal, and remainder components. The division is additive, i.e., summing the parts gives the original series again. In detail, the steps performed during STL decomposition are: (i) detrending; (ii) cycle-subseries smoothing: series are built for each seasonal component, and smoothed separately; (iii) low-pass filtering of smoothed cycle-subseries: the subseries are put together again, and smoothed; (iv) detrending of the seasonal series; (v) deseasonalizing the original series, using the seasonal component calculated in the previous steps; and (vi) smoothing the deseasonalized series to get the trend component. In R, the STL algorithm is available through the stl function. We use it with its default parameters. The degrees for the loess fitting are d = 1 in steps (iii) and (iv), and d = 0 in step (ii). Fig. 2 shows the STL decomposition of series M495 from the M3 dataset, as an example.

Another possibility for decomposition is to use ETS modelling directly, as was proposed by Cordeiro and Neves (2009). However, the components of an ETS model are defined based on the noise terms, and evolve dynamically with the noise. Thus, ‘‘simulating’’ an ETS process by decoupling the level, trend and seasonal components from the noise and treating them as independent series may not work well. This is in contrast to an STL decomposition, in which the trend and seasonal components are smooth and the way in which they change over time does not depend on the noise component directly. Therefore, we can simulate the noise term independently in an STL decomposition using bootstrapping procedures.
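The decomposition step can be sketched as follows (our illustration; the paper does not spell out the exact stl settings beyond the defaults, so s.window = "periodic" is an assumption here, and AirPassengers again stands in for an M3 series):

library(forecast)
y <- AirPassengers
lambda <- BoxCox.lambda(y, method = "guerrero", lower = 0, upper = 1)
y.bc <- BoxCox(y, lambda)

# Seasonal series: STL gives trend, seasonal and remainder components.
dec <- stl(y.bc, s.window = "periodic")$time.series
remainder <- dec[, "remainder"]

# Non-seasonal series: loess with degree 1 and a neighborhood of about six points.
z <- as.numeric(y.bc)
trend.fit <- loess(z ~ seq_along(z), degree = 1, span = 6 / length(z))
remainder.nonseasonal <- z - fitted(trend.fit)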
2.4. Bootstrapping the remainder

As time series data are typically autocorrelated, adapted versions of the bootstrap exist (see Gonçalves & Politis, 2011; Lahiri, 2003). One prerequisite is the stationarity of the series, which we achieve by bootstrapping the remainder of the STL (or loess) decomposition.

In the MBB, as originally proposed by Künsch (1989), data blocks of equal size are drawn from the series until the desired series length is achieved. For a series of length n, with a block size of l, n − l + 1 (overlapping) possible blocks exist.

We use block sizes of l = 8 for yearly and quarterly data, and l = 24 for monthly data, i.e., at least two full years, to ensure that any remaining seasonality is captured. As the shortest series for the yearly data has a total of n = 14 observations, care must be taken to ensure that every value from the original series could possibly be placed anywhere in the bootstrapped series. To achieve this, we draw ⌊n/l⌋ + 2 blocks from the remainder series, then discard a random number of values, between zero and l − 1, from the beginning of the bootstrapped series. Finally, to obtain a series with the same length as the original series, we discard as many values as necessary to obtain the required length. This processing ensures that the bootstrapped series does not necessarily begin or end on a block boundary.
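The moving block bootstrap just described can be sketched in a few lines of R (our illustration, not the authors' code):

mbb <- function(remainder, l) {
  n <- length(remainder)
  n.blocks <- floor(n / l) + 2                          # draw floor(n/l) + 2 blocks
  starts <- sample(seq_len(n - l + 1), n.blocks, replace = TRUE)
  boot <- unlist(lapply(starts, function(s) remainder[s:(s + l - 1)]))
  offset <- sample(0:(l - 1), 1)                        # discard a random number of leading values
  boot[(offset + 1):(offset + n)]                       # trim to the original length n
}
# Example: new.remainder <- mbb(remainder, l = 24)      # l = 24 for monthly data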

Fig. 1. Series M495 of the M3 dataset, which is a monthly time series. Above is the original series, below the Box–Cox transformed version, with
λ = 6.61 × 10−5 .

Fig. 2. STL decomposition into the trend, seasonal part, and remainder, of the Box–Cox transformed version of series M495 from the M3 dataset.

There are various other methods in the literature for bootstrapping time series, such as the tapered block bootstrap (Paparoditis & Politis, 2001), the dependent wild bootstrap (DWB, Shao, 2010a), and the extended tapered block bootstrap (Shao, 2010b). However, Shao (2010a) concludes that, ‘‘for regularly spaced time series, the DWB is not as widely applicable as the MBB, and the DWB lacks the higher order accuracy property of the MBB’’. Thus, ‘‘the DWB is a complement to, but not a competitor of, existing block-based bootstrap methods’’. We performed preliminary experiments (which are not reported here) using the tapered block bootstrap and the DWB, but use only the MBB in this paper, as the other procedures did not provide substantial advantages.

Another type of bootstrap is the sieve bootstrap, which was proposed by Bühlmann (1997) and used by Cordeiro and Neves (2009) in an approach similar to ours. Here, the dependence in the data is tackled by fitting a model and then bootstrapping the residuals, assuming that they are uncorrelated. This bootstrapping procedure has the disadvantage that one must assume that the model captures all of the relevant information in the time series. The MBB has the advantage that it makes no modelling assumptions other than stationarity, whereas the sieve bootstrap assumes that the fitted model captures all of the serial correlation in the data.

Fig. 3. Bootstrapped versions (blue) of the original series M495 (black). Five bootstrapped series are shown. It can be seen that the bootstrapped series
resemble the behavior of the original series quite well. (For interpretation of the references to colour in this figure legend, the reader is referred to the web
version of this article.)

After bootstrapping the remainder, the trend and seasonality are combined with the bootstrapped remainder, and the Box–Cox transformation is inverted, to get the final bootstrapped sample. Fig. 3 gives an illustration of bootstrapped versions of the example series M495.

2.5. The overall procedure

To summarize, the bootstrapping procedure is given in Algorithm 1. Initially, the value of λ ∈ [0, 1] is calculated according to Guerrero (1993). Then, the Box–Cox transformation is applied to the series, and the series is decomposed into the trend, seasonal part, and remainder, using STL or loess. The remainder is then bootstrapped using the MBB, the components are added together again, and the Box–Cox transformation is inverted.
Algorithm 1 Generating bootstrapped series
1: procedure bootstrap(ts, num.boot)
2:   λ ← BoxCox.lambda(ts, min=0, max=1)
3:   ts.bc ← BoxCox(ts, λ)
4:   if ts is seasonal then
5:     [trend, seasonal, remainder] ← stl(ts.bc)
6:   else
7:     seasonal ← 0
8:     [trend, remainder] ← loess(ts.bc)
9:   end if
10:  recon.series[1] ← ts
11:  for i in 2 to num.boot do
12:    boot.sample[i] ← MBB(remainder)
13:    recon.series.bc[i] ← trend + seasonal + boot.sample[i]
14:    recon.series[i] ← InvBoxCox(recon.series.bc[i], λ)
15:  end for
16:  return recon.series
17: end procedure

After generating the bootstrapped time series, the ETS model fitting procedure is applied to every series. As was stated in Section 2.1, we use the ets function from the forecast package (Hyndman, 2014; Hyndman & Khandakar, 2008). The function fits all possible ETS models to the data, then chooses the best model using the bias-corrected AIC. By applying the entire ETS fitting and model selection procedure to each bootstrapped time series independently, we address the issues of data uncertainty, parameter uncertainty, and model selection uncertainty.

For each horizon, the final resulting forecast is calculated from the forecasts from the single models. We performed preliminary experiments using the mean, trimmed mean, and median. However, we restrict our analysis in this study to the median, as it achieves good results and is less sensitive to outliers than the mean, for example, and we also take into account the results of Kourentzes, Barrow, and Crone (2014).
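For the seasonal case, Algorithm 1 together with the ensemble step can be sketched in R roughly as follows (our illustration, reusing the mbb() helper defined in Section 2.4; it is not the authors' implementation, and the non-seasonal loess branch is omitted for brevity):

library(forecast)
bagged.ets.forecast <- function(y, h, num.boot = 100, l = 24) {
  lambda <- BoxCox.lambda(y, method = "guerrero", lower = 0, upper = 1)
  dec <- stl(BoxCox(y, lambda), s.window = "periodic")$time.series
  series <- vector("list", num.boot)
  series[[1]] <- y                                      # the original series is kept in the ensemble
  for (i in 2:num.boot) {
    rec <- dec[, "trend"] + dec[, "seasonal"] + mbb(dec[, "remainder"], l)
    series[[i]] <- InvBoxCox(ts(rec, start = start(y), frequency = frequency(y)), lambda)
  }
  fc <- sapply(series, function(s) forecast(ets(s), h = h)$mean)   # one ETS model per series
  apply(fc, 1, median)                                  # combine the point forecasts with the median
}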
3. Experimental study

In this section, we describe the forecasting methods, error measures, and statistical tests that were used in the experiments, together with the results obtained for the M3 dataset, separately for yearly, quarterly, and monthly data.

3.1. Compared methods

In what follows, we refer to the decomposition approach proposed in this paper, namely the Box–Cox transformation and STL or loess, as Box–Cox and loess-based decomposition (BLD). Bootstrapped versions of the series are generated as discussed in Section 2, i.e., BLD is followed by the MBB. We use an ensemble size of 100, so that we estimate models on the original time series and on 99 bootstrapped series.

We compare our proposed method both to the original ETS method and to several variants, in the spirit of Cordeiro and Neves (2009). Specifically, we consider all possible combinations of using BLD or ETS for decomposition, and the MBB or a sieve bootstrap for bootstrapping the remainder. Here, the sieve bootstrap is implemented as follows: an ARIMA model is fitted to the remainder of the method used for decomposition (BLD or ETS) using the auto.arima function from the forecast package (Hyndman, 2014; Hyndman & Khandakar, 2008), which selects a model automatically using the bias-corrected AIC, with model orders of up to five. Then, a normal bootstrapping procedure is applied to the residuals of this ARIMA model.
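One plausible way to implement such a sieve bootstrap in R is sketched below (an assumption on our part, not the authors' code; it relies on auto.arima and on the simulate method for ARIMA models provided by the forecast package):

library(forecast)
sieve.bootstrap <- function(remainder) {
  fit <- auto.arima(remainder, max.p = 5, max.q = 5, seasonal = FALSE)   # orders of up to five
  # Simulate a new remainder series of the same length from the fitted model,
  # drawing the innovations by resampling the model residuals.
  simulate(fit, nsim = length(remainder), future = FALSE, bootstrap = TRUE)
}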

In particular, the following procedures are employed:

ETS: The original exponential smoothing method applied to the original series, selecting one model from among all possible models using the bias-corrected AIC.

Bagged.BLD.MBB.ETS: Our proposed method. Specifically, the bootstrapped time series are generated using BLD and MBB. For each of the series thus generated, a model is selected from all exponential smoothing models using the bias-corrected AIC. Then, the forecasts from each of the models are combined using the median.

Bagged.ETS.Sieve.ETS: ETS is used for decomposition and the sieve bootstrap, as presented above, is used for bootstrapping the remainder. This approach is very similar to the approach of Cordeiro and Neves (2009). The main differences are that (i) we choose an ETS model for each bootstrapped series, so that this approach accounts for model uncertainty, and (ii) we use an ARIMA process instead of an AR process for the sieve bootstrap.

Bagged.BLD.Sieve.ETS: BLD is used for decomposition, and the sieve bootstrap is used for bootstrapping the remainder.

Bagged.ETS.MBB.ETS: ETS is used for decomposition, and MBB for bootstrapping the remainder.

3.2. Evaluation methodology

We use the yearly, quarterly, and monthly series from the M3 competition. There are 645 yearly, 756 quarterly, and 1428 monthly series, so that a total of 2829 series are used. We follow the M3 methodology, meaning that we forecast six periods ahead for yearly series, eight periods ahead for quarterly series, and 18 periods ahead for monthly series. The original data, as well as the forecasts of the methods that participated in the competition, are available in the R package Mcomp (Hyndman, 2013).

Although the M3 competition took place some time ago, the original submissions to the competition are still competitive and valid benchmarks. To the best of our knowledge, the only result in the literature that reports a better performance than the original contest winners is the recent work of Kourentzes, Petropoulos, and Trapero (2014).

We use the symmetric MAPE (sMAPE) to measure the errors. The sMAPE is defined as

sMAPE = mean( 200 |y_t − ŷ_t| / (|y_t| + |ŷ_t|) ),

where y_t is the true value of the time series y at time t, and ŷ_t is the respective forecast. This definition differs slightly from that given by Makridakis and Hibon (2000), as they do not use absolute values in the denominator. However, as the series in the M3 all have strictly positive values, this difference in the definition should not have any effect in practice (except if a method produces negative forecasts).

Furthermore, we also use the mean absolute scaled error (MASE) proposed by Hyndman and Koehler (2006). It is defined as the mean absolute error on the test set, scaled by the mean absolute error of a benchmark method on the training set. The naïve forecast is used as a benchmark, taking into account the seasonality of the data. Thus, the MASE is defined as

MASE = mean(|y_t − ŷ_t|) / mean(|y_i − y_{i−m}|),

where m is the periodicity, which is 1 for yearly data, 4 for quarterly data, and 12 for monthly data. The variable i runs over the training data, and t over the test data.
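As an illustration (ours, not the paper's code), the two error measures can be computed for a single M3 series, with the training part x and the test part xx taken from the Mcomp package:

library(forecast)
library(Mcomp)
smape <- function(y, f) mean(200 * abs(y - f) / (abs(y) + abs(f)))
mase  <- function(y, f, train, m) mean(abs(y - f)) / mean(abs(diff(train, lag = m)))

series <- M3[[1910]]                                   # an arbitrary M3 series; the index is only an example
fc <- forecast(ets(series$x), h = length(series$xx))$mean
smape(series$xx, fc)
mase(series$xx, fc, series$x, m = frequency(series$x))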
We calculate the sMAPE and MASE values as averages over all horizons for each series. Then, we calculate the overall means of these measures across series, as well as ranking the forecasting methods for each series and calculating averages of the ranks across series. Calculating the average ranks has the advantage of being more robust to outliers than the overall means.

3.3. Statistical tests of the results

We use the Friedman rank-sum test for multiple comparisons in order to detect statistically significant differences within the methods, and the post-hoc procedure of Hochberg and Rom (1995) for the further analysis of these differences (García, Fernández, Luengo, & Herrera, 2010).¹ The statistical testing is done using the sMAPE measure. We begin by using the testing framework to determine whether the differences among the proposed and basic models are statistically significant. Then, in the second step, we use the testing framework to compare these models to the methods that originally participated in the M3 competition. A significance level of α = 0.05 is used.

¹ More information can be found on the thematic web site of SCI2S about statistical inference in computational intelligence and data mining: http://sci2s.ugr.es/sicidm.
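The testing setup can be sketched as follows (our illustration, assuming that smape.mat is a series-by-method matrix of sMAPE values; the pairwise comparisons against the control use the usual Friedman post-hoc z statistic, with Hochberg's step-up adjustment via p.adjust standing in for the Hochberg and Rom procedure):

# smape.mat: rows = series, columns = forecasting methods (assumed to exist)
friedman.test(smape.mat)                       # overall test of differences among the methods
ranks <- t(apply(smape.mat, 1, rank))          # rank the methods within each series
R <- colMeans(ranks)                           # average rank per method
k <- ncol(smape.mat); N <- nrow(smape.mat)
control <- names(which.min(R))                 # the best-ranked method is the control
z <- (R[names(R) != control] - R[control]) / sqrt(k * (k + 1) / (6 * N))
p.hoch <- p.adjust(2 * pnorm(-abs(z)), method = "hochberg")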
3.4. Results on the yearly data

Table 2 shows the results for all methods on the yearly data. The results are ordered by average sMAPE rank. It can be seen that the bagged versions that use BLD for decomposition perform better than the original ETS method, outperforming it consistently on all measures. The bagged versions that use ETS for decomposition perform worse than the original ETS method.

Table 3 shows the results of the first case of statistical testing, where we compare the bagged and ETS methods among themselves. The table shows the p-values adjusted by the post-hoc procedure. The Friedman test has an overall p-value of 5.11 × 10^-5, which is highly significant. The method with the best ranking, in this case Bagged.BLD.Sieve.ETS, is chosen as the control method. We can then see from the table that the differences from the methods using ETS for decomposition are significant at the chosen significance level.

Table 4 shows the results of the further statistical testing, where we compare the bagged and ETS methods with the methods from the M3 competition. The overall result of the Friedman rank sum test is a p-value of 1.59 × 10^-10, which is highly significant. We see that the ForcX method obtains the best ranking, and is used as the control method. The bagged ETS methods using BLD for decomposition are not significantly different, but ETS and the bagged versions using ETS for decomposition perform significantly worse than the control method.

Table 2
Results for the yearly series, ordered by the first column, which is the average rank of sMAPE. The other columns show the mean sMAPE, average rank of
MASE, and mean of MASE.
Rank sMAPE Mean sMAPE Rank MASE Mean MASE

ForcX 12.458 16.480 12.437 2.769


AutoBox2 12.745 16.593 12.757 2.754
RBF 12.772 16.424 12.786 2.720
Flors.Pearc1 12.883 17.205 12.884 2.938
THETA 12.994 16.974 13.016 2.806
ForecastPro 13.050 17.271 13.064 3.026
ROBUST.Trend 13.118 17.033 13.147 2.625
PP.Autocast 13.223 17.128 13.205 3.016
DAMPEN 13.283 17.360 13.256 3.032
COMB.S.H.D 13.384 17.072 13.315 2.876
Bagged.BLD.Sieve.ETS 13.504 17.797 13.523 3.189
Bagged.BLD.MBB.ETS 13.588 17.894 13.601 3.152
SMARTFCS 13.755 17.706 13.783 2.996
ETS 13.867 17.926 13.935 3.215
HOLT 14.057 20.021 14.081 3.182
WINTER 14.057 20.021 14.081 3.182
ARARMA 14.462 18.356 14.551 3.481
B.J.auto 14.481 17.726 14.467 3.165
Flors.Pearc2 14.540 17.843 14.561 3.016
Bagged.ETS.Sieve.ETS 14.715 18.206 14.771 3.173
Auto.ANN 14.837 18.565 14.811 3.058
Bagged.ETS.MBB.ETS 15.051 18.685 15.128 3.231
AutoBox3 15.098 20.877 15.093 3.177
THETAsm 15.109 17.922 15.012 3.006
AutoBox1 15.444 21.588 15.426 3.679
NAIVE2 15.733 17.880 15.638 3.172
SINGLE 15.792 17.817 15.671 3.171

Table 3
Results of statistical testing for yearly data, using the original ETS method and bagged versions of it. Adjusted p-values calculated using the Friedman test with Hochberg's post-hoc procedure are shown. A horizontal line separates the methods that perform significantly worse than the best method from those that do not. The best method is Bagged.BLD.Sieve.ETS, which performs significantly better than either Bagged.ETS.Sieve.ETS or Bagged.ETS.MBB.ETS.

Method                  pHoch
Bagged.BLD.Sieve.ETS    –
ETS                     0.438
Bagged.BLD.MBB.ETS      0.438
--------------------------------
Bagged.ETS.Sieve.ETS    0.034
Bagged.ETS.MBB.ETS      3.95 × 10^-5

Table 4
Results of statistical testing for yearly data, including our results (printed in boldface) and the original results of the M3. Adjusted p-values calculated using the Friedman test with Hochberg's post-hoc procedure are shown. A horizontal line separates the methods that perform significantly worse than the best method from those that do not.

Method                  pHoch
ForcX                   –
AutoBox2                0.516
RBF                     0.516
Flors.Pearc1            0.516
THETA                   0.516
ForecastPro             0.516
ROBUST.Trend            0.516
PP.Autocast             0.516
DAMPEN                  0.496
COMB.S.H.D              0.325
Bagged.BLD.Sieve.ETS    0.180
Bagged.BLD.MBB.ETS      0.117
--------------------------------
SMARTFCS                0.040
ETS                     0.019
WINTER                  0.004
HOLT                    0.004
ARARMA                  9.27 × 10^-5
B.J.auto                8.06 × 10^-5
Flors.Pearc2            4.44 × 10^-5
Bagged.ETS.Sieve.ETS    6.27 × 10^-6
Auto.ANN                1.47 × 10^-6
Bagged.ETS.MBB.ETS      9.33 × 10^-8
AutoBox3                5.10 × 10^-8
THETAsm                 4.63 × 10^-8
AutoBox1                3.40 × 10^-10
NAIVE2                  3.15 × 10^-12
SINGLE                  1.19 × 10^-12

3.5. Results on the quarterly data

Table 5 shows the results for all methods on the quarterly data, ordered by average sMAPE ranks. It can be seen that the proposed bagged method outperforms the original ETS method in terms of average sMAPE ranks, average MASE ranks and mean MASEs, but not in mean sMAPEs. This may indicate that the proposed method performs better in general, but that there are some individual series where it yields worse sMAPE results.

Table 6 shows the results of statistical testing considering only the ETS and bagged methods. The Friedman test for multiple comparisons results in a p-value of 1.62 × 10^-10, which is highly significant. The method with the best ranking is the proposed method, Bagged.BLD.MBB.ETS. We can see from the table that the differences from the methods using ETS for decomposition are statistically significant, but those from the original ETS method are not.

Table 7 shows the results of further statistical testing of the bagged and ETS methods against the methods from the original M3 competition. The overall result of the Friedman rank sum test is a p-value of 1.11 × 10^-10, which is highly significant. We see from the table that the THETA method performs best and is chosen as the control method. It statistically significantly outperforms all methods but COMB.S.H.D.

Table 5
Results for the quarterly series, ordered by the first column, which is the average rank of sMAPE.
Rank sMAPE Mean sMAPE Rank MASE Mean MASE

THETA 11.792 8.956 11.786 1.087


COMB.S.H.D 12.546 9.216 12.540 1.105
ROBUST.Trend 12.819 9.789 12.821 1.152
DAMPEN 13.067 9.361 13.050 1.126
ForcX 13.179 9.537 13.169 1.155
PP.Autocast 13.207 9.395 13.196 1.128
ForecastPro 13.544 9.815 13.571 1.204
B.J.auto 13.550 10.260 13.551 1.188
RBF 13.561 9.565 13.534 1.173
HOLT 13.575 10.938 13.513 1.225
Bagged.BLD.MBB.ETS 13.716 10.132 13.701 1.219
WINTER 13.723 10.840 13.665 1.217
ARARMA 13.827 10.186 13.786 1.185
AutoBox2 13.874 10.004 13.920 1.185
Flors.Pearc1 13.881 9.954 13.888 1.184
ETS 14.091 9.864 14.128 1.225
Bagged.BLD.Sieve.ETS 14.161 10.026 14.204 1.241
Auto.ANN 14.317 10.199 14.337 1.241
THETAsm 14.570 9.821 14.546 1.211
SMARTFCS 14.574 10.153 14.629 1.226
Flors.Pearc2 14.761 10.431 14.824 1.255
AutoBox3 14.823 11.192 14.763 1.272
AutoBox1 15.048 10.961 15.055 1.331
SINGLE 15.118 9.717 15.093 1.229
NAIVE2 15.296 9.951 15.290 1.238
Bagged.ETS.Sieve.ETS 15.687 10.707 15.706 1.351
Bagged.ETS.MBB.ETS 15.696 10.632 15.737 1.332

Table 6
Results of statistical testing for quarterly data, using the original ETS method and bagged versions of it. The best method is Bagged.BLD.MBB.ETS, which performs significantly better than either Bagged.ETS.Sieve.ETS or Bagged.ETS.MBB.ETS.

Method                  pHoch
Bagged.BLD.MBB.ETS      –
ETS                     0.354
Bagged.BLD.Sieve.ETS    0.147
--------------------------------
Bagged.ETS.Sieve.ETS    6.94 × 10^-7
Bagged.ETS.MBB.ETS      5.00 × 10^-8

Table 7
Results of statistical testing for quarterly data, including our results (printed in boldface) and the original results of the M3. A horizontal line separates the methods that perform significantly worse than the best method from those that do not. We see that only COMB.S.H.D is not significantly worse than the THETA method.

Method                  pHoch
THETA                   –
COMB.S.H.D              0.065
--------------------------------
ROBUST.Trend            0.024
DAMPEN                  0.005
ForcX                   0.003
PP.Autocast             0.003
B.J.auto                1.07 × 10^-4
RBF                     1.07 × 10^-4
HOLT                    1.07 × 10^-4
Bagged.BLD.MBB.ETS      2.46 × 10^-5
WINTER                  2.46 × 10^-5
ARARMA                  7.50 × 10^-6
AutoBox2                4.46 × 10^-6
Flors.Pearc1            4.37 × 10^-6
ETS                     2.68 × 10^-7
Bagged.BLD.Sieve.ETS    1.04 × 10^-7
Auto.ANN                1.06 × 10^-8
THETAsm                 1.83 × 10^-10
SMARTFCS                1.81 × 10^-10
Flors.Pearc2            7.15 × 10^-12
AutoBox3                2.40 × 10^-12
AutoBox1                3.34 × 10^-14
SINGLE                  8.57 × 10^-15
NAIVE2                  2.25 × 10^-16
Bagged.ETS.Sieve.ETS    3.61 × 10^-20
Bagged.ETS.MBB.ETS      3.02 × 10^-20

3.6. Results on the monthly data

Table 8 shows the results for all methods on the monthly data, ordered by average sMAPE rank. The bagged versions using BLD for decomposition again outperform the original ETS method. Furthermore, Bagged.BLD.MBB.ETS also consistently outperforms all of the original methods from the M3 on all measures.

Table 9 shows the results of statistical testing considering only the bagged and ETS methods. The Friedman test gives a p-value of 5.02 × 10^-10, meaning that the differences are highly significant. The method with the best ranking is Bagged.BLD.MBB.ETS, and we can see from the table that it statistically significantly outperforms both the original method and the methods using ETS for decomposition.

Table 10 shows the results of statistical testing of the bagged and ETS methods against the methods from the original M3 competition. The overall result of the Friedman rank sum test is a p-value of 2.92 × 10^-10, meaning that it is highly significant. We see from the table that the proposed method, Bagged.BLD.MBB.ETS, is the best method, and that only the THETA method, ForecastPro, and Bagged.BLD.Sieve.ETS are not significantly worse at the chosen 5% significance level.

Table 8
Results for the monthly series, ordered by the first column, which is the average rank of sMAPE.
Rank sMAPE Mean sMAPE Rank MASE Mean MASE

Bagged.BLD.MBB.ETS 11.714 13.636 11.725 0.846


THETA 11.992 13.892 11.932 0.858
ForecastPro 12.035 13.898 12.064 0.848
Bagged.BLD.Sieve.ETS 12.059 13.734 12.073 0.870
Bagged.ETS.Sieve.ETS 13.079 13.812 12.990 0.888
COMB.S.H.D 13.083 14.466 13.134 0.896
ETS 13.112 14.286 13.150 0.889
Bagged.ETS.MBB.ETS 13.180 13.873 13.116 0.870
HOLT 13.312 15.795 13.276 0.909
ForcX 13.374 14.466 13.415 0.894
WINTER 13.650 15.926 13.631 1.165
RBF 13.842 14.760 13.861 0.910
DAMPEN 14.118 14.576 14.175 0.908
AutoBox2 14.250 15.731 14.294 1.082
B.J.auto 14.278 14.796 14.290 0.914
AutoBox1 14.333 15.811 14.335 0.924
Flors.Pearc2 14.492 15.186 14.525 0.950
SMARTFCS 14.495 15.007 14.399 0.919
Auto.ANN 14.528 15.031 14.561 0.928
ARARMA 14.715 15.826 14.720 0.907
PP.Autocast 14.785 15.328 14.862 0.994
AutoBox3 14.892 16.590 14.801 0.962
Flors.Pearc1 15.213 15.986 15.211 1.008
THETAsm 15.292 15.380 15.285 0.950
ROBUST.Trend 15.446 18.931 15.353 1.039
SINGLE 15.940 15.300 16.004 0.974
NAIVE2 16.790 16.891 16.819 1.037

Table 9
Results of statistical testing for monthly data, using the original ETS method and bagged versions of it. The best method is Bagged.BLD.MBB.ETS, which performs significantly better than Bagged.ETS.Sieve.ETS, Bagged.ETS.MBB.ETS, and the original ETS method.

Method                  pHoch
Bagged.BLD.MBB.ETS      –
Bagged.BLD.Sieve.ETS    0.338
--------------------------------
Bagged.ETS.Sieve.ETS    3.70 × 10^-6
ETS                     4.32 × 10^-8
Bagged.ETS.MBB.ETS      5.17 × 10^-11

Table 10
Results of statistical testing for monthly data, including our results (printed in boldface) and the original results of the M3. Bagged.BLD.MBB.ETS performs best, and only the THETA, ForecastPro, and Bagged.BLD.Sieve.ETS methods do not perform significantly worse.

Method                  pHoch
Bagged.BLD.MBB.ETS      –
THETA                   0.349
ForecastPro             0.349
Bagged.BLD.Sieve.ETS    0.349
--------------------------------
Bagged.ETS.Sieve.ETS    1.71 × 10^-5
COMB.S.H.D              1.71 × 10^-5
ETS                     1.50 × 10^-5
Bagged.ETS.MBB.ETS      5.56 × 10^-6
HOLT                    5.93 × 10^-7
ForcX                   2.03 × 10^-7
WINTER                  7.05 × 10^-10
RBF                     8.45 × 10^-12
DAMPEN                  6.97 × 10^-15
AutoBox2                1.76 × 10^-16
B.J.auto                8.27 × 10^-17
AutoBox1                1.74 × 10^-17
Flors.Pearc2            1.37 × 10^-19
SMARTFCS                1.29 × 10^-19
Auto.ANN                4.81 × 10^-20
ARARMA                  1.02 × 10^-22
PP.Autocast             9.18 × 10^-24
AutoBox3                2.15 × 10^-25
Flors.Pearc1            1.10 × 10^-30
THETAsm                 4.70 × 10^-32
ROBUST.Trend            7.73 × 10^-35
SINGLE                  1.50 × 10^-44
NAIVE2                  4.53 × 10^-64

4. Conclusions

In this work, we have presented a novel method of bagging for exponential smoothing methods, using a Box–Cox transformation, STL decomposition, and a moving block bootstrap. The method is able to outperform the basic exponential smoothing methods consistently. These results are statistically significant in the case of the monthly series, but not for the yearly or quarterly series. This may be because the longer monthly series allow for tests with greater power, while the quarterly and yearly series are too short for the differences to be significant.

Furthermore, on the monthly data from the M3 competition, the bagged exponential smoothing method is able to outperform all methods that took part in the competition, most of them statistically significantly. Thus, this method can be recommended for routine practical application, especially for monthly data.

Acknowledgments

This work was performed while C. Bergmeir held a scholarship from the Spanish Ministry of Education (MEC) of the ‘‘Programa de Formación del Profesorado Universitario (FPU)’’ (AP2008-04637), and was visiting the Department of Econometrics and Business Statistics, Monash University, Melbourne, Australia. This work was supported by the Spanish National Research Plan Project TIN-2013-47210-P and the Andalusian Research Plan P12-TIC-2985. Furthermore, we would like to thank X. Shao for providing code for his bootstrapping procedures, and F. Petropoulos for helpful discussions.

References

Box, G. E. P., & Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society: Series B, 26(2), 211–252.
Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123–140.
Bühlmann, P. (1997). Sieve bootstrap for time series. Bernoulli, 3(2), 123–148.
Cleveland, R. B., Cleveland, W. S., McRae, J., & Terpenning, I. (1990). STL: A seasonal-trend decomposition procedure based on loess. Journal of Official Statistics, 6, 3–73.
Cleveland, W. S., Grosse, E., & Shyu, W. M. (1992). Local regression models. In Statistical models in S. Chapman & Hall/CRC (Chapter 8).
Cordeiro, C., & Neves, M. (2009). Forecasting time series with BOOT.EXPOS procedure. REVSTAT—Statistical Journal, 7(2), 135–149.
García, S., Fernández, A., Luengo, J., & Herrera, F. (2010). Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental analysis of power. Information Sciences, 180(10), 2044–2064.
Gonçalves, S., & Politis, D. (2011). Discussion: Bootstrap methods for dependent data: A review. Journal of the Korean Statistical Society, 40(4), 383–386.
Goodwin, P. (2010). The Holt–Winters approach to exponential smoothing: 50 years old and going strong. Foresight: The International Journal of Applied Forecasting, 19, 30–33.
Guerrero, V. (1993). Time-series analysis supported by power transformations. Journal of Forecasting, 12, 37–48.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: data mining, inference, and prediction. Springer.
Hochberg, Y., & Rom, D. (1995). Extensions of multiple testing procedures based on Simes' test. Journal of Statistical Planning and Inference, 48(2), 141–152.
Hyndman, R. J. (2013). Mcomp: Data from the M-competitions. URL: http://robjhyndman.com/software/mcomp/.
Hyndman, R. J. (2014). forecast: Forecasting functions for time series and linear models. R package version 5.6. URL: http://CRAN.R-project.org/package=forecast.
Hyndman, R., & Athanasopoulos, G. (2013). Forecasting: principles and practice. URL: http://otexts.com/fpp/.
Hyndman, R., & Khandakar, Y. (2008). Automatic time series forecasting: The forecast package for R. Journal of Statistical Software, 27(3), 1–22.
Hyndman, R., & Koehler, A. (2006). Another look at measures of forecast accuracy. International Journal of Forecasting, 22(4), 679–688.
Hyndman, R. J., Koehler, A. B., Ord, J. K., & Snyder, R. D. (2008). Forecasting with exponential smoothing: the state space approach. Springer. URL: http://www.exponentialsmoothing.net.
Hyndman, R., Koehler, A., Snyder, R., & Grose, S. (2002). A state space framework for automatic forecasting using exponential smoothing methods. International Journal of Forecasting, 18(3), 439–454.
Koning, A., Franses, P., Hibon, M., & Stekler, H. (2005). The M3 competition: Statistical tests of the results. International Journal of Forecasting, 21(3), 397–409.
Kourentzes, N., Barrow, D., & Crone, S. (2014a). Neural network ensemble operators for time series forecasting. Expert Systems with Applications, 41(9), 4235–4244.
Kourentzes, N., Petropoulos, F., & Trapero, J. (2014b). Improving forecasting by estimating time series structural components across multiple frequencies. International Journal of Forecasting, 30(2), 291–302.
Künsch, H. R. (1989). The jackknife and the bootstrap for general stationary observations. Annals of Statistics, 17(3), 1217–1241.
Lahiri, S. (2003). Resampling methods for dependent data. Springer.
Makridakis, S., & Hibon, M. (2000). The M3-competition: Results, conclusions and implications. International Journal of Forecasting, 16(4), 451–476.
Nelder, J., & Mead, R. (1965). A simplex method for function minimization. The Computer Journal, 7, 308–313.
Paparoditis, E., & Politis, D. (2001). Tapered block bootstrap. Biometrika, 88(4), 1105–1119.
R Core Team (2014). R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. URL: http://www.R-project.org/.
Shao, X. (2010a). The dependent wild bootstrap. Journal of the American Statistical Association, 105(489), 218–235.
Shao, X. (2010b). Extended tapered block bootstrap. Statistica Sinica, 20(2), 807–821.

Christoph Bergmeir received the M.Sc. degree in Computer Science from the University of Ulm, Germany, in 2008, and the Ph.D. degree from the University of Granada, Spain, in 2013. He is currently working at the Faculty of Information Technology, Monash University, Melbourne, Australia, as a Research Fellow in Applied Artificial Intelligence. His research interests include time series predictor evaluation, meta-learning and forecast combination, and time series complexity. He has published in journals such as IEEE Transactions on Neural Networks and Learning Systems, Journal of Statistical Software, Computer Methods and Programs in Biomedicine, and Information Sciences.

Rob J. Hyndman is Professor of Statistics in the Department of Econometrics and Business Statistics at Monash University and Director of the Monash University Business & Economic Forecasting Unit. He is also Editor-in-Chief of the International Journal of Forecasting and a Director of the International Institute of Forecasters. Rob is the author of over 100 research papers in statistical science. In 2007, he received the Moran medal from the Australian Academy of Science for his contributions to statistical research, especially in the area of statistical forecasting. For 30 years, Rob has maintained an active consulting practice, assisting hundreds of companies and organizations. His recent consulting work has involved forecasting electricity demand, tourism demand, the Australian government health budget and case volume at a US call centre.

José Manuel Benítez (M'98) received the M.S. and Ph.D. degrees in Computer Science, both from the Universidad de Granada, Spain. He is currently an Associate Professor at the Department of Computer Science and Artificial Intelligence, Universidad de Granada. He is the head of the Distributed Computational Intelligence and Time Series (DiCITS) lab. His research interests include Cloud Computing and Big Data, Data Science, Computational Intelligence and Time Series. He has published in the leading journals of the Artificial Intelligence and Computer Science field. He has led a number of research projects funded by different international and national organizations as well as research contracts with leading international corporations.
