Bagging Exponential Smoothing Methods Using STL Decomposition and Box-Cox Transformation
the median, mean, trimmed mean, or weighted mean, for example), often yielding better point predictions. In this work, we propose a bagging methodology for exponential smoothing methods, and evaluate it on the M3 data. As our input data are non-stationary time series, both serial dependence and non-stationarity have to be taken into account. We resolve these issues by applying a seasonal-trend decomposition based on loess (STL, Cleveland, Cleveland, McRae, & Terpenning, 1990) and a moving block bootstrap (MBB, see, e.g., Lahiri, 2003) to the residuals of the decomposition.

Specifically, our proposed method of bagging is as follows. After applying a Box–Cox transformation to the data, the series is decomposed into trend, seasonal and remainder components. The remainder component is then bootstrapped using the MBB, the trend and seasonal components are added back in, and the Box–Cox transformation is inverted. In this way, we generate a random pool of similar bootstrapped time series. For each of these bootstrapped time series, we choose a model from among several exponential smoothing models, using the bias-corrected AIC. Then, point forecasts are calculated using each of the different models, and the resulting forecasts are combined using the median.

The only related work that we are aware of is the study by Cordeiro and Neves (2009), who use a sieve bootstrap to perform bagging with ETS models. They use ETS to decompose the data, then fit an AR model to the residuals, and generate new residuals from this AR process. Finally, they fit the ETS model that was used for the decomposition to all of the bootstrapped series. They also test their method on the M3 dataset, and have some success for quarterly and monthly data, but overall, the results are not promising. In fact, the bagged forecasts are often not as good as the original forecasts applied to the original time series. Our bootstrapping procedure works differently, and yields better results. We use STL for the time series decomposition, MBB to bootstrap the remainder, and choose an ETS model for each bootstrapped series. Using this procedure, we are able to outperform the original M3 methods for monthly data in particular.

The rest of the paper is organized as follows. In Section 2, we discuss the proposed methodology in detail. Section 3 presents the experimental setup and the results, and Section 4 concludes the paper.

2. Methods

In this section, we provide a detailed description of the different parts of our proposed methodology, namely exponential smoothing, and the novel bootstrapping procedure involving a Box–Cox transformation, STL decomposition, and the MBB. We illustrate the steps using series M495 from the M3 dataset, which is a monthly series.

2.1. Exponential smoothing

The general idea of exponential smoothing is that recent observations are more relevant for forecasting than older observations, meaning that they should be weighted more highly. Accordingly, simple exponential smoothing, for example, uses a weighted moving average with weights that decrease exponentially.

Starting from this basic idea, exponential smoothing has been expanded to the modelling of different components of a series, such as the trend, seasonality, and remainder components, where the trend captures the long-term direction of the series, the seasonal part captures repeating components of a series with a known periodicity, and the remainder captures unpredictable components. The trend component is a combination of a level term and a growth term. For example, the Holt–Winters purely additive model (i.e., with additive trend and additive seasonality) is defined by the following recursive equations:

\ell_t = \alpha (y_t - s_{t-m}) + (1 - \alpha)(\ell_{t-1} + b_{t-1})
b_t = \beta^* (\ell_t - \ell_{t-1}) + (1 - \beta^*) b_{t-1}
s_t = \gamma (y_t - \ell_{t-1} - b_{t-1}) + (1 - \gamma) s_{t-m}
\hat{y}_{t+h|t} = \ell_t + h b_t + s_{t-m+h_m^+}

Here, ℓ_t denotes the series level at time t, b_t denotes the slope at time t, s_t denotes the seasonal component of the series at time t, and m denotes the number of seasons in a year. The constants α, β*, and γ are smoothing parameters in the [0, 1]-interval, h is the forecast horizon, and h_m^+ = ((h − 1) mod m) + 1.

There is a whole family of ETS models, which can be distinguished by the type of error, trend, and seasonality each uses. In general, the trend can be non-existent, additive, multiplicative, damped additive, or damped multiplicative. The seasonality can be non-existent, additive, or multiplicative. The error can be additive or multiplicative; however, distinguishing between these two options is only relevant for prediction intervals, not point forecasts. Thus, there are a total of 30 models with different combinations of error, trend and seasonality. The different combinations of trend and seasonality are shown in Table 1. For more detailed descriptions, we refer to Hyndman and Athanasopoulos (2013), Hyndman et al. (2008), and Hyndman et al. (2002).

In R, exponential smoothing is implemented in the ets function from the forecast package (Hyndman, 2014; Hyndman & Khandakar, 2008). The different models are fitted to the data automatically; i.e., the smoothing parameters and initial conditions are optimized using maximum likelihood with a simplex optimizer (Nelder & Mead, 1965). Then, the best model is chosen using the bias-corrected AIC. We note that, of the 30 possible models, 11 can lead to numerical instabilities, and are therefore not used by the ets function (see Hyndman & Athanasopoulos, 2013, Section 7.7, for details). Thus, ets, as it is used within our bagging procedure, chooses from among 19 different models.
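As an illustration of this automatic fitting and selection step, a minimal R sketch using the ets function is given below. The calls are standard usage of the forecast package; AirPassengers is only a stand-in series, as the experiments in this paper use the M3 data.

library(forecast)

fit <- ets(AirPassengers)     # fits the candidate ETS models and selects one
                              # using the bias-corrected AIC (AICc)
summary(fit)                  # reports the selected model and smoothing parameters
fc  <- forecast(fit, h = 18)  # point forecasts, here 18 periods ahead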
Table 1
The ETS model family, with different types of seasonality and trend.

Trend component              Seasonal component
                             N (None)    A (Additive)    M (Multiplicative)
N (None)                     N, N        N, A            N, M
A (Additive)                 A, N        A, A            A, M
Ad (Additive damped)         Ad, N       Ad, A           Ad, M
M (Multiplicative)           M, N        M, A            M, M
Md (Multiplicative damped)   Md, N       Md, A           Md, M
2.2. The Box–Cox transformation

This is a popular transformation for stabilizing the variance of a time series, and was originally proposed by Box and Cox (1964). It is defined as follows:

w_t = \begin{cases} \log(y_t), & \lambda = 0; \\ (y_t^{\lambda} - 1)/\lambda, & \lambda \neq 0. \end{cases}

Depending on the parameter λ, the transformation is essentially the identity (λ = 1), the logarithm (λ = 0), or a transformation somewhere between. One difficulty is the method of choosing the parameter λ. In this work, we restrict it to lie in the interval [0, 1], then use the method of Guerrero (1993) to choose its value in the following way. The series is divided into subseries of a length equal to the seasonality, or of length two if the series is not seasonal. Then, the sample mean m and standard deviation s are calculated for each of the subseries, and λ is chosen in such a way that the coefficient of variation of s/m^(1−λ) across the subseries is minimized.

For the example time series M495, this method gives λ = 6.61 × 10^−5. Fig. 1 shows the original series and the Box–Cox transformed version using this λ.
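A minimal sketch of this step with the forecast package follows; BoxCox.lambda supports Guerrero's method directly, and AirPassengers is again only a stand-in for series M495.

library(forecast)

lambda <- BoxCox.lambda(AirPassengers, method = "guerrero", lower = 0, upper = 1)
w <- BoxCox(AirPassengers, lambda)    # variance-stabilized series
y <- InvBoxCox(w, lambda)             # inverting the transformation recovers the data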
2.3. Time series decomposition

For non-seasonal time series, we use the loess method (Cleveland, Grosse, & Shyu, 1992), a smoothing method based on local regressions, to decompose the time series into trend and remainder components. For seasonal time series, we use STL, as presented by Cleveland et al. (1990), to obtain the trend, seasonal and remainder components.

In loess, a neighborhood is defined for each data point, and the points in that neighborhood are then weighted (using so-called neighborhood weights) according to their distances from the respective data point. Finally, a polynomial of degree d is fitted to these points. Usually, d = 1 and d = 2 are used, i.e., linear or quadratic curves are fitted. The trend component is equal to the value of the polynomial at each data point. In R, loess smoothing is available through the function loess. For the non-seasonal data in our experiments, i.e., the yearly data from the M3 competition, we use the function with a degree of d = 1. In this function, the neighborhood size is defined by a parameter α, which is the proportion of the overall points to include in the neighborhood, with tricubic weighting. To get a constant neighborhood of six data points, we define this parameter to be six divided by the length of the time series under consideration.

In STL, loess is used to divide the time series into their trend, seasonal, and remainder components. The division is additive, i.e., summing the parts gives the original series again. In detail, the steps performed during STL decomposition are: (i) detrending; (ii) cycle-subseries smoothing: series are built for each seasonal component, and smoothed separately; (iii) low-pass filtering of smoothed cycle-subseries: the subseries are put together again, and smoothed; (iv) detrending of the seasonal series; (v) deseasonalizing the original series, using the seasonal component calculated in the previous steps; and (vi) smoothing the deseasonalized series to get the trend component. In R, the STL algorithm is available through the stl function. We use it with its default parameters. The degrees for the loess fitting are d = 1 in steps (iii) and (iv), and d = 0 in step (ii). Fig. 2 shows the STL decomposition of series M495 from the M3 dataset, as an example.

Another possibility for decomposition is to use ETS modelling directly, as was proposed by Cordeiro and Neves (2009). However, the components of an ETS model are defined based on the noise terms, and evolve dynamically with the noise. Thus, "simulating" an ETS process by decoupling the level, trend and seasonal components from the noise and treating them as independent series may not work well. This is in contrast to an STL decomposition, in which the trend and seasonal components are smooth and the way in which they change over time does not depend on the noise component directly. Therefore, we can simulate the noise term independently in an STL decomposition using bootstrapping procedures.

2.4. Bootstrapping the remainder

As time series data are typically autocorrelated, adapted versions of the bootstrap exist (see Gonçalves & Politis, 2011; Lahiri, 2003). One prerequisite is the stationarity of the series, which we achieve by bootstrapping the remainder of the STL (or loess) decomposition.

In the MBB, as originally proposed by Künsch (1989), data blocks of equal size are drawn from the series until the desired series length is achieved. For a series of length n, with a block size of l, n − l + 1 (overlapping) possible blocks exist.

We use block sizes of l = 8 for yearly and quarterly data, and l = 24 for monthly data, i.e., at least two full years, to ensure that any remaining seasonality is captured. As the shortest series for the yearly data has a total of n = 14 observations, care must be taken to ensure that every value from the original series could possibly be placed anywhere in the bootstrapped series. To achieve this, we draw ⌊n/l⌋ + 2 blocks from the remainder series, then discard a random number of values, between zero and l − 1, from the beginning of the bootstrapped series. Finally, to obtain a series with the same length as the original series, we discard as many values as necessary to obtain the required length. This processing ensures that the bootstrapped series does not necessarily begin or end on a block boundary.
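A minimal sketch of this block-drawing scheme follows. The helper mbb_remainder is written for this description only and is not a function from a package.

# Moving block bootstrap of the remainder series, as described above: draw
# floor(n/l) + 2 overlapping blocks of length l, discard a random number
# (between 0 and l - 1) of values from the start, and trim to the original length n.
mbb_remainder <- function(remainder, l) {
  n <- length(remainder)
  n_blocks <- floor(n / l) + 2
  starts <- sample(1:(n - l + 1), n_blocks, replace = TRUE)
  boot <- unlist(lapply(starts, function(s) remainder[s:(s + l - 1)]))
  shift <- sample(0:(l - 1), 1)
  if (shift > 0) boot <- boot[-(1:shift)]  # random offset at the start
  boot[1:n]                                # keep exactly n values
}

# Example call for monthly data: mbb_remainder(rem, l = 24)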
Fig. 1. Series M495 of the M3 dataset, which is a monthly time series. Above is the original series, below the Box–Cox transformed version, with λ = 6.61 × 10^−5.
Fig. 2. STL decomposition into the trend, seasonal part, and remainder, of the Box–Cox transformed version of series M495 from the M3 dataset.
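A decomposition like the one shown in Fig. 2 can be produced along the following lines (a sketch only; AirPassengers stands in for series M495, and s.window = "periodic" is an assumption, since stl requires the seasonal window to be specified).

library(forecast)

lambda <- BoxCox.lambda(AirPassengers, method = "guerrero", lower = 0, upper = 1)
dec <- stl(BoxCox(AirPassengers, lambda), s.window = "periodic")
plot(dec)   # trend, seasonal and remainder panels, analogous to Fig. 2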
There are various other methods in the literature for bootstrapping time series, such as the tapered block bootstrap (Paparoditis & Politis, 2001), the dependent wild bootstrap (DWB, Shao, 2010a), and the extended tapered block bootstrap (Shao, 2010b). However, Shao (2010a) concludes that, "for regularly spaced time series, the DWB is not as widely applicable as the MBB, and the DWB lacks the higher order accuracy property of the MBB". Thus, "the DWB is a complement to, but not a competitor of, existing block-based bootstrap methods". We performed preliminary experiments (which are not reported here) using the tapered block bootstrap and the DWB, but use only the MBB in this paper, as the other procedures did not provide substantial advantages.

Another type of bootstrap is the sieve bootstrap, which was proposed by Bühlmann (1997) and used by Cordeiro and Neves (2009) in an approach similar to ours.
Fig. 3. Bootstrapped versions (blue) of the original series M495 (black). Five bootstrapped series are shown. It can be seen that the bootstrapped series resemble the behavior of the original series quite well. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Here, the dependence in the data is tackled by fitting a model and then bootstrapping the residuals, assuming that they are uncorrelated. This bootstrapping procedure has the disadvantage that one must assume that the model captures all of the relevant information in the time series. The MBB has the advantage that it makes no modelling assumptions other than stationarity, whereas the sieve bootstrap assumes that the fitted model captures all of the serial correlation in the data.

After bootstrapping the remainder, the trend and seasonality are combined with the bootstrapped remainder, and the Box–Cox transformation is inverted, to get the final bootstrapped sample. Fig. 3 gives an illustration of bootstrapped versions of the example series M495.

2.5. The overall procedure

To summarize, the bootstrapping procedure is given in Algorithm 1. Initially, the value of λ ∈ [0, 1] is calculated according to Guerrero (1993). Then, the Box–Cox transformation is applied to the series, and the series is decomposed into the trend, seasonal part, and remainder, using STL or loess. The remainder is then bootstrapped using the MBB, the components are added together again, and the Box–Cox transformation is inverted.

As was stated in Section 2.1, we use the ets function from the forecast package (Hyndman, 2014; Hyndman & Khandakar, 2008). The function fits all possible ETS models to the data, then chooses the best model using the bias-corrected AIC. By applying the entire ETS fitting and model selection procedure to each bootstrapped time series independently, we address the issues of data uncertainty, parameter uncertainty, and model selection uncertainty.

For each horizon, the final resulting forecast is calculated from the forecasts from the single models. We performed preliminary experiments using the mean, trimmed mean, and median. However, we restrict our analysis in this study to the median, as it achieves good results and is less sensitive to outliers than the mean, for example, and we also take into account the results of Kourentzes, Barrow, and Crone (2014a).
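Before turning to the experiments, a compact end-to-end sketch of the procedure just described is given below. It assumes a seasonal series y (for non-seasonal series, loess would be used instead of STL), the mbb_remainder helper sketched in Section 2.4, and s.window = "periodic" for stl; the function name and the number of bootstrap replicates are illustrative, not prescribed by the method.

library(forecast)

bagged_ets_forecast <- function(y, h, n_boot = 100, l = 24) {
  lambda <- BoxCox.lambda(y, method = "guerrero", lower = 0, upper = 1)
  w      <- BoxCox(y, lambda)
  dec    <- stl(w, s.window = "periodic")
  trend  <- dec$time.series[, "trend"]
  seas   <- dec$time.series[, "seasonal"]
  rem    <- as.numeric(dec$time.series[, "remainder"])

  fcs <- replicate(n_boot, {
    boot_w <- trend + seas + mbb_remainder(rem, l)   # bootstrapped transformed series
    boot_y <- InvBoxCox(boot_w, lambda)              # back to the original scale
    forecast(ets(boot_y), h = h)$mean                # ETS model chosen by AICc
  })
  apply(fcs, 1, median)                              # combine forecasts per horizon
}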
3. Experimental study

In this section, we describe the forecasting methods, error measures, and statistical tests that were used in the experiments, together with the results obtained for the M3 dataset, separately for yearly, quarterly, and monthly data.
ARIMA model. In particular, the following procedures are employed:

ETS: The original exponential smoothing method applied to the original series, selecting one model from among all possible models using the bias-corrected AIC.

Bagged.BLD.MBB.ETS: Our proposed method. Specifically, the bootstrapped time series are generated using BLD and MBB. For each of the series thus generated, a model is selected from all exponential smoothing models using the bias-corrected AIC. Then, the forecasts from each of the models are combined using the median.

Bagged.ETS.Sieve.ETS: ETS is used for decomposition, and the sieve bootstrap, as presented above, is used for bootstrapping the remainder. This approach is very similar to the approach of Cordeiro and Neves (2009). The main differences are that (i) we choose an ETS model for each bootstrapped series, so that this approach accounts for model uncertainty, and (ii) we use an ARIMA process instead of an AR process for the sieve bootstrap.

Bagged.BLD.Sieve.ETS: BLD is used for decomposition, and the sieve bootstrap is used for bootstrapping the remainder.

Bagged.ETS.MBB.ETS: ETS is used for decomposition, and MBB for bootstrapping the remainder.

3.2. Evaluation methodology

We use the yearly, quarterly, and monthly series from the M3 competition. There are 645 yearly, 756 quarterly, and 1428 monthly series, so that a total of 2829 series are used. We follow the M3 methodology, meaning that we forecast six periods ahead for yearly series, eight periods ahead for quarterly series, and 18 periods ahead for monthly series. The original data, as well as the forecasts of the methods that participated in the competition, are available in the R package Mcomp (Hyndman, 2013).

Although the M3 competition took place some time ago, the original submissions to the competition are still competitive and valid benchmarks. To the best of our knowledge, the only result in the literature that reports a better performance than the original contest winners is the recent work of Kourentzes, Petropoulos, and Trapero (2014b).

We use the symmetric MAPE (sMAPE) to measure the errors. The sMAPE is defined as

\text{sMAPE} = \text{mean}\left( 200 \, \frac{|y_t - \hat{y}_t|}{|y_t| + |\hat{y}_t|} \right),

where y_t is the true value of the time series y at time t, and ŷ_t is the respective forecast. This definition differs slightly from that given by Makridakis and Hibon (2000), as they do not use absolute values in the denominator. However, as the series in the M3 all have strictly positive values, this difference in the definition should not have any effect in practice (except if a method produces negative forecasts).

Furthermore, we also use the mean absolute scaled error (MASE) proposed by Hyndman and Koehler (2006). It is defined as the mean absolute error on the test set, scaled by the mean absolute error of a benchmark method on the training set. The naïve forecast is used as a benchmark, taking into account the seasonality of the data. Thus, the MASE is defined as

\text{MASE} = \frac{\text{mean}(|y_t - \hat{y}_t|)}{\text{mean}(|y_i - y_{i-m}|)},

where m is the periodicity, which is 1 for yearly data, 4 for quarterly data, and 12 for monthly data. The variable i runs over the training data, and t over the test data.

We calculate the sMAPE and MASE values as averages over all horizons for each series. Then, we calculate the overall means of these measures across series, as well as ranking the forecasting methods for each series and calculating averages of the ranks across series. Calculating the average ranks has the advantage of being more robust to outliers than the overall means.

3.3. Statistical tests of the results

We use the Friedman rank-sum test for multiple comparisons in order to detect statistically significant differences within the methods, and the post-hoc procedure of Hochberg and Rom (1995) for the further analysis of these differences (García, Fernández, Luengo, & Herrera, 2010).[1] The statistical testing is done using the sMAPE measure.

[1] More information can be found on the thematic web site of SCI2S about Statistical inference in computational intelligence and data mining: http://sci2s.ugr.es/sicidm.

We begin by using the testing framework to determine whether the differences among the proposed and basic models are statistically significant. Then, in the second step, we use the testing framework to compare these models to the methods that originally participated in the M3 competition. A significance level of α = 0.05 is used.

3.4. Results on the yearly data

Table 2 shows the results for all methods on the yearly data. The results are ordered by average sMAPE rank. It can be seen that the bagged versions that use BLD for decomposition perform better than the original ETS method, outperforming it consistently on all measures. The bagged versions that use ETS for decomposition perform worse than the original ETS method.

Table 3 shows the results of the first case of statistical testing, where we compare the bagged and ETS methods among themselves. The table shows the p-values adjusted by the post-hoc procedure. The Friedman test has an overall p-value of 5.11 × 10^−5, which is highly significant. The method with the best ranking, in this case Bagged.BLD.Sieve.ETS, is chosen as the control method. We can then see from the table that the differences from the methods using ETS for decomposition are significant at the chosen significance level.

Table 4 shows the results of the further statistical testing, where we compare the bagged and ETS methods with the methods from the M3 competition.
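The two error measures can be computed directly from their definitions in Section 3.2; a small sketch follows (y_test, y_hat, y_train and m are illustrative argument names, not taken from a package).

smape <- function(y_test, y_hat) {
  mean(200 * abs(y_test - y_hat) / (abs(y_test) + abs(y_hat)))
}

mase <- function(y_test, y_hat, y_train, m) {
  scale <- mean(abs(diff(y_train, lag = m)))  # in-sample MAE of the seasonal naive method
  mean(abs(y_test - y_hat)) / scale
}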
Table 2
Results for the yearly series, ordered by the first column, which is the average rank of sMAPE. The other columns show the mean sMAPE, average rank of MASE, and mean of MASE.

Rank sMAPE    Mean sMAPE    Rank MASE    Mean MASE

Table 3
Results of statistical testing for yearly data, using the original ETS method and bagged versions of it. Adjusted p-values calculated using the Friedman test with Hochberg's post-hoc procedure are shown. A horizontal line separates the methods that perform significantly worse than the best method from those that do not. The best method is Bagged.BLD.Sieve.ETS, which performs significantly better than either Bagged.ETS.Sieve.ETS or Bagged.ETS.MBB.ETS.

Method                  pHoch
Bagged.BLD.Sieve.ETS    –
ETS                     0.438
Bagged.BLD.MBB.ETS      0.438
-----------------------------------
Bagged.ETS.Sieve.ETS    0.034
Bagged.ETS.MBB.ETS      3.95 × 10^−5

Table 4
Results of statistical testing for yearly data, including our results (printed in boldface) and the original results of the M3. Adjusted p-values calculated using the Friedman test with Hochberg's post-hoc procedure are shown. A horizontal line separates the methods that perform significantly worse than the best method from those that do not.

Method                  pHoch
ForcX                   –
AutoBox2                0.516
RBF                     0.516
Flors.Pearc1            0.516
THETA                   0.516
ForecastPro             0.516
ROBUST.Trend            0.516
PP.Autocast             0.516
DAMPEN                  0.496
COMB.S.H.D              0.325
Bagged.BLD.Sieve.ETS    0.180
Bagged.BLD.MBB.ETS      0.117
-----------------------------------
SMARTFCS                0.040
ETS                     0.019
WINTER                  0.004
HOLT                    0.004
ARARMA                  9.27 × 10^−5
B.J.auto                8.06 × 10^−5
Flors.Pearc2            4.44 × 10^−5
Bagged.ETS.Sieve.ETS    6.27 × 10^−6
Auto.ANN                1.47 × 10^−6
Bagged.ETS.MBB.ETS      9.33 × 10^−8
AutoBox3                5.10 × 10^−8
THETAsm                 4.63 × 10^−8
AutoBox1                3.40 × 10^−10
NAIVE2                  3.15 × 10^−12
SINGLE                  1.19 × 10^−12

The overall result of the Friedman rank sum test is a p-value of 1.59 × 10^−10, which is highly significant. We see that the ForcX method obtains the best ranking, and is used as the control method. The bagged ETS methods using BLD for decomposition are not significantly different, but ETS and the bagged versions using ETS for decomposition perform significantly worse than the control method.

3.5. Results on the quarterly data

Table 5 shows the results for all methods on the quarterly data, ordered by average sMAPE ranks. It can be seen that the proposed bagged method outperforms the original ETS method in terms of average sMAPE ranks, average MASE ranks, and mean MASEs, but not in terms of mean sMAPEs. This may indicate that the proposed method performs better in general, but that there are some individual series where it yields worse sMAPE results.

Table 6 shows the results of statistical testing considering only the ETS and bagged methods. The Friedman test for multiple comparisons results in a p-value of 1.62 × 10^−10, which is highly significant. The method with the
Table 5
Results for the quarterly series, ordered by the first column, which is the average rank of sMAPE.

Rank sMAPE    Mean sMAPE    Rank MASE    Mean MASE

Table 6
Results of statistical testing for quarterly data, using the original ETS method and bagged versions of it. The best method is Bagged.BLD.MBB.ETS, which performs significantly better than either Bagged.ETS.Sieve.ETS or Bagged.ETS.MBB.ETS.

Method                  pHoch
Bagged.BLD.MBB.ETS      –
ETS                     0.354
Bagged.BLD.Sieve.ETS    0.147

Table 7
Results of statistical testing for quarterly data, including our results (printed in boldface) and the original results of the M3. A horizontal line separates the methods that perform significantly worse than the best method from those that do not. Only COMB.S.H.D does not perform significantly worse than the THETA method.

Method                  pHoch
THETA                   –
COMB.S.H.D              0.065

Table 8
Results for the monthly series, ordered by the first column, which is the average rank of sMAPE.

Rank sMAPE    Mean sMAPE    Rank MASE    Mean MASE

Table 9
Results of statistical testing for monthly data, using the original ETS method and bagged versions of it. The best method is Bagged.BLD.MBB.ETS, which performs significantly better than Bagged.ETS.Sieve.ETS, Bagged.ETS.MBB.ETS, and the original ETS method.

Method                  pHoch
Bagged.BLD.MBB.ETS      –
Bagged.BLD.Sieve.ETS    0.338
-----------------------------------
Bagged.ETS.Sieve.ETS    3.70 × 10^−6
ETS                     4.32 × 10^−8
Bagged.ETS.MBB.ETS      5.17 × 10^−11

Table 10
Results of statistical testing for monthly data, including our results (printed in boldface) and the original results of the M3. Bagged.BLD.MBB.ETS performs best, and only the THETA, ForecastPro, and Bagged.BLD.Sieve.ETS methods do not perform significantly worse.

Method                  pHoch
Bagged.BLD.MBB.ETS      –
THETA                   0.349
ForecastPro             0.349
Bagged.BLD.Sieve.ETS    0.349
-----------------------------------
Bagged.ETS.Sieve.ETS    1.71 × 10^−5
COMB.S.H.D              1.71 × 10^−5
ETS                     1.50 × 10^−5
Bagged.ETS.MBB.ETS      5.56 × 10^−6
HOLT                    5.93 × 10^−7
ForcX                   2.03 × 10^−7
WINTER                  7.05 × 10^−10
RBF                     8.45 × 10^−12
DAMPEN                  6.97 × 10^−15
AutoBox2                1.76 × 10^−16
B.J.auto                8.27 × 10^−17
AutoBox1                1.74 × 10^−17
Flors.Pearc2            1.37 × 10^−19
SMARTFCS                1.29 × 10^−19
Auto.ANN                4.81 × 10^−20
ARARMA                  1.02 × 10^−22
PP.Autocast             9.18 × 10^−24
AutoBox3                2.15 × 10^−25
Flors.Pearc1            1.10 × 10^−30
THETAsm                 4.70 × 10^−32
ROBUST.Trend            7.73 × 10^−35
SINGLE                  1.50 × 10^−44
NAIVE2                  4.53 × 10^−64

3.6. Results on the monthly data

For the monthly data, it can be seen from Table 9 that the proposed method statistically significantly outperforms both the original method and the methods using ETS for decomposition.

Table 10 shows the results of statistical testing of the bagged and ETS methods against the methods from the original M3 competition. The overall result of the Friedman rank sum test is a p-value of 2.92 × 10^−10, meaning that it is highly significant. We see from the table that the proposed method, Bagged.BLD.MBB.ETS, is the best method, and that only the THETA method, ForecastPro, and Bagged.BLD.Sieve.ETS are not significantly worse at the chosen 5% significance level.

4. Conclusions

In this work, we have presented a novel method of bagging for exponential smoothing methods, using a Box–Cox transformation, STL decomposition, and a moving block bootstrap. The method is able to outperform the basic exponential smoothing methods consistently. These results are statistically significant in the case of the monthly series, but not for the yearly or quarterly series. This may be because the longer monthly series allow for tests with greater power, while the quarterly and yearly series are too short for the differences to be significant.
Furthermore, on the monthly data from the M3 competition, the bagged exponential smoothing method is able to outperform all methods that took part in the competition, most of them statistically significantly. Thus, this method can be recommended for routine practical application, especially for monthly data.

Acknowledgments

This work was performed while C. Bergmeir held a scholarship from the Spanish Ministry of Education (MEC) of the "Programa de Formación del Profesorado Universitario (FPU)" (AP2008-04637), and was visiting the Department of Econometrics and Business Statistics, Monash University, Melbourne, Australia. This work was supported by the Spanish National Research Plan Project TIN-2013-47210-P and the Andalusian Research Plan P12-TIC-2985. Furthermore, we would like to thank X. Shao for providing code for his bootstrapping procedures, and F. Petropoulos for helpful discussions.

References

Box, G. E. P., & Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society: Series B, 26(2), 211–252.
Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123–140.
Bühlmann, P. (1997). Sieve bootstrap for time series. Bernoulli, 3(2), 123–148.
Cleveland, R. B., Cleveland, W. S., McRae, J., & Terpenning, I. (1990). STL: A seasonal-trend decomposition procedure based on loess. Journal of Official Statistics, 6, 3–73.
Cleveland, W. S., Grosse, E., & Shyu, W. M. (1992). Local regression models. In Statistical models in S. Chapman & Hall/CRC (Chapter 8).
Cordeiro, C., & Neves, M. (2009). Forecasting time series with BOOT.EXPOS procedure. REVSTAT–Statistical Journal, 7(2), 135–149.
García, S., Fernández, A., Luengo, J., & Herrera, F. (2010). Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental analysis of power. Information Sciences, 180(10), 2044–2064.
Gonçalves, S., & Politis, D. (2011). Discussion: Bootstrap methods for dependent data: A review. Journal of the Korean Statistical Society, 40(4), 383–386.
Goodwin, P. (2010). The Holt–Winters approach to exponential smoothing: 50 years old and going strong. Foresight: The International Journal of Applied Forecasting, 19, 30–33.
Guerrero, V. (1993). Time-series analysis supported by power transformations. Journal of Forecasting, 12, 37–48.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: data mining, inference, and prediction. Springer.
Hochberg, Y., & Rom, D. (1995). Extensions of multiple testing procedures based on Simes' test. Journal of Statistical Planning and Inference, 48(2), 141–152.
Hyndman, R. J. (2013). Mcomp: Data from the M-competitions. URL: http://robjhyndman.com/software/mcomp/.
Hyndman, R. J. (2014). forecast: Forecasting functions for time series and linear models. R package version 5.6. URL: http://CRAN.R-project.org/package=forecast.
Hyndman, R., & Athanasopoulos, G. (2013). Forecasting: principles and practice. URL: http://otexts.com/fpp/.
Hyndman, R., & Khandakar, Y. (2008). Automatic time series forecasting: The forecast package for R. Journal of Statistical Software, 27(3), 1–22.
Hyndman, R., & Koehler, A. (2006). Another look at measures of forecast accuracy. International Journal of Forecasting, 22(4), 679–688.
Hyndman, R. J., Koehler, A. B., Ord, J. K., & Snyder, R. D. (2008). Forecasting with exponential smoothing: the state space approach. Springer. URL: http://www.exponentialsmoothing.net.
Hyndman, R., Koehler, A., Snyder, R., & Grose, S. (2002). A state space framework for automatic forecasting using exponential smoothing methods. International Journal of Forecasting, 18(3), 439–454.
Koning, A., Franses, P., Hibon, M., & Stekler, H. (2005). The M3 competition: Statistical tests of the results. International Journal of Forecasting, 21(3), 397–409.
Kourentzes, N., Barrow, D., & Crone, S. (2014a). Neural network ensemble operators for time series forecasting. Expert Systems with Applications, 41(9), 4235–4244.
Kourentzes, N., Petropoulos, F., & Trapero, J. (2014b). Improving forecasting by estimating time series structural components across multiple frequencies. International Journal of Forecasting, 30(2), 291–302.
Künsch, H. R. (1989). The jackknife and the bootstrap for general stationary observations. Annals of Statistics, 17(3), 1217–1241.
Lahiri, S. (2003). Resampling methods for dependent data. Springer.
Makridakis, S., & Hibon, M. (2000). The M3-competition: Results, conclusions and implications. International Journal of Forecasting, 16(4), 451–476.
Nelder, J., & Mead, R. (1965). A simplex method for function minimization. The Computer Journal, 7, 308–313.
Paparoditis, E., & Politis, D. (2001). Tapered block bootstrap. Biometrika, 88(4), 1105–1119.
R Core Team (2014). R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. URL: http://www.R-project.org/.
Shao, X. (2010a). The dependent wild bootstrap. Journal of the American Statistical Association, 105(489), 218–235.
Shao, X. (2010b). Extended tapered block bootstrap. Statistica Sinica, 20(2), 807–821.

Christoph Bergmeir received the M.Sc. degree in Computer Science from the University of Ulm, Germany, in 2008, and the Ph.D. degree from the University of Granada, Spain, in 2013. He is currently working at the Faculty of Information Technology, Monash University, Melbourne, Australia, as a Research Fellow in Applied Artificial Intelligence. His research interests include time series predictor evaluation, meta-learning and forecast combination, and time series complexity. He has published in journals such as IEEE Transactions on Neural Networks and Learning Systems, Journal of Statistical Software, Computer Methods and Programs in Biomedicine, and Information Sciences.

Rob J. Hyndman is Professor of Statistics in the Department of Econometrics and Business Statistics at Monash University and Director of the Monash University Business & Economic Forecasting Unit. He is also Editor-in-Chief of the International Journal of Forecasting and a Director of the International Institute of Forecasters. Rob is the author of over 100 research papers in statistical science. In 2007, he received the Moran medal from the Australian Academy of Science for his contributions to statistical research, especially in the area of statistical forecasting. For 30 years, Rob has maintained an active consulting practice, assisting hundreds of companies and organizations. His recent consulting work has involved forecasting electricity demand, tourism demand, the Australian government health budget and case volume at a US call centre.

José Manuel Benítez (M'98) received the M.S. and Ph.D. degrees in Computer Science both from the Universidad de Granada, Spain. He is currently an Associate Professor at the Department of Computer Science and Artificial Intelligence, Universidad de Granada. He is the head of the Distributed Computational Intelligence and Time Series (DiCITS) lab. His research interests include Cloud Computing and Big Data, Data Science, Computational Intelligence and Time Series. He has published in the leading journals of the "Artificial Intelligence" and Computer Science field. He has led a number of research projects funded by different international and national organizations as well as research contracts with leading international corporations.