Forecasting Time Series With Missing Data Using Holt's Model

J.D. Bermúdez, A. Corberán-Vallet, E. Vercher

Journal of Statistical Planning and Inference 139 (2009) 2791–2799. doi:10.1016/j.jspi.2009.01.004

Article history: Received 4 May 2007; received in revised form 29 December 2008; accepted 1 January 2009; available online 14 January 2009.

Keywords: Forecasting; Exponential smoothing; Linear model; EM algorithm; Data transformation

Abstract: This paper deals with the prediction of time series with missing data using an alternative formulation for Holt's model with additive errors. This formulation simplifies both the calculation of the maximum likelihood estimates of all the unknowns in the model and the computation of point forecasts. In the presence of missing data, the EM algorithm is used to obtain maximum likelihood estimates and point forecasts. Based on this application we propose a leave-one-out algorithm for the data transformation selection problem, which allows us to analyse Holt's model with multiplicative errors. Some numerical results show the performance of these procedures for obtaining robust forecasts.
1. Introduction
Most time series models assume that the observations are sampled with the same frequency, but it is common to find time
series with missing data. In order to carry out a precise analysis of these time series and obtain reliable forecasts, it is necessary
to deal effectively with the missing data.
The missing data problem has been dealt with successfully using the state-space methodology. Jones (1980) obtained the
maximum likelihood estimates of the parameters of an ARMA model in the presence of missing data using the Kalman filter. Kohn
and Ansley (1986) proposed a modified Kalman filter to generalise those previous results to the case of ARIMA models. Gómez
and Maravall (1994) showed a new definition of the likelihood of an ARIMA model with missing observations that permits the
use of the ordinary Kalman filter. Recently, Gómez et al. (1999) proposed filling in the holes in the missing data with arbitrary
values and carrying out the maximum likelihood estimation with additive outliers.
Analogously, Wright (1986) suggested an extension for simple exponential smoothing and Holt's method in the case of missing
data. Cipra et al. (1995) extended the previous approach to the Holt–Winters method: the level, trend and seasonal terms are
updated each time a new observation becomes available, using modified transition equations which take into account the possible
presence of missing data between two consecutive observations. Cipra and Romera (1997) proposed new transition equations for
the Holt–Winters method in the presence of missing data using a robust version of the Kalman filter based on the M-estimation
methodology: if the observation at one time point is missing, the estimated level, trend and seasonality remain unchanged. The
problem of missing data in time series prediction has also been dealt with using neural networks (Hofmann and Tresp, 1998) and
Monte Carlo methods (Chen and Liu, 1998).
Since exponential smoothing methods are widely used for short-term prediction in business and industry (Gardner, 2006),
in this paper, we present a new approach to the prediction of time series with missing data based on an alternative formulation
for Holt's model with additive errors. In this new formulation the stochastic component of the model is introduced by means
of additive, independent, homoscedastic and normal errors. Then the data vector is multivariate normal and its mean and
covariance matrix are functions of the model parameters. Hence Holt's model can be formulated as a heteroscedastic linear model whose coefficients are given by the initial conditions and whose covariance matrix depends on the smoothing parameters. This
formulation allows us to obtain the maximum likelihood estimates of all the unknowns, the smoothing parameters and the initial
values of level and trend jointly, as in some other proposals (Harvey, 1989; Ord et al., 1997; Segura and Vercher, 2001; Bermúdez
et al., 2006a,b; Bermúdez et al., 2007, 2008). In the presence of missing data, the EM algorithm (Dempster et al., 1977) is used to estimate the model parameters and to compute point forecasts.
The paper is organised as follows. In Section 2 we define the multivariate linear model formulation of Holt's model and give the formulae for the calculation of the maximum likelihood estimators and point forecasts. In Section 3 we apply the EM algorithm to the estimation of the model parameters and the computation of point forecasts when missing data are present, and show the performance of the procedure with some numerical results. In Section 4 we develop a new data transformation selection method based on the leave-one-out technique and present the results corresponding to the prediction of the yearly time series of the M3 Competition when the proposed data transformation selection mechanism is used. The last section contains some concluding remarks.
2. Holt's model with additive errors

We assume that {y_t}, t = 1, ..., n, are the observed data. Holt's model with additive errors assumes that the observation at time t comes from the random variable

    Y_t = μ_t + ε_t                                                    (1)

where μ_t = a_{t−1} + b_{t−1}, a_t and b_t are the level and trend at time t, respectively, and {ε_t} are independent homoscedastic normal random variables, N(0, σ²). When a new observation becomes available, the level and trend terms are updated through the transition equations
    a_t = μ_t + α ε_t                                                  (2)
    b_t = b_{t−1} + α β ε_t                                            (3)
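To make the data-generating mechanism concrete, the following R sketch simulates a series from Eqs. (1)–(3); it is an illustration written for this text (the function name and argument names are ours), not code from the original study.

    simulate_holt <- function(n, alpha, beta, a0, b0, sigma) {
      a <- a0; b <- b0
      y <- numeric(n)
      for (t in 1:n) {
        mu  <- a + b                  # one-step-ahead mean, Eq. (1)
        eps <- rnorm(1, 0, sigma)     # additive normal error
        y[t] <- mu + eps
        a <- mu + alpha * eps         # level update, Eq. (2)
        b <- b + alpha * beta * eps   # trend update, Eq. (3)
      }
      y
    }

For example, simulate_holt(60, 0.3, 0.1, 100, 2, 5) produces a series of length 60 whose underlying trend has slope close to 2.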
Let Y = (y_1, y_2, ..., y_n)′ be the data vector, θ = (α, β)′ the vector of smoothing parameters and δ = (a_0, b_0)′ the vector of initial values. Given the initial values and applying the transition equations recursively, the h-steps-ahead prediction is usually obtained as ŷ_{n+h} = a_n + h b_n for h ≥ 1. In practice, both the θ and δ vectors are unknown and have to be estimated from the data. Applying (1), (2) and (3) recursively, the data can be written (Bermúdez et al., 2007) as
    Y_1 = a_0 + b_0 + ε_1
    ...
    Y_t = a_0 + t b_0 + Σ_{r=1}^{t−1} α(1 + β(t − r)) ε_r + ε_t

or, in matrix form,

    Y = Aδ + Lε                                                        (4)
where A is the n × 2 matrix whose first column is the vector (1, 1, ..., 1)′ and whose second column is the vector (1, 2, ..., n)′; L is the n × n lower triangular matrix whose elements in the main diagonal are equal to 1 and l_{i,j} = α(1 + β(i − j)) if i > j; and ε = (ε_1, ε_2, ..., ε_n)′ is the error vector. Therefore, the data vector follows a multivariate normal distribution with mean E(Y) = Aδ and covariance matrix V(Y) = σ² LL′. This covariance matrix depends on the smoothing parameters but is always positive-definite because it is symmetric and its determinant is always positive: |V(Y)| = σ^{2n} |L|² = σ^{2n} > 0, since |L| = 1. We assume that the value of each smoothing parameter lies in the interval [0, 1], although this restriction is not necessary.
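To fix ideas, for n = 3 the matrix L implied by the definition above is

    L = [ 1            0           0
          α(1 + β)     1           0
          α(1 + 2β)    α(1 + β)    1 ]

so each error ε_r propagates to every later observation through the level and trend updates.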
From Eq. (4), the log-likelihood function of the data vector Y is given by the logarithm of the multivariate normal density function, which, up to an additive constant, is

    −(n/2) ln σ² − (1/(2σ²)) (Y − Aδ)′ (LL′)^{−1} (Y − Aδ)             (5)

The quadratic form in (5), (Y − Aδ)′(LL′)^{−1}(Y − Aδ), can be decomposed as (δ̃ − δ)′X′X(δ̃ − δ) + (L^{−1}Y)′(I − P_X)(L^{−1}Y), where X is the matrix L^{−1}A, P_X = X(X′X)^{−1}X′ is the orthogonal projection matrix onto the vector space generated by the columns of X, and δ̃ = (X′X)^{−1}X′L^{−1}Y is the least-squares estimator of δ when θ is known. So the log-likelihood function can be
expressed as
    −(n/2) ln σ² − (1/(2σ²)) (δ̃ − δ)′ X′X (δ̃ − δ) − (1/(2σ²)) (L^{−1}Y)′ (I − P_X) (L^{−1}Y)      (6)
The first quadratic form that appears in (6) can always be annulled, whatever the value of θ is, while the second quadratic form involves only the smoothing vector θ. The maximum likelihood estimator of θ, denoted by θ̂, can therefore be obtained by minimising

    (L^{−1}Y)′ (I − P_X) (L^{−1}Y)                                     (7)

Once θ̂ has been obtained, let L̂ be the matrix L computed at θ̂ and X̂ = L̂^{−1}A. The maximum likelihood estimator of δ is

    δ̂ = (X̂′X̂)^{−1} X̂′ L̂^{−1} Y                                        (8)

and the maximum likelihood estimator of σ² is

    σ̂² = (1/n) (L̂^{−1}Y)′ (I − P_X̂) L̂^{−1}Y                           (9)
Using this approach we only have to solve one optimisation problem, (7), with respect to θ; once its maximum likelihood estimator has been calculated, the estimators of δ and σ² are obtained analytically using (8) and (9).
Notice that to solve the optimisation problem (7) we need nonlinear optimisation procedures that handle box constraints, since we assume that the values of the smoothing parameters lie in the interval [0, 1]. To do that we use the `L-BFGS-B' method (Byrd et al., 1995) in the R language. It is important to use a multi-start strategy because the routines used do not guarantee that the global optimum is reached when the objective function is not convex. On the other hand, the computation of the matrix P_X can cause numerical problems, which can be avoided by using the singular value decomposition of the matrix X. So if X = UDV′ is the singular value decomposition of X, then P_X = UU′ and the matrix (X′X)^{−1}X′ needed to obtain the maximum likelihood estimator of δ is equal to VD^{−1}U′. We work with the R software (R Development Core Team, 2008), in which the singular value decomposition of a matrix and the optimisation problem (7) are easily carried out using standard commands.
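The following R sketch illustrates this estimation scheme (a minimal illustration written for this text, not the authors' code; function names such as build_L and fit_holt are ours): the profile objective is expression (7), minimised over the smoothing parameters with the `L-BFGS-B' routine and a small multi-start, and (8)–(9) then give the remaining maximum likelihood estimates via the singular value decomposition.

    build_L <- function(theta, n) {
      alpha <- theta[1]; beta <- theta[2]
      L <- diag(n)
      for (i in 2:n)
        for (j in 1:(i - 1))
          L[i, j] <- alpha * (1 + beta * (i - j))   # l_ij = alpha(1 + beta(i - j)), i > j
      L
    }

    profile_obj <- function(theta, y) {             # expression (7)
      n <- length(y); A <- cbind(1, 1:n)
      L <- build_L(theta, n)
      z <- forwardsolve(L, y)                       # L^{-1} Y
      X <- forwardsolve(L, A)                       # X = L^{-1} A
      U <- svd(X)$u                                 # P_X = U U'
      r <- z - U %*% crossprod(U, z)                # (I - P_X) L^{-1} Y
      sum(r^2)
    }

    fit_holt <- function(y, starts = list(c(0.5, 0.1), c(0.9, 0.5), c(0.1, 0.9))) {
      n <- length(y); A <- cbind(1, 1:n)
      runs <- lapply(starts, function(s)            # multi-start with box constraints
        optim(s, profile_obj, y = y, method = "L-BFGS-B",
              lower = c(0, 0), upper = c(1, 1)))
      best  <- runs[[which.min(sapply(runs, `[[`, "value"))]]
      theta <- best$par
      Lhat  <- build_L(theta, n)
      s     <- svd(forwardsolve(Lhat, A))           # X = U D V'
      delta <- drop(s$v %*% (crossprod(s$u, forwardsolve(Lhat, y)) / s$d))  # Eq. (8)
      list(theta = theta, delta = delta, sigma2 = best$value / n)           # Eq. (9)
    }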
Once the parameters of the model have been estimated, forecasts of future values can be obtained as follows. Let Y_1 be the n × 1 data vector and Y_2 the h × 1 vector of future values. If we assume that the joint (n + h) × 1 vector Y_e = (Y_1′, Y_2′)′ satisfies Eq. (4), we have

    Y_e = ( Y_1 )  =  ( A_1 ) δ  +  ( L_1    0  ) ( ε_1 )
          ( Y_2 )     ( A_2 )      ( L_21  L_2 ) ( ε_2 )

where the vector ε and the matrices A and L have been partitioned in a similar way to the vector Y_e. Therefore (see for example Seber, 1984, p. 19), the conditional distribution of Y_2 given Y_1 is multivariate normal with mean

    E(Y_2 | Y_1) = A_2 δ + L_21 L_1^{−1} (Y_1 − A_1 δ)                 (10)

which, evaluated at the maximum likelihood estimates, provides the point forecasts.
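A point-forecast routine following Eq. (10) could then look as follows (again a sketch built on the previous one; forecast_holt is our name):

    forecast_holt <- function(y, h) {
      fit <- fit_holt(y)
      n   <- length(y)
      Ae  <- cbind(1, 1:(n + h))                    # extended matrix A for (Y1', Y2')'
      Le  <- build_L(fit$theta, n + h)              # extended lower-triangular matrix L
      me  <- drop(Ae %*% fit$delta)                 # E(Ye) = Ae delta
      r   <- forwardsolve(Le[1:n, 1:n], y - me[1:n])              # L1^{-1}(Y1 - A1 delta)
      drop(me[(n + 1):(n + h)] + Le[(n + 1):(n + h), 1:n, drop = FALSE] %*% r)
    }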
3. Forecasting with missing data: the EM algorithm

The missing data problem has usually been dealt with using the Kalman filter (Kohn and Ansley, 1986; Gómez and Maravall,
1994; Cipra and Romera, 1997). In this section, we present a new approach to the prediction of time series with missing data
based on the formulation for Holt's model proposed in the previous section and using the EM algorithm for the estimation of all
the unknowns in the model: smoothing parameters and starting values.
The EM algorithm is an iterative optimisation method for finding the maximum likelihood estimates of the parameters of
a probabilistic model. It is appropriate when the given data are incomplete or have missing values, or when optimising the
likelihood function is analytically intractable but can be simplified by assuming the existence of unobserved latent variables.
Suppose that Ψ^(i) denotes the current value of the vector of all unknown parameters, Ψ, after i iterations of the algorithm. The next iteration can be described in two steps, the expectation and maximisation steps, which give the algorithm its name:
E-step (expectation step). Obtain the expected value of the complete data log-likelihood with respect to the unknown data
given the observed data and the current parameter estimates.
M-step (maximisation step). Choose (i+1) to be a value which maximises the above expectation.
In our case, let Y_obs = (y_1, ..., y_{m_1−1}, y_{m_1+1}, ..., y_{m_f−1}, y_{m_f+1}, ..., y_n)′ be the observed data vector, where the observations at times m_1, ..., m_f are missing, and let Y_com = (y_1, y_2, ..., y_{m_1−1}, y_{m_1}, y_{m_1+1}, ..., y_n)′ be the complete data vector. Let us assume that the vector Y_com is the outcome of Holt's model with additive errors, so Y_com = Aδ + Lε.
As with any iterative algorithm, the EM algorithm needs a starting point, that is, Ψ^(0) = (θ^(0), δ^(0), (σ²)^(0)). To obtain the starting point we propose analysing the observed data in the time series as if there were no missing data between two consecutive observations, and calculating the maximum likelihood estimates of the model parameters using Eqs. (7)–(9). The output of the first iteration, Ψ^(1), is a new estimate of the model parameters used to begin the next iteration, and so on until the stopping rule is satisfied.
Let Ψ^(i) = (θ^(i), δ^(i), (σ²)^(i)) be the estimate obtained in the i-th iteration. The next iteration is given by the following two steps:
E-step. The log-likelihood function for the complete data is given by Eq. (5), and its expectation is

    Q(θ, δ, σ² | Ψ^(i)) = −(n/2) ln σ² − (1/(2σ²)) E[(Y_com − Aδ)′ (LL′)^{−1} (Y_com − Aδ)]

where E stands for the expectation operator with respect to the distribution of Y_com given Y_obs and Ψ = Ψ^(i). After some simple algebra, the expected quadratic form becomes

    E[(Y_com − Aδ)′ (LL′)^{−1} (Y_com − Aδ)] = 1′[(LL′)^{−1} ∘ V(Y_com)]1 + (E(Y_com) − Aδ)′ (LL′)^{−1} (E(Y_com) − Aδ)

where ∘ denotes the Hadamard (element-wise) product and 1 the vector of ones, so that

    Q(θ, δ, σ² | Ψ^(i)) = −(n/2) ln σ² − (1/(2σ²)) 1′[(LL′)^{−1} ∘ V(Y_com)]1 − (1/(2σ²)) (E(Y_com) − Aδ)′ (LL′)^{−1} (E(Y_com) − Aδ)      (11)
In order to compute E(Y_com) and V(Y_com), let Y_R be the complete data vector reordered so that the missing data {y_{m_1}, ..., y_{m_f}} are in the last positions, Y_R = (y_1, ..., y_{m_1−1}, y_{m_1+1}, ..., y_n, y_{m_1}, ..., y_{m_f})′, and let A_R and L_R be the matrices A and L reordered in a similar way. Let Y_{R.1} and Y_{R.2} be the subvectors of Y_R corresponding to the observed and missing data, respectively. Partitioning the matrices A_R and L_R in a similar way, Eq. (4) becomes

    Y_R = ( Y_{R.1} )  =  ( A_{R.1} ) δ  +  ( L_{R.11}  L_{R.12} ) ( ε_{R.1} )
          ( Y_{R.2} )     ( A_{R.2} )      ( L_{R.21}  L_{R.22} ) ( ε_{R.2} )
As the joint distribution of Y_R is multivariate normal, the conditional distribution of Y_{R.2} given Y_{R.1} is also multivariate normal, with mean and covariance matrix given by (see for example Seber, 1984, p. 19)

    E(Y_{R.2} | Y_{R.1}) = A_{R.2} δ + Σ_{21} Σ_{11}^{−1} (Y_{R.1} − A_{R.1} δ)
    V(Y_{R.2} | Y_{R.1}) = Σ_{22} − Σ_{21} Σ_{11}^{−1} Σ_{12}

where

    V(Y_R) = ( Σ_{11}  Σ_{12} )  =  σ² ( L_{R.11} L_{R.11}′ + L_{R.12} L_{R.12}′    L_{R.11} L_{R.21}′ + L_{R.12} L_{R.22}′ )
             ( Σ_{21}  Σ_{22} )        ( L_{R.21} L_{R.11}′ + L_{R.22} L_{R.12}′    L_{R.21} L_{R.21}′ + L_{R.22} L_{R.22}′ )
Therefore, E(Y_com) = (y_1, ..., y_{m_j−1}, E(y_{m_j} | Y_obs, Ψ^(i)), y_{m_j+1}, ..., y_n)′, where E(y_{m_j} | Y_obs, Ψ^(i)) is the j-th component of the vector E(Y_{R.2} | Y_{R.1}) computed at Ψ = Ψ^(i). In a similar way, the component (m_l, m_j) of the matrix V(Y_com) is equal to the (l, j) component of the matrix V(Y_{R.2} | Y_{R.1}) computed at Ψ = Ψ^(i), for all l, j = 1, ..., f, all other components being zero, which implies that 1′[(LL′)^{−1} ∘ V(Y_com)]1 = 1′[((LL′)^{−1})_{R.22} ∘ V(Y_{R.2} | Y_{R.1})]1, where ((LL′)^{−1})_{R.22} denotes the submatrix of the reordered matrix (LL′)^{−1} corresponding to the missing positions.
Moreover, the quadratic form in the last term of Eq. (11) can be decomposed as the quadratic form in (5) was, with X = L^{−1}A and P_X its orthogonal projection matrix and now δ̃ = (X′X)^{−1}X′L^{−1}E(Y_com), so

    Q(Ψ | Ψ^(i)) = −(n/2) ln σ² − (1/(2σ²)) 1′[((LL′)^{−1})_{R.22} ∘ V(Y_{R.2} | Y_{R.1})]1
                   − (1/(2σ²)) (δ̃ − δ)′ X′X (δ̃ − δ) − (1/(2σ²)) (L^{−1}E(Y_com))′ (I − P_X) (L^{−1}E(Y_com))      (12)
M-step. Maximise the function Q(Ψ | Ψ^(i)) obtained in the previous E-step over Ψ = (θ, δ, σ²)′.
The quadratic form corresponding to δ in Eq. (12) can always be annulled, so the problem reduces to minimising, with respect to θ, the expression

    1′[((LL′)^{−1})_{R.22} ∘ V(Y_{R.2} | Y_{R.1})]1 + (L^{−1}E(Y_com))′ (I − P_X) (L^{−1}E(Y_com))      (13)
Once θ^(i+1) is obtained, let L̂ be the matrix L computed at θ^(i+1) and X̂ = L̂^{−1}A. The new estimate of δ, δ^(i+1), is then given by

    δ^(i+1) = (X̂′X̂)^{−1} X̂′ L̂^{−1} E(Y_com)                           (14)

and the new estimate of σ² is, as before, the mean of the squared fitting errors,

    (σ²)^(i+1) = (1/n) 1′[((L̂L̂′)^{−1})_{R.22} ∘ V̂(Y_{R.2} | Y_{R.1})]1 + (1/n) (L̂^{−1}E(Y_com))′ (I − P_X̂) L̂^{−1}E(Y_com)      (15)
The algorithm ends when the stopping rule is reached. We propose a stopping rule based on a relative comparison of the mean squared fitting errors in two consecutive iterations: the algorithm stops when |(σ²)^(i+1) − (σ²)^(i)| / (σ²)^(i) is smaller than a prefixed small tolerance.
Notice the similarity between the optimisation problem (13) and Eqs. (14) and (15) and their counterparts in Section 2 when no missing data are present, namely expression (7) and Eqs. (8) and (9). The main difference is that a new term appears in problem (13), containing the conditional covariance matrix of the missing data.
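A schematic R implementation of these iterations, continuing the sketches above (it reuses build_L), is given next; it is our own illustration rather than the authors' code, and the starting values (a regression fit on the observed points and θ = (0.5, 0.1)) deliberately simplify the initialisation described earlier. Missing observations are marked with NA in the input vector.

    em_holt <- function(y, tol = 1e-6, max_iter = 100) {
      n <- length(y); A <- cbind(1, 1:n)
      mis <- which(is.na(y)); obs <- which(!is.na(y))
      theta  <- c(0.5, 0.1)                                   # simplified starting values
      delta  <- coef(lm(y[obs] ~ 0 + A[obs, ]))
      sigma2 <- var(y[obs])
      for (it in seq_len(max_iter)) {
        ## E-step: conditional mean and variance of the missing data
        L <- build_L(theta, n); S <- sigma2 * tcrossprod(L)   # V(Ycom) = sigma^2 LL'
        m <- drop(A %*% delta)
        B <- S[mis, obs, drop = FALSE] %*% solve(S[obs, obs])
        Ey <- y
        Ey[mis] <- m[mis] + drop(B %*% (y[obs] - m[obs]))                      # E(Ycom)
        Vmis <- S[mis, mis, drop = FALSE] - B %*% S[obs, mis, drop = FALSE]    # V(YR.2 | YR.1)
        ## M-step: minimise expression (13) over theta
        obj13 <- function(th) {
          L    <- build_L(th, n)
          Linv <- forwardsolve(L, diag(n))
          Minv <- crossprod(Linv)                             # (LL')^{-1}
          U <- svd(Linv %*% A)$u
          z <- Linv %*% Ey; r <- z - U %*% crossprod(U, z)
          sum(Minv[mis, mis] * Vmis) + sum(r^2)               # expression (13)
        }
        theta <- optim(theta, obj13, method = "L-BFGS-B",
                       lower = c(0, 0), upper = c(1, 1))$par
        ## Updates (14) and (15)
        Lhat  <- build_L(theta, n)
        s     <- svd(forwardsolve(Lhat, A))
        delta <- drop(s$v %*% (crossprod(s$u, forwardsolve(Lhat, Ey)) / s$d))  # Eq. (14)
        sigma2_new <- obj13(theta) / n                                         # Eq. (15)
        if (abs(sigma2_new - sigma2) / sigma2 < tol) { sigma2 <- sigma2_new; break }
        sigma2 <- sigma2_new
      }
      list(theta = theta, delta = delta, sigma2 = sigma2, fitted = Ey)
    }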
Let Ll(Ψ) be the log-likelihood function of the observed data Y_obs. The purpose of the EM algorithm is to maximise Ll(Ψ) over Ω, the parametric space, but by an iterative procedure in which the function maximised at each iteration is Q(Ψ | Ψ^(i)), where Ψ^(i) is a fixed known constant that changes from one iteration to the next. It is well known (Dempster et al., 1977) that for a bounded sequence Ll(Ψ_p) from the EM algorithm, Ll(Ψ_p) converges monotonically to some value Ll*. It is not evident whether Ll* is a global maximum of Ll(Ψ) over Ω. Wu (1983) proves some convergence results for EM sequences that are of interest in our application. Specifically, Theorem 2 of Wu (1983) states that all the limit points of any instance {Ψ_p} of an EM algorithm are stationary points
of Ll(Ψ), and that Ll(Ψ_p) converges monotonically to Ll* = Ll(Ψ*) for some stationary point Ψ*, if the following conditions are satisfied: (i) Q(Ψ | Φ) is continuous in both Ψ and Φ; (ii) Ll(Ψ) is continuous in Ω and differentiable in the interior of Ω; and (iii) the set {Ψ ∈ Ω : Ll(Ψ) ≥ Ll(Ψ^(0))} is compact for any starting value Ψ^(0) with Ll(Ψ^(0)) > −∞.
In our proposal, all those conditions are easily verified if the smoothing parameter vector θ is assumed to be known: matrix L is then a known constant matrix and the model is equivalent to a linear homoscedastic normal model. In the general case, matrix L is a function of the smoothing parameters (recall that L is the n × n lower triangular matrix whose elements in the main diagonal are equal to 1 and l_{i,j} = α(1 + β(i − j)) if i > j), and each component of L is continuous and differentiable with respect to θ = (α, β)′. The components of L^{−1} are bivariate polynomial functions of (α, β); therefore, they are also continuous and differentiable, and likewise for the components of X = L^{−1}A and P_X. Hence the function Q(Ψ | Ψ^(i)), expression (12), is continuous and differentiable. Using similar reasoning, the same properties can be verified for the function Ll(Ψ), which is the log-likelihood of a multivariate normal with covariance matrix given by a submatrix of σ²LL′. Finally, compactness follows from the continuity of Ll(Ψ) and the bound restrictions on the smoothing parameters (α, β), which are restricted to the unit square.
If the log-likelihood Ll(Ψ) is unimodal in Ω with Ψ* being the only stationary point, then for any EM sequence {Ψ_p}, Ψ_p converges to the unique maximiser Ψ* of Ll(Ψ). In general, however, the log-likelihood Ll(Ψ) could have several maxima and stationary points, and in such a case the convergence of the EM sequence to either type of point depends on the choice of starting points. A multi-start strategy, as is usual for any other general optimisation algorithm, is also advisable for the EM algorithm.
Fig. 1. Time plot together with the predictions obtained for the time series number 160 of the `other' series of the M3 Competition.
Table 1
Prediction errors for eight steps ahead for time series number 160, with and without missing values.
Table 2
Average SMAPE for the 174 `other' series (columns give the forecasting horizons 1–6 and 8, and the averages over horizons 1–4, 1–6 and 1–8).

Method           1    2    3    4    5    6    8    1–4   1–6   1–8
Holt             1.9  2.9  3.9  4.7  5.8  5.6  7.2  3.32  4.13  4.81
Dampen Holt      1.8  2.7  3.9  4.7  5.8  5.4  6.6  3.28  4.06  4.61
Complete data    1.9  2.4  2.9  3.4  3.9  4.1  4.9  2.68  3.13  3.51
Missing data     1.9  2.4  2.9  3.4  3.9  4.1  4.9  2.67  3.12  3.51
With the aim of checking the performance of our approach to the prediction of time series with missing data, we consider
the `other' series from the M3 Competition (Makridakis et al., 2000). The M3 Competition, the last of the M-Competitions, was
an empirical study which compared the performance of 24 forecasting methods proposed by experts. The competition was
composed of 3003 real time series, mostly economic and business ones, classified as yearly series (645 series), quarterly series
(756 series), monthly series (1428 series) and `other' series (174 series). The series are all strictly positive and their length is quite
short. Specifically, the median length for the 645 yearly series is 19 observations, with a minimum of 14 observations. Those
values are 44 and 16 observations for quarterly series, 115 and 48 observations for monthly series and 63 and 60 observations
for `other' series. The series are available on the web page of the International Institute of Forecasters.
First we present the results obtained in the prediction of time series number 160 of the `other' series. The length of the time series is 63 observations and we suppose that the observations at times 48, 53, 59 and 61 are missing. We consider values
from the end of the time series as missing observations since those values are the most significant ones in the prediction of
future values. Fig. 1 shows the time plot of this time series together with the point forecasts for the next eight steps. The points
emphasised correspond to the four missing observations.
The results in Table 1 give the prediction errors obtained using the missing-values approach presented above, together with the prediction errors obtained when the complete time series is considered and the results from Section 2 are applied. Using the complete data series the prediction fit is slightly better than when missing data are present, but the difference is smaller than might be expected.
Table 2 shows the average prediction error for the 174 `other' time series of the M3 Competition when we assume that the time series have missing data and we apply the EM procedure introduced here. We treat as missing the values at positions n − 15, n − 10, n − 4 and n − 2, where n is the length of the time series. We compare these results with those obtained using the complete data series, from both the approach introduced in Section 2 and the exponential smoothing methods that took part in the M3 Competition: Holt (an automatic Holt linear exponential smoothing method) and Dampen Holt (a damped-trend exponential smoothing method).
To measure the fitting and forecast accuracy we use the symmetric mean absolute percentage error, SMAPE, defined as

    SMAPE = (200/k) Σ_{t=n−k+1}^{n} |y_t − ŷ_t| / (y_t + ŷ_t)
where yt is the observation at time t and ŷt is its forecast. We choose the SMAPE because it is scale independent, symmetric and
bounded: it fluctuates between −200% and 200%. It is also the measure of accuracy used in the M3 Competition.
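As a direct transcription of this definition, a small R helper (ours) is

    smape <- function(y, yhat) {
      200 * mean(abs(y - yhat) / (y + yhat))   # symmetric MAPE, in percent
    }

applied to the k values being evaluated.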
Table 2 shows that the Holt and Dampen Holt methods, whose results are reported in Makridakis and Hibon (2000), have worse forecast accuracy than the methods introduced in this paper. This is mainly because we use a complete maximum likelihood estimation procedure: both the initial conditions and the smoothing parameters of the model are treated as decision variables when maximising the likelihood function.
4. Data transformation selection

The Holt model (4) assumes that the errors are homoscedastic, but on some occasions it is more suitable to assume that the error variance depends on the level, that is, V(ε*_t) = (μ_t σ)², so that the variance increases with the level. In those cases, the observation at time t comes from the random variable

    Y_t = μ_t + ε*_t = μ_t + μ_t (ε*_t / μ_t) = μ_t (1 + ε*_t / μ_t) = μ_t (1 + ε_t)      (16)

with ε_t = ε*_t / μ_t being an error with variance σ², for t = 1, 2, ..., n. This model is known as Holt's model with multiplicative errors.
Taking logarithms in the previous equation, we obtain

    ln Y_t = ln μ_t + ln(1 + ε_t) = ln μ_t + e_t

where e_t = ln(1 + ε_t) is a random variable that can be assumed to follow a normal distribution with zero mean.
Holt's model with additive errors is then adequate, not for the raw data, but when we work with the logarithm of the given data. It may, therefore, be suitable to forecast using the transformed time series and then obtain the forecasts for the original time series by applying the inverse transformation. Other transformations, not only the logarithmic one, could also be of interest.
When it is advisable to work with data transformations it is necessary to have a mechanism which allows us to decide what the
best transformation is. Until now, the mechanism that has generally been used is to select the transformation with the minimum
fitting error from the time series data. This choice, however, guarantees the best fitting but not necessarily the best prediction.
Alternatively, a cross-validation study could be of interest.
The usual way of doing cross-validation is to select a proper subset of data as the training set, using it to estimate the
parameters of the model, and cross-validate the results using the remaining data. When the data set only contains a few cases,
a different method of cross-validation is necessary. The `leave-one-out' method leaves out only one data point from the training
set, uses it to cross-validate, and repeats the procedure several times with a different validation data point each time.
Most of the time series to be forecast in applications, especially industrial applications, are very short, so the training set has to be almost the complete time series. For this reason, we propose a leave-one-out procedure to select the data transformation in time series analysis. The training set of such a procedure consists of a time series with one missing value, and the EM algorithm proposed in the previous section can be applied to analyse it. An empirical study using the yearly data series
from the M3 Competition shows the performance of our transformation selection procedure. All those 645 yearly time series are
very short, their median length being 19 observations.
Let n be the length of the time series and let k be an integer such that 0 < k ≤ n. Our data transformation selection procedure can be described as follows. For each transformed time series, each of the last k observations is in turn treated as missing, the EM algorithm is used to predict it, and the prediction error is accumulated on the original scale; the transformation with the smallest accumulated error is selected (a schematic implementation is given below).
Table 3
Average SMAPE for the 645 yearly series.

Forecasting horizon:  1   2   3   4   5   6    Average:  1–4   1–6
The value of k is usually set equal to n in leave-one-out procedures, although smaller values are also common in order to speed up the procedure. In this application, as the first data in a time series are not very informative for forecasting, we propose to use only the most recent observations in our leave-one-out procedure.
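The following R sketch summarises our reading of the procedure, building on the em_holt sketch of Section 3; the candidate transformations are the four considered below (raw data, logarithm, square root and square), while the use of the squared error on the original scale and the function names are our assumptions where the text does not fix them.

    select_transformation <- function(y, k = 10) {
      n <- length(y)
      trans <- list(
        identity = list(f = identity,           inv = identity),
        log      = list(f = log,                inv = exp),
        sqrt     = list(f = sqrt,               inv = function(z) z^2),
        square   = list(f = function(z) z^2,    inv = sqrt)
      )
      cv_error <- sapply(trans, function(tr) {
        z <- tr$f(y)
        errs <- sapply((n - k + 1):n, function(t) {
          z_mis <- z; z_mis[t] <- NA              # leave observation t out
          fit <- em_holt(z_mis)                   # EM algorithm of Section 3
          (tr$inv(fit$fitted[t]) - y[t])^2        # error on the original scale
        })
        mean(errs)
      })
      names(which.min(cv_error))                  # transformation with smallest error
    }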
The row `DT Holt' in Table 3 shows the forecasting accuracy obtained by our data transformation selection mechanism in the
prediction of the 645 yearly time series in the M3 Competition. We consider the raw data, the logarithm transformation, the
square root and the square of the data and set k = 10, although we have obtained very similar results with values of k from 8
to 14, the length of the shortest yearly series in the M3 Competition. Our results are compared with those obtained from the
methods with best forecasting accuracy in the M3 Competition: RBF, ForecastX, Autobox2, Theta and Robust-Trend. Table 3 also
shows the results from the Dampen Holt method, the best exponential smoothing method in the M3 Competition.
RBF is a rule-based forecasting procedure that uses three methods (random walk, linear regression and Holt's) to estimate
level and trend, involving corrections, simplification, automatic feature identification and re-calibration. ForecastX and Autobox2
are commercially available forecasting packages. ForecastX runs tests for seasonality and outliers and selects from among sev-
eral methods: exponential smoothing, Box–Jenkins and Croston's method. Autobox2 is a robust ARIMA univariate Box–Jenkins
with/without intervention detection. The Theta method is based on a specific decomposition technique, projection and combi-
nation of the individual components. It has been proved that the method is a special case of single exponential smoothing with
drift where the drift parameter is half the slope of the linear trend fitted to the data. Robust-Trend is a non-parametric version
of Holt's linear model with a median-based estimate of trend. It is worth pointing out here that all the competitors in the M3
Competition had complete freedom to manipulate the data looking for transformations, outliers, etc.
Note that, except for the one-year forecasting horizon, the average of the SMAPE prediction errors that we obtain is smaller
than that obtained by the other methods. Therefore, the data transformation selection procedure proposed in this paper, which
is easy to implement, obtains better results than the usual methods.
5. Concluding remarks
With our formulation for Holt's model with additive errors, we can obtain the maximum likelihood estimators of the smoothing
parameters and the initial conditions jointly. We only need to solve one optimisation problem with respect to the smoothing
parameters, and then the maximum likelihood estimators of the other parameters are found analytically. Point forecasts are then
easily computed by plugging the maximum likelihood estimates of the parameters into the expected-value forecasts. This formulation also allows us to solve the problem of missing data, using the EM algorithm to obtain the maximum likelihood estimates of the unknowns in the model while taking into account the uncertainty caused by the missing data.
The numerical results obtained in the prediction of the yearly time series in the M3 Competition show that the algorithm
proposed in this paper performs well. Except for the one-year forecasting horizon, the average SMAPE prediction error obtained
with our algorithm is smaller than the average given by the other methods. The data transformation selection procedure obtains better results than the usual methods, is easy to implement and has a moderate computational cost; therefore, its use is recommended.
Acknowledgements
We would like to acknowledge Grant no. MTM2008-03993 from the Ministerio de Ciencia e Innovación of Spain. Ana Corberán-
Vallet's research was supported by the Generalitat Valenciana, Grant CTBPRB/2005/006. We are also indebted to an anonymous
referee for all their helpful comments, which have improved our paper.
References
Bermúdez, J.D., Segura, J.V., Vercher, E., 2006a. Improving demand forecasting accuracy using non-linear programming software. Journal of the Operational
Research Society 57, 94–100.
Bermúdez, J.D., Segura, J.V., Vercher, E., 2006b. A decision support system methodology for forecasting of time series based on soft computing. Computational
Statistics and Data Analysis 51, 177–191.
Bermúdez, J.D., Segura, J.V., Vercher, E., 2007. Holt–Winters forecasting: an alternative formulation applied to UK air passenger data. Journal of Applied Statistics
34, 1075–1090.
Bermúdez, J.D., Segura, J.V., Vercher, E., 2008. SIOPRED: a prediction and optimisation integrated system for demand. TOP 16, 258–271.
Byrd, R.H., Lu, P., Nocedal, J., Zhu, C., 1995. A limited memory algorithm for bound constrained optimization. SIAM Journal on Scientific Computing 16,
1190–1208.
Chen, R., Liu, J.S., 1998. Sequential Monte Carlo methods for dynamic systems. Journal of the American Statistical Association 93, 1032–1044.
Cipra, T., Rubio, A., Trujillo, J., 1995. Holt–Winters method with missing observations. Management Science 41, 174–178.
Cipra, T., Romera, R., 1997. Kalman filter with outliers and missing observations. Test 6, 379–395.
Dempster, A.P., Laird, N.M., Rubin, D.B., 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39, 1–38.
Gardner Jr., E.S., 2006. Exponential smoothing: the state of the art, Part II. International Journal of Forecasting 22, 637–666.
Gómez, V., Maravall, A., 1994. Estimation, prediction, and interpolation for nonstationary series with the Kalman filter. Journal of the American Statistical
Association 89, 611–624.
Gómez, V., Maravall, A., Peña, D., 1999. Missing observations in ARIMA models: skipping approach versus additive outlier approach. Journal of Econometrics 88,
341–363.
Harvey, A.C., 1989. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press, Cambridge.
Hofmann, R., Tresp, V., 1998. Nonlinear time-series prediction with missing and noisy data. Neural Computation 10, 731–747.
Jones, R.H., 1980. Maximum likelihood fitting of ARMA models to time series with missing observations. Technometrics 22, 389–395.
Kohn, R., Ansley, C.F., 1986. Estimation, prediction, and interpolation for ARIMA models with missing data. Journal of the American Statistical Association 81,
751–761.
Makridakis, S., Hibon, M., 2000. The M3-competition: results, conclusions and implications. International Journal of Forecasting 16, 451–476.
Makridakis, S., Ord, K., Hibon, M., 2000. The M3-competition. International Journal of Forecasting 16, 433–436.
Ord, J.K., Koehler, A.B., Snyder, R.D., 1997. Estimation and prediction for a class of dynamic nonlinear statistical models. Journal of the American Statistical
Association 92, 1621–1629.
R Development Core Team, 2008. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
Seber, G.A.F., 1984. Multivariate Observations. Wiley, New York.
Segura, J.V., Vercher, E., 2001. A spreadsheet modeling approach to the Holt–Winters optimal forecasting. European Journal of Operational Research 131,
375–388.
Wright, D.J., 1986. Forecasting data published at irregular time intervals using extension of Holt's method. Management Science 32, 499–510.
Wu, C.F.J., 1983. On the convergence properties of the EM algorithm. The Annals of Statistics 11, 95–103.