
R. Boné, M. Crucianu, Multi-step-ahead prediction with neural networks: a review. Publication of the RFAI team, 9th international meeting « Approches Connexionnistes en Sciences Économiques et en Gestion », 21-22 November 2002, Boulogne-sur-Mer, France, pp. 97-106.

Multi-step-ahead Prediction with Neural Networks: a Review

R. Boné, M. Crucianu

Laboratoire d'Informatique, Université de Tours


64 avenue Jean Portalis, 37200 Tours, FRANCE
[email protected], [email protected]

Abstract
We review existing approaches to using neural networks for solving multi-step-ahead
prediction problems. A few experiments allow us to further explore the relationship
between the ability to learn longer-range dependencies and performance in multi-step-
ahead prediction. We eventually focus on characteristics of various multi-step-ahead
prediction problems that encourage us to prefer one method over another.

Key words: time series prediction, neural networks, multi-step-ahead prediction, long-range dependencies, comparison of methods

1 Introduction
While reliable multi-step-ahead (MS) time series prediction has many important
applications and is often the intended outcome, published literature usually considers
single-step-ahead (SS) prediction. The main reason for this is the increased difficulty of the
problems requiring MS prediction and the fact that the results obtained by simple
extensions of techniques developed for SS prediction are often disappointing. Moreover, while many different techniques perform rather similarly on SS prediction problems, significant differences show up when extensions of these techniques are employed on MS problems.
The purpose of this short review of existing work concerning the use of neural networks for
MS prediction is to investigate the relationship between modeling approaches and
prediction problems.
In the following section, we present the main existing approaches to using neural networks for coping with MS prediction problems. Section 3 provides some additional
experimental results regarding an expected relation between the ability to learn longer-
range dependencies and performance in MS prediction. The discussion in section 4 is an
attempt to identify characteristics of various MS prediction problems that encourage us to
prefer one method over another.


2 Modeling approaches
Before taking a closer look at the existing approaches we must introduce some notation.
Consider x(t), for 0 ≤ t ≤ TD, the time series data one can employ for building a model. In most cases, the available data actually consists of samples of x(t) obtained with a time interval of τ. In multi-step-ahead prediction, given {x(t), x(t − τ), …, x(t − nτ), …}, one is looking for a good estimate x̂(t + hτ) of x(t + hτ), h being the number of steps ahead.
The most common approach in dealing with a prediction problem can be traced back to
[Yule, 1927] and consists in using a fixed number M of past values (a fixed-length time
window sliding over the time series) when building the prediction:
x(t) = [x(t), x(t − τ), …, x(t − (M − 1)τ)]   (1)
x̂(t + τ) = f(x(t))   (2)
Most of the current work on single-step-ahead prediction relies on a result in [Takens,
1980] showing that under several assumptions (among which the absence of noise) it is
possible to obtain a perfect estimate of x(t + τ ) according to (2) if M ≥ 2d + 1 , where d is
the dimension of the stationary attractor generating the time series. In this approach, all the
memory of the past is preserved in the sliding time window.
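As a concrete illustration of equations (1) and (2), the short Python sketch below builds the fixed-length window inputs and their targets (the function name and the use of the sampling period as the time unit are our illustrative choices, not part of the cited works):

```python
import numpy as np

def make_window_dataset(series, M, h=1):
    """Build (input, target) pairs for time-window prediction (equation (1)):
    each input row is [x(t), x(t-1), ..., x(t-M+1)] (sampling period taken as 1)
    and the target is x(t+h), the value h steps ahead."""
    X, y = [], []
    for t in range(M - 1, len(series) - h):
        X.append(series[t - M + 1:t + 1][::-1])   # most recent value first
        y.append(series[t + h])
    return np.array(X), np.array(y)

# Example: windows of length M = 3 over a toy series, one step ahead (h = 1)
X, y = make_window_dataset(np.arange(10.0), M=3, h=1)
```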
An alternative solution is to introduce a memory of the past in the model itself and only
keep a time window of small length (usually M = 1 ). Time series prediction with recurrent
neural networks usually corresponds to such a solution. Memory of the past is maintained
in the internal state of the model, s(t), which evolves according to (for M = 1):
s(t + τ) = g(s(t), x(t))   (3)
The estimate of x(t + τ) is provided by
x̂(t + τ) = h(s(t))   (4)
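A minimal sketch of this state-space formulation, with an arbitrary tanh layer standing in for g and a linear readout standing in for h (dimensions and weights are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_state = 4                                            # dimension of the internal state s(t)
W_s = rng.normal(scale=0.5, size=(n_state, n_state))   # state-to-state weights
w_in = rng.normal(scale=0.5, size=n_state)             # input weights
w_out = rng.normal(scale=0.5, size=n_state)            # readout weights

def g(s, x):
    """One possible state update for equation (3): a tanh recurrent layer."""
    return np.tanh(W_s @ s + w_in * x)

def h(s):
    """One possible readout for equation (4): a linear output neuron."""
    return float(w_out @ s)

s = np.zeros(n_state)                 # initial state
for x_t in [0.1, 0.3, -0.2, 0.5]:     # toy input sequence (M = 1 value per step)
    s = g(s, x_t)                     # the memory of the past lives in s
    x_hat_next = h(s)                 # estimate of the next value
```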

2.1 Prediction in general: global versus local approaches


Regarding time series prediction in general, a first distinction should be made between
local and global modeling approaches.
Following equations (1) and (2), existing local approaches employ a time window of adequate length and build a piecewise approximation for the function f. To build the model, a vector quantization technique performs a segmentation of the set of x vectors encountered in the learning set [Walter et al., 1990]. Then, a simple – usually linear autoregressive (AR) – local model is developed for every such segment.
Each segment must correspond to a specific behavior of the time series. In order to avoid grouping together relatively similar vectors x(t) and x(t′) that correspond to significantly different desired predictions x̂(t + τ) and x̂(t′ + τ), more recent methods learn the quantization on the set of {[γ x(t + τ), x(t)]} vectors instead [Vesanto, 1997]. The (difficult to select) γ parameter is very important because it controls the weight given to the desired prediction in the quantization phase.


To perform the prediction, the quantizer identifies the segment that is closest to the input vector x(t), ignoring the first coordinate, which corresponds to γ x(t + τ) (now unknown). The local model associated with this segment is selected and produces the prediction x̂(t + τ).
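The following sketch illustrates this local scheme with k-means as a stand-in quantizer and ordinary linear models (the cited methods use SOFMs or neural-gas networks; the number of segments and γ below are arbitrary illustrative values). The window matrix X and the targets y can be built as in the sketch following equation (2).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

def fit_local_models(X, y, n_segments=8, gamma=0.5, seed=0):
    """Quantize the [gamma*x(t+tau), x(t)] vectors, then fit one linear AR model per segment."""
    Z = np.hstack([gamma * y[:, None], X])        # augmented vectors used only for quantization
    quantizer = KMeans(n_clusters=n_segments, n_init=10, random_state=seed).fit(Z)
    models = {seg: LinearRegression().fit(X[quantizer.labels_ == seg],
                                          y[quantizer.labels_ == seg])
              for seg in range(n_segments)}
    centers = quantizer.cluster_centers_[:, 1:]   # drop the first coordinate, unknown at test time
    return centers, models

def predict_local(centers, models, X):
    """Pick the closest segment for each input window and apply its local model."""
    seg = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
    return np.array([models[s].predict(x[None, :])[0] for s, x in zip(seg, X)])
```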
Neural network models are often employed in the vector quantization phase. Self-organizing feature maps (SOFMs), which have a constrained topology, were employed first [Walter et al., 1990], and later in [Vesanto, 1997]. However, the predefined topology of SOFMs may impose unnecessary constraints on the quantization process, so some work instead used free-topology networks such as neural-gas networks [Martinetz et al., 1993] or dynamic cell structures [Chudy and Farkas, 1998]. The upper bound on the number of
different segments must be carefully chosen. Indeed, the number of segments determines
the number of training examples in each segment, which imposes an upper bound on the
number of parameters of the corresponding local model.
The fact that local methods were successful in many MS prediction problems [Vesanto, 1997], [Chudy and Farkas, 1998], [McNames et al., 1999], [McNames, 2000], [Gers et al., 2001] shows that a robust quantization of the set of x vectors can often be performed and
that even simple AR local models can provide very good results. Some authors also
mention the difficulties encountered by the local approach on a few other problems
[Vesanto, 1997], [Gers et al., 2001].
We must note that local approaches rely heavily on several assumptions. If only a
limited amount of data is available for training, then the relevant dimension of the space
actually spanned by the x vectors encountered must be significantly lower than the
dimension of the time window. Under the same circumstances (rather common as far as
applications are concerned), the time series should exhibit only a few different behaviors (a
few segments should be enough) and simple local models should prove satisfactory. If any
of these assumptions is false, the number of parameters becomes huge (and so does the
amount of data required for reliable training) either because too many segments are
required or because each local model has too many parameters.
Global approaches follow either equations (1) and (2), or equations (3) and (4). A global
approach attempts to build a single complex model for the entire range of behaviors
identified in the time series. It is usually considered that such a model can be more
parsimonious than a set of local models. However, in order to be able to model an entire
range of behaviors, powerful models have to be used; such models appear to be rather
difficult to train.
Given their universal approximation properties, neural networks such as multi-layer
perceptrons (MLPs) or recurrent networks (RNs) are good candidate models for the global
approaches. Among the many neural network architectures employed for time series
prediction we can mention MLPs with a time window in the input [Weigend et al., 1990],
MLPs with finite impulse response (FIR) connections (equivalent to time windows) both
from the input to the hidden layer and from the hidden layer to the output [Wan, 1993],
recurrent networks obtained by providing MLPs with a feedback from the output
[Czernichow, 1996], simple recurrent networks [Suykens and Vandewalle, 1995], recurrent
networks with FIR connections [El Hihi and Bengio, 1996], [Lin et al., 1996] and recurrent
networks with both internal loops and feedback from the output [Parlos et al., 2000].
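As a hedged illustration of such a global model, the sketch below fits a single MLP with a time window in the input, in the spirit of [Weigend et al., 1990], using a scikit-learn regressor as a stand-in and a toy series (window length and network size are arbitrary):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
t = np.arange(600)
series = np.sin(0.1 * t) + 0.05 * rng.normal(size=t.size)   # toy series standing in for real data

M = 6                                                        # time-window length
X = np.array([series[i:i + M] for i in range(len(series) - M)])
y = series[M:]                                               # next value after each window

global_model = MLPRegressor(hidden_layer_sizes=(12,), max_iter=2000, random_state=0)
global_model.fit(X[:500], y[:500])                           # one model for the whole range of behaviors
test_mse = np.mean((global_model.predict(X[500:]) - y[500:]) ** 2)
```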

One can notice that many authors included time-delayed connections at various places in the feed-forward or recurrent networks, providing an explicit memory of the past. These additions appear to be (implicitly and sometimes explicitly) justified by the fact that they promote learning of longer-range dependencies in the data, which is supposed to be generally helpful for SS prediction and in particular for MS prediction problems. However, the relationship between learning longer-range dependencies and performance in MS prediction has been neither theoretically elucidated nor experimentally explored.
A general drawback of the global approach is the difficulty of learning a single model
for the entire range of behaviors of the time series. In SS prediction the resulting models
are usually unreliable for only a small part of the entire range of behaviors, but these
difficult data points produce significantly negative effects in MS prediction schemes. This
observation explains why valuable results were obtained by applying boosting methods to
time-dependent processes [Avnimelech and Intrator, 1999] or by using support vector
machines [Suykens and Vandewalle, 2000].
Recently, a significantly different method that also follows the global approach was put
forward in [Jaeger, 2001]. A huge recurrent neural network is randomly generated so that its units present a rich set of behaviors in response to the input sequence; then, by training the output weights, these behaviors are combined in order to obtain the desired prediction.
Among the very promising results obtained by this method, we can mention MS prediction
with a very long time horizon for a chaotic time series.
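A minimal sketch of this reservoir idea (reservoir size, spectral radius, washout length and ridge penalty below are illustrative choices, not the settings of [Jaeger, 2001]):

```python
import numpy as np

def esn_fit_readout(series, n_res=200, horizon=1, ridge=1e-6, seed=0):
    """Drive a large random recurrent 'reservoir' with the series and train only the
    linear readout (by ridge regression) to predict `horizon` steps ahead."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n_res, n_res))
    W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # scale the spectral radius below 1
    w_in = rng.uniform(-0.5, 0.5, size=n_res)

    s = np.zeros(n_res)
    states = np.zeros((len(series), n_res))
    for t, x in enumerate(series):                    # rich, untrained dynamics
        s = np.tanh(W @ s + w_in * x)
        states[t] = s

    washout = 50                                      # discard the initial transient
    S, y = states[washout:len(series) - horizon], series[washout + horizon:]
    w_out = np.linalg.solve(S.T @ S + ridge * np.eye(n_res), S.T @ y)
    return W, w_in, w_out
```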

2.2 From single-step-ahead prediction to multi-step-ahead prediction


Irrespective of the local or global nature of the models employed, there are several
methods for dealing with a MS prediction problem after finding a satisfactory solution to
the associated SS problem.
The first and most common method consists in building (training) a predictor for the SS
problem and using it recursively for the corresponding MS problem. The estimates
provided by the model for the next time step are fed back into the input of the model until
the desired prediction horizon is reached. This method is usually called iterated prediction.
Local models that have very low SS error on the entire range of behaviors of the time series
generally perform well with this method. For global models, this simple method is plagued
by the accumulation of errors on the difficult data points encountered; the model can
quickly diverge from the desired behavior.
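A sketch of iterated prediction wrapped around an arbitrary single-step predictor (names are illustrative):

```python
import numpy as np

def iterated_prediction(predict_one_step, last_window, horizon):
    """Feed the single-step estimate back into the input window until the desired
    horizon is reached (iterated, or recursive, multi-step prediction)."""
    window = list(last_window)                  # most recent value last
    estimates = []
    for _ in range(horizon):
        x_next = predict_one_step(np.array(window))
        estimates.append(x_next)
        window = window[1:] + [x_next]          # slide the window over the estimate
    return estimates

# e.g., with the global MLP sketched earlier:
# iterated_prediction(lambda w: global_model.predict(w[None, :])[0], series[-M:], horizon=6)
```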
A better method, so far applied to global models, consists in training the predictor on the
SS problem and, at the same time, making use of the propagation of penalties across time
steps in order to punish the predictor for accumulating errors in MS prediction. In the
following we shall call this method corrected iterated prediction. When the models are
MLPs or RNs, such a procedure is directly inspired by the error back-propagation through time (BPTT) algorithm [Rumelhart et al., 1986], which is an efficient method for performing gradient descent on the accumulated error. The model is thus simultaneously trained on both the SS and the associated MS prediction problem.
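The sketch below illustrates the principle with PyTorch: the loss sums the errors of the iterated predictions, so gradients flow back through the fed-back estimates, BPTT-style. It is only a sketch of the idea, not the cited authors' exact procedure; the toy series, window length and horizon are arbitrary.

```python
import torch
import torch.nn as nn

M, horizon = 6, 4
net = nn.Sequential(nn.Linear(M, 12), nn.Tanh(), nn.Linear(12, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def multi_step_loss(series, start):
    """Sum of squared errors over `horizon` iterated predictions; back-propagating
    through the fed-back estimates penalizes error accumulation (BPTT-like)."""
    window = series[start:start + M].clone()
    loss = 0.0
    for h in range(1, horizon + 1):
        pred = net(window.unsqueeze(0)).squeeze()
        loss = loss + (pred - series[start + M + h - 1]) ** 2
        window = torch.cat([window[1:], pred.unsqueeze(0)])   # feed the estimate back in
    return loss

# One optimization step on a single starting position (illustration only):
train_series = torch.sin(0.1 * torch.arange(200.0))
loss = multi_step_loss(train_series, start=0)
opt.zero_grad()
loss.backward()
opt.step()
```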
According to the direct method, the predictor is no longer concerned with an SS
problem and is directly trained on the MS problem. By a formal analysis of the expected
error, it is shown in [Atiya et al., 1999] that the direct method always performs better than
the iterated method and at least as well as the corrected iterated method. However, this
result relies on several assumptions, among which the ability of the model to learn the
different target functions (the one for SS prediction and the one for direct MS prediction)
perfectly. This assumption can hardly be satisfied by a local approach, because individual
local models are usually simple and the (direct) MS prediction function may be complex.
The comparison then only holds for global approaches. But even in a global approach the
learning algorithm welcomes some form of “help”. For instance, improved results were
obtained by using recurrent networks and training them with progressively increasing
prediction horizons [Suykens and Vandewalle, 1995] or including time-delayed
connections from the output of the network to its input [Parlos et al., 2000].
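A sketch of the direct method (same scikit-learn stand-in as above; names and sizes are illustrative): the target is the value h steps ahead itself, with no recursion.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def fit_direct_predictor(series, M, h, seed=0):
    """Direct method sketch: the predictor is trained to output x(t + h*tau) directly,
    instead of iterating a single-step model."""
    X = np.array([series[i:i + M] for i in range(len(series) - M - h + 1)])
    y = series[M + h - 1:]                     # value h steps after the end of each window
    model = MLPRegressor(hidden_layer_sizes=(12,), max_iter=2000, random_state=seed)
    return model.fit(X, y)
```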
A method which is related to the direct method was suggested in [Duhoux et al., 2001]
and consists in chaining several networks. For a time horizon of k, a first network learns to
predict at t+1, then a second network is trained to predict at t+2 by using as a
supplementary input the prediction provided by the first network, and so on, until the
desired time horizon is reached. This method was experimentally found to provide better
predictions than the iterated method in a global approach, but the total number of
parameters is proportional to the longest prediction horizon. Owing to the incremental
development of models for progressively longer time horizons, for a local approach this
method should probably be preferred to the direct method.
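A sketch of this chaining idea (scikit-learn stand-ins again, not the architecture of [Duhoux et al., 2001]): each new network receives the previous network's prediction as a supplementary input.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def fit_chained_predictors(series, M, k, seed=0):
    """Train one network per horizon h = 1..k; the network for h+1 gets the
    horizon-h prediction as an extra input feature."""
    X = np.array([series[i:i + M] for i in range(len(series) - M - k + 1)])
    chain, inputs = [], X
    for h in range(1, k + 1):
        y = series[M + h - 1:len(series) - k + h]          # targets at horizon h
        net = MLPRegressor(hidden_layer_sizes=(12,), max_iter=2000, random_state=seed)
        net.fit(inputs, y)
        chain.append(net)
        inputs = np.hstack([inputs, net.predict(inputs)[:, None]])  # feed the prediction onward
    return chain
```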

3 Illustrative experiments
In order to better understand the important issue of the relationship between learning
longer-range dependencies and performance in MS prediction, we decided to perform
specific experiments. These experiments contribute to the discussion in section 4. We
tested on MS prediction problems two constructive algorithms that were originally
developed for learning medium or long-range dependencies in time series [Boné et al.,
2000], [Boné et al., in press]. These constructive algorithms perform a selective addition of
time-delayed connections to recurrent networks and were shown to produce parsimonious
models (few parameters, linear prior on the longer-range dependencies) having good results
on SS prediction problems.
Such results, together with the fact that a longer-range memory embodied in the time delays should allow a network to better retain past information when predicting at a longer horizon, led us to anticipate improved results on MS prediction problems. Some more
support for this claim is provided by the experimental evidence in [Parlos et al., 2000]
concerning the successful use of time delays in recurrent networks for MS prediction. We
expected the constructive algorithms to identify the most useful delays for a given problem
and network architecture, instead of using an entire range of delays.

3.1 Two constructive algorithms


The addition of connections with time delays to RNs allows gradient descent algorithms
to find better solutions in the presence of long-term dependencies in the data. Indeed, along
the time-delayed connections the signal no longer crosses nonlinear activation functions between successive time steps.
Instead of systematically adding FIR connections to a recurrent network, each
connection encompassing a whole range of delays, we opted for a constructive approach:
start with an RN having no time-delayed connections, then selectively add a few such
connections. The two algorithms we present in the following allow us to choose the
location and the delay associated with a time-delayed connection which is added to an RN.
The first heuristic [Boné et al., in press] for defining the relevance of a candidate connection is closely tied to BPTT-like underlying learning algorithms [Rumelhart et al., 1986]. This method makes use of measures computed during gradient descent, and its
order of complexity is the same as for BPTT. For the first heuristic, a connection is
considered useful if it can have an important contribution to the computation of the
gradient of the error with respect to the weights. The resulting algorithm is called
Constructive Back Propagation Through Time (CBPTT).
The second heuristic [Boné et al., 2000] is a sort of breadth-first search. It explores the
alternatives for the location and the delay associated with a new connection by adding that
connection and performing a few iterations of the underlying learning algorithm. The
connection that produces the largest increase in performance during these few iterations is
then added, and learning continues until error increases on the stop set. Another
exploratory stage begins for the addition of a new connection. The breadth-first heuristic
does not need any gradient information and can be applied in combination with learning
algorithms which are not based on the gradient. However, in the experiments reported here
we employed BPTT as the underlying learning algorithm and for this reason we called the
resulting constructive algorithm Exploratory Back-Propagation Through Time (EBPTT).

3.2 Experimental results


We applied standard BPTT, CBPTT and EBPTT to RNs having an input neuron, a
linear output neuron, a bias unit and a recurrent hidden layer composed of neurons with the
symmetric sigmoid (tanh) as activation function. We performed 20 experiments for every
architecture, randomly initializing the weights in [−0.3, 0.3]. In the following we employ the normalized mean squared error (NMSE), which is the ratio between the MSE and the variance of the time series. The normalized root mean squared error (NRMSE), employed for some of the results in the literature, is the square root of the NMSE.
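For reference, a direct implementation of this measure (here the variance is computed over the target values that are passed in):

```python
import numpy as np

def nmse(y_true, y_pred):
    """Normalized mean squared error: MSE divided by the variance of the series."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2) / np.var(y_true))

# The NRMSE used by some cited results is np.sqrt(nmse(y_true, y_pred)).
```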


[Plot of yearly sunspot averages, 1700–1979, with the Test 1 and Test 2 periods indicated.]
FIGURE 1. THE SUNSPOTS TIME SERIES.


The sunspots dataset (Figure 1) is a natural dataset that contains the yearly number of
dark spots on the sun from 1700 to 1979. It is common practice to use as the training set
the data from 1700 to 1920 and to evaluate the performance of the model on two sets,
1921-1955 (test1) and 1956-1979 (test2). Test2 is considered to be more difficult.
We tested RNs having 2 neurons in the hidden layer. For both CBPTT and EBPTT we set the upper bound for the delays of the new connections to 20, the maximal number of new connections to 4, and the number of BPTT iterations performed for each candidate connection during the exploratory stage of EBPTT to 20. Figure 2 shows the results obtained on each test set.
[Two panels: mean NMSE versus prediction horizon (1 to 6 steps ahead) on Test 1 and Test 2, for BPTT, CBPTT and EBPTT.]
FIGURE 2. SUNSPOTS TIME SERIES: MEAN NMSE ON THE TEST SETS AS A FUNCTION OF THE PREDICTION HORIZON.

One can see that for test2 performance degrades much faster than for test1. It is commonly accepted that the behavior on test2 cannot be explained from the available history alone (some longer-range phenomenon would be needed). Short-range information available in SS prediction lets the network evaluate the rate of change in the number of sunspots. Such information is missing in MS prediction, so nothing compensates for the lack of knowledge concerning the longer-range phenomenon that could explain the behavior of test2.

The Mackey-Glass benchmarks [Mackey and Glass, 1977] are well-known for the
evaluation of SS and MS prediction methods. The time series are generated by the
following nonlinear differential equation:
dx/dt = −0.1·x(t) + 0.2·x(t − θ) / (1 + x¹⁰(t − θ))   (5)
The behavior is chaotic for θ > 16.8. The results in the literature usually concern θ = 17
(known as MG17, Figure 3) and θ = 30 (MG30). The data is generated and then sampled
with a period of 6, according to the common practice (see e.g. [Wan, 1993]). We use the
first 500 values for the learning set and the next 100 values for the test set.
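Such a series can be reproduced approximately by integrating equation (5) numerically; the sketch below uses a crude Euler step with a constant initial history (step size, initial value and sample counts are illustrative, and published benchmarks typically rely on a more accurate integrator):

```python
import numpy as np

def mackey_glass(n_samples=600, theta=17, dt=1.0, subsample=6, x0=1.2):
    """Euler-integration sketch of equation (5), then subsampling with period 6
    as in the common benchmark setup."""
    history = int(theta / dt)
    x = np.full(history + 1, x0)                        # constant initial history
    out = []
    for _ in range(n_samples * subsample):
        x_del = x[-history - 1]                         # x(t - theta)
        x_new = x[-1] + dt * (-0.1 * x[-1] + 0.2 * x_del / (1.0 + x_del ** 10))
        x = np.append(x[1:], x_new)                     # rolling buffer of recent values
        out.append(x_new)
    return np.array(out[::subsample])

mg17 = mackey_glass(theta=17)    # e.g. first 500 values for learning, next 100 for testing
```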
[Plot of the MG17 time series: 600 samples with values roughly between 0.4 and 1.4.]
FIGURE 3. THE MACKEY-GLASS TIME SERIES FOR θ = 17.


We tested RNs having 7 neurons in the hidden layer. For both CBPTT and EBPTT we set the upper bound for the delays of the new connections to 34, the maximal number of new connections to 10, and the number of BPTT iterations performed during the exploratory stage of EBPTT to 20. The results are depicted in Figures 4 and 5.
[Plot of mean NMSE versus prediction horizon (1 to 14 steps ahead) for BPTT, CBPTT and EBPTT.]
FIGURE 4. MG17 TIME SERIES: MEAN NMSE ON THE TEST SET AS A FUNCTION OF THE PREDICTION HORIZON.


[Plot of mean NMSE versus prediction horizon (1 to 14 steps ahead) for BPTT, CBPTT and EBPTT.]
FIGURE 5. MG30 TIME SERIES: MEAN NMSE ON THE TEST SET AS A FUNCTION OF THE PREDICTION HORIZON.

Comparisons with other published results concerning MS prediction can only be performed for a horizon of 14; the results presented here are inferior to those of the local
methods put forward in [Vesanto, 1997], [Chudy and Farkas, 1998] or of the method in
[Jaeger, 2001], but the total number of parameters is significantly lower for the RNs trained
by CBPTT or EBPTT than for these local models (at most 81 compared to 1225 – 40000)
and significantly fewer data points were employed for training (500 compared to 3000 or
10000).
MG30 is usually considered to be more difficult than MG17 and the method in
[Vesanto, 1997] performs significantly worse on MG30 than on MG17. In our case the situation is reversed: the results degrade quickly on MG17 and remain rather constant up to a horizon of 6 on MG30. These facts can be explained by noting that our algorithms are
dedicated to problems presenting long-range dependencies, which probably play a more
important role for MG30 than for MG17.

4 Multi-step-ahead prediction: relating methods to problems


Starting from the published results and the experiments we presented here, we now
attempt to identify characteristics of various MS prediction problems that encourage us to
prefer one prediction method over another.
For some problems, the available history is not sufficient for explaining part of their
behavior (the sunspots dataset above). This will be the case when the history is too short to
account for some longer-range phenomenon or when external events significantly affect the behavior of the time series. In such situations we cannot expect reliable MS prediction, no matter what modeling approach is employed. SS prediction can nevertheless be quite
successful as short-range information concerning a change in behavior is available. SS
prediction performance should not be taken as an indication of likely MS prediction
performance.
When the time series is generated by the low-dimensional stationary attractor of a low-noise dynamical system (e.g. MG17 [Mackey and Glass, 1977] or Chua's scrolls [McNames et al., 1999]) and a large amount of data is available, reliable local
models can be obtained and a local approach should be preferred. When the dimension of
the attractor increases, either as a consequence of a noisy system or because explicit long-
range dependencies are present, the performance of local approaches is likely to degrade
faster than the performance of global approaches. Note that a chaotic attractor displays a
long-range behavior that can often be explained by short-range dependencies only.
The negative effect on local models of the presence of long-range dependencies is not
well understood; for example, in [Vesanto, 1997] the lack of power of the local models is
blamed for the lower performance obtained on MG30. A simple alternative explanation
consists in noting that the time window may not be long enough. Since any amount of
training data can be generated for MG30, an experiment with a longer time window can
easily be performed and should help in clarifying this issue.
The frequent expectation that improved learning of longer-range dependencies in the
data necessarily implies better performance in MS prediction appears to be unfounded.
While an advantage in MS prediction can indeed be obtained when long-range
dependencies are strong, performance is poor when such dependencies have a low
importance, as shown by our results on MG17. Similar poor results were obtained on
MG17 with LSTM [Gers et al., 2001], an algorithm that is also dedicated to long-range
dependencies.
The general weakness of global models as compared to local models on time series like
MG17 can easily be explained for the iterative approach by the accumulation of errors on
the most difficult parts of the range of behaviors. For direct MS prediction the difficulty of
a problem appears to increase significantly, but a more comprehensive explanation has yet
to be found.
Further comparisons between global and local approaches need to be performed on MS
prediction problems where exogenous variables are present. We expect global approaches
to scale better to such situations.

5 Conclusion
The papers concerning MS prediction we cited here are only a part of the recent work on
the subject, but the methods we mentioned represent well the existing approaches. On the
problems that are most frequently encountered in the published literature, comparisons favor the local approach. However, the promising global methods that were
put forward recently deserve further evaluation. Combinations between global and local
approaches should also be explored, such as using a smaller number of more complex
models instead of many simple models in a local approach.
We attempted to relate features of MS prediction problems to the characteristics of
existing modeling methods. To pursue this effort, more data should be collected concerning
MS problems, either from prior knowledge regarding the problems or from the
experimental results obtained by different prediction methods on these problems.


References
[1] A.F. Atiya, S.M. El-Shoura, S.I. Shaheen, M.S. El-Sherif (1999) A Comparison Between Neural-Network Forecasting Techniques – Case Study: River Flow Forecasting, IEEE Transactions on Neural Networks, Vol. 10, pp. 402-409.
[2] R. Avnimelech, N. Intrator (1999) Boosting Regression Estimators, Neural
Computation, vol. 11, pp. 491-513.
[3] R. Boné, M. Crucianu, J.-P. Asselin de Beauville (2000) Two Constructive
Algorithms for Improved Time-series Processing with Recurrent Neural Networks,
IEEE International Workshop on Neural Networks for Signal Processing, pp. 55-64,
Sydney, Australia.
[4] R. Boné, M. Crucianu, J.-P. Asselin de Beauville (2002) Learning Long-Term
Dependencies by the Selective Addition of Time-Delayed Connections to Recurrent
Neural Networks, Neurocomputing, in press.
[5] L. Chudy, I. Farkas (1998) Prediction of Chaotic Time-Series Using Dynamic Cell
Structures and Local Linear Models, Neural Network World, 8(5), pp. 481-489.
[6] T. Czernichow (1996) Apport des réseaux récurrents à la prévision de séries temporelles, application à la prévision de consommation d'électricité, Ph.D. Thesis, Université Paris 6, Paris.
[7] M. Duhoux, J. Suykens, B. de Moor, J. Vandewalle (2001) Improved Long-term
Temperature Prediction by Chaining of Neural Networks, International Journal of
Neural Systems, World Scientific Publishing Company, Vol. 11, pp. 1-10.
[8] S. El Hihi, Y. Bengio (1996) Hierarchical Recurrent Neural Networks for Long-Term
Dependencies, in M. Mozer, D. S. Touretzky and M. Perrone, Eds., Advances in
Neural Information Processing Systems, Cambridge, MA, MIT Press. VIII, pp. 493-
499.
[9] F. Gers, D. Eck, J. Schmidhuber (2001) Applying LSTM to Time Series Predictable
Through Time-Window Approaches, International Conference on Artificial Neural
Networks, Vienna, Austria, pp. 669-675.
[10] H. Jaeger (2001) The “Echo State” Approach to Analyzing and Training Recurrent
Neural Networks, GMD Report 148, GMD, Germany.
[11] T. Lin, B.G. Horne, P. Tino, C.L. Giles (1996) Learning Long-Term Dependencies in
NARX Recurrent Neural Networks, IEEE Transactions on Neural Networks 7(6), pp.
1329-1335.
[12] M. Mackey, L. Glass (1977) Oscillation and Chaos in Physiological Control Systems, Science, vol. 197, pp. 287-289.
[13] T. Martinetz, S.G. Berkovich, K. Schulten (1993) Neural-gas Network for Vector
Quantization and Its Application to Time-series Prediction, IEEE Transactions on
Neural Networks, 4(4), pp. 558-569.
[14] J. McNames, J.A.K. Suykens, J. Vandewalle (1999) Winning Entry of the K.U.Leuven
Time-Series Prediction Competition, International Journal of Bifurcation and Chaos,
vol. 9, no. 8, pp. 1485-1500.
[15] J. McNames (2000) Local Modeling Optimization for Time Series Prediction, 8th
European Symposium on Artificial Neural Networks, Bruges, Belgium, pp. 305-310.
[16] A.G. Parlos, O.T. Rais, A.F. Atiya (2000) Multi-step-ahead Prediction Using Dynamic
Recurrent Neural Networks, Neural Networks, Vol. 13, pp. 765-786.
[17] D.E. Rumelhart, G.E. Hinton, R.J. Williams (1986) Learning Internal Representations
by Error Propagation, in D. E. Rumelhart, J. McClelland (Eds.), Parallel Distributed
Processing: Explorations in the Microstructure of Cognition Vol. 1, pp. 318-362,
Cambridge, MA: MIT Press.
[18] J.A.K. Suykens, J. Vandewalle (1995) Learning a Simple Recurrent Neural State
Space Model to Behave Like Chua’s Double Scroll, IEEE Transactions on Circuits
and Systems-I, Vol. 42, pp. 499-502.
[19] J.A.K. Suykens, J. Vandewalle (2000) Recurrent Least Squares Support Vector Machines, IEEE Transactions on Circuits and Systems-I, vol. 47, no. 7, pp. 1109-1114.
[20] F. Takens (1980) Detecting Strange Attractors in Fluid Turbulence, in D. A. Rand and
L. S. Young, Dynamical Systems and Turbulence, Springer-Verlag, New York, pp.
366-381.
[21] J. Vesanto (1997) Using the SOM and Local Models in Time-Series Prediction,
Proceedings of the Workshop on Self-Organizing Maps (WSOM’97), Espoo, Finland,
pp. 209-214.
[22] J. Walter, H. Ritter, K.J. Schulten (1990) Non-linear Prediction with Self-organizing
Feature Maps, in Proceedings of the International Joint Conference on Neural
Networks (IJCNN’90), vol. 2, pp. 589-594.
[23] E.A. Wan (1993) Finite Impulse Response Neural Networks with Applications in
Time Series Prediction, Ph.D. Thesis, Stanford University, USA.
[24] A.S. Weigend, B.A. Huberman, D.E. Rumelhart (1990) Predicting the Future: A
Connectionist Approach, International Journal of Neural Systems, 1(3), pp. 193-209.
[25] G.U. Yule (1927) On a Method of Investigating Periodicity in Disturbed Series with Special Reference to Wolfer's Sunspot Numbers, Philos. Trans. Roy. Soc. London Ser. A 226, pp. 267-298.
