
Recurrent Networks and NARMA Modeling

Jerome Connor, Les E. Atlas
Interactive Systems Design Laboratory
Dept. of Electrical Engineering, FT-10
University of Washington
Seattle, Washington 98195

Douglas R. Martin
Dept. of Statistics, B-317
University of Washington
Seattle, Washington 98195

Abstract
There exist large classes of time series, such as those with nonlinear moving
average components, that are not well modeled by feedforward networks
or linear models, but can be modeled by recurrent networks. We show that
recurrent neural networks are a type of nonlinear autoregressive-moving
average (NARMA) model. Their practical value is shown by the results of a
competition sponsored by the Puget Sound Power and Light Company, in which
the recurrent networks gave the best performance on electric load forecasting.

1 Introduction
This paper will concentrate on identifying types of time series for which a recurrent
network provides a significantly better model, and corresponding prediction, than
a feedforward network. Our main interest is in discrete time series that are
parsimoniously modeled by a simple recurrent network, but for which a feedforward
neural network is highly non-parsimonious, by virtue of requiring an infinite number
of past observations as input to achieve the same accuracy in prediction.
Our approach is to consider predictive neural networks as stochastic models. Section
2 will be devoted to a brief summary of time series theory that will be used to
illustrate the differences between feedforward and recurrent networks. Section 3
will investigate some of the problems associated with nonlinear moving average and
state space models of time series. In particular, neural networks will be analyzed as
nonlinear extensions of traditional linear models. From the preceding sections, it will
become apparent that the recurrent network will have advantages over feedforward
neural networks in much the same way that ARMA models have over autoregressive
models for some types of time series.
Finally, in Section 4, the results of a competition in electric load forecasting
sponsored by the Puget Sound Power and Light Company will be discussed. In this
competition, a recurrent network model gave superior results to feedforward networks
and various types of linear models. The advantages of a state space model for
multivariate time series will be shown on the Puget Power time series.

2 Traditional Approaches to Time Series Analysis


The statistical approach to forecasting involves the construction of stochastic mod-
els to predict the value of an observation $x_t$ using previous observations. This is
often accomplished using linear stochastic difference equation models with random
inputs.
A very general class of linear models used for forecasting purposes is the class of
ARMA(p,q) models
$$x_t = \sum_{i=1}^{p} \phi_i x_{t-i} + \sum_{j=1}^{q} \theta_j e_{t-j} + e_t,$$
where $e_t$ denotes random noise, independent of past $x_t$. The conditional mean
(minimum mean square error) predictor $\hat{x}_t$ of $x_t$ can be expressed in the recurrent form
$$\hat{x}_t = \sum_{i=1}^{p} \phi_i x_{t-i} + \sum_{j=1}^{q} \theta_j \hat{e}_{t-j},$$
where $e_k$ is approximated by
$$\hat{e}_k = x_k - \hat{x}_k, \qquad k = t-1, \ldots, t-q.$$
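The one-step-ahead predictor above is easy to run recursively, since each residual estimate $\hat{e}_k = x_k - \hat{x}_k$ becomes an input to later predictions. The following sketch (NumPy assumed; the ARMA(1,1) coefficients in the usage example are hypothetical, not from the paper) illustrates this recursion:

```python
import numpy as np

def arma_predict(x, phi, theta):
    """One-step-ahead ARMA(p,q) predictions using estimated residuals.

    x     : observed series, shape (T,)
    phi   : AR coefficients phi_1..phi_p
    theta : MA coefficients theta_1..theta_q
    """
    p, q = len(phi), len(theta)
    T = len(x)
    x_hat = np.zeros(T)
    e_hat = np.zeros(T)          # running residual estimates e_hat_k = x_k - x_hat_k
    for t in range(max(p, q), T):
        ar = sum(phi[i] * x[t - 1 - i] for i in range(p))
        ma = sum(theta[j] * e_hat[t - 1 - j] for j in range(q))
        x_hat[t] = ar + ma
        e_hat[t] = x[t] - x_hat[t]
    return x_hat

# Hypothetical usage: simulate an ARMA(1,1) with phi = 0.8, theta = 0.4 and predict it
rng = np.random.default_rng(0)
e = rng.normal(size=500)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.8 * x[t - 1] + 0.4 * e[t - 1] + e[t]
print(arma_predict(x, [0.8], [0.4])[-5:])
```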

The key properties of interest for an ARMA(p,q) model are stationarity and invertibility.
If the process $x_t$ is stationary, its statistical properties are independent of
time. Any stationary ARMA(p,q) process can be written as a moving average
$$x_t = \sum_{k=1}^{\infty} h_k e_{t-k} + e_t.$$
An invertible process can be equivalently expressed in terms of previous observations
or residuals. For a process to be invertible, all the poles of the z-transform must
lie inside the unit circle of the z plane. An invertible ARMA(p,q) process can be
written as an infinite autoregression
$$x_t = \sum_{k=1}^{\infty} \phi_k x_{t-k} + e_t.$$
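As a small numerical sketch (an illustration, not a procedure from the paper), both conditions can be checked from the roots of the AR and MA polynomials: in the backshift convention used below, the process is stationary when the roots of $1 - \sum_i \phi_i z^i$ lie outside the unit circle and invertible when the roots of $1 + \sum_j \theta_j z^j$ do, which corresponds to the pole condition in the z-plane stated above.

```python
import numpy as np

def is_stationary(phi):
    # Roots of 1 - phi_1 z - ... - phi_p z^p must lie outside the unit circle.
    poly = np.concatenate(([1.0], -np.asarray(phi, dtype=float)))  # ascending powers of z
    roots = np.roots(poly[::-1])                                   # np.roots wants descending powers
    return np.all(np.abs(roots) > 1.0)

def is_invertible(theta):
    # Roots of 1 + theta_1 z + ... + theta_q z^q must lie outside the unit circle.
    poly = np.concatenate(([1.0], np.asarray(theta, dtype=float)))
    roots = np.roots(poly[::-1])
    return np.all(np.abs(roots) > 1.0)

print(is_stationary([0.8]), is_invertible([0.4]))   # True True
print(is_invertible([1.2]))                         # False: |theta| > 1
```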

As an example of how the inverse process occurs, let $e_t$ be solved for in terms of $x_t$
and then substitute previous $e_t$'s into the original process. This can be illustrated
with an MA(1) process
$$x_t = e_t + \theta e_{t-1}.$$
Since $e_{t-i} = x_{t-i} - \theta e_{t-i-1}$, repeated substitution gives
$$x_t = e_t + \theta(x_{t-1} - \theta e_{t-2}) = \cdots = e_t + \sum_{i=1}^{\infty} (-1)^{i-1} \theta^i x_{t-i}.$$
Looking at this example, it can be seen that an MA(1) process with $|\theta| \approx 1$ will
depend significantly on observations in the distant past. However, if $|\theta| < 1$, then
the effect of the distant past is negligible.
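A short numerical sketch (illustrative only, not from the paper) shows how the weights $(-1)^{i-1}\theta^i$ on past observations behave: they decay rapidly for small $|\theta|$ but remain significant over many lags when $|\theta|$ is near 1.

```python
import numpy as np

def ma1_inverse_weights(theta, n=10):
    """Weights on x_{t-i} in the infinite AR expansion of an MA(1) process."""
    return np.array([(-1) ** (i - 1) * theta ** i for i in range(1, n + 1)])

print(np.round(ma1_inverse_weights(0.3), 4))   # decays fast: distant past negligible
print(np.round(ma1_inverse_weights(0.95), 4))  # decays slowly: distant past still matters
```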
In the nonlinear case, it will be shown that it is not always possible to go back
and forth between descriptions in terms of observables (e.g. $x_t$) and descriptions
in terms of unobservables (e.g. $e_t$), even when $s_t = 0$. For a review of time series
prediction in greater depth, see the works of Box [1] or Harvey [2].

3 Nonlinear ARMA Models


Many types of nonlinear models have been proposed in the literature. Here we focus
on feedforward and recurrent neural networks and how they relate to nonlinear
ARMA models.

3.1 Nonlinear Autoregressive Models

The simplest generalization to the nonlinear case would be the nonlinear autoregressive
(NAR) model
$$x_t = h(x_{t-1}, x_{t-2}, \ldots, x_{t-p}) + e_t,$$
where $h(\cdot)$ is an unknown smooth function, with the assumption that the best (i.e., minimum
mean square error) prediction of $x_t$ given $x_{t-1}, \ldots, x_{t-p}$ is its conditional mean
$$\hat{x}_t = E(x_t \mid x_{t-1}, \ldots, x_{t-p}) = h(x_{t-1}, \ldots, x_{t-p}).$$

Feedforward networks were first proposed as an NAR model for time series prediction
by Lapedes and Farber [3]. A feedforward network is a nonlinear approximation
to $h$ given by
$$\hat{x}_t = \hat{h}(x_{t-1}, \ldots, x_{t-p}) = \sum_{i=1}^{I} W_i f\!\left(\sum_{j=1}^{p} w_{ij} x_{t-j}\right).$$
The weight matrix is lower triangular and allows no feedback. Thus the feedforward
network is a nonlinear mapping from previous observations onto predictions
of future observations. The function $f(x)$ is a smooth bounded monotonic function,
typically a sigmoid.
The parameters $W_i$ and $w_{ij}$ are estimated from a training sample $x_1, \ldots, x_N$, thereby
obtaining an estimate $\hat{h}$ of $h$. Estimates are obtained by minimizing the sum
of squared residuals $\sum_{t}(x_t - \hat{x}_t)^2$ by a gradient descent procedure known as
"backpropagation" [4].
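As a minimal sketch of this procedure (NumPy assumed; the network size, learning rate, and synthetic data are illustrative choices, not those of the paper), the NAR predictor and its gradient descent fit can be written as:

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def train_nar_network(x, p=4, hidden=3, lr=0.05, epochs=2000, seed=0):
    """Fit x_hat_t = sum_i W_i f(sum_j w_ij x_{t-j}) by gradient descent on squared residuals."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=(hidden, p))   # input-to-hidden weights w_ij
    W = rng.normal(scale=0.1, size=hidden)        # hidden-to-output weights W_i
    # Lagged input matrix: row for target x_t holds (x_{t-1}, ..., x_{t-p})
    X = np.stack([x[p - j - 1: len(x) - j - 1] for j in range(p)], axis=1)
    y = x[p:]
    for _ in range(epochs):
        h = sigmoid(X @ w.T)                      # hidden activations
        y_hat = h @ W                             # network predictions x_hat_t
        err = y_hat - y
        # Gradient directions for the squared-residual loss (constant factors absorbed into lr)
        grad_W = h.T @ err
        grad_w = ((err[:, None] * W) * h * (1 - h)).T @ X
        W -= lr * grad_W / len(y)
        w -= lr * grad_w / len(y)
    return w, W

# Hypothetical usage on a short synthetic series
x = np.sin(np.linspace(0, 20, 400)) + 0.05 * np.random.default_rng(1).normal(size=400)
w, W = train_nar_network(x)
```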

3.2 NARMA or NMA


A simple nonlinear generalization of ARMA models is
$$x_t = h(x_{t-1}, x_{t-2}, \ldots, x_{t-p}, e_{t-1}, \ldots, e_{t-q}) + e_t.$$
It is natural to predict
$$\hat{x}_t = h(x_{t-1}, x_{t-2}, \ldots, x_{t-p}, \hat{e}_{t-1}, \ldots, \hat{e}_{t-q}).$$
If the model $h(x_{t-1}, x_{t-2}, \ldots, x_{t-p}, e_{t-1}, \ldots, e_{t-q})$ is chosen, then a recurrent network
can approximate it as
$$\hat{x}_t = \hat{h}(x_{t-1}, \ldots, x_{t-p}, \hat{e}_{t-1}, \ldots, \hat{e}_{t-q}) = \sum_{i=1}^{I} W_i f\!\left(\sum_{j=1}^{p} w_{ij} x_{t-j} + \sum_{j=1}^{q} w'_{ij}(x_{t-j} - \hat{x}_{t-j})\right).$$
This model is a special case of the fully interconnected recurrent network
$$\hat{x}_t = \sum_{i=1}^{I} W_i f\!\left(\sum_{j=1}^{n} w'_{ij} x_{t-j}\right),$$
where the $w'_{ij}$ are coefficients of a full matrix.
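A minimal sketch of such a recurrent NARMA predictor (illustrative only; it assumes the weights have already been estimated, e.g. by gradient descent or the Williams-Zipser algorithm cited in Section 4) makes the feedback of the prediction errors explicit:

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def narma_recurrent_predict(x, w_ar, w_ma, W):
    """Recurrent NARMA predictions:
       x_hat_t = sum_i W_i f( sum_j w_ar[i,j] x_{t-j} + sum_j w_ma[i,j] (x_{t-j} - x_hat_{t-j}) ).

    w_ar : (hidden, p) weights on past observations
    w_ma : (hidden, q) weights on past prediction errors (the recurrent part)
    W    : (hidden,)   output weights
    """
    hidden, p = w_ar.shape
    q = w_ma.shape[1]
    T = len(x)
    x_hat = np.zeros(T)
    for t in range(max(p, q), T):
        past_x = x[t - p:t][::-1]                     # x_{t-1}, ..., x_{t-p}
        past_e = (x[t - q:t] - x_hat[t - q:t])[::-1]  # e_hat_{t-1}, ..., e_hat_{t-q}
        h = sigmoid(w_ar @ past_x + w_ma @ past_e)
        x_hat[t] = W @ h
    return x_hat
```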
Nonlinear autoregressive models and nonlinear moving average models are not always
equivalent for nondeterministic processes, as they are in the linear case. If the
probability of the next observation depends on the previous state of the process, a
representation built on $e_t$ may not be complete unless some information on the previous
state is added [8]. The problem is that if $e_t, \ldots, e_{t-m}$ are known, there is still
not enough information to determine which state the series is in at time $t - m$. Given
the lack of knowledge of the initial state, it is impossible to predict future states,
and without the state information the best predictions cannot be made.
If the moving average representation cannot be made with $e_t$ alone, it still may be
possible to express a model in terms of past $e_t$ and state information.

It has been shown that for a large class of nondeterministic Markov processes, a
model of this form can be constructed [8]. This link is important, because a recurrent
network is this type of model. For further details on using recurrent networks for
NARMA modeling, see Connor et al. [9].

4 Competition on Load Forecasting Data


A fully interconnected recurrent network trained with the Williams and Zipser algo-
rithm [10] was part of a competition to predict the loads of the Puget Sound Power
and Light Company from November 11, 1990 to March 31, 1991. The object was
to predict the demand for electric power, known as the load, over the profile of each
day, with the forecast made on the previous working day. Because the forecast is
made on Friday morning, the Monday prediction is the most difficult. Actual loads
and temperatures from the past are available, as well as forecasted temperatures for
the day of the prediction.

Neural networks are not parsimonious and many parameters need to be determined.
Seasonality limits the amount of useful data for the load forecasting problem. For
example, the load profile in August is not useful for predicting the load profile in
January. This limited amount of data severely constrains the number of parameters
a model can accurately determine. We avoided seasonality, while increasing the size
of the training set, by including data from the last four winters. In total, 26976
vectors were available when data from August 1 to March 31 for 1986 to 1990 were
included. The larger training set enables neural network models to be trained with
less danger of overfitting the data. If the network can accurately model load growth
over the years, then the network will have the added advantage of being exposed
to a larger temperature spectrum on which to base future predictions. The larger
temperature spectrum is hypothetically useful for predicting phenomena such as
cold snaps, which can result in larger loads than normal. It should be noted that
neural networks have been applied to this problem in the past [6].
Initially, five recurrent models were constructed, one for each day of the week, with
Wednesday, Thursday, and Friday sharing a single network. Each network has temperature
and load values from a week previous at that hour, the forecasted temperature
of the hour to be predicted, and the hour, year, and week of the forecast. The week of
the forecast was included to allow the network to model the seasonality of the data.
Some models have added load and temperature from earlier in the week, depending
on the availability of the data. The networks themselves consisted of three to four
neurons in the hidden layer. This predictor is of the form
$$l_t(k) = e_t(k) + f(l_t(k-7), e_t(k-7), \hat{T}_t(k), T_s(k-1), t, d, y),$$
where $f(\cdot)$ is a nonlinear function, $l_t(k)$ is the load at time $t$ on day $k$, $e_t$ is the
noise, $T_s$ is the temperature, $\hat{T}_t$ is the forecasted temperature, $d$ is the day of the
week, and $y$ is the year of the data.
After comparing the performance of these daily networks to that of the competition
winner, the linear model in Fig. 1, the poor performance could be attributed to the
choice of model rather than to a problem with recurrent networks. It should be
mentioned that the linear model took as one of its inputs the square of the last
available load. This is a parsimonious way of modeling nonlinearities. A second
recurrent predictor was then built with the same input and output configuration as
the linear model, save the square of the previous load term, which the net's
nonlinearities can handle. This net, denoted as the Recurrent Network, had a different
recurrent model for each hour of the day; this yielded the best predictions. This
predictor is of the form
$$l_t(k) = e_t(k) + f_t(l_t(k-1), e_t(k-1), \hat{T}_t(k), T_s(k-1), d, y).$$
All of the models in the figure use the last available load, the forecasted temperature
at the hour to be predicted, the maximum forecasted temperature of the day to be
predicted, the previous midnight temperatures, and the hour and year of the prediction.
A second recurrent network was also trained with the last available load
at that hour; this enabled $e_{t-1}$ to be modeled. The availability of $e_{t-1}$ turned out
to be the difference between making superior and average predictions. It should be
noted that the use of $e_{t-1}$ did not improve the results of linear models.
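To make the role of the fed-back error concrete, the following sketch (illustrative only; the function f, the feature ordering, and the stand-in usage are assumptions, not the authors' code) runs an hour-specific predictor of the form above over consecutive days, supplying $\hat{e}(k-1) = l(k-1) - \hat{l}(k-1)$ as the recurrent input:

```python
import numpy as np

def run_hourly_predictor(f, loads, temp_forecast, temp_midnight, day_of_week, year):
    """Run a single-hour recurrent predictor over consecutive days k = 1..K-1.

    f : trained network mapping an input vector to a predicted load (assumed given)
    loads, temp_forecast, temp_midnight, day_of_week, year : per-day series
    """
    K = len(loads)
    preds = np.zeros(K)
    for k in range(1, K):
        # Fed-back prediction error e_hat(k-1); zero until a previous prediction exists
        err = loads[k - 1] - preds[k - 1] if k > 1 else 0.0
        features = np.array([loads[k - 1], err, temp_forecast[k],
                             temp_midnight[k - 1], day_of_week[k], year[k]])
        preds[k] = f(features)
    return preds

# Stand-in linear "network" used only to exercise the loop
f = lambda v: 0.9 * v[0] + 0.5 * v[1] + 15.0 * v[2]
```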
The three most important error measures are the weekly morning, afternoon, and
total loads, and are listed in the table below. The A.M. peak is the mean absolute
percent error (MAPE) of the summed predictions from 7 A.M. to 9 A.M., the P.M.
peak is the MAPE of the summed predictions from 5 P.M. to 7 P.M., and the total
is the MAPE of the summed predictions over the entire day.

Recurrent   .0275   .0355   .0218   .0311

Table 1: Mean Square Error

Results of the recurrent network and the other predictors for the total power of the
day are shown in Fig. 1. The performance on the A.M. and P.M. peaks was similar [9].
The failure of the daily recurrent network to predict accurately is a product of trying
to model too complex a problem. When the complexity of the problem was reduced
to that of predicting a single hour of the day, results improved significantly [7].
The superior performance of the recurrent network over the feedforward network
is time series dependent. A feedforward and a recurrent network with the same
input representation were trained to predict the 5 P.M. load on the previous working
day. The feedforward network succeeded in modeling the training set with a mean
square error of .0153, compared to the recurrent network's .0179. However, when
tested on several winters outside the training set, the results, listed in the table
below, varied. For the 1990-91 winter, the recurrent network did better, with a
mean square error of .0311 compared to the feedforward network's .0331. For the
other winters, from the years before the training set, the results were quite different:
the feedforward network won in all cases. The differences in prediction performance
can be explained by the inability of the feedforward network to model load growth
into the future. The loads experienced in the 1990-91 winter were outside the range of
the entire training set. The earlier winters' ranges of loads were not as far from the
training set, and the feedforward network modeled them well.

The effect of the nonlinear nature of neural networks was apparent in the error
residuals of the training and test sets. Figs. 2 and 3 are plots of the residuals
against the predicted load for the training and test sets, respectively. In Fig. 2,
the mean and variance of the residuals are roughly constant as a function of the
predicted load; this is indicative of a good fit to the data. However, in Fig. 3,
the errors tend to be positive for larger loads and negative for smaller loads. This
is a product of the squashing effect of the sigmoidal nonlinearities. The squashing
effect becomes acute during the prediction of the peak loads of the winter. These
peak loads are caused when a cold spell occurs and the power demand reaches record
levels. This is the only measure on which the performance of the recurrent networks
is surpassed: human experts outperformed the recurrent network for predictions
during cold spells. The recurrent network did outperform all other statistical models
on this measure.

[Figure 1: Competition Performance on Total Power. Bar chart of error by model, including the Recurrent Network, Feedforward Network, and Best Linear model.]

[Figure 2: Prediction vs. Residual on Training Set. Residual error plotted against the predicted load.]

[Figure 3: Prediction vs. Residual on Testing Set. Residual error plotted against the predicted load.]



5 Conclusion
Recurrent networks are the nonlinear neural network analog of linear ARMA mod-
els. As such, they are well-suited for time series that possess moving average com-
ponents, are state dependent, or have trends. Recurrent neural networks can give
superior results for load forecasting, but as with linear models, the choice of model
is critical to good prediction performance.

6 Acknowledgements
We would like to thank Milan Casey Brace of the Puget Power Corporation, Dr. Seho
Oh, Dr. Mohammed El-Sharkawi, Dr. Robert Marks, and Dr. Mark Damborg for
helpful discussions. We would also like to thank the National Science Foundation
for partially supporting this work.
References
[1] G. Box, Time Series Analysis: Forecasting and Control, Holden-Day, 1976.
[2] A. C. Harvey, The Econometric Analysis of Time Series, MIT Press, 1990.
[3] A. Lapedes and R. Farber, "Nonlinear Signal Processing Using Neural Networks: Prediction and System Modeling", Technical Report LA-UR87-2662, Los Alamos National Laboratory, Los Alamos, New Mexico, 1987.
[4] D.E. Rumelhart, G.E. Hinton, and R.J. Williams, "Learning internal representations by error propagation", in Parallel Distributed Processing, vol. 1, D.E. Rumelhart and J.L. McClelland, eds., Cambridge: M.I.T. Press, 1986, pp. 318-362.
[5] M.C. Brace, "A Comparison of the Forecasting Accuracy of Neural Networks with Other Established Techniques", Proc. of the 1st Int. Forum on Applications of Neural Networks to Power Systems, Seattle, July 23-26, 1991.
[6] L. Atlas, J. Connor, et al., "Performance Comparisons Between Backpropagation Networks and Classification Trees on Three Real-World Applications", Advances in Neural Information Processing Systems 2, pp. 622-629, ed. D. Touretzky, 1989.
[7] S. Oh et al., "Electric Load Forecasting Using an Adaptively Trained Layered Perceptron", Proc. of the 1st Int. Forum on Applications of Neural Networks to Power Systems, Seattle, July 23-26, 1991.
[8] M. Rosenblatt, Markov Processes: Structure and Asymptotic Behavior, Springer-Verlag, 1971, pp. 160-182.
[9] J. Connor, L. E. Atlas, and R. D. Martin, "Recurrent Neural Networks and Time Series Prediction", to be submitted to IEEE Trans. on Neural Networks, 1992.
[10] R. Williams and D. Zipser, "A Learning Algorithm for Continually Running Fully Recurrent Neural Networks", Neural Computation, 1, 1989, pp. 270-280.
