
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/2860700

A Recurrent Neural Network for Modelling Dynamical Systems

Article in Network: Computation in Neural Systems · August 2002
DOI: 10.1088/0954-898X_9_4_008 · Source: CiteSeer
Citations: 24 · Reads: 1,913

3 authors, including: David MacKay (University of Cambridge) and Philip J. Withers (The University of Manchester).

All content following this page was uploaded by Philip J. Withers on 19 September 2013.
The user has requested enhancement of the downloaded file.


A Recurrent Neural Network for Modelling Dynamical Systems

Coryn A.L. Bailer-Jones†, David J.C. MacKay‡
Cavendish Laboratory, University of Cambridge,
Madingley Road, Cambridge, CB3 0HE, England

Philip J. Withers§
Department of Materials Science and Metallurgy, University of Cambridge,
Pembroke Street, Cambridge, CB2 3QZ, England

ABSTRACT
We introduce a recurrent network architecture for modelling a general class of dynamical systems.
The network is intended for modelling real-world processes in which empirical measurements of the
external and state variables are obtained at discrete time points. The model can learn from multiple
temporal patterns, which may evolve on different timescales and be sampled at non-uniform time in-
tervals. We demonstrate the application of the model to a synthetic problem in which target data are
only provided at the final time step. Despite the sparseness of the training data, the network is able not
only to make good predictions at the final time step for temporal processes unseen in training, but also to
reproduce the sequence of the state variables at earlier times. Moreover, we show how the network can
infer the existence and role of state variables for which no target information is provided. The ability
of the model to cope with sparse data is likely to be useful in a number of applications, particularly the
modelling of metal forging.

Network: Computation in Neural Systems, 1998, 9, 531–547

1. Introduction
Many real-world processes can be represented as dynamical systems. A dynamical system can be described in terms
of the evolution of one or more state variables in response to one or more external variables. In this paper we will
consider dynamical systems which can be modelled with the equation

∂v(τ)/∂τ = F(v(τ), x(τ))    (1)

where x are the external variables, v are the state variables, and F is a non-linear static function. In terms of this definition, the external variables are causally independent of the state variables. Given a set of initial conditions, v(τ = 0), and the time sequence of the external variables, x(τ), equation 1 determines the evolution of the state variables, i.e. v(τ) at τ > 0. It is the problem of learning the time sequence of the state variables which we address in this paper.
An example of a dynamical system is the hot forging of a piece of metal. When a material is forged, its macro and
micro structural properties are altered through mechanical deformation. The deformation is described by the forging
parameters, such as the strain, strain rate and temperature, all of which are generally functions of time. These are
the external variables. Typical state variables of interest are the material grain size and extent of recrystallisation,
as well as macroscopic properties such as strength and toughness.
The modelling of materials forging is an important but difficult task (see, e.g. Bailer-Jones et al. 1997, 1998).
While it is usually straightforward to measure most of the external variables during forging, many of the state vari-
ables are difficult to measure while forging takes place. Therefore we will usually only have measurements of the
state variables at the beginning and end of the forging process, giving us relatively few target data with which to
develop our model. Additionally, there are some state variables, such as the dislocation density, which are important
in describing the evolution of the material, yet cannot be measured at all in most practical applications. As these variables cannot be measured, no target data can be provided for them; we shall see that it is nonetheless possible to infer the existence and role of such “unmeasured” state variables.

† email: [email protected]; present address: Max-Planck-Institut für Astronomie, Königstuhl 17, D-69117 Heidelberg, Germany
‡ email: [email protected]
§ email: [email protected]
The problem we address is the modelling of dynamical systems on the basis of empirical data. That is, we wish
to learn the underlying dynamical system for a process from which our only knowledge comes from incomplete
measurements of x and v. In the following sections we shall introduce our recurrent network architecture for mod-
elling equation 1. After deriving the training rule, we shall demonstrate the performance of the model through its
application to a synthetic problem.

2. The Model
Many types of recurrent neural network architectures have been proposed for modelling time-dependent phenom-
ena. These include discrete time networks (e.g. Jordan 1986; Rumelhart, Hinton & Williams 1986; Stornetta, Hogg
& Huberman 1988; Williams & Zipser 1989; Elman 1990) and continuous time networks (e.g. Pineda 1987, 1988;
Pearlmutter 1989). A number of authors have looked specifically at the application of recurrent networks to mod-
elling dynamical processes (e.g. Robinson & Fallside 1991; Parlos, Chong & Atiya 1994; Nerrand et al. 1994).
Non-recurrent networks have also been used in some situations to model time series (e.g. Chakraborty et al. 1992).
We are interested in modelling real-world dynamical systems in which discrete time measurements are made of the
relevant variables at known time points, or epochs. Therefore, our model is a discrete time network model of a
continuous time system.
Our recurrent network is based on a first-order solution to equation 1. The Taylor expansion about a point, v(τ − δτ), is

v(τ) = v(τ − δτ) + [∂v(τ − δτ)/∂τ] δτ .    (2)

(We shall assume that the separations between measurement epochs, δτ, are sufficiently small to allow us to drop the additional terms of order (δτ)².) This solution is modelled using the recurrent network shown in Figure 1, and can be considered as a network in two stages. The first stage is a standard feedforward network which implements function F in equation 1 directly. The inputs to this stage are the external inputs, x(τ), and the state variables, v(τ), at a certain epoch τ, and the outputs are

y(τ) = F(v(τ), x(τ)) .    (3)

These outputs are the time derivatives of the state variables at epoch τ. The second stage of the network is the recurrent part, and implements equation 2 via the one-to-one connections from the network outputs to the state variables (or recurrent inputs). The weights of these connections are set to δτ. The state variables also feed back into themselves with unit weights. The cycle then repeats for every epoch for which external inputs are defined.
What we label as “outputs” in Figure 1 are not outputs in the sense that we usually read values from them. Rather
we give them this name to make the analogy with feedforward networks.
In the rest of this paper the indices k, l and m will be used exclusively to count over nodes in the state variable, external input and hidden layers respectively. As there are one-to-one connections between the output and state variable layers, k will also be used to label the nodes in the output layer. These indices will also be used to label weights between the nodes, e.g. w_km is the weight between the kth state variable and the mth hidden node. The indices i and j will be used to label arbitrary nodes. V, X and H will denote the sets of nodes in the state variable, external input and hidden layers respectively. We introduce the integer t to enumerate epochs at time τ_t, for t = 1, 2, …, T.
The activation of the mth node in the hidden layer is given by

h_m(t−1) = f[p_m(t−1)]    (4)

where

p_m(t−1) = Σ_k w_km v_k(t−1) + Σ_l w_lm x_l(t−1) + w_bm x_b .    (5)

The input-hidden transfer function, f, is the tanh function to introduce non-linearity. The bias value, x_b, is fixed to unity. The outputs from the network are given by

y_k(t−1) = g[q_k(t−1)]    (6)

where

q_k(t−1) = Σ_m w_mk h_m(t−1) + w_bk h_b .    (7)

Fig. 1: A recurrent neural network architecture for modelling dynamical systems. Data flows counter-clockwise around the network. An external input vector, x(t−1), and state variable vector, v(t−1), are the inputs to the feedforward stage of the network. The “outputs”, y(t−1), are the time derivatives of the state variables. The one-to-one connections from the outputs to the state variables provide the means of evaluating the state variables at the next epoch, v(t). All connections and the two bias nodes are shown.

The hidden-output transfer function, g, is linear to permit the outputs to be of any size. The bias value, h_b, is fixed to unity.
Everything which occurs in the first stage of the network does so at the same epoch, t−1. The recurrent part of the network gives the state variable at the next epoch,

v_k(t) = v_k(t−1) + y_k(t−1) δτ(t)    (8)

where δτ(t) = τ_t − τ_{t−1}. Thus the external inputs and state variables at epoch t−1 give rise to the state variable at epoch t.
This network will produce a complete sequence of the state variables, v(1), v(2), …, v(T), given initial conditions on the state variables, v(0), the external input sequence, x(0), x(1), …, x(T−1), and of course the time steps, δτ(1), δτ(2), …, δτ(T). We shall refer to a single such sequence of the variables as a temporal pattern.

3. Training
The network is trained in the conventional supervised manner by minimizing an error function with respect to the
network weights. Optimization of the weights is achieved using an extension of the backpropagation scheme of
Rumelhart, Hinton & Williams (1986). Our network architecture is similar to that of Jordan (1986), but differs in
the important respect that in training our network the error derivatives are propagated to later epochs: although the
recurrent weights themselves are not trainable, they can nonetheless be used to propagate errors.
The network is trained with one or more temporal patterns. The necessary training data for a single temporal
pattern is the sequence of external inputs and associated epoch separations, the initial state variables, and at least
one target value. While target values can be specified on any node in the network, we will limit ourselves to the
practical situation in which our targets are values of the state variables. Note that the model does not need targets at
every epoch: In modelling forging we are often only able to measure a target state variable at the end of the forge.
We shall see in section 6 that this is often sufficient to learn the underlying dynamical system.
We will now consider the propagation of training errors for a single temporal process. The error in the kth state variable at epoch t is

e_k(t) = v_k(t) − T_k(t)    (9)

where T_k(t) is the target at epoch t. If no target is defined, e_k(t) = 0. The error at epoch t is

E(t) = (1/2) Σ_k β_k [e_k(t)]²    (10)

Differentiating equation 10 with respect to an arbitrary weight w_ij gives

∂E(t)/∂w_ij = Σ_k β_k e_k(t) ∂v_k(t)/∂w_ij .    (11)

The last term is obtained by differentiating equation 8 with respect to w_ij:

∂v_k(t)/∂w_ij = ∂v_k(t−1)/∂w_ij + [∂y_k(t−1)/∂w_ij] δτ(t) .    (12)

From equation 6

∂y_k(t−1)/∂w_ij = g′[q_k(t−1)] ∂q_k(t−1)/∂w_ij    (13)

and from equation 7

∂q_k(t−1)/∂w_ij = Σ_m [ (∂w_mk/∂w_ij) h_m(t−1) + w_mk ∂h_m(t−1)/∂w_ij ] + (∂w_bk/∂w_ij) h_b
                = δ_kj δ_iH h_i(t−1) + Σ_m w_mk ∂h_m(t−1)/∂w_ij    (14)

where δ_kj is the conventional delta function and δ_iH is defined as

δ_iH = 1 if i ∈ H
     = 0 otherwise

i.e. δ_iH = 1 if i is a node in the hidden layer. When i = b (bias node), h_i(t−1) = h_b = 1.
From equation 4

∂h_m(t−1)/∂w_ij = f′[p_m(t−1)] ∂p_m(t−1)/∂w_ij    (15)

and from equation 5

∂p_m(t−1)/∂w_ij = Σ_k [ (∂w_km/∂w_ij) v_k(t−1) + w_km ∂v_k(t−1)/∂w_ij ] + Σ_l (∂w_lm/∂w_ij) x_l(t−1) + (∂w_bm/∂w_ij) x_b
                = δ_mj [δ_iV v_i(t−1) + δ_iX x_i(t−1)] + Σ_k w_km ∂v_k(t−1)/∂w_ij    (16)

where δ_mj is the conventional delta function, and

δ_iV = 1 if i ∈ V
     = 0 otherwise

and

δ_iX = 1 if i ∈ X
     = 0 otherwise.
Writing

∂v_k(t)/∂w_ij ≡ Λ^{ij}_k(t)    (17)

and

∂y_k(t)/∂w_ij ≡ λ^{ij}_k(t)    (18)

equation 12 becomes

Λ^{ij}_k(t) = Λ^{ij}_k(t−1) + λ^{ij}_k(t−1) δτ(t) .    (19)

Substituting equation 14 into equation 13 gives

λ^{ij}_k(t−1) = g′[q_k(t−1)] [ δ_kj δ_iH h_i(t−1) + Σ_m w_mk ∂h_m(t−1)/∂w_ij ]    (20)

and equation 16 into 15 gives

∂h_m(t−1)/∂w_ij = f′[p_m(t−1)] ( δ_mj [δ_iV v_i(t−1) + δ_iX x_i(t−1)] + Σ_{k′} w_{k′m} Λ^{ij}_{k′}(t−1) ) .    (21)

Combining equations 20 and 21 we get

λ^{ij}_k(t−1) = g′[q_k(t−1)] [ δ_kj δ_iH h_i(t−1) + Σ_m w_mk f′[p_m(t−1)] ( δ_mj [δ_iV v_i(t−1) + δ_iX x_i(t−1)] + Σ_{k′} w_{k′m} Λ^{ij}_{k′}(t−1) ) ] .    (22)

This equation and equation 19 give a recurrence relation for Λ^{ij}_k(t), the gradient of v_k with respect to any network weight, in terms of Λ^{ij}_k(t−1). To solve this, the system must be initialised with an initial value of Λ^{ij}_k(t), which is taken to be zero because the initial state variables are independent of the network weights. The gradient of the error for a single temporal pattern at epoch t is then

∂E(t)/∂w_ij = Σ_k β_k e_k(t) Λ^{ij}_k(t) .    (23)

These equations allow us to compute an error gradient at every epoch. Note that a target does not have to be defined at every epoch in order to be able to propagate Λ^{ij}_k(t).
We will often want to train the network on several temporal patterns. This can be done quite simply by applying the above recursion relations to each pattern separately. How we then update the weights is a matter of choice. For example, we could update the weights using the gradient calculated at each epoch for each pattern, or we could accumulate the gradients over all epochs for each pattern before updating. The former method is similar to the Real Time Recurrent Learning method of Williams & Zipser (1989) extended to multiple temporal patterns. We choose instead to do “batch” learning, in which the gradient is accumulated over all epochs for all patterns, and the weights are then updated using a conjugate gradient optimizer.
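The recurrence above lends itself to a compact vectorized implementation. The following sketch is our own illustrative code (the flat parameter layout and names are assumptions, not the authors' program): it carries the gradient matrix forward alongside the state variables and checks the resulting gradient of a final-epoch-only error against central finite differences.

```python
import numpy as np

# Vectorized sketch of the gradient recurrence (equations 17-23), with a
# target supplied only at the final epoch. S[k, :] holds dv_k/dtheta.
rng = np.random.default_rng(1)
K, L, M, T = 2, 2, 4, 20              # state, input, hidden nodes; epochs
P = K * M + L * M + M + M * K + K     # total number of weights

def unpack(theta):
    """Split the flat weight vector into its five groups."""
    i = 0
    w_vh = theta[i:i + K * M].reshape(K, M); i += K * M   # w_km
    w_xh = theta[i:i + L * M].reshape(L, M); i += L * M   # w_lm
    b_h  = theta[i:i + M];                   i += M       # w_bm (x_b = 1)
    w_ho = theta[i:i + M * K].reshape(M, K); i += M * K   # w_mk
    b_o  = theta[i:]                                      # w_bk (h_b = 1)
    return w_vh, w_xh, b_h, w_ho, b_o

theta  = rng.normal(0, 0.3, P)
x_seq  = rng.uniform(0, 1, (T, L))
dt     = 0.1
v0     = rng.uniform(-1, 1, K)
target = rng.uniform(-1, 1, K)

def forward(th):
    """Forward pass; also propagate S = dv/dtheta via equations 19-22."""
    w_vh, w_xh, b_h, w_ho, b_o = unpack(th)
    v = v0.copy()
    S = np.zeros((K, P))              # Lambda(0) = 0: v(0) is weight-independent
    o1, o2, o3 = K * M, K * M + L * M, K * M + L * M + M
    for x in x_seq:
        p = v @ w_vh + x @ w_xh + b_h
        h = np.tanh(p)
        y = h @ w_ho + b_o
        dp = w_vh.T @ S                         # propagated term of equation 16
        dp[:, :o1]   += np.kron(v, np.eye(M))   # direct w_km dependence
        dp[:, o1:o2] += np.kron(x, np.eye(M))   # direct w_lm dependence
        dp[:, o2:o3] += np.eye(M)               # direct w_bm dependence
        dh = (1.0 - h ** 2)[:, None] * dp       # equation 15, f = tanh
        dy = w_ho.T @ dh                        # equations 13-14, g' = 1
        dy[:, o3:o3 + M * K] += np.kron(h, np.eye(K))  # direct w_mk dependence
        dy[:, o3 + M * K:]   += np.eye(K)              # direct w_bk dependence
        S = S + dy * dt               # equation 19
        v = v + y * dt                # equation 8
    return v, S

# Analytic gradient of E = (1/2)|v(T) - target|^2 (beta_k = 1), equation 23
v_T, S = forward(theta)
grad = (v_T - target) @ S

def loss(th):
    v, _ = forward(th)
    return 0.5 * np.sum((v - target) ** 2)

eps = 1e-6                            # central finite-difference check
grad_fd = np.zeros(P)
for i in range(P):
    d = np.zeros(P); d[i] = eps
    grad_fd[i] = (loss(theta + d) - loss(theta - d)) / (2 * eps)
print(np.max(np.abs(grad - grad_fd)))
```

The agreement with finite differences confirms that propagating Λ forward in time, rather than propagating errors backward, yields the exact gradient even though a target exists only at the final epoch.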

4. Regularization and a Bayesian Perspective


We regularize training of the network using weight decay, that is, by penalizing large weights. The total error which is to be minimized is then

E_T = E_d + E(w)
    = Σ_{n,t} E(t) + (1/2) Σ_g α_g Σ_{i,j ∈ g} w_ij² .    (24)

The first term, E_d, is the sum of the errors in equation 10 over all epochs for all patterns (indexed by n). The weight decay term, E(w), is the sum over all the weights collected into four groups: state variable to hidden weights; external input to hidden weights; bias to hidden weights; hidden to output weights. Each group has a value, α_g, associated with it which controls the scale of the weights.
A Bayesian probabilistic interpretation of training identifies P(D|w) = e^{−E_d} as the probability of the data given the weights (MacKay 1995). The weight decay term is written as P(w) = e^{−E(w)}, the prior probability over the weights. Using Bayes' theorem

P(w|D) ∝ P(D|w) P(w)    (25)

we see that P(w|D) ∝ e^{−E_T} is the posterior probability of the weights given the data. It is this quantity which we maximise when training the regularized network. From equation 24, we can consider P(w) as a product of Gaussian prior probability distributions over the different groups of weights, with means zero and standard deviations σ_g = α_g^{−1/2}. Similarly, P(D|w) can be considered as a Gaussian distribution over the target values, with standard deviation, or noise level, σ_k = β_k^{−1/2}. Schemes exist for inferring the optimum values of α_g and β_k directly from the data (MacKay 1995). However, in section 6 we shall fix these hyperparameters to reasonable values. It should be noted that the parameters for the hidden to output weights are not independent of a rescaling of time or of the state variables, as the outputs (y) are the time derivatives of the state variables. It is for this reason also that the hidden-output transfer function must provide an arbitrarily large output range.
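The correspondence between the grouped weight-decay penalty and a zero-mean Gaussian prior with σ_g = α_g^{−1/2} can be checked numerically. A minimal sketch, with group sizes and α values of our own choosing:

```python
import numpy as np

# Check: (1/2) * alpha_g * sum(w^2) equals -log of a zero-mean Gaussian
# prior with sigma_g = alpha_g**-0.5, up to an additive normalisation
# constant (which does not affect the optimisation).
rng = np.random.default_rng(2)
groups = {"state->hidden": (16, 0.01), "input->hidden": (16, 0.01),
          "bias->hidden": (8, 0.1), "hidden->output": (16, 0.001)}

E_w = 0.0            # weight-decay term of equation 24
neg_log_prior = 0.0  # -log P(w), constants dropped
for name, (n, alpha) in groups.items():
    w = rng.normal(0, 1.0, n)
    E_w += 0.5 * alpha * np.sum(w ** 2)
    sigma = alpha ** -0.5
    # -log N(w; 0, sigma^2), dropping the log(sigma * sqrt(2*pi)) terms
    neg_log_prior += np.sum(0.5 * (w / sigma) ** 2)

print(E_w, neg_log_prior)  # identical up to floating-point rounding
```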

5. Features of the Model
The network offers a number of features which may be useful in practical applications. The first is that the network
can learn from more than one temporal pattern. Moreover, these temporal processes can consist of different numbers
of epochs, and the time interval between the epochs does not have to be constant. This is useful for learning from
temporal patterns which evolve on very different time scales but nonetheless correspond to the same dynamical
system. An example is the forging of two pieces of material which are identical other than being of very different
sizes and masses.
A second feature is that it is not necessary to have a target value at every epoch when training the network: in
addition to an initial v value, only one target v is required. Furthermore, it is not necessary to have an external input
defined at those epochs where a v target is defined. Again this is useful in practical situations, as it means that we
can measure our process (external input) variables independently of the state variables.
In modelling dynamical systems, there may well be important state variables which cannot be measured. For
example, the dislocation density is important in determining the evolution of a material microstructure during me-
chanical processing, but is time consuming to measure in practice. Our model can use state variables for which
there are no target values: we shall refer to these as “unmeasured” state variables. Their purpose is to propagate any
additional state variable information required to make correct predictions of the measured state variables. As there
are no target values for these unmeasured state variables, they may not even correspond to any physical variables,
although we may be able to give them a physical interpretation. An example of their use will be given in the next
section.

6. Application to Synthetic Problems


We now demonstrate the performance of the model on a synthetic problem in which there are two external input variables, x1(τ) and x2(τ), and two state variables, v1(τ) and v2(τ), defined by

∂v1/∂τ = x1 − 2v1 + 8v2 − x1 v1    (26)
∂v2/∂τ = x2 − 5v1 + v2 − x2 v2 .    (27)

The autonomous part of this dynamical system (that with the external inputs set to zero) is a decaying harmonic oscillator, with period 1.0 and e^{−1} damping timescale 2.0. The x terms make the temporal processes somewhat more complex, but the overall oscillatory behaviour is retained.
To mimic real processes in which the external inputs are often constrained to be positive (e.g. temperature), the x1 and x2 input sequences were generated from constrained random walks: x1 (x2) changes with a probability per unit time of 0.65 (0.999) by a random amount uniformly distributed between −0.5 and +0.5 (−1 and +1). The modulus of x is then taken to ensure a positive sequence. The initial v values were randomly selected from a uniform distribution between −1 and +1.
One hundred synthetic temporal patterns were simulated numerically, with different sequences for x1 and x2 and different initial values of v1 and v2. The sequences were generated from τ = 0 to τ = 8 inclusive, with a constant epoch separation of δτ(t) = 0.1. In all of the following examples, the network had eight hidden nodes and was trained on 50 of the processes. All results shown are for application to the other 50 processes not seen during training. In training, the network is only ever given the initial state variables and targets at the final epoch: it is never given intermediate targets. The hyperparameters were set to β = 40000 and α = 0.01. Although these are reasonable, no attempt has been made to optimise them, manually or otherwise.
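Under these settings, the data generation can be sketched as follows. This is an illustrative reconstruction, not the authors' code: in particular, the fine sub-step integrator (needed because plain Euler steps of 0.1 are unstable for this oscillator) and the function names are our own assumptions.

```python
import numpy as np

# Equations 26 and 27 driven by constrained (positive) random-walk inputs,
# integrated with small sub-steps and sampled at the epoch separation 0.1.
rng = np.random.default_rng(3)
dt, T, n_sub = 0.1, 80, 100          # tau = 0 to 8; 100 sub-steps per epoch

def random_walk_input(p_change, half_range):
    """Positive sequence: change with probability p_change per unit time."""
    x, val = np.zeros(T), 0.0
    for t in range(T):
        if rng.random() < p_change * dt:
            val += rng.uniform(-half_range, half_range)
        x[t] = abs(val)              # modulus keeps the sequence positive
    return x

def simulate(v0):
    x1 = random_walk_input(0.65, 0.5)
    x2 = random_walk_input(0.999, 1.0)
    v1, v2 = v0
    vs = [(v1, v2)]
    h = dt / n_sub                   # fine Euler step for the simulation
    for t in range(T):
        for _ in range(n_sub):
            dv1 = x1[t] - 2 * v1 + 8 * v2 - x1[t] * v1   # equation 26
            dv2 = x2[t] - 5 * v1 + v2 - x2[t] * v2       # equation 27
            v1, v2 = v1 + dv1 * h, v2 + dv2 * h
        vs.append((v1, v2))
    return x1, x2, np.array(vs)

x1, x2, vs = simulate(rng.uniform(-1, 1, 2))
print(vs.shape)  # (81, 2): v(0) plus one state per epoch
```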

Problem 1 This network has two state variables, one for each of v1 and v2. Figure 2 shows that the trained network makes excellent predictions of v1 and v2 at the final epoch. It is interesting also to ask whether the network has managed to correctly learn the entire v sequences for the test data, or whether it has learned some simpler form for F in equation 1 which just gives correct v values at the final epoch. Figure 3 shows the x input sequences, and the predicted v sequences in comparison with the true sequences, and shows that the sequence predictions are generally very accurate. However, some of the predicted sequences deviate from the true sequences at early times (τ < 2). This is because the target values (at the final epoch) lie in a smaller region of the state space than do the state variables at earlier epochs (Figure 4): the network has only learned the dynamical system (or rather a good approximation to it) in this smaller region of the state space, and its extrapolations to unfamiliar parts of the state space are somewhat poorer. Nonetheless we see that for this particular problem the network is able to recover after τ = 2.
Fig. 2: Network predictions at the final epoch (τ = 8) for a network with two measured state variables (problem 1), (a) v1 and (b) v2. The rms errors are 0.0067 and 0.0084 for v1 and v2 respectively.

Problem 2 The dynamical system in equations 26 and 27 consists of two coupled state variables. Thus if one
state variable (v1 ) is removed from the network, we would not expect the network to make good predictions of v2
either at the final epoch (where there is a target) or at intermediate epochs. We have confirmed this experimentally
by training a network with only one state variable with targets for v2 : no v1 data is seen by the network. Figure 5
shows the scatter plot of the resultant network predictions. The correlation is not zero, probably because v1 and v2
are strongly coupled, but the correlation is considerably worse than in Figure 2b. Moreover, Figure 6 shows that
the network has failed to learn the underlying dynamical system in any region of the state space.

Problems 3 & 4 Problem 3 is similar to problem 2, but now we add to the network an “unmeasured” state variable
for which we provide no target information. The goal is that the network will use the unmeasured state variable to
propagate any information required to make good predictions for v2 . The initial value of the unmeasured state
variable was set to zero for all temporal patterns. The same four target sequences as used in the previous two
problems are shown in Figure 7, along with the network's predictions for v2 and the unmeasured state variable. In
order to achieve such good predicted sequences for v2 (other than at early epochs as explained earlier), the network
must be using the unmeasured state variable, because we saw in the previous problem that without this the network
is unable to predict the v2 sequences. While we would not expect the unmeasured state variable to replicate the
behaviour of v1 (as no target information was provided for it), it nonetheless emulates it closely. Learning v1 exactly
is not necessary, as any monotonic transformation of v1 carries the same information. This can be seen better in
Figure 8, in which we have re-trained the same network from different initial random weights (problem 4). Again
the sequences for v2 are very close to the true ones, as are the final values (Figure 9a), but this time the network
has discovered a different sequence for the unmeasured state variable. Figure 9b shows that there is an excellent
correlation between the values of the unmeasured state variable and the true v1 variable at the final epoch. This
demonstrates that the network can use the extra degree of freedom provided by the unmeasured state variable to

Fig. 3: Four samples of temporal patterns from the synthetic problem in equations 26 and 27. In each panel the solid lines from bottom to top
are v1 , v2 , x1 and x2 . For v1 and v2 the solid lines are the network predictions from problem 1. The lower and upper dashed lines are the true
v1 and v2 sequences respectively. For clarity, the sequences for v2 (network and true), x1 and x2 have been offset from their true positions by
1, 2 and 3 vertical units respectively.

infer the behaviour of a missing variable without being supplied any target information for it.

Problems 5 & 6 In the final two problems, we train a network with one measured state variable (v2 ) and two un-
measured state variables, to see what the network does with the extra (redundant) degree of freedom. The sequence
predictions are shown in Figure 10. The v2 sequence is again predicted well. What is interesting here is that the
network appears to be using both unmeasured state variables. Figure 11a shows the correlations between these two
unmeasured variables and v1 . Neither correlation is as strong as in problem 4, when we used only one unmeasured
state variable. Indeed, it appears from inspection of Figure 10 that some linear combination of the two unmeasured
state variables will give a better correlation with the true v1 , and this is shown in Figure 11b. While the final epoch
values are still predicted well (Figure 11c), the performance is slightly worse than in problem 4, indicating that the
network may have slightly overfitted. This could be alleviated by increasing the values of the appropriate weight
decay parameters.
If we retrain the same network from different initial random weights, it is possible to get a very different solution
for the two unmeasured state variables. One such alternative solution is shown in Figure 12 (problem 6) where we
see that the network has made use of only one of the unmeasured state variables. As can be seen in Figure 13, the
predictions for v2 are as good as the previous network solution (problem 5). Indeed, the reproducibility of the v2
predictions at the final epoch is excellent.

Fig. 4: Distributions of the true state variables for the synthetic problem in equations 26 and 27. The filled squares are the initial v values, and the open circles the final (target) v values. The initial (v1, v2) values for the four examples in Figure 3 are (0.33, 0.17) (top left), (−0.44, −0.01) (top right), (0.02, −0.80) (bottom left) and (0.27, 0.80) (bottom right).

Fig. 5: Network predictions at the final epoch (τ = 8) for a network with only one measured state variable (v2) (problem 2). The v1 state variable has been omitted from the network, leading to poorer performance. The rms error is 0.117.

We have tested the network on a number of other problems. In one problem we used a network with two measured state variables and one (redundant) unmeasured state variable. The predictions for v1 and v2 were almost as good as in problem 1, and the unmeasured state variable was unused, as with one of the state variables in problem 6. We have also had success using variable δτ(t) terms and more complex synthetic problems in which the time derivatives were non-linear functions of the state variables.

7. Conclusions
We have introduced a discrete time recurrent neural network for modelling dynamical systems. The network architecture is very general, and should be applicable to a wide range of real-world problems. We have shown that the network is capable of learning a dynamical system based on temporally sparse measurements of the state variables. The network can learn from multiple temporal patterns which may be sampled at non-constant time intervals. We have shown how the network can infer the existence of a relevant but omitted state variable using an unmeasured state variable for which no target data is provided. The unmeasured state variable was shown to be well correlated with the omitted state variable, thus allowing us to give it a physical interpretation. This ability is likely to be useful in practical situations, such as forging, in which some important state variables can often not be measured. Our model allows the evolution of these usually “hidden” variables to be monitored.
It should be noted that the synthetic problem we presented was a noise-free problem. Noise could be introduced into any of the external inputs, the initial and target state variables and the δτ(t) time steps. This, and the application to real forging data, will be the subject of future work.
Successful application of the model requires that the separations between the measurement epochs, δτ(t), are
Fig. 6: State variable sequence predictions for a network with only one state variable (v2 ) (problem 2). The true v2 sequences (dashed lines)
are the same as in Figure 3. The solid lines are the corresponding v2 sequences predicted by the network. The sequences have been offset by 1
vertical unit to aid comparison with other diagrams.

small compared to the characteristic time scale of the dynamical system being modelled. This is necessary to satisfy the approximation in equation 2. If measurements cannot be obtained at sufficient frequency, this could be accommodated by adding an extra set of output nodes to give higher order terms in the Taylor expansion: these nodes would give ∂²v(τ)/∂τ² and would be connected to the state variables with connection strength (δτ(t))².
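The benefit of such a second-order term can be illustrated on the autonomous part of the synthetic system from section 6. The following is our own sketch, with step sizes chosen for illustration; the reference solution uses very fine Euler steps.

```python
import numpy as np

# Compare first-order (equation 8) and second-order Taylor updates on the
# damped oscillator dv/dtau = A v (the autonomous part of equations 26-27).
A = np.array([[-2.0, 8.0], [-5.0, 1.0]])

def reference(v0, tau, n=100000):
    """Fine-step Euler reference solution at time tau."""
    h, v = tau / n, v0.copy()
    for _ in range(n):
        v = v + h * (A @ v)
    return v

def taylor_step(v, dt, order):
    dv = A @ v                        # first derivative, equation 1
    v_new = v + dv * dt               # first-order update, equation 8
    if order == 2:
        # second derivative of the linear system is A @ (A @ v)
        v_new = v_new + 0.5 * (A @ dv) * dt ** 2
    return v_new

v0, dt, steps = np.array([0.3, 0.2]), 0.02, 50     # integrate to tau = 1
v1, v2 = v0.copy(), v0.copy()
for _ in range(steps):
    v1 = taylor_step(v1, dt, order=1)
    v2 = taylor_step(v2, dt, order=2)
v_ref = reference(v0, 1.0)
err1 = np.linalg.norm(v1 - v_ref)
err2 = np.linalg.norm(v2 - v_ref)
print(err1, err2)  # the second-order update is markedly more accurate
```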
Acknowledgements
This work was supported by the DERA and the EPSRC (grant number GR/L10239).


Fig. 7: State variable sequence predictions for a network with one measured state variable (v2 ) and an unmeasured state variable (problem 3).
The lower and upper dashed lines in each panel are the same true v1 and v2 sequences (respectively) as shown in Figure 3. The lower solid line
is the sequence of the unmeasured state variable. The upper solid line is the sequence for v2 as predicted by the network. The v2 sequences
(network and true) have been offset by 1 vertical unit.


Fig. 8: Same plot as in Figure 7 but using a network trained from different initial random weights (problem 4).

Fig. 9: Network predictions at the final epoch (τ = 8) for a network with one measured state variable (v2) and an unmeasured state variable (problem 4). (a) Network v2 plotted against true v2. The rms error is 0.0095. (b) The unmeasured state variable learned by the network plotted against the v1 state variable, no values of which were given to the network in training.

Fig. 10: State variable sequence predictions for a network with one measured state variable (v2) and two unmeasured state variables (problem 5). The lower and upper dashed lines in each panel are the same true v1 and v2 sequences (respectively) as shown in Figure 3. The upper solid line at τ = 0 is the sequence for v2 as predicted by the network. The other two solid lines are the sequences of the unmeasured state variables. The v2 sequences (network and true) have been offset by 1 vertical unit.

Fig. 11: Network predictions at the final epoch (τ = 8) for a network with one measured state variable (v2) and two unmeasured state variables (problem 5). (a) The two unmeasured state variables (u1 as squares, u2 as stars) plotted against the true v1 state variable. (b) uc = 0.244 u1 + 0.970 u2 is the optimum linear combination of u1 and u2 which gives the best correlation with v2. (c) Network v2 against true v2. The rms error is 0.0119.
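An optimum linear combination of this kind can be obtained by least-squares regression of the target variable on (u1, u2), followed by normalisation of the coefficient vector to unit length, since the (centred) least-squares direction maximises the correlation. The sketch below uses synthetic data and assumed variable names, not the measurements from the paper.

```python
import numpy as np

def best_unit_combination(U, target):
    """Return unit-norm weights w maximising the correlation of U @ w with target.

    After centring, the least-squares solution of U w = target points in the
    direction of maximum correlation; scale does not affect correlation,
    so the vector is normalised to unit length.
    """
    Uc = U - U.mean(axis=0)
    tc = target - target.mean()
    w, *_ = np.linalg.lstsq(Uc, tc, rcond=None)
    return w / np.linalg.norm(w)

# Synthetic check: build a target that is exactly a known combination.
rng = np.random.default_rng(0)
U = rng.normal(size=(200, 2))          # columns play the roles of u1, u2
w_true = np.array([0.244, 0.970])      # coefficients as in the caption above
target = U @ w_true
w = best_unit_combination(U, target)   # recovers w_true up to normalisation
```

With noisy data the recovered direction is only approximate, but the same normalised regression gives the best-correlating combination.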

Fig. 12: Same plot as in Figure 10 but using a network trained from different initial random weights (problem 6).

Fig. 13: Network predictions from problem 6 at the final epoch (τ = 8). (a) Network v2 plotted against true v2. The rms error is 0.0111. (b) One of the unmeasured state variables learned by the network plotted against the v1 state variable, no values of which were given to the network in training.

