Constructing NARMAX Models Using ARMAX
Abstract
This paper outlines how a complex non-linear modelling problem can be decomposed into
a set of simpler linear modelling problems. Local ARMAX models valid within certain
operating regimes are interpolated to construct a global NARMAX (non-linear ARMAX)
model. Knowledge of the system behavior in terms of operating regimes is the primary
basis for building such models; hence the approach should not be considered as a pure
black-box approach, but as one that utilizes a limited amount of a priori system
knowledge. It is shown that a large class of non-linear systems can be modelled in this way,
and we indicate how to decompose the system's range of operation into operating regimes.
Standard system identification algorithms can be used to identify the NARMAX model,
and several aspects of the system identification problem are discussed and illustrated by a
simulation example.
1 Introduction
Modelling complex systems from first principles is in many cases resource demanding. In
some cases our system knowledge is so limited that detailed modelling is difficult. In other
cases, the instrumentation and logged data from the system are so sparse or noisy that
it is difficult to identify a large number of unknown physical parameters in the model.
Examples of this are found in, e.g., the metallurgical and biochemical process industries.
In some cases, resources can be saved by using black-box models describing the input/output
behavior of the system. Such models represent the controllable and observable part of the
system. The structure and parameters of black-box models have in general no direct inter-
pretation in terms of the physical properties of the system. The ARMAX model is a well-
known linear input/output model representation. The NARMAX (Nonlinear ARMAX)
model representation is an extension of the linear ARMAX model, and represents the sys-
tem by a nonlinear mapping of past inputs, outputs and noise terms to future outputs. In
this paper we discuss how NARMAX models can be represented, and in particular discuss
how a NARMAX model can be constructed from a set of ARMAX models.
We will concentrate on non-linear systems that are working in several operating regimes,
because systems that normally work within one operating regime may in many cases be
adequately described by a linear model. There are numerous examples of systems that must
work in several operating regimes, including most batch processes (Rippin 1989). Apart
from normal operating conditions, the control system may also have to take care of startup
and shutdown, operation during maintenance and faulty operation, which obviously lead
to different operating regimes.
Traditionally, the problem with multiple operating regimes is solved by non-linear first
principles models covering several operating regimes, gain scheduling, or simply by manual
or rule-based control of the system when operating outside the normal operating regimes.
From an engineering point of view, it may seem appealing to decompose the modelling
problem into a set of simpler modelling problems. This is exactly what we propose here:
First the system operation is decomposed into a set of operating regimes that are assumed
to cover the full range of operation we want our model to cover. Next, for each operating
regime we design a simple (typically linear) local model. It is usually not natural to define
the operating regimes as crisp sets, since there will usually be a smooth transition from
one regime to another, not a jump. Hence, it makes sense to interpolate the local models
in a smooth fashion to get a global model. The interpolation is such that the local model
that is assumed to be the best model at the current operating point will be given most
weight in the interpolation, while neighboring local models may be given some weight, and
local models corresponding to distant operating regimes will not contribute to the global
model at that operating point. To do the smooth interpolation at a given operating point,
we need to know which of the local models describe the system well around that operating
point. For that purpose, to each local model we associate a local model validity function,
i.e. a function that indicates the relative validity of the local models at a given operating
point.
The use of local linear models without interpolation, i.e. piecewise linear models, has been
suggested by several authors, including Skeppstedt, Ljung & Millnert (1992), Skeppstedt
(1988), Hilhorst, van Amerongen & Lohnberg (1991), Billings & Voon (1987), and Tong
& Lim (1980). A related technique is the use of splines (Friedman 1991) for representing
dynamic models (Psichogios, De Veaux & Ungar 1992). Splines are also local models,
but unlike piecewise linear models, there are constraints that enforce smoothness on the
boundaries between the local models. Different variations of interpolating memories (Tolle,
Parks, Ersu, Hormel & Militzer 1992, Lane, Handelman & Gelfand 1992, Omohundro
1987) are a related local modelling technique, where a number of input/output pairs of the
system are memorized and interpolated to give a model. Our approach can be thought of
as a generalization of these techniques, since we interpolate local models.
This paper is organized as follows: First, in section 2, we present a model representation
based on local models. Then we discuss the approximation capabilities of this represen-
tation, and show that a large class of non-linear systems can be represented. The notion
of operating regimes is introduced and we present a general result guiding the choice of
operating point vector. Thereafter, we discuss some practical aspects of modelling using
local models in section 3, and some aspects of system identification in section 4. In sec-
tion 5, the concepts are illustrated by a simulation example, and section 6 contains some
discussions and conclusions.
2 Model Representation
The NARMAX model representation

y(t) = f(y(t-1), ..., y(t-n_y), u(t-1), ..., u(t-n_u), e(t-1), ..., e(t-n_e)) + e(t)   (1)

is shown by Leontaritis & Billings (1985) and Chen & Billings (1989) to represent a large
class of discrete-time nonlinear dynamic systems. Here y(t) ∈ Y ⊆ R^m is the output
vector, u(t) ∈ U ⊆ R^r is the input vector, and e(t) ∈ E ⊆ R^m is the equation error. We
introduce the (m(n_y + n_e) + r n_u)-dimensional information vector

ψ(t-1) = [y^T(t-1), ..., y^T(t-n_y), u^T(t-1), ..., u^T(t-n_u), e^T(t-1), ..., e^T(t-n_e)]^T

where ψ(t-1) lies in the set Ψ = Y^{n_y} × U^{n_u} × E^{n_e}. This enables us to write equation (1)
in the form

y(t) = f(ψ(t-1)) + e(t)   (2)
Provided that necessary smoothness conditions on f : Ψ → Y are satisfied, a general way
of representing functions is by series expansions. Using a 1st order Taylor series expanded
about the system's equilibrium point yields a standard ARMAX model. Second-order
Taylor expansions are possible, while higher-order Taylor expansions are not very useful
in practice because the number of parameters in the model increases drastically with the
expansion order, and because of the poor extrapolation and interpolation capabilities of
higher-order polynomials. Splines offer a solution to this problem, but the approxima-
tion in higher dimensional spaces may be difficult due to the smoothness constraints on
the boundaries. Chen, Billings & Grant (1990a) have proposed to use a sigmoidal neu-
ral network expansion, Billings & Voon (1987) use a piecewise linear model, and Chen,
Billings, Cowan & Grant (1990b) have applied a radial basis function expansion as a means
of representing f. A generic model representation based on local models was introduced
in (Johansen & Foss 1992b, Johansen & Foss 1992a), inspired by work by Stokbro, Hertz
& Umberger (1990) and Jones et al. (1991). We will in the following study this model
representation in detail.
Interpolating the local models with normalized weights gives the approximation

f̂(ψ) = Σ_{i=0}^{N-1} f̂_i(ψ) w̃_i(ψ)   (6)

In this equation, we can interpret w̃_i as a function that gives a value close to 1 in the parts
of Ψ where the function f̂_i is a good approximation to f, and close to zero elsewhere. By
definition of w̃_i we know that Σ_{i=0}^{N-1} w̃_i(ψ) = 1 for all ψ ∈ Ψ, and we call the functions w̃_i
interpolation functions because they are used to interpolate local models f̂_i. We call f̂_i a
local model since it is assumed to be an accurate description of the true f locally (where
ρ̃_i is not close to zero).
The set of all functions of the form (6) with local models of polynomial order p and smooth
interpolation functions is denoted

F̃_p = { f̂ : Ψ → Y | f̂(ψ) = Σ_{i=0}^{N-1} f̂_i(ψ) w̃_i(ψ) }
At the extreme, 0th order Taylor expansions of f about ψ_i ∈ Ψ_i may be used to define f̂_i:

f̂_i(ψ) = f(ψ_i) = θ_i   (7)

where θ_i is a parameter vector. Such a simple local model is closely related to an interpo-
lating memory (Tolle et al. 1992) and requires a large number of interpolation functions,
since this means that the value f(ψ_i) is extrapolated locally. This case is in fact identi-
cal to neural networks with localized receptive fields (Moody & Darken 1989, Stokbro et
al. 1990). Considering {w̃_i}_{i=0}^{N-1} as a set of basis functions, the method is also similar to
radial basis function expansions (Broomhead & Lowe 1988), for the following reason: if
the functions ρ̃_i are chosen as local radial functions, the normalized function w̃_i defined by
(5) will not be radial in general, but it will qualitatively have much the same shape and
features as ρ̃_i, except near the boundary of Ψ.
A 1st order Taylor expansion of f about ψ_i provides better extrapolation and interpolation
than the 0th order expansion (7). Assuming the 1st derivative of f exists, the local models
are given by

f̂_i(ψ) = f(ψ_i) + ∇f(ψ_i)(ψ - ψ_i) = θ_i + Θ_i(ψ - ψ_i)   (8)

where θ_i is a parameter vector and Θ_i is a parameter matrix. Observe that (8) is actually
an ARMAX model resulting from a linearization about ψ_i. Both the Weighted Linear
Maps of Stokbro et al. (1990), Stokbro (1991) and Stokbro & Umberger (1990) and the
Connectionist Normalized Linear Spline Networks of Jones et al. (1991) and Jones, Lee,
Barnes, Flake, Lee, Lewis & Qian (1989) use a 1st order expansion locally. This repre-
sentation makes it possible to build a NARMAX model by interpolating between a set of
ARMAX models.
Higher order local models can of course also be used. Furthermore, there is no requirement
that all the local models should have the same structure. Some of the local models may
be based on first principles modelling, while others may be generic black-box models.
Johansen & Foss (1992c) use this approach to integrate first principles models with neural
network type models.
Approximation Properties
It seems reasonable that the approximation can be made arbitrarily good by choosing a
sufficient number of local models. This is indeed the case, as illustrated in the following.
We use the following norm to measure the approximation accuracy

||f - f̂||_∞ = sup_{ψ∈Ψ} ||f(ψ) - f̂(ψ)||_2

where || · ||_2 denotes the Euclidean norm.
The (p+1)-th derivative of the vector function f at the point ψ is denoted by ∇^{p+1} f(ψ).
Assume f is continuously differentiable p+1 times, and {f̂_i}_{i=0}^{N-1} are local models equal
to the first p terms of the Taylor series expansion of f about ψ_i. For any ψ ∈ Ψ, we have

f(ψ) - f̂(ψ) = Σ_{i=0}^{N-1} (f(ψ) - f̂_i(ψ)) w̃_i(ψ)

If we assume ||∇^{p+1} f(ψ)|| < M for all ψ ∈ Ψ, where || · || denotes the induced operator
norm, we obtain by Taylor's theorem

||f(ψ) - f̂(ψ)||_2 < Σ_{i=0}^{N-1} (M / (p+1)!) ||ψ - ψ_i||_2^{p+1} w̃_i(ψ)
In order to ensure that this norm is smaller than an arbitrary ε > 0, we must ensure that
for any ψ ∈ Ψ the following condition holds

Σ_{i=0}^{N-1} ||ψ - ψ_i||_2^{p+1} ρ̃_i(ψ) < (ε (p+1)! / M) Σ_{i=0}^{N-1} ρ̃_i(ψ)   (9)

Defining the set of functions {g_i : Ψ → R}_{i=0}^{N-1} by

g_i(ψ) = ||ψ - ψ_i||_2^{p+1} - ε (p+1)! / M

and rewriting (9) gives the following condition that must hold for any ψ ∈ Ψ

Σ_{i=0}^{N-1} g_i(ψ) ρ̃_i(ψ) < 0   (10)

or equivalently, dividing (10) by Σ_{i=0}^{N-1} ρ̃_i(ψ),

Σ_{i=0}^{N-1} g_i(ψ) w̃_i(ψ) < 0   (11)
The problem is now to find the conditions on N and the functions {ρ̃_i}_{i=0}^{N-1} that ensure that
equation (10) holds for any given ε > 0. A geometric interpretation of (10) is given in Figure
1. Certainly, this equation holds if the negative contribution of one term g_i(ψ)ρ̃_i(ψ) in (10)
dominates the (possibly positive) contributions of all other terms. A necessary condition
is g_i(ψ)ρ̃_i(ψ) → 0 as ||ψ||_2 → ∞. This will certainly be ensured if we choose ρ̃_i as an
exponential or Gaussian function.
Notice that the shapes of the g_i-functions are fixed and given by the specifications. We are,
however, free to choose the location and number N of local models. Let us choose the set
{ψ_i}_{i=0}^{N-1} so large and "sufficiently dense in Ψ" that at least one of the functions {g_i}_{i=0}^{N-1}
will be negative at any ψ ∈ Ψ. Then the functions {g_i}_{i=0}^{N-1} are fixed, and we must choose
the ρ̃_i-functions such that (10) holds.
This can be done in several ways. In the limit when the widths of the ρ̃_i-functions go to
zero, the interpolation functions w̃_i will approach step-functions, as shown in Figure 2. The
model will then approach a piecewise constant model if p = 0, a piecewise linear model if
p = 1, etc. In this limit, at any ψ ∈ Ψ there will exist a j such that

w̃_i(ψ) = 1 if i = j,  w̃_i(ψ) = 0 if i ≠ j

By the choice of {ψ_i}_{i=0}^{N-1} we know that g_j(ψ) < 0, and since w̃_i(ψ) = 0 for i ≠ j, (11) will
hold. We can now provide a result for the case when Ψ is a bounded set:
Theorem 1 Suppose given any integer p ≥ 0, and suppose f has continuous (p+1)-th
derivative. If Ψ is bounded, then for any ε > 0 there is an f̂ ∈ F̃_p (with finite N, which
may depend on ε) such that

||f - f̂||_∞ < ε   (12)

Proof: Since ∇^{p+1} f(ψ) is continuous, it is bounded (by M < ∞) on Ψ. Since Ψ is
bounded, a finite N is sufficient to ensure that one g_i-function is negative at any point.
Since N is finite, we do not have to go to the limit and make {w̃_i}_{i=0}^{N-1} step-functions, but
can stop when we are sufficiently close. Then {ρ̃_i}_{i=0}^{N-1} can be chosen as smooth functions
such that (11) holds. Since ψ ∈ Ψ was arbitrary, the theorem is proved. □
This is an existence theorem. However, the proof is constructive and gives indications on
how to construct the approximator. In order to use this proof to formulate an upper bound
on the approximation error, we introduce the following definition of distance between sets,
similar to the Hausdorff metric:

Definition 1 Assume A and B are two nonempty subsets of a vector space. Then the
distance between the sets is defined as

D(A, B) = sup_{b∈B} inf_{a∈A} ||a - b||_2  □
The crux in the proof of Theorem 1 is that at any point ψ ∈ Ψ one of the g_i-functions is
negative and that the ρ̃_i-functions are chosen such that at any point ψ ∈ Ψ, a negative
term g_i(ψ)ρ̃_i(ψ) will dominate the sum (10). At least one g_i-function will be negative at
any ψ ∈ Ψ if the following condition holds

D({ψ_i}, Ψ) ≤ (ε (p+1)! / M)^{1/(p+1)}   (13)

If the set {ψ_i} is dense in Ψ, this distance will be zero. The term "sufficiently dense" used
informally above means that the set {ψ_i}_{i=0}^{N-1} should be chosen such that (13) holds for
the given ε.
Theorem 2 Suppose given an integer p ≥ 0. If Ψ is bounded and f has bounded (p+1)-th
derivative, i.e. ||∇^{p+1} f(ψ)|| ≤ M for all ψ ∈ Ψ, then for any f̂ ∈ F̃_p with finite N and
sufficiently narrow functions {ρ̃_i}_{i=0}^{N-1}, an upper bound on the approximation error is given
by

||f - f̂||_∞ ≤ (M / (p+1)!) (D({ψ_i}, Ψ))^{p+1}   (14)

Proof: (13) will hold for ε equal to the right-hand side of (14). From the previous discus-
sion, it is evident that (11) holds for any ψ ∈ Ψ. Hence, ||f - f̂||_∞ will be bounded by ε,
and the result follows. □
This is under the condition that the ρ̃_i-functions are chosen narrow. This bound is conser-
vative, meaning that for ρ̃_i-functions that are neither too narrow nor too wide, one may
expect better accuracy. However, if the functions {ρ̃_i}_{i=0}^{N-1} are not narrow, the result does
not hold.
From (14) we see that if the polynomial order p of the local models is increased, then the
accuracy will improve. If Ψ is not bounded and M > 0, N must be infinite in order to
guarantee a bounded error.
If f does not satisfy the smoothness conditions in Theorem 1, the proof obviously does
not hold. If, however, f is such that it can be approximated arbitrarily well by a suffi-
ciently smooth function, then we can show that f can be approximated arbitrarily well by
interpolating local models. In particular we have:

Corollary 1 The results of Theorem 1 also hold if the smoothness assumption on f is
relaxed to assuming only continuity. In other words, the set F̃_p is dense in the set of
continuous functions from Ψ into Y.

Proof: By the Weierstrass approximation theorem, e.g. (Stromberg 1981), for any ε >
0 there exists a polynomial f̃ such that ||f - f̃||_∞ ≤ ε/2. By Theorem 1, f̃ can be
approximated by an f̂ ∈ F̃_p on the bounded set Ψ such that ||f̃ - f̂||_∞ < ε/2. Using the
triangle inequality we get ||f - f̂||_∞ < ε. □
Example 1
Assume p = 1, i.e. the local models are ARMAX models. Then (14) can be written

||f - f̂||_∞ ≤ (M/2) (D({ψ_i}, Ψ))^2 = ε   (15)
Example 2
With a simple example we illustrate the use of Theorem 2. Consider the function f :
[0, 2] → R given by f(ψ) = ψ² + 1. Assume that we have two local linear models located
at ψ_0 = 0.5 and ψ_1 = 1.5. Then D({0.5, 1.5}, [0, 2]) = 0.5, p = 1 and M = 2. Theorem
2 predicts the bound ε = 0.25 on the approximation accuracy. As shown by Figure
3, this bound is exact when using infinitely narrow functions ρ̃_i, i.e. a piecewise linear
approximation. The reason for this is that M = f''(ψ) = 2 for all ψ, hence there are no
regions where f is "less nonlinear". As we shall see later, better approximations can be
achieved using well-chosen ρ̃_i-functions. From this figure we also see that if the local linear
models are chosen not as first order Taylor expansions, but on the basis of e.g. a least
squares regression, improvement might also be achieved. □
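The exactness of the bound in Example 2 can be checked numerically. The sketch below evaluates the piecewise linear approximation (infinitely narrow ρ̃_i, i.e. nearest-centre assignment) on a fine grid; the grid resolution is our choice:

```python
# Numeric check of Example 2: f(psi) = psi**2 + 1 on [0, 2], two local
# models equal to 1st order Taylor expansions about 0.5 and 1.5, combined
# as a piecewise linear model (infinitely narrow rho_i).
def f(psi):
    return psi ** 2 + 1

def taylor1(psi, psi_i):
    # f(psi_i) + f'(psi_i)*(psi - psi_i)
    return f(psi_i) + 2 * psi_i * (psi - psi_i)

centres = [0.5, 1.5]
grid = [k / 1000 * 2 for k in range(1001)]
# nearest-centre assignment plays the role of step-function weights
errors = [abs(f(p) - taylor1(p, min(centres, key=lambda c: abs(p - c))))
          for p in grid]
max_err = max(errors)
# Theorem 2: bound = (M/(p+1)!) * D**(p+1) = (2/2) * 0.5**2 = 0.25
```

The maximum error is attained at ψ = 0, 1, 2, where the distance to the nearest centre equals D = 0.5, and it matches the predicted bound 0.25 exactly.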
Since the system function f can be approximated arbitrarily well, we are able to make
arbitrarily good predictions on a finite horizon if there is no noise, provided the initial values
are correct and the inputs and outputs are such that they give ψ vectors that remain in Ψ
(Polycarpou, Ioannou & Ahmed-Zaid 1992). However, it is well known that the solutions
to some difference equations are sensitive with respect to initial values or modelling errors.
Examples of such systems are chaotic or unstable systems.
Operating regimes
In the rest of this paper we will usually assume p = 1, i.e. we use linear ARMAX models
locally to build a non-linear NARMAX model. In the representation (6) the interpolation
functions {w̃_i}_{i=0}^{N-1} are defined on the set Ψ. This is a subset of the information-space.
If the information-space has high dimension (as it often has), the curse of dimensionality
problem arises. This problem, first described by Bellman (1961), is essentially that
the number of local models needed to uniformly cover a region of this space increases
exponentially with the dimension of the space. In practice, uniform coverage is usually not
necessary, but the problem is still severe. In some cases the interpolation functions may
be defined on a space of smaller dimension. This is our motivation for introducing the
terms operating regime and operating point. First, we define Φ to be the set of operating
points. Motivated by the fact that we want to model a nonlinear system with a set of
linear models, it is convenient to define an operating regime as a subset of Φ where the
system behaves approximately linearly.

Definition 2 An operating regime is a set of operating points Φ_i ⊆ Φ where the system
behaves approximately linearly.

A model validity function ρ_i : Φ → [0, 1] is smooth and satisfies ρ_i(φ) ≈ 1 for φ ∈ Φ_i, and
goes to zero outside Φ_i. The interpolation functions w_i : Φ → [0, 1] are now defined as

w_i(φ) = ρ_i(φ) / Σ_{j=0}^{N-1} ρ_j(φ)

assuming that at every operating point φ ∈ Φ, not all model validity functions ρ_i vanish.
In many cases there will exist a function H : Ψ → Φ such that at any t we have φ(t) = H(ψ(t)).
The function H will typically be a projection, i.e. Φ will be in a space of lower dimension
than Ψ. In cases where the operating point is calculated on the basis of filtered or estimated
quantities, the relationship between ψ(t) and φ(t) is more complex, and must be described
by an operator H. This may be the case when φ is estimated using a recursive algorithm or
a recursive filter to suppress noise. Although very important, this complicates the analysis
considerably, and we will not consider this case here, but leave it as a topic for future
research.
To summarize, the representation we address at this stage is

ŷ(t) = f̂(ψ(t-1)) = Σ_{i=0}^{N-1} f̂_i(ψ(t-1)) w_i(φ(t-1))   (16)

where the local models

f̂_i(ψ(t-1)) = θ_i + Θ_i(ψ(t-1) - ψ_i)   (17)

are ARMAX models. We define the set

F_p = { f̂ : Ψ → Y | f̂(ψ) = Σ_{i=0}^{N-1} f̂_i(ψ) w_i(φ) }

where p is the polynomial order of f̂_i, the interpolation functions {w_i}_{i=0}^{N-1} are smooth, and
φ = H(ψ). Now we want to state some general results regarding the transform H from the
information vector to the operating point vector. In general, f can be written as an affine
function of some of its arguments. We rearrange the elements of ψ into ψ^T = [ψ_L^T, ψ_N^T] such
that

f(ψ) = f(ψ_L, ψ_N) = f_1(ψ_N) + f_2(ψ_N) ψ_L   (18)

Assume Ψ_L and Ψ_N are the subsets of the information-space corresponding to ψ_L and ψ_N,
respectively. f_1 : Ψ_N → R^m and f_2 : Ψ_N → R^{m×m} are non-linear vector- and matrix-
valued functions, respectively. Our principal result guiding the choice of φ is the following,
which indicates that φ must be chosen such that it captures the system's non-linearities:
Theorem 3 Assume f given in (18) is continuous, and Ψ is bounded. Then for any ε > 0
there is an f̂ ∈ F_1 with φ = ψ_N and finite N such that ||f - f̂||_∞ < ε.

Proof: Fix an arbitrary ψ ∈ Ψ such that ψ^T = [ψ_L^T, ψ_N^T].

||f(ψ) - f̂(ψ)||_2 = ||f(ψ_N, ψ_L) - Σ_{i=0}^{N-1} f̂_i(ψ_N, ψ_L) w_i(ψ_N)||_2
                 = ||Σ_{i=0}^{N-1} (f_1(ψ_N) + f_2(ψ_N)ψ_L - f̂_i(ψ_N, ψ_L)) w_i(ψ_N)||_2
                 = ||Σ_{i=0}^{N-1} (f_1(ψ_N) - f̂_{Ni}(ψ_N) + f_2(ψ_N)ψ_L - f̂_{Li}(ψ_L)) w_i(ψ_N)||_2

In the last line we split the linear function f̂_i : Ψ → R^m into two linear functions f̂_{Ni} :
Ψ_N → R^m and f̂_{Li} : Ψ_L → R^m. Now we choose f̂_{Li}(ψ_L) = Γ_i ψ_L, where Γ_i is a not yet
specified constant parameter matrix. Then we have

||f(ψ) - f̂(ψ)||_2 ≤ ||Σ_{i=0}^{N-1} (f_1(ψ_N) - f̂_{Ni}(ψ_N) + (f_2(ψ_N) - Γ_i)ψ_L) w_i(ψ_N)||_2
                  ≤ ||Σ_{i=0}^{N-1} (f_1(ψ_N) - f̂_{Ni}(ψ_N)) w_i(ψ_N)||_2 + ||Σ_{i=0}^{N-1} (f_2(ψ_N) - Γ_i) ψ_L w_i(ψ_N)||_2
                  ≤ ||f_1(ψ_N) - Σ_{i=0}^{N-1} f̂_{Ni}(ψ_N) w_i(ψ_N)||_2 + ||ψ_L||_2 ||f_2(ψ_N) - Σ_{i=0}^{N-1} Γ_i w_i(ψ_N)||

The first term in this equation can be made arbitrarily small by Corollary 1 with p = 1,
since f̂_{Ni} is linear. Since Ψ is bounded, the second term can be made arbitrarily small by
the same corollary with p = 0 through the choice of Γ_i. Hence, for any ε > 0 we can make
||f(ψ) - f̂(ψ)||_2 < ε, and since ψ is arbitrary we get ||f - f̂||_∞ < ε. □
Using the same notation as before, the attainable approximation error is bounded by

||f - f̂||_∞ ≤ (M/2) (D({φ_i}, Φ))^2 + 2 ||ψ_L||_sup D({φ_i}, Φ) = ε   (19)

where

||ψ_L||_sup = sup_{ψ_L ∈ Ψ_L} ||ψ_L||_2

The motivation for introducing the operating point is that in many cases this vector
may be of a significantly lower dimension than ψ. With a fixed N the first term in (19)
will be significantly smaller than the corresponding term

(M/2) (D({ψ_i}, Ψ))^2

However, the second term in (19) will make the error increase, but in most cases when Φ
is of smaller dimension than Ψ, the approximation (16)-(17) will give better accuracy than
(6), (8). Another important fact is that a low dimension of Φ makes it easier to partition
the set into operating regimes.
Example 3
If f is linear in the control variables u(t-1), then φ need not contain any
u(t-1)-terms. If we have the system

y(t) = f(y(t-1), u(t-1)) = f_1(y(t-1)) + f_2(y(t-1)) u(t-1)

we can choose φ(t-1) = y(t-1) without losing accuracy in the approximation. □
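A small sketch of the situation in Example 3. The system below, y(t) = sin(y(t-1)) + (1 + 0.5 y(t-1)²) u(t-1), is our own illustrative choice of f_1 and f_2, as are the centres and Gaussian widths; the point is that the interpolation weights depend only on the operating point φ = y(t-1), while each local model is affine in both y and u, so the global model remains exactly affine in u:

```python
import math

# Hypothetical input-affine system: y(t) = f1(y) + f2(y)*u with
# f1(y) = sin(y) and f2(y) = 1 + 0.5*y**2.
def f(y, u):
    return math.sin(y) + (1 + 0.5 * y ** 2) * u

def rho(phi, c, sigma=0.5):
    # Gaussian model validity function on the operating point phi = y(t-1)
    return math.exp(-0.5 * ((phi - c) / sigma) ** 2)

centres = [-1.0, 0.0, 1.0]

def f_hat(y, u, params):
    """params[i] = (theta, a, b): local model theta + a*(y - c_i) + b*u."""
    r = [rho(y, c) for c in centres]
    s = sum(r)
    out = 0.0
    for (theta, a, b), c, ri in zip(params, centres, r):
        out += (theta + a * (y - c) + b * u) * ri / s
    return out

# Local linearizations of f about (c, u=0):
# theta = sin(c), a = cos(c), b = 1 + 0.5*c**2
params = [(math.sin(c), math.cos(c), 1 + 0.5 * c ** 2) for c in centres]
```

Because u enters every local model linearly and the weights ignore u, the difference f_hat(y, u+Δ) - f_hat(y, u) is exactly proportional to Δ, which is the property that lets φ omit the u(t-1)-terms.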
We now generalize this result for local expansions of (polynomial) order p. We split ψ into
two parts and rearrange ψ^T = [ψ_L^T, ψ_H^T] such that

f(ψ) = f(ψ_L, ψ_H) = f_{H1}(ψ_H) + f_{H2}(ψ_H) f_L(ψ_L)   (20)

where f_L : Ψ_L → R^m is of polynomial order less than or equal to p, while f_{H1} : Ψ_H → R^m and
f_{H2} : Ψ_H → R^{m×m} may be of higher order.

Theorem 4 Suppose f given in (20) is continuous and Ψ is a bounded set. Then for any
ε > 0 there is an f̂ ∈ F_p with φ = ψ_H and finite N such that ||f - f̂||_∞ < ε.

Proof: The proof follows the same idea as the proof of Theorem 3, but requires some
tedious notation, and is therefore omitted. □
Some Comparisons
Using local linear models we can write the model representation (16)-(17) as

ŷ(t) = Σ_{i=0}^{N-1} (θ_i + Θ_i(ψ(t-1) - ψ_i)) w_i(φ(t-1))
     = Σ_{i=0}^{N-1} (θ_i - Θ_i ψ_i) w_i(φ(t-1)) + ( Σ_{i=0}^{N-1} Θ_i w_i(φ(t-1)) ) ψ(t-1)
     = μ(φ(t-1)) + Θ(φ(t-1)) ψ(t-1)
This means that the non-linear model can be written as an apparently linear model, where
the parameters depend on the operating point. Priestley (1981) introduced state-
dependent models, which can be written

ŷ(t) = μ(x(t-1)) + Θ(x(t-1)) ψ(t-1)   (21)

where x is the "state-vector", μ is a state-dependent vector, and Θ is a state-dependent
matrix. In general x = ψ was suggested, but it was also observed that this might be
redundant, so a simpler vector may be used to describe the parameter dependence. The
present approach with x = φ has obvious similarities. Billings & Voon (1987) discuss
the use of models with signal-dependent parameters, which are similar to (21) with x = ω,
where ω(t) is the auxiliary signal. In (Billings & Voon 1987) polynomials were used to
define the dependence of the parameters on the auxiliary signal, i.e. μ(ω(t)) and Θ(ω(t))
are polynomials in ω(t). A similar approach was proposed by Cyrot-Normand & Mien
(1980). Our approach is also similar, but system knowledge is applied to choose the ρ_i-
functions, which again define μ(φ(t)) and Θ(φ(t)). The Threshold AR Model by Tong &
Lim (1980) can also be written in the form (21) with x(t-1) = y(t-1) and

μ(y(t-1)) = μ_1 if y(t-1) ∈ Y_1,  μ_2 if y(t-1) ∈ Y_2
Θ(y(t-1)) = Θ_1 if y(t-1) ∈ Y_1,  Θ_2 if y(t-1) ∈ Y_2

where Y = Y_1 ∪ Y_2. Here the parameters are switched between two possible parameter sets,
and the decision is based on the value of y(t-1). The resulting model is a piecewise linear
model, related to our approach if φ(t-1) = y(t-1) and the interpolation functions
are step-functions.
The notion of operating points and model validity functions offers a complementary method
for parameterizing the state-dependence of the parameters given in (Priestley 1988).
Takagi & Sugeno (1985) suggested a fuzzy logic based technique for combining a set of
linear models into a global model in a smooth fashion. It turns out that if the operating regimes
Φ_i are viewed as fuzzy sets with membership functions equal to the model validity functions,
then inference on a rulebase of the form

IF φ(t-1) ∈ Φ_i THEN ŷ(t) = f̂_i(ψ(t-1))

gives a resulting global model of the same form as the one analysed in the present paper,
provided the fuzzy operations are properly defined. This suggests the use of fuzzy sets
and rules as a means of defining the operating regimes and local model validity functions.
This is appealing since it gives a direct method of representing the empirical knowledge
the engineers and operators have about the system and local models.
A related non-linear modelling approach is radial basis-functions (RBF) (Powell 1987,
Broomhead & Lowe 1988). Using RBFs, a non-linear function may be modelled as

f̂(ψ) = Σ_{i=0}^{N-1} θ_i r_i(||ψ - ψ_i||)

where r_i : R_+ → R is typically chosen as a Gaussian function. The relationship between
some of these approaches is best illustrated by an example.
Example 4
We consider again the function in Example 2, and the following 4 modelling approaches:

1. Two piecewise linear models, as in Example 2, centered at ψ_0 = 0.5 and ψ_1 = 1.5. This
may also be interpreted as a thresholded AR model.

2. We choose Gaussian model validity functions

ρ_i(φ) = exp( -(1/2) ((φ - φ_i)/σ_i)^2 )   (22)

with φ_0 = 0.5, φ_1 = 1.5, σ_i = 0.52, and use 2 local linear models.

3. 5 local 0th order models centered at φ_0 = 0, φ_1 = 0.5, φ_2 = 1, φ_3 = 1.5, φ_4 = 2, and
Gaussian model validity functions with σ_i = 0.52.

4. A radial basis-function expansion with 5 Gaussian basis-functions centered at φ_0 =
0, φ_1 = 0.5, φ_2 = 1, φ_3 = 1.5, φ_4 = 2, and σ_i = 0.52.

Linear regression is used to estimate the model parameters, and the results are shown in
Figure 4. By comparing Figure 4a with 4b, it is obvious that interpolating local linear
models using well chosen model validity functions can improve the accuracy compared to
piecewise linear models.
Notice that f is now defined on [-1, 3], while data on [0, 2] is used for parameter estima-
tion. The extrapolation capabilities can thus be evaluated, and we see that the local linear
approximations give 1st order extrapolation, as would be expected, while the local 0th
order models give 0th order extrapolation. The RBF approach does not give any extrap-
olation at all, since all basis-functions go to 0. A feed-forward neural net with one hidden
layer (of sigmoidal basis-functions) would give an extrapolation qualitatively similar to the
0th order models in Figure 4c. As we see, there are fundamental differences concerning the
extrapolation capabilities. □
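The different extrapolation behaviors can be made concrete with a short sketch. The centres and widths mirror Example 4, but the RBF weights below are crude illustrative values rather than the regression estimates used for Figure 4:

```python
import math

# Extrapolation comparison for f(psi) = psi**2 + 1 fitted on [0, 2]:
# a plain RBF expansion vs normalized interpolation of local linear models.
def rbf(psi, centres, weights, sigma=0.52):
    # f_hat(psi) = sum_i theta_i * r_i(|psi - psi_i|), Gaussian r_i
    return sum(w * math.exp(-0.5 * ((psi - c) / sigma) ** 2)
               for w, c in zip(weights, centres))

def local_linear_interp(psi, centres, sigma=0.52):
    # local models are the tangent lines of f(psi) = psi**2 + 1
    r = [math.exp(-0.5 * ((psi - c) / sigma) ** 2) for c in centres]
    s = sum(r)
    return sum(((c ** 2 + 1) + 2 * c * (psi - c)) * ri / s
               for c, ri in zip(centres, r))

centres = [0.0, 0.5, 1.0, 1.5, 2.0]
weights = [c ** 2 + 1 for c in centres]  # crude, illustrative RBF weights
```

Far outside [0, 2] the RBF output decays to zero because every basis-function vanishes, while the normalized interpolation keeps extrapolating along the tangent line of the nearest local model (at ψ = 10 that tangent, taken at c = 2, gives 5 + 4·8 = 37).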
3 Modelling
The representation (16)-(17) is appealing since ARMAX models are simple: we represent
a complex nonlinear system with a number of simple linear systems. A piecewise linear
model has the same features, but unless we force the model to be continuous on the
boundaries between the local models, the resulting global model may be discontinuous,
which may be undesirable in some cases. Enforcing continuity poses restrictions on the
parameter space, reducing the representational power of the model class. The same problem
arises to an even larger extent using e.g. cubic splines. Unlike the piecewise linear model,
the model (16) will be smooth when the model validity functions are chosen as smooth
functions. In practice, there are at least 4 ways a NARMAX model can be constructed using
local ARMAX models and interpolation:
1. First we choose a set of operating regimes Φ_i that correspond to the normal equi-
librium points and major transient operating regimes of the system in question.
This means that we partition the set of operating points into parts where we believe
that the system will behave linearly. Then we perform experiments on the system
and identify a local ARMAX model f̂_i for each operating regime, using cost indices
corresponding to each local model

J_i = (1/m_i) Σ_{t=1}^{m_i} (y(t) - f̂_i(ψ(t-1)))^T Λ_i (y(t) - f̂_i(ψ(t-1)))   (23)

where m_i is the number of data-points for regime Φ_i, and Λ_i is a scaling matrix.
Then the local models are integrated using system knowledge to choose sensible
model validity functions {ρ_i}_{i=0}^{N-1} such that the set Φ is covered, as shown in Figure 5.
The choice of these functions will strongly influence the accuracy of the global model,
since the local ARMAX models are identified before the functions ρ_i are chosen.
2. Instead of choosing the model validity functions {ρ_i}_{i=0}^{N-1} empirically, we may try
to find optimal validity functions after we have found the local ARMAX models.
Choosing fixed structures for the functions {ρ_i}_{i=0}^{N-1} is necessary to make the problem
finite-dimensional. Keeping the parameters of the identified ARMAX models fixed,
we may search for optimal parameters of {ρ_i}_{i=0}^{N-1} using the same data used for
identifying the ARMAX models, and a global performance index. This leads
to a two-step optimization procedure.
3. A more direct approach is to first choose the model validity functions corresponding
to the operating regimes Φ_i using system knowledge. Keeping the model validity
functions fixed, we minimize the global index

J = (1/m) Σ_{t=1}^{m} (y(t) - f̂(ψ(t-1)))^T Λ (y(t) - f̂(ψ(t-1)))   (24)

with respect to all parameters in the local models. Here m is the number of data-
points available, and Λ is a suitable scaling matrix. Now the shape of the model
validity functions will be taken into consideration when finding the optimal param-
eters for the local models. This has two side effects: First, the accuracy of the
global model is less sensitive to the choice of {ρ_i}_{i=0}^{N-1}. Second, the local models (17)
are influenced by the user-specified functions {ρ_i}_{i=0}^{N-1}. Hence, they are no longer
linearizations of f about {ψ_i}_{i=0}^{N-1}.
4. An obvious improvement to methods 2 and 3 would be to search for the ARMAX and
local model validity function parameters simultaneously. This leads to a complex
non-linear programming problem.
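As an illustration of how the interpolated global model is evaluated once the local models and validity functions are chosen, the sketch below (all names and numbers are hypothetical, not the paper's) combines two local linear models of $f(\phi) = \phi^2 + 1$ (cf. figure 3), using Gaussian model validity functions normalized into interpolation weights $w_i(\phi)$:

```python
import numpy as np

def validity(phi, centers, widths):
    """Unnormalized Gaussian model validity functions rho_i(phi)."""
    return np.exp(-0.5 * ((phi - centers) / widths) ** 2)

def narmax_predict(phi, z, thetas, centers, widths):
    """Interpolate local linear predictions z @ theta_i with weights
    w_i(phi) obtained by normalizing the validity functions."""
    rho = validity(phi, centers, widths)
    w = rho / rho.sum()                          # weights sum to one
    return w @ np.array([z @ th for th in thetas])

# Hypothetical setup: two local models of f(phi) = phi**2 + 1,
# linearized about phi = 0.5 and phi = 1.5.
centers = np.array([0.5, 1.5])
widths = np.array([0.5, 0.5])
thetas = [np.array([0.75, 1.0]),    # f(0.5)=1.25, slope 1.0: 1.0*phi + 0.75
          np.array([-1.25, 3.0])]   # f(1.5)=3.25, slope 3.0: 3.0*phi - 1.25
phi = 1.0
z = np.array([1.0, phi])            # regressor [1, phi]
y_hat = narmax_predict(phi, z, thetas, centers, widths)   # 1.75 (true value 2)
```

Midway between the two linearization points, both local models contribute equally; the interpolation error there reflects the curvature of $f$ that neither local model captures.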
Required A Priori System Knowledge

When building models, different kinds of system knowledge must be available. For AR-
MAX models, one must first estimate the dominant time-constants of the system to choose
the sampling interval. Second, the ARMAX model order must be chosen, and the structure
of the system disturbances must be known in order to select the MA-part of the model.
When building NARMAX models, it is in addition necessary to find a suitable structure
for the system function $f$. First principles knowledge can be applied here, if available.
We have proposed a generic structure that does not require first principles modelling. It is,
however, not a completely black-box approach, as some limited system knowledge must be
included. In order to use the local modelling approach introduced here, a priori knowledge
in terms of operating regimes must be available: one must be able to estimate operating
regions in which the system will behave approximately linearly.
In general, the technique with local models and interpolation may be used in an elegant
fashion to integrate first principles models with black-box models, since it is completely
feasible that some of the local models are derived from first principles, while others are
black-box models (Johansen & Foss 1992c). In operating regimes where the dominating
physical phenomena are well understood and possible to model, and in operating regimes
where the data is so sparse that black-box modelling is not possible, it makes sense to
use local first principles models. In the remaining regimes, black-box models can be
constructed, and the proposed technique with operating regime decomposition can be
applied to integrate the different models. Of course, in regimes where we have both limited
data and limited knowledge, modelling is impossible, and the best we can hope for is some
reasonable extrapolation of the neighboring local models into those regimes.
Defining the operating point in a suitable manner is important. If the local models are
linear, we have shown in Theorem 3 that the operating point must capture the system's
non-linearities. Given a set of data from the system, tests for linear relationships between
some inputs and the outputs are therefore of great interest (Haber 1985). In the case of
signal-dependent piecewise linear models, it is observed by Billings & Voon (1987) that
the model may be input-sensitive. This must certainly be expected if the data used for
identification does not cover the full range of operation. Input sensitivity and biased
models may also result if the operating point vector is not suitably chosen, i.e. if only
some of the non-linearities are captured by the operating point.
4 Identification

First we consider identification of local model parameters based on the local cost indices
(23), and second we consider the global cost index (24). Finally, we consider identification
of local model validity function parameters and model structure identification.
The cost index $J_i$ associated with the local model can be written as

$$J_i = \frac{1}{m_i} \sum_{t=1}^{m_i} \epsilon_i^T(t)\, \Lambda_i\, \epsilon_i(t) \qquad (25)$$

Consider the local ARMAX model (17). Each local model is parameterized by a coefficient
vector $\theta_i$ and a matrix of noise (MA) coefficients. Since all local models (17) are linear
functions of the parameters, the representation is basically linear in the parameters, and
standard identification methods can be applied, e.g. (Soderstrom & Stoica 1988).
Assume first that the noise $e(t)$ is sequentially uncorrelated. Since the information vector
$\psi(t-1)$ then does not contain the noise terms $e(t-1), \ldots, e(t-n_e)$, the model can be
written on the linear regression form

$$\hat{f}_i(\psi(t-1)) = \varphi_i^T(t-1)\, \theta_i \qquad (26)$$

where $\theta_i$ is a parameter vector and $\varphi_i(t-1)$ is a regression matrix.
The parameters can be estimated using the least squares (LS) method. The regression
matrix $\varphi_i(t-1)$ is a matrix of computable or measurable quantities that does not depend
on the parameter vector and is not correlated with $e(t)$. The least squares estimate
minimizing (25) can be written as

$$\hat{\theta}_i = \left( \frac{1}{m_i} \sum_{t=1}^{m_i} \varphi_i(t)\, \Lambda_i\, \varphi_i^T(t) \right)^{-1} \left( \frac{1}{m_i} \sum_{t=1}^{m_i} \varphi_i(t)\, \Lambda_i\, y(t) \right) \qquad (27)$$
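A minimal numerical sketch of the weighted least squares estimate (27), under the simplifying assumptions of a scalar output (so the scaling matrix $\Lambda_i$ reduces to a per-sample weight) and hypothetical data:

```python
import numpy as np

# Hypothetical local data set: y(t) = phi(t)^T theta + e(t) with a scalar
# output, so the scaling matrix Lambda_i reduces to a per-sample weight.
rng = np.random.default_rng(0)
theta_true = np.array([0.8, -0.3, 0.5])
Phi = rng.normal(size=(200, 3))          # rows are the regressors phi(t)^T
y = Phi @ theta_true + 0.01 * rng.normal(size=200)
w = np.ones(200)                         # Lambda_i = identity here

# The estimate (27): solve the weighted normal equations.
A = (Phi * w[:, None]).T @ Phi / len(y)
b = (Phi * w[:, None]).T @ y / len(y)
theta_hat = np.linalg.solve(A, b)
```

With non-uniform weights, the same two lines implement the general weighted estimate; the $1/m_i$ factors cancel but are kept to match (27).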
In the general case, when the delayed noise terms $e(t-1), e(t-2), \ldots, e(t-n_e)$ are included
in $\psi(t)$, we use the prediction error (PE) method. Since the noise is assumed not to be
measurable, we do not know the values of $e(t-1), \ldots, e(t-n_e)$. If the model matches the
true system, then $\epsilon_i(t) = e(t)$. Now $\epsilon_i(t-1)$ depends on the parameters $\theta_i$, since it is the
prediction error. Since $\varphi_i(t)$ depends on $\epsilon_i(t-1)$ and hence on $\theta_i$, we may conclude that
the predictor (26) is no longer linear in the parameters. Hence it is not possible to find a
simple analytic solution like (27). The cost indices (25) must be minimized numerically
using e.g. the Newton-Raphson algorithm

$$\hat{\theta}_i^{(k+1)} = \hat{\theta}_i^{(k)} - \alpha_k \left[ \nabla^2 J_i\!\left(\hat{\theta}_i^{(k)}\right) \right]^{-1} \nabla J_i\!\left(\hat{\theta}_i^{(k)}\right)$$
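The recursion that makes the predictor nonlinear in the parameters can be made concrete with a small sketch. The example below assumes a hypothetical scalar ARMAX system (not the paper's) and uses a Gauss-Newton variant of the Newton-Raphson iteration, with the Hessian approximated from the first-order sensitivities $\partial\epsilon/\partial\theta$ propagated recursively:

```python
import numpy as np

# Hypothetical scalar ARMAX system y(t) = a*y(t-1) + e(t) + c*e(t-1);
# the one-step predictor gives eps(t) = y(t) - a*y(t-1) - c*eps(t-1),
# so eps(t), and hence the regressor, depends on theta = (a, c).
rng = np.random.default_rng(1)
a_true, c_true, m = 0.7, 0.3, 2000
e = 0.1 * rng.normal(size=m)
y = np.zeros(m)
for t in range(1, m):
    y[t] = a_true * y[t - 1] + e[t] + c_true * e[t - 1]

def gauss_newton_step(theta):
    """One Gauss-Newton step: the gradient and approximate Hessian are
    built from the sensitivities d eps/d theta, propagated recursively."""
    a, c = theta
    eps, deps = 0.0, np.zeros(2)
    g, H = np.zeros(2), np.zeros((2, 2))
    for t in range(1, m):
        eps_t = y[t] - a * y[t - 1] - c * eps
        deps_t = np.array([-y[t - 1], -eps]) - c * deps
        g += eps_t * deps_t
        H += np.outer(deps_t, deps_t)
        eps, deps = eps_t, deps_t
    return theta - np.linalg.solve(H, g)

# Start away from the origin, where the two sensitivities coincide
# and the approximate Hessian would be singular.
theta = np.array([0.5, 0.0])
for _ in range(20):
    theta = gauss_newton_step(theta)
```

After a few iterations, `theta` approaches the true parameters (0.7, 0.3); the Gauss-Newton Hessian approximation is one common choice, not necessarily the one used in the paper.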
1. The prediction error of a good model should be uncorrelated with past prediction
errors and inputs (and with future inputs if the system operates in open loop). There
are several correlation-based tests, see e.g. (Billings & Voon 1986).
2. If we consider the expected error criterion $\tilde{J}$ on future data, we obtain (Soderstrom
& Stoica 1988)

$$\tilde{J} = (1 + P/m)\, J \qquad (32)$$

This depends on the number of parameters $P$ and the number of data points $m$ used
for identification. This criterion is closely related to the Akaike Information Criterion
(Akaike 1969).
In general, $J$ will decrease when more parameters are introduced in the model, e.g. when
new local models are added. However, $P$ will increase, and at some stage the increment
in $1 + P/m$ will be larger than the decrease in $J$, if $m$ is kept fixed. The index $\tilde{J}$ will then
increase, indicating that the quality of the model decreases. It is therefore important to
keep the number of local models at a minimum, and to use as simple local models as possible.
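A small sketch of how (32) trades fit against complexity; the prediction errors and parameter counts below are hypothetical, chosen so that adding local models yields diminishing returns:

```python
# Hypothetical candidates: (number of parameters P, final prediction error J)
# for NARMAX models with 1..5 local models (10 parameters per local model).
# J shrinks as local models are added, but ever more slowly.
m = 1000
candidates = [(10, 0.0500), (20, 0.0210), (30, 0.0185), (40, 0.0181), (50, 0.0180)]
scores = [(1.0 + P / m) * J for P, J in candidates]
best = min(range(len(scores)), key=scores.__getitem__)
print(best + 1)   # number of local models minimizing J~  -> 4
```

The fifth local model lowers $J$ slightly, but not enough to pay for ten extra parameters, so $\tilde{J}$ selects four.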
Both these methods can be implemented off-line by an exhaustive search. Not all of
them are suited for on-line structure identification, however: it requires large amounts
of computing power to test more than 2-3 model hypotheses simultaneously. Generally, a
batch of data is required to perform any statistical test at some significance level. A large
prediction error at a single time sample may be due to noise, parameter errors or an inadequate
model structure. Collecting a batch of data over some time will reduce the impact of noise.
If the batch is so large that the parameter estimator is expected to converge during the
batch, the impact of parameter errors may also be insignificant. Hence, if the batch is large
and the prediction error is biased or correlated, we can infer that the model structure is
not good. In our approach with local models, we must then decide which local models
cause the mismatch. This can be found by collecting statistics locally for every local
model. The problem of on-line structure identification is discussed in (Johansen & Foss
1992c).
Time-varying systems

In the case of time-varying parameters, the RPE-formulation (28)-(30) must be modified. This
is usually done by introducing a method for artificially increasing the covariance matrix
estimate $P$ so it will not go to zero. The most common schemes are linear increase, which
corresponds to the Kalman filter, exponential increase, and covariance resetting.
At each time-step we get the information vector $\psi(t)$. The information vector is transformed
to the vector $w(t) = [w_0(\phi(t)), \ldots, w_{N-1}(\phi(t))]^T$. This vector corresponds to the direction
in interpolation-space in which we get information. The interpolation-space has $N$
dimensions, one for each local model. If forgetting is to be avoided, we should only update
models along the directions in model space where we get new information. This means
that we should only forget if we know that new information will arrive to compensate for
the forgotten information. At the operating regime level, this is rather simple in our case,
since by construction the components of $w(t)$ will be close to zero when the information
is not relevant for the corresponding local models. Hence, by thresholding the parameter
update, only the local models about which we get new information will be updated. This leads
to a small set of local models being updated at each time-step. Within each local model,
techniques based on the same type of reasoning can be applied to update only in the
directions in parameter-space where we get new information (Fortescue, Kershenbaum
& Ydstie 1981, Sælid, Egeland & Foss 1985, Parkum, Poulsen & Holst 1990).
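One possible form of such a thresholded update is sketched below: each local model is maintained by weighted recursive least squares with forgetting factor $\lambda$, and a model is only updated (and hence only forgets) when its interpolation weight $w_i$ exceeds a threshold. Scaling the gain by the weight is one common choice, not necessarily the paper's:

```python
import numpy as np

def update_local_models(z, y, thetas, Ps, weights, lam=0.98, thresh=0.1):
    """Weighted RLS with forgetting for each local model. Models whose
    interpolation weight w_i is below `thresh` are left untouched, so
    nothing is forgotten where no new information arrives."""
    for i, w in enumerate(weights):
        if w < thresh:
            continue                            # no information for model i
        P, th = Ps[i], thetas[i]
        k = P @ z / (lam / w + z @ P @ z)       # gain scaled by the weight
        thetas[i] = th + k * (y - z @ th)
        Ps[i] = (P - np.outer(k, z @ P)) / lam

# Demo: the data excites only regime 0, so model 1 must remain unchanged.
rng = np.random.default_rng(2)
theta_true = np.array([1.5, -0.5])
thetas = [np.zeros(2), np.zeros(2)]
Ps = [np.eye(2) * 100.0, np.eye(2) * 100.0]
for _ in range(200):
    z = rng.normal(size=2)
    y = z @ theta_true + 0.01 * rng.normal()
    update_local_models(z, y, thetas, Ps, weights=[1.0, 0.0])
```

Because the update of model 1 is skipped entirely, its covariance is not inflated while it is inactive, which is exactly the blow-up problem the cited references address.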
5 Simulation example

We will use the proposed approach to identify a model of a Continuous Stirred Tank
Reactor (CSTR) in which a first order, exothermic chemical reaction A → B takes place.
The system is described by the following mass- and energy-balances

$$V \frac{d}{dt} c_A = c_{Ai} q_i - c_A q_o - V r_A \qquad (33)$$

$$\rho c_p V \frac{d}{dt} T = \rho c_p T_i q_i - \rho c_p T q_o + Q - \Delta H_r V r_A \qquad (34)$$

$$r_A = k_0 c_A \exp\left( -\frac{E_A}{R} \left( \frac{1}{T} - \frac{1}{T_R} \right) \right) \qquad (35)$$

with symbols as defined in Table 1.
Inputs to the system are the power $Q$ and the inlet flow $q_i$. Outputs are the temperature $T$
and the concentration $c_A$. We define the vectors $y = [5 \cdot 10^{-4}\, c_A,\; 5 \cdot 10^{-3}\, T]^T$ and
$u = [50\, q_i,\; 10^{-7}\, Q]^T$. The system is simulated as a continuous-time system. Two independent
sequences $\{(y(t), u(t))\}_{t=1}^{1000}$ are collected by sampling the system every 2 minutes, see figure
6. One set is used for parameter identification only, and the other is used for model
validation only.
The reactor is open-loop unstable. There are therefore two stabilizing single-loop PI-
controllers. The reference signals to the controllers are LP-filtered white noise signals,
ensuring that the input is sufficiently rich.
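For concreteness, the balances (33)-(35) can be integrated with a simple explicit Euler scheme. The parameter values below are hypothetical stand-ins (the paper's actual values are defined in Table 1, which is not reproduced here) and are chosen to give a stable, well-scaled open-loop run; the paper's actual reactor is open-loop unstable and is therefore simulated under PI control:

```python
import numpy as np

# Explicit Euler integration of the CSTR balances (33)-(35).
# All parameter values are hypothetical stand-ins for the Table 1 values.
V = 1.0                  # reactor volume
rho_cp = 4.0e3           # product rho * c_p
cAi, Ti = 1.0e3, 350.0   # inlet concentration and temperature
k0, EA_R, TR = 1.0e-2, 1.0e3, 370.0   # rate constant, E_A/R, reference temp.
dHr = -1.0e3             # reaction enthalpy (negative: exothermic)

def cstr_step(cA, T, qi, Q, dt=1.0):
    """One Euler step of (33)-(35); outlet flow q_o = q_i (constant volume)."""
    qo = qi
    rA = k0 * cA * np.exp(-EA_R * (1.0 / T - 1.0 / TR))                   # (35)
    dcA = (cAi * qi - cA * qo) / V - rA                                   # (33)
    dT = (rho_cp * (Ti * qi - T * qo) + Q - dHr * V * rA) / (rho_cp * V)  # (34)
    return cA + dt * dcA, T + dt * dT

cA, T = 500.0, 360.0
for _ in range(2000):
    cA, T = cstr_step(cA, T, qi=1.0e-2, Q=0.0)
```

With these stand-in parameters the state settles at an ignited steady state; sampling such trajectories every few steps yields data sets of the kind shown in figure 6.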
In our example we choose the information vector $\psi^T(t-1) = [y^T(t-1)\;\; u^T(t-1)]$. We
have performed 4 simulations, and a least squares algorithm is used in all cases:
Simulation 1

First, an ARMAX model

$$\hat{y}(t) = \varphi^T(t-1)\,\theta \qquad (36)$$

$$\varphi(t-1) = \begin{pmatrix}
1 & 0 \\
0 & 1 \\
y_1(t-1) - y_1^\star & 0 \\
y_2(t-1) - y_2^\star & 0 \\
0 & y_1(t-1) - y_1^\star \\
0 & y_2(t-1) - y_2^\star \\
u_1(t-1) - u_1^\star & 0 \\
u_2(t-1) - u_2^\star & 0 \\
0 & u_1(t-1) - u_1^\star \\
0 & u_2(t-1) - u_2^\star
\end{pmatrix}, \qquad
\theta = \begin{pmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_{10} \end{pmatrix}$$

is fitted to the data using linear regression. The points $[y_1^\star,\, y_2^\star]^T$ and $[u_1^\star,\, u_2^\star]^T$ are the points
the ARMAX model is linearized about, and are chosen as the mean values of the respective
signals.
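The construction of the regression matrix of (36) and the subsequent linear regression can be sketched as follows; the data are synthetic and the linearization points hypothetical:

```python
import numpy as np

def regression_matrix(y_prev, u_prev, y_star, u_star):
    """The 10x2 regression matrix phi(t-1) of (36): each of the two output
    channels gets its own bias term plus the deviations of the past outputs
    and inputs from the linearization point (y*, u*)."""
    dy = y_prev - y_star
    du = u_prev - u_star
    col1 = np.concatenate(([1.0, 0.0], dy, [0.0, 0.0], du, [0.0, 0.0]))
    col2 = np.concatenate(([0.0, 1.0], [0.0, 0.0], dy, [0.0, 0.0], du))
    return np.column_stack([col1, col2])

# Synthetic data from a known theta, then ordinary linear regression.
rng = np.random.default_rng(3)
y_star, u_star = np.array([1.0, 1.8]), np.array([0.5, 0.0])
theta_true = rng.normal(size=10)
rows, targets = [], []
for _ in range(300):
    y_prev = y_star + 0.1 * rng.normal(size=2)
    u_prev = u_star + 0.1 * rng.normal(size=2)
    phi = regression_matrix(y_prev, u_prev, y_star, u_star)
    rows.append(phi.T)                                  # one 2 x 10 block per t
    targets.append(phi.T @ theta_true + 1e-3 * rng.normal(size=2))
A, b = np.vstack(rows), np.concatenate(targets)
theta_hat = np.linalg.lstsq(A, b, rcond=None)[0]
```

Stacking the per-sample $2 \times 10$ blocks turns the two-output model into one ordinary least squares problem for all ten parameters at once.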
Simulation 2

Second, a NARMAX model with two local ARMAX models (36) is fitted. If $c_{Ai}$ and $T_i$
are constant, the model (33)-(35) can be written

$$\frac{d}{dt} y(t) = f(y(t), u(t)) = f_1(y(t)) + f_2(y(t))\, u(t) \qquad (37)$$
Simulation 3

The same model structure as in Simulation 2 is used, but the global cost index (24) is used to
simultaneously identify the parameters of the two ARMAX models.

Simulation 4

Finally, we use 4 local ARMAX models and the global cost index (24). The local model
validity functions are still Gaussians, centered at $\phi_0 = 1.775$, $\phi_1 = 1.825$, $\phi_2 = 1.875$ and
$\phi_3 = 1.925$, corresponding to 355, 365, 375 and 385 K respectively. The width parameter is
chosen as $\sigma_i = 0.02$, corresponding to 4 K.
Results

The results are summarized in Table 2 (all results are obtained by using the identified model
for prediction on the data set independent of the data used for identification). We see that all
NARMAX approaches give significantly better results than the ARMAX approach. As
would be expected, better results are achieved with the global performance index than with local
indices, and increasing the number of local models also improves the model accuracy.
One-step-ahead prediction errors are shown in figure 7. The curves indicate that the
prediction error is considerably reduced using the NARMAX models compared to the
ARMAX model.
The covariance functions for the prediction errors are

$$E\!\left[ (\epsilon(t) - \bar{\epsilon})(\epsilon(t+\tau) - \bar{\epsilon})^T \right] = \begin{pmatrix} r_{11}(\tau) & r_{12}(\tau) \\ r_{21}(\tau) & r_{22}(\tau) \end{pmatrix} \qquad (39)$$
Estimates of the autocorrelation functions are shown in figure 8. It is well known that an
unbiased model gives an autocorrelation function equal to a $\delta$-function. (This is not a suf-
ficient condition; a more detailed model validation should also consider $E[(u(t) - \bar{u})(\epsilon(t+\tau) - \bar{\epsilon})^T]$
and higher order covariances (Billings & Voon 1986).) However, we may con-
clude from the curves that the NARMAX models strongly improve the model accuracy
compared to the ARMAX model. We cannot expect a perfect model, because the model
structure is different from the true system: First, there is a fundamental structural difference
between the state-space system and the model based on local models and interpolation,
and the low number of local models limits the accuracy. Second, there is a structural error
introduced by sampling the continuous system. Third, the simplified choice of operating
point introduces a nonsystematic error.
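The autocorrelation estimate behind figure 8 can be sketched as follows, applied to hypothetical white and colored scalar error sequences; an (approximately) $\delta$-shaped autocorrelation indicates an unbiased model:

```python
import numpy as np

def autocorrelation(eps, max_lag=10):
    """Normalized autocorrelation r(tau)/r(0) of a scalar error sequence."""
    e = eps - eps.mean()
    m = len(e)
    r = np.array([(e[:m - tau] @ e[tau:]) / m for tau in range(max_lag + 1)])
    return r / r[0]

rng = np.random.default_rng(4)
white = rng.normal(size=5000)            # ideal prediction errors
colored = np.zeros(5000)                 # errors of a biased model
for t in range(1, 5000):
    colored[t] = 0.8 * colored[t - 1] + white[t]

acf_white = autocorrelation(white)       # close to a delta function
acf_colored = autocorrelation(colored)   # decays slowly from 1
```

The colored sequence, a first-order autoregression of the white one, shows the slowly decaying autocorrelation characteristic of a model with remaining structural error.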
We have also run simulations in which the $\sigma_i$-parameters of the local model validity
functions were optimized after the local ARMAX parameters had been identified using local
cost indices. This gave substantial improvements, but the results were not as good as using
a global index with prechosen (not optimal) $\sigma_i$-parameters. Including $y_1(t)$ in the operating
point vector also gave some improvements, but not as much as one might have expected.
Hence, we can conclude that the choice of operating point is sensible.
Table 2: Results for ARMAX and NARMAX model fitting. For each simulation the
identified parameter vectors ($\theta$ for Simulation 1; $\theta_1, \theta_2$ for Simulations 2 and 3;
$\theta_1, \ldots, \theta_4$ for Simulation 4) were recorded together with the prediction error
covariance $E[(\epsilon(t) - \bar{\epsilon})(\epsilon(t) - \bar{\epsilon})^T]$ on the validation data:

Simulation 1, ARMAX model:
  covariance $10^{-4} \begin{pmatrix} 9.40 & -1.77 \\ -1.77 & 0.34 \end{pmatrix}$

Simulation 2, NARMAX model with 2 local ARMAX models, fitted separately using local prediction errors:
  covariance $10^{-4} \begin{pmatrix} 2.10 & -0.40 \\ -0.40 & 0.09 \end{pmatrix}$

Simulation 3, NARMAX model with 2 local ARMAX models, fitted using the global prediction error:
  covariance $10^{-4} \begin{pmatrix} 0.96 & -0.19 \\ -0.19 & 0.04 \end{pmatrix}$

Simulation 4, NARMAX model with 4 local ARMAX models, fitted using the global prediction error:
  covariance $10^{-4} \begin{pmatrix} 0.38 & -0.08 \\ -0.08 & 0.02 \end{pmatrix}$
Decomposing the system's operation into several operating regimes and using local ARMAX
models to describe each operating regime is appealing for several reasons.
A fundamental problem with all local modelling methods is the curse of dimensionality
(Bellman 1961, Moody & Darken 1989, Tolle et al. 1992, Friedman 1991). In our
case, we have shown that this problem may sometimes be reduced considerably, since the
operating point may be of lower dimension than the information vector.
To summarize, we have investigated how non-linear systems can be modelled using NAR-
MAX models based on local ARMAX models. The primary result is that, given a sufficient
number of local models and well defined operating regimes, the system function can be
approximated to arbitrary accuracy. In practice, noise and the amount of data available
will limit the attainable accuracy. Standard identification algorithms can easily be applied,
since the proposed representation is linear in the parameters. The empirical choice
of model validity functions may, however, complicate the problem. Several practical aspects
of building such models have been outlined and illustrated by a simulation example.

The approach falls somewhere between first principles modelling and pure black-box mod-
elling. Available local first principles models, as well as a priori knowledge in terms of
operating regimes, can be incorporated in the model.
Acknowledgements

This work was supported by the Royal Norwegian Council for Scientific and Industrial
Research (NTNF) under doctoral scholarship grant no. ST. 10.12.221718 given to the first
author.
References

Akaike, H. (1969), 'Fitting autoregressive models for prediction', Ann. Inst. Stat. Math. 21, 243-247.

Bellman, R. E. (1961), Adaptive Control Processes, Princeton Univ. Press.

Billings, S. A. & Voon, W. S. F. (1986), 'Correlation based model validity tests for non-linear models', Int. J. Control 44(1), 235-244.

Billings, S. A. & Voon, W. S. F. (1987), 'Piecewise linear identification of non-linear systems', Int. J. Control 46, 215-235.

Broomhead, D. S. & Lowe, D. (1988), 'Multivariable functional interpolation and adaptive networks', Complex Systems 2, 321-355.

Chen, S. & Billings, S. A. (1989), 'Representation of non-linear systems: the NARMAX model', Int. J. Control 49(3), 1013-1032.

Chen, S., Billings, S. A. & Grant, P. M. (1990a), 'Non-linear system identification using neural networks', Int. J. Control 51(6), 1191-1214.

Chen, S., Billings, S. A., Cowan, C. F. N. & Grant, P. (1990b), 'Practical identification of NARMAX models using radial basis functions', Int. J. Control 52(6), 1327-1350.

Cyrot-Normand, D. & Mien, H. D. V. (1980), Non-linear state-affine identification methods: Application to electrical power plants, in 'Proc. IFAC Symposium on Automatic Control in Power Generation, Distribution and Protection', pp. 449-462.

Fortescue, T. R., Kershenbaum, L. S. & Ydstie, B. E. (1981), 'Implementation of self-tuning regulators with variable forgetting factors', Automatica 17, 831-835.

Friedman, J. H. (1991), 'Multivariate adaptive regression splines', The Annals of Statistics 19, 1-141.

Gill, P., Murray, W. & Wright, M. (1981), Practical Optimization, Academic Press, Inc.

Haber, R. (1985), Nonlinearity tests for dynamic processes, in '7th IFAC Symp. on Identification and System Parameter Estimation', pp. 409-414.

Hilhorst, R. A., van Amerongen, J. & Lohnberg, P. (1991), Intelligent adaptive control of mode-switch processes, in 'Proc. IFAC International Symposium on Intelligent Tuning and Adaptive Control, Singapore'.

Johansen, T. A. & Foss, B. A. (1992a), 'A NARMAX model representation for adaptive control based on local models', Modeling, Identification and Control 13(1), 25-39.

Johansen, T. A. & Foss, B. A. (1992b), Nonlinear local model representation for adaptive systems, in 'Proceedings of the Singapore Int. Conf. on Intelligent Control and Instrumentation', Vol. 2, pp. 677-682.

Johansen, T. A. & Foss, B. A. (1992c), Representing and learning unmodeled dynamics with neural network memories, in 'Proceedings of the American Control Conference, Chicago, Il.', Vol. 3.

Jones, R. D., Lee, Y. C., Barnes, C. W., Flake, G. W., Lee, K., Lewis, P. S. & Qian, S. (1989), Function approximation and time series prediction with neural networks, Technical Report 90-21, Los Alamos National Lab., New Mexico.

Lane, S. H., Handelman, D. A. & Gelfand, J. J. (1992), 'Theory and development of higher-order CMAC neural networks', IEEE Control Systems Magazine 12(2), 23-30.

Leontaritis, I. J. & Billings, S. A. (1985), 'Input-output parametric models for non-linear systems', Int. J. Control 41(2), 303-344.

Jones, R. D. et al. (1991), Nonlinear adaptive networks: A little theory, a few applications, Technical Report 91-273, Los Alamos National Lab., New Mexico.

Sælid, S., Egeland, O. & Foss, B. (1985), 'A solution to the blow-up problem in adaptive controllers', Modeling, Identification and Control 6(1), 39-56.

Soderstrom, T. & Stoica, P. (1988), System Identification, Prentice Hall.

Moody, J. & Darken, C. J. (1989), 'Fast learning in networks of locally-tuned processing units', Neural Computation 1, 281-294.

Nguyen, D. H. & Widrow, B. (1990), 'Neural networks for self-learning control systems', IEEE Control Systems Magazine 10(3), 18-23.

Omohundro, S. M. (1987), 'Efficient algorithms with neural network behavior', Complex Systems 1, 273-347.

Parkum, J. E., Poulsen, N. K. & Holst, J. (1990), Selective forgetting in adaptive procedures, in 'Proc. 11th IFAC World Congress, Tallinn, Estonia'.

Polycarpou, M. M., Ioannou, P. A. & Ahmed-Zaid, F. (1992), Neural networks and on-line approximators for discrete-time nonlinear system identification, Submitted to IEEE Trans. Control Systems Technology.

Powell, M. J. D. (1987), Radial basis function approximations to polynomials, in '12th Biennial Numerical Analysis Conference, Dundee', pp. 223-241.

Priestley, M. B. (1981), Spectral Analysis and Time Series, Academic Press.

Priestley, M. B. (1988), Non-linear and Non-stationary Time Series Analysis, Academic Press.

Psichogios, D. C., De Veaux, R. D. & Ungar, L. H. (1992), Nonparametric system identification: A comparison of MARS and neural nets, in 'Proc. American Control Conference, Chicago, Il.'.

Rippin, D. W. T. (1989), Control of batch processes, in 'Proceedings DYCORD+ 89, August, Maastricht, The Netherlands', pp. 115-125.

Skeppstedt, A. (1988), Construction of Composite Models from Large Data-Sets, PhD thesis, University of Linkoping.

Skeppstedt, A., Ljung, L. & Millnert, M. (1992), 'Construction of composite models from observed data', Int. J. Control 55(1), 141-152.

Stokbro, K. (1991), Predicting chaos with weighted maps, Technical Report 91/10 S, NORDITA, Copenhagen.

Stokbro, K. & Umberger, D. K. (1990), Forecasting with weighted maps, in 'Proc. 1990 Workshop on Nonlinear Modeling and Forecasting, Santa Fe Institute'.

Stokbro, K., Hertz, J. A. & Umberger, D. K. (1990), Exploiting neurons with localized receptive fields to learn chaos, Preprint 28, Niels Bohr Institute and NORDITA, Copenhagen. Submitted to Journal of Complex Systems.

Stromberg, K. R. (1981), An Introduction to Classical Real Analysis, Wadsworth, Inc., Belmont, Ca.

Takagi, T. & Sugeno, M. (1985), 'Fuzzy identification of systems and its application to modeling and control', IEEE Trans. Systems, Man, and Cybernetics 15, 116-132.

Tolle, H., Parks, P. C., Ersu, E., Hormel, M. & Militzer, J. (1992), 'Learning control with interpolating memories: general ideas, design lay-out, theoretical approaches and practical applications', Int. J. Control 56, 291-317.

Tong, H. & Lim, K. S. (1980), 'Threshold autoregression, limit cycles and cyclical data', J. Royal Stat. Soc. B 42, 245-292.
Figure 1: A geometric interpretation of the constraints on $\tilde{\rho}_i$.

Figure 2: Situation when the width of $\tilde{\rho}_i$ goes to zero.
Figure 3: Approximation of $f(\phi) = \phi^2 + 1$ using two local linear models.
27
f ( )
10 10
5 5
0 0
-5 -5
-1 0 1 2 3 -1 0 1 2 3
(a) (b)
10 10
8 8
6 6
4 4
2 2
0 0
-1 0 1 2 3 -1 0 1 2 3
(c) (d)
Figure 4: a) Approximation of f ( ) = 2 + 1 using a piecewise linear model with two
local linear models. b) Approximation using two local linear models and Gaussian model
validity functions. c) Approximation using 5 local 0th order expansions (constant) d)
Approximation using a radial basis-function expansion with 5 Gaussian basis-functions.
28
Figure 5: The set $\Phi$ of operating points is covered with local models. The figure shows
lines where the model validity functions $\rho_i$ are constant.
Figure 6: The two first curves show the output and input data used for identification, while
the two last curves show the data used for validation.
Figure 7: Prediction errors on the independent validation data for the resulting models of
the 4 simulations. Here $\epsilon_1 = y_1 - \hat{y}_1$ and $\epsilon_2 = y_2 - \hat{y}_2$.
Figure 8: The curves show the autocorrelation functions of the prediction errors for the
resulting models of the 4 simulations. The panels show the normalized functions
$r_{11}(\tau)/r_{11}(0)$, $r_{12}(\tau)/\sqrt{r_{11}(0)r_{22}(0)}$, $r_{21}(\tau)/\sqrt{r_{11}(0)r_{22}(0)}$ and $r_{22}(\tau)/r_{22}(0)$.