
Submitted to International Journal of Control

Constructing NARMAX models using ARMAX models

May 5, 1992. Revised April 20, 1993

Tor A. Johansen and Bjarne A. Foss


Department of Engineering Cybernetics
Norwegian Institute of Technology
N-7034 Trondheim

Abstract
This paper outlines how it is possible to decompose a complex non-linear modelling problem into a set of simpler linear modelling problems. Local ARMAX models valid within certain operating regimes are interpolated to construct a global NARMAX (non-linear ARMAX) model. Knowledge of the system behavior in terms of operating regimes is the primary basis for building such models; hence the approach should not be considered a pure black-box approach, but one that utilizes a limited amount of a priori system knowledge. It is shown that a large class of non-linear systems can be modelled in this way, and it is indicated how to decompose the system's range of operation into operating regimes. Standard system identification algorithms can be used to identify the NARMAX model, and several aspects of the system identification problem are discussed and illustrated by a simulation example.

1 Introduction
Modelling complex systems using first principles is in many cases resource demanding. In some cases our system knowledge is so limited that detailed modelling is difficult. In other cases, the instrumentation and logged data from the system are so sparse or noisy that it is difficult to identify a large number of unknown physical parameters in the model. Examples of this are found in e.g. the metallurgical and biochemical process industries.
In some cases, resources can be saved by using black-box models describing the input/output behavior of the system. Such models represent the controllable and observable part of the system. The structure and parameters of black-box models have in general no direct interpretation in terms of the physical properties of the system. The ARMAX model is a well known linear input/output model representation. The NARMAX (Non-linear ARMAX) model representation is an extension of the linear ARMAX model, and represents the system by a non-linear mapping of past inputs, outputs and noise terms to future outputs. In this paper we discuss how NARMAX models can be represented, and in particular how a NARMAX model can be constructed from a set of ARMAX models.

We will concentrate on non-linear systems that work in several operating regimes, because systems that normally work within one operating regime may in many cases be adequately described by a linear model. There are numerous examples of systems that must work in several operating regimes, including most batch processes (Rippin 1989). Apart from normal operating conditions, the control system may also have to take care of startup and shutdown, operation during maintenance, and faulty operation, which obviously lead to different operating regimes.
Traditionally, the problem with multiple operating regimes is solved by non-linear first principles models covering several operating regimes, by gain scheduling, or simply by manual or rule-based control of the system when operating outside the normal operating regimes. From an engineering point of view, it may seem appealing to decompose the modelling problem into a set of simpler modelling problems. This is exactly what we propose here: First, the system operation is decomposed into a set of operating regimes that are assumed to cover the full range of operation we want our model to cover. Next, for each operating regime we design a simple (typically linear) local model. It is usually not natural to define the operating regimes as crisp sets, since there will usually be a smooth transition from one regime to another, not a jump. Hence, it makes sense to interpolate the local models in a smooth fashion to get a global model. The interpolation is such that the local model that is assumed to be the best model at the current operating point is given most weight in the interpolation, while neighboring local models may be given some weight, and local models corresponding to distant operating regimes do not contribute to the global model at that operating point. To do the smooth interpolation at a given operating point, we need to know which of the local models describe the system well around that operating point. For that purpose, to each local model we associate a local model validity function, i.e. a function that indicates the relative validity of the local model at a given operating point.
The use of local linear models without interpolation, i.e. piecewise linear models, has been suggested by several authors, including Skeppstedt, Ljung & Millnert (1992), Skeppstedt (1988), Hilhorst, van Amerongen & Lohnberg (1991), Billings & Voon (1987), and Tong & Lim (1980). A related technique is the use of splines (Friedman 1991) for representing dynamic models (Psichogios, De Veaux & Ungar 1992). Splines are also local models, but unlike piecewise linear models, there are constraints that enforce smoothness on the boundaries between the local models. Different variations of interpolating memories (Tolle, Parks, Ersu, Hormel & Militzer 1992, Lane, Handelman & Gelfand 1992, Omohundro 1987) are a related local modelling technique, where a number of input/output pairs of the system are memorized and interpolated to give a model. Our approach can be thought of as a generalization of these techniques, since we interpolate local models.
This paper is organized as follows: First, in section 2, we present a model representation based on local models. Then we discuss the approximation capabilities of this representation, and show that a large class of non-linear systems can be represented. The notion of operating regimes is introduced, and we present a general result guiding the choice of operating point vector. Thereafter, we discuss some practical aspects of modelling using local models in section 3, and some aspects of system identification in section 4. In section 5, the concepts are illustrated by a simulation example, and section 6 contains some discussion and conclusions.

2 Model Representation
The NARMAX model representation

$$y(t) = f(y(t-1), \dots, y(t-n_y), u(t-1), \dots, u(t-n_u), e(t-1), \dots, e(t-n_e)) + e(t) \quad (1)$$

is shown by Leontaritis & Billings (1985) and Chen & Billings (1989) to represent a large class of discrete-time non-linear dynamic systems. Here $y(t) \in Y \subset R^m$ is the output vector, $u(t) \in U \subset R^r$ is the input vector, and $e(t) \in E \subset R^m$ is the equation error. We introduce the $(m(n_y + n_e) + r n_u)$-dimensional information vector

$$\psi(t-1) = [y^T(t-1) \; \dots \; y^T(t-n_y) \; u^T(t-1) \; \dots \; u^T(t-n_u) \; e^T(t-1) \; \dots \; e^T(t-n_e)]^T$$

where $\psi(t-1)$ is in the set $\Psi = Y^{n_y} \times U^{n_u} \times E^{n_e}$. This enables us to write equation (1) in the form

$$y(t) = f(\psi(t-1)) + e(t) \quad (2)$$
Provided that necessary smoothness conditions on $f : \Psi \to Y$ are satisfied, a general way of representing functions is by series expansions. A 1st order Taylor series expansion about the system's equilibrium point yields a standard ARMAX model. Second-order Taylor expansions are possible, while higher-order Taylor expansions are not very useful in practice, because the number of parameters in the model increases drastically with the expansion order, and because of the poor extrapolation and interpolation capabilities of higher-order polynomials. Splines offer a solution to this problem, but the approximation in higher dimensional spaces may be difficult due to the smoothness constraints on the boundaries. Chen, Billings & Grant (1990a) have proposed to use a sigmoidal neural network expansion, Billings & Voon (1987) use a piecewise linear model, and Chen, Billings, Cowan & Grant (1990b) have applied a radial basis function expansion as a means of representing $f$. A generic model representation based on local models was introduced in (Johansen & Foss 1992b, Johansen & Foss 1992a), inspired by work by Stokbro, Hertz & Umberger (1990) and Jones et al. (1991). We will in the following study this model representation in detail.

Approximation using local models and interpolation


Given a set of functions $\tilde\rho_i : \Psi \to [0, 1]$, $i = 0, \dots, N-1$, the following equation is trivially true

$$f(\psi) = \frac{\sum_{i=0}^{N-1} f(\psi)\,\tilde\rho_i(\psi)}{\sum_{i=0}^{N-1} \tilde\rho_i(\psi)} \quad (3)$$

assuming that at any point $\psi \in \Psi$, not all $\tilde\rho_i$ vanish. We will assume the function $\tilde\rho_i$ is chosen such that it is localized in a subset $\Psi_i \subset \Psi$. This means that $\tilde\rho_i(\psi)$ is significantly larger than zero for $\psi \in \Psi_i$, and $\tilde\rho_i(\psi)$ is very close to zero for $\psi \notin \Psi_i$. Since $\tilde\rho_i$ is close to zero for $\psi \notin \Psi_i$, we can substitute $f$ on the right-hand side of (3) with an $\hat f_i$ that is a good approximation only in $\Psi_i$:

$$\hat f(\psi) = \frac{\sum_{i=0}^{N-1} \hat f_i(\psi)\,\tilde\rho_i(\psi)}{\sum_{i=0}^{N-1} \tilde\rho_i(\psi)} \quad (4)$$

Introducing the normalized functions $\tilde w_i : \Psi \to [0, 1]$ defined by

$$\tilde w_i(\psi) = \frac{\tilde\rho_i(\psi)}{\sum_{j=0}^{N-1} \tilde\rho_j(\psi)} \quad (5)$$
gives the approximation

$$\hat f(\psi) = \sum_{i=0}^{N-1} \hat f_i(\psi)\,\tilde w_i(\psi) \quad (6)$$

In this equation, we can interpret $\tilde w_i$ as a function that gives a value close to 1 in parts of $\Psi$ where the function $\hat f_i$ is a good approximation to $f$, and close to zero elsewhere. By definition of $\tilde w_i$ we know that $\sum_{i=0}^{N-1} \tilde w_i(\psi) = 1$ for all $\psi \in \Psi$, and we call the functions $\tilde w_i$ interpolation functions because they are used to interpolate the local models $\hat f_i$. We call $\hat f_i$ a local model since it is assumed to be an accurate description of the true $f$ locally (where $\tilde\rho_i$ is not close to zero).
The set of all functions of the form (6) with local models of polynomial order $p$ and smooth interpolation functions is denoted

$$\tilde F_p = \left\{ \hat f : \Psi \to Y \;\middle|\; \hat f(\psi) = \sum_{i=0}^{N-1} \hat f_i(\psi)\,\tilde w_i(\psi) \right\}$$
At the extreme, 0th order Taylor expansions of $f$ about $\psi_i \in \Psi_i$ may be used to define $\hat f_i$:

$$\hat f_i(\psi) = f(\psi_i) = \theta_i \quad (7)$$

where $\theta_i$ is a parameter vector. Such a simple local model is closely related to an interpolating memory (Tolle et al. 1992) and requires a large number of interpolation functions, since the value $f(\psi_i)$ is only extrapolated locally. This case is in fact identical to neural networks with localized receptive fields (Moody & Darken 1989, Stokbro et al. 1990). Considering $\{\tilde w_i\}_{i=0}^{N-1}$ as a set of basis functions, the method is also similar to radial basis function expansions (Broomhead & Lowe 1988), for the following reason: If the functions $\tilde\rho_i$ are chosen as local radial functions, the normalized function $\tilde w_i$ defined by (5) will not be radial in general, but it will qualitatively have much the same shape and features as $\tilde\rho_i$, except near the boundary of $\Psi$.
A 1st order Taylor expansion of $f$ about $\psi_i$ provides better extrapolation and interpolation than the 0th order expansion (7). Assuming the 1st derivative of $f$ exists, the local models are given by

$$\hat f_i(\psi) = f(\psi_i) + \nabla f(\psi_i)(\psi - \psi_i) = \theta_i + \Theta_i(\psi - \psi_i) \quad (8)$$

where $\theta_i$ is a parameter vector and $\Theta_i$ is a parameter matrix. Observe that (8) is actually an ARMAX model resulting from a linearization about $\psi_i$. Both the Weighted Linear Maps of Stokbro et al. (1990), Stokbro (1991) and Stokbro & Umberger (1990) and the Connectionist Normalized Linear Spline Networks of Jones et al. (1991) and Jones, Lee, Barnes, Flake, Lee, Lewis & Qian (1989) use a 1st order expansion locally. This representation makes it possible to build a NARMAX model by interpolating between a set of ARMAX models.
Higher order local models can of course also be used. Furthermore, there is no requirement that all the local models have the same structure. Some of the local models may be based on first principles modelling, while others may be generic black-box models. Johansen & Foss (1992c) use this approach to integrate first principles models with neural network type models.
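To make the construction concrete, the following minimal sketch evaluates (6) with local linear models (8). It is our own illustration: the Gaussian choice of $\tilde\rho_i$, the shared width, and all names are assumptions rather than part of the representation itself.

```python
import numpy as np

def narmax_predict(psi, centers, width, thetas, Thetas):
    """Evaluate the interpolated model (6) with local linear (ARMAX) models (8).

    psi     : information vector psi(t-1), shape (d,)
    centers : linearization points psi_i, shape (N, d)
    width   : common width of the Gaussian basis functions rho_i (assumption)
    thetas  : offset vectors theta_i, shape (N, m)
    Thetas  : gain matrices Theta_i, shape (N, m, d)
    """
    # Unnormalized basis functions rho_i(psi), here chosen as Gaussians
    rho = np.exp(-0.5 * np.sum((psi - centers) ** 2, axis=1) / width ** 2)
    w = rho / rho.sum()                                  # interpolation fns (5)
    # Local model predictions f_i(psi) = theta_i + Theta_i (psi - psi_i)
    f_local = thetas + np.einsum('nij,nj->ni', Thetas, psi - centers)
    return w @ f_local                                   # global model (6)
```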

Approximation Properties
It seems reasonable that the approximation can be made arbitrarily good by choosing a sufficient number of local models. This is indeed the case, as illustrated in the following.
We use the following norm to measure the approximation accuracy

$$\|f - \hat f\|_\infty = \sup_{\psi \in \Psi} \|f(\psi) - \hat f(\psi)\|_2$$

where $\|\cdot\|_2$ denotes the Euclidean norm.
The $(p+1)$-th derivative of the vector function $f$ at the point $\psi$ is denoted by $\nabla^{p+1} f(\psi)$. Assume $f$ is continuously differentiable $p+1$ times, and $\{\hat f_i\}_{i=0}^{N-1}$ are local models equal to the first $p$ terms of the Taylor series expansion of $f$ about $\psi_i$. For any $\psi \in \Psi$, we have

$$f(\psi) - \hat f(\psi) = \sum_{i=0}^{N-1} \left( f(\psi) - \hat f_i(\psi) \right) \tilde w_i(\psi)$$

If we assume $\|\nabla^{p+1} f(\psi)\| < M$ for all $\psi \in \Psi$, where $\|\cdot\|$ denotes the induced operator norm, we obtain by Taylor's theorem

$$\|f(\psi) - \hat f(\psi)\|_2 < \sum_{i=0}^{N-1} \frac{M}{(p+1)!} \|\psi - \psi_i\|_2^{p+1}\,\tilde w_i(\psi)$$

In order to ensure that this norm is smaller than an arbitrary $\epsilon > 0$, we must ensure that for any $\psi \in \Psi$ the following condition holds:

$$\sum_{i=0}^{N-1} \|\psi - \psi_i\|_2^{p+1}\,\tilde\rho_i(\psi) < \epsilon \frac{(p+1)!}{M} \sum_{i=0}^{N-1} \tilde\rho_i(\psi) \quad (9)$$
i=0 i=0
Defining the set of functions $\{g_i : \Psi \to R\}_{i=0}^{N-1}$ by

$$g_i(\psi) = \|\psi - \psi_i\|_2^{p+1} - \epsilon \frac{(p+1)!}{M}$$

and rewriting (9) gives the following condition that must hold for any $\psi \in \Psi$:

$$\sum_{i=0}^{N-1} g_i(\psi)\,\tilde\rho_i(\psi) < 0 \quad (10)$$

or equivalently, dividing (10) by $\sum_{i=0}^{N-1} \tilde\rho_i(\psi)$,

$$\sum_{i=0}^{N-1} g_i(\psi)\,\tilde w_i(\psi) < 0 \quad (11)$$
The problem is now to find the conditions on $N$ and the functions $\{\tilde\rho_i\}_{i=0}^{N-1}$ that ensure that equation (10) holds for any given $\epsilon > 0$. A geometric interpretation of (10) is given in Figure 1. Certainly, this equation holds if the negative contribution of one term $g_i(\psi)\tilde\rho_i(\psi)$ in (10) dominates the (possibly positive) contributions of all other terms. A necessary condition is $g_i(\psi)\tilde\rho_i(\psi) \to 0$ as $\|\psi\|_2 \to \infty$. This will certainly be ensured if we choose $\tilde\rho_i$ as an exponential or Gaussian function.
Notice that the shape of the $g_i$-functions is fixed and given by the specifications. We are, however, free to choose the locations and the number $N$ of local models. Let us choose the set $\{\psi_i\}_{i=0}^{N-1}$ so large and "sufficiently dense in $\Psi$" that at least one of the functions $\{g_i\}_{i=0}^{N-1}$ is negative at any $\psi \in \Psi$. Then the functions $\{g_i\}_{i=0}^{N-1}$ are fixed, and we must choose the $\tilde\rho_i$-functions such that (10) holds.

This can be done in several ways. In the limit when the widths of the $\tilde\rho_i$-functions go to zero, the interpolation functions $\tilde w_i$ approach step functions, as shown in Figure 2. The model will then approach a piecewise constant model if $p = 0$, a piecewise linear model if $p = 1$, etc. In this limit, at any $\psi \in \Psi$ there will exist a $j$ such that

$$\tilde w_i(\psi) = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}$$

By the choice of $\{\psi_i\}_{i=0}^{N-1}$ we know that $g_j(\psi) < 0$, and since $\tilde w_i(\psi) = 0$ for $i \neq j$, (11) will hold. We can now provide a result for the case when $\Psi$ is a bounded set:

Theorem 1 Suppose given any integer $p \geq 0$, and suppose $f$ has a continuous $(p+1)$-th derivative. If $\Psi$ is bounded, then for any $\epsilon > 0$ there is an $\hat f \in \tilde F_p$ (with finite $N$, which may depend on $\epsilon$) such that

$$\|f - \hat f\|_\infty < \epsilon \quad (12)$$

Proof: Since $\nabla^{p+1} f(\psi)$ is continuous, it is bounded (by $M < \infty$) on $\Psi$. Since $\Psi$ is bounded, a finite $N$ is sufficient to ensure that one $g_i$-function is negative at any point. Since $N$ is finite, we do not have to go to the limit and make $\{\tilde w_i\}_{i=0}^{N-1}$ step functions, but can stop when we are sufficiently close. Then $\{\tilde\rho_i\}_{i=0}^{N-1}$ can be chosen as smooth functions such that (11) holds. Since $\psi \in \Psi$ was arbitrary, the theorem is proved. □
This is an existence theorem. However, the proof is constructive and gives indications on how to construct the approximator. In order to use this proof to formulate an upper bound on the approximation error, we introduce the following definition of distance between sets, similar to the Hausdorff metric:

Definition 1 Assume $A$ and $B$ are two nonempty subsets of a vector space. Then the distance between the sets is defined as

$$D(A, B) = \sup_{b \in B} \inf_{a \in A} \|a - b\|_2$$

□
The crux of the proof of Theorem 1 is that at any point $\psi \in \Psi$ one of the $g_i$-functions is negative, and that the $\tilde\rho_i$-functions are chosen such that at any point $\psi \in \Psi$ a negative term $g_i(\psi)\tilde\rho_i(\psi)$ dominates the sum (10). At least one $g_i$-function will be negative at any $\psi \in \Psi$ if the following condition holds:

$$D(\{\psi_i\}, \Psi) \leq \left( \epsilon \frac{(p+1)!}{M} \right)^{\frac{1}{p+1}} \quad (13)$$

If the set $\{\psi_i\}$ is dense in $\Psi$, this distance will be zero. The term "sufficiently dense" used informally above means that the set $\{\psi_i\}_{i=0}^{N-1}$ should be chosen such that (13) holds for the given $\epsilon$.
Theorem 2 Suppose given an integer $p \geq 0$. If $\Psi$ is bounded and $f$ has a bounded $(p+1)$-th derivative, i.e. $\|\nabla^{p+1} f(\psi)\| \leq M$ for all $\psi \in \Psi$, then for any $\hat f \in \tilde F_p$ with finite $N$ and sufficiently narrow functions $\{\tilde\rho_i\}_{i=0}^{N-1}$, an upper bound on the approximation error is given by

$$\|f - \hat f\|_\infty \leq \frac{M}{(p+1)!} \left( D(\{\psi_i\}, \Psi) \right)^{p+1} \quad (14)$$

Proof: (13) will hold for $\epsilon$ equal to the right-hand side of (14). From the previous discussion, it is evident that (11) holds for any $\psi \in \Psi$. Hence, $\|f - \hat f\|_\infty$ is bounded by $\epsilon$, and the result follows. □
This is under the condition that the $\tilde\rho_i$-functions are chosen narrow. The bound is conservative, meaning that for $\tilde\rho_i$-functions that are neither too narrow nor too wide, one may expect better accuracy. However, if the functions $\{\tilde\rho_i\}_{i=0}^{N-1}$ are not narrow, the result does not hold.
From (14) we see that if the polynomial order $p$ of the local models is increased, then the accuracy will improve. If $\Psi$ is not bounded and $M > 0$, $N$ must be infinite in order to guarantee a bounded error.
If $f$ does not satisfy the smoothness conditions in Theorem 1, the proof obviously does not hold. If, however, $f$ is such that it can be approximated arbitrarily well by a sufficiently smooth function, then we can show that $f$ can be approximated arbitrarily well by interpolating local models. In particular we have:

Corollary 1 The results of Theorem 1 also hold if the smoothness assumption on $f$ is relaxed to assuming only continuity. In other words, the set $\tilde F_p$ is dense in the set of continuous functions from $\Psi$ into $Y$.

Proof: By the Weierstrass approximation theorem, e.g. (Stromberg 1981), for any $\epsilon > 0$ there exists a polynomial $\tilde f$ such that $\|f - \tilde f\|_\infty \leq \epsilon/2$. By Theorem 1, $\tilde f$ can be approximated by an $\hat f \in \tilde F_p$ on the bounded set $\Psi$ such that $\|\tilde f - \hat f\|_\infty < \epsilon/2$. Using the triangle inequality we get $\|f - \hat f\|_\infty < \epsilon$. □

Example 1
Assume $p = 1$, i.e. the local models are ARMAX models. Then (14) can be written

$$\|f - \hat f\|_\infty \leq \frac{M}{2} \left( D(\{\psi_i\}, \Psi) \right)^2 = \epsilon \quad (15)$$

If $f$ and $\psi$ are scalars, $M$ is a bound on the second derivative of $f$, in other words a bound on the curvature. If the system is linear, then $M = 0$ and one local linear model is sufficient to make an arbitrarily good global model (of course). $M$ indicates the non-linearity of the function, and we expect $\epsilon$ to increase with increasing $M$, i.e. increasing non-linearity, which is indeed the case as indicated by (15). However, using the upper bound $M$ gives a conservative result, since the system may behave more linearly in some regions than in others. Hence, we need not have a high density of local models where the system does not exhibit strongly non-linear behavior. □

Example 2
With a simple example we illustrate the use of Theorem 2. Consider the function $f : [0, 2] \to R$ given by $f(\psi) = \psi^2 + 1$. Assume that we have two local linear models located at $\psi_0 = 0.5$ and $\psi_1 = 1.5$. Then $D(\{0.5, 1.5\}, [0, 2]) = 0.5$, $p = 1$ and $M = 2$. Theorem 2 predicts the bound $\epsilon = 0.25$ on the approximation accuracy. As shown by Figure 3, this bound is exact when using infinitely narrow functions $\tilde\rho_i$, i.e. a piecewise linear approximation. The reason for this is that $M = f''(\psi) = 2$ for all $\psi$, hence there are no regions where $f$ is "less non-linear". As we shall see later, better approximations can be achieved using well-chosen $\tilde\rho_i$-functions. From this figure we also see that if the local linear models are not chosen as 1st order Taylor expansions, but on the basis of e.g. a least squares regression, improvement might also be achieved. □
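The bound in this example is easy to check numerically. The sketch below (our own illustration) builds the piecewise linear approximation from the two first order Taylor expansions and compares the worst-case error with the right-hand side of (14):

```python
import numpy as np

# f(psi) = psi^2 + 1 on [0, 2], two local linear models at psi_0 = 0.5, psi_1 = 1.5.
psi = np.linspace(0.0, 2.0, 2001)
f = psi ** 2 + 1

# Infinitely narrow rho_i give a piecewise linear model: at each point, use the
# 1st order Taylor expansion of f about the nearest center.
centers = np.array([0.5, 1.5])
c = centers[np.argmin(np.abs(psi[:, None] - centers), axis=1)]
f_hat = (c ** 2 + 1) + 2 * c * (psi - c)

worst = np.max(np.abs(f - f_hat))
bound = (2 / 2) * 0.5 ** 2      # M/(p+1)! * D^{p+1} with M = 2, p = 1, D = 0.5
print(worst, bound)             # both equal 0.25: the bound (14) is tight here
```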
Since the system function $f$ can be approximated arbitrarily well, we are able to make arbitrarily good predictions over a finite horizon if there is no noise, provided the initial values are correct and the inputs and outputs are such that they give vectors $\psi$ that remain in $\Psi$ (Polycarpou, Ioannou & Ahmed-Zaid 1992). However, it is well known that the solutions of some difference equations are sensitive with respect to initial values or modelling errors. Examples of such systems are chaotic or unstable systems.

Operating regimes
In the rest of this paper we will usually assume $p = 1$, i.e. we use linear ARMAX models locally to build a non-linear NARMAX model. In the representation (6) the interpolation functions $\{\tilde w_i\}_{i=0}^{N-1}$ are defined on the set $\Psi$. This is a subset of the information space. If the information space has high dimension (as it often has), the curse of dimensionality problem arises. This problem, first described by Bellman (1961), is essentially that the number of local models needed to uniformly cover a region of this space increases exponentially with the dimension of the space. In practice, uniform coverage is usually not necessary, but the problem is still severe. In some cases the interpolation functions may be defined on a space of smaller dimension. This is our motivation for introducing the terms operating regime and operating point. First, we define $\Omega$ to be the set of operating points. Motivated by the fact that we want to model a non-linear system with a set of linear models, it is convenient to define an operating regime as a subset of $\Omega$ where the system behaves approximately linearly.

Definition 2 An operating regime is a set of operating points $\Omega_i \subset \Omega$ where the system behaves approximately linearly.

A model validity function $\rho_i : \Omega \to [0, 1]$ is smooth and satisfies $\rho_i(\phi) \approx 1$ for $\phi \in \Omega_i$, and goes to zero outside $\Omega_i$. The interpolation functions $w_i : \Omega \to [0, 1]$ are now defined as

$$w_i(\phi) = \frac{\rho_i(\phi)}{\sum_{j=0}^{N-1} \rho_j(\phi)}$$

assuming that at every operating point $\phi \in \Omega$, not all model validity functions $\rho_i$ vanish.
In many cases there will exist a function $H : \Psi \to \Omega$ such that at any $t$, $\phi(t) = H(\psi(t))$. The function $H$ will typically be a projection, i.e. $\Omega$ will be in a space of lower dimension than $\Psi$. In cases where the operating point is calculated on the basis of filtered or estimated quantities, the relationship between $\psi(t)$ and $\phi(t)$ is more complex, and must be described by an operator $\mathcal H$. This may be the case when $\phi$ is estimated using a recursive algorithm or a recursive filter to suppress noise. Although very important, this complicates the analysis considerably, and we will not consider this case here, but leave it as a topic for future research.
To summarize, the representation we address at this stage is

$$\hat y(t) = \hat f(\psi(t-1)) = \sum_{i=0}^{N-1} \hat f_i(\psi(t-1))\,w_i(\phi(t-1)) \quad (16)$$

where the local models

$$\hat f_i(\psi(t-1)) = \theta_i + \Theta_i(\psi(t-1) - \psi_i) \quad (17)$$

are ARMAX models. We define the set

$$F_p = \left\{ \hat f : \Psi \to Y \;\middle|\; \hat f(\psi) = \sum_{i=0}^{N-1} \hat f_i(\psi)\,w_i(\phi) \right\}$$
where $p$ is the polynomial order of $\hat f_i$, the interpolation functions $\{w_i\}_{i=0}^{N-1}$ are smooth, and $\phi = H(\psi)$. Now we want to state some general results regarding the transform $H$ from the information vector to the operating point vector. In general, $f$ can be written as an affine function of some of its arguments. We rearrange the elements of $\psi$ into $\psi^T = [\psi_L^T \; \psi_N^T]$ such that

$$f(\psi) = f(\psi_L, \psi_N) = f_1(\psi_N) + f_2(\psi_N)\,\psi_L \quad (18)$$

Assume $\Psi_L$ and $\Psi_N$ are the subsets of the information space corresponding to $\psi_L$ and $\psi_N$, respectively. $f_1 : \Psi_N \to R^m$ and $f_2 : \Psi_N \to R^{m \times n_L}$ (with $n_L$ the dimension of $\psi_L$) are non-linear vector- and matrix-valued functions, respectively. Our principal result guiding the choice of $\phi$ is the following, which indicates that $\phi$ must be chosen such that it captures the system's non-linearities:

Theorem 3 Assume $f$ given in (18) is continuous, and $\Psi$ is bounded. Then for any $\epsilon > 0$ there is an $\hat f \in F_1$ with $\phi = \psi_N$, and finite $N$, such that $\|f - \hat f\|_\infty < \epsilon$.
Proof: Fix an arbitrary $\psi \in \Psi$ such that $\psi^T = [\psi_L^T \; \psi_N^T]$. Then

$$\|f(\psi) - \hat f(\psi)\|_2 = \left\| f(\psi_N, \psi_L) - \sum_{i=0}^{N-1} \hat f_i(\psi_N, \psi_L)\,w_i(\psi_N) \right\|_2$$
$$= \left\| \sum_{i=0}^{N-1} \left( f_1(\psi_N) + f_2(\psi_N)\psi_L - \hat f_i(\psi_N, \psi_L) \right) w_i(\psi_N) \right\|_2$$
$$= \left\| \sum_{i=0}^{N-1} \left( f_1(\psi_N) - \hat f_{Ni}(\psi_N) + f_2(\psi_N)\psi_L - \hat f_{Li}(\psi_L) \right) w_i(\psi_N) \right\|_2$$

In the last line we split the linear function $\hat f_i : \Psi \to R^m$ into two linear functions $\hat f_{Ni} : \Psi_N \to R^m$ and $\hat f_{Li} : \Psi_L \to R^m$. Now we choose $\hat f_{Li}(\psi_L) = \Gamma_i \psi_L$, where $\Gamma_i$ is a not yet specified constant parameter matrix. Then we have

$$\|f(\psi) - \hat f(\psi)\|_2 \leq \left\| \sum_{i=0}^{N-1} \left( f_1(\psi_N) - \hat f_{Ni}(\psi_N) + (f_2(\psi_N) - \Gamma_i)\psi_L \right) w_i(\psi_N) \right\|_2$$
$$\leq \left\| \sum_{i=0}^{N-1} \left( f_1(\psi_N) - \hat f_{Ni}(\psi_N) \right) w_i(\psi_N) \right\|_2 + \left\| \sum_{i=0}^{N-1} (f_2(\psi_N) - \Gamma_i)\,\psi_L\, w_i(\psi_N) \right\|_2$$
$$\leq \left\| f_1(\psi_N) - \sum_{i=0}^{N-1} \hat f_{Ni}(\psi_N)\,w_i(\psi_N) \right\|_2 + \|\psi_L\|_2 \cdot \left\| f_2(\psi_N) - \sum_{i=0}^{N-1} \Gamma_i\,w_i(\psi_N) \right\|_2$$

The first term in this expression can be made arbitrarily small by Corollary 1 with $p = 1$, since $\hat f_{Ni}$ is linear. Since $\Psi$ is bounded, the second term can be made arbitrarily small by the same corollary with $p = 0$ through the choice of $\Gamma_i$. Hence, for any $\epsilon > 0$ we can make $\|f(\psi) - \hat f(\psi)\|_2 < \epsilon$, and since $\psi$ is arbitrary we get $\|f - \hat f\|_\infty < \epsilon$. □
Using the same notation as before, the attainable approximation error is bounded by

$$\|f - \hat f\|_\infty \leq \frac{M}{2} \left( D(\{\phi_i\}, \Omega) \right)^2 + 2\bar\psi_L\, D(\{\phi_i\}, \Omega) = \epsilon \quad (19)$$

where

$$\bar\psi_L = \sup_{\psi_L \in \Psi_L} \|\psi_L\|_2$$

The motivation for introducing the operating point is that in many cases this vector may be of significantly lower dimension than $\psi$. With a fixed $N$, the first term in (19) will be significantly smaller than the corresponding term

$$\frac{M}{2} \left( D(\{\psi_i\}, \Psi) \right)^2$$

The second term in (19) will make the error increase; but in most cases when $\Omega$ is of smaller dimension than $\Psi$, the approximation (16)-(17) will give better accuracy than (6), (8). Another important fact is that a low dimension of $\Omega$ makes it easier to partition the set into operating regimes.

Example 3
For example, if $f$ is linear in the control variables $u(t-1)$, then $\phi$ need not contain any $u(t-1)$-terms. If we have the system

$$y(t) = f(y(t-1), u(t-1)) = f_1(y(t-1)) + f_2(y(t-1))\,u(t-1)$$

we can choose $\phi(t-1) = y(t-1)$ without losing accuracy in the approximation. □
We now generalize this result to local expansions of (polynomial) order $p$. We split $\psi$ into two parts and rearrange $\psi^T = [\psi_L^T \; \psi_H^T]$ such that

$$f(\psi) = f(\psi_L, \psi_H) = f_{H1}(\psi_H) + f_{H2}(\psi_H)\,f_L(\psi_L) \quad (20)$$

where $f_L : \Psi_L \to R^m$ is of polynomial order less than or equal to $p$, while $f_{H1} : \Psi_H \to R^m$ and $f_{H2} : \Psi_H \to R^{m \times m}$ may be of higher order.

Theorem 4 Suppose $f$ given in (20) is continuous and $\Psi$ is a bounded set. Then for any $\epsilon > 0$ there is an $\hat f \in F_p$ with $\phi = \psi_H$, and finite $N$, such that $\|f - \hat f\|_\infty < \epsilon$.

Proof: The proof follows the same idea as the proof of Theorem 3, but requires some tedious notation, and is therefore omitted. □

Some Comparisons
Using local linear models, we can write the model representation (16)-(17) as

$$\hat y(t) = \sum_{i=0}^{N-1} \left( \theta_i + \Theta_i(\psi(t-1) - \psi_i) \right) w_i(\phi(t-1))$$
$$= \left( \sum_{i=0}^{N-1} (\theta_i - \Theta_i\psi_i)\,w_i(\phi(t-1)) \right) + \left( \sum_{i=0}^{N-1} \Theta_i\,w_i(\phi(t-1)) \right) \psi(t-1)$$
$$= \mu(\phi(t-1)) + \Phi(\phi(t-1))\,\psi(t-1)$$

This means that the non-linear model can be written as an apparently linear model, where the parameters depend on the operating point. Priestley (1981) introduced state-dependent models, which can be written

$$\hat y(t) = \mu(x(t-1)) + \Phi(x(t-1))\,\psi(t-1) \quad (21)$$

where $x$ is the "state-vector", $\mu$ is a state-dependent vector, and $\Phi$ is a state-dependent matrix. In general $x = \psi$ was suggested, but it was also observed that this might be redundant, so a simpler vector may be used to describe the parameter dependence. The present approach with $x = \phi$ has obvious similarities. Billings & Voon (1987) discuss the use of models with signal-dependent parameters, which are similar to (21) with $x = \omega$, where $\omega(t)$ is the auxiliary signal. In (Billings & Voon 1987) polynomials were used to define the dependence of the parameters on the auxiliary signal, i.e. $\mu(\omega(t))$ and $\Phi(\omega(t))$ are polynomials in $\omega(t)$. A similar approach was proposed by Cyrot-Normand & Mien (1980). Our approach is also similar, but system knowledge is applied to choose the $\rho_i$-functions, which in turn define $\mu(\phi(t))$ and $\Phi(\phi(t))$. The Threshold AR model of Tong & Lim (1980) can also be written in the form (21) with $x(t-1) = y(t-1)$:

$$\mu(y(t-1)) = \begin{cases} \mu_1 & \text{if } y(t-1) \in Y_1 \\ \mu_2 & \text{if } y(t-1) \in Y_2 \end{cases} \qquad \Phi(y(t-1)) = \begin{cases} \Phi_1 & \text{if } y(t-1) \in Y_1 \\ \Phi_2 & \text{if } y(t-1) \in Y_2 \end{cases}$$

where $Y = Y_1 \cup Y_2$. Here the parameters are switched between two possible parameter sets, and the decision is based on the value of $y(t-1)$. The resulting model is a piecewise linear model, and is related to our approach if $\phi(t-1) = y(t-1)$ and the interpolation functions are step functions.
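For illustration, the following sketch (our own; Gaussian validity functions and a scalar operating point are assumptions) collapses the interpolated model into the operating-point-dependent parameters $\mu(\phi)$ and $\Phi(\phi)$ of the quasi-linear form above:

```python
import numpy as np

def quasi_linear_parameters(phi, phi_centers, width, thetas, Thetas, psi_centers):
    """Collapse (16)-(17) into mu(phi) and Phi(phi) of the quasi-linear form.

    phi         : scalar operating point
    phi_centers : centers of Gaussian validity functions rho_i, shape (N,)
    thetas      : theta_i, shape (N, m); Thetas : Theta_i, shape (N, m, d)
    psi_centers : linearization points psi_i, shape (N, d)
    """
    rho = np.exp(-0.5 * ((phi - phi_centers) / width) ** 2)
    w = rho / rho.sum()                                 # w_i(phi)
    offsets = thetas - np.einsum('nij,nj->ni', Thetas, psi_centers)
    mu = w @ offsets                                    # sum_i (theta_i - Theta_i psi_i) w_i
    Phi = np.einsum('n,nij->ij', w, Thetas)             # sum_i Theta_i w_i
    return mu, Phi                                      # y_hat = mu + Phi @ psi
```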
The notion of operating points and model validity functions offers a complementary method for parameterizing the state-dependence of the parameters given in (Priestley 1988).
Takagi & Sugeno (1985) suggested a fuzzy logic based technique for combining a set of linear models into a global model in a smooth fashion. It turns out that if the operating regimes $\Omega_i$ are viewed as fuzzy sets with membership functions equal to the model validity functions, then inference on a rulebase of the form

IF $\phi(t-1) \in \Omega_i$ THEN $\hat y(t) = \hat f_i(\psi(t-1))$

gives a resulting global model of the same form as the one analysed in the present paper, provided the fuzzy operations are properly defined. This suggests the use of fuzzy sets and rules as a means of defining the operating regimes and local model validity functions.

This is appealing since it gives a direct method of representing the empirical knowledge that engineers and operators have about the system and the local models.
A related non-linear modelling approach is radial basis functions (RBF) (Powell 1987), (Broomhead & Lowe 1988). Using RBFs, a non-linear function may be modelled as

$$\hat f(\psi) = \sum_{i=0}^{N-1} \theta_i\, r_i(\|\psi - \psi_i\|)$$

where $r_i : R_+ \to R$ is typically chosen as a Gaussian function. The relationship between some of these approaches is best illustrated by an example.

Example 4
We consider again the function in Example 2, and the following 4 modelling approaches:
1. Two piecewise linear models, as in Example 2, centered at $\phi_0 = 0.5$ and $\phi_1 = 1.5$. This may also be interpreted as a Threshold AR model.
2. Gaussian model validity functions

$$\rho_i(\phi) = \exp\left( -\frac{1}{2} \left( \frac{\phi - \phi_i}{\sigma_i} \right)^2 \right) \quad (22)$$

with $\phi_0 = 0.5$, $\phi_1 = 1.5$, $\sigma_i = 0.52$, and 2 local linear models.
3. 5 local 0th order models centered at $\phi_0 = 0$, $\phi_1 = 0.5$, $\phi_2 = 1$, $\phi_3 = 1.5$, $\phi_4 = 2$, and Gaussian model validity functions with $\sigma_i = 0.52$.
4. A radial basis function expansion with 5 Gaussian basis functions centered at $\phi_0 = 0$, $\phi_1 = 0.5$, $\phi_2 = 1$, $\phi_3 = 1.5$, $\phi_4 = 2$, and $\sigma_i = 0.52$.
Linear regression is used to estimate the model parameters, and the results are shown in Figure 4. By comparing Figure 4a with 4b, it is obvious that interpolating local linear models using well-chosen model validity functions can improve the accuracy compared to piecewise linear models.
Notice that $f$ is now defined on $[-1, 3]$, while data on $[0, 2]$ is used for parameter estimation. The extrapolation capabilities can thus be evaluated, and we see that the local linear approximations give 1st order extrapolation, as would be expected, while the local 0th order models give 0th order extrapolation. The RBF approach does not give any extrapolation at all, since all basis functions go to 0. A feed-forward neural net with one hidden layer (of sigmoidal basis functions) would give extrapolation qualitatively similar to the 0th order models in Figure 4c. As we see, there are fundamental differences concerning the extrapolation capabilities. □
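A compact way to reproduce approaches 2 and 4 numerically is sketched below. This is our own illustration under our own choices of grid and solver; the figures in the paper were of course produced independently.

```python
import numpy as np

# Fit approaches 2 and 4 on data from f(psi) = psi^2 + 1 on [0, 2], then
# evaluate on [-1, 3] to expose the different extrapolation behaviors.
psi_d = np.linspace(0.0, 2.0, 50); y_d = psi_d ** 2 + 1
psi_e = np.linspace(-1.0, 3.0, 200)
sigma = 0.52

def gauss(x, c):
    return np.exp(-0.5 * ((x - c) / sigma) ** 2)

# Approach 2: two local linear models with normalized Gaussian validity functions.
c2 = np.array([0.5, 1.5])
def design(x):                       # columns: w_i and w_i * (x - phi_i)
    rho = np.stack([gauss(x, c) for c in c2])
    w = rho / rho.sum(axis=0)
    return np.vstack([w, w * (x - c2[:, None])]).T
theta = np.linalg.lstsq(design(psi_d), y_d, rcond=None)[0]
y_local = design(psi_e) @ theta      # 1st order extrapolation outside [0, 2]

# Approach 4: RBF expansion with 5 unnormalized Gaussian basis functions.
c5 = np.linspace(0.0, 2.0, 5)
R = np.stack([gauss(psi_d, c) for c in c5]).T
alpha = np.linalg.lstsq(R, y_d, rcond=None)[0]
y_rbf = np.stack([gauss(psi_e, c) for c in c5]).T @ alpha  # decays toward 0
```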

3 Modelling
The representation (16)-(17) is appealing since ARMAX models are simple. We represent a complex non-linear system with a number of simple linear systems. A piecewise linear model will have the same features, but unless we force the model to be continuous on the boundaries between the local models, the resulting global model may be discontinuous, which may be undesirable in some cases. Enforcing continuity poses restrictions on the parameter space, reducing the representational power of the model class. The same problem arises to an even larger extent using e.g. cubic splines. Unlike the piecewise linear model, the model (16) will be smooth when the model validity functions are chosen as smooth functions. In practice, there are at least 4 ways a NARMAX model can be constructed using local ARMAX models and interpolation:

1. First we choose a set of operating regimes $\Omega_i$ that correspond to the normal equilibrium points and major transient operating regimes of the system in question. This means that we partition the set of operating points into parts where we believe the system will behave linearly. Then we perform experiments on the system and identify a local ARMAX model $\hat f_i$ for each operating regime, using cost indices corresponding to each local model:

$$J_i = \frac{1}{m_i} \sum_{t=1}^{m_i} \left( y(t) - \hat f_i(\psi(t-1)) \right)^T \Lambda_i \left( y(t) - \hat f_i(\psi(t-1)) \right) \quad (23)$$

where $m_i$ is the number of data points for regime $\Omega_i$, and $\Lambda_i$ is a scaling matrix. Then the local models are integrated, using system knowledge to choose sensible model validity functions $\{\rho_i\}_{i=0}^{N-1}$ such that the set $\Omega$ is covered, as shown in Figure 5. The choice of these functions will strongly influence the accuracy of the global model, since the local ARMAX models are identified before the functions $\rho_i$ are chosen.
2. Instead of choosing the model validity functions $\{\rho_i\}_{i=0}^{N-1}$ empirically, we may try to find optimal validity functions after we have found the local ARMAX models. Choosing fixed structures for the functions $\{\rho_i\}_{i=0}^{N-1}$ is necessary to make the problem finite-dimensional. Keeping the parameters of the identified ARMAX models fixed, we may search for optimal parameters of $\{\rho_i\}_{i=0}^{N-1}$ using the same data used for identifying the ARMAX models, and a global performance index. This leads to a two-step optimization procedure.
3. A more direct approach is to first choose the model validity functions corresponding to the operating regimes $\Omega_i$ using system knowledge. Keeping the model validity functions fixed, we minimize the global index

$$J = \frac{1}{m} \sum_{t=1}^{m} \left( y(t) - \hat f(\psi(t-1)) \right)^T \Lambda \left( y(t) - \hat f(\psi(t-1)) \right) \quad (24)$$

with respect to all parameters in the local models (see the sketch after this list). Here $m$ is the number of data points available, and $\Lambda$ is a suitable scaling matrix. Now the shape of the model validity functions is taken into consideration when finding the optimal parameters for the local models. This has two side effects: First, the accuracy of the global model is less sensitive to the choice of $\{\rho_i\}_{i=0}^{N-1}$. Second, the local models (17) are influenced by the user-specified functions $\{\rho_i\}_{i=0}^{N-1}$. Hence, they are no longer linearizations of $f$ about $\{\psi_i\}_{i=0}^{N-1}$.
4. An obvious improvement to methods 2 and 3 would be to search for the ARMAX and local model validity function parameters simultaneously. This leads to a complex non-linear programming problem.
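Since the model is linear in the local model parameters once the validity functions are fixed, approach 3 reduces to a single linear least squares problem. A minimal sketch, assuming a scalar output and Gaussian validity functions (all names are our own):

```python
import numpy as np

def fit_global(Psi, y, phi, phi_centers, width):
    """Approach 3: fixed Gaussian validity functions, all local model
    parameters estimated at once by minimizing the global index (24).

    Psi : information vectors psi(t-1), shape (m, d)
    y   : scalar outputs y(t), shape (m,)
    phi : operating points phi(t-1), shape (m,)
    """
    rho = np.exp(-0.5 * ((phi[:, None] - phi_centers) / width) ** 2)
    W = rho / rho.sum(axis=1, keepdims=True)        # W[t, i] = w_i(phi(t-1))
    # Each local model is affine in psi: f_i(psi) = a_i + b_i^T psi, so the
    # global model is linear in the stacked parameters [a_i, b_i].
    Z = np.hstack([np.ones((len(y), 1)), Psi])      # shape (m, d+1)
    X = np.hstack([W[:, i:i + 1] * Z for i in range(W.shape[1])])
    params, *_ = np.linalg.lstsq(X, y, rcond=None)
    return params.reshape(W.shape[1], -1)           # row i holds [a_i, b_i]
```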

Required A Priori System Knowledge
When building models, different kinds of system knowledge must be available. For ARMAX models, one must first estimate the dominant time constants of the system in order to choose the sampling interval. Second, the ARMAX model order must be chosen, and the structure of the system disturbances must be known in order to select the MA part of the model. When building NARMAX models, it is in addition necessary to find a suitable structure for the system function $f$. First principles knowledge can be applied here, if available. We have proposed a generic structure that does not require first principles modelling. It is, however, not a completely black-box approach, as some limited system knowledge must be included. In order to use the local modelling approach introduced here, a priori knowledge in terms of operating regimes must be available. One must be able to estimate operating regions in which the system will behave approximately linearly.
In general, the technique with local models and interpolation may be used in an elegant fashion to integrate first principles models with black-box models, since it is completely feasible that some of the local models are derived from first principles, while others are black-box models (Johansen & Foss 1992c). In operating regimes where the dominating physical phenomena are well understood and possible to model, and in operating regimes where the data is so sparse that black-box modelling is not possible, it makes sense to use local first principles models. In the remaining regimes, black-box models can be constructed, and the proposed technique with operating regime decomposition can be applied to integrate the different models. Of course, in regimes where we have both limited data and limited knowledge, modelling is impossible, and the best we can hope for is some reasonable extrapolation of the neighboring local models into those regimes.
Defining the operating point in a suitable manner is important. If the local models are linear, we have shown in Theorem 3 that the operating point must capture the system's non-linearities. Given a set of data from the system, different tests for linear relationships between some inputs and the outputs are of great interest (Haber 1985). In the case of signal-dependent piecewise linear models, it is observed by Billings & Voon (1987) that the model may be input-sensitive. This must certainly be expected if the data used for identification does not cover the full range of operation. Input sensitivity and biased models may also result if the operating point vector is not suitably chosen, i.e. if only some of the non-linearities are captured by the operating point.

4 Identification
First we consider identification of local model parameters based on the local cost indices (23), and second we consider the global cost index (24). Finally, we consider identification of local model validity function parameters and model structure identification.

Identifying local model parameters using local cost indices

The prediction error at time $t$ for the local model $\hat f_i$ is defined as

$$\epsilon_i(t) = y(t) - \hat f_i(\psi(t-1))$$
The cost index $J_i$ associated with the local model can be written as

$$J_i = \frac{1}{m_i} \sum_{t=1}^{m_i} \epsilon_i^T(t)\,\Lambda_i\,\epsilon_i(t) \quad (25)$$

Consider the local ARMAX model (17). The local models are parameterized by a vector $\theta_i$ and a matrix $\Theta_i$. Since all local models (17) are linear functions of the parameters, the representation is basically linear in the parameters, and standard identification methods can be applied, e.g. (Soderstrom & Stoica 1988).
Assume first that the noise $e(t)$ is sequentially uncorrelated. Since the information vector $\psi(t-1)$ does not contain noise terms $e(t-1), \dots, e(t-n_e)$, the model can be written in the linear regression form

$$\hat f_i(\psi(t-1)) = \varphi_i^T(t-1)\,\eta_i \quad (26)$$

where $\eta_i$ is a parameter vector and $\varphi_i(t-1)$ is a regression matrix.
The parameters can be estimated using the least squares (LS) method. The regression matrix $\varphi_i(t-1)$ is a matrix of computable or measurable quantities not depending on the parameter vector and not correlated with $e(t)$. The least squares estimate minimizing (25) can be written as

$$\hat\eta_i = \left( \frac{1}{m_i} \sum_{t=1}^{m_i} \varphi_i(t)\,\Lambda_i\,\varphi_i^T(t) \right)^{-1} \left( \frac{1}{m_i} \sum_{t=1}^{m_i} \varphi_i(t)\,\Lambda_i\,y(t) \right) \quad (27)$$
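In code, (27) amounts to solving a weighted normal equation. A minimal numpy sketch under the shapes stated above (names are our own):

```python
import numpy as np

def weighted_ls(phis, ys, Lam):
    """Weighted least squares estimate (27) for one local model.

    phis : regression matrices phi_i(t-1), each of shape (q, p), so that
           y_hat(t) = phis[t].T @ eta with eta of dimension q
    ys   : outputs y(t), each of shape (p,)
    Lam  : scaling matrix Lambda_i, shape (p, p)
    """
    A = sum(P @ Lam @ P.T for P in phis) / len(phis)    # normal matrix
    b = sum(P @ Lam @ y for P, y in zip(phis, ys)) / len(phis)
    return np.linalg.solve(A, b)                        # eta_hat minimizing (25)
```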

In the general case when delayed noise terms $e(t-1), e(t-2), \dots, e(t-n_e)$ are included in $\psi(t)$, we use the prediction error (PE) method. Since the noise is assumed not to be measurable, we do not know the values of $e(t-1), \dots, e(t-n_e)$. If the model matches the true system, then $\epsilon_i(t) = e(t)$. Now $\epsilon_i(t-1)$ depends on the parameters $\eta_i$, since it is the prediction error. Since $\varphi_i(t)$ depends on $\epsilon_i(t-1)$ and hence on $\eta_i$, we may conclude that the predictor (26) is no longer linear in the parameters. Hence it is not possible to find a simple analytic solution like (27). The cost indices (25) must be minimized numerically, using e.g. the Newton-Raphson algorithm

$$\hat\eta_i^{(k+1)} = \hat\eta_i^{(k)} - \alpha_k \left[ \nabla^2 J_i\!\left(\hat\eta_i^{(k)}\right) \right]^{-1} \nabla J_i\!\left(\hat\eta_i^{(k)}\right)$$

or the Gauss-Newton algorithm, which is based on a simplified calculation of the inverse Hessian matrix.
Both the LS method and the PE method can be formulated recursively. The RPE algorithm is

$$\hat\eta_i(t) = \hat\eta_i(t-1) + K_i(t)\,\epsilon_i(t) \quad (28)$$
$$K_i(t) = P_i(t)\,\chi(t)\,\Lambda_i \quad (29)$$
$$P_i(t) = P_i(t-1) - P_i(t-1)\,\chi(t) \left( \Lambda_i^{-1} + \chi^T(t)\,P_i(t-1)\,\chi(t) \right)^{-1} \chi^T(t)\,P_i(t-1) \quad (30)$$

with $\chi(t) = -\frac{\partial}{\partial \eta_i}\epsilon_i(t)$.

Identifying local model parameters using a global cost index

We define the global prediction error as

$$\epsilon(t) = y(t) - \hat f(\psi(t-1))$$

The global cost index (24) can be rewritten as

$$J = \frac{1}{m} \sum_{t=1}^{m} \epsilon^T(t)\,\Lambda\,\epsilon(t) \quad (31)$$
The LS and PE methods can be formulated in the same manner as above. When the identification is performed on-line, some rather heuristic modifications may be desirable. As pointed out in (Johansen & Foss 1992b), only the parameters of those local models that are assumed to be valid in the current operating regime should be updated. This is easily accomplished by ensuring that the model validity function $\rho_i$ is exactly zero for operating points where the local model is not valid. In practice, however, we would like the functions $\rho_i$ to be smooth, i.e. they should go quickly to zero instead of being exactly zero. This may cause problems when the system operates within one operating regime for a long time. Then all the other local models will be updated slightly at each time step, and information about other operating regimes will "leak out", in particular if forgetting factors are used. The heuristic we propose to eliminate this problem is to update only the parameters of those local models satisfying $w_i(\phi(t)) > \gamma$, where typically $\gamma \in [0.1, 0.5]$. There is a danger that parameters that are an active part of the global model may never be updated. This requires that the system is excited such that the parameters of all local models are updated from time to time.
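The following sketch combines the RPE update (28)-(30) with this thresholding heuristic for a single local model (our illustration; the threshold value and array shapes are assumptions):

```python
import numpy as np

def rpe_step(eta, P, chi, eps, Lam, w, gamma=0.3):
    """One RPE update (28)-(30) for a local model, skipped whenever
    w_i(phi(t)) <= gamma so that inactive local models are left untouched.

    chi : gradient -d(eps_i)/d(eta_i), shape (q, p); eps : error, shape (p,)
    """
    if w <= gamma:                      # local model not active: freeze it
        return eta, P
    S = np.linalg.inv(np.linalg.inv(Lam) + chi.T @ P @ chi)
    P = P - P @ chi @ S @ chi.T @ P     # covariance update (30)
    K = P @ chi @ Lam                   # gain (29)
    return eta + K @ eps, P             # parameter update (28)
```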

Identifying model validity function parameters

In general, the local model validity function parameters enter the equations for the prediction error non-linearly. In particular, if these parameters are to be identified simultaneously with the local model parameters, we get a complex non-linear programming problem. We will not discuss this problem here, but refer to the vast literature on non-linear programming, e.g. (Gill, Murray & Wright 1981). We would like to point out that most of our simulations so far indicate that a rough empirical choice of model validity functions, combined with local model identification based on a global cost index, in most cases gives better results than the use of local indices and subsequent optimization of the local model validity functions.

Identifying model structure

We have suggested how knowledge about operating regimes may be used to decompose the modelling problem into the problem of building simple ARMAX models corresponding to each operating regime. Together with model validity functions, this gives a model structure where only the parameters are unknown. In some cases, such knowledge may not be available, and we need methods to find an adequate model structure, i.e., the number $N$ of local models, and the structure of each local model. From the parsimony principle we know that the best model is the model with the fewest parameters that is able to describe the system adequately. There are several theoretical frameworks that deal with this problem.

1. The prediction error using a good model should be uncorrelated with past prediction errors and inputs (and future inputs if the system operates in open loop). There are several correlation-based tests, see e.g. (Billings & Voon 1986).

2. If we consider the expected error criterion $\tilde J$ on future data, we obtain (Soderstrom & Stoica 1988)

$$\tilde J = (1 + P/m)\,J \quad (32)$$

This depends on the number of parameters $P$ and the number of data points $m$ used for identification. This is also related to the Akaike Information Criterion (Akaike 1969).

In general, $J$ will decrease when more parameters are introduced in the model, e.g. when new local models are added. However, $P$ will increase, and at some stage the increase in $1 + P/m$ will be larger than the decrease in $J$, if $m$ is kept fixed. The index $\tilde J$ will then increase, indicating that the quality of the model decreases. It is therefore important to keep the number of local models at a minimum, and to use as simple local models as possible.
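As a small illustration of selecting a structure with (32) (the numbers below are hypothetical, not taken from the paper):

```python
def expected_error(J, n_params, m):
    """Expected error criterion (32) on future data."""
    return (1 + n_params / m) * J

# Hypothetical candidates: (label, identification error J, parameter count P).
candidates = [('N=2', 0.40, 20), ('N=4', 0.25, 40), ('N=8', 0.24, 80)]
m = 1000
best = min(candidates, key=lambda c: expected_error(c[1], c[2], m))
```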
Both these methods can be implemented off-line by an exhaustive search. Not all of them are suited for on-line structure identification, however. It requires large amounts of computing power to test more than 2-3 model hypotheses simultaneously. Generally, a batch of data is required to perform any statistical test with some significance level. A large prediction error at some time sample may be due to noise, parameter errors or an inadequate model structure. Collecting a batch of data over some time will reduce the impact of noise. If the batch is so large that the parameter estimator is expected to converge during the batch, the impact of parameter errors may also be insignificant. Hence, if the batch is large and the prediction error is biased or correlated, we can infer that the model structure is not good. In our approach with local models, we must decide which local models are the cause of the mismatch. This can be done by collecting statistics locally for every local model. The problem of on-line structure identification is discussed in (Johansen & Foss 1992c).

Time-varying systems
In the case of time-varying parameters, the RPE formulation (28)-(30) must be modified. This is usually done by introducing a method for artificially increasing the covariance matrix estimate $P$ so it will not go to zero. The most common schemes are linear increase, which corresponds to the Kalman filter, exponential increase, and covariance resetting.
At each time step we get the information vector $\psi(t)$. The information vector is transformed to the vector $w(t) = [w_0(\phi(t)) \; \cdots \; w_{N-1}(\phi(t))]^T$. This vector corresponds to the direction in interpolation space where we get information. The interpolation space consists of $N$ dimensions, one for each local model. If forgetting is to be avoided, we should only update models along the directions in the model space where we get new information. This means that we should only forget if we know that new information will arrive to compensate for the forgotten information. At the operating regime level, this is rather simple in our case, since by construction the components of $w(t)$ will be close to zero when the information is not relevant for the corresponding local models. Hence, by thresholding the parameter update, only the local models we get new information about will be updated. This leads to a small set of local models being updated at each time step. Within each local model, techniques based on the same type of reasoning can be applied to update only in the directions in parameter space where we get new information (Fortescue, Kershenbaum & Ydstie 1981, Sælid, Egeland & Foss 1985, Parkum, Poulsen & Holst 1990).
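A minimal sketch of the linear-increase (Kalman filter) scheme for one local model, assuming a simple random-walk drift model for the parameters (the drift magnitude is an arbitrary assumption):

```python
import numpy as np

def inflate_covariance(P, drift=1e-4):
    """Linear covariance increase: treat the parameters as a random walk, so
    P grows by a constant term each step and cannot collapse to zero."""
    return P + drift * np.eye(P.shape[0])
```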

5 Simulation example
We will use the proposed approach to identify a model of a continuous stirred tank reactor (CSTR) in which a first order, exothermic chemical reaction A → B takes place. The system is described by the following mass and energy balances:

$$V \frac{d}{dt} c_A = c_{Ai} q_i - c_A q_o - V r_A \quad (33)$$
$$\rho c_p V \frac{d}{dt} T = \rho c_p T_i q_i - \rho c_p T q_o + Q - \Delta H_r V r_A \quad (34)$$
$$r_A = k_0 c_A \exp\left( -\frac{E_A}{R} \left( \frac{1}{T} - \frac{1}{T_R} \right) \right) \quad (35)$$

with symbols as defined in Table 1.

Symbol     Value        Unit             Description
T          350-400      K                Reactor temperature
Ti         310          K                Inlet temperature
TR         350          K                Reference temperature
cA         ca. 1        kmol/m^3         Concentration of A in reactor
cAi        10           kmol/m^3         Inlet concentration of A
EA         70000        kJ/kmol          Activation energy
R          8.314        kJ/(K kmol)      Gas constant
k0         0.042        1/min            Frequency factor
rho        1.0          kg/l             Average density in reactor
cp         4.0          kJ/(K kg)        Average heat capacity in reactor
V          10.0         m^3              Reactor volume
Delta Hr   -90000       kJ/kmol          Reaction energy
qi         ca. 50-500   l/min            Inlet flow
qo         ca. 50-500   l/min            Outlet flow
Q          ca. -6 to 3  MW               External power from heat exchanger
rA                      kmol/(m^3 min)   Reaction rate

Table 1: Symbols.

Inputs to the system are the power $Q$ and the inlet flow $q_i$. Outputs are the temperature $T$ and the concentration $c_A$. We define the vectors $y = [5 \cdot 10^{-4} \cdot c_A \;\; 5 \cdot 10^{-3} \cdot T]^T$ and $u = [50 \cdot q_i \;\; 10^{-7} \cdot Q]^T$. The system is simulated as a continuous-time system. Two independent sequences $\{(y(t), u(t))\}_{t=1}^{1000}$ are collected by sampling the system every 2 minutes, see Figure 6. One set is used for parameter identification only, and the other is used for model validation only.
The reactor is open-loop unstable. There are therefore two stabilizing single-loop PI controllers. The reference signals to the controllers are low-pass filtered white noise signals, ensuring that the input is rich.
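For reference, a minimal sketch of the plant dynamics alone (our own; the stabilizing PI loops and the reference noise are not shown, and the unit conversions are our assumptions):

```python
import numpy as np

# Euler simulation of the CSTR (33)-(35). Units here are m^3, kJ, K, min;
# the conversions (rho in kg/m^3, Q in kJ/min, q in m^3/min) are our own.
k0, EA, R, TR = 0.042, 70000.0, 8.314, 350.0
rho_, cp, V = 1000.0, 4.0, 10.0           # kg/m^3, kJ/(K kg), m^3
cAi, Ti, dHr = 10.0, 310.0, -90000.0      # kmol/m^3, K, kJ/kmol

def cstr_step(cA, T, qi, qo, Q, dt=0.1):
    """Advance (cA, T) one Euler step of length dt minutes."""
    rA = k0 * cA * np.exp(-(EA / R) * (1.0 / T - 1.0 / TR))             # (35)
    dcA = (cAi * qi - cA * qo) / V - rA                                 # (33)
    dT = (Ti * qi - T * qo) / V + (Q - dHr * V * rA) / (rho_ * cp * V)  # (34)
    return cA + dt * dcA, T + dt * dT
```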
In our example we choose the information vector $\psi^T(t-1) = [y^T(t-1) \;\; u^T(t-1)]$. We have performed 4 simulations, and a least squares algorithm is used in all cases:
Simulation 1
First, an ARMAX model

$$\hat y(t) = \varphi^T(t-1)\,\eta \quad (36)$$

$$= \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ y_1(t-1) - y_1^* & 0 \\ y_2(t-1) - y_2^* & 0 \\ 0 & y_1(t-1) - y_1^* \\ 0 & y_2(t-1) - y_2^* \\ u_1(t-1) - u_1^* & 0 \\ u_2(t-1) - u_2^* & 0 \\ 0 & u_1(t-1) - u_1^* \\ 0 & u_2(t-1) - u_2^* \end{bmatrix}^T \begin{bmatrix} \eta_1 \\ \eta_2 \\ \eta_3 \\ \eta_4 \\ \eta_5 \\ \eta_6 \\ \eta_7 \\ \eta_8 \\ \eta_9 \\ \eta_{10} \end{bmatrix}$$

is fitted using linear regression on the data. The points $[y_1^* \; y_2^*]^T$ and $[u_1^* \; u_2^*]^T$ are the points the ARMAX model is linearized about, and are chosen as the mean values of the respective signals.

Simulation 2
Second, a NARMAX model with two local ARMAX models (36) is fitted. If $c_{Ai}$ and $T_i$ are constant, the model (33)-(35) can be written

$$\frac{d}{dt} y(t) = f(y(t), u(t)) = f_1(y(t)) + f_2(y(t))\,u(t) \quad (37)$$

Discretizing by the Euler method gives

$$y(t) = y(t-1) + \Delta t \left( f_1(y(t-1)) + f_2(y(t-1))\,u(t-1) \right) \quad (38)$$

Comparing this with (18), Theorem 3 indicates that the operating point $\phi = y$ captures the non-linearities. In order to keep the example simple, we observe that the system exhibits its strongest non-linear behavior with respect to temperature. As a first approximation, it is therefore reasonable to choose $\phi(t) = y_2(t) = 5 \cdot 10^{-3} \cdot T(t)$. The 2 local model validity functions are chosen as Gaussians (22) with $\phi_0 = 1.8$ and $\phi_1 = 1.9$ (corresponding to 360 K and 380 K). These values are found by examining the operating range the data span. The sizes of the operating regimes are estimated to be $\sigma_i = 0.05$ (corresponding to 10 K). The identification of the two ARMAX models is performed separately using the local cost indices (23).

Simulation 3
The same model structure as in simulation 2, but a global cost index (24) is used to simultaneously identify the parameters of the two ARMAX models.

Simulation 4
Finally, we use 4 local ARMAX models and a global cost index (24). The local model validity functions are still Gaussians, centered at $\phi_0 = 1.775$, $\phi_1 = 1.825$, $\phi_2 = 1.875$, and $\phi_3 = 1.925$, corresponding to 355, 365, 375 and 385 K, respectively. The width parameter is chosen as $\sigma_i = 0.02$, corresponding to 4 K.

Results
The results are summarized in Table 2 (all results are obtained using the identified model for prediction on the data set independent of the data used for identification). We see that all NARMAX approaches give significantly better results than the ARMAX approach. As would be expected, better results are achieved with a global performance index than with local indices, and increasing the number of local models also improves the model accuracy. One-step-ahead prediction errors are shown in Figure 7. The curves indicate that the prediction error is considerably reduced using the NARMAX models compared to the ARMAX model.
The covariance functions for the prediction errors are

$$E\!\left[ (\epsilon(t) - \bar\epsilon)(\epsilon(t+\tau) - \bar\epsilon)^T \right] = \begin{pmatrix} r_{11}(\tau) & r_{12}(\tau) \\ r_{21}(\tau) & r_{22}(\tau) \end{pmatrix} \quad (39)$$

Estimates of the autocorrelation functions are shown in Figure 8. It is well known that an unbiased model gives an autocorrelation function equal to a $\delta$-function. (This is not a sufficient condition. A more detailed model validation should also consider $E[(u(t) - \bar u)(\epsilon(t+\tau) - \bar\epsilon)^T]$ and higher order covariances (Billings & Voon 1986).) However, we may conclude from the curves that the NARMAX models strongly improve the model accuracy compared to the ARMAX model. We may not expect a perfect model, because the model structure differs from the true system: First, there is a fundamental structural difference between the state-space system and the model based on local models and interpolation. The low number of local models limits the accuracy. Second, there is a structural error introduced by sampling the continuous system. Third, the simplified choice of operating point introduces a nonsystematic error.
We have also done simulations using optimization of the $\sigma_i$-parameters in the local model validity functions after the local ARMAX parameters were identified using local cost indices. This gave substantial improvements, but the results were not as good as using a global index with prechosen (not optimal) $\sigma_i$-parameters. Including $y_1(t)$ in the operating point vector also gave some improvements, but not as much as one might have expected. Hence, we can conclude that the choice of operating point is sensible.

6 Discussion and Conclusions

As discussed in the introduction, this model representation may be useful when first principles modelling is resource demanding or does not give satisfactory results. Today, ARMAX models are no doubt the most widely used black-box type models in industry. Neural networks have over the past few years gained considerable popularity, at least academically. The most popular feed-forward type networks can be used to build black-box NARMAX models; typical examples can be found in (Chen et al. 1990a, Nguyen & Widrow 1990). Common to most types of neural networks is that the representation is very complex and difficult to understand and validate, which may explain why so few industrial applications are reported.
Simulation 1: ARMAX model.
  Prediction error variance: (9.40, -1.77; -1.77, 0.34) x 10^-4

Simulation 2: NARMAX model with 2 local ARMAX models; the local models are fitted separately using local prediction errors.
  Prediction error variance: (2.10, -0.40; -0.40, 0.09) x 10^-4

Simulation 3: NARMAX model with 2 local ARMAX models; the local models are fitted using the global prediction error.
  Prediction error variance: (0.96, -0.19; -0.19, 0.04) x 10^-4

Simulation 4: NARMAX model with 4 local ARMAX models; the global prediction error is used.
  Prediction error variance: (0.38, -0.08; -0.08, 0.02) x 10^-4

Table 2: Results of the ARMAX and NARMAX model fitting. The variance is the estimate of $E[(\epsilon(t) - \bar\epsilon)(\epsilon(t) - \bar\epsilon)^T]$. (Estimated parameter vectors $\eta$ omitted.)

Decomposing the system operation into several operating regimes and using local ARMAX
models to describe each operating regime is appealing for several reasons:

- It is possible to integrate several kinds of system knowledge, including local first
  principles models, with a black-box type model. ARMAX models are well understood
  and widely used in industry, and are hence a convenient basis for building NARMAX
  models.

- The class of systems that can be represented is large, and a linear parameterization
  of the model is sufficient.

- The concept is straightforward, and the model structure is easy to understand. This
  is important, since the model structure can then be easily validated. In addition, some
  validation may be performed by validating each local model separately.

- Describing a system by means of operating regimes is common practice in engineering.
  Integrating the models for each operating regime by interpolation has not been
  common so far, but appears to be a straightforward way to build models that are valid
  across several operating regimes; a minimal sketch of such an interpolated predictor is
  given after this list. The interpolation may improve the accuracy of the model
  compared to piecewise linear models. In addition, smoothness of the model is an
  inherent property.
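As a minimal illustration of the interpolation referred to in the list above, the following
sketch forms the one-step prediction of a NARMAX model from two local ARMAX-type
predictors weighted by normalized Gaussian model validity functions. All numeric values,
and the particular choice of information vector and operating point, are assumptions made
for the example:

import numpy as np

centers = np.array([0.5, 1.5])      # operating regime centers (assumed)
widths  = np.array([0.5, 0.5])      # validity function widths (assumed)
thetas  = np.array([[0.8, -0.1, 0.5, 0.0],    # theta_1: local parameters
                    [0.6, -0.2, 0.9, 0.3]])   # theta_2 (illustrative numbers)

def weights(phi):
    # Normalized interpolation weights w_i(phi) from Gaussian validity
    # functions rho_i(phi).
    rho = np.exp(-0.5 * ((phi - centers) / widths) ** 2)
    return rho / rho.sum()

def predict_one_step(y_hist, u_hist):
    # Interpolated prediction y_hat(t) = sum_i w_i(phi(t)) psi(t)^T theta_i,
    # with information vector psi(t) = [y(t-1), y(t-2), u(t-1), 1] and the
    # operating point chosen as phi(t) = y(t-1).
    psi = np.array([y_hist[-1], y_hist[-2], u_hist[-1], 1.0])
    w = weights(y_hist[-1])
    return w @ (thetas @ psi)       # weighted sum of local predictions

print(predict_one_step(y_hist=[0.2, 0.4], u_hist=[1.0]))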

A fundamental problem with all local modelling methods is the curse of dimensionality
(Bellman 1961, Moody & Darken 1989, Tolle et al. 1992, Friedman 1991). In our
case, we have shown that this problem may sometimes be reduced considerably, since the
operating point may be of lower dimension than the information vector.
To summarize, we have investigated how non-linear systems can be modelled using NAR-
MAX models based on local ARMAX models. The primary result is that, given a sufficient
number of local models and well-defined operating regimes, the system function can be
approximated to arbitrary accuracy. In practice, noise and the amount of data available
will limit the attainable accuracy. Standard identification algorithms can easily be applied,
since the proposed representation is linear in the parameters; the empirical choice of model
validity function may, however, complicate the problem. Several practical aspects of
building such models have been outlined and illustrated by a simulation example.
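Because the interpolated model is linear in the local parameters (for the ARX part, at
least; estimated moving-average noise terms would require the usual pseudo-linear
regression iterations), a global prediction-error fit reduces to one ordinary least-squares
problem over weighted regressors. A sketch, with assumed array shapes (Psi holds the
information vectors, W the precomputed normalized weights):

import numpy as np

def fit_global(Psi, W, y):
    # Stack the weighted regressors [w_1(t) psi(t), ..., w_M(t) psi(t)]
    # and solve for all local parameter vectors in one least-squares problem.
    N, n_par = Psi.shape
    M = W.shape[1]
    X = (W[:, :, None] * Psi[:, None, :]).reshape(N, M * n_par)
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta.reshape(M, n_par)   # row i is theta_i

# Illustrative use with synthetic weights and data:
rng = np.random.default_rng(2)
Psi = rng.normal(size=(200, 4))
W = rng.dirichlet(np.ones(2), size=200)   # stand-in normalized weights
true = np.array([[0.8, -0.1, 0.5, 0.0], [0.6, -0.2, 0.9, 0.3]])
y = ((W[:, :, None] * Psi[:, None, :]).reshape(200, 8) @ true.reshape(-1)
     + 0.01 * rng.normal(size=200))
print(fit_global(Psi, W, y))

With M local models of n parameters each, the stacked regressor matrix has M·n columns,
so the joint fit remains an ordinary linear least-squares problem.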
The approach falls somewhere between first principles modelling and pure black-box mod-
elling: available local first principles models, as well as a priori knowledge in terms of
operating regimes, can be incorporated into the model.

Acknowledgements
This work was supported by the Royal Norwegian Council for Scientific and Industrial
Research (NTNF) under doctoral scholarship grant no. ST. 10.12.221718 given to the first
author.

References
Akaike, H. (1969), 'Fitting autoregressive models for prediction', Ann. Inst. Stat. Math.
21, 243–247.
Bellman, R. E. (1961), Adaptive Control Processes, Princeton Univ. Press.
Billings, S. A. & Voon, W. S. F. (1986), 'Correlation based model validity tests for non-
linear models', Int. J. Control 44(1), 235–244.
Billings, S. A. & Voon, W. S. F. (1987), 'Piecewise linear identification of non-linear
systems', Int. J. Control 46, 215–235.
Broomhead, D. S. & Lowe, D. (1988), 'Multivariable functional interpolation and adaptive
networks', Complex Systems 2, 321–355.
Chen, S. & Billings, S. A. (1989), 'Representation of non-linear systems: the NARMAX
model', Int. J. Control 49(3), 1013–1032.
Chen, S., Billings, S. A. & Grant, P. M. (1990a), 'Non-linear system identification using
neural networks', Int. J. Control 51(6), 1191–1214.
Chen, S., Billings, S. A., Cowan, C. F. N. & Grant, P. (1990b), 'Practical identification of
NARMAX models using radial basis functions', Int. J. Control 52(6), 1327–1350.
Cyrot-Normand, D. & Mien, H. D. V. (1980), Non-linear state-affine identification meth-
ods: Application to electrical power plants, in 'Proc. IFAC Symposium on Automatic
Control in Power Generation, Distribution and Protection', pp. 449–462.
Fortescue, T. R., Kershenbaum, L. S. & Ydstie, B. E. (1981), 'Implementation of self-
tuning regulators with variable forgetting factors', Automatica 17, 831–835.
Friedman, J. H. (1991), 'Multivariate adaptive regression splines', The Annals of Statistics
19, 1–141.
Gill, P., Murray, W. & Wright, M. (1981), Practical Optimization, Academic Press, Inc.
Haber, R. (1985), Nonlinearity tests for dynamic processes, in '7th IFAC Symp. on Iden-
tification and System Parameter Estimation', pp. 409–414.
Hilhorst, R. A., van Amerongen, J. & Lohnberg, P. (1991), Intelligent adaptive control of
mode-switch processes, in 'Proc. IFAC International Symposium on Intelligent Tuning
and Adaptive Control, Singapore'.
Johansen, T. A. & Foss, B. A. (1992a), 'A NARMAX model representation for adaptive
control based on local models', Modeling, Identification and Control 13(1), 25–39.
Johansen, T. A. & Foss, B. A. (1992b), Nonlinear local model representation for adap-
tive systems, in 'Proceedings of the Singapore Int. Conf. on Intelligent Control and
Instrumentation', Vol. 2, pp. 677–682.
Johansen, T. A. & Foss, B. A. (1992c), Representing and learning unmodeled dynamics
with neural network memories, in 'Proceedings of the American Control Conference,
Chicago, Il.', Vol. 3.
Jones, R. D., Lee, Y. C., Barnes, C. W., Flake, G. W., Lee, K., Lewis, P. S. & Qian,
S. (1989), Function approximation and time series prediction with neural networks,
Technical Report 90-21, Los Alamos National Lab., New Mexico.
Jones, R. D. et al. (1991), Nonlinear adaptive networks: A little theory, a few applications,
Technical Report 91-273, Los Alamos National Lab., New Mexico.
Lane, S. H., Handelman, D. A. & Gelfand, J. J. (1992), 'Theory and development of
higher-order CMAC neural networks', IEEE Control Systems Magazine 12(2), 23–30.
Leontaritis, I. J. & Billings, S. A. (1985), 'Input-output parametric models for non-linear
systems', Int. J. Control 41(2), 303–344.
Moody, J. & Darken, C. J. (1989), 'Fast learning in networks of locally-tuned processing
units', Neural Computation 1, 281–294.
Nguyen, D. H. & Widrow, B. (1990), 'Neural networks for self-learning control systems',
IEEE Control Systems Magazine 10(3), 18–23.
Omohundro, S. M. (1987), 'Efficient algorithms with neural network behavior', Complex
Systems 1, 273–347.
Parkum, J. E., Poulsen, N. K. & Holst, J. (1990), Selective forgetting in adaptive proce-
dures, in 'Proc. 11th IFAC World Congress, Tallinn, Estonia'.
Polycarpou, M. M., Ioannou, P. A. & Ahmed-Zaid, F. (1992), Neural networks and on-line
approximators for discrete-time nonlinear system identification, Submitted to IEEE
Trans. Control Systems Technology.
Powell, M. J. D. (1987), Radial basis function approximations to polynomials, in '12th
Biennial Numerical Analysis Conference, Dundee', pp. 223–241.
Priestley, M. B. (1981), Spectral Analysis and Time Series, Academic Press.
Priestley, M. B. (1988), Non-linear and Non-stationary Time Series Analysis, Academic
Press.
Psichogios, D. C., De Veaux, R. D. & Ungar, L. H. (1992), Nonparametric system identi-
fication: A comparison of MARS and neural nets, in 'Proc. American Control Con-
ference, Chicago, Il.'.
Rippin, D. W. T. (1989), Control of batch processes, in 'Proceedings DYCORD+ 89,
August, Maastricht, The Netherlands', pp. 115–125.
Skeppstedt, A. (1988), Construction of Composite Models from Large Data-Sets, PhD thesis,
University of Linköping.
Skeppstedt, A., Ljung, L. & Millnert, M. (1992), 'Construction of composite models from
observed data', Int. J. Control 55(1), 141–152.
Söderström, T. & Stoica, P. (1988), System Identification, Prentice Hall.
Stokbro, K. (1991), Predicting chaos with weighted maps, Technical Report 91/10 S,
NORDITA, Copenhagen.
Stokbro, K. & Umberger, D. K. (1990), Forecasting with weighted maps, in 'Proc. 1990
Workshop on Nonlinear Modeling and Forecasting, Santa Fe Institute'.
Stokbro, K., Hertz, J. A. & Umberger, D. K. (1990), Exploiting neurons with localized
receptive fields to learn chaos, Preprint 28, Niels Bohr Institute and NORDITA,
Copenhagen. Submitted to Journal of Complex Systems.
Stromberg, K. R. (1981), An Introduction to Classical Real Analysis, Wadsworth, Inc.,
Belmont, Ca.
Sælid, S., Egeland, O. & Foss, B. (1985), 'A solution to the blow-up problem in adaptive
controllers', Modeling, Identification and Control 6(1), 39–56.
Takagi, T. & Sugeno, M. (1985), 'Fuzzy identification of systems and its application to
modeling and control', IEEE Trans. Systems, Man, and Cybernetics 15, 116–132.
Tolle, H., Parks, P. C., Ersu, E., Hormel, M. & Militzer, J. (1992), 'Learning control with
interpolating memories – general ideas, design lay-out, theoretical approaches and
practical applications', Int. J. Control 56, 291–317.
Tong, H. & Lim, K. S. (1980), 'Threshold autoregression, limit cycles and cyclical data',
J. Royal Stat. Soc. B 42, 245–292.
Figure 1: A geometric interpretation of the constraints on ρ̃i. (The figure plots g0, g1, g2
and ρ̃0, ρ̃1, ρ̃2 against the operating point.)
Figure 2: The situation when the width of ρ̃i goes to zero. (The figure plots g0, g1, g2,
the weights w̃0, w̃1, w̃2, and ρ̃0, ρ̃1, ρ̃2.)
Figure 3: Approximation of f(φ) = φ² + 1 using two local linear models. (The figure plots
f(φ) and the approximation f̂(φ) for φ between 0 and 2.)
Figure 4: (a) Approximation of f(φ) = φ² + 1 using a piecewise linear model with two
local linear models. (b) Approximation using two local linear models and Gaussian model
validity functions. (c) Approximation using 5 local 0th-order (constant) expansions.
(d) Approximation using a radial basis-function expansion with 5 Gaussian basis functions.
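A short sketch of the kind of approximation shown in Figures 3 and 4(b), with the local
models taken as first-order expansions of f about assumed regime centers (the centers and
widths here are illustrative, not the values used to produce the figures):

import numpy as np

phi = np.linspace(0.0, 2.0, 201)
f = phi ** 2 + 1.0

# Local linear models: first-order Taylor expansions about phi = 0.5 and 1.5.
centers = np.array([0.5, 1.5])
local = np.stack([c ** 2 + 1.0 + 2.0 * c * (phi - c) for c in centers])

rho = np.exp(-0.5 * ((phi[None, :] - centers[:, None]) / 0.4) ** 2)
w = rho / rho.sum(axis=0)              # normalized interpolation weights
f_hat = (w * local).sum(axis=0)        # interpolated approximation

print("max abs error:", np.abs(f - f_hat).max())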
Figure 5: The set Φ of operating points is covered with local models; the figure shows the
regimes Φ1, ..., Φ6 and lines where the model validity functions ρi are constant.
Figure 6: The first two curves show the output and input data (y1(t), y2(t), u1(t), u2(t)
for t = 0, ..., 1000) used for identification, while the last two curves show the corresponding
data used for validation.
Figure 7: Prediction errors on the independent validation data for the resulting models of
the 4 simulations. Here ε1 = y1 − ŷ1 and ε2 = y2 − ŷ2.
Figure 8: The autocorrelation functions of the prediction errors for the resulting models
of the 4 simulations. The panels show the normalized correlations r11(τ)/r11(0),
r12(τ)/√(r11(0) r22(0)), r21(τ)/√(r11(0) r22(0)) and r22(τ)/r22(0).
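The normalized correlation functions plotted in Figure 8 can be computed as in the sketch
below, in the spirit of the correlation-based model validity tests of Billings & Voon (1986);
variable names and data are illustrative:

import numpy as np

def xcorr(a, b, max_lag=10):
    # Sample cross-covariance r_ab(tau) for tau = 0, ..., max_lag.
    a = a - a.mean()
    b = b - b.mean()
    n = len(a)
    return np.array([np.dot(a[:n - tau], b[tau:]) / n
                     for tau in range(max_lag + 1)])

def normalized_correlations(e1, e2, max_lag=10):
    r11, r22 = xcorr(e1, e1, max_lag), xcorr(e2, e2, max_lag)
    r12 = xcorr(e1, e2, max_lag)
    scale = np.sqrt(r11[0] * r22[0])
    return r11 / r11[0], r12 / scale, r22 / r22[0]

# For a well-fitted model the prediction errors should be close to white,
# so the normalized autocorrelations should be near zero for tau >= 1:
rng = np.random.default_rng(3)
e1, e2 = rng.normal(size=1000), rng.normal(size=1000)
print(normalized_correlations(e1, e2)[0])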