
IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 6, NO. 4, JULY 1995

Universal Approximation to Nonlinear Operators by Neural Networks with Arbitrary Activation Functions and Its Application to Dynamical Systems

Tianping Chen and Hong Chen

Abstract—The purpose of this paper is to investigate neural network capability systematically. The main results are: 1) every Tauber-Wiener function is qualified as an activation function in the hidden layer of a three-layered neural network; 2) for a continuous function in S'(R^1) to be a Tauber-Wiener function, the necessary and sufficient condition is that it is not a polynomial; 3) the capability of approximating nonlinear functionals defined on some compact set of a Banach space and nonlinear operators has been shown, which implies 4) the possibility, by neural computation, of approximating the output as a whole (not at a fixed point) of a dynamical system, thus identifying the system.

Manuscript received December 7, 1992; revised July 18, 1993 and July 10, 1994. This work was supported in part by the "Climbing Programme—National Key Project for Fundamental Research in China," Grant NSC 92097. T. Chen is with the Department of Mathematics, Fudan University, Shanghai, P. R. China. H. Chen is with Sun Microsystems, Inc., Mountain View, CA 95051 USA. IEEE Log Number 9409156.

I. INTRODUCTION

THERE have been many papers related to the approximation of a continuous function of several variables. In 1987, Wieland and Leighton [1] dealt with the capability of networks consisting of one or two hidden layers. Miyake and Irie [2] obtained an integral representation formula with an integral kernel fixed beforehand. This representation formula is a kind of integral which could be realized by a three-layered neural network. In 1989, several papers related to this topic appeared. They all claimed that a three-layered neural network with sigmoid units in the hidden layer can approximate continuous or other kinds of functions defined on a compact set in R^n. They used different methods. Carroll and Dickinson [4] used the inverse Radon transform. Cybenko [3] used the Hahn-Banach theorem and the Riesz representation theorem. Funahashi [5] approximated Irie and Miyake's integral representation by a finite sum, using a kernel which can be expressed as a difference of two sigmoidal functions. Hornik et al. [6] applied the Stone-Weierstrass theorem using trigonometric functions. In all these papers, however, sigmoidal functions must be assumed to be continuous or monotone. Recently [9], [19], we pointed out that the boundedness of the sigmoidal function plays an essential role for its being an activation function in the hidden layer.

In addition to sigmoidal functions, many other functions can be used as activation functions in the hidden layer. For example, Hornik [7] proved that any bounded nonconstant continuous function is qualified to be an activation function. Mhaskar and Micchelli [12] showed that under some restriction on the amplitude of a continuous function near infinity, any nonpolynomial function is qualified to be an activation function.

It is clear that all the aforementioned works are concerned with approximation of a continuous function defined on a compact set in R^n (a space of finite dimension). In engineering problems such as computing the output of dynamic systems or designing neural system identifiers, however, we often encounter the problem of approximating nonlinear functionals defined on some function space, or even nonlinear operators from one function space (a space of infinite dimension) to another function space (another space of infinite dimension). In [11], Sandberg gave an interesting theorem on approximating nonlinear functionals by superposition and composition of several linear functionals and a continuous function of one variable.

Yet, two problems remain open:

1) Can we give those linear functionals explicitly?
2) Can we approximate nonlinear operators rather than functionals?

Problem 1 is essential in application, since otherwise we are not able to construct real networks. Problem 2 is important in computing dynamic systems, for a dynamic system is in fact an operator. In [13], we discussed in detail the problem of approximating nonlinear functionals defined on some compact set in C[a,b] or L^p[a,b] and obtained some explicit results. The problem of a neural network's capability in approximating nonlinear operators, with its related application in computing the output as a whole of a dynamic system, however, still remains open. Moreover, a unified and systematic treatment of neural network approximation of continuous functions, functionals, and operators is much needed but nevertheless also remains an open problem.

Specifically, it is quite natural to raise the following issues:

1) What is the characteristic property for a continuous function to qualify as an activation function in the hidden layer of a neural network?
2) Can we give a neural network model to approximate nonlinear functionals defined on some compact set in C(K), where K is some compact set in some Banach space?
3) Can we give a neural network model which could be used to approximate the output of some dynamic system as a whole (not merely at a special point, cf. [10], [13]), thus to identify the system?


In this paper, we systematically give strong results regarding these issues.

The paper is organized as follows. In Section II, we review some definitions and notations. In Section III, we show that the necessary and sufficient condition for a continuous function in S'(R^1) (the tempered distributions on R^1) to be a Tauber-Wiener function (for definitions, see Section II) is that it is not a polynomial, and that any Tauber-Wiener function can be used as an activation function, i.e., any nonpolynomial continuous function in S'(R^1) is an activation function. What is more interesting is that we show the approximation is equiuniform on any compact set in C(K), which is crucial in discussing the approximation of continuous operators by neural networks. In Section IV, we show the capability of neural networks to approximate continuous functionals defined on some compact set in C(K), where K is a compact set in some Banach space, and through which we establish the capability of neural networks to approximate continuous operators from C(K_1) to C(K_2). The main results in Section IV have potential applications to computing outputs of dynamic systems and identifying the systems. This is an important issue in system identification [17], [18], and we will discuss it in more detail in Section V.

II. NOTATIONS AND DEFINITIONS

Definition 1: A function σ: R^1 → R^1 is called a sigmoidal function if it satisfies

    lim_{x→−∞} σ(x) = 0,    lim_{x→+∞} σ(x) = 1.

Definition 2: If a function g: R → R (continuous or discontinuous) satisfies that all the linear combinations Σ_{i=1}^N c_i g(λ_i x + θ_i), λ_i ∈ R, θ_i ∈ R, c_i ∈ R, i = 1, 2, ..., N, are dense in every C[a,b], then g is called a Tauber-Wiener (TW) function.

Definition 3: Suppose that X is a Banach space. V ⊂ X is called a compact set in X if for every sequence {x_n}_{n=1}^∞ with all x_n ∈ V, there is a subsequence {x_{n_k}} which converges to some element x ∈ V.

It is well known that if V ⊆ X is a compact set in X, then for any δ > 0, there is a δ-net N(δ) = {x_1, ..., x_{n(δ)}}, with all x_i ∈ V, i = 1, ..., n(δ), i.e., for every x ∈ V there is some x_i ∈ N(δ) such that ‖x_i − x‖_X < δ.

In the sequel, we will often use the following notations:

X — some Banach space with norm ‖·‖_X
R^n — Euclidean space of dimension n
K — some compact set in a Banach space
C(K) — Banach space of all continuous functions defined on K, with norm ‖f‖_{C(K)} = max_{x∈K} |f(x)|
(TW) — all the Tauber-Wiener functions
S(R^n) — Schwartz functions in tempered distribution theory, i.e., rapidly decreasing and infinitely differentiable functions
S'(R^n) — tempered distributions, i.e., linear continuous functionals defined on S(R^n)
C^∞(R^n) — infinitely differentiable functions
C_0^∞(R^n) — infinitely differentiable functions with compact support in R^n
C_p[−1,1]^n — all periodic functions with period two with respect to every variable x_i, i = 1, ..., n

III. CHARACTERISTICS OF ACTIVATION FUNCTIONS

In this section, we prove three theorems.

Theorem 1: Suppose that g is a continuous function and g ∈ S'(R^1); then g ∈ (TW) if and only if g is not a polynomial.

Theorem 2: If σ is a bounded sigmoidal function, then σ ∈ (TW).

Theorem 3: Suppose that K is a compact set in R^n, U is a compact set in C(K), and g ∈ (TW); then for any ε > 0, there exist a positive integer N, real numbers θ_i, and vectors ω_i ∈ R^n, i = 1, ..., N, which are independent of f ∈ C(K), and constants c_i(f), i = 1, ..., N, depending on f, such that

    |f(x) − Σ_{i=1}^N c_i(f) g(ω_i·x + θ_i)| < ε    (1)

holds for all x ∈ K and f ∈ U. Moreover, each c_i(f) is a linear continuous functional defined on U.

Remark 1: Theorem 3 shows that for a function (continuous or discontinuous) to be qualified as an activation function, a sufficient condition is that it belongs to the (TW) class. Therefore, to prove that a neural network is capable of approximating any continuous function of n variables, all we need to do is to deal with the case n = 1; thus we have reduced the complexity of the problem in terms of its dimensionality. Moreover, by examining the approximated function f(x_1, ..., x_n) = f(x_1, 0, ..., 0) = f*(x_1), where f*(x_1) is a continuous function of one variable, it is straightforward to see that the condition is also a necessary one.

Remark 2: The equiuniform convergence property in Theorem 3 will play a crucial role in the approximation of nonlinear operators by neural networks.

Remark 3: When a sigmoidal function is used as an activation in a neural network, Theorem 2 shows that the only condition imposed on it is boundedness. In contrast, in almost all other papers [1]-[8], sigmoidal functions must be assumed to be either continuous or monotone.

Remark 4: In [12], some result similar to Theorem 1 was obtained under more restrictions imposed on g, i.e., there are a positive integer N and a constant C_N such that |(1 + |x|)^{-N} g(x)| ≤ C_N for all x ∈ R^1. This restriction is essential for [12], for the proof in [12] depends heavily on a variation of the Paley-Wiener Theorem. In Theorem 1, however, we only assume that g ∈ C(R^1) ∩ S'(R^1), which is weaker than the assumptions used in [12].
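To make Theorem 3 concrete, the following is a minimal numerical sketch (not from the paper; the grid, hidden-layer size, and test functions are illustrative assumptions). It fixes N inner weights (ω_i, θ_i) once, independently of f, and recovers the coefficients c_i(f) by linear least squares, so each c_i(f) is visibly a linear functional of the samples of f, as the theorem asserts. We use g = tanh, which is continuous and nonpolynomial, hence in (TW) by Theorem 1.

```python
import numpy as np

# Sketch of Theorem 3 on K = [-1, 1]: fixed (omega_i, theta_i), coefficients
# c_i(f) obtained by least squares, hence linear in f. All sizes and
# test functions are illustrative assumptions.
rng = np.random.default_rng(0)
N = 60                                   # number of hidden units
omega = rng.uniform(-8.0, 8.0, size=N)   # inner weights, independent of f
theta = rng.uniform(-8.0, 8.0, size=N)   # inner biases, independent of f
x = np.linspace(-1.0, 1.0, 201)          # sample grid on K

G = np.tanh(np.outer(x, omega) + theta)  # design matrix g(omega_i * x + theta_i)

def coeffs(f_vals):
    # Least-squares solution: a linear map f -> c(f), i.e., each c_i(f)
    # is a continuous linear functional of f (cf. Theorem 3).
    return np.linalg.lstsq(G, f_vals, rcond=None)[0]

for f in (np.abs, np.sin, lambda t: np.exp(t) * np.cos(3 * t)):
    c = coeffs(f(x))
    err = np.max(np.abs(G @ c - f(x)))
    print(f"sup-norm error with shared inner weights: {err:.2e}")
```

Because the inner weights are shared across all three test functions, only the linear readout c(f) changes from function to function, matching the equiuniform statement of Theorem 3.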
Proof of Theorem 1: We will prove by contradiction. If the set of all the linear combinations Σ_{i=1}^N c_i g(λ_i x + θ_i) is not dense in C[a,b], then the Hahn-Banach extension theorem and the Riesz representation of linear continuous functionals show that there is a signed Borel measure dμ with supp(dμ) ⊂ [a,b] such that

    ∫_a^b g(λx + θ) dμ(x) = 0    (3)

for all λ ≠ 0 and θ ∈ R^1. Take any ω ∈ S(R^1); then

    ∫_{R^1} [∫_a^b g(λx + θ) dμ(x)] ω(θ) dθ = 0.    (4)

Let λx + θ = u and change the order of integration; we have

    ∫_{R^1} g(u) [∫_a^b ω(u − λx) dμ(x)] du = 0

which is equivalent to

    ĝ(ω̂(·) d̂μ(λ·)) = 0    (5)

where ĝ represents the Fourier transform of g in the sense of tempered distributions, and (5) is also understood in the sense of distributions [14]. In order that the left-hand side of (5) makes sense, we have to show that ω̂(t) d̂μ(λt) ∈ S(R^1). Since supp(dμ) ⊂ [a,b], it is straightforward to show that d̂μ(t) ∈ C^∞(R^1), and for each k = 1, 2, ..., there is a constant c_k such that

    |d̂μ^{(k)}(t)| ≤ c_k for all t ∈ R^1.    (6)

Consequently, ω̂(t) d̂μ(λt) ∈ S(R^1).

Since dμ ≠ 0 and d̂μ(t) ∈ C^∞(R^1), there exists some t_0 ≠ 0 with some neighborhood (t_0 − δ, t_0 + δ) such that d̂μ(t) ≠ 0 for all t ∈ (t_0 − δ, t_0 + δ). Now, if t_1 ≠ 0, let λ = t_0/t_1; then d̂μ(λt) ≠ 0 for all t ∈ (t_1 − δ/λ, t_1 + δ/λ). Take any ω̂ ∈ C_0^∞(t_1 − δ/2λ, t_1 + δ/2λ); then ω̂(t)/d̂μ(λt) ∈ S(R^1), and by (5)

    ĝ(ω̂(·)) = 0.    (7)

The previous argument shows that for any fixed point t* ≠ 0, there is a neighborhood [t* − η, t* + η] such that ĝ(ω̂(·)) = 0 for all ω̂ with compact support in [t* − η, t* + η], i.e., supp(ĝ) ⊂ {0}. By distribution theory, ĝ is some linear combination of the Dirac δ-function and its derivatives, which is equivalent to saying that g is a polynomial. Theorem 1 is proved.
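The dichotomy in Theorem 1 can be observed numerically. The sketch below (illustrative, not from the paper) fits the same target with hidden units g(λ_i x + θ_i) for two choices of g: tanh (nonpolynomial) and a cubic. Any linear combination of shifted and scaled cubics is again a polynomial of degree at most three, so the second span cannot be dense in C[−1,1], and the error stalls at the best cubic approximation of |x|.

```python
import numpy as np

# Theorem 1, empirically: spans of g(lambda_i * x + theta_i) for a
# nonpolynomial g are dense; for a polynomial g they stay polynomials
# of the same degree. Sizes and targets are illustrative assumptions.
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 401)
target = np.abs(x)                          # |x| is not a cubic polynomial
lam = rng.uniform(-6.0, 6.0, size=200)
the = rng.uniform(-6.0, 6.0, size=200)

for name, g in (("tanh (non-polynomial)", np.tanh),
                ("cubic (polynomial)", lambda u: u ** 3)):
    A = g(np.outer(x, lam) + the)           # columns g(lambda_i x + theta_i)
    c = np.linalg.lstsq(A, target, rcond=None)[0]
    print(f"{name}: sup error = {np.max(np.abs(A @ c - target)):.3e}")
```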
The proof of Theorem 2 can be found in [9] and [19]. Here we only give a brief proof for the completeness of this paper.

Proof of Theorem 2: Without loss of generality, we can assume that [a,b] = [−1,1]. Since f is continuous on [−1,1], for any ε > 0 there is an integer M > 0 such that |f(x') − f(x'')| < ε/4, provided that x', x'' ∈ [−1,1] and |x' − x''| < 1/M.

Divide [−1,1] into 2M equal segments, each with length 1/M. Let

    −1 = x_0 < x_1 < ... < x_M = 0 < x_{M+1} < ... < x_{2M} = 1    (8)

and t_i = (x_i + x_{i+1})/2, t_{−1} = −1 − 1/(2M). From the assumption, there exists W > 0 such that if u > W, then |σ(u) − 1| < 1/M²; if u < −W, then |σ(u)| < 1/M². Let K > 0 be such that K·(1/2M) > W. Construct

    g(x) = f(−1) σ(K(x − t_{−1})) + Σ_{i=1}^{2M} [f(x_i) − f(x_{i−1})] σ(K(x − t_{i−1}));    (9)

then we can prove

    |g(x) − f(x)| < ε for all x ∈ [−1,1].    (10)

Theorem 2 is thus proved.
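The construction (9) is easy to visualize in code. The sketch below (illustrative; the steepness constant, grid size, and test function are assumptions) builds the staircase approximant from a bounded sigmoid and reports the sup-norm error, which shrinks as M grows.

```python
import numpy as np

def sigmoid(u):
    # A bounded sigmoidal function in the sense of Definition 1
    # (clipped to avoid overflow warnings at steep slopes).
    return 1.0 / (1.0 + np.exp(-np.clip(u, -60.0, 60.0)))

def staircase(f, M, x):
    # The approximant g of Eq. (9): a base level f(-1) plus 2M sigmoidal
    # steps of height f(x_i) - f(x_{i-1}) switched on near t_{i-1}.
    xi = np.linspace(-1.0, 1.0, 2 * M + 1)          # x_0 < ... < x_{2M}
    t = np.concatenate(([-1.0 - 1.0 / (2 * M)],     # t_{-1}
                        (xi[:-1] + xi[1:]) / 2.0))  # t_0, ..., t_{2M-1}
    K = 40.0 * M * M                                # steep enough: K/(2M) > W
    g = f(-1.0) * sigmoid(K * (x - t[0]))
    for i in range(1, 2 * M + 1):
        g += (f(xi[i]) - f(xi[i - 1])) * sigmoid(K * (x - t[i]))
    return g

x = np.linspace(-1.0, 1.0, 2001)
f = lambda s: np.sin(3.0 * s) + 0.5 * np.abs(s)
for M in (5, 20, 80):
    print(M, np.max(np.abs(staircase(f, M, x) - f(x))))
```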

Prior to proving Theorem 3, we need to establish the following lemmas.

Lemma 1: Suppose that K is a compact set in R^n and f ∈ C(K); then there is a continuous function E(f) ∈ C(R^n) such that 1) f(x) = E(f)(x) for all x ∈ K; 2) sup_{x∈R^n} |E(f)(x)| ≤ sup_{x∈K} |f(x)|; and 3) there is a constant c such that

    ω(E(f), δ) ≤ c ω(f, δ)    (11)

where ω(f, δ) denotes the modulus of continuity of f.

Proof: The proof of Lemma 1 can be found in [15, p. 175].

Lemma 2 [16]: Suppose K is a compact set in a Banach space X; then V is a compact set in C(K) if and only if:
1) V is a closed set in C(K);
2) there is a constant M such that ‖f‖_{C(K)} ≤ M for all f ∈ V;
3) V is equicontinuous, i.e., for any ε > 0, there is a δ > 0 such that |f(x') − f(x'')| < ε for all f ∈ V, provided that x', x'' ∈ K and ‖x' − x''‖_X < δ.

Lemma 3: Suppose that K is a compact set in I^n = [0,1]^n and V is a compact set in C(K); then V can be extended to a compact set in C_p[−1,1]^n.

Proof: By Lemmas 1 and 2, V can be extended to be a compact set V_1 in C[0,1]^n. Now, for every f ∈ V_1, define an even extension of f as follows:

    f*(x_1, ..., x_k, ..., x_n) = f(x_1, ..., −x_k, ..., x_n);    (12)

then U = {f*: f ∈ V_1} is the required compact set in C_p[−1,1]^n.

Lemma 4: Suppose that U is a compact set in C_p[−1,1]^n and

    B_R^α(f; x) = Σ_{|m|≤R} (1 − |m|²/R²)^α c_m(f) e^{iπ m·x}    (13)

is the Bochner-Riesz means of the Fourier series of f, where m = (m_1, ..., m_n), |m|² = Σ_{i=1}^n |m_i|², and c_m(f) are the Fourier coefficients of f; then for any ε > 0, there is R > 0 such that

    |B_R^α(f; x) − f(x)| < ε    (14)

for every f ∈ U and x ∈ [−1,1]^n, provided that α > (n − 1)/2.

The proof of Lemma 4 can be found in [14].
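For n = 1 the condition is α > 0, and Lemma 4 is easy to test numerically. The following sketch (illustrative assumptions: the test function, the values of R, and trapezoid-rule quadrature for the Fourier coefficients) computes the Bochner-Riesz means (13) for a period-2 function and prints the uniform error.

```python
import numpy as np

# Bochner-Riesz means of Eq. (13) in one dimension (period 2, alpha > 0).
x = np.linspace(-1.0, 1.0, 1001)
f = np.abs(x)                        # continuous, even; period-2 extension
alpha = 1.0
dx = x[1] - x[0]

def fourier_coeff(m):
    # c_m(f) = (1/2) * integral_{-1}^{1} f(x) exp(-i pi m x) dx
    # (trapezoid rule on the grid above).
    integrand = f * np.exp(-1j * np.pi * m * x)
    return 0.5 * np.sum(0.5 * (integrand[:-1] + integrand[1:])) * dx

for R in (4, 16, 64):
    ms = np.arange(-R, R + 1)
    weights = (1.0 - (ms / R) ** 2) ** alpha         # Bochner-Riesz factors
    B = sum(w * fourier_coeff(m) * np.exp(1j * np.pi * m * x)
            for m, w in zip(ms, weights))
    print(R, np.max(np.abs(B.real - f)))
```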
Proof of Theorem 3: Without loss of generality, we can assume that K ⊂ [0,1]^n. By Lemma 3, we can assume that K = [−1,1]^n and U ⊆ C_p[−1,1]^n. By Lemma 4, for any ε > 0, there exists R > 0 such that for any x = (x_1, ..., x_n) ∈ [−1,1]^n and f ∈ U, there holds

    |B_R^α(f; x) − f(x)| < ε/2.    (15)

By the definition of the Fourier coefficients and the evenness of f*(x), we can rewrite (15) as

    |f(x) − Σ_{|m|≤R} d_{m_1...m_n} cos(π(m_1 x_1 + ... + m_n x_n))| < ε/2    (16)

where the d_{m_1...m_n} are real numbers. It is obvious that for every x ∈ [−1,1]^n, there is a unique u ∈ [−√n πR, √n πR] such that

    u = π m·x = π(m_1 x_1 + ... + m_n x_n)    (17)

where m = (m_1, ..., m_n). Since cos(u) is a continuous function on [−√n πR, √n πR] and g ∈ (TW), we can find an integer M and real numbers s_j, q_j, and ξ_j, j = 1, ..., M, such that

    |cos(u) − Σ_{j=1}^M s_j g(q_j u + ξ_j)| < ε / (2 Σ_{|m|≤R} |d_{m_1...m_n}|)    (18)

for all u ∈ [−√n πR, √n πR]. Substituting (17) and (18) into (16), we conclude that

    |f*(x) − Σ_{i=1}^N c_i(f*) g(ω_i·x + θ_i)| < ε    (19)

holds for all x ∈ [−1,1]^n and f ∈ U; restricting x to K gives

    |f(x) − Σ_{i=1}^N c_i(f) g(ω_i·x + θ_i)| < ε.    (20)

Moreover, each c_i(f), being a finite linear combination of the Fourier coefficients of f*, is surely a continuous linear functional defined on U. The proof of Theorem 3 is completed.

IV. APPROXIMATION TO NONLINEAR CONTINUOUS FUNCTIONALS AND MAPS

In this section, we will discuss the problem of approximating nonlinear continuous functionals and operators by neural network computation. The main results are as follows.

Theorem 4: Suppose that g ∈ (TW), X is a Banach space, K ⊂ X is a compact set, V is a compact set in C(K), and f is a continuous functional defined on V; then for any ε > 0, there are a positive integer N, m points x_1, ..., x_m ∈ K, and real constants c_i, θ_i, ξ_{ij}, i = 1, ..., N, j = 1, ..., m, such that

    |f(u) − Σ_{i=1}^N c_i g(Σ_{j=1}^m ξ_{ij} u(x_j) + θ_i)| < ε    (21)

holds for all u ∈ V.

Theorem 5: Suppose that g ∈ (TW), X is a Banach space, K_1 ⊂ X and K_2 ⊂ R^n are two compact sets in X and R^n, respectively, V is a compact set in C(K_1), and G is a nonlinear continuous operator which maps V into C(K_2); then for any ε > 0, there are positive integers M, N, m, constants c_i^k, ξ_{ij}^k, θ_i^k, ζ_k ∈ R, and points ω_k ∈ R^n, x_j ∈ K_1, i = 1, ..., M, k = 1, ..., N, j = 1, ..., m, such that

    |G(u)(y) − Σ_{k=1}^N Σ_{i=1}^M c_i^k g(Σ_{j=1}^m ξ_{ij}^k u(x_j) + θ_i^k) g(ω_k·y + ζ_k)| < ε    (22)

holds for all u ∈ V and y ∈ K_2.

The following two lemmas are well known and will be used in the proof of Theorems 4 and 5.

Lemma 5: Let X be a Banach space and K ⊂ X; then K is a compact set if and only if the following two conditions are satisfied simultaneously: 1) K is a closed set in X; and 2) for any δ > 0, there is a δ-net N(δ) = {x_1, ..., x_{n(δ)}}, i.e., for any x ∈ K, there exists an x_k ∈ N(δ) such that ‖x − x_k‖_X < δ.

Lemma 6: If V ⊂ C(K) is a compact set in C(K), then it is uniformly bounded and equicontinuous, i.e., 1) there is A > 0 such that ‖u‖_{C(K)} ≤ A for all u ∈ V; and 2) for any ε > 0, there is δ > 0 such that |u(x') − u(x'')| < ε for all u ∈ V, provided that ‖x' − x''‖_X < δ.
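Before turning to the proofs, here is a concrete (and purely illustrative) reading of Theorem 4: a continuous functional f(u) can be approximated from finitely many samples u(x_1), ..., u(x_m) fed through one hidden layer. The sketch below approximates the functional f(u) = ∫_0^1 u(x)² dx on a compact family of inputs, again solving for the outer coefficients c_i by least squares over random inner weights ξ_{ij}, θ_i; every name and size here is an assumption, not the paper's construction.

```python
import numpy as np

# Theorem 4, numerically: approximate f(u) = int_0^1 u(x)^2 dx from m point
# samples u(x_j) through one hidden tanh layer, with the outer coefficients
# c_i obtained by least squares. Purely illustrative sketch.
rng = np.random.default_rng(2)
m, N = 20, 200
xs = np.linspace(0.0, 1.0, m)                  # sampling points x_j
xf = np.linspace(0.0, 1.0, 2001)               # fine grid for "true" integral

def family(a, x):
    # Compact input family V: u = a0 + a1 sin(pi x) + a2 cos(pi x).
    return a[:, :1] + a[:, 1:2] * np.sin(np.pi * x) + a[:, 2:3] * np.cos(np.pi * x)

def f_true(a):
    # Trapezoid rule for the target functional on the fine grid.
    u2 = family(a, xf) ** 2
    return ((u2[:, :-1] + u2[:, 1:]) / 2.0).sum(axis=1) * (xf[1] - xf[0])

xi = rng.normal(size=(N, m)) / np.sqrt(m)      # inner weights xi_ij (fixed)
th = rng.uniform(-1.0, 1.0, size=N)            # inner biases theta_i (fixed)

def features(a):
    return np.tanh(family(a, xs) @ xi.T + th)  # hidden-layer outputs

a_train = rng.uniform(-1.0, 1.0, size=(500, 3))
c = np.linalg.lstsq(features(a_train), f_true(a_train), rcond=None)[0]

a_test = rng.uniform(-1.0, 1.0, size=(200, 3))
err = np.max(np.abs(features(a_test) @ c - f_true(a_test)))
print(f"max error of the functional network on test inputs: {err:.2e}")
```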
Now pick a sequence ε_1 > ε_2 > ... > ε_k → 0; then we can find another sequence δ_1 > δ_2 > ... > δ_k → 0 such that |f(u) − f(v)| < ε_k, provided that u, v ∈ V and ‖u − v‖_{C(K)} < 2δ_k, for f is a continuous functional defined on the compact set V.

By Lemma 6, we can also find η_1 > η_2 > ... > η_k → 0 such that |u(x') − u(x'')| < δ_k for all u ∈ V, whenever x', x'' ∈ K and ‖x' − x''‖_X < η_k.

By induction and rearrangement, we can find a sequence {x_i}_{i=1}^∞ with each x_i ∈ K and a sequence of positive integers n(η_1) < n(η_2) < ... < n(η_k) → ∞, such that the first n(η_k) elements N(η_k) = {x_1, ..., x_{n(η_k)}} constitute an η_k-net in K.

For each η_k-net, define continuous functions T_{η_k,j}(x), j = 1, ..., n(η_k), forming a partition of unity subordinate to the net, i.e.,

    0 ≤ T_{η_k,j}(x) ≤ 1    (25)
    Σ_{j=1}^{n(η_k)} T_{η_k,j}(x) = 1    (26)
    T_{η_k,j}(x) = 0 if ‖x − x_j‖_X > η_k.    (27)

For each u ∈ V, define a function

    u_{η_k}(x) = Σ_{j=1}^{n(η_k)} u(x_j) T_{η_k,j}(x)    (28)

and let V_{η_k} = {u_{η_k}: u ∈ V} and V* = V ∪ (∪_{k=1}^∞ V_{η_k}). We then have the following result.

Lemma 7:
1) For each fixed k, V_{η_k} is a compact set in a subspace of dimension n(η_k) in C(K).
2) For every u ∈ V, there holds

    ‖u − u_{η_k}‖_{C(K)} ≤ δ_k.    (29)

3) V* is a compact set in C(K).

Proof: We will prove the three propositions individually as follows.

1) For a fixed k, let u_{η_k}^{(i)}, i = 1, 2, ..., be a sequence in V_{η_k} and u^{(i)} be a sequence in V such that

    u_{η_k}^{(i)}(x) = Σ_{j=1}^{n(η_k)} u^{(i)}(x_j) T_{η_k,j}(x).    (30)

Since V is a compact set in C(K), there is a subsequence u^{(i_l)}(x) which converges to some u ∈ V; then it is obvious that u_{η_k}^{(i_l)}(x) converges to u_{η_k}(x) ∈ V_{η_k}, i.e., V_{η_k} is a compact subset of C(K).

2) By the definition and the partition-of-unity property, we have

    u(x) − u_{η_k}(x) = Σ_{j=1}^{n(η_k)} [u(x) − u(x_j)] T_{η_k,j}(x) = Σ_{‖x−x_j‖_X ≤ η_k} [u(x) − u(x_j)] T_{η_k,j}(x).    (31)

Consequently,

    |u(x) − u_{η_k}(x)| ≤ Σ_{j=1}^{n(η_k)} δ_k T_{η_k,j}(x) = δ_k for all u ∈ V.    (32)

3) Suppose {u^i}_{i=1}^∞ is a sequence in V*. If there is a subsequence {u^{i_l}}_{l=1}^∞ of {u^i} with all u^{i_l} ∈ V, l = 1, 2, ..., then by the fact that V is compact, there is a subsequence of {u^{i_l}} which converges to some u ∈ V. Otherwise, to each u^i there correspond a positive integer k(i) and a v^i ∈ V such that u^i = v^i_{η_{k(i)}}. There are two possibilities: i) We can find infinitely many i_1 < i_2 < ... and a fixed k_0 such that η_{k(i_1)} = η_{k(i_2)} = ... = η_{k_0}, i.e., u^{i_l} ∈ V_{η_{k_0}} for all i_l. By proposition 1) of this lemma, V_{η_{k_0}} is a compact set, so there is a subsequence of {u^{i_l}} which converges to some u ∈ V_{η_{k_0}}, i.e., there is a subsequence of {u^i} converging to u ∈ V_{η_{k_0}}. ii) There are sequences i_1 < i_2 < ... → ∞ and k(i_1) < k(i_2) < ... → ∞ such that u^{i_l} ∈ V_{η_{k(i_l)}}. Let v^{i_l} ∈ V be such that

    u^{i_l} = v^{i_l}_{η_{k(i_l)}}.    (33)

Since v^{i_l} ∈ V and V is compact, we see that there is a subsequence of {v^{i_l}}_{l=1}^∞ which converges to some v ∈ V. By proposition 2) of this lemma, the corresponding subsequence of {u^{i_l}} also converges to v. Thus the compactness of V* is proved.

Proof of Theorem 4: By the Tietze Extension Theorem, we can define a continuous functional f* on V* such that

    f*(u) = f(u) if u ∈ V.    (34)

Because f* is a continuous functional defined on the compact set V*, for any ε > 0 we can find a δ > 0 such that |f*(u) − f*(v)| < ε/2, provided that u, v ∈ V* and ‖u − v‖_{C(K)} < δ.

Let k be fixed such that δ_k < δ; then by (29), for every u ∈ V,

    ‖u − u_{η_k}‖_{C(K)} ≤ δ_k < δ    (35)

which implies

    |f*(u) − f*(u_{η_k})| < ε/2    (36)

for all u ∈ V.

By proposition 1) of Lemma 7, f*(u_{η_k}) is a continuous function of (u(x_1), ..., u(x_{n(η_k)})) defined on a compact set in R^{n(η_k)}. By Theorem 3, we can find N, c_i, ξ_{ij}, θ_i, i = 1, ..., N, j = 1, ..., n(η_k), such that

    |f*(u_{η_k}) − Σ_{i=1}^N c_i g(Σ_{j=1}^{n(η_k)} ξ_{ij} u(x_j) + θ_i)| < ε/2.    (37)

Combining this with (36), we conclude that

    |f(u) − Σ_{i=1}^N c_i g(Σ_{j=1}^m ξ_{ij} u(x_j) + θ_i)| < ε    (38)

where m = n(η_k). Thus, Theorem 4 is proved.
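The partition of unity (25)-(27) and the interpolant (28) admit a simple concrete realization. The hat-function choice below is one convenient construction satisfying the three properties (an assumption of this sketch, since the paper's explicit formula is not reproduced here); the printed error illustrates inequality (32): ‖u − u_{η_k}‖_{C(K)} ≤ δ_k whenever u varies by at most δ_k on balls of radius η_k.

```python
import numpy as np

# A concrete partition of unity on K = [0, 1] subordinate to an eta-net,
# and the interpolant u_eta of Eq. (28). Hat functions are one possible
# choice satisfying (25)-(27); this is an illustrative construction.
eta = 0.05
nodes = np.arange(0.0, 1.0 + 1e-12, eta)        # an eta-net x_1, ..., x_n in K
x = np.linspace(0.0, 1.0, 4001)

phi = np.maximum(0.0, 1.0 - np.abs(x[None, :] - nodes[:, None]) / eta)
T = phi / phi.sum(axis=0)                       # T_j >= 0, sum_j T_j = 1,
                                                # T_j(x) = 0 if |x - x_j| > eta

u = lambda s: np.sin(2.0 * np.pi * s) + 0.3 * np.abs(s - 0.4)
u_eta = (u(nodes)[:, None] * T).sum(axis=0)     # Eq. (28)

print("sup |u - u_eta| =", np.max(np.abs(u(x) - u_eta)))
```

With hat functions on a uniform net, u_η is just piecewise-linear interpolation, which makes (32) easy to see: at each x only nodes within η_k contribute, and each contributes an error of at most δ_k.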
Proof of Theorem 5: Since G is a continuous operator and V is a compact set in C(K_1), the range G(V) is a compact set in C(K_2). Applying Theorem 3 to G(V), for any ε > 0 there are a positive integer N, vectors ω_k ∈ R^n, and constants ζ_k, k = 1, ..., N, such that

    |G(u)(y) − Σ_{k=1}^N c_k(G(u)) g(ω_k·y + ζ_k)| < ε/2    (39)

holds for all u ∈ V and y ∈ K_2, where each c_k(·) is a continuous linear functional defined on G(V); since G is continuous, u ↦ c_k(G(u)) is a continuous functional defined on V.

Let B = max_{1≤k≤N} sup_{y∈K_2} |g(ω_k·y + ζ_k)|, which is finite since K_2 is compact. For each k, applying Theorem 4 to the functional c_k(G(·)), we can find a positive integer M, points x_1, ..., x_m ∈ K_1, and constants c_i^k, ξ_{ij}^k, θ_i^k such that

    |c_k(G(u)) − Σ_{i=1}^M c_i^k g(Σ_{j=1}^m ξ_{ij}^k u(x_j) + θ_i^k)| < ε/(2NB)    (40)

holds for all k = 1, ..., N and u ∈ V. Substituting (40) into (39), we obtain that

    |G(u)(y) − Σ_{k=1}^N Σ_{i=1}^M c_i^k g(Σ_{j=1}^m ξ_{ij}^k u(x_j) + θ_i^k) g(ω_k·y + ζ_k)| < ε    (42)

holds for all u ∈ V and y ∈ K_2. This completes the proof of Theorem 5.

A neural network architecture based on Theorem 5 is shown in Fig. 1.

Fig. 1. A neural network architecture of approximation to nonlinear operator G(u)(y) based on Theorem 5.

V. APPLICATION TO NONLINEAR DYNAMICAL SYSTEMS

In [13], we discussed the problem of approximating the output of a dynamical system at a fixed point (or time) by neural networks. As a direct application of Theorem 5, we can use neural networks to approximate the output as a whole of a nonlinear dynamical system. Indeed, built upon the several keystone theorems proved earlier in Section IV, our result on this topic follows naturally.

The significance of these results lies in that we show the potential application of neural network computation in system identification. A general procedure that we suggest in this paper is as follows.

Let a system be V = KU, where U is the input, V is the output, and K is the operator to be identified.

Suppose that, based on some prior knowledge or experiments, we know several input-output relationships V_1 = KU_1, ..., V_n = KU_n. Generally, they can be expressed by discrete data sets {u_s(x_j), s = 1, ..., n, j = 1, ..., m} and {v_s(y_l), s = 1, ..., n, l = 1, ..., L}. Using these data, and by Theorem 5, we can construct an error functional

    E = Σ_{s=1}^n Σ_{l=1}^L |v_s(y_l) − Σ_{k=1}^N Σ_{i=1}^M c_i^k g(Σ_{j=1}^m ξ_{ij}^k u_s(x_j) + θ_i^k) g(ω_k·y_l + ζ_k)|².    (44)

The parameters c_i^k, ξ_{ij}^k, θ_i^k, ω_k, ζ_k could be determined by minimizing E (by using, for example, the backpropagation algorithm, its improved variations, or other popular algorithms). Then, based on Theorem 5,

    Ṽ(y) = Σ_{k=1}^N Σ_{i=1}^M c_i^k g(Σ_{j=1}^m ξ_{ij}^k u(x_j) + θ_i^k) g(ω_k·y + ζ_k)    (45)

can be viewed as an approximant of V(y) = (KU)(y), and thus the system K can be identified.
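The identification procedure of Eqs. (44)-(45) can be sketched end to end. In the code below (purely illustrative: the operator, input family, sizes, and sensor locations are all assumptions), the unknown system is K u = ∫_0^y u(t) dt; and instead of backpropagation, the inner weights are fixed at random and E is minimized over the outer coefficients c_i^k in closed form by least squares, which already realizes the network form (45).

```python
import numpy as np

# System identification in the spirit of Section V: learn (Ku)(y) from
# input-output data using the factorized network of Theorem 5 / Eq. (45).
rng = np.random.default_rng(3)
m, L, M, N = 25, 30, 20, 20
xs = np.linspace(0.0, 1.0, m)                  # sensor points x_j
ys = np.linspace(0.0, 1.0, L)                  # output points y_l

def sample_inputs(n):
    a = rng.uniform(-1.0, 1.0, size=(n, 3))
    return (a[:, :1] + a[:, 1:2] * np.sin(np.pi * xs)
            + a[:, 2:3] * np.cos(2 * np.pi * xs))            # (n, m)

def apply_K(U):
    # The "unknown" system: cumulative trapezoid integral on the sensor
    # grid, then evaluated at the output points y_l.
    dx = xs[1] - xs[0]
    cum = np.concatenate([np.zeros((U.shape[0], 1)),
                          np.cumsum((U[:, 1:] + U[:, :-1]) / 2 * dx, axis=1)],
                         axis=1)
    return np.stack([np.interp(ys, xs, row) for row in cum])  # (n, L)

xi = rng.normal(size=(M, m)) / np.sqrt(m)      # inner weights xi_ij^k (shared)
th = rng.uniform(-1.0, 1.0, size=M)            # inner biases theta_i^k
om = rng.normal(size=N)                        # trunk weights omega_k
ze = rng.uniform(-1.0, 1.0, size=N)            # trunk biases zeta_k

def branch(U):
    return np.tanh(U @ xi.T + th)              # g(sum_j xi_ij u(x_j) + theta_i)

trunk = np.tanh(np.outer(ys, om) + ze)         # g(omega_k * y_l + zeta_k)

U_tr = sample_inputs(200)
V_tr = apply_K(U_tr)
# Design matrix of products g(...) * g(...), rows indexed by (s, l):
Phi = np.einsum('si,lk->slik', branch(U_tr), trunk).reshape(200 * L, M * N)
C = np.linalg.lstsq(Phi, V_tr.reshape(-1), rcond=None)[0].reshape(M, N)

U_te = sample_inputs(50)
V_hat = branch(U_te) @ C @ trunk.T             # Eq. (45)
print("max |V - V_hat| on test inputs:", np.max(np.abs(V_hat - apply_K(U_te))))
```

This factorized "function features x location features" form is exactly the architecture of Fig. 1; with measured data one would replace apply_K by recorded responses and, as the paper suggests, tune all parameters jointly by backpropagation instead of fixing the inner ones.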
If the system is linear, then E and Ṽ(y) can be simplified: the inner nonlinearity is no longer needed, and the approximant reduces to a form linear in the samples u(x_j),

    Ṽ(y) = Σ_{k=1}^N Σ_{i=1}^M Σ_{j=1}^m c_i^k ξ_{ij}^k u(x_j) g(ω_k·y + ζ_k)

with E modified accordingly. The larger the values of n, L, and m are, the better accuracy we will obtain for this approximation.

Therefore, we have pointed to a way of constructing neural network models for identifying dynamic systems.

VI. CONCLUSION

In this paper, the problem of approximating functions of several variables, functionals, and nonlinear operators by neural networks is thoroughly studied. The necessary and sufficient condition for a continuous function in S'(R^1) to be qualified as an activation function is given, which is a broad generalization of previous results [1]-[8], especially [12]. It is also pointed out that to prove neural network approximation capability, one needs only to treat the one-dimensional case. As applications, we show how to construct neural networks to approximate the output of a dynamical system as a whole, not merely at a fixed point, thus showing the capability of neural networks in identifying dynamic systems. Moreover, we point out that using existing algorithms in the literature (for example, the backpropagation algorithm), we can determine the parameters in the network, i.e., identify the system.

ACKNOWLEDGMENT

The authors wish to express their gratitude to the reviewers for their valuable comments and suggestions on revising this paper.

REFERENCES

[1] A. Wieland and R. Leighton, "Geometric analysis of neural network capacity," in Proc. IEEE First ICNN, vol. 1, 1987, pp. 385-392.
[2] B. Irie and S. Miyake, "Capacity of three-layered perceptrons," in Proc. IEEE ICNN, vol. 1, 1988, pp. 641-648.
[3] G. Cybenko, "Approximation by superpositions of a sigmoidal function," Math. Contr., Signals Syst., vol. 2, no. 4, pp. 303-314, 1989.
[4] S. M. Carroll and B. W. Dickinson, "Construction of neural nets using the Radon transform," in Proc. IJCNN, vol. I, 1989, pp. 607-611.
[5] K. Funahashi, "On the approximate realization of continuous mappings by neural networks," Neural Networks, vol. 2, pp. 183-192, 1989.
[6] K. Hornik, M. Stinchcombe, and H. White, "Multilayer feedforward networks are universal approximators," Neural Networks, vol. 2, pp. 359-366, 1989.
[7] K. Hornik, "Approximation capabilities of multilayer feedforward networks," Neural Networks, vol. 4, pp. 251-257, 1991.
[8] V. Y. Kreinovich, "Arbitrary nonlinearity is sufficient to represent all functions by neural networks: A theorem," Neural Networks, vol. 4, pp. 381-383, 1991.
[9] T. Chen, H. Chen, and R. Liu, "A constructive proof of Cybenko's approximation theorem and its extensions," in Proc. 22nd Symp. Interface, East Lansing, MI, May 1990, pp. 163-168. Also submitted for publication.
[10] I. W. Sandberg, "Approximation theorems for discrete-time systems," IEEE Trans. Circuits Syst., vol. 38, no. 5, pp. 564-566, May 1991.
[11] I. W. Sandberg, "Approximations for nonlinear functionals," IEEE Trans. Circuits Syst., vol. 39, no. 1, pp. 65-67, Jan. 1992.
[12] H. N. Mhaskar and C. A. Micchelli, "Approximation by superposition of sigmoidal and radial basis functions," Adv. Appl. Math., vol. 13, pp. 350-373, 1992.
[13] T. Chen and H. Chen, "Approximation to continuous functionals by neural networks with application to dynamical systems," IEEE Trans. Neural Networks, vol. 4, no. 6, Nov. 1993.
[14] E. M. Stein and G. Weiss, Introduction to Fourier Analysis on Euclidean Spaces. Princeton, NJ: Princeton Univ. Press, 1971.
[15] E. M. Stein, Singular Integrals and Differentiability Properties of Functions. Princeton, NJ: Princeton Univ. Press, 1970.
[16] J. Dieudonné, Foundations of Modern Analysis. New York: Academic Press, 1969, p. 142.
[17] K. S. Narendra and K. Parthasarathy, "Identification and control of dynamic systems using neural networks," IEEE Trans. Neural Networks, vol. 1, pp. 4-27, 1990.
[18] K. S. Narendra and K. Parthasarathy, "Gradient methods for optimization of dynamical systems containing neural networks," IEEE Trans. Neural Networks, vol. 2, pp. 252-262, 1991.
[19] T. Chen, H. Chen, and R. Liu, "Approximation capability in C(R^n) by multilayer feedforward networks and related problems," IEEE Trans. Neural Networks, vol. 6, no. 1, Jan. 1995.

Tianping Chen, for photograph and biography, please see this TRANSACTIONS, p. 910.

Hong Chen, for photograph and biography, please see this TRANSACTIONS, p. 910.
