Universal Approximation To Nonlinear Operators by Neural Networks With Arbitrary Activation Functions and Its Application To Dynamical Systems
Abstract - The purpose of this paper is to investigate neural network capability systematically. The main results are: 1) every Tauber-Wiener function is qualified as an activation function in the hidden layer of a three-layered neural network; 2) for a continuous function in S'(R^1) to be a Tauber-Wiener function, the necessary and sufficient condition is that it is not a polynomial; 3) the capability of approximating nonlinear functionals defined on some compact set of a Banach space, and nonlinear operators, is established; and 4) we show the possibility, by neural computation, of approximating the output of a dynamical system as a whole (not at a fixed point), thus identifying the system.

I. INTRODUCTION

Mhaskar and Micchelli [12] showed that under some restriction on the amplitude of a continuous function near infinity, any nonpolynomial function is qualified to be an activation function.

It is clear that all the aforementioned works are concerned with approximation to a continuous function defined on a compact set in R^n (a space of finite dimensions). In engineering problems such as computing the output of dynamic systems or designing neural system identifiers, however, we often encounter the problem of approximating nonlinear functionals defined on some function space, even nonlinear operators from one function space (a space of infinite dimensions) to another
Authorized licensed use limited to: Univ of Calif Merced. Downloaded on February 05,2025 at 19:07:48 UTC from IEEE Xplore. Restrictions apply.
IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 6, NO. 4, JULY 1995
as a whole (not merely at a special point, cf. [10], [13]), thus to identify the system?

In this paper, we systematically give strong results regarding these issues.

The paper is organized as follows. In Section II, we review some definitions and notations. In Section III, we show that the necessary and sufficient condition for a continuous function in S'(R^1) (tempered distributions on R^1) to be a Tauber-Wiener function (for definitions, see Section II) is that it is not a polynomial, and that any Tauber-Wiener function can be used as an activation function, i.e., any nonpolynomial continuous function in S'(R^1) is an activation function. What is more interesting is that we show the approximation is equiuniform on any compact set in C(K), which is crucial in discussing approximation to continuous operators by neural networks. In Section IV, we show the capability of neural networks to approximate continuous functionals defined on some compact set in C(K), where K is a compact set in some Banach space, and through this we establish the capability of neural networks to approximate continuous operators from C(K1) to C(K2). The main results in Section IV have potential applications to computing outputs of dynamic systems and identifying the systems. This is an important issue in system identification [17], [18], and we will discuss it in more detail in Section V.

II. NOTATIONS AND DEFINITIONS

Definition 1: A function σ: R^1 → R^1 is called a sigmoidal function if it satisfies

lim_{x→-∞} σ(x) = 0,  lim_{x→+∞} σ(x) = 1.

Definition 2: If a function g: R → R (continuous or discontinuous) satisfies that all the linear combinations Σ_{i=1}^N c_i g(λ_i x + θ_i), λ_i ∈ R, θ_i ∈ R, c_i ∈ R, i = 1, 2, ..., N, are dense in every C[a, b], then g is called a Tauber-Wiener (TW) function.

Definition 3: Suppose that X is a Banach space. V ⊆ X is called a compact set in X if for every sequence {x_n}_{n=1}^∞ with all x_n ∈ V, there is a subsequence {x_{n_k}} which converges to some element x ∈ V.

It is well known that if V ⊆ X is a compact set in X, then for any δ > 0 there is a δ-net N(δ) = {x_1, ..., x_{n(δ)}}, with all x_i ∈ V, i = 1, ..., n(δ), i.e., for every x ∈ V there is some x_i ∈ N(δ) such that ||x_i − x||_X < δ.

In the sequel, we will often use the following notations:

X                some Banach space with norm ||·||_X
R^n              Euclidean space of dimension n
K                some compact set in a Banach space
C(K)             Banach space of all continuous functions defined on K, with norm ||f||_{C(K)} = max_{x∈K} |f(x)|
(TW)             all the Tauber-Wiener functions
S(R^n)           Schwartz functions in tempered distribution theory, i.e., rapidly decreasing and infinitely differentiable functions
S'(R^n)          tempered distributions, i.e., linear continuous functionals defined on S(R^n)
C^∞(R^n)         infinitely differentiable functions
C_0^∞(R^n)       infinitely differentiable functions with compact support in R^n
C_{2π}[-1, 1]^n  all periodic functions with period two with respect to every variable x_i, i = 1, ..., n

III. CHARACTERISTICS OF ACTIVATION FUNCTIONS

In this section, we prove three theorems.

Theorem 1: Suppose that g is a continuous function and g ∈ S'(R^1); then g ∈ (TW) if and only if g is not a polynomial.

Theorem 2: If σ is a bounded sigmoidal function, then σ ∈ (TW).

Theorem 3: Suppose that K is a compact set in R^n, U is a compact set in C(K), and g ∈ (TW); then for any ε > 0, there exist a positive integer N, real numbers θ_i, and vectors w_i ∈ R^n, i = 1, ..., N, which are independent of f ∈ C(K), and constants c_i(f), i = 1, ..., N, depending on f, such that

|f(x) − Σ_{i=1}^N c_i(f) g(w_i · x + θ_i)| < ε

holds for all x ∈ K and f ∈ U. Moreover, each c_i(f) is a linear continuous functional defined on U.

Remark 1: Theorem 3 shows that for a function (continuous or discontinuous) to be qualified as an activation function, a sufficient condition is that it belongs to the (TW) class. Therefore, to prove that a neural network is capable of approximating any continuous function of n variables, all we need to do is to deal with the case n = 1; thus we have reduced the complexity of the problem in terms of its dimensionality. Moreover, by examining the approximated function f(x_1, ..., x_n) = f(x_1, 0, ..., 0) = f*(x_1), where f*(x_1) is a continuous function of one variable, it is straightforward to see that the condition is also a necessary one.

Remark 2: The equiuniform convergence property in Theorem 3 will play a crucial role in approximation to nonlinear operators by neural networks.

Remark 3: When a sigmoidal function is used as an activation in a neural network, Theorem 2 shows that the only condition imposed on it is its boundedness. In contrast, in almost all other papers [1]-[8], sigmoidal functions must be assumed to be either continuous or monotone.

Remark 4: In [12], a result similar to Theorem 1 was obtained under more restrictions imposed on g, i.e., that there are a positive integer N and a constant C_N such that |(1 + |x|)^{-N} g(x)| ≤ C_N for all x ∈ R^1. This restriction is essential for [12], for the proof in [12] depends heavily on a variation of the Paley-Wiener theorem. In Theorem 1, however, we only assume that g ∈ C(R^1) ∩ S'(R^1), which is weaker than the assumptions used in [12].

Proof of Theorem 1: We will prove by contradiction. If the set of all the linear combinations Σ_{i=1}^N c_i g(λ_i x + θ_i) is not dense in C[a, b], then the Hahn-Banach extension theorem and the Riesz representation of linear continuous functionals show
that there is a signed Borel measure dμ with supp(dμ) ⊆ [a, b] and

∫_a^b g(λx + θ) dμ(x) = 0

for all λ ≠ 0 and θ ∈ R^1. Take any ω ∈ S(R^1), let λx + θ = u, and change the order of integration.

Proof of Theorem 2: From the assumption, there exists W > 0 such that if u > W, then |σ(u) − 1| < 1/M²; if u < −W, then |σ(u)| < 1/M². Let K > 0 be such that K · (1/2M) > W, and set t_i = (x_i + x_{i+1})/2, t_{−1} = −1 − 1/(2M). Construct

g(x) = f(−1) σ(K(x − t_{−1})) + Σ_{i=1}^N [f(x_i) − f(x_{i−1})] σ(K(x − t_{i−1}))    (9)

Then

|g(x) − f(x)| < ε  for all x ∈ [−1, 1].    (10)

... for every f ∈ U and x ∈ [−1, 1]^n, provided that α > (n − 1)/2. The proof of Lemma 4 can be found in [14].
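As a numerical illustration (not part of the paper's argument), the construction in (9) can be reproduced directly: a telescoping sum of sharply scaled, shifted sigmoids forms a staircase tracking f. The logistic choice of σ, the partition size M, and the gain are arbitrary assumptions for this demo.

```python
import numpy as np

def sigma(u):
    # A bounded sigmoidal function (logistic); clipping avoids overflow warnings.
    return 1.0 / (1.0 + np.exp(-np.clip(u, -50.0, 50.0)))

def staircase(f, M, gain):
    """Construction (9): steps of shifted sigmoids telescope to f's values
    on a partition of [-1, 1]."""
    xi = np.linspace(-1.0, 1.0, M + 1)      # partition points x_0, ..., x_M
    ti = 0.5 * (xi[:-1] + xi[1:])           # t_i = (x_i + x_{i+1}) / 2
    t_m1 = -1.0 - 1.0 / (2.0 * M)           # t_{-1}
    def g(x):
        out = f(xi[0]) * sigma(gain * (x - t_m1))
        for i in range(1, M + 1):
            out = out + (f(xi[i]) - f(xi[i - 1])) * sigma(gain * (x - ti[i - 1]))
        return out
    return g

f = np.cos
g = staircase(f, M=400, gain=1e4)
xs = np.linspace(-1.0, 1.0, 1001)
sup_err = np.max(np.abs(g(xs) - f(xs)))     # shrinks as M and the gain grow
```

Increasing M refines the staircase step, while a gain with K · (1/2M) > W keeps each sigmoid transition much narrower than a partition cell: exactly the two requirements used in the proof.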
Proof of Theorem 3: Without loss of generality, we can assume that K ⊆ [0, 1]^n. By Lemma 3, we can assume that K = [−1, 1]^n and U ⊆ C_{2π}[−1, 1]^n. By Lemma 4, for any ε > 0, there exists R > 0 such that for any x = (x_1, ..., x_n) ∈ [−1, 1]^n and f ∈ U, there holds

|f(x) − Σ_{i=1}^N c_i(f) g(w_i · x + θ_i)| < ε.

... is a continuous functional defined on U (and also a continuous functional defined on V), and c_i(f), being a finite linear combination of the Fourier coefficients of f*, is surely a continuous functional defined on V. The proof of Theorem 3 is completed.

... |u(x′) − u(x″)| < δ for all u ∈ V, provided that ||x′ − x″||_X < δ.

Now pick a sequence ε_1 > ε_2 > ... > ε_k → 0; then we can find another sequence δ_1 > δ_2 > ... > δ_k → 0 such that |f(u) − f(v)| < ε_k for all u, v ∈ V provided that ||u − v||_{C(K)} < 2δ_k, for f is a continuous functional defined on the compact set V.

By Lemma 6, we can also find η_1 > η_2 > ... > η_k → 0 such that |u(x′) − u(x″)| < δ_k for all u ∈ V, whenever x′, x″ ∈ K and ||x′ − x″||_X < η_k.
... for j = 1, ..., n(η_k). It is easy to verify that {T_{η_k,j}(x)} is a partition of unity, i.e.,

Σ_{j=1}^{n(η_k)} T_{η_k,j}(x) = 1    (26)

and

T_{η_k,j}(x) = 0  if ||x − x_j||_X > η_k.    (27)

For each u ∈ V, define a function

u_{η_k}(x) = Σ_{j=1}^{n(η_k)} u(x_j) T_{η_k,j}(x)    (28)

and let V_{η_k} = {u_{η_k} : u ∈ V} and V* = V ∪ (∪_{k=1}^∞ V_{η_k}). We then have the following result.

Lemma 7:
1) For each fixed k, V_{η_k} is a compact set in a subspace of dimension n(η_k) in C(K).
2) For every u ∈ V, there holds

||u − u_{η_k}||_{C(K)} ≤ δ_k.    (29)

3) V* is a compact set in C(K).

Proof: We will prove the three propositions individually as follows.

1) For a fixed k, let u_{η_k}^{(i)}, i = 1, 2, ..., be a sequence in V_{η_k} and u^{(i)} be a sequence in V such that

u_{η_k}^{(i)}(x) = Σ_{j=1}^{n(η_k)} u^{(i)}(x_j) T_{η_k,j}(x).    (30)

Since V is a compact set in C(K), there is a subsequence u^{(i_l)}(x) which converges to some u ∈ V; then it is obvious that u_{η_k}^{(i_l)}(x) converges to u_{η_k}(x) ∈ V_{η_k}, i.e., V_{η_k} is a compact subset in C(K).

2) By the definition and the property of the partition of unity, we have

u(x) − u_{η_k}(x) = Σ_{j=1}^{n(η_k)} [u(x) − u(x_j)] T_{η_k,j}(x) = Σ_{||x−x_j||_X ≤ η_k} [u(x) − u(x_j)] T_{η_k,j}(x).    (31)

Since T_{η_k,j}(x) vanishes unless ||x − x_j||_X ≤ η_k, in which case |u(x) − u(x_j)| < δ_k by the choice of η_k, and since the T_{η_k,j}(x) sum to one, (29) follows.

3) Suppose {u^i}_{i=1}^∞ is a sequence in V*. If there is a subsequence {u^{i_l}}_{l=1}^∞ of {u^i} with all u^{i_l} ∈ V, l = 1, 2, ..., then by the fact that V is compact, there is a subsequence of {u^{i_l}} which converges to some u ∈ V. Otherwise, to each u^i there corresponds a positive integer k(i) and a v^i ∈ V such that u^i = v^i_{η_{k(i)}}. There are two possibilities: i) we can find infinitely many {i_l}_{l=1}^∞ and a fixed k_0 such that η_{k(i_1)} = η_{k(i_2)} = ... = η_{k_0}, i.e., u^{i_l} ∈ V_{η_{k_0}} for all i_l. By proposition 1) of this lemma, V_{η_{k_0}} is a compact set, so there is a subsequence of {u^{i_l}} which converges to some u ∈ V_{η_{k_0}}, i.e., there is a subsequence of {u^i} converging to u ∈ V_{η_{k_0}}. ii) There are sequences i_1 < i_2 < ... → ∞ and k(i_1) < k(i_2) < ... → ∞ such that u^{i_l} ∈ V_{η_{k(i_l)}}. Let v^{i_l} ∈ V be such that u^{i_l} = v^{i_l}_{η_{k(i_l)}}. Since v^{i_l} ∈ V and V is compact, we see that there is a subsequence of {v^{i_l}}_{l=1}^∞ which converges to some v ∈ V. By proposition 2) of this lemma, the corresponding subsequence of {u^{i_l}} also converges to v. Thus the compactness of V* is proved.

Proof of Theorem 4: By the Tietze extension theorem, we can define a continuous functional f* on V* such that

f*(u) = f(u)  if u ∈ V.    (34)

Because f* is a continuous functional defined on the compact set V*, for any ε > 0 we can find a δ > 0 such that |f*(u) − f*(v)| < ε/2, provided that u, v ∈ V* and ||u − v||_{C(K)} < δ.

Let k be fixed such that δ_k < δ; then by (29), for every u ∈ V,

||u − u_{η_k}||_{C(K)} ≤ δ_k    (35)

which implies

|f*(u) − f*(u_{η_k})| < ε/2    (36)

for all u ∈ V.

By proposition 1) of Lemma 7, we see that f*(u_{η_k}) is a continuous functional defined on the compact set V_{η_k} in R^{n(η_k)}. By Theorem 3, we can find N, c_i, ξ_{ij}, θ_i, i = 1, ..., N, j = 1, ..., n(η_k), such that

|f*(u_{η_k}) − Σ_{i=1}^N c_i g(Σ_{j=1}^m ξ_{ij} u(x_j) + θ_i)| < ε/2.

Combining it with (36), we conclude that

|f(u) − Σ_{i=1}^N c_i g(Σ_{j=1}^m ξ_{ij} u(x_j) + θ_i)| < ε

where m = n(η_k). Thus, Theorem 4 is proved.
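The mechanism of Theorem 4 - reduce a continuous functional f on a compact set V ⊂ C(K) to a continuous function of the finitely many sampled values u(x_1), ..., u(x_m) on an η-net, then apply Theorem 3 - can be sketched numerically. Everything concrete below (the input family, the functional F(u) = ∫_0^1 u(t)² dt, the sensor count m, and the random-feature least-squares fit) is an illustrative assumption, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(1)

m = 20                                      # sensors x_1..x_m: a net of K = [0, 1]
xj = np.linspace(0.0, 1.0, m)

# Compact family V = {u_a(t) = sin(a t) : a in [0.1, 3]} and the nonlinear
# continuous functional F(u) = integral_0^1 u(t)^2 dt = 1/2 - sin(2a)/(4a).
def u_samples(a):
    return np.sin(a * xj)

def F_true(a):
    return 0.5 - np.sin(2.0 * a) / (4.0 * a)

a_train = np.linspace(0.1, 3.0, 200)
U = np.array([u_samples(a) for a in a_train])        # (200, m)
y = np.array([F_true(a) for a in a_train])

# Network form of Theorem 4: sum_i c_i g(sum_j xi_ij u(x_j) + theta_i).
N = 100
Xi = rng.normal(size=(N, m)) / np.sqrt(m)            # xi_ij
theta = rng.uniform(-1.0, 1.0, size=N)
H = np.tanh(U @ Xi.T + theta)                        # (200, N)
c = np.linalg.lstsq(H, y, rcond=None)[0]

a_test = np.linspace(0.15, 2.95, 57)                 # held-out inputs u_a
H_test = np.tanh(np.array([u_samples(a) for a in a_test]) @ Xi.T + theta)
test_err = np.max(np.abs(H_test @ c - np.array([F_true(a) for a in a_test])))
```

The network never sees u itself, only the m sampled values u(x_j), which is exactly the finite-dimensional reduction the η-net and partition of unity provide.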
|G(u)(y) − Σ_{k=1}^N c_k(G(u)) g(w_k · y + ζ_k)| < ε    (39)
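The two-network structure behind (39) - one network acting on sampled input values u(x_j), one acting on the evaluation point y, combined bilinearly - can be sketched as follows. The operator G(u)(y) = ∫_0^y u(t) dt, the input family, the feature widths, and the least-squares fit of the coefficient matrix are all illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(2)

m, M, N = 20, 24, 24          # sensors, branch width, trunk width
xj = np.linspace(0.0, 1.0, m)

# Operator to learn: G(u)(y) = integral_0^y u(t) dt on inputs u_a(t) = cos(a t),
# for which G(u_a)(y) = sin(a y) / a.
def u_samples(a):
    return np.cos(a * xj)

a_grid = np.linspace(0.5, 2.5, 40)
y_grid = np.linspace(0.0, 1.0, 40)

# Branch features g(sum_j xi_ij u(x_j) + theta_i); trunk features g(w_k y + zeta_k).
Xi = rng.normal(size=(M, m)) / np.sqrt(m)
theta = rng.uniform(-1.0, 1.0, M)
w = rng.normal(scale=2.0, size=N)
zeta = rng.uniform(-2.0, 2.0, N)

B = np.tanh(np.array([u_samples(a) for a in a_grid]) @ Xi.T + theta)   # (40, M)
T = np.tanh(np.outer(y_grid, w) + zeta)                                # (40, N)

# Targets G(u_a)(y) and bilinear least-squares fit of the matrix C (M x N):
Gtab = np.sin(np.outer(a_grid, y_grid)) / a_grid[:, None]              # (40, 40)
design = np.einsum('ai,yk->ayik', B, T).reshape(40 * 40, M * N)
C = np.linalg.lstsq(design, Gtab.reshape(-1), rcond=None)[0].reshape(M, N)

# Held-out check at a new (a, y) pair:
a_t, y_t = 1.3, 0.77
b = np.tanh(u_samples(a_t) @ Xi.T + theta)
t = np.tanh(w * y_t + zeta)
err = abs(b @ C @ t - np.sin(a_t * y_t) / a_t)
```

The inner factor plays the role of the functional network of Theorem 4 evaluated on sampled inputs, while the outer factor g(w_k · y + ζ_k) restores the dependence on the evaluation point y.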
CHEN AND CHEN: UNIVERSAL APPROXIMATION TO NONLINEAR OPERATORS
If the system is linear, then G(u)(y) can be simplified as

Σ_{k=1}^N Σ_{i=1}^M Σ_{j=1}^m c_{ijk} u(x_j) g(w_k · y + ζ_k).

The larger the values of N, M, m are, the better accuracy we will obtain for this approximation. Therefore, we have pointed to a way of constructing neural network models for identifying dynamic systems.

VI. CONCLUSION

In this paper, the problem of approximating functions of several variables, functionals, and nonlinear operators is thoroughly studied. The necessary and sufficient condition for a continuous function in S'(R^1) to be qualified as an activation function is given, which is a broad generalization of previous results [1]-[8], especially [12]. It is also pointed out that to prove neural network approximation capability, one needs only to treat the one-dimensional case. As applications, we show how to construct neural networks to approximate the output of a dynamical system as a whole, not merely at a fixed point, and thus show the capability of neural networks in identifying dynamic systems. Moreover, we point out that using existing algorithms in the literature (for example, the backpropagation algorithm), we can determine those parameters in the network, i.e., identify the system.

ACKNOWLEDGMENT

The authors wish to express their gratefulness to the reviewers for their valuable comments and suggestions on revising this paper.

REFERENCES

[1] A. Wieland and R. Leighton, "Geometric analysis of neural network capacity," in Proc. IEEE First ICNN, vol. 1, 1987, pp. 385-392.
[2] B. Irie and S. Miyake, "Capacity of three-layered perceptrons," in Proc. IEEE ICNN, vol. 1, 1988, pp. 641-648.
[3] G. Cybenko, "Approximation by superpositions of a sigmoidal function," Math. Contr., Signals Syst., vol. 2, no. 4, pp. 303-314, 1989.
[4] S. M. Carroll and B. W. Dickinson, "Construction of neural nets using the Radon transform," in Proc. IJCNN, vol. 1, 1989, pp. 607-611.
[5] K. Funahashi, "On the approximate realization of continuous mappings by neural networks," Neural Networks, vol. 2, pp. 183-192, 1989.
[6] K. Hornik, M. Stinchcombe, and H. White, "Multilayer feedforward networks are universal approximators," Neural Networks, vol. 2, pp. 359-366, 1989.
[7] K. Hornik, "Approximation capabilities of multilayer feedforward networks," Neural Networks, vol. 4, pp. 251-257, 1991.
[8] V. Y. Kreinovich, "Arbitrary nonlinearity is sufficient to represent all functions by neural networks: A theorem," Neural Networks, vol. 4, pp. 381-383, 1991.
[9] T. Chen, H. Chen, and R. Liu, "A constructive proof of Cybenko's approximation theorem and its extensions," in Proc. 22nd Symp. Interface, East Lansing, MI, May 1990, pp. 163-168. Also submitted for publication.
[10] I. W. Sandberg, "Approximation theorems for discrete-time systems," IEEE Trans. Circuits Syst., vol. 38, no. 5, pp. 564-566, May 1991.
[11] I. W. Sandberg, "Approximations for nonlinear functionals," IEEE Trans. Circuits Syst., vol. 39, no. 1, pp. 65-67, Jan. 1992.
[12] H. N. Mhaskar and C. A. Micchelli, "Approximation by superposition of sigmoidal and radial basis functions," Advances in Applied Mathematics, vol. 13, pp. 350-373, 1992.
[13] T. Chen and H. Chen, "Approximation to continuous functionals by neural networks with application to dynamical systems," IEEE Trans. Neural Networks, vol. 4, no. 6, Nov. 1993.
[14] E. M. Stein and G. Weiss, Introduction to Fourier Analysis on Euclidean Spaces. Princeton, NJ: Princeton Univ. Press, 1971.
[15] E. M. Stein, Singular Integrals and Differentiability Properties of Functions. Princeton, NJ: Princeton Univ. Press, 1970.
[16] J. Dieudonné, Foundations of Modern Analysis. New York and London: Academic, 1969, p. 142.
[17] K. S. Narendra and K. Parthasarathy, "Identification and control of dynamic systems using neural networks," IEEE Trans. Neural Networks, vol. 1, pp. 4-27, 1990.
[18] K. S. Narendra and K. Parthasarathy, "Gradient methods for the optimization of dynamical systems containing neural networks," IEEE Trans. Neural Networks, vol. 2, pp. 252-262, 1991.
[19] T. Chen, H. Chen, and R. Liu, "Approximation capability in C(R^n) by multilayer feedforward networks and related problems," IEEE Trans. Neural Networks, vol. 6, no. 1, Jan. 1995.

Tianping Chen, for photograph and biography, please see this TRANSACTIONS, p. 910.

Hong Chen, for photograph and biography, please see this TRANSACTIONS, p. 910.