Approximation Capability to Functions of Several Variables, Nonlinear Functionals, and Operators by Radial Basis Function Neural Networks
I. INTRODUCTION
we pointed out that the boundedness of the sigmoidal function plays an essential role in its being an activation function in the hidden layer; that is, instead of continuity or monotonicity, it is the boundedness of sigmoidal functions that ensures the network capability.

In addition to sigmoidal functions, many others can be used as activation functions of neural networks. For example, in [12] we proved that the necessary and sufficient condition for a function g ∈ C(R^1) ∩ S'(R^1) to be an activation function in feedforward neural networks is that the function is not a polynomial (also see [5]).

The above papers are all significant advances towards solving the problem of whether a function is qualified as an activation function in neural networks. However, they dealt only with affine-basis-function (ABF) neural networks, also called multilayer perceptrons (MLP), and the goal there is to approximate functions by sums of the form

$$\sum_{i=1}^{N} c_i\, \sigma(y_i \cdot x + \theta_i)$$

where y_i ∈ R^n, c_i, θ_i ∈ R^1, and y_i · x denotes the inner product of y_i and x.

Among the various kinds of promising neural networks currently under active research, there is another type called radial-basis-function (RBF) networks [13] (also called localized receptive field networks [16]), in which the activation functions are radially symmetric and produce a localized response to the input stimulus. For a survey, see, for example, [15]. A block diagram of an RBF network is shown in Fig. 1. One of the basis functions commonly used is the Gaussian kernel; using the Gaussian basis function, RBF networks are capable of forming arbitrarily close approximations to any continuous function, as shown in [17]-[19]. More generally, the goal here is to approximate functions of a finite number of real variables by RBF networks of the form

$$\sum_{i=1}^{N} c_i\, g(\lambda_i \| x - y_i \|_{R^n}).$$
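As a concrete illustration of the RBF family just described (this sketch is not part of the paper; it is a minimal Python/NumPy example with an arbitrarily chosen target function, grid of centers, and width), the following code fits a network of Gaussian units, i.e., g(r) = exp(-r^2) with a shared width, to a continuous function of two variables by solving for the coefficients c_i by least squares.

```python
import numpy as np

# Target: a continuous function of two variables to be approximated on [0, 1]^2
# (an arbitrary illustrative choice, not from the paper).
def f(x):
    return np.sin(3 * x[:, 0]) * np.cos(2 * x[:, 1])

# Centers y_i on a regular grid and a single width parameter lambda.
grid = np.linspace(0.0, 1.0, 8)
centers = np.array([[a, b] for a in grid for b in grid])   # y_i in R^2
lam = 4.0

def design_matrix(x, centers, lam):
    # Phi[j, i] = g(lam * ||x_j - y_i||), with g(r) = exp(-r^2).
    d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
    return np.exp(-(lam * d) ** 2)

rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, size=(500, 2))
Phi = design_matrix(x_train, centers, lam)

# Coefficients c_i by least squares.
c, *_ = np.linalg.lstsq(Phi, f(x_train), rcond=None)

# Sup-norm error on fresh test points.
x_test = rng.uniform(0.0, 1.0, size=(2000, 2))
err = np.max(np.abs(design_matrix(x_test, centers, lam) @ c - f(x_test)))
print(f"max |f - RBF network| on test points: {err:.4f}")
```

Increasing the number of centers and tuning the width drives the error down, in line with the density results discussed in this paper.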
neural networks were obtained [13], [14]. In [14], Park and Sandberg proved the following important theorem:

Let K : R^n → R be a radially symmetric, integrable, bounded function such that K is continuous almost everywhere and ∫_{R^n} K(x) dx ≠ 0; then the family

$$\sum_{i=1}^{N} c_i\, g(\lambda_i \| x - y_i \|_{R^n}), \qquad c_i, \lambda_i \in R^1,\ y_i \in R^n,$$

is dense in L^p(R^n), 1 ≤ p < ∞, where g(||x||_{R^n}) = K(x).

In [23], Park and Sandberg discussed several related results on the L^1 and L^2 approximation. For example, they proved the following interesting theorem.

Theorem A: Assuming that K : R^n → R^1 is a square-integrable function, then the family

$$\sum_{i=1}^{N} c_i\, K(\lambda_i (x - y_i)), \qquad c_i, \lambda_i \in R^1,\ y_i \in R^n,$$

is dense in L^2(R^n) if and only if K is pointable.

In [24], we generalized this result and proved the following.

Theorem B: Suppose that g : R_+ → R^1 is such that g(||x||_{R^n}) ∈ L^2(R^n); then the family of functions

$$\sum_{i=1}^{N} c_i\, g(\lambda_i \| x - y_i \|_{R^n})$$

is dense in L^2(R^n).

It is now natural to ask the following questions: 1) What is the necessary and sufficient condition for a function to be qualified as an activation function in RBF neural networks? 2) How can nonlinear functionals be approximated by RBF neural networks? 3) How can nonlinear operators (e.g., the output of a system) be approximated by RBF neural networks, using sample data in the frequency (phase) domain or in the time (state) domain?

The purpose of this paper is to give strong results in answering those questions.

This paper is organized as follows. In Section II, we list our symbols and notations and review some definitions. In Section III, we show that the necessary and sufficient condition for a continuous function in S'(R^1) to be qualified as an activation function in RBF networks is that it is not an even polynomial. In Section IV, we show the capability of RBF neural networks to approximate nonlinear functionals and operators on some Banach space as well as on some compact set in C(K), where K is a compact set in any Banach space. Furthermore, we establish the capability of neural networks to approximate nonlinear operators from C(K_1) to C(K_2). Approximations using samples in both the frequency domain and the time domain are discussed. Examples are given, which include the use of wavelet coefficients in the approximation. It is also pointed out that the main results in Section IV can be used in computing the outputs of nonlinear dynamical systems, thus identifying the system. We conclude this paper with Section V.

II. NOTATIONS AND DEFINITIONS

We list here the main symbols and notations that will be used throughout this paper.

X: some Banach space with norm ||·||_X.
R^n: Euclidean space of dimension n with norm ||·||_{R^n}.
K: some compact set in a Banach space.
C(K): Banach space of all continuous functions defined on K, with norm ||f||_{C(K)} = max_{x ∈ K} |f(x)|.
V: some compact set in C(K).
S(R^n): all Schwartz functions in distribution theory, i.e., all the infinitely differentiable functions which are rapidly decreasing at infinity.
S'(R^n): all the distributions defined on S(R^n), i.e., all the linear continuous functionals defined on S(R^n).
C^∞(R^n): all infinitely differentiable functions defined on R^n.
C_0^∞(R^n): all infinitely differentiable functions with compact support in R^n.

We review the following definitions.

Definition 1: A function σ : R^1 → R^1 is called a (generalized) sigmoidal function if it satisfies

$$\lim_{x \to -\infty} \sigma(x) = 0, \qquad \lim_{x \to +\infty} \sigma(x) = 1.$$

Definition 2: Let X be a Banach space with norm ||·||_X. If there are elements x_n ∈ X, n = 1, 2, ..., such that for every x ∈ X there is a unique real number sequence a_n(x) such that

$$x = \sum_{n=1}^{\infty} a_n(x)\, x_n$$

where the series converges in X, then {x_n}_{n=1}^∞ is called a Schauder basis in X, and X is called a Banach space with Schauder basis.

Definition 3: Suppose that X is a Banach space. V ⊂ X is called a compact set in X if, for every sequence {x_n}_{n=1}^∞ with all x_n ∈ V, there is a subsequence {x_{n_k}} which converges to some element x ∈ V. It is well known that if V ⊂ X is a compact set in X, then for any δ > 0, there is a δ-net N(δ) = {x_1, ..., x_{n(δ)}}, all x_i ∈ V, i = 1, ..., n(δ), i.e., for every x ∈ V, there is some x_i ∈ N(δ) such that ||x_i - x||_X < δ.
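The δ-net of Definition 3 is used repeatedly in the constructions that follow. The sketch below (not from the paper; a simple greedy covering in Python/NumPy, assuming the compact set is represented by a finite sample of its elements) constructs such a net: every sampled element ends up within δ of one of the selected points x_1, ..., x_{n(δ)}.

```python
import numpy as np

def greedy_delta_net(points, delta):
    """Greedy construction of a delta-net N(delta) = {x_1, ..., x_n(delta)}
    for a finite sample `points` (rows) of a compact set, in the Euclidean
    norm: every row of `points` lies within `delta` of some selected row."""
    remaining = list(range(len(points)))
    net = []
    while remaining:
        i = remaining[0]
        net.append(points[i])
        # Drop every point already covered by the newly selected center.
        dist = np.linalg.norm(points[remaining] - points[i], axis=1)
        remaining = [j for j, d in zip(remaining, dist) if d >= delta]
    return np.array(net)

# Example: a sampled compact set in R^2 (the unit circle).
theta = np.linspace(0.0, 2 * np.pi, 400)
circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)
net = greedy_delta_net(circle, delta=0.2)
print(f"{len(net)} centers cover the sampled set within delta = 0.2")
```

For infinite-dimensional compact sets (e.g., V ⊂ C(K)), the same greedy idea applies once the elements are represented by sufficiently fine samples and the norm is replaced accordingly.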
III. CHARACTERISTICS OF CONTINUOUS FUNCTIONS AS ACTIVATION FUNCTIONS IN RBF NETWORKS

In this section, we show that for a continuous function to be qualified as an activation function in RBF networks, the necessary and sufficient condition is that it is not an even polynomial, and we prove two approximation theorems for RBF networks.

More precisely, we prove the following.

Theorem 1: Suppose that g ∈ C(R^1) ∩ S'(R^1), i.e., g is a continuous function such that ∫_{R^1} g(x)s(x) dx makes sense for all s ∈ S(R^1); then the family

$$\sum_{i=1}^{N} c_i\, g(\lambda_i \| x - y_i \|_{R^n}) \tag{1}$$

is dense in C(K) if and only if g is not an even polynomial, where K is a compact set in R^n, y_i ∈ R^n, c_i, λ_i ∈ R^1, i = 1, ..., N.

Theorem 2: Suppose that g ∈ C(R^1) ∩ S'(R^1) and is not an even polynomial, K is a compact set in R^n, and V is a compact set in C(K); then for any ε > 0, there are a positive integer N, λ_i ∈ R^1, y_i ∈ R^n, i = 1, ..., N, which are all independent of f ∈ V, and constants c_i(f) depending on f, i = 1, ..., N, such that

$$\left| f(x) - \sum_{i=1}^{N} c_i(f)\, g(\lambda_i \| x - y_i \|_{R^n}) \right| < \varepsilon \tag{2}$$

holds for all x ∈ K, f ∈ V. Moreover, every c_i(f) is a continuous functional defined on V.

Remark 1: It is worth noting that the λ_i and y_i are all independent of f in V and the c_i(f) are continuous functionals. This fact will play an important role in the approximation of nonlinear operators by RBF networks.
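A numerical illustration of Theorem 2 and Remark 1 follows (this sketch is not from the paper; it is Python/NumPy with arbitrarily chosen centers, widths, and a one-parameter family of targets): the centers y_i and widths λ_i are fixed once, and for each f only the coefficients c_i(f) are recomputed. In the sketch they come from a fixed pseudo-inverse, so the map from the samples of f to the coefficients is linear, hence continuous, mirroring the claim that each c_i(f) is a continuous functional of f.

```python
import numpy as np

rng = np.random.default_rng(1)

# Fixed RBF "dictionary": centers y_i and widths lambda_i chosen once,
# independently of the target function f (this mirrors Remark 1).
centers = rng.uniform(-1.0, 1.0, size=(30, 2))
lams = rng.uniform(2.0, 6.0, size=30)
x_grid = rng.uniform(-1.0, 1.0, size=(400, 2))   # sample points of the compact set K

def phi(x):
    # phi[j, i] = g(lambda_i * ||x_j - y_i||), with g(r) = exp(-r^2).
    d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
    return np.exp(-(lams[None, :] * d) ** 2)

Phi = phi(x_grid)
# Pseudo-inverse computed once: c(f) = P @ f(x_grid) is a fixed linear
# (hence continuous) map from the samples of f to the coefficients.
P = np.linalg.pinv(Phi)

# A small compact family V of targets, parameterized by t in [0, 1]
# (an arbitrary illustrative choice).
def f_t(t, x):
    return np.sin(2 * x[:, 0] + t) + t * np.cos(3 * x[:, 1])

for t in (0.0, 0.5, 1.0):
    c = P @ f_t(t, x_grid)            # only the coefficients change with f
    err = np.max(np.abs(Phi @ c - f_t(t, x_grid)))
    print(f"t = {t:.1f}: max error with the SAME centers/widths = {err:.4f}")
```

The same fixed dictionary serves every f in the family; this is exactly the property exploited later when the coefficients themselves are realized by networks acting on samples of f.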
To prove Theorem 1, we need the following lemma, which is of significance by itself.

Lemma 1: Suppose that h(x) ∈ C(R^n) ∩ S'(R^n); then the family

$$\sum_{i=1}^{N} c_i\, h(\lambda_i \rho_i(x) + y_i)$$

is dense in C(K) if and only if h is not a polynomial, where the ρ_i are rotations in R^n, y_i ∈ R^n, λ_i ∈ R^1, i = 1, ..., N.

Proof: Sufficiency. Suppose that the linear combinations ∑_{i=1}^{N} c_i h(λ_i ρ_i(x) + y_i) are not dense in C(K). Then, by the Hahn-Banach extension theorem for linear functionals and the Riesz representation theorem (see [6]), we conclude that there is a bounded signed Borel measure dμ with supp(dμ) ⊂ K and

$$\int_K h(\lambda \rho(x) + y)\, d\mu(x) = 0 \tag{3}$$

for all λ ∈ R^1, y ∈ R^n, and all rotations ρ. Pick any w ∈ S(R^n); then

$$\bigl\langle \hat h(t),\ w(t)\, \widehat{d\mu}(\lambda \rho^{-1}(t)) \bigr\rangle = 0 \tag{6}$$

for all λ ∈ R^1, λ ≠ 0, and all rotations ρ, where $\hat h$ represents the Fourier transform of h in the sense of distributions.

For (6) to make sense, we have to show that $w(t)\, \widehat{d\mu}(\lambda \rho^{-1}(t)) \in S(R^n)$. In fact, w(t) ∈ S(R^n). Moreover, since supp(dμ) ⊂ K, it is easy to verify that $\widehat{d\mu}(t) = \int e^{-i t \cdot x}\, d\mu(x) \in C^{\infty}(R^n)$ and there are constants C_k, k = 1, 2, ..., such that

$$\sup_{t \in R^n} \bigl| D^{\alpha} \widehat{d\mu}(t) \bigr| \le C_k, \qquad |\alpha| = k. \tag{7}$$

Thus, $w(t)\, \widehat{d\mu}(\lambda \rho^{-1}(t)) \in S(R^n)$.

Since dμ ≠ 0 and $\widehat{d\mu}(t) \in C^{\infty}(R^n)$, there are t_0 ∈ R^n, t_0 ≠ 0, and a neighborhood O(t_0, δ) = {x : ||x - t_0||_{R^n} < δ} such that $|\widehat{d\mu}(t)| > c > 0$ for all t ∈ O(t_0, δ).

Pick t_1 ∈ R^n, t_1 ≠ 0, arbitrarily. Let t_0 = λρ(t_1), where ρ is a rotation in R^n. Then $|\widehat{d\mu}(\lambda\rho(t))| > c$ for all t ∈ O(t_1, δ/λ). Let w ∈ C_0^∞(O(t_1, δ/λ)); then $w(t)/\widehat{d\mu}(\lambda\rho(t)) \in C_0^{\infty}(O(t_1, \delta/\lambda))$, because $|\widehat{d\mu}(\lambda\rho(t))| > c$ and $\widehat{d\mu}(\lambda\rho(t)) \in C^{\infty}(R^n)$. Therefore

$$\bigl\langle \hat h(t),\ w(t) \bigr\rangle = \Bigl\langle \hat h(t),\ \widehat{d\mu}(\lambda\rho(t)) \cdot \frac{w(t)}{\widehat{d\mu}(\lambda\rho(t))} \Bigr\rangle = 0. \tag{8}$$

The previous argument shows that for any t* ∈ R^n, t* ≠ 0, there is a neighborhood O(t*, η) of t* such that

$$\bigl\langle \hat h(t),\ w(t) \bigr\rangle = 0 \tag{9}$$

holds for all w with support supp(w) ⊂ O(t*, η), which means that supp($\hat h$) ⊂ {0}. It is well known that a distribution is the Fourier transform of a polynomial if and only if its support is a subset of {0} (see [25, Section 7.16]). Thus h is a polynomial.

Necessity. If h is a polynomial of degree m, then all the functions

$$\sum_{i=1}^{N} c_i\, h(\lambda_i \rho_i(x) + y_i)$$

are polynomials in x_1, ..., x_n of total degree at most m, which, of course, are not dense in C(K). Lemma 1 is proved. □

Proof of Theorem 1: Suppose that g ∈ C(R^1) ∩ S'(R^1); then h(x) = g(||x||_{R^n}) ∈ S'(R^n) ∩ C(R^n) and

$$\sum_{i=1}^{N} c_i\, g(\lambda_i \| x - y_i \|_{R^n}) = \sum_{i=1}^{N} c_i\, g(\lambda_i \| \rho(x) - \rho(y_i) \|_{R^n})$$
such that

$$\left| f(x) - \sum_{i=1}^{N} c_i(f)\, g(\lambda_i \| x - y_i \|_{R^n}) \right| < \varepsilon. \tag{15}$$

Combining the two cases, we conclude that K' is a compact set in X. Lemma 4 is proved. □

Proof of Theorem 3: By the Tietze extension theorem, we
and

for j = 1, ..., n(η_k). It is easy to verify that {T_{η_k, j}(x)} is a partition of unity, i.e.,

$$\sum_{j=1}^{n(\eta_k)} T_{\eta_k, j}(x) = 1, \qquad x \in K.$$
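The paper's definition of the functions T_{η_k, j} does not survive in the recovered text. The sketch below (Python/NumPy, not the paper's construction) shows one standard way to obtain such a partition of unity from a δ-net: hat functions centered at the net points, normalized by their sum. The names hat and partition_of_unity are illustrative only.

```python
import numpy as np

def hat(r, eta):
    # A bump that equals 1 at r = 0 and vanishes for r >= eta.
    return np.maximum(0.0, 1.0 - r / eta)

def partition_of_unity(x, net_points, eta):
    """T[j, i] = T_{eta, i}(x_j): hat functions centered at the net points,
    normalized so that sum_i T_{eta, i}(x_j) = 1 for every x_j lying within
    eta of some net point (a standard construction, assumed here; the
    paper's own definition is lost in the recovered text)."""
    r = np.linalg.norm(x[:, None, :] - net_points[None, :, :], axis=2)
    H = hat(r, eta)
    return H / H.sum(axis=1, keepdims=True)

# Example: net points on a grid covering [0, 1]^2, evaluation points inside it.
net = np.array([[a, b] for a in np.linspace(0, 1, 5) for b in np.linspace(0, 1, 5)])
x = np.random.default_rng(2).uniform(0.1, 0.9, size=(200, 2))
T = partition_of_unity(x, net, eta=0.5)
print("row sums equal 1:", np.allclose(T.sum(axis=1), 1.0))
```

Weights of this kind let one replace a function u by the finite sample (u(x_1), ..., u(x_{n(η_k)})) while keeping control of the error, which is how the operator-approximation arguments proceed.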
$$\left| G(u)(y) - \sum_{k=1}^{M} \sum_{i=1}^{N} c_i^k\, g\bigl(\lambda_i \| y - y_i \|_{R^m}\bigr) \right| < \varepsilon \tag{30}$$

for all u ∈ V and y ∈ K_2. Theorem 5 is proved. □
Remark 3: Theorem 5 shows the capability of RBF networks to approximate nonlinear operators using sample data in the time (or state) domain. Likewise, by Theorem 3, we can construct RBF networks using sample data in the frequency (or phase) domain.

Remark 4: We can also construct neural networks in which affine basis functions are mixed with radial basis functions; for example, G(u)(y) can be approximated by such a mixed network.
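To make the operator-approximation recipe of Remarks 3 and 4 concrete, the following sketch (not the paper's construction; Python/NumPy with an arbitrary example operator, sampling grid, and Gaussian RBF model) samples the input function u at fixed points x_1, ..., x_m, forms the vector (u(x_1), ..., u(x_m), y), and fits an RBF network to reproduce G(u)(y) on example input-output pairs.

```python
import numpy as np

rng = np.random.default_rng(3)

# Fixed sampling points x_1, ..., x_m in [0, 1] (the "time domain" samples).
x_samples = np.linspace(0.0, 1.0, 10)

def make_u(a, b):
    # A small compact family of inputs u(x) = a*sin(pi x) + b*x, |a|, |b| <= 1.
    return lambda x: a * np.sin(np.pi * x) + b * x

def G(u, y):
    # Example operator (an arbitrary illustrative choice, not from the paper):
    # G(u)(y) = integral of u from 0 to y, via the trapezoidal rule.
    s = np.linspace(0.0, y, 201)
    vals = u(s)
    return float(np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(s)))

# Training pairs: feature vector z = (u(x_1), ..., u(x_m), y) and target G(u)(y).
Z, t = [], []
for _ in range(800):
    a, b, y = rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(0, 1)
    u = make_u(a, b)
    Z.append(np.concatenate([u(x_samples), [y]]))
    t.append(G(u, y))
Z, t = np.array(Z), np.array(t)

# Gaussian RBF network acting on the combined vector z; centers drawn from the data.
centers = Z[rng.choice(len(Z), size=60, replace=False)]
lam = 1.0
Phi = np.exp(-(lam * np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)) ** 2)
c, *_ = np.linalg.lstsq(Phi, t, rcond=None)
print("training max |G(u)(y) - RBF output|:", float(np.max(np.abs(Phi @ c - t))))
```

This mirrors the identification viewpoint of the paper: once the network is fitted, the output of the (unknown) system G for a new input u is computed from the same fixed samples of u together with the evaluation point y.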
V. CONCLUSION

In this paper, the problem of approximating functions of several variables, nonlinear functionals, and nonlinear operators by radial basis function neural networks is studied. The necessary and sufficient condition for a continuous function to be qualified as an activation function in RBF networks is given. Results on using RBF neural networks to compute the outputs of dynamical systems from sample data in the frequency domain or the time domain are also given.
ACKNOWLEDGMENT

The authors wish to thank Prof. R.-W. Liu of the University of Notre Dame and Prof. I. W. Sandberg of the University of Texas at Austin for bringing some of the papers in this area to their attention.

REFERENCES

[1] A. Wieland and R. Leighton, "Geometric analysis of neural network capacity," in Proc. IEEE 1st ICNN, vol. 1, 1987, pp. 385-392.
[2] B. Irie and S. Miyake, "Capacity of three-layered perceptrons," in Proc. IEEE ICNN, vol. 1, 1988, pp. 641-648.
[3] S. M. Carroll and B. W. Dickinson, "Construction of neural nets using the Radon transform," in Proc. IJCNN, vol. I, 1989, pp. 607-611.
[4] K. Funahashi, "On the approximate realization of continuous mappings by neural networks," Neural Networks, vol. 2, pp. 183-192, 1989.
[5] H. N. Mhaskar and C. A. Micchelli, "Approximation by superposition of sigmoidal and radial functions," Advances in Applied Mathematics, vol. 13, pp. 350-373, 1992.
[6] G. Cybenko, "Approximation by superpositions of a sigmoidal function," Mathematics of Control, Signals, and Systems, vol. 2, no. 4, pp. 303-314, 1989.
[7] Y. Ito, "Representation of functions by superpositions of a step or sigmoidal function and their applications to neural network theory," Neural Networks, vol. 4, pp. 385-394, 1991.
[8] K. Hornik, "Approximation capabilities of multilayer feedforward networks," Neural Networks, vol. 4, pp. 251-257, 1991.
[9] T. Chen, H. Chen, and R.-W. Liu, "Approximation capability in C(R^n) by multilayer feedforward networks and related problems," IEEE Trans. Neural Networks, vol. 6, no. 1, Jan. 1995.

Tianping Chen received the graduate degrees from Fudan University, Shanghai, China, in 1966.
He is a Professor at Fudan University, Shanghai, China. He is also a Concurrent Professor at Nanjing University of Aeronautics and Astronautics. He has held short-term appointments at several institutions in the U.S. and Europe. His research interests include harmonic analysis, approximation theory, neural networks, and signal processing.
He has published over 80 journal papers and was a recipient of a National Award for Excellence in Scientific Research from the State Education Commission of China in 1985 and 1994.

Hong Chen received the B.S.E.E. degree from Fudan University, Shanghai, P.R. China, in 1988, and the M.S.E.E. and Ph.D. degrees from the University of Notre Dame, Notre Dame, Indiana, in 1991 and 1993, respectively.
He was with VLSI Libraries, Inc., Santa Clara, CA, and is now with Sun Microsystems, Inc., Mountain View, CA. His interests include neural networks, signal processing, and VLSI design.