A Generalized Growing and Pruning RBF (GGAP-RBF) Neural Network for Function Approximation
Abstract: This paper presents a new sequential learning algorithm for radial basis function (RBF) networks, referred to as the generalized growing and pruning algorithm for RBF (GGAP-RBF). The paper first introduces the concept of significance for the hidden neurons and then uses it in the learning algorithm to realize parsimonious networks. The growing and pruning strategy of GGAP-RBF is based on linking the required learning accuracy with the significance of the nearest or intentionally added new neuron. The significance of a neuron is a measure of the average information content of that neuron. The GGAP-RBF algorithm can be used for any arbitrary sampling density of the training samples and is derived from a rigorous statistical point of view. Simulation results for benchmark problems in the function approximation area show that GGAP-RBF outperforms several other sequential learning algorithms in terms of learning speed, network size, and generalization performance, regardless of the sampling density function of the training data.
Index Terms: Growing, neuron significance, pruning, radial basis networks, sequential learning.
I. INTRODUCTION
sense) over all the input data received so far. A neuron will not be added to the network if its significance would be small. To the best of our knowledge, this is different from all the existing sequential learning algorithms, including RAN, RANEKF, HSOL [18], and the sequential growing and pruning algorithm proposed by Todorovic and Stankovic [19], since all of them add new neurons based on their novelty with respect to individual instantaneous observations. In our proposed algorithm, however, a neuron is added only when it is statistically significant with respect to all the observations, including observations that have already been learned and discarded. Likewise, in our proposed algorithm, a hidden neuron with little significance (the contribution made by that neuron to the network output, averaged over all the observations received) is simply removed from the network. In contrast, both HSOL [18] and the sequential growing and pruning algorithm proposed in [19] prune neurons mainly based on their significance computed from individual instantaneous input observations.
Our concept of significance is also wholly different from, and much simpler than, that of [19], where it was defined based on the sensitivity of a neuron's width and connection weight to the output error for the current input data. Put simply, the significance proposed in this paper is defined as a neuron's statistical contribution to the overall performance of the network.
In the generalized growing and pruning RBF (GGAP-RBF)
algorithm proposed in this paper, this significance is used in
growing and pruning strategies. A new neuron is added only if its significance is greater than the chosen learning accuracy. If, during training, the significance of a neuron falls below the learning accuracy, that neuron is pruned. The GGAP-RBF algorithm is truly sequential and can be used for online learning in real-time applications, where the training observations are presented sequentially (one by one) and discarded after being learned, and the learning (parameter adjustment, network growing, or pruning) is carried out whenever a new observation is presented.
The main difference between the work of Salmerón et al. [11] and Rojas et al. [12] and that in this paper is that [11] and [12] do not conduct pruning during the learning stage. As shown in Section IV, algorithms without a pruning mechanism during the learning stage may produce a large network, which can lead to learning failure. In our proposed algorithm, pruning is checked and carried out throughout the whole learning phase, so the network architecture remains compact and the above problem is avoided.
In this paper, the performance of GGAP-RBF is compared with RAN, RANEKF, and MRAN in terms of learning accuracy, learning speed, and compactness of the network for two benchmark problems, namely the California Housing and chaotic time series prediction problems. The results indicate the superior performance of GGAP-RBF for all the problems studied. Recently, a fast implementation of SVM for regression (SVR) based on sequential minimal optimization (SMO) [15] has become popular. We have compared GGAP-RBF with this SMO-based SVR even though GGAP-RBF is a sequential algorithm whereas SMO is not truly sequential: SMO sequentially adjusts only the Lagrange multipliers, and the data are processed batch by batch; the learning phase starts only when all the data are available, and no new data are added during the learning phase. The simulation results on a real, large, complex application show that the proposed GGAP-RBF algorithm is faster and provides better generalization performance.
II. SIGNIFICANCE OF NEURONS

The output of an RBF network with K hidden neurons for an input vector x is

    f(x) = \sum_{k=1}^{K} \alpha_k \phi_k(x)    (1)

where $\alpha_k$ is the weight connecting the kth hidden neuron to the output neuron and $\phi_k(x)$ is the response of the kth hidden neuron for an input vector x

    \phi_k(x) = \exp\left( -\frac{\|x - \mu_k\|^2}{\sigma_k^2} \right)    (2)

where $\mu_k$ and $\sigma_k$ are the center and width of the kth hidden neuron, respectively, k = 1, ..., K.

In sequential learning, a series of training samples is randomly drawn and presented to, and learned by, the network one by one. Let a series of training samples $(x_i, y_i)$, i = 1, 2, ..., be drawn sequentially and randomly from a range X with a sampling density function p(x), where X is a subset of an m-dimensional Euclidean space. The sampling density function is defined such that the fraction of samples falling in any subset of X equals the integral of p(x) over that subset, so that

    \int_X p(x)\, dx = 1.    (3)

For simplicity, the sampling density function is denoted by p(x) in this paper, and the size of the range X is denoted by S(X). After sequentially learning n observations, assume that an RBF network with K neurons has been obtained. The network output for an input x is given by

    f(x) = \sum_{k=1}^{K} \alpha_k \exp\left( -\frac{\|x - \mu_k\|^2}{\sigma_k^2} \right).    (4)

If neuron k were removed, the output of the resulting network with the remaining K - 1 neurons would be

    f^{(-k)}(x) = \sum_{j=1, j \neq k}^{K} \alpha_j \exp\left( -\frac{\|x - \mu_j\|^2}{\sigma_j^2} \right).    (5)

Thus, for an observation $x_i$, the error introduced by removing neuron k is given by

    E(k, i) = \left| f(x_i) - f^{(-k)}(x_i) \right| = |\alpha_k| \exp\left( -\frac{\|x_i - \mu_k\|^2}{\sigma_k^2} \right).    (6)

Averaging this contribution over all n observations received so far gives

    E_{sig}(k) = \frac{1}{n} \sum_{i=1}^{n} |\alpha_k| \exp\left( -\frac{\|x_i - \mu_k\|^2}{\sigma_k^2} \right).    (8)

However, the computational complexity of $E_{sig}(k)$ would be very high if it were calculated based on all learned observations and if n is large. Moreover, in a sequential learning implementation the training observations $(x_i, y_i)$ are no longer stored in the system after they have been learned, and the value of n may be unknown and not recorded either; in fact, more observations may still arrive. Thus, there must be a simpler and better way to calculate $E_{sig}(k)$ as stated in (8), without prior knowledge of each specific observation $x_i$.

Suppose that the observations $x_1, \ldots, x_n$ are drawn from a sampling range X with a sampling density function p(x), and that at some instant of time n observations have been learned by the sequential learning system. Let the sampling range X be divided into M small spaces $\Delta_1, \ldots, \Delta_M$, the size of $\Delta_j$ being $S(\Delta_j)$. Since the sampling density function is p(x), there are about $n\, p(x_j)\, S(\Delta_j)$ samples in each $\Delta_j$, where $x_j$ is any point chosen in $\Delta_j$. From (8), we have

    E_{sig}(k) \approx \frac{1}{n} \sum_{j=1}^{M} |\alpha_k| \exp\left( -\frac{\|x_j - \mu_k\|^2}{\sigma_k^2} \right) n\, p(x_j)\, S(\Delta_j).    (9)
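To make the quantities above concrete, the short sketch below implements the network output (4), the per-observation contribution of a neuron (6), and its empirical average over the observations seen so far (8). It is only an illustrative sketch under the notation used here; the array names (alpha, mu, sigma) and the helper functions are our own, not part of the original algorithm.

```python
import numpy as np

def rbf_output(x, alpha, mu, sigma):
    """Network output f(x) of (4): sum_k alpha_k * exp(-||x - mu_k||^2 / sigma_k^2)."""
    d2 = np.sum((mu - x) ** 2, axis=1)          # squared distances ||x - mu_k||^2
    return np.sum(alpha * np.exp(-d2 / sigma ** 2))

def neuron_contribution(x, k, alpha, mu, sigma):
    """Per-observation contribution E(k, i) of neuron k, as in (6)."""
    d2 = np.sum((x - mu[k]) ** 2)
    return abs(alpha[k]) * np.exp(-d2 / sigma[k] ** 2)

def empirical_significance(X_seen, k, alpha, mu, sigma):
    """Average contribution of neuron k over all observations seen so far, as in (8)."""
    return np.mean([neuron_contribution(x, k, alpha, mu, sigma) for x in X_seen])

# Tiny usage example with three hidden neurons in a 2-D input space.
rng = np.random.default_rng(0)
alpha = np.array([0.5, -1.2, 0.8])              # output weights alpha_k
mu = rng.uniform(0, 1, size=(3, 2))             # centers mu_k
sigma = np.array([0.3, 0.2, 0.4])               # widths sigma_k
X_seen = rng.uniform(0, 1, size=(200, 2))       # observations received so far
print(rbf_output(X_seen[0], alpha, mu, sigma))
print(empirical_significance(X_seen, 0, alpha, mu, sigma))
```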
As the partition of X becomes arbitrarily fine (M grows large and each $S(\Delta_j)$ becomes small), the sum in (9) approaches an integral

    \lim_{M \to \infty} \frac{1}{n} \sum_{j=1}^{M} |\alpha_k| \exp\left( -\frac{\|x_j - \mu_k\|^2}{\sigma_k^2} \right) n\, p(x_j)\, S(\Delta_j) = |\alpha_k| \int_X \exp\left( -\frac{\|x - \mu_k\|^2}{\sigma_k^2} \right) p(x)\, dx.    (10)

This is the statistical contribution of neuron k to the overall output of the RBF network, and we define it as the significance of the specified neuron k, given by

    E_{sig}(k) = |\alpha_k| \int_X \exp\left( -\frac{\|x - \mu_k\|^2}{\sigma_k^2} \right) p(x)\, dx.    (11)

If the significance of neuron k is less than the required learning accuracy $e_{min}$, then neuron k should be deemed insignificant and removed; otherwise, neuron k is significant and should be retained.

More interestingly, if the distributions of the attributes $x^{(l)}$ of the observations are independent of each other, the density function p(x) of x can be written as $p(x) = \prod_{l=1}^{m} p_l(x^{(l)})$, where $p_l$ is the density function of the lth attribute of the observations. Thus, in this case, the significance (11) can be rewritten as

    E_{sig}(k) = |\alpha_k| \prod_{l=1}^{m} \int_{X_l} \exp\left( -\frac{(x^{(l)} - \mu_k^{(l)})^2}{\sigma_k^2} \right) p_l(x^{(l)})\, dx^{(l)}    (12)

where m is the dimension of the input space X and $X_l$ is the interval of the lth attribute of the observations, $X = X_1 \times \cdots \times X_m$.

The above equation involves an integration of the probability density function p(x) over the sampling range X. This can be done analytically for some simple but popularly used density functions such as the uniform, normal, exponential, and Rayleigh functions. In general, $\int_X \exp(-\|x - \mu_k\|^2/\sigma_k^2)\, p(x)\, dx < 1$, since the samples are drawn from the whole range X while the neuron impacts only a part of X; moreover, because the response of a neuron is negligible far from its center, the integration over each $X_l$ can be extended over the entire real line and the significance can be expressed as

    E_{sig}(k) \approx |\alpha_k| \prod_{l=1}^{m} \int_{-\infty}^{\infty} \exp\left( -\frac{(x^{(l)} - \mu_k^{(l)})^2}{\sigma_k^2} \right) p_l(x^{(l)})\, dx^{(l)}.    (13)

A. Uniform Sampling Distribution

If the inputs are drawn uniformly from X, then p(x) = 1/S(X) on X. When the range X is large compared with the neuron width $\sigma_k$, the Gaussian integral over X is approximately the integral over the whole space, and the significance becomes

    E_{sig}(k) \approx \frac{|\alpha_k| (\sqrt{\pi}\, \sigma_k)^m}{S(X)}.    (14)

For the uniform sampling density case, one needs to know the size S(X) of the sampling range X. In most applications, S(X) may be known or can be simply estimated. In fact, as we do in some of our simulations, one may just normalize the inputs to the range $[0, 1]^m$ and get S(X) = 1.

B. Normal Sampling Distribution

If the lth attribute of the observations is normally distributed with mean $\nu_l$ and variance $\tau_l^2$

    p_l(x^{(l)}) = \frac{1}{\sqrt{2\pi}\, \tau_l} \exp\left( -\frac{(x^{(l)} - \nu_l)^2}{2\tau_l^2} \right)    (15)

each factor in (13) can be evaluated in closed form

    \int_{-\infty}^{\infty} \exp\left( -\frac{(x^{(l)} - \mu_k^{(l)})^2}{\sigma_k^2} \right) p_l(x^{(l)})\, dx^{(l)} = \frac{\sigma_k}{\sqrt{\sigma_k^2 + 2\tau_l^2}} \exp\left( -\frac{(\mu_k^{(l)} - \nu_l)^2}{\sigma_k^2 + 2\tau_l^2} \right)    (16)

and the significance becomes

    E_{sig}(k) = |\alpha_k| \prod_{l=1}^{m} \frac{\sigma_k}{\sqrt{\sigma_k^2 + 2\tau_l^2}} \exp\left( -\frac{(\mu_k^{(l)} - \nu_l)^2}{\sigma_k^2 + 2\tau_l^2} \right).    (17)
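As a sanity check on the closed forms above, the sketch below compares the sample-average significance (8) with the uniform-case formula (14) and the normal-case formula (17); for a large number of samples the two estimates should be close. This is our own illustrative code, assuming the Gaussian response (2); variable names such as alpha_k, mu_k, and sigma_k are not from the original paper.

```python
import numpy as np

def empirical_sig(X, alpha_k, mu_k, sigma_k):
    """Sample-average significance of one neuron, as in (8)."""
    d2 = np.sum((X - mu_k) ** 2, axis=1)
    return np.mean(np.abs(alpha_k) * np.exp(-d2 / sigma_k ** 2))

rng = np.random.default_rng(1)
m = 2                                   # input dimension
alpha_k, sigma_k = 1.5, 0.15            # weight and width of the neuron under test
mu_k = np.array([0.4, 0.6])             # neuron center

# Uniform sampling on [0, 1]^m: closed form (14) with S(X) = 1.
X_uni = rng.uniform(0.0, 1.0, size=(200_000, m))
sig_uniform_closed = abs(alpha_k) * (np.sqrt(np.pi) * sigma_k) ** m / 1.0
print(empirical_sig(X_uni, alpha_k, mu_k, sigma_k), sig_uniform_closed)

# Normal sampling with per-attribute mean nu_l and std tau_l: closed form (17).
nu = np.array([0.5, 0.5])
tau = np.array([0.2, 0.3])
X_nrm = rng.normal(nu, tau, size=(200_000, m))
factors = sigma_k / np.sqrt(sigma_k ** 2 + 2 * tau ** 2) \
    * np.exp(-(mu_k - nu) ** 2 / (sigma_k ** 2 + 2 * tau ** 2))
sig_normal_closed = abs(alpha_k) * np.prod(factors)
print(empirical_sig(X_nrm, alpha_k, mu_k, sigma_k), sig_normal_closed)
```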
Similarly, if the lth attribute of the observations follows a Rayleigh density (18), the corresponding integrals in (13) can be evaluated analytically (in terms of the complementary error function) and the significance (11) can be estimated as (19). If the lth attribute follows an exponential density (20), the significance (11) can likewise be estimated in closed form as (21).

III. GGAP-RBF LEARNING ALGORITHM

In the GGAP-RBF algorithm, this definition of the significance of a neuron (formula (11) or (12), or its special cases: the uniform sampling case (14), the normal sampling case (17), the Rayleigh sampling case (19), and the exponential sampling case (21)) is used in the growing and pruning criteria for the hidden neurons, as indicated below.

A. Growing Criterion

When a new observation $(x_n, y_n)$ arrives, a new hidden neuron is added only if

    \|x_n - \mu_{nr}\| > \epsilon_n    and    E_{sig}(K+1) > e_{min}    (22)

where $\mu_{nr}$ is the center of the hidden neuron nearest to $x_n$ and $E_{sig}(K+1)$ is the significance of the intentionally added new neuron, whose parameters are chosen as

    \alpha_{K+1} = e_n = y_n - f(x_n), \quad \mu_{K+1} = x_n, \quad \sigma_{K+1} = \kappa \|x_n - \mu_{nr}\|    (23)

where $\epsilon_n$ is a distance threshold (as in RAN and MRAN) and $\kappa$ is an overlap factor that determines the overlap of the responses of the hidden neurons in the input space.

Remark: The first criterion ensures that a new neuron is added only if the input data is sufficiently far from the existing neurons. The second criterion ensures that the significance of the newly added neuron, obtained by substituting (23) into (11), is greater than the required approximation accuracy $e_{min}$. The second criterion also subsumes the novelty criterion used in RAN, RANEKF, and MRAN, where the condition $|e_n| > e_{min}$ is obviously implied by our enhanced growing criterion based on the significance of the intentionally added new neuron.

B. Pruning Criterion

If the significance of neuron k is less than the approximation accuracy $e_{min}$, neuron k is insignificant and should be removed, i.e., neuron k is pruned if
    E_{sig}(k) = |\alpha_k| \int_X \exp\left( -\frac{\|x - \mu_k\|^2}{\sigma_k^2} \right) p(x)\, dx < e_{min}.    (24)
The above condition implies that, after learning each observation, the significance of every neuron would have to be computed and checked for possible pruning, which would be a computationally intensive task. However, it is shown in the next subsection that only the nearest neuron can possibly become insignificant and needs to be checked for pruning; there is no need to compute the significance of all the neurons in the network.
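For illustration, the sketch below wires the growing check (22)-(23) and the pruning check (24) together, using the uniform-sampling significance (14). It is a simplified sketch under our own naming (e.g. eps_n, e_min, kappa, S_X, and a dictionary-based network representation); the parameter adjustment step is omitted here and treated separately below.

```python
import numpy as np

def significance_uniform(alpha_k, sigma_k, m, S_X):
    """Uniform-sampling significance of a neuron, as in (14)."""
    return abs(alpha_k) * (np.sqrt(np.pi) * sigma_k) ** m / S_X

def growing_check(x_n, y_n, net, eps_n, e_min, kappa, S_X):
    """Return parameters of a new neuron per (22)-(23), or None if no growth."""
    alpha, mu, sigma = net["alpha"], net["mu"], net["sigma"]
    d2 = np.sum((mu - x_n) ** 2, axis=1)
    nr = int(np.argmin(d2))                                 # nearest neuron
    dist_nr = np.sqrt(d2[nr])
    e_n = y_n - np.sum(alpha * np.exp(-d2 / sigma ** 2))    # prediction error
    sigma_new = kappa * dist_nr                             # width of candidate neuron, (23)
    m = x_n.shape[0]
    if dist_nr > eps_n and significance_uniform(e_n, sigma_new, m, S_X) > e_min:
        return e_n, x_n.copy(), sigma_new                   # (alpha, mu, sigma) of new neuron
    return None

def pruning_check(net, k, m, S_X, e_min):
    """True if neuron k violates (24) and should be removed."""
    return significance_uniform(net["alpha"][k], net["sigma"][k], m, S_X) < e_min
```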
C. Nearest Neuron for Parameter Adjustment and Pruning
In order to increase the learning speed further, it can be shown that one only needs to adjust the parameters of the neuron nearest (in the Euclidean distance sense) to the most recently received input if no new neuron is added, and one only needs to check this nearest (most recently adjusted) neuron for pruning. It is neither necessary to adjust the parameters of all the neurons nor necessary to check all the neurons for possible pruning: at any time instant, only the single nearest neuron needs to be adjusted or checked for pruning. The rationale is as follows.

The Gaussian response $\exp(-\|x_n - \mu_k\|^2/\sigma_k^2)$ of a neuron whose center is far from the current input $x_n$ is close to zero, and its first and second derivatives with respect to the neuron parameters approach zero even faster. Thus, in the gradient vector used by the EKF [9], all elements except those associated with the neuron nearest to $x_n$ approach zero quickly as the distance $\|x_n - \mu_k\|$ grows.

This dramatically increases the learning speed by removing the computational burden of RANEKF and MRAN, which grows with the network size, without losing learning performance in most cases. In fact, when an observation is presented, RANEKF and MRAN need to calculate the Kalman gain vector and the error covariance matrix using a gradient vector whose dimension grows with K, the number of neurons obtained so far. When K becomes large, this results in a longer learning time and may even make ordinary computers incapable of completing the learning because of memory limitations. However, since only one neuron's parameters are adjusted at any time, GGAP-RBF only needs to perform simple operations on a very small matrix.
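The single-neuron EKF update can be sketched as follows. This is a generic EKF step for the state $[\alpha_{nr}, \mu_{nr}^T, \sigma_{nr}]^T$ of the nearest neuron only, assuming the Gaussian response (2) and a single output; the noise constants r_n and q and the variable names are our own choices, not values from the paper (see [9] for the full EKF equations).

```python
import numpy as np

def ekf_update_nearest(x_n, y_n, net, P, nr, r_n=1.0, q=0.0002):
    """One EKF step for the nearest neuron nr; P is its small (m+2)x(m+2) covariance."""
    alpha, mu, sigma = net["alpha"], net["mu"], net["sigma"]
    d2 = np.sum((mu - x_n) ** 2, axis=1)
    phi = np.exp(-d2 / sigma ** 2)
    e_n = y_n - np.sum(alpha * phi)                       # prediction error

    # Gradient of the output w.r.t. [alpha_nr, mu_nr, sigma_nr] only.
    g_alpha = phi[nr]
    g_mu = alpha[nr] * phi[nr] * 2.0 * (x_n - mu[nr]) / sigma[nr] ** 2
    g_sigma = alpha[nr] * phi[nr] * 2.0 * d2[nr] / sigma[nr] ** 3
    B = np.concatenate(([g_alpha], g_mu, [g_sigma]))      # small gradient vector

    K = P @ B / (r_n + B @ P @ B)                         # Kalman gain
    theta = np.concatenate(([alpha[nr]], mu[nr], [sigma[nr]])) + K * e_n
    alpha[nr], mu[nr], sigma[nr] = theta[0], theta[1:-1], theta[-1]
    P = (np.eye(len(B)) - np.outer(K, B)) @ P + q * np.eye(len(B))
    return net, P
```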
Suppose that, after sequentially learning n observations, an RBF network with K neurons has been obtained. Obviously, all these K neurons are significant, since insignificant neurons would have been pruned after learning the nth observation. If a new, (n+1)th observation $(x_{n+1}, y_{n+1})$ arrives and the growing criterion (22) is satisfied, a new significant neuron is added; the parameters of all the other neurons remain unchanged, so those neurons remain significant after learning the (n+1)th observation, and the newly added neuron is also significant. Thus, no pruning check is needed after a new neuron is added. If, instead, a new observation arrives and the growing criterion (22) is not satisfied, no new neuron is added and only the parameters of the nearest neuron are adjusted. Since the parameters of all the neurons except the nearest one remain unchanged, those neurons remain significant after learning the (n+1)th observation. After the parameters of the nearest neuron are adjusted, it should be removed if it has become insignificant. That is, if only the parameters of the nearest neuron are adjusted after each observation during sequential learning, one needs to check only whether the nearest neuron has become insignificant after the adjustment. The parameter adjustment itself is done for the nearest neuron using the EKF algorithm, in which the Kalman gain vector computation becomes dramatically simpler because only one neuron is being adjusted (refer to [9] for the EKF equation details).

Thus, a new, simple, and efficient GGAP-RBF algorithm, suitable for multi-input multi-output (MIMO) applications and for any sampling density function p(x), is summarized below.

Proposed GGAP-RBF Algorithm: For each observation $(x_n, y_n)$ presented to the network, do the following.
1) Compute the overall network output $f(x_n)$ and the error $e_n = y_n - f(x_n)$.
2) Apply the growing criterion (22). If it is satisfied, allocate a new hidden neuron with the parameters given by (23).
3) Else, adjust the network parameters $(\alpha_{nr}, \mu_{nr}, \sigma_{nr})$ of the nearest neuron only, using the EKF, and check the criterion for pruning the adjusted hidden neuron: if its significance, computed from (11) (or from the appropriate special case (14), (17), (19), or (21)), is less than $e_{min}$, remove the nearest neuron.
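Putting the pieces together, a minimal sketch of the main loop might look as follows, assuming the uniform-case significance (14), the growing check (22)-(23), and the pruning check (24). For brevity, the EKF adjustment of the nearest neuron is replaced here by a simple LMS-style update of its weight (the paper uses the EKF), and the fixed distance threshold eps, learning rate lr, and other constants are our own choices.

```python
import numpy as np

def ggap_rbf_sketch(stream, e_min=0.01, kappa=0.7, eps=0.1, lr=0.05, S_X=1.0):
    """Simplified GGAP-RBF-style loop: grow per (22)-(23), prune per (24)."""
    alpha, mu, sigma = [], [], []            # network parameters
    for x, y in stream:
        m = x.shape[0]
        if not alpha:                        # allocate the very first neuron
            alpha, mu, sigma = [y], [x.copy()], [eps]
            continue
        A, M, S = np.array(alpha), np.array(mu), np.array(sigma)
        d2 = np.sum((M - x) ** 2, axis=1)
        phi = np.exp(-d2 / S ** 2)
        e = y - float(A @ phi)               # prediction error
        nr = int(np.argmin(d2))
        dist = np.sqrt(d2[nr])
        sig_new = abs(e) * (np.sqrt(np.pi) * kappa * dist) ** m / S_X   # (14) with (23)
        if dist > eps and sig_new > e_min:   # growing criterion (22)
            alpha.append(e); mu.append(x.copy()); sigma.append(kappa * dist)
        else:
            alpha[nr] += lr * e * phi[nr]    # simplified adjustment of nearest neuron
            sig_nr = abs(alpha[nr]) * (np.sqrt(np.pi) * sigma[nr]) ** m / S_X
            if sig_nr < e_min:               # pruning criterion (24)
                del alpha[nr], mu[nr], sigma[nr]
    return np.array(alpha), np.array(mu), np.array(sigma)

# Usage: learn y = sin(2*pi*x) on [0, 1] from a stream of noisy samples.
rng = np.random.default_rng(3)
xs = rng.uniform(0, 1, size=(2000, 1))
stream = ((x, float(np.sin(2 * np.pi * x[0]) + 0.05 * rng.normal())) for x in xs)
alpha, mu, sigma = ggap_rbf_sketch(stream)
print(len(alpha), "hidden neurons")
```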
IV. PERFORMANCE EVALUATION

The approximation accuracy is measured by the mean absolute error (MAE) over N observations

    MAE = \frac{1}{N} \sum_{i=1}^{N} |y_i - f(x_i)|.    (28)
Fig. 1. Histograms showing the distribution of the California Housing training samples, attributes 1-4. (a) Histogram of attribute 1. (b) Histogram of attribute 2. (c) Histogram of attribute 3. (d) Histogram of attribute 4.
TABLE I
PERFORMANCE COMPARISON OF DIFFERENT ALGORITHMS FOR A REAL
LARGE-SCALE COMPLEX APPLICATION: CALIFORNIA HOUSING
The significance of each neuron can be further easily calculated by estimating the sampling distribution of the inputs and applying the uniform sampling case (14), the normal sampling case (17), the Rayleigh sampling case (19), or the exponential sampling case (21), respectively.
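When the attributes are approximately normal, the mean and variance needed by (17) can be estimated on the fly without storing past observations. The recursion below is Welford's standard online update, shown here as one possible way to obtain these estimates in a sequential setting; it is our own illustration, not a recursion taken from the paper.

```python
import numpy as np

class OnlineMeanVar:
    """Welford's online estimate of per-attribute mean and variance."""
    def __init__(self, m):
        self.n = 0
        self.mean = np.zeros(m)
        self.M2 = np.zeros(m)          # running sum of squared deviations

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.M2 += delta * (x - self.mean)

    def var(self):
        return self.M2 / max(self.n - 1, 1)

def significance_normal(alpha_k, mu_k, sigma_k, nu, tau2):
    """Normal-sampling significance, as in (17), with estimated nu and tau^2."""
    factors = sigma_k / np.sqrt(sigma_k**2 + 2 * tau2) \
        * np.exp(-(mu_k - nu)**2 / (sigma_k**2 + 2 * tau2))
    return abs(alpha_k) * np.prod(factors)

# Usage: feed observations one by one, then evaluate a neuron's significance.
est = OnlineMeanVar(m=2)
for x in np.random.default_rng(2).normal([0.5, 0.5], [0.2, 0.3], size=(1000, 2)):
    est.update(x)
print(significance_normal(1.5, np.array([0.4, 0.6]), 0.15, est.mean, est.var()))
```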
Table I shows that GGAP, MRAN, RANEKF, RAN, and SVR achieve comparable generalization performance. SVR spent 164.8440 s (running as a C executable, which is faster than the MATLAB environment)3 and obtained 2429 support vectors. Although SVR could achieve a higher learning speed for this application by setting different parameter values, the learning accuracy would become worse and the number of support vectors would become larger. RAN spent 3505.2 s and obtained a network with 3552 neurons, while MRAN spent 2891.5 s and obtained a smaller network with 64 neurons. During the simulations for this application, it was found to be difficult to get all the data trained by RANEKF; similar to RAN, RANEKF obtained a large number of neurons for this application.
3. Refer to http://www.mathworks.com/products/compiler/examples/example2.shtml for a comparison of compiler execution speeds.
TABLE II
PERFORMANCE COMPARISON OF DIFFERENT ALGORITHMS FOR A TIME SERIES APPLICATION: MACKEY-GLASS
The Mackey-Glass chaotic time series is used as one of the benchmark time series problems. It is generated from the following delay differential equation:

    \frac{dx(t)}{dt} = \frac{a\, x(t - \tau)}{1 + x^{10}(t - \tau)} - b\, x(t)    (31)

integrated over the given time interval with the initial condition (32). As given by Yingwei et al. [9], the same parameter settings and initial condition are used to generate the series in our test case. The series is predicted several steps ahead: the network input at time t consists of a fixed number of past values of the series, as in (33), whereas the step-ahead predicted value (the output of the network) at time t + \nu is given by (34).
Similar to [1] and [9], the remaining parameter values are selected in the same way. For MRAN, the growing threshold and the growing/pruning window size are chosen as 0.04 and 90, respectively. For GGAP-RBF, we simply assume that the observations are drawn uniformly from the input range and, thus, set S(X) to the size of that range.
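For reference, the sketch below generates a Mackey-Glass series of the form (31) by simple Euler integration and forms input/output pairs for step-ahead prediction. The parameter values, step size, initial value, and embedding (four past samples spaced six steps apart) are common choices for this benchmark, used here only as an illustration; they are not taken from the paper's exact setup.

```python
import numpy as np

def mackey_glass(n_steps, a=0.2, b=0.1, tau=17, dt=1.0, x0=1.2):
    """Euler integration of dx/dt = a*x(t-tau)/(1 + x(t-tau)**10) - b*x(t)."""
    hist = int(tau / dt)
    x = np.full(n_steps + hist, x0)
    for t in range(hist, n_steps + hist - 1):
        x_tau = x[t - hist]
        x[t + 1] = x[t] + dt * (a * x_tau / (1.0 + x_tau ** 10) - b * x[t])
    return x[hist:]

def make_pairs(series, lags=(0, 6, 12, 18), ahead=6):
    """Inputs of past samples (cf. (33)) and the value `ahead` steps later (cf. (34))."""
    start, X, y = max(lags), [], []
    for t in range(start, len(series) - ahead):
        X.append([series[t - l] for l in lags])
        y.append(series[t + ahead])
    return np.array(X), np.array(y)

series = mackey_glass(5000)
X, y = make_pairs(series)
print(X.shape, y.shape)     # e.g. (4976, 4) (4976,)
```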
In this test case, the network was trained on the observations up to a given time, and the unknown observations beyond that time were used as testing observations to evaluate the prediction of the network. Table II shows the learning speed, training accuracy, prediction performance, and the number of neurons obtained. It can be seen that GGAP-RBF, MRAN, and RANEKF achieve comparable prediction performance for long-term prediction. However, GGAP-RBF learns much faster than the other algorithms and obtains a smaller network architecture. Figs. 5 and 6 display the neuron update history during the online learning phase and the approximated time series curve for both the 1-norm and 2-norm cases.
Remark: It has been found that the networks obtained by
RAN and RANEKF become large in some simulations, indicating that RAN and RANEKF without pruning approaches
may not be able to cope with real practical applications in an
ordinary computing environment. Compared with RAN and
RANEKF, MRAN can obtain a more compact network. But its
learning accuracy and generalization performance may not be
very good unless the growing-pruning window size is chosen
properly. Choosing the proper size for the window can only be
done by trial and error based on exhaustive simulation studies.
If several trials are conducted, the total time spent on the whole training stage can be quite large. For example, for the California Housing application, one trial of MRAN takes up to 50 min. In order to obtain proper parameters, at least six trials were conducted in our experiment, and the actual training time we spent was well over 300 min, compared to the extremely short training time spent by our newly proposed algorithm.
V. CONCLUSION
In this paper, a sequential learning algorithm called GGAP-RBF, applicable to any arbitrary distribution of the input training samples, has been presented for function approximation. Although RANEKF may produce large networks in some applications, the EKF-based parameter adjustment method introduced by Kadirkamanathan and Niranjan [2] still plays the key role in both MRAN and GGAP-RBF. The key difference between the parameter adjustment of RANEKF and MRAN and ours is that GGAP-RBF reduces the computational complexity by adjusting the parameters of only the nearest neuron at each step, instead of all the neurons, without losing learning performance.
Using the idea of significance of neurons, which is quantitatively defined from a statistical viewpoint as the average information content of a neuron and also the contribution of that
neuron to the overall performance of the RBF network, the proposed algorithm adds and prunes hidden neurons smoothly and
produces much more compact networks than other popular sequential learning algorithms. The GGAP-RBF algorithm can be
used for any arbitrary input sampling distribution and the simulation results have been provided for uniform, normal, Rayleigh,
and exponential sampling cases.
Although the Gaussian RBF has been used and tested in this paper, it should be noted that the significance concept [(11) and (12)] introduced here can be used for other radial basis functions as well. The convergence and generalization performance of such significance-based radial basis function networks is an interesting topic for further research.
REFERENCES
[1] J. Platt, "A resource-allocating network for function interpolation," Neural Computat., vol. 3, pp. 213-225, 1991.
[2] V. Kadirkamanathan and M. Niranjan, "A function estimation approach to sequential learning with neural networks," Neural Computat., vol. 5, pp. 954-975, 1993.
[3] N. B. Karayiannis and G. W. Mi, "Growing radial basis neural networks: Merging supervised and unsupervised learning with network growth techniques," IEEE Trans. Neural Netw., vol. 8, no. 6, pp. 1492-1506, Nov. 1997.
[4] S. Chen, C. F. N. Cowan, and P. M. Grant, "Orthogonal least squares learning algorithm for radial basis function networks," IEEE Trans. Neural Netw., vol. 2, no. 2, pp. 302-309, Mar. 1991.
[5] S. Chen, E. S. Chng, and K. Alkadhimi, "Regularized orthogonal least squares algorithm for constructing radial basis function networks," Int. J. Control, vol. 64, no. 5, pp. 829-837, 1996.
[6] E. S. Chng, S. Chen, and B. Mulgrew, "Gradient radial basis function networks for nonlinear and nonstationary time series prediction," IEEE Trans. Neural Netw., vol. 7, no. 1, pp. 190-194, Jan. 1996.
[7] M. J. L. Orr, "Regularization on the selection of radial basis function centers," Neural Computat., vol. 7, pp. 606-623, 1995.
[8] A. G. Bors and M. Gabbouj, "Minimal topology for a radial basis functions neural network for pattern classification," Dig. Signal Process., vol. 4, pp. 173-188, 1994.
[9] L. Yingwei, N. Sundararajan, and P. Saratchandran, "A sequential learning scheme for function approximation using minimal radial basis function (RBF) neural networks," Neural Computat., vol. 9, pp. 461-478, 1997.
[10] L. Yingwei, N. Sundararajan, and P. Saratchandran, "Performance evaluation of a sequential minimal radial basis function (RBF) neural network learning algorithm," IEEE Trans. Neural Netw., vol. 9, no. 2, pp. 308-318, Mar. 1998.
[11] M. Salmerón, J. Ortega, C. G. Puntonet, and A. Prieto, "Improved RAN sequential prediction using orthogonal techniques," Neurocomput., vol. 41, pp. 153-172, 2001.
[12] I. Rojas, H. Pomares, J. L. Bernier, J. Ortega, B. Pino, F. J. Pelayo, and A. Prieto, "Time series analysis using normalized PG-RBF network with regression weights," Neurocomput., vol. 42, pp. 267-285, 2002.
[13] H. Drucker, C. J. Burges, L. Kaufman, A. Smola, and V. Vapnik, "Support vector regression machines," in Neural Information Processing Systems 9, M. Mozer, J. Jordan, and T. Petsche, Eds. Cambridge, MA: MIT Press, 1997, pp. 155-161.
[14] S. Vijayakumar and S. Wu, "Sequential support vector classifiers and regression," in Proc. Int. Conf. Soft Computing (SOCO'99), Genoa, Italy, pp. 610-619.
[15] J. Platt, "Sequential minimal optimization: A fast algorithm for training support vector machines," Microsoft Res. Tech. Rep. MSR-TR-98-14, 1998.
[16] N. Sundararajan, P. Saratchandran, and L. Yingwei, Radial Basis Function Neural Networks with Sequential Learning: MRAN and Its Applications. Singapore: World Scientific, 1999.
[17] J. N. Franklin, "Determinants," in Matrix Theory. Englewood Cliffs, NJ: Prentice-Hall, 1968, pp. 1-25.
[18] S. Lee and R. M. Kil, "A Gaussian potential function network with hierarchically self-organizing learning," Neural Netw., vol. 4, pp. 207-224, 1991.
[19] B. Todorovic and M. Stankovic, "Sequential growing and pruning of radial basis function network," in Proc. Int. Joint Conf. Neural Networks (IJCNN'01), Washington, DC, Jul. 15-19, 2001, pp. 1954-1959.
[20] C.-C. Chang and C.-J. Lin, LIBSVM: A library for support vector machines, Dept. Comput. Sci. Inform. Eng., National Taiwan Univ., Taiwan, R.O.C., 2003. [Online]. Available: http://www.csie.ntu.edu.tw/~cjlin/libsvm/