
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS, VOL. 29, NO. 1, FEBRUARY 1999

A Rapid Learning and Dynamic Stepwise Updating Algorithm for Flat Neural Networks and the Application to Time-Series Prediction

C. L. Philip Chen, Senior Member, IEEE, and John Z. Wan

Abstract—A fast learning algorithm is proposed to find the optimal weights of flat neural networks (especially the functional-link network). Although the flat networks are used for nonlinear function approximation, they can be formulated as linear systems. Thus, the weights of the networks can be solved easily using a linear least-squares method. This formulation makes it easy to update the weights instantly for both a newly added pattern and a newly added enhancement node. A dynamic stepwise updating algorithm is proposed to update the weights of the system on the fly. The model is tested on several time-series data sets, including an infrared laser data set, a chaotic time-series, a monthly flour price data set, and a nonlinear system identification problem. The simulation results are compared to existing models that require more complex architectures and more costly training. The results indicate that the proposed model is very attractive for real-time processes.

Manuscript received March 17, 1996; revised September 9, 1996 and July 5, 1997. This work was supported under Air Force Contract F33610-D-5964, Wright Laboratory, Wright-Patterson AFB, OH, under Grant N00014-92-J-4096 from ONR, and under Grant F49620-94-0277 from the Air Force Office of Scientific Research.
C. L. P. Chen is with the Department of Computer Science and Engineering, Wright State University, Dayton, OH 45435 USA. He is also with MLIM, Materials Directorate, Wright Laboratory, Wright-Patterson Air Force Base, OH 45433 USA (e-mail: [email protected]).
J. Z. Wan is with Lexis-Nexis Data Central, Dayton, OH 45343 USA.
Publisher Item Identifier S 1083-4419(99)00899-7.

I. INTRODUCTION

FEEDFORWARD artificial neural networks have been a popular research subject recently. The research topics vary from theoretical views of learning algorithms, such as the learning and generalization properties of the networks, to a variety of applications in control, classification, biomedicine, manufacturing, business forecasting, etc. The backpropagation (BP) supervised learning algorithm is one of the most popular learning algorithms developed for layered networks [1], [2]. Improving the learning speed of BP and increasing the generalization capability of the networks have played a central role in neural network research [3]–[9]. Apart from multilayer network architectures and the BP algorithm, various simplified architectures or different nonlinear activation functions have been devised. Among those, so-called flat networks, including the functional-link neural network and the radial basis function network, have been proposed [10]–[15]. These flat networks remove the drawback of a long learning process, with the advantage of learning only one set of weights. Most importantly, the literature has reported satisfactory generalization capability in function approximation [14]–[16].

This paper proposes a one-step fast learning algorithm and a stepwise update algorithm for flat networks. Although only the functional-link network is used as a prototype here, the proposed algorithms can also be applied to the radial basis function network. The algorithms are developed based on the formulation of the functional-link network as a set of linear system equations. Because the system equations of the radial basis function network have a similar form to those of the functional-link network, and both networks share a similar "flat" architecture, the proposed update algorithm can be applied to the radial basis function network as well. The most significant advantage of the stepwise approach is that the weight connections of the network can be updated easily when a new input is given after the network has been trained. The weights can be updated based on the original weights and the new inputs. The stepwise approach is also able to update the weights instantly when a new neuron is added to the existing network if the desired error criterion cannot be met. With the proposed approach, the flat networks become very attractive in terms of learning speed.

The proposed work has been applied to time-series applications including an infrared laser data set, a chaotic time-series, a monthly flour price data set, and a nonlinear system identification problem. The time-series is modeled by the AR(n) (auto-regression with delay n) model. During the training stage, different numbers of enhancement nodes may be added as necessary. The update of the weights is carried out by the proposed algorithm. Contrary to traditional BP learning and multilayer models, the training of this network is fast because of a one-step learning procedure and the dynamic updating algorithm. The proposed work has also been applied to nonlinear system identification problems involving discrete-time single-input, single-output (SISO) and multiple-input, multiple-output (MIMO) plants that can be described by difference equations [16]. The system identification model extends the time-series to more than one dimension, that is, the addition of the state variables. With the proposed algorithm, the training is easy and fast. The result is also very promising.

The paper is organized as follows. Section II briefly discusses the concept of the functional-link network and its linear formulation. Sections III and IV introduce the proposed dynamic stepwise update algorithm, followed by the refinement of the model in Section V. Section VI discusses the procedures of the training. Finally, several examples and conclusions are given.

Fig. 1. A flat functional-link neural network.

II. THE LINEAR SYSTEM EQUATION OF THE FUNCTIONAL-LINK NETWORK

Fig. 1 illustrates the characteristic flatness feature of the functional-link network. The network consists of a number of enhancement nodes. These enhancement nodes are used as extra inputs to the network. The weights from the input nodes to the enhancement nodes are randomly generated and fixed thereafter. To be more precise, an enhancement node is constructed by first taking a linear combination of the input nodes and then applying a nonlinear activation function to it. This model has been discussed elsewhere by Pao [10]. A rigorous mathematical proof has also been given by Igelnik and Pao [12]. The literature has also discussed the advantage of the functional-link network in terms of training speed and its generalization property over general feedforward networks [11]. In general, the functional-link network with enhancement nodes can be represented by an equation of the form

Y = [X  ξ(X W_h + β_h)] W    (1)

where W_h is the enhancement weight matrix, which is randomly generated, W is the weight matrix that needs to be trained, β_h is the bias, Y is the output matrix, and ξ is a nonlinear activation function. The activation function can be either a sigmoid or a tanh function. If the term β_h is not included, an additional constant bias node with 1 or −1 is needed. This will cover even function terms for function approximation applications, which has been explained using a Taylor series expansion in [17].

Denoting by A the matrix [X  ξ(X W_h + β_h)], where A is the expanded input matrix consisting of all input vectors combined with their enhancement components, yields

Y = A W.    (2)

The structure is illustrated in Fig. 2.

Fig. 2. A linear formulation of the functional-link network.
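As a concrete illustration of the linear formulation (2), the following NumPy sketch builds the expanded matrix A from a random, fixed enhancement layer and obtains the output weights in one shot from the pseudoinverse, previewing the least-squares solution discussed in Section III. The tanh activation, the toy sine data, the node count, and the helper name expand are illustrative assumptions rather than settings taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def expand(X, Wh, bh):
    """Form the expanded matrix A = [X | tanh(X Wh + bh)] of (2)."""
    return np.hstack([X, np.tanh(X @ Wh + bh)])

# Toy data: approximate y = sin(x) from scalar inputs.
X = np.linspace(-3, 3, 200).reshape(-1, 1)
Y = np.sin(X)

n_enh = 20                                   # number of enhancement nodes
Wh = rng.normal(size=(X.shape[1], n_enh))    # random, fixed enhancement weights
bh = rng.normal(size=n_enh)                  # random, fixed biases

A = expand(X, Wh, bh)
W = np.linalg.pinv(A) @ Y                    # one-shot least-squares solution W = A^+ Y
print("training MSE:", np.mean((A @ W - Y) ** 2))
```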
III. PSEUDOINVERSE AND STEPWISE UPDATING

Pao implemented a conjugate gradient search method that finds the weight matrix W [11]. This paper discusses a rapid method of finding the weight matrix. To learn the optimal weight connections for the flat network, it is essential to find the least-squares solution of the equation AW = Y. Recall that the least-squares solution of AW = Y is W = A^+ Y, where A^+ is the pseudoinverse of the matrix A. To find the best weight matrix W, the rank-expansion with instant learning (REIL) algorithm is described in the following [17].

A. Algorithm Rank-Expansion with Instant Learning

Input: The extended input pattern matrix A and the output matrix Y, where the number of rows equals the number of input patterns.
Output: The weight matrix W and the neural network.
Step 1: Add hidden (enhancement) nodes and assign random weights to them; the number of nodes added is chosen with respect to the rank of the extended input matrix.
Step 2: Solve for the weight W by minimizing the least-squares error of AW = Y.
Step 3: If the mean-squared error criterion is not met, add additional nodes and go to Step 2; otherwise, stop.
End of Algorithm REIL

The computational complexity of this algorithm comes mostly from the time spent in Step 2. There are several methods for solving the least-squares problem [18]. The FLOP count is on the order of mn², where m is the number of rows in the training matrix and n is the number of columns. The singular value decomposition is the most common approach. Compared with gradient descent search, the least-squares method is time efficient [19].
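The REIL loop above can be sketched as follows. The error goal, the number of nodes added per pass, and the node budget are illustrative assumptions; each pass simply re-solves the batch least-squares problem as in Step 2.

```python
import numpy as np

rng = np.random.default_rng(1)

def reil(X, Y, mse_goal=1e-4, nodes_per_step=5, max_nodes=200):
    """Batch REIL loop (sketch): append random enhancement nodes and
    re-solve the least-squares problem until the MSE criterion is met."""
    A = X.copy()
    n_added = 0
    while True:
        W = np.linalg.pinv(A) @ Y                 # Step 2: one-shot least squares
        mse = np.mean((A @ W - Y) ** 2)
        if mse <= mse_goal or n_added >= max_nodes:
            return W, A, mse                      # Step 3: criterion met (or node budget spent)
        Wh = rng.normal(size=(X.shape[1], nodes_per_step))
        bh = rng.normal(size=nodes_per_step)
        A = np.hstack([A, np.tanh(X @ Wh + bh)])  # add more enhancement nodes, then re-solve
        n_added += nodes_per_step

X = np.linspace(-3, 3, 200).reshape(-1, 1)
Y = np.sin(X)
W, A, mse = reil(X, Y)
print("final MSE:", mse, "with", A.shape[1] - X.shape[1], "enhancement nodes")
```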
The above algorithm is a batch algorithm in which we assume that all the input data are available at the time of training. However, in a real-time application, as a new input pattern is given to the network, the matrix A must be updated, and it is not efficient to keep rerunning the REIL algorithm; we must pursue an alternative approach. Here we take advantage of the flat structure, in which extra nodes can be added and the weights can be found very easily if necessary. In addition, the weights can be updated without running a complete training cycle when either one or more new enhancement nodes are added or more observations become available. The stepwise updating of the weight matrix can be achieved by taking the pseudoinverse of a partitioned matrix, as described below [20], [21].
Let us denote prime (′) as the transpose of a matrix. Let A_k be the pattern matrix defined above, and let a′ be the new pattern entered to the neural network. Here the subscript k denotes the discrete time instance. Denote A_{k+1} as follows:

A_{k+1} = [A_k ; a′]   (the new pattern appended as a row)

then the theorem states that the pseudoinverse of the new matrix is

A_{k+1}^+ = [A_k^+ − b d′ | b]

where

d = (A_k^+)′ a

b = c (c′c)^{-1}             if c ≠ 0
b = (1 + d′d)^{-1} A_k^+ d    if c = 0

and

c = a − A_k′ d.

In other words, the pseudoinverse of A_{k+1} can be obtained through A_k^+ and the added row vector a′. A noteworthy fact is that, if c = 0 and A_k is of full rank, then b = (A_k′A_k + a a′)^{-1} a. This can be shown as follows. If A_k is of full rank and c = 0, then A_k^+ = (A_k′A_k)^{-1} A_k′ and therefore

b = (1 + d′d)^{-1} A_k^+ d = (A_k′A_k)^{-1} a / (1 + a′(A_k′A_k)^{-1} a) = (A_k′A_k + a a′)^{-1} a

where the last equality follows from the matrix inversion lemma.

So the pseudoinverse of A_{k+1} can be updated based only on A_k^+ and the newly added row vector a′, without recomputing the entire new pseudoinverse. Let the output vector be partitioned as

Y_{k+1} = [Y_k ; y′_{k+1}]

where y′_{k+1} is the new output corresponding to the new input a′, and let W_k = A_k^+ Y_k. Then, according to the above equations, the new weight, W_{k+1}, can be found as follows:

W_{k+1} = W_k + b (y′_{k+1} − a′ W_k).    (3)

Equation (3) has the same form as the recursive least-squares solution if c = 0. However, (3) also considers the case in which A_k is not of full rank (i.e., c ≠ 0). Compared to the least mean square (LMS) learning rule [22], (3) has the optimal learning rate, b, which leads to learning in a one-step update rather than an iterative update.
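The following is a sketch of the stepwise update for a newly arrived pattern, following the partitioned-matrix formulas and (3) as reconstructed above. Keeping the matrix A alongside its pseudoinverse, and the numerical tolerance used to decide whether c = 0, are implementation assumptions.

```python
import numpy as np

def add_pattern(A, A_pinv, W, a, y, tol=1e-10):
    """Stepwise update when a new pattern (row a', target y') arrives.
    A is A_k, A_pinv is A_k^+, and W is W_k = A_k^+ Y_k."""
    a = np.asarray(a, dtype=float).reshape(-1)
    y = np.atleast_1d(np.asarray(y, dtype=float))
    d = A_pinv.T @ a                         # d = (A_k^+)' a
    c = a - A.T @ d                          # c = a - A_k' d
    if np.linalg.norm(c) > tol:              # new row increases the rank
        b = c / (c @ c)                      # b = c (c'c)^{-1}
    else:                                    # new row lies in the row space (RLS-like case)
        b = (A_pinv @ d) / (1.0 + d @ d)     # b = (1 + d'd)^{-1} A_k^+ d
    A_new = np.vstack([A, a])                                    # A_{k+1} = [A_k ; a']
    A_pinv_new = np.hstack([A_pinv - np.outer(b, d), b.reshape(-1, 1)])
    W_new = W + np.outer(b, y - a @ W)                           # (3)
    return A_new, A_pinv_new, W_new
```

After the call, W_new coincides with the batch solution pinv(A_new) @ vstack([Y, y']), but only vector-matrix products were needed.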
The stepwise updating in flat networks is also perfect for adding a new enhancement node to the network. In this case, it is equivalent to adding a new column to the input matrix A_k. Denote A_{k+1} = [A_k | a]. Then the pseudoinverse of the new A_{k+1} equals

A_{k+1}^+ = [A_k^+ − d b′ ; b′]

where

d = A_k^+ a

b′ = (c′c)^{-1} c′             if c ≠ 0
b′ = (1 + d′d)^{-1} d′ A_k^+    if c = 0

and c = a − A_k d, so the new pseudoinverse is again obtained without recomputing it from scratch. The new weights are

W_{k+1} = [W_k − d b′ Y_k ; b′ Y_k]    (4)

where W_{k+1} and W_k are the weights after and before a new neuron is added, respectively. Since a new neuron is added to the existing network, the weights W_{k+1} have one more dimension than W_k. Also note again that, if A_{k+1} is of full rank, then c ≠ 0 and no computation of a pseudoinverse is involved in updating the pseudoinverse or the weight matrix W_{k+1}.

The one-step dynamic learning is shown in Fig. 3. This raises the question of the rank of the input matrix A. As can be seen from the above discussion, it is desirable to maintain the full rank condition of A when adding rows and columns. The rows consist of training patterns. In other words, it is practically impossible to observe any rank-deficient matrix. Thus, during the training of the network, it is to our advantage to make sure that the added nodes will increase the rank of the input matrix. Also, if the matrix becomes numerically rank deficient based on an adjustable tolerance on the singular values, we should consider removing the redundant input nodes. This is discussed in more detail in Section V on principal component analysis (PCA) related topics.

Fig. 3. Illustration of the stepwise update algorithm.
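The companion sketch for adding an enhancement node (a new column), following (4), is given below. In practice the new column would be ξ(X w_h + β) evaluated on the stored inputs for a fresh random w_h; the tolerance and the choice to carry the target matrix Y explicitly are assumptions.

```python
import numpy as np

def add_enhancement_node(A, A_pinv, W, Y, a_col, tol=1e-10):
    """Stepwise update when a new enhancement node (column a) is appended.
    A is A_k, A_pinv is A_k^+, W is W_k, and Y holds the targets seen so far."""
    a = np.asarray(a_col, dtype=float).reshape(-1)
    d = A_pinv @ a                             # d = A_k^+ a
    c = a - A @ d                              # c = a - A_k d
    if np.linalg.norm(c) > tol:                # new column increases the rank
        b_t = c / (c @ c)                      # b' = (c'c)^{-1} c'
    else:
        b_t = (d @ A_pinv) / (1.0 + d @ d)     # b' = (1 + d'd)^{-1} d' A_k^+
    A_new = np.hstack([A, a.reshape(-1, 1)])                   # A_{k+1} = [A_k | a]
    A_pinv_new = np.vstack([A_pinv - np.outer(d, b_t), b_t])   # [A_k^+ - d b' ; b']
    W_new = np.vstack([W - np.outer(d, b_t @ Y), b_t @ Y])     # (4)
    return A_new, A_pinv_new, W_new
```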

Another approach to achieving stepwise updating is to maintain the QR decomposition of the input matrix A. The updating of the pseudoinverse (and therefore the weight matrix) involves only a multiplication of finitely sparse matrices and backward substitutions. Suppose we have the QR decomposition of A_k and denote A_k = Q_k R_k, where Q_k is an orthogonal matrix and R_k is an upper triangular matrix. When a new row or a new column is added, the QR decomposition can be updated based on a finite number of Givens rotations [18]. Denote A_{k+1} = Q_{k+1} R_{k+1}, where Q_{k+1} remains orthogonal, R_{k+1} is an upper triangular matrix, and both are obtained through finitely many Givens rotations. The pseudoinverse of A_{k+1} is then available from Q_{k+1} and R_{k+1}, and the weight matrix W_{k+1} can be computed by backward substitution from

R_{k+1} W_{k+1} = Q_{k+1}′ Y_{k+1}.

This stepwise weight update using QR and Givens rotation matrices is summarized in the following algorithm.

B. QR Implementation of Weight Matrix Updating

Input: Q_k, R_k, the new pattern vector a, and the weight matrix W_k, where A_k is an m × n matrix, Q_k is an m × m orthogonal matrix, R_k is an m × n upper triangular matrix, and a′ is a 1 × n row vector.
Output: Q_{k+1}, R_{k+1}, and the weight matrix W_{k+1}, where A_{k+1} is an (m + 1) × n matrix, Q_{k+1} is an (m + 1) × (m + 1) orthogonal matrix, and R_{k+1} is an (m + 1) × n upper triangular matrix.
Step 1: Expand Q_k and R_k, i.e., embed Q_k in an (m + 1) × (m + 1) matrix with a 1 in the new diagonal position and append the new row a′ to R_k.
Step 2: For each column, apply a Givens rotation Rot to zero out the element of the appended row below the diagonal, accumulating the rotations into Q_{k+1}.
Step 3: Since R_{k+1} is an upper triangular matrix, the new W_{k+1} can easily be obtained by solving R_{k+1} W_{k+1} = Q_{k+1}′ Y_{k+1} using backward substitution.
End of the QR Weight-Updating Algorithm

In Step 2, the Givens rotation matrix acts on two components at a time: it takes a column of the working matrix and produces a column vector identical to it except for two components, one of which is reduced to 0. Rot performs a plane rotation from one vector to another. An example should make this clear. If u = (1, 3, 0, 0, 4)′, then Rot rotates u to (1, 5, 0, 0, 0)′. In fact, the rotation transforms the plane vector (3, 4) to (5, 0) and keeps the other components unchanged. The resulting R_{k+1} is an upper triangular matrix while Q_{k+1} remains orthogonal.

Similarly, with a few modifications, the above algorithm can be used to update the new weight matrix if a new column (a new node) is added to the network.
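A compact sketch of the Givens-rotation update follows. Instead of carrying the full orthogonal factor Q as in the algorithm above, it maintains only the triangular factor R and the transformed targets Z = Q′Y; that simplification, and the assumption that the expanded matrix has full column rank so R stays nonsingular, are choices made here rather than statements from the paper.

```python
import numpy as np

def givens(a, b):
    """Plane rotation (c, s) with  c*a + s*b = r  and  -s*a + c*b = 0."""
    if b == 0.0:
        return 1.0, 0.0
    r = np.hypot(a, b)
    return a / r, b / r

def qr_add_row(R, Z, a_new, y_new):
    """Fold a new pattern (a_new, y_new) into the triangular factor R and
    the transformed targets Z = Q'Y using n Givens rotations (Step 2)."""
    n = R.shape[0]
    r_row = np.asarray(a_new, dtype=float).copy()
    z_row = np.atleast_1d(np.asarray(y_new, dtype=float)).copy()
    R, Z = R.copy(), Z.copy()
    for j in range(n):
        c, s = givens(R[j, j], r_row[j])
        Rj, rj = R[j, j:].copy(), r_row[j:].copy()
        R[j, j:], r_row[j:] = c * Rj + s * rj, -s * Rj + c * rj
        Zj, zj = Z[j].copy(), z_row.copy()
        Z[j], z_row = c * Zj + s * zj, -s * Zj + c * zj
    return R, Z

# Usage: start from an economy QR of the initial expanded matrix A,
# then stream in new patterns one row at a time.
rng = np.random.default_rng(2)
A = rng.normal(size=(50, 8)); Y = rng.normal(size=(50, 1))
Q, R = np.linalg.qr(A)                 # A = Q R, with R upper triangular (8 x 8)
Z = Q.T @ Y
R, Z = qr_add_row(R, Z, rng.normal(size=8), rng.normal(size=1))
W = np.linalg.solve(R, Z)              # Step 3: back substitution (R is triangular)
```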
IV. TRAINING WITH WEIGHTED LEAST SQUARES

In training and testing the fitness of a model, the error is minimized in the sense of least mean squares, that is, in general

E = (1/P) Σ_{i=1}^{P} (y_i − ŷ_i)²    (5)

where P is the number of patterns. In other words, the average difference between the network output and the actual output is minimized over the span of the whole training data. If an overall fit is hard to achieve, it might be reasonable to train the network so that it achieves a better fit for the most recent data. This leads to the so-called weighted least-squares problem. The stepwise updating of the weight matrix based on weighted least squares is derived as follows.

Let Λ_k = diag(λ_1, ..., λ_k) be the weight factor matrix. Also, let A_k represent the input matrix with k patterns, and let A_{k+1} be A_k with an added new row, that is

A_{k+1} = [A_k ; a′].

Then the weighted least-squares error for the equation A_{k+1} W = Y_{k+1} is

E_w = (A_{k+1} W − Y_{k+1})′ Λ_{k+1} (A_{k+1} W − Y_{k+1}).    (6)

With

Λ_{k+1} = diag(Λ_k, λ_{k+1}) = diag(λ_1, ..., λ_k, λ_{k+1})    (7)

we have

E_w = (A_k W − Y_k)′ Λ_k (A_k W − Y_k) + λ_{k+1} (a′W − y′_{k+1})(a′W − y′_{k+1})′.    (8)

The weighted least-squares solution can be represented as

W_{k+1} = (A_{k+1}′ Λ_{k+1} A_{k+1})^+ A_{k+1}′ Λ_{k+1} Y_{k+1}.    (9)

If the weighted pseudoinverse of A_k is known and a new pattern (i.e., a new row a′) is imported to the network, then the weighted pseudoinverse of the matrix A_{k+1} can be updated by a partitioned-matrix formula analogous to that of Section III, again with separate expressions for the c ≠ 0 and c = 0 cases.    (10)

Similar to (3), the updating rule for the weight matrix is

W_{k+1} = W_k + b_w (y′_{k+1} − a′ W_k)    (11)

where b_w is the weighted counterpart of the gain vector b. The updating rule for the weight matrix, if A_{k+1} is of full rank (i.e., c = 0), is

W_{k+1} = W_k + λ_{k+1} (A_{k+1}′ Λ_{k+1} A_{k+1})^{-1} a (y′_{k+1} − a′ W_k).    (12)

Equation (12) is exactly the same as the weighted recursive least-squares method [23], in which only the full-rank condition is discussed. However, (11) is more complete because it covers both the c = 0 and c ≠ 0 cases. Thus, the weighted weight matrix can be easily updated based on the current weights and new observations, without running a complete training cycle, as long as the weighted pseudoinverse is maintained. A similar derivation can be applied to the network with an added neuron.
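A sketch of weighted training follows: a batch weighted least-squares solve and the full-rank recursive step corresponding to (12), realized through a Sherman–Morrison update of (A′ΛA)^{-1}. The exponential weighting profile used in the example is an illustrative choice, not one taken from the paper.

```python
import numpy as np

def weighted_ls(A, Y, lam):
    """Batch weighted least squares: W = (A' diag(lam) A)^+ A' diag(lam) Y."""
    AtL = A.T * lam                       # A' Lambda  (scale each pattern by its weight)
    return np.linalg.pinv(AtL @ A) @ (AtL @ Y)

def weighted_rls_step(P, W, a, y, lam_new):
    """Full-rank recursive update of the weighted solution, cf. (12).
    P = (A' Lambda A)^{-1} for the data seen so far."""
    a = np.asarray(a, dtype=float).reshape(-1, 1)
    Pa = P @ a
    # Sherman-Morrison update of P for the rank-one term lam_new * a a'
    P_new = P - (lam_new * Pa @ Pa.T) / (1.0 + lam_new * float(a.T @ Pa))
    gain = lam_new * (P_new @ a)                      # b_w in (12)
    W_new = W + gain @ (np.atleast_2d(y) - a.T @ W)   # W_{k+1} = W_k + b_w (y' - a' W_k)
    return P_new, W_new

# Example: emphasize recent samples with an exponential weighting
# lam_i = gamma**(k - i)  (an illustrative choice).
rng = np.random.default_rng(3)
A = rng.normal(size=(100, 6)); Y = rng.normal(size=(100, 1))
gamma = 0.98
lam = gamma ** np.arange(A.shape[0] - 1, -1, -1)
W = weighted_ls(A, Y, lam)
```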
V. REFINE THE MODEL

Let us take a look again at an input matrix A of size P × n, which represents P observations of n variables. The singular value decomposition of A is

A = U Σ V′

where U is an orthogonal matrix of the eigenvectors of A A′ and V is an orthogonal matrix of the eigenvectors of A′A. Σ is a P × n "diagonal" matrix whose diagonal entries are the singular values of A, that is

Σ = diag(σ_1, σ_2, ..., σ_r, 0, ..., 0)

where r is the rank of the matrix A. A′A is the so-called correlation matrix, whose eigenvalues are the squares of the singular values. Small singular values might be the result of noise in the data or due to round-off errors in computations. This can lead to very large values of the weights, because the pseudoinverse of A is given by

A^+ = V Σ^+ U′,  where  Σ^+ = diag(1/σ_1, 1/σ_2, ..., 1/σ_r, 0, ..., 0).

Clearly, small singular values of A will result in very large values of the weights, which will, in turn, amplify any noise in the new data. The same question arises as more and more enhancement nodes are added to the model during the training. A possible solution is to round off small singular values to zeros and therefore avoid large values of the weights. If there is a gap among all the singular values, it is easy to cut off at the gap. Otherwise, one of the following approaches may work.

1) Set an upper bound on the norm of the weights. This will provide a criterion to cut off small singular values. The result is an optimal solution within a bounded region.
2) Investigate the relation between the cutoff values and the performance of the network in terms of prediction error. If there is a point where the performance is not improved when small singular values are included, it is then reasonable to set a cutoff value corresponding to that point.

The orthogonal least squares learning approach is another way to generate a set of weights that can avoid the ill-conditioning problem [13]. Furthermore, regularization and cross-validation methods are techniques to avoid both overfitting and generalization problems [24].
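A sketch of the cutoff idea: compute the pseudoinverse from the SVD with singular values below a relative tolerance rounded off to zero (np.linalg.pinv's rcond argument performs the same truncation). The tolerance value and the toy nearly-redundant column are assumptions.

```python
import numpy as np

def truncated_pinv(A, rel_tol=1e-6):
    """Pseudoinverse with small singular values rounded off to zero:
    A^+ = V diag(1/sigma_i for the kept i) U'."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    keep = s > rel_tol * s[0]             # cut off singular values below the tolerance
    s_inv = np.zeros_like(s)
    s_inv[keep] = 1.0 / s[keep]
    return (Vt.T * s_inv) @ U.T

rng = np.random.default_rng(4)
A = rng.normal(size=(200, 30))
A[:, -1] = A[:, 0] + 1e-9 * rng.normal(size=200)   # nearly redundant column -> huge condition number
Y = rng.normal(size=(200, 1))
W = truncated_pinv(A, rel_tol=1e-6) @ Y
print("condition number:", np.linalg.cond(A), "||W||:", np.linalg.norm(W))
```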
Fig. 4. (a) Prediction of the time-series 60 steps into the future, (b) network prediction (first 50 points), and (c) network prediction (first 100 points).

VI. TIME-SERIES APPLICATIONS

The literature has discussed time-series forecasting using different neural network models [25], [26]. Here the algorithm proposed above is applied to the forecasting model. Represent the time-series by the AR(n) (autoregression with delay n) model. Suppose x(t) is a stationary time-series. The AR(n) model can be represented as the following equation:

x(t) = a_1 x(t − 1) + a_2 x(t − 2) + · · · + a_n x(t − n)

where the a_i's are the autoregression parameters. In terms of a flat neural network architecture, the AR(n) model can be described as a functional-link network with n input nodes, additional enhancement nodes, and a single output node. This will artificially increase the dimension of the input space, or the rank of the input data matrix. The network includes n input nodes and a single output node. During the training stage, a variable number of enhancement nodes may be added as necessary. Contrary to the traditional error backpropagation models, the training of this network is fast because of the one-step learning procedure and dynamic updating algorithm mentioned above. To improve the performance in some special situations, a weighted least-squares criterion may be used to optimize the weights instead of the ordinary least-squares error.

Using the stepwise updating learning, this section discusses the procedure of training the neural network for time-series forecasting. First, the available data on a single time-series are split into a training set and a testing set. Let the time data x(t) be the value at the tth time step, and assume that there will be P training data points. The training stage proceeds as follows (a code sketch of Steps 1 and 2 follows at the end of this section).

Step 1—Construct Input and Output: Build an input matrix X of size P × n, where n is the delay time. The ith row consists of (x(i), x(i + 1), ..., x(i + n − 1)). The target output vector will be produced using y(i) = x(i + n).

Step 2—Obtain the Weight Matrix: Find the pseudoinverse of the expanded input matrix and the weight matrix W. This will give the linear least-squares fit with n lags, or AR(n). Predictions can be produced either single step ahead or as iterated prediction. The network outputs are then compared to the actual continuation of the data using the testing data. The error will be large most of the time, especially when we deal with a nonlinear time-series.

Step 3—Add a New Enhancement Node if the Error is Above the Desired Level: If the error is above the desired level, a new hidden node will be added. The weights from the input nodes to the enhancement node can be randomly generated, but a numerical rank check may be necessary to ensure that the added input node will increase the rank of the augmented matrix by one. At this time the pseudoinverse of the new matrix can be updated by using (4).

Step 4—Stepwise Update the Weight Matrix: After entering a new input pattern to the input matrix (i.e., adding a′ to A_k and forming A_{k+1}), the new weight matrix can be obtained or updated using either (3) or the QR decomposition algorithm. Then the testing data is applied again to check the error level.

Step 5—Looping for Further Training: Repeat by going to Step 3 until the desired error level is achieved.

It is worth noting that having more enhancement nodes does not necessarily mean better performance. Particularly, a larger than necessary number of enhancement nodes usually would make the augmented input matrix very ill-conditioned and therefore prone to computational error. Theoretically, the rank of the expanded input matrix will be increased by one, which is not always the case as observed in practice. Suppose the expanded input matrix has singular value decomposition U Σ V′, where U and V are orthogonal matrices, and Σ is a diagonal matrix whose diagonal entries give the singular values in ascending order. Let the condition number be the ratio of the largest singular value over the smallest one. If the small singular values are not rounded off to zeros, the condition number would be huge. In other words, the matrix would be extremely ill-conditioned. The least-squares solution resulting from the pseudoinverse would be very sensitive to small perturbations, which is not desirable. A possible solution would be to cut off any small singular values (and therefore reduce the rank). If the error is not under the desired level after training, extra input nodes will be produced based on the original input nodes and the enhanced input nodes, where the weights are fixed. This is similar to the idea of the "cascade-correlation" network structure [27]. But one-step learning is utilized here, which is much more efficient.
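A sketch of Steps 1 and 2 for a single time-series: build the lagged input matrix and targets, expand with random enhancement nodes, and solve for the weights in one step. The synthetic series, the lag of 15, the node count, and the helper name make_ar_patterns are illustrative assumptions.

```python
import numpy as np

def make_ar_patterns(series, n_lags):
    """Step 1: build the P x n input matrix of lagged values and the
    target vector, so that row i is (x(i), ..., x(i+n-1)) -> x(i+n)."""
    series = np.asarray(series, dtype=float)
    P = len(series) - n_lags
    X = np.stack([series[i:i + n_lags] for i in range(P)])
    y = series[n_lags:].reshape(-1, 1)
    return X, y

# Illustrative series and lag choice (not the paper's data).
rng = np.random.default_rng(5)
t = np.arange(600)
series = np.sin(0.07 * t) + 0.05 * rng.normal(size=t.size)

X, y = make_ar_patterns(series[:500], n_lags=15)   # training portion
n_enh = 30
Wh = rng.normal(size=(15, n_enh)); bh = rng.normal(size=n_enh)
A = np.hstack([X, np.tanh(X @ Wh + bh)])           # expanded input matrix
W = np.linalg.pinv(A) @ y                          # Step 2: least-squares weights
print("training MSE:", np.mean((A @ W - y) ** 2))
```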

TABLE I
PREVIOUS RESULTS FOR EXAMPLE 1

VII. EXAMPLES AND DISCUSSION

The proposed time-series forecasting model is tested on several time-series data sets including an infrared laser data set, a chaotic time-series, a monthly flour price data set, and a nonlinear system identification problem. The following examples not only show the effectiveness of the proposed method but also demonstrate a relatively fast way of forecasting time-series.

The nonlinear system identification of discrete-time SISO and MIMO plants can be described by difference equations [16]. The most common equation for system identification expresses the next output as a function of past input-output pairs, where (u(k), y(k)) represents the input-output pair of the plant at time k and f and g are differentiable functions. The system identification model extends the input dimension, that is, the addition of the state variables. The training concept is similar to the one-dimensional (i.e., time) time-series prediction. The proposed algorithm can also be applied to multilag, MIMO systems easily, as shown in Example 4.

Example 1: This is one of the data sets used in a competition of time-series prediction held in 1992 [28]. The training data set contains 1000 points of the fluctuations in a far-infrared laser, as shown in Fig. 4. The goal is to predict the continuation of the time-series beyond the sample data. During the course of the competition, the physical background of the data set was withheld to avoid biasing the final prediction results. Therefore, we are not going to use any information other than the time-series itself to build our network model. To determine the size of the network, first we use a simple linear net as a preliminary fit, i.e., AR(n), where n is the value of the so-called lag. After comparing the single step error versus the value of n, it is noted that the optimal choice for the lag value lies between 10 and 15. So we use an AR(15) model and add nonlinear enhancement nodes as needed. Training starts with a simple linear network with 15 inputs and 1 output node. Enhancement nodes are added one at a time and the weights are updated using (4), as described in Section III. After about 80 enhancement nodes are added, the network can perform single step predictions exceptionally well. Since the goal is to predict multiple steps beyond the training data, iterated prediction is also produced. Fig. 4 shows 60 steps of iterated prediction into the future, compared to the actual continuation of the time-series. The whole procedure, including training and producing predictions, took less than about 20 s on a DEC alpha machine, compared to the huge computation with over 1000 parameters to adapt and overnight training time using the backpropagation training algorithm. To compare the prediction with previous work [28], the normalized mean squared error (NMSE) is defined as

NMSE = (1/(σ² N)) Σ_{k∈T} (x_k − x̂_k)²

where T denotes the points in the test set, N is the number of points in T, σ² denotes the sample variance of the observed values in T, and x_k and x̂_k are the target and predicted values, respectively. A network with 25 lags and 50 enhancement nodes is used for predicting 50 and 100 steps ahead using 1000 data points for training. For 50 steps ahead prediction, the NMSE is about 4.15 × 10, and the NMSE for 100 steps ahead prediction is about 8.1 × 10. The results are better than those obtained previously, shown in Table I, in both speed (such as hours, days, or weeks) and accuracy [28] (see [28, p. 64, Table II]).
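The NMSE defined above can be computed directly as follows; the toy data only checks the two reference points (a perfect prediction gives 0, and predicting the segment mean gives about 1).

```python
import numpy as np

def nmse(target, predicted):
    """Normalized mean squared error over a test segment:
    sum((x - x_hat)^2) / (sigma^2 * N), with sigma^2 the sample
    variance of the observed values in the segment."""
    target = np.asarray(target, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.sum((target - predicted) ** 2) / (target.var() * target.size)

x = np.sin(0.1 * np.arange(100))
print(nmse(x, x), nmse(x, np.full_like(x, x.mean())))
```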

Example 2: The time series is produced by iterating the logistic map

x(n + 1) = α x(n)(1 − x(n))

which is probably the simplest system capable of displaying deterministic chaos. This first-order difference equation, also known as the Feigenbaum equation, has been extensively studied as a model of biological populations with nonoverlapping generations, where x(n) represents the normalized population of the nth generation and α is a parameter that determines the dynamics of the population. The behavior of the time-series depends critically on the value of the bifurcation parameter α. If α < 1, the map has a single fixed point and the output or population dies away to zero. For 1 < α < 3, the fixed point at zero becomes unstable and a new stable fixed point appears, so the output converges to a single nonzero value. As the value of α increases beyond three, the output begins to oscillate, first between two values, then four values, then eight values, and so on, until α reaches a value of about 3.56, when the output becomes chaotic. Here α is set to four for producing the tested time-series data from the above map. The logistic map of the time-series equation (the solid curve) and the output predicted by the neural network (the marked curve) are shown in Fig. 5(a). A short segment of the time-series is shown in Fig. 5(b). The network is trained to predict the (n + 1)th value based only on the value at n. The training set consists of 100 consecutive pairs (x(n), x(n + 1)) of time-series values. With just five enhancement nodes, the network can do a single step prediction pretty well after training. To produce multiple steps ahead prediction, ten enhancement nodes can push the iterated prediction up to 20 steps into the future with a reasonable error level [see Fig. 5(c)].

Fig. 5. (a) Actual quadratic map and network's prediction, (b) single step prediction, and (c) iterated prediction of 25 steps into the future.
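A sketch of the Example 2 setup: generate the chaotic series with α = 4 and fit a single-step predictor with five enhancement nodes. The initial condition, the random seed, and the train/test split are assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)

# Generate the chaotic series x(n+1) = 4 x(n) (1 - x(n)).
alpha, N = 4.0, 200
x = np.empty(N)
x[0] = 0.2
for n in range(N - 1):
    x[n + 1] = alpha * x[n] * (1.0 - x[n])

# Single-step data: predict x(n+1) from x(n); use 100 consecutive pairs.
X = x[:100].reshape(-1, 1)
y = x[1:101].reshape(-1, 1)

n_enh = 5                                   # five enhancement nodes, as in the text
Wh = rng.normal(size=(1, n_enh)); bh = rng.normal(size=n_enh)
A = np.hstack([X, np.tanh(X @ Wh + bh)])
W = np.linalg.pinv(A) @ y

# Single-step test on the remaining part of the series.
Xt = x[100:-1].reshape(-1, 1)
At = np.hstack([Xt, np.tanh(Xt @ Wh + bh)])
print("test MSE:", np.mean((At @ W - x[101:].reshape(-1, 1)) ** 2))
```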
Example 3: As the third example, we tested the model on a trivariate time-series (x_1(t), x_2(t), x_3(t)), where t ranges up to 100. The data used are logarithms of the indexes of monthly flour prices for Buffalo, Kansas City, and Minneapolis over the period from 8/72 to 11/80 [29]. First we trained the network with eight enhancement nodes using the first 90 data points. The next ten data points are then tested in one-lag prediction, starting from t = 91. To compare the prediction with previous work, the mean squared error (MSE) is defined as

MSE = (1/N) Σ_{k∈T} (x_k − x̂_k)²

where T denotes the points in the test set, N is the number of points in T, and x_k and x̂_k are the target and predicted values, respectively. Fig. 6(a) shows the flour price indexes. Fig. 6(b) shows the network modeling and target output. The prediction and the error are given in Fig. 6(c) and (d), respectively. The training MSE's for Minneapolis, Kansas City, and Buffalo are 0.0039, 0.0043, and 0.0051, respectively. The prediction MSE's for Minneapolis, Kansas City, and Buffalo are 0.0053, 0.0055, and 0.0054, respectively. The result is better than previous work using a multilayer network. We also trained the network with six inputs coupled (combined) with ten enhancement nodes using the first 90 triplets from the data. The network performs well even in multilag prediction, or iterated prediction. This is shown in Fig. 6(e). We also observe that, even though more lags or more enhancement nodes would achieve a better fit during the training stage, they do not necessarily improve prediction performance, especially in the case of multilag prediction.

Example 4: The model is also used for an MIMO nonlinear system. The two-dimensional input-output vectors of the plant were assumed to be u(k) = [u_1(k), u_2(k)]′ and y_p(k) = [y_p1(k), y_p2(k)]′. The difference equation describing the plant was assumed to be of the form given in [16], where the known functions f and g take the forms used in that reference. The stepwise update algorithm with five enhancement nodes is used to train the above system. Using the specified input signals u_1(k) and u_2(k), the responses are shown in Fig. 7. Fig. 7(a) is the plot for y_p1 and its estimate, and Fig. 7(b) is the plot for y_p2 and its estimate. The dashed line and solid line are also overlapped in this case. Fig. 7(c) and (d) show the plots of (y_p1 − ŷ_p1) and (y_p2 − ŷ_p2), respectively. The training time is again very fast, about 30 s on a DEC workstation.
Fig. 6. (a) Flour price indexes, (b) network modeling, (c) network prediction (one-lag), (d) network prediction error (one-lag), and (e) iterated prediction
(multilag) of flour price indexes of three cities, where solid lines are the predicted values and the dashed lines are the actual values.

Fig. 7. (a) Identification of a MIMO system, y_p1, (b) identification of a MIMO system, y_p2, (c) the difference (y_p1 − ŷ_p1), and (d) the difference (y_p2 − ŷ_p2).

VIII. CONCLUSION

In summary, the proposed algorithm is simple, fast, and easy to update. Several examples show promising results. There are two points that we want to emphasize: 1) the proposed learning algorithm for the functional-link net is very fast and efficient. The fast learning makes it possible for a trial-and-error approach to fine-tune some hard-to-determine parameters [e.g., the number of enhancement (hidden) nodes, the dimension of the state space, or the AR parameter n]. The training algorithm allows us to update the weight matrix in real-time if additional enhancement nodes are added to the system. Meanwhile, the weights can also be updated easily if new observations are added to the system. This column-wise (additional neurons) and row-wise (additional observations) update scheme is very attractive for real-time processes. 2) The easy updating of the weights in the proposed approach saves the time and resources needed to retrain the network from scratch. This is especially beneficial when the data set is huge.

ACKNOWLEDGMENT

The authors are indebted to Dr. Y.-H. Pao, the pioneer of the functional-link neural network, for his encouragement and discussion of this work. The authors also deeply thank the support from AFOSR and the Materials Directorate, Wright Laboratory, Wright-Patterson Air Force Base.
REFERENCES

[1] P. J. Werbos, "Beyond regression: New tools for prediction and analysis in the behavioral science," Ph.D. dissertation, Harvard Univ., Cambridge, MA, Nov. 1974.
[2] P. J. Werbos, "Backpropagation through time: What it does and how to do it," Proc. IEEE, vol. 78, pp. 1550–1560, Oct. 1990.
[3] A. Cichocki and R. Unbehauen, Neural Networks for Optimization and Signal Processing. New York: Wiley, 1992.
[4] L. F. Wessels and E. Barnard, "Avoiding false local minima by proper initialization of connections," IEEE Trans. Neural Networks, vol. 3, pp. 899–905, 1992.
[5] R. A. Jacobs, "Increased rates of convergence through learning rate adaptation," Neural Networks, vol. 1, pp. 295–307, 1988.
[6] H. Drucker and Y. le Cun, "Improving generalization performance using double backpropagation," IEEE Trans. Neural Networks, vol. 3, pp. 991–997, 1992.
[7] S. J. Perantonis and D. A. Karras, "An efficient constrained learning algorithm with momentum acceleration," Neural Networks, vol. 8, no. 2, pp. 237–249, 1994.
[8] D. A. Karras and S. J. Perantonis, "An efficient constrained training algorithm for feedforward networks," IEEE Trans. Neural Networks, vol. 6, pp. 1420–1434, Nov. 1995.
[9] D. S. Chen and C. Jain, "A robust back propagation learning algorithm for function approximation," IEEE Trans. Neural Networks, vol. 5, pp. 467–479, May 1994.
[10] Y. H. Pao and Y. Takefuji, "Functional-link net computing: Theory, system architecture, and functionalities," IEEE Comput., vol. 3, pp. 76–79, 1991.
[11] Y. H. Pao, G. H. Park, and D. J. Sobajic, "Learning and generalization characteristics of the random vector functional-link net," in Neurocomputing. Amsterdam, The Netherlands: Elsevier, 1994, vol. 6, pp. 163–180.
[12] B. Igelnik and Y. H. Pao, "Stochastic choice of basis functions in adaptive function approximation and the functional-link net," IEEE Trans. Neural Networks, vol. 6, pp. 1320–1329, 1995.
[13] S. Chen, C. F. N. Cowan, and P. M. Grant, "Orthogonal least squares learning algorithm for radial basis function networks," IEEE Trans. Neural Networks, vol. 2, pp. 302–309, Mar. 1991.
[14] D. S. Broomhead and D. Lowe, "Multivariable functional interpolation and adaptive networks," Complex Syst., vol. 2, pp. 321–355, 1988.
[15] Y. H. Pao, G. H. Park, and D. J. Sobajic, "Learning and generalization characteristics of the random vector functional-link net," in Neurocomputing. Amsterdam, The Netherlands: Elsevier, 1994, vol. 6, pp. 163–180.
[16] K. S. Narendra and K. Parthasarathy, "Identification and control of dynamical systems using neural networks," IEEE Trans. Neural Networks, vol. 1, pp. 4–27, 1990.
[17] C. L. P. Chen, "A rapid supervised learning neural network for function interpolation and approximation," IEEE Trans. Neural Networks, vol. 7, pp. 1220–1230, Sept. 1996.
[18] G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed. Baltimore, MD: Johns Hopkins Univ. Press, 1996.
[19] M. H. Hassoun, Fundamentals of Artificial Neural Networks. Cambridge, MA: MIT Press, 1995.
[20] A. Ben-Israel and T. N. E. Greville, Generalized Inverses: Theory and Applications. New York: Wiley, 1974.

[21] F. H. Kishi, "On line computer control techniques and their application to re-entry aerospace vehicle control," in Advances in Control Systems Theory and Applications, C. T. Leondes, Ed. New York: Academic, 1964, pp. 245–257.
[22] B. Widrow, "Generalization and information storage in networks of adaline neurons," in Self-Organizing Systems, M. C. Jovitz et al., Eds., 1962, pp. 435–461.
[23] C. R. Johnson, Jr., Lectures on Adaptive Parameter Estimation. Englewood Cliffs, NJ: Prentice-Hall, 1988.
[24] M. J. L. Orr, "Regularization in the selection of radial basis function centers," Neural Computat., vol. 7, pp. 606–623, 1995.
[25] V. R. Vemuri and R. D. Rogers, Eds., Artificial Neural Networks: Forecasting Time Series. Los Alamitos, CA: IEEE Comput. Soc. Press, 1993.
[26] A. Khotanzad, R. Hwang, A. Abaye, and D. Maratukulam, "An adaptive modular artificial neural network hourly load forecaster and its implementation at electric utilities," IEEE Trans. Power Syst., vol. 10, pp. 1716–1722, 1995.
[27] S. E. Fahlman and C. Lebiere, "The cascade-correlation learning architecture," in Advances in Neural Information Processing Systems 1, 1989.
[28] A. S. Weigend and N. A. Gershenfeld, Eds., Time Series Prediction: Forecasting the Future and Understanding the Past. Reading, MA: Addison-Wesley, 1994.
[29] K. Chakraborty, K. Mehrotra, C. Mohan, and S. Ranka, "Forecasting the behavior of multivariate time series using neural networks," Neural Networks, vol. 5, pp. 961–970, 1992.

C. L. Philip Chen (S'88–M'88–SM'94) received
the B.S. degree from National Taiwan Institute
of Technology, Taipei, Taiwan, R.O.C., the M.S.
degree from the University of Michigan, Ann Arbor,
in 1985, and the Ph.D. degree from Purdue
University, West Lafayette, IN, in December 1988,
all in electrical engineering.
In 1988 and 1989, he was a Visiting Assis-
tant Professor at the School of Engineering and
Technology, Purdue University. Since September
1989 he has been with the Computer Science and
Engineering Department, Wright State University, Dayton, OH, where he is
currently an Associate Professor. He is also a Visiting Research Scientist
at the Materials Directorate, Wright Laboratory, Wright-Patterson Air Force
Base, and a National Research Council Senior Research Fellow. He was on
sabbatical leave at Purdue University and Case Western Reserve University
from Fall 1996 to Fall 1997. His current research interests include neural
networks, intelligent systems, CAD/CAM, and robotics.
Dr. Chen is a member of the advisory committee of the International Jour-
nal of Smart Engineering Systems Design. He was a Conference Cochairman
of the International Conference on Artificial Neural Networks in Engineering
in 1995 and 1996; a Tutorial Chairman on International Conference on Neural
Networks in 1994; a Program Committee member of the OAI Neural Networks
Symposium in 1995; the Conference Cochairman of Adaptive Distributed
Parallel Computing in 1996; and a Program Committee member of the IEEE
International Conference on Robotics and Automation in 1996 and CESA in
1998. He actively reviews papers for several IEEE TRANSACTIONS. He is
a member of Eta Kappa Nu. He is the Founding Faculty Advisor of the
IEEE Computer Society Student Chapter at Wright State University and was
a recipient of the 1997 College Research Excellent Faculty Award.

John Z. Wan received the B.S. degree from JanXi University, China, the M.S. degree from Zhongshan University, China, and the Ph.D. degree from the University of Cincinnati, Cincinnati, OH, in 1982, 1985, and 1993, respectively, all in mathematics. He also studied at Wright State University for the M.S. degree in computer science.
He worked at Armstrong Laboratory, Wright-Patterson Air Force Base, for one year before he moved to Lexis-Nexis Data Central, Dayton, OH, where he is currently a Software Engineer.