The Numerical Solution of Linear Ordinary Differential Equations by Feedforward Neural Networks
A. J. Meade, Jr. and A. A. Fernandez
1. INTRODUCTION
The rapidly growing field of connectionism is concerned with parallel, distributed, and adaptive
information processing systems. This includes such tools as genetic learning systems, simulated
annealing systems, associative memories, and fuzzy learning systems. However, the primary tool
of interest is the artificial neural network.
The term Artificial Neural Network (ANN) refers to any of a number of information processing
systems that are more-or-less biologically inspired. Generally speaking, they take the form of
directed-graph type structures [1] whose nodes perform simple mathematical operations on the
data to be processed. Information is represented within them by means of the numerical weights
associated with the links between nodes. The mathematical operations performed by these nodes,
the manner of their connectivity to other nodes, and the flow of information through the structure
is patterned after the general structure of biological nervous systems. Much of the terminology
associated with these systems is also biologically inspired; thus, networks are commonly said to
be “trained,” not programmed, and to “learn” rather than model.
ANNs have proven to be versatile tools for accomplishing what could be termed higher order
tasks such as sensor fusion, pattern recognition, classification, and visual processing. All these
applications are of great interest to the engineering field; however, present ANN applications have
created the opinion that networks are unsuited for use in tasks requiring accuracy and precision,
such as mathematical modelling and the physical analysis of engineering systems. Certainly the
biological underpinnings of the neural network concept suggest networks would perform best on
tasks at which biological systems excel, and worse or not at all at other tasks.
Contrary to this opinion, the authors believe that continued research into the approximation
capabilities of networks will show that ANNs can be as numerically accurate and predictable
as conventional computational methods. It is also believed that in viewing ANNs as numerical
tools, advances in training and analyses of existing architectures can be made. In a more im-
mediate sense, benefits of this approach will enable the neural network paradigm, with all of its
richness of behavior and adaptivity, to be mated to the more purely computational paradigms of
mathematically oriented programming used by engineers in modelling and analysis.
Presently, the most popular application of ANNs in science and engineering is the emulation
of physical processes by a feedforward artificial neural network (FFANN) architecture using a
training algorithm. Because this paradigm requires exposure to numerous input-output sets, it
can become memory intensive and time consuming. Considerable effort may be saved if the
mathematical model of a physical process could be directly and accurately incorporated into the
FFANN architecture without the need of examples, thereby shortening or eliminating the learning
phase.
As a consequence, our efforts concentrate on developing a general noniterative method in which
the FFANN architecture can be used to model accurately the solution to algebraic and differen-
tial equations, using only the equation of interest and the boundary and/or initial conditions. A
FFANN constructed by this noniterative, and numerically efficient, method would be indistin-
guishable from those trained using conventional techniques.
A number of researchers have approached the problem of approximating the solution of equa-
tions, whether algebraic or differential, in a connectionist context. Most of these attempts have
proceeded from an applications oriented viewpoint, where the solution of the equations is viewed
as a new area in which to apply conventional connectionist paradigms. Specifically, the solution
of the equations is transformed into an optimization problem by defining some objective function
and its associated metric. The problem is then placed in a connectionist structure and solutions
pursued by means of an optimization based training algorithm.
The majority of the work in connectionist equation solving is concerned with the solution of
algebraic equations, such as the solution of an arbitrary linear system of equations. Takeda and
Goodman [2] and Barnard and Casasent [3] use variations on the Hopfield network to solve linear
systems of equations. The linear systems to be solved are expressed as the minimization of a
quadratic function, and the weights of the Hopfield net are trained until the function is minimized.
These algorithms have some interesting features, among them inherent high parallelism with very
simple components and the promise of high speeds. However, they are generally computationally
expensive to implement once the training time is taken into account [2], and they often rely on
specialized hardware such as optical implementation to acquire the high speeds. In addition,
representation of real numerical values can be somewhat involved if a discrete Hopfield model is
used, as in [2].
In contrast, Wang and Mendel [4] use a feedforward architecture to solve linear algebra prob-
lems, with connection structure constrained by the nature of the problem being solved. For a
particular matrix or algebraic system, the constrained network is trained to associate the input
(for example, an input matrix A) with the output (the LU decomposition of A). The solution
is then read from the connection weights of the trained network. The authors successfully train
these networks in problems of LU decomposition, linear equation solving, and singular value
decomposition. The algorithm can be implemented on more standard hardware than those based
on the Hopfield model and is easily parallelizable according to the authors.
Lee and Kang [5] have solved first order differential equations using the optimization network
paradigm. The differential equation is first discretized using finite difference techniques, and the
resulting algebraic equations are transformed into an energy function to be minimized by the
net. Unlike the energy functions arising from the formulation of the linear algebraic problem
of [2] and [3], the energy functions to be minimized by the net in the solution of a general partial
differential equation are not guaranteed to be quadratic in nature. Thus, Lee and Kang use
modified Hopfield nets able to minimize arbitrary energy functions. They then formulate the
energy quantity to be minimized for a general order nonlinear partial differential equation on a
unit hypercube, but only the solutions to first order linear differential equations are presented.
The results are quite good, although the solutions produced show a tendency to oscillate about
the exact solution in some fairly simple cases. Lee and Kang suggest an optical implementation
for more complex problems.
The approach of this paper is markedly different from the previously cited works. As stated above,
our efforts concentrate on developing a general, numerically efficient, noniterative method in which
the FFANN architecture is used to model accurately the solution to algebraic and differential
equations, using only the equation of interest and the boundary and/or initial conditions, so that
the constructed FFANN is indistinguishable from one trained using conventional techniques.
Progress in the noniterative approach should also be of interest to the connectionist community,
since the approach may be used as a relatively straightforward method to study the influence
of the various parameters governing the FFANN. When taken from the viewpoint of network
learning, applying an FFANN to the solution of a differential equation can effectively uncouple
the influences of the quality of data samples, network architecture, and transfer functions on
the network approximation performance. Insights provided by this uncoupling may help in the
development of faster and more accurate learning techniques [6]. We have borrowed a technique
from applied mathematics, known as the method of weighted residuals (MWR), and shown how
it can be made to operate directly on the net architecture. The method of weighted residuals is
a generalized method for approximating functions, usually from given differential equations, and
will be covered in detail in Section 4 of this paper.
The development of the noniterative approach to the determination of the FFANN weights can
be likened to the development of a new numerical technique. When evaluating the capabilities of
a new numerical method for science and engineering, it is prudent to apply it first to the solution
of simple equations of practical interest and known behavior. This same approach will be used
in our evaluation of the noniterative method for FFANNs. A simple feedforward network using
a single input and output neuron with a single hidden layer of processing elements utilizing the
hard limit transfer function is constructed to accurately approximate the solutions to first and
second order linear ordinary differential equations. It will also be demonstrated how the error of
the constructed FFANNs can be predicted and controlled.
2. FUNCTION APPROXIMATION
Function approximation theory, which deals with the problem of approximating or interpolating
a continuous, multivariate function, is an approach that has been considered by other researchers
[7-9] to explain supervised learning and ANN behavior in general. There are four sets of pa-
rameters that influence the approximation performance of the simple FFANN architecture of
Figure 1:
(1) the input weights,
(2) the bias weights,
(3) the transfer functions, and
(4) the output weights.
In an effort to provide guidelines in the interpretation of the mathematical roles of each set of
parameters, function approximation theory is investigated.
Figure 1. A simple feedforward network architecture with a single input node, a single hidden
layer of nodes, and a single output node.
The most common method of function approximation is through a functional basis expansion.
A functional basis expansion represents an arbitrary function y(x) as a weighted combination of
linearly independent basis functions Υ_q that are functions of the variable x,

y(x) = \sum_{q=1}^{T} \Upsilon_q(x)\, c_q,    (1)

where T is the total number of basis functions and c_q are the basis coefficients. This is analogous
to representing arbitrary multidimensional vectors by a weighted sum of linearly independent
basis vectors. A common example of a functional basis expansion is the Fourier series, which
uses trigonometric basis functions. Other basis functions can be used as well; the values of the
coefficients c_q will, of course, vary depending on the basis functions selected.
It is important to note that the basis expansion can also be viewed as an interpolation scheme,
where the nature of the interpolation depends on the kind of basis function used. In fact, func-
tion approximation literature often uses the terms “basis functions” and “interpolation functions”
interchangeably, with the former sometimes preferred by mathematicians and the latter by engi-
neers. Therefore, a basis employing linear polynomials can be said to exhibit a linear interpolation
of the approximated function.
Examination of equation (1) reveals it to be a scalar product of the basis functions and ex-
pansion coefficients. Note that the equation makes no mention of the dependence of the basis
functions on any parameter other than the independent variable x. Similarly, the mathematical
operations performed by the simple FFANN of Figure 1 can be interpreted as a scalar product
where the transfer functions act as basis functions and the output weights act as basis expansion
coefficients. This insight is behind many of the attempts to prove that finite sums of transfer
functions, such as sigmoids, can be used to approximate arbitrary continuous functions [10,11].
The link between a FFANN and functional basis expansion assigns a mathematical role both
to the output weights and to the transfer functions, two of the parameter sets mentioned earlier
as affecting the performance of the network.
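This correspondence can be written down directly. The short Python sketch below evaluates the
Figure 1 network at a scalar input as exactly such a scalar product; the weight values and the
hyperbolic tangent are arbitrary placeholders standing in for a generic transfer function, not
values taken from this paper.

    import numpy as np

    def network_output(x, input_w, bias_w, output_w, transfer):
        """Output of the Figure 1 network at a scalar input x.

        Each hidden node applies `transfer` to (input weight * x + bias weight);
        the output node then forms the weighted sum with the output weights,
        i.e., a scalar product of transfer-function values and output weights.
        """
        hidden = transfer(input_w * x + bias_w)
        return np.dot(output_w, hidden)

    # Placeholder weights and a hyperbolic tangent standing in for a generic
    # transfer function (values chosen arbitrarily for illustration).
    input_w  = np.array([1.0, 2.0, -1.5])
    bias_w   = np.array([0.0, -0.5, 1.0])
    output_w = np.array([0.3, -0.2, 0.7])
    y = network_output(0.4, input_w, bias_w, output_w, np.tanh)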
Let us evaluate the potential usefulness of network transfer functions as basis functions. For the
remainder of this paper, we will use the hard limit of Figure 2 as the transfer function. We have
based this choice on the simplicity of the function and the ease by which it can be implemented
in software and hardware [12].
Prenter [14] defines a good polynomial basis as one which allows for a unique solution to the
expansion coefficient problem posed by requiring the expansion to match the approximated
function at N points; these N points are commonly termed knots.
A good polynomial basis should also provide smooth interpolation of the approximated function
between the knots. A particular family of polynomials that provides these characteristics is the
family of Lagrange polynomials.
Prenter examines the behavior of global Lagrange polynomials and discusses certain weaknesses
shared by all global basis functions:
(1) The algebraic systems they generate to solve for the expansion coefficients often suffer
from linear dependence problems as N increases (ill conditioning).
(2) They often suffer from poor approximation (so-called “snaking”) between knots, particu-
larly if the function to be approximated has high gradients, noise, or discontinuities.
Because of the disadvantages of global basis functions, piecewise polynomial splines based on
the Lagrange polynomials are often used. Splines are polynomial curves that extend over a
limited number of knots and which together provide varying degrees of interpolation between
their knots. The most common polynomial spline is the Chapeau or “hat” function as shown
in Figure 3. This is sometimes referred to in the literature as a first-order Lagrange polynomial
(also known as a first order Lagrangian interpolating function or shape function). Its formula is:
\Phi_i(x) =
\begin{cases}
\dfrac{x - x_{i-1}}{x_i - x_{i-1}}, & x_{i-1} \le x \le x_i, \\
\dfrac{x_{i+1} - x}{x_{i+1} - x_i}, & x_i \le x \le x_{i+1}, \\
0, & \text{otherwise}.
\end{cases}    (4)
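These properties are easy to confirm numerically. The following Python sketch (with an
arbitrarily chosen set of knots) implements the hat function defined above and checks that hats
placed one per knot, with two bracketing knots just outside the domain, sum to 1 at every point
of the domain, i.e., reproduce a constant exactly.

    import numpy as np

    def hat(x, xl, xc, xr):
        """Hat function centered at xc and supported on [xl, xr], per eq. (4)."""
        x = np.asarray(x, dtype=float)
        left  = (x - xl) / (xc - xl)
        right = (xr - x) / (xr - xc)
        return np.where((x >= xl) & (x <= xc), left,
               np.where((x > xc) & (x <= xr), right, 0.0))

    # Three knots in [0, 1] plus two bracketing knots just outside the domain
    # (the auxiliary knots introduced later in the text).
    knots = np.array([0.0, 0.5, 1.0])
    all_knots = np.concatenate(([-0.5], knots, [1.5]))

    x = np.linspace(0.0, 1.0, 101)
    total = sum(hat(x, all_knots[i - 1], all_knots[i], all_knots[i + 1])
                for i in range(1, len(all_knots) - 1))
    assert np.allclose(total, 1.0)   # the hats reproduce a constant exactly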
Figure 4 shows, for a problem domain [x_1, x_N] discretized with N = 3 knots (x_1, x_2, and x_3),
how the hat functions must be distributed to provide a linear interpolation. Notice that for any
value of x in the domain, the sum of all hat functions will equal a value of 1. This means that the
hat functions distributed in this manner will represent a constant accurately, which is the first,
and most important, test of a good interpolation scheme. An additional benefit of the distribution
Figure 4. Proper distribution of hat functions along the problem domain (x_1 ≤ x ≤ x_3).
with respect to computational efficiency is that the hat functions are linearly independent. This
will be of importance in the determination of the basis expansion coefficients.
The distribution of the hat functions is justified also on purely mathematical grounds. Con-
sidering that the hat functions are being used as a functional basis, and that the function being
represented has been discretized at N knots, it is clear there are N degrees of freedom for the
function being represented. This implies N dimensions in the function subspace being used to
represent the function, necessitating N hat functions in the domain, one centered at each knot.
This particular polynomial spline does not suffer from the common maladies of the global
Lagrange polynomials. Namely:
(1) The algebraic system generated to evaluate the expansion coefficients is very sparse and
well-conditioned. In this sense, a sparse system means that less than the total number (N)
of unknowns appear in each of the N algebraic equations. This is because the splines exist
only locally in the problem domain and so only a limited number of them are involved in
representing the function at any one value of x.
(2) The interpolation between knots is linear and well-behaved (no snaking) and can easily
handle high gradients or noise.
(3) Since the maximum value of the spline is 1, the values of the expansion coefficients are
identical to the values of the approximated function at the knots.
One drawback, however, is that the approximation is discontinuous in the first derivative. In the
language of functional analysis, this spline is found in C^0[a, b], a subspace of L_2[a, b].
Using the notation of Figure 1, where α_q represents the input weight and θ_q represents the bias
weight, the argument ξ_q of the transfer function of a specific processing element q can be written
in equation form as

\xi_q = \alpha_q x + \theta_q.    (5)

While α_q and θ_q are constants, ξ_q is a variable that is linearly dependent on the input x. By
inspection of equation (5), the input weight α_q appears to act as a scaling coefficient for x.
Alternately, the combination of the input weight and bias can be viewed as transforming the
input variable to a new variable ξ_q, where ξ_q is specific to the qth processing element. This linear
transformation of x into ξ_q is similar to the unscaled shifted rotations of [11].
Define a point on the x-axis, x_q, as the "center" of the transfer function Υ_q such that

\alpha_q x_q + \theta_q = 0 \quad \text{or} \quad \theta_q = -\alpha_q x_q.

From this equation, it is seen that the bias weight allows the origin of each transfer function
(ξ_q = 0) to be set in the independent variable space x. The input and bias weights allow each
transfer function to be scaled differently and centered at different locations in the independent
variable space x (Figure 5).
3.2. Transforming the Transfer Functions of the Neural Network into Splines
From a purely function approximation perspective, the aptitude of the hard limit as a basis
function is relatively low. Since it is a global function, it would be expected to produce a full
system of algebraic equations for the solution of the basis coefficients and would be sensitive to
noise. In addition, considering that the hard limit is zero at its center and has a nonzero value
throughout the remainder of the problem domain, it is expected that the basis coefficients that
will be generated will not have a direct relation to the values of the function being approximated.
This property is often termed as nonlocal representation in the connectionist literature [15].
Nonlocal representation is a condition that is often viewed favorably in the connectionist lit-
erature, as indicating “redundancy,” and is sometimes discussed as a characteristic unique to
networks. In fact, it is a common feature of any functional basis expansion utilizing global basis
functions, and its net effect is often negative as it couples the values of all the basis coefficients
in the approximation of a function, making the system more difficult to solve and complicating
the imposition of boundary conditions.
A considerable number of advantages, as discussed in Section 2.2, may be obtained if the hard
limit is transformed into the hat function. Consider a one-dimensional domain as in Figure 6,
with two hard limit functions centered in neighboring intervals as shown. If the function centered
between xi and xi+1 is multiplied by -1 and added to the second unaltered function (Figure 7),
it is clear by inspection that the result will be a hat function with a maximum value of 2.
Additionally, if the hard limits in the figure are scaled between -0.5 and +0.5 by suitable output
weights, the resulting hat function is indistinguishable from that used in the function approxi-
mation literature. The transformation of hard limits to hat functions can be described by the
following equation:
\Upsilon_i^A - \Upsilon_i^B = 2\,\Phi_i, \qquad i = 1, 2, \ldots, N,

where the superscripts of the hard limits, A and B, refer to the adjoining intervals x_{i-1} < x ≤ x_i
and x_i ≤ x < x_{i+1}, respectively. The functions Υ_i^A and Υ_i^B are defined as zero at the midpoints
of their respective intervals.
of their respective intervals. A consequence of this formulation is that the number of hard limits
can be linked to the number of hat functions desired. We will require twice as many hard limits
as hat functions (T = 2N).
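The construction can be checked numerically. In the sketch below, the hard limit of Figure 2 is
assumed to be the symmetric saturating function that equals -1 below its linear range, +1 above
it, and is linear (and zero at the interval midpoint) in between; subtracting the hard limit of the
right interval from that of the left interval then reproduces a hat of height 2 that vanishes outside
the two intervals.

    import numpy as np

    def hard_limit(xi):
        """Assumed form of the hard limit of Figure 2: linear between -1 and +1,
        saturated outside that range (an assumption of this sketch)."""
        return np.clip(xi, -1.0, 1.0)

    # Adjoining intervals [x_{i-1}, x_i] and [x_i, x_{i+1}] (arbitrary values).
    xm1, xc, xp1 = 0.2, 0.5, 0.9
    x = np.linspace(0.0, 1.0, 201)

    # Arguments running linearly from -1 to +1 across each interval, so that
    # each hard limit is zero at the midpoint of its interval.
    xi_A = 2.0 * (x - xm1) / (xc - xm1) - 1.0
    xi_B = 2.0 * (x - xc) / (xp1 - xc) - 1.0

    ups_A = hard_limit(xi_A)            # hard limit of the left interval
    ups_B = hard_limit(xi_B)            # right-interval hard limit, to be flipped

    hat_times_two = ups_A - ups_B       # equals 2 * Phi_i(x)
    assert np.isclose(hat_times_two[np.argmin(np.abs(x - xc))], 2.0)
    assert np.allclose(hat_times_two[(x < xm1) | (x > xp1)], 0.0)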
Figure 6. Constrained distribution of hard limits along the x-axis using the input and bias weights.
Figure 7. Right side hard limit (dashed) flipped by multiplying by a negative output weight.
Addition of the two hard limits creates a hat function. Refer to Figure 3.
Let us now derive the remaining specific constraints required to equate the hard limit function
representation (equation (1)) to one using the hat functions
y_a(x) = \sum_{q=1}^{T} \Upsilon_q(x)\, c_q = \sum_{i=1}^{N} \Phi_i(x)\, w_i,    (7)

where the hat functions Φ_i are defined as having a value of 1 at the knot x_i, as per equation (4),
and the w_i are the basis coefficients associated with the hat functions.
Let us discretize the problem domain [a, b] using the knots x_i in the following manner:

a = x_1 < x_2 < \cdots < x_N = b.

We will label this discretization as the problem mesh. To generate the appropriate hat functions
at the boundaries, two auxiliary knots x_0 and x_{N+1} are required such that x_0 < a and x_{N+1} > b.
Splitting the hard limit representation of equation (1) into its two groups of N processing
elements gives

y_a(x) = \sum_{q=1}^{N} \Upsilon_q(x)\, c_q + \sum_{q=N+1}^{2N} \Upsilon_q(x)\, c_q,

where the summations of the right-hand side can be rewritten without loss of generality as

\sum_{q=1}^{N} \Upsilon_q(x)\, c_q = \sum_{i=1}^{N} \Upsilon_i^A(x)\, u_i, \qquad
\sum_{q=N+1}^{2N} \Upsilon_q(x)\, c_q = \sum_{i=1}^{N} \Upsilon_i^B(x)\, v_i,    (9)

therefore,

y_a(x) = \sum_{i=1}^{N} \left[ \Upsilon_i^A(x)\, u_i + \Upsilon_i^B(x)\, v_i \right],

where, constraining each pair of output weights so that v_i = -u_i and writing w_i = 2 u_i,

y_a(x) = \sum_{i=1}^{N} \frac{w_i}{2} \left[ \Upsilon_i^A(x) - \Upsilon_i^B(x) \right]
       = \sum_{i=1}^{N} \Phi_i(x)\, w_i,    (13)

which is the basis expansion for y_a(x) in terms of hat functions (equation (7)).
It is relatively straightforward to derive actual formulae for the input, bias, and output weights.
From equations (9) and (13), the output weights are given by

c_i = u_i = \frac{w_i}{2}, \qquad (i = 1, 2, \ldots, N),

c_{i+N} = v_i = -\frac{w_i}{2}, \qquad (i = 1, 2, \ldots, N),    (14)
where the numbering of the weights with respect to the actual net architecture can be in any
order. The result is simple. The ith pair of the T = 2N output weights in the FFANN must be
set to the values u_i and v_i, that is, w_i/2 and -w_i/2 respectively, of the ith basis expansion
coefficient w_i, for i = 1, 2, ..., N.
To derive the input and bias weight formulae, refer again to Figures 3, 5-7 and also to equations
(4), (11), and (12). It is clear from Figure 6 that the hard limits must be placed in the problem
space in such a manner that their linear behavior (ξ_i^A(x) and ξ_i^B(x), respectively) occurs across
the appropriate interval. Therefore, by inspection,

\xi_i^A(x) = \frac{2(x - x_{i-1})}{x_i - x_{i-1}} - 1, \qquad x_{i-1} \le x \le x_i,

\xi_i^B(x) = \frac{2(x - x_i)}{x_{i+1} - x_i} - 1, \qquad x_i \le x \le x_{i+1},

i = 1, 2, \ldots, N.    (15)
Since ξ_i^A(x) = α_i^A x + θ_i^A and ξ_i^B(x) = α_i^B x + θ_i^B, we can calculate the values of the
input and bias weights using equation (15). Therefore,

\alpha_i^A = \frac{2}{x_i - x_{i-1}}, \qquad \theta_i^A = -\frac{2 x_{i-1}}{x_i - x_{i-1}} - 1,    (16)

and

\alpha_i^B = \frac{2}{x_{i+1} - x_i}, \qquad \theta_i^B = -\frac{2 x_i}{x_{i+1} - x_i} - 1.    (17)

To summarize the results of this section:
(1) Two hard limits can be added to form a hat function spline with a maximum value of two.
With appropriate scaling, the canonical hat function can be formed.
(2) By using T = 2N hard limits, where N is the number of hat functions and knots in
the problem domain [a, b], it is shown that by appropriately constraining the input, bias,
and output weights, the feedforward network can be made to form a hat function based
functional basis expansion everywhere in the problem domain.
(3) The input and bias weights can be computed from equations (16) and (17) given the
problem mesh.
(4) The values of the expansion coefficients w_i can be evaluated using the hat function basis Φ.
This eliminates all the problems associated with using the hard limits as basis functions
directly, while preserving the architecture of the net. From equation (14), the values
of the coefficients w_i are used to calculate the output weights c_q that will complete the
construction of the feedforward network, as sketched below.
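Given a problem mesh and the expansion coefficients w_i, the constraints summarized above fix
every weight in the network. A minimal Python sketch of that bookkeeping (our variable names;
the formulae are those of equations (14), (16), and (17)) is:

    import numpy as np

    def build_ffann_weights(knots_with_aux, w):
        """Input, bias, and output weights of the T = 2N network.

        knots_with_aux : the N knots x_1 < ... < x_N with the two auxiliary
                         knots x_0 and x_{N+1} prepended and appended.
        w              : hat-function expansion coefficients w_i (length N).
        """
        x = np.asarray(knots_with_aux, dtype=float)
        w = np.asarray(w, dtype=float)
        assert len(x) == len(w) + 2

        x_im1, x_i, x_ip1 = x[:-2], x[1:-1], x[2:]

        # Input and bias weights, equations (16) and (17).
        alpha_A = 2.0 / (x_i - x_im1)
        theta_A = -2.0 * x_im1 / (x_i - x_im1) - 1.0
        alpha_B = 2.0 / (x_ip1 - x_i)
        theta_B = -2.0 * x_i / (x_ip1 - x_i) - 1.0

        # Output weights, equation (14): u_i = w_i / 2 and v_i = -w_i / 2.
        c = np.concatenate([w / 2.0, -w / 2.0])

        alpha = np.concatenate([alpha_A, alpha_B])
        theta = np.concatenate([theta_A, theta_B])
        return alpha, theta, c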
Figure 8 shows the resulting architecture for six processing elements used to form three hat
functions for three knots plus two auxiliary knots, as in Figure 4. Figure 9 shows an alternate
network architecture that can be constructed using two hidden layers, where the second hidden
layer has linear transfer functions. Although the number of processing elements T is almost
the same for this architecture (T = 2N + 1), only N + 1 of the processing elements have hard
limit transfer functions associated with them. Note also the connections between the first and
second hidden layer are local and repeating. The economy on the use of hard limits and the local
connections between the first and second hidden layer should make this architecture particularly
attractive for hardware implementation. For the remainder of this paper, we will refer to the
first architecture, with the understanding that the transformation to the second architecture is
trivial.
The implication of Section 2 was that estimation of the weights of a FFANN (“training”) could
be viewed as the equivalent problem of finding a suitable basis expansion for the relationship to
be modelled by the net. This section adds the additional consideration that for suitable transfer
functions and constraints on the input and bias weights, the basis expansion being sought can be
a classical expansion in polynomial splines. Thus, training in the traditional sense is eliminated
in this approach. The construction of the network has been reduced to a question of finding
the appropriate basis expansion coefficients wi and then setting the values of the pairs of output
weights ui and vi from equation (14). The values of the input and bias weights are fixed for a
given number of knots in the problem domain and are effectively uncoupled from the problem of
finding the output weights.
The computation of the expansion coefficients wi can be made from a variety of numerical
methods. As mentioned in Section 1, we will use the method of weighted residuals [16] to
determine the output weight values.
4. THE METHOD OF WEIGHTED RESIDUALS

Substitution of an approximate solution of the form

y_a(s, t) = \sum_{i=1}^{N} \Phi_i(s)\, w_i(t)    (18)

into the time dependent governing differential equation L(y) = g(s, t) results in the equation
L(y_a) - g(s, t) = R(w_1, \ldots, w_N, s, t), where R is some function of nonzero value that can be
described as the error, or the differential equation residual. In addition, if equation (18) is subject
to the initial and boundary conditions of the governing differential equation, analogous initial and
boundary condition residuals, R_I and R_B, may be written.
In principle, basis coefficients w_i(t) can be found so that R becomes small in terms of some norm
over the problem domain D as N → ∞.
Let us require that y_a satisfy the initial and boundary conditions exactly so that R_I = R_B = 0.
The basis coefficients w_i(t) that satisfy the differential equation can be determined by requiring
that the equation residual R be multiplied, or weighted, by a function f(s) (often called the weight
function or test function), integrated over the problem domain D, and set to zero,

\int_{D} f\, R\, dD = (f, R) = 0,    (19)
where (f, R) is the inner product of functions f and R. It is from equation (19) that the method
of weighted residuals derives its name. It may be noted that equation (19) is closely related to
the weak form of the governing equation,

(f, L(y_a)) = (f, g).    (20)
This relation to the weak form has the benefit of allowing discontinuities in the exact solu-
tion [17], which is particularly advantageous when approximating the solution to problems with
large gradients or discontinuities.
Since linearly independent relationships are needed to solve for the basis coefficients of equa-
tion (19), it is clear that f must be made up of linearly independent functions f_k. By letting
k = 1, ..., N, a system of N equations for the basis coefficients is generated. For a time dependent
case, a system of ordinary differential equations in t results. For the steady-state case, the
basis coefficients are constants and a system of algebraic equations is generated.
Different choices of the weight function f_k give rise to different computational methods that
are subclasses of MWR. Some of these methods are:
(1) The subdomain method. The computational domain is discretized into N subdomains D_k,
which may overlap, where

f_k = \begin{cases} 1, & s \in D_k, \\ 0, & s \notin D_k. \end{cases}

The subdomain method is identical to the finite volume method when evaluating equa-
tion (19). With the subdomain method, equations (18) and (20) provide the appropriate
framework for enforcing conservation properties in the governing equation, both locally
and globally.
(2) The collocation method. The weight functions are set to

f_k = \delta(s - s_k), \qquad \text{for } k = 1, \ldots, N,

where δ is the Dirac delta function. Substitution of this relation into equation (19) gives

R(w_1, \ldots, w_N, s_k, t) = 0, \qquad \text{for } k = 1, \ldots, N.

Since the finite difference method requires the solution of the differential equation only
at nodal points, one can interpret the finite difference method as a collocation method
without the use of an approximate solution y_a.
(3) The method of moments. The weight functions are chosen from a set of linearly indepen-
dent functions such that successively higher moments of the equation residual are required
to be zero. For a one-dimensional problem, we have

\int_a^b x^k R\, dx = 0, \qquad \text{for } k = 0, \ldots, N-1.
(4) The least squares method. The weight functions are set to

f_k = \frac{\partial R}{\partial w_k}, \qquad \text{for } k = 1, \ldots, N,

where w_k are the basis coefficients. The application of f_k to equation (19) is identical to
finding the minimum of the square of the residual summed over the problem domain, i.e.,

\frac{\partial}{\partial w_k} \int_{D} R^2\, dD = 0.
(5) The generalized Galerkin method. The weight functions are set to

f_k = g_k(s), \qquad \text{for } k = 1, \ldots, N,

where the g_k(s) are analytic functions similar to the bases, but modified with additional
terms to satisfy the boundary and/or initial conditions. The finite element and spectral
methods can be considered subclasses of the generalized Galerkin method. Galerkin based
methods are considered particularly accurate if basis functions are the first N members of a
complete set, since equation (19) indicates that the residual is orthogonal to every member
of the complete set. Consequently, as N tends to infinity, the approximate solution ya will
converge to the exact solution y.
It should also be noted that MWR is not limited to cases where initial and boundary conditions
are satisfied exactly (R_I = R_B = 0). If we require that only the differential equation be satisfied
exactly (R = 0), we may use MWR to derive boundary methods such as the panel and boundary
element methods. The various subclasses of the method of weighted residuals are discussed and
compared at great length by Finlayson [16] and Fletcher [18].
By approaching the evaluation of the output weights through the method of weighted residuals,
we have access to both theoretical and computational results from the most commonly used
techniques in the fields of scientific and engineering computing. Each method has advantages
and disadvantages depending on the particular application. For this paper, we will make use
of the generalized Galerkin technique following the observation made by Fletcher that “. . . the
Galerkin method produces results of consistently high accuracy and has a breadth of application
as wide as any method of weighted residuals” [18, p. 38].
Specifically, the weight functions f_k(s) will be identical to the basis functions used to describe the
approximation y_a. This is known as the Bubnov-Galerkin technique [18]:

f_k = \Phi_k(s), \qquad \text{for } k = 1, \ldots, N.
With f_k so defined, evaluation of the integrals leads to a system of algebraic equations that
is usually linear if the differential equation to be approximated is linear. Otherwise, the algebraic
equations are nonlinear, though still tractable. These equations can be evaluated for a unique
solution through standard techniques [19], allowing the evaluation of both linear and nonlinear
differential equations.
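The step from equation (19) to an algebraic system can be made explicit with a short sketch. The
routine below assembles the matrix of Bubnov-Galerkin inner products for a generic linear operator
by simple trapezoidal quadrature; the names basis, d_basis, and L_apply are placeholders of this
sketch, and the actual computations reported in Section 5 evaluate the integrals with an adaptive
quadrature routine instead.

    import numpy as np

    def galerkin_matrix(basis, d_basis, L_apply, a, b, n_quad=2001):
        """M[k, i] = (Phi_k, L(Phi_i)) over [a, b] by the trapezoidal rule.

        basis, d_basis    : lists of callables Phi_i(x) and dPhi_i/dx(x)
        L_apply(p, dp, x) : values of the operator applied to one basis
                            function (with derivative dp) at the points x
        """
        x = np.linspace(a, b, n_quad)
        dx = x[1] - x[0]
        N = len(basis)
        M = np.empty((N, N))
        for k in range(N):
            fk = basis[k](x)                 # Bubnov-Galerkin weight f_k = Phi_k
            for i in range(N):
                g = fk * L_apply(basis[i], d_basis[i], x)
                M[k, i] = np.sum(0.5 * (g[:-1] + g[1:])) * dx
        return M

    # For the first order operator of the next section, L(y) = dy/dx - y:
    # M = galerkin_matrix(basis, d_basis, lambda p, dp, x: dp(x) - p(x), 0.0, 1.0)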
5. MODEL PROBLEMS
The most logical and practical model problem in which to first test the accuracy and conver-
gence properties of the network methodology is a first order linear ordinary differential equation
with its associated initial condition

\frac{dy}{dx} - y = 0, \qquad y(x = 0) = 1, \qquad 0 \le x \le 1.    (22)
This equation was also used as an example by [5]. A simple feedforward network is constructed
to approximate the nontrivial solution to equation (22). We select:
(1) A single input processing element using a linear transfer function for the independent
variable x.
(2) A single output processing element using a linear transfer function for the dependent
variable approximation ya.
(3) A number of hidden layer processing elements using the hard limit transfer function.
(4) A single bias node connected to each of the hidden layer processing elements.
Applying the results of the discussion in Section 3.3, we select the number of knots N to dis-
tribute in the problem domain and determine the values of the input and bias weight sets α_i^A, α_i^B,
θ_i^A, and θ_i^B, respectively. Using the hat basis functions Φ, we can now represent the approximation
of the dependent function,

y_a = \sum_{i=1}^{N} \Phi_i(x)\, w_i.    (23)
What remains to complete the network are the values of the output weights and, consequently,
the values of the coefficients wi.
Substituting equation (23) into the model equation results in

\frac{dy_a}{dx} - y_a = R.    (24)

Applying the Bubnov-Galerkin method, we require (\Phi_k, R) = 0, which yields

\sum_{i=1}^{N} \left( \Phi_k, \frac{d\Phi_i}{dx} \right) w_i = \sum_{i=1}^{N} (\Phi_k, \Phi_i)\, w_i,
\qquad \text{for } k = 1, \ldots, N.    (26)
Evaluating the inner products, we arrive at the following system of linear algebraic equations:

M\, \mathbf{w} = \mathbf{b},    (27)

where

M_{ki} = \left( \Phi_k, \frac{d\Phi_i}{dx} - \Phi_i \right) \quad \text{and} \quad b_k = 0,
\qquad \text{for } k = 1, \ldots, N.
An initial condition is required for a nontrivial solution to the model equation (22). Using our
approximation of y_a,

y_a(0) = \sum_{i=1}^{N} \Phi_i(0)\, w_i = y(0),

which, since only Φ_1 is nonzero at x = 0, yields the equation

w_1 = y(0) = 1.

This equation can be incorporated into the coefficient matrix M and the vector b of equation (27)
by replacing the first row with M_{11} = 1, M_{1i} = 0 (i = 2, ..., N), and b_1 = 1.
Solution of the modified algebraic system provides the basis expansion coefficients wi for the
differential equation being approximated.
It was mentioned earlier that the type of coefficient matrix created by the weighted residual
method (or specifically the Bubnov-Galerkin method) depended on the type of basis functions
employed. Analysis of the integral forming the coefficient matrix M of equation (27) shows that
M_{ki} = \left( \Phi_k, \frac{d\Phi_i}{dx} - \Phi_i \right) = 0,
\qquad \text{for } i < k-1 \text{ and } i > k+1.
This is due to the local nature of the hat functions as well as their distribution in the problem
domain and linear independence. The original matrix M of equation (27) is thus positive definite
and tridiagonal. Modification of the matrix by incorporation of the initial condition does not alter
these advantages. As a result, the modified algebraic system can be solved in O(N) operations
by the Thomas algorithm [20]. With the expansion coefficients known, the output weights of the
neural network can be easily computed by equation (14).
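The entire computation for the model problem is compact enough to sketch end to end. The
Python fragment below (an illustration, not a reproduction of the authors' Fortran program) uses
the standard analytic inner products of hat functions on a uniform mesh, one reasonable half-
support treatment of the boundary row at x = 1, the initial-condition row w_1 = 1, and the Thomas
algorithm without pivoting; with this simplified boundary treatment the coefficients track e^{x_i}
to within a few percent on the coarse N = 6 mesh and converge quadratically as N grows.

    import numpy as np

    def thomas(lower, diag, upper, rhs):
        """Solve a tridiagonal system in O(N) operations (Thomas algorithm).

        lower[i] multiplies w[i] in row i+1; upper[i] multiplies w[i+1] in row i.
        """
        n = len(diag)
        d = np.array(diag, dtype=float)
        r = np.array(rhs, dtype=float)
        for i in range(1, n):
            m = lower[i - 1] / d[i - 1]
            d[i] -= m * upper[i - 1]
            r[i] -= m * r[i - 1]
        w = np.empty(n)
        w[-1] = r[-1] / d[-1]
        for i in range(n - 2, -1, -1):
            w[i] = (r[i] - upper[i] * w[i + 1]) / d[i]
        return w

    # Uniform mesh of N knots on [0, 1], as in the example above (N = 6).
    N = 6
    x = np.linspace(0.0, 1.0, N)
    h = x[1] - x[0]

    # Galerkin rows built from the analytic hat-function inner products
    # (Phi_k, dPhi_i/dx) and (Phi_k, Phi_i) over [0, 1]; the last row uses the
    # half support of the boundary hat (one reasonable treatment).
    lower = np.full(N - 1, -0.5 - h / 6.0)   # M[k, k-1]
    diag  = np.full(N, -2.0 * h / 3.0)       # M[k, k], interior rows
    upper = np.full(N - 1,  0.5 - h / 6.0)   # M[k, k+1]
    rhs   = np.zeros(N)
    diag[-1] = 0.5 - h / 3.0                 # boundary row k = N

    # Incorporate the initial condition w_1 = y(0) = 1 into the first row.
    diag[0], upper[0], rhs[0] = 1.0, 0.0, 1.0

    w = thomas(lower, diag, upper, rhs)      # expansion coefficients w_i
    c = np.concatenate([w / 2.0, -w / 2.0])  # output weights, equation (14)

    # With this coarse mesh and simplified boundary row, the coefficients
    # agree with the exact nodal values e^{x_i} to within a few percent.
    assert np.max(np.abs(w - np.exp(x))) < 0.1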
The determination of the output weights completes the specification of the parameters for the
feedforward network that models equation (22). The accuracy of the output can be controlled by
increasing the number of knots in the problem domain, thereby creating additional hat functions.
This then translates into additional neurons.
5.2. Results
The exact solution of equation (22) with the initial condition y(0) = 1 is

y = e^x.
The feedforward network was constructed using 12 processing elements with a single hidden layer,
corresponding to an even spacing of six hat functions from six knots within the problem domain
and two auxiliary knots. Figure 10 compares the output of the constructed feedforward neural
network (denoted by triangles) with the exact solution (solid line) at 21 equally spaced sample
points.
Figure 10. Comparison of exact solution and network output (△) at S = 21 sample points with
T = 12 processing elements. RMS = 5.60 × 10^{-5}.
The RMS error of the network is 5.60 × 10^{-5}, where the RMS error is defined as

(E)_{\mathrm{RMS}} = \left[ \frac{1}{S} \sum_{s=1}^{S} \left( y(x_s) - y_a(x_s) \right)^2 \right]^{1/2},
and where S is the number of samples for the evaluation of the error. The lack of snaking of
the results about the exact solution can be attributed to the constraints on the input and bias
weights to guarantee linear interpolation, and to the use of the Bubnov-Galerkin method that
minimizes the error across the problem domain and not just at discrete points.
The input, bias, and output weights for the network were computed via a conventional Fortran
code. All computations were done in double precision. The IMSL subroutine DQDAG [21] was
used for numerical integration of the coefficient matrix components and the Thomas algorithm was
used for the solution of the tridiagonal linear system of algebraic equations. The program ran on
a standard SUN Microsystems SPARC 2 workstation with a run time of approximately 1 second.
When optimized for speed, similar programs used in computational physics and engineering
applications can run even faster for this problem size on comparable hardware.
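For completeness, the forward pass of the finished network can be written directly from the weight
definitions. The sketch below rebuilds the weights for the N = 6 mesh from equations (14), (16),
and (17), taking the exact nodal values e^{x_i} as the coefficients purely for illustration and again
assuming the saturating form of the hard limit, and confirms that the network output reproduces
those coefficients at the knots.

    import numpy as np

    def ffann_forward(x, alpha, theta, c):
        """y_a(x) = sum_q c_q * hard_limit(alpha_q x + theta_q), with the hard
        limit assumed to be the saturating function clip(., -1, 1)."""
        xi = np.outer(np.atleast_1d(x), alpha) + theta
        return np.clip(xi, -1.0, 1.0) @ c

    N = 6
    knots = np.linspace(0.0, 1.0, N)
    h = knots[1] - knots[0]
    ext = np.concatenate(([knots[0] - h], knots, [knots[-1] + h]))  # auxiliary knots
    w = np.exp(knots)                        # illustrative coefficients w_i

    x_im1, x_i, x_ip1 = ext[:-2], ext[1:-1], ext[2:]
    alpha = np.concatenate([2 / (x_i - x_im1), 2 / (x_ip1 - x_i)])          # (16), (17)
    theta = np.concatenate([-2 * x_im1 / (x_i - x_im1) - 1,
                            -2 * x_i / (x_ip1 - x_i) - 1])
    c = np.concatenate([w / 2, -w / 2])                                      # (14)

    # At each knot only the local pair of hard limits is unsaturated, so the
    # network output returns the expansion coefficient w_i there.
    assert np.allclose(ffann_forward(knots, alpha, theta, c), w)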
5.3. Convergence
The error bounds derived for the Bubnov-Galerkin method lead us to expect that the L_2 norm
of the error will be bounded by a quadratic term in the spacing of the knots (grid spacing) [22].
For uniform grid spacing, h = (b - a)/(N - 1). So, for the error E = y - y_a, and with the L_2 norm
defined as

\|E\| = \left( \int_a^b E^2\, dx \right)^{1/2},    (28)

we have

\|E\| \le C h^2, \qquad \text{where } C \text{ is a constant}.    (29)
In practice, the discrete L_2 norm of the error is often used to avoid having to perform the actual
integration in equation (28):

\|E\|_{L_2} \approx \left[ h \sum_{s=1}^{S} E(x_s)^2 \right]^{1/2},

where the integral has been approximated by a scaled summation; integration schemes of higher
accuracy could have been used if deemed necessary. The similarity in the definition of the
discrete L_2 error and the RMS error, when the network is sampled at the knots (S = N), allows
the following relation to be established:

(E)_{\mathrm{RMS}} = \frac{\|E\|_{L_2}}{\sqrt{hS}} = \sqrt{\frac{N-1}{N}}\, \|E\|_{L_2}.    (30)
Substitution of this relation into equation (29) would yield an expected convergence behavior in
the RMS norm for the network; it should be slightly greater than quadratic. The L_2 norm is used
here due to its wide use in function approximation and the simpler expression of equation (30)
in terms of the L_2 norm.
If we take the log of \|E\|, then

\log \|E\| \le \log C + 2 \log h.

Considering the form of equation (30), it would be reasonable to expect that a log plot of the L_2
norm of the error versus the log of the grid spacing would yield a straight line with slope of 2.0.
A suitable convergence plot of this type is shown in Figure 11. The size of the mesh spacing
was halved starting at h = 0.2 to a value of h = 0.00625, with the values marked by circles in
the figure. The line is found to have an actual slope closer to 2.03, showing slightly greater than
expected convergence. This is not completely unusual for some problems, although a slope of 2.0
is the most that can normally be expected from Bubnov-Galerkin utilizing a hat function basis on
an arbitrary problem, barring any attempts to force superconvergence at specific points through
special techniques outside the scope of this investigation.
Figure 11. Convergence of the L_2 error norm with grid spacing h for the first order model problem
(log-log axes).
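The slope quoted for Figure 11 is a least-squares fit in log-log coordinates. A sketch of that
calculation follows; the error values below are placeholders obeying an assumed quadratic decay,
standing in for the measured values plotted in the figure.

    import numpy as np

    # Mesh spacings halved from h = 0.2 down to h = 0.00625, with hypothetical
    # L2 errors obeying ||E|| = C h^2 (placeholders for the measured values).
    h = 0.2 / 2.0 ** np.arange(6)
    err = 0.7 * h ** 2

    slope, intercept = np.polyfit(np.log10(h), np.log10(err), 1)
    # slope ~ 2.0 is the convergence rate; 10**intercept estimates the constant C.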
A slightly more challenging problem of practical interest is the eigenvalue problem [23]. The
eigenvalue problem, familiar to workers in the dynamic systems analysis field, is described by the
following second order linear ordinary differential equation with associated boundary conditions:

\frac{d^2 y}{dx^2} + \lambda_j\, y = 0, \qquad 0 \le x \le 1,    (31)

y(0) = 0, \qquad y(1) = 0.    (32)

The solutions y(\lambda_j) corresponding to the eigenvalues \lambda_j are known as the eigenfunctions. An ad-
ditional condition we impose on the eigenfunctions is that they be orthonormalized, which is
described using the inner product

(y(\lambda_j), y(\lambda_k)) = \delta_{jk}.    (33)
The eigenvalue problem is a good test of the noniterative method, not only because of the higher
order of differentiation, but also because its homogeneous boundary conditions can easily force a
trivial solution (y_a = 0).
Application of the Bubnov-Galerkin method to the problem produces

\left( f_k, \frac{d^2 y_a}{dx^2} + \lambda_j\, y_a \right) = 0, \qquad \text{for } k = 1, \ldots, N.    (34)

Note that since the basis functions being used are linear, the second derivative is uniformly zero.
Therefore, we must reduce the order of the derivatives using integration by parts. We can then
rewrite equation (34) as

\left( \frac{d f_k}{dx}, \frac{d y_a}{dx} \right) - \lambda_j (f_k, y_a)
= \left[ f_k \frac{d y_a}{dx} \right]_0^1, \qquad \text{for } k = 1, \ldots, N.    (35)
Note that the use of integration by parts brings in boundary derivative information (Neumann
conditions). In this example, the value of the dependent variable is known at the boundaries
(Dirichlet conditions). In using hat functions, the boundary derivative term is nonzero only for
k = 1 and N.
Unlike Example 1, the approximate solution y_a of the eigenvalue problem requires a basis ex-
pansion that uses the independent variable x explicitly. This is because the problem with its
associated boundary conditions is such that the substitution of a simple expansion like equa-
tion (23) will lead to a zero known vector (b) and yield only a trivial solution for a nonsingular
matrix [19].
The following expansion is used for y_a:

y_a = \sum_{i=1}^{N} \Phi_i(x)\, w_i,    (36)

where

w_i = C_1 (x_i - r_i).    (37)

Since the x_i are the locations of the knots and the Φ_i act as linear interpolation functions,

\sum_{i=1}^{N} \Phi_i(x)\, x_i = x,
where C_1 is some scaling coefficient to be determined when the approximate solution is ortho-
normalized as in equation (33).
Note that with this expansion,

y_a(0) = C_1 (x_1 - r_1) = 0 \qquad \text{and} \qquad y_a(1) = C_1 (x_N - r_N) = 0,

and, therefore,

r_1 = 0 = b_1 \qquad \text{and} \qquad r_N = 1 = b_N.

This has the effect of creating a nonzero vector b. As a result, a nonsingular coefficient matrix
will yield a unique, nontrivial solution.
Applying the expansions to equation (35), we obtain the following linear system for the r_i,

M\, \mathbf{r} = \mathbf{b}, \qquad b_k = 0, \quad \text{for } k = 2, \ldots, N-1,    (41)

with the first and last rows enforcing the boundary values,

M_{11} = 1, \quad M_{1i} = 0, \quad \text{for } i = 2, \ldots, N, \qquad b_1 = r_1 = 0,

and

M_{NN} = 1, \quad M_{Ni} = 0, \quad \text{for } i = 1, 2, \ldots, N-1, \qquad b_N = r_N = 1.
Orthonormalizing the approximate solution as in equation (33) requires (y_a, y_a) = 1, or

\int_0^1 \left[ C_1 \sum_{i=1}^{N} \Phi_i\, u_i \right] \cdot \left[ C_1 \sum_{i=1}^{N} \Phi_i\, u_i \right] dx = 1,    (42)

where u_i = x_i - r_i, so that w_i = C_1 u_i. But

\Phi_i(x_r) = 1, \qquad \text{for } r = i,    (45)

and zero at every other knot, so the integral can be evaluated directly from the problem mesh and
the u_i. Thus, solving for C_1, we find that

C_1 = \left[ \int_0^1 \left( \sum_{i=1}^{N} \Phi_i\, u_i \right)^2 dx \right]^{-1/2}.
With the r_i and C_1 known, the w_i can be determined. Computing the network output weights
from the w_i completes the construction of the network.
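The hat-function approximation of the eigenpairs can also be checked independently of the network
bookkeeping. The sketch below is not the formulation used above (which fixes λ_j, solves the linear
system (41) for the r_i, and then orthonormalizes); instead it assembles the standard Galerkin
stiffness and mass matrices of the interior hat functions on a uniform mesh and solves the resulting
generalized eigenvalue problem with SciPy, recovering eigenvalues close to (jπ)² and nodal
eigenvectors that already satisfy the orthonormality condition (33).

    import numpy as np
    from scipy.linalg import eigh

    n = 59                                   # interior knots; h = 1 / (n + 1)
    h = 1.0 / (n + 1)

    # Stiffness K_ki = (dPhi_k/dx, dPhi_i/dx) and mass M_ki = (Phi_k, Phi_i)
    # for interior hat functions on a uniform mesh (Dirichlet end knots dropped).
    K = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h
    M = (4.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)) * h / 6.0

    lam, V = eigh(K, M)                      # generalized symmetric eigenproblem
    j = np.arange(1, 7)
    assert np.allclose(np.sqrt(lam[:6]) / np.pi, j, rtol=0.01)

    # Columns of V are M-orthonormal (V.T @ M @ V = I), so the corresponding
    # piecewise-linear eigenfunctions satisfy the orthonormality condition (33)
    # and approximate sqrt(2) * sin(j * pi * x) at the knots, up to sign.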
Results
Equation (31), with its associated boundary and orthonormality conditions, has the following
exact solution:

y(x) = \sqrt{2}\, \sin(j \pi x).
Figures 12-17 compare the sampled outputs of the generated feedforward neural networks (denoted
by triangles) with the exact eigenfunctions (solid line) for integers j = 1 to j = 6. The number of
sampled points varies from 15 to 45. The size of the networks in terms of the number of processing
elements T was kept as low as possible while keeping (E)_{RMS} ≤ 0.01. For instance, for j = 1,
the RMS error was 7.23 × 10^{-5} with T = 24, while for j = 6, the RMS error was 9.84 × 10^{-3}
with T = 90.
The program ran on a standard SUN Microsystems SPARC 2 workstation with a run time of
approximately 1 minute for the largest case of j = 6. Again, this run time could be significantly
shortened with properly optimized code.
Figure 14. Comparison of exact eigenfunctions and network output (△) for j = 3 with T = 44,
S = 35, and RMS = 2.70 × 10^{-3}.

Figure 15. Comparison of exact eigenfunctions and network output (△) for j = 4 with T = 58,
S = 35, and RMS = 5.00 × 10^{-3}.
Convergence Plots
Following the same arguments as for Example 1, we would again expect a logarithmic conver-
gence plot of the error versus the grid spacing to be quadratic in the L_2 norm. As Figure 18
shows, this is indeed the case. The slope of the line is 1.945, or approximately 2.0. The slightly
subquadratic convergence rate is due to the inaccuracies introduced by the process of orthonor-
malizing the solution. The convergence plot shown was produced for an eigennumber j = 1 and
for a sequence of mesh spacings between h = 0.1 and h = 0.00625. As before, the values are
marked with circles on the convergence plot.
6. CONCLUSION
In an effort to both develop more sophisticated engineering analysis software and enhance
understanding of connectionist systems, the popular feedforward artificial neural network was
applied to the solution of linear ordinary differential equations. Observing that even the most
basic FFANN architecture involved a large number of interacting parameters of uncertain effects,
an effort was made to impose constraints on the network system by attempting to assign specific
roles to the various parameters.
An analogy was made between supervised learning and function approximation. Following
this analogy, concepts from function approximation theory were brought to bear on the problem
of parameter determination in the net. It was observed that a clear analogy could be made
Figure 16. Comparison of exact eigenfunctions and network output (△) for j = 5 with T = 70,
S = 35, and RMS = 8.93 × 10^{-3}.

Figure 17. Comparison of exact eigenfunctions and network output (△) for j = 6 with T = 90,
S = 45, and RMS = 9.84 × 10^{-3}.

Figure 18. Convergence of the L_2 error norm with grid spacing h for the eigenvalue problem
(log-log axes).
between the basis functions of approximation theory and the transfer functions of the network.
The output weights of the network were viewed as basis expansion coefficients, and the input and
bias weights were seen as controlling the size and location of the interpolation functions in the
problem space.
Further analysis revealed that hard-limit transfer functions could be easily and economically
employed to represent the first order spline basis function, more commonly known as the hat
function in the function approximation literature. This allows nets to be viewed as straightforward
functional basis expansions for the relationships they are to model, using commonplace basis
functions.
The method of weighted residuals, and one of its variations, the Bubnov-Galerkin method, were
introduced as mathematical algorithms to determine the values of basis expansion coefficients
when approximating the solution to differential equations.
The analogies and equivalences thus uncovered allowed explicit formulae for the input and bias
weights to be formulated. Additionally, they revealed how to transform the results of applying
the Bubnov-Galerkin method into the output weights of a neural network.
Example problems were approached in the following manner: linear ordinary differential equa-
tions were solved using the Bubnov-Galerkin method with hat basis functions and the results used
to construct neural networks. Results of the output of the networks were shown to demonstrate
the accuracy of the approximation.
Thus, it is possible to construct directly and noniteratively a feedforward neural network to
approximate arbitrary linear ordinary differential equations. The methods used are all linear
(O(N)) in storage and processing time. The L_2 norm of the network approximation error de-
creases quadratically with the increasing number of hidden layer neurons. The construction
requires imposing certain constraints on the values of the input, bias, and output weights, and
the attribution of certain roles to each of these parameters.
All results presented used the hard limit transfer function. However, the noniterative approach
should also be applicable to the use of hyperbolic tangents, sigmoids, and radial basis functions.
REFERENCES
1. J. Freeman and D. Skapura, Neural Networks: Algorithms, Applications, and Programming Techniques,
Addison-Wesley, New York, (1991).
2. M. Takeda and J. Goodman, Neural networks for computation: Number representation and programming
complexity, Applied Optics 25 (18), 3033 (1986).
3. E. Barnard and D. Casasent, New optical neural system architectures and applications, Optical Computing 88
963, 537 (1988).
4. L. Wang and J.M. Mendel, Structured trainable networks for matrix algebra, In Proceedings of IEEE
International Joint Conference on Neural Networks, Vol. 2, p. 125, San Diego, (June 1990).
5. H. Lee and I. Kang, Neural algorithms for solving differential equations, Journal of Computational Physics
91, 110 (1990).
6. A.J. Meade, Jr., An application of artificial neural networks to experimental data approximation, AIAA-93-
0408, AIAA Aerospace Sciences Meeting, Reno, NV, (January 1993).
7. S. Omohundro, Efficient algorithms with neural network behaviour, Complex Systems 1, 237 (1987).
8. T. Poggio and F. Girosi, A theory for networks for approximation and learning, A.I. Memo No. 1140,
Artificial Intelligence Laboratory, Massachusetts Institute of Technology, (July 1989).
9. F. Girosi and T. Poggio, Networks for learning: A view from the theory of approximation of functions,
In Proceedings of The Genoa Summer School on Neural Networks and Their Applications, Prentice-Hall,
(1989).
10. G. Cybenko, Approximation by superpositions of a sigmoidal function, Math. Control Signals Systems 2,
303 (1989).
11. Y. Ito, Approximation of functions on a compact set by finite sums of a sigmoid function without scaling,
Neural Networks 4, 817 (1991).
12. L.O. Chua, C.A. Desoer and E.S. Kuh, Linear and Nonlinear Circuits, McGraw-Hill, New York, (1987).
13. P.J. Davis, Interpolation and Approximation, Blaisdell, New York, (1963).
14. P.M. Prenter, Splines and Variational Methods, Wiley, New York, (1989).
15. A. Maren, C. Harston and R. Pap, Handbook of Neural Computing Applications, Academic Press, New
York, (1990).
16. B.A. Finlayson, The Method of Weighted Residuals and Variational Principles, Academic Press, New York,
(1972).
17. P. Lax and B. Wendroff, Systems of conservation laws, Comm. Pure and Applied Mathematics 13, 217
(1960).
18. C.A.J. Fletcher, Computational Galerkin Methods, Springer-Verlag, New York, (1984).
19. G. Strang, Linear Algebra and Its Applications, Second Edition, Academic Press, New York, (1980).
20. D.A. Anderson, J.C. Tannehill and R.H. Pletcher, Computational Fluid Mechanics and Heat Transfer,
Hemisphere Publishing Corporation, New York, (1984).
21. IMSL User’s Manual, MATH/LIBRARY, Version 2, 1991.
22. C. Johnson, Numerical Solution of Partial Differential Equations by the Finite Element Method, Cambridge
University Press, Cambridge, (1990).
23. D. Trim, Applied Partial Differential Equations, PWS-KENT, Boston, (1990).