The Numerical Solution of Linear Ordinary Differential Equations by Feedforward Neural Networks
A. J. Meade, Jr. and A. A. Fernandez
1. INTRODUCTION
The rapidly growing field of connectionism is concerned with parallel, distributed, and adaptive
information processing systems. This includes such tools as genetic learning systems, simulated
annealing systems, associative memories, and fuzzy learning systems. However, the primary tool
of interest is the artificial neural network.
The term Artificial Neural Network (ANN) refers to any of a number of information processing
systems that are more-or-less biologically inspired. Generally speaking, they take the form of
directed-graph type structures [1] whose nodes perform simple mathematical operations on the
data to be processed. Information is represented within them by means of the numerical weights
associated with the links between nodes. The mathematical operations performed by these nodes,
the manner of their connectivity to other nodes, and the flow of information through the structure
is patterned after the general structure of biological nervous systems. Much of the terminology
associated with these systems is also biologically inspired; thus, networks are commonly said to
be “trained,” not programmed, and to “learn” rather than model.
ANNs have proven to be versatile tools for accomplishing what could be termed higher order
tasks such as sensor fusion, pattern recognition, classification, and visual processing. All these
applications are of great interest to the engineering field; however, present ANN applications have
created the opinion that networks are unsuited for use in tasks requiring accuracy and precision,
such as mathematical modelling and the physical analysis of engineering systems. Certainly the
biological underpinnings of the neural network concept suggest networks would perform best on
tasks at which biological systems excel, and worse or not at all at other tasks.
Contrary to this opinion, the authors believe that continued research into the approximation
capabilities of networks will show that ANNs can be as numerically accurate and predictable
as conventional computational methods. It is also believed that in viewing ANNs as numerical
tools, advances in training and analyses of existing architectures can be made. In a more im-
mediate sense, benefits of this approach will enable the neural network paradigm, with all of its
richness of behavior and adaptivity, to be mated to the more purely computational paradigms of
mathematically oriented programming used by engineers in modelling and analysis.
Presently, the most popular application of ANNs in science and engineering is the emulation
of physical processes by a feedforward artificial neural network (FFANN) architecture using a
training algorithm. Because this paradigm requires exposure to numerous input-output sets, it
can become memory intensive and time consuming. Considerable effort may be saved if the
mathematical model of a physical process could be directly and accurately incorporated into the
FFANN architecture without the need of examples, thereby shortening or eliminating the learning
phase.
As a consequence, our efforts concentrate on developing a general noniterative method in which
the FFANN architecture can be used to model accurately the solution to algebraic and differen-
tial equations, using only the equation of interest and the boundary and/or initial conditions. A
FFANN constructed by this noniterative, and numerically efficient, method would be indistin-
guishable from those trained using conventional techniques.
A number of researchers have approached the problem of approximating the solution of equa-
tions, whether algebraic or differential, in a connectionist context. Most of these attempts have
proceeded from an applications oriented viewpoint, where the solution of the equations is viewed
as a new area in which to apply conventional connectionist paradigms. Specifically, the solution
of the equations is transformed into an optimization problem by defining some objective function
and its associated metric. The problem is then placed in a connectionist structure and solutions
pursued by means of an optimization based training algorithm.
The majority of the work in connectionist equation solving is concerned with the solution of
algebraic equations, such as the solution of an arbitrary linear system of equations. Takeda and
Goodman [2] and Barnard and Casasent [3] use variations on the Hopfield network to solve linear
systems of equations. The linear systems to be solved are expressed as the minimization of a
quadratic function, and the weights of the Hopfield net are trained until the function is minimized.
These algorithms have some interesting features, among them inherent high parallelism with very
simple components and the promise of high speeds. However, they are generally computationally
expensive to implement once the training time is taken into account [2], and they often rely on
specialized hardware such as optical implementation to acquire the high speeds. In addition,
representation of real numerical values can be somewhat involved if a discrete Hopfield model is
used, as in [2].
In contrast, Wang and Mendel [4] use a feedforward architecture to solve linear algebra prob-
lems, with connection structure constrained by the nature of the problem being solved. For a
particular matrix or algebraic system, the constrained network is trained to associate the input
(for example, an input matrix A) with the output (the LU decomposition of A). The solution
is then read from the connection weights of the trained network. The authors successfully train
these networks in problems of LU decomposition, linear equation solving, and singular value
decomposition. The algorithm can be implemented on more standard hardware than those based
on the Hopfield model and is easily parallelizable according to the authors.
Lee and Kang [5] have solved first order differential equations using the optimization network
paradigm. The differential equation is first discretized using finite difference techniques, and the
resulting algebraic equations are transformed into an energy function to be minimized by the
net. Unlike the energy functions arising from the formulation of the linear algebraic problem
of [2] and [3], the energy functions to be minimized by the net in the solution of a general partial
differential equation are not guaranteed to be quadratic in nature. Thus, Lee and Kang use
modified Hopfield nets able to minimize arbitrary energy functions. They then formulate the
energy quantity to be minimized for a general order nonlinear partial differential equation on a
unit hypercube, but only the solutions to first order linear differential equations are presented.
The results are quite good, although the solutions produced show a tendency to oscillate about
the exact solution in some fairly simple cases. Lee and Kang suggest an optical implementation
for more complex problems.
The approach of this paper is markedly different from the previously cited works. As stated above,
our efforts concentrate on developing a general, numerically efficient, noniterative method in which
the FFANN architecture is used to model accurately the solution to algebraic and differential
equations, using only the equation of interest and the boundary and/or initial conditions, so that
the constructed FFANN is indistinguishable from one trained using conventional techniques.
Progress in the noniterative approach should also be of interest to the connectionist community,
since the approach may be used as a relatively straightforward method to study the influence
of the various parameters governing the FFANN. When taken from the viewpoint of network
learning, applying an FFANN to the solution of a differential equation can effectively uncouple
the influences of the quality of data samples, network architecture, and transfer functions on
the network approximation performance. Insights provided by this uncoupling may help in the
development of faster and more accurate learning techniques [6]. We have borrowed a technique
from applied mathematics, known as the method of weighted residuals (MWR), and shown how
it can be made to operate directly on the net architecture. The method of weighted residuals is
a generalized method for approximating functions, usually from given differential equations, and
will be covered in detail in Section 4 of this paper.
The development of the noniterative approach to the determination of the FFANN weights can
be likened to the development of a new numerical technique. When evaluating the capabilities of
a new numerical method for science and engineering, it is prudent to apply it first to the solution
of simple equations of practical interest and known behavior. This same approach will be used
in our evaluation of the noniterative method for FFANNs. A simple feedforward network using
a single input and output neuron with a single hidden layer of processing elements utilizing the
hard limit transfer function is constructed to accurately approximate the solutions to first and
second order linear ordinary differential equations. It will also be demonstrated how the error of
the constructed FFANNs can be predicted and controlled.
2. FUNCTION APPROXIMATION
Function approximation theory, which deals with the problem of approximating or interpolating
a continuous, multivariate function, is an approach that has been considered by other researchers
[7-9] to explain supervised learning and ANN behavior in general. There are four sets of pa-
rameters that influence the approximation performance of the simple FFANN architecture of
Figure 1:
(1) the input weights,
(2) the bias weights,
(3) the transfer functions, and
(4) the output weights.
In an effort to provide guidelines in the interpretation of the mathematical roles of each set of
parameters, function approximation theory is investigated.
Figure 1. A simple feedforward network architecture with a single input node, a single hidden
layer of nodes, and a single output node.
The most common method of function approximation is through a functional basis expansion.
A functional basis expansion represents an arbitrary function y(x) as a weighted combination of
linearly independent basis functions Υ_q that are functions of the variable x,

y(x) = \sum_{q=1}^{T} \Upsilon_q(x)\, c_q,    (1)

where T is the total number of basis functions and c_q are the basis coefficients. This is analogous
to representing arbitrary multidimensional vectors by a weighted sum of linearly independent
basis vectors. A common example of a functional basis expansion is the Fourier series, which
uses trigonometric basis functions. Other basis functions can be used as well; the values of the
coefficients c_q will, of course, vary depending on the basis functions selected.
It is important to note that the basis expansion can also be viewed as an interpolation scheme,
where the nature of the interpolation depends on the kind of basis function used. In fact, func-
tion approximation literature often uses the terms “basis functions” and “interpolation functions”
interchangeably, with the former sometimes preferred by mathematicians and the latter by engi-
neers. Therefore, a basis employing linear polynomials can be said to exhibit a linear interpolation
of the approximated function.
Examination of equation (1) reveals it to be a scalar product of the basis functions and ex-
pansion coefficients. Note that the equation makes no mention of the dependence of the basis
functions on any parameter other than the independent variable x. Similarly, the mathematical
operations performed by the simple FFANN of Figure 1 can be interpreted as a scalar product
where the transfer functions act as basis functions and the output weights act as basis expansion
coefficients. This insight is behind many of the attempts to prove that finite sums of transfer
functions, such as sigmoids, can be used to approximate arbitrary continuous functions [10,11].
The link between a FFANN and functional basis expansion assigns a mathematical role both
to the output weights and to the transfer functions, two of the parameter sets mentioned earlier
as affecting the performance of the network.
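This correspondence can be written down directly. The short Python sketch below evaluates the
Figure 1 network at a scalar input as exactly such a scalar product; the weight values and the
hyperbolic tangent are arbitrary placeholders standing in for a generic transfer function, not
values taken from this paper.

    import numpy as np

    def network_output(x, input_w, bias_w, output_w, transfer):
        """Output of the Figure 1 network at a scalar input x.

        Each hidden node applies `transfer` to (input weight * x + bias weight);
        the output node then forms the weighted sum with the output weights,
        i.e., a scalar product of transfer-function values and output weights.
        """
        hidden = transfer(input_w * x + bias_w)
        return np.dot(output_w, hidden)

    # Placeholder weights and a hyperbolic tangent standing in for a generic
    # transfer function (values chosen arbitrarily for illustration).
    input_w  = np.array([1.0, 2.0, -1.5])
    bias_w   = np.array([0.0, -0.5, 1.0])
    output_w = np.array([0.3, -0.2, 0.7])
    y = network_output(0.4, input_w, bias_w, output_w, np.tanh)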
Let us evaluate the potential usefulness of network transfer functions as basis functions. For the
remainder of this paper, we will use the hard limit of Figure 2 as the transfer function. We have
based this choice on the simplicity of the function and the ease by which it can be implemented
in software and hardware [12].
Prenter [14] defines a good polynomial basis as one which allows for a unique solution to the
expansion coefficient problem posed by requiring the expansion to match the approximated
function at N points; these N points are commonly termed knots.
A good polynomial basis should also provide smooth interpolation of the approximated function
between the knots. A particular family of polynomials that provides these characteristics is the
family of Lagrange polynomials.
Prenter examines the behavior of global Lagrange polynomials and discusses certain weaknesses
shared by all global basis functions:
(1) The algebraic systems they generate to solve for the expansion coefficients often suffer
from linear dependence problems as N increases (ill conditioning).
(2) They often suffer from poor approximation (so-called “snaking”) between knots, particu-
larly if the function to be approximated has high gradients, noise, or discontinuities.
Because of the disadvantages of global basis functions, piecewise polynomial splines based on
the Lagrange polynomials are often used. Splines are polynomial curves that extend over a
limited number of knots and which together provide varying degrees of interpolation between
their knots. The most common polynomial spline is the Chapeau or “hat” function as shown
in Figure 3. This is sometimes referred to in the literature as a first-order Lagrange polynomial
(also known as a first order Lagrangian interpolating function or shape function). Its formula is:
\Phi_i(x) =
\begin{cases}
\dfrac{x - x_{i-1}}{x_i - x_{i-1}}, & x_{i-1} \le x \le x_i, \\
\dfrac{x_{i+1} - x}{x_{i+1} - x_i}, & x_i \le x \le x_{i+1}, \\
0, & \text{otherwise}.
\end{cases}    (4)
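These properties are easy to confirm numerically. The following Python sketch (with an
arbitrarily chosen set of knots) implements the hat function defined above and checks that hats
placed one per knot, with two bracketing knots just outside the domain, sum to 1 at every point
of the domain, i.e., reproduce a constant exactly.

    import numpy as np

    def hat(x, xl, xc, xr):
        """Hat function centered at xc and supported on [xl, xr], per eq. (4)."""
        x = np.asarray(x, dtype=float)
        left  = (x - xl) / (xc - xl)
        right = (xr - x) / (xr - xc)
        return np.where((x >= xl) & (x <= xc), left,
               np.where((x > xc) & (x <= xr), right, 0.0))

    # Three knots in [0, 1] plus two bracketing knots just outside the domain
    # (the auxiliary knots introduced later in the text).
    knots = np.array([0.0, 0.5, 1.0])
    all_knots = np.concatenate(([-0.5], knots, [1.5]))

    x = np.linspace(0.0, 1.0, 101)
    total = sum(hat(x, all_knots[i - 1], all_knots[i], all_knots[i + 1])
                for i in range(1, len(all_knots) - 1))
    assert np.allclose(total, 1.0)   # the hats reproduce a constant exactly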
Figure 4 shows, for a problem domain [x_1, x_N] discretized with N = 3 knots (x_1, x_2, and x_3),
how the hat functions must be distributed to provide a linear interpolation. Notice that for any
value of x in the domain, the sum of all hat functions will equal a value of 1. This means that the
hat functions distributed in this manner will represent a constant accurately, which is the first,
and most important, test of a good interpolation scheme. An additional benefit of the distribution
Figure 4. Proper distribution of hat functions along the problem domain (x_1 ≤ x ≤ x_3).
with respect to computational efficiency is that the hat functions are linearly independent. This
will be of importance in the determination of the basis expansion coefficients.
The distribution of the hat functions is justified also on purely mathematical grounds. Con-
sidering that the hat functions are being used as a functional basis, and that the function being
represented has been discretized at N knots, it is clear there are N degrees of freedom for the
function being represented. This implies N dimensions in the function subspace being used to
represent the function, necessitating N hat functions in the domain, one centered at each knot.
This particular polynomial spline does not suffer from the common maladies of the global
Lagrange polynomials. Namely:
(1) The algebraic system generated to evaluate the expansion coefficients is very sparse and
well-conditioned. In this sense, a sparse system means that less than the total number (N)
of unknowns appear in each of the N algebraic equations. This is because the splines exist
only locally in the problem domain and so only a limited number of them are involved in
representing the function at any one value of x.
(2) The interpolation between knots is linear and well-behaved (no snaking) and can easily
handle high gradients or noise.
(3) Since the maximum value of the spline is 1, the values of the expansion coefficients are
identical to the values of the approximated function at the knots.
One drawback, however, is that the approximation is discontinuous in the first derivative. In the
language of functional analysis, this spline is found in C^0[a, b], a subspace of L_2[a, b].
Using the notation of Figure 1, where α_q represents the input weight and θ_q represents the bias
weight, the argument ξ_q of the transfer function of a specific processing element q can be written
in equation form as

\xi_q = \alpha_q x + \theta_q.    (5)

While α_q and θ_q are constants, ξ_q is a variable that is linearly dependent on the input x. By
inspection of equation (5), the input weight α_q appears to act as a scaling coefficient for x.
Alternately, the combination of the input weight and bias can be viewed as transforming the
input variable to a new variable ξ_q, where ξ_q is specific to the qth processing element. This linear
transformation of x into ξ_q is similar to the unscaled shifted rotations of [11].
Define a point on the x-axis, x_q, as the "center" of the transfer function Υ_q such that

\alpha_q x_q + \theta_q = 0 \quad \text{or} \quad \theta_q = -\alpha_q x_q.

From this equation, it is seen that the bias weight allows the origin of each transfer function
(ξ_q = 0) to be set in the independent variable space x. The input and bias weights allow each
transfer function to be scaled differently and centered at different locations in the independent
variable space x (Figure 5).
3.2. Transforming the Transfer Functions of the Neural Network into Splines
From a purely function approximation perspective, the aptitude of the hard limit as a basis
function is relatively low. Since it is a global function, it would be expected to produce a full
system of algebraic equations for the solution of the basis coefficients and would be sensitive to
noise. In addition, considering that the hard limit is zero at its center and has a nonzero value
throughout the remainder of the problem domain, it is expected that the basis coefficients that
will be generated will not have a direct relation to the values of the function being approximated.
This property is often termed as nonlocal representation in the connectionist literature [15].
Nonlocal representation is a condition that is often viewed favorably in the connectionist lit-
erature, as indicating “redundancy,” and is sometimes discussed as a characteristic unique to
networks. In fact, it is a common feature of any functional basis expansion utilizing global basis
functions, and its net effect is often negative as it couples the values of all the basis coefficients
in the approximation of a function, making the system more difficult to solve and complicating
the imposition of boundary conditions.
A considerable number of advantages, as discussed in Section 2.2, may be obtained if the hard
limit is transformed into the hat function. Consider a one-dimensional domain as in Figure 6,
with two hard limit functions centered in neighboring intervals as shown. If the function centered
between xi and xi+1 is multiplied by -1 and added to the second unaltered function (Figure 7),
it is clear by inspection that the result will be a hat function with a maximum value of 2.
Additionally, if the hard limits in the figure are scaled between -0.5 and +0.5 by suitable output
weights, the resulting hat function is indistinguishable from that used in the function approxi-
mation literature. The transformation of hard limits to hat functions can be described by the
following equation:
\Upsilon_i^A - \Upsilon_i^B = 2\,\Phi_i, \qquad i = 1, 2, \ldots, N,

where the superscripts of the hard limits, A and B, refer to the adjoining intervals x_{i-1} < x ≤ x_i
and x_i ≤ x < x_{i+1}, respectively. The functions Υ_i^A and Υ_i^B are defined as zero at the midpoints
of their respective intervals.
of their respective intervals. A consequence of this formulation is that the number of hard limits
can be linked to the number of hat functions desired. We will require twice as many hard limits
as hat functions (T = 2N).
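The construction can be checked numerically. In the sketch below, the hard limit of Figure 2 is
assumed to be the symmetric saturating function that equals -1 below its linear range, +1 above
it, and is linear (and zero at the interval midpoint) in between; subtracting the hard limit of the
right interval from that of the left interval then reproduces a hat of height 2 that vanishes outside
the two intervals.

    import numpy as np

    def hard_limit(xi):
        """Assumed form of the hard limit of Figure 2: linear between -1 and +1,
        saturated outside that range (an assumption of this sketch)."""
        return np.clip(xi, -1.0, 1.0)

    # Adjoining intervals [x_{i-1}, x_i] and [x_i, x_{i+1}] (arbitrary values).
    xm1, xc, xp1 = 0.2, 0.5, 0.9
    x = np.linspace(0.0, 1.0, 201)

    # Arguments running linearly from -1 to +1 across each interval, so that
    # each hard limit is zero at the midpoint of its interval.
    xi_A = 2.0 * (x - xm1) / (xc - xm1) - 1.0
    xi_B = 2.0 * (x - xc) / (xp1 - xc) - 1.0

    ups_A = hard_limit(xi_A)            # hard limit of the left interval
    ups_B = hard_limit(xi_B)            # right-interval hard limit, to be flipped

    hat_times_two = ups_A - ups_B       # equals 2 * Phi_i(x)
    assert np.isclose(hat_times_two[np.argmin(np.abs(x - xc))], 2.0)
    assert np.allclose(hat_times_two[(x < xm1) | (x > xp1)], 0.0)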
Figure 6. Constrained distribution of hard limits along the x-axis using the input and bias weights.
Figure 7. Right side hard limit (dashed) flipped by multiplying by a negative output weight.
Addition of the two hard limits creates a hat function. Refer to Figure 3.
Let us now derive the remaining specific constraints required to equate the hard limit function
representation (equation (1)) to one using the hat functions
y_a(x) = \sum_{q=1}^{T} \Upsilon_q(x)\, c_q = \sum_{i=1}^{N} \Phi_i(x)\, w_i,    (7)

where the hat functions Φ_i are defined as having a value of 1 at the knot x_i, as per equation (4),
and the w_i are the basis coefficients associated with the hat functions.
Let us discretize the problem domain [a, b] using the knots x_i in the following manner:

a = x_1 < x_2 < \cdots < x_N = b.

We will label this discretization as the problem mesh. To generate the appropriate hat functions
at the boundaries, two auxiliary knots x_0 and x_{N+1} are required such that x_0 < a and x_{N+1} > b.
Splitting the hard limit representation of equation (1) into its two groups of N processing
elements gives

y_a(x) = \sum_{q=1}^{N} \Upsilon_q(x)\, c_q + \sum_{q=N+1}^{2N} \Upsilon_q(x)\, c_q,

where the summations of the right-hand side can be rewritten without loss of generality as

\sum_{q=1}^{N} \Upsilon_q(x)\, c_q = \sum_{i=1}^{N} \Upsilon_i^A(x)\, u_i, \qquad
\sum_{q=N+1}^{2N} \Upsilon_q(x)\, c_q = \sum_{i=1}^{N} \Upsilon_i^B(x)\, v_i,    (9)

therefore,

y_a(x) = \sum_{i=1}^{N} \left[ \Upsilon_i^A(x)\, u_i + \Upsilon_i^B(x)\, v_i \right],

where, constraining each pair of output weights so that v_i = -u_i and writing w_i = 2 u_i,

y_a(x) = \sum_{i=1}^{N} \frac{w_i}{2} \left[ \Upsilon_i^A(x) - \Upsilon_i^B(x) \right]
       = \sum_{i=1}^{N} \Phi_i(x)\, w_i,    (13)

which is the basis expansion for y_a(x) in terms of hat functions (equation (7)).
It is relatively straightforward to derive actual formulae for the input, bias, and output weights.
From equations (9) and (13), the output weights are given by

c_i = u_i = \frac{w_i}{2}, \qquad (i = 1, 2, \ldots, N),

c_{i+N} = v_i = -\frac{w_i}{2}, \qquad (i = 1, 2, \ldots, N),    (14)
where the numbering of the weights with respect to the actual net architecture can be in any
order. The result is simple. The ith pair of the T = 2N output weights in the FFANN must be
set to the values u_i and v_i, that is, w_i/2 and -w_i/2 respectively, of the ith basis expansion
coefficient w_i, for i = 1, 2, ..., N.
To derive the input and bias weight formulae, refer again to Figures 3, 5-7 and also to equations
(4), (11), and (12). It is clear from Figure 6 that the hard limits must be placed in the problem
space in such a manner that their linear behavior (ξ_i^A(x) and ξ_i^B(x), respectively) occurs across
the appropriate interval. Therefore, by inspection,

\xi_i^A(x) = \frac{2(x - x_{i-1})}{x_i - x_{i-1}} - 1, \qquad x_{i-1} \le x \le x_i,

\xi_i^B(x) = \frac{2(x - x_i)}{x_{i+1} - x_i} - 1, \qquad x_i \le x \le x_{i+1},

i = 1, 2, \ldots, N.    (15)
Since ξ_i^A(x) = α_i^A x + θ_i^A and ξ_i^B(x) = α_i^B x + θ_i^B, we can calculate the values of the
input and bias weights using equation (15). Therefore,

\alpha_i^A = \frac{2}{x_i - x_{i-1}}, \qquad \theta_i^A = -\frac{2 x_{i-1}}{x_i - x_{i-1}} - 1,    (16)

and

\alpha_i^B = \frac{2}{x_{i+1} - x_i}, \qquad \theta_i^B = -\frac{2 x_i}{x_{i+1} - x_i} - 1.    (17)

To summarize the results of this section:
(1) Two hard limits can be added to form a hat function spline with a maximum value of two.
With appropriate scaling, the canonical hat function can be formed.
(2) By using T = 2N hard limits, where N is the number of hat functions and knots in
the problem domain [a, b], it is shown that by appropriately constraining the input, bias,
and output weights, the feedforward network can be made to form a hat function based
functional basis expansion everywhere in the problem domain.
(3) The input and bias weights can be computed from equations (16) and (17) given the
problem mesh.
(4) The values of the expansion coefficients w_i can be evaluated using the hat function basis Φ.
This eliminates all the problems associated with using the hard limits as basis functions
directly, while preserving the architecture of the net. From equation (14), the values
of the coefficients w_i are used to calculate the output weights c_q that will complete the
construction of the feedforward network, as sketched below.
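Given a problem mesh and the expansion coefficients w_i, the constraints summarized above fix
every weight in the network. A minimal Python sketch of that bookkeeping (our variable names;
the formulae are those of equations (14), (16), and (17)) is:

    import numpy as np

    def build_ffann_weights(knots_with_aux, w):
        """Input, bias, and output weights of the T = 2N network.

        knots_with_aux : the N knots x_1 < ... < x_N with the two auxiliary
                         knots x_0 and x_{N+1} prepended and appended.
        w              : hat-function expansion coefficients w_i (length N).
        """
        x = np.asarray(knots_with_aux, dtype=float)
        w = np.asarray(w, dtype=float)
        assert len(x) == len(w) + 2

        x_im1, x_i, x_ip1 = x[:-2], x[1:-1], x[2:]

        # Input and bias weights, equations (16) and (17).
        alpha_A = 2.0 / (x_i - x_im1)
        theta_A = -2.0 * x_im1 / (x_i - x_im1) - 1.0
        alpha_B = 2.0 / (x_ip1 - x_i)
        theta_B = -2.0 * x_i / (x_ip1 - x_i) - 1.0

        # Output weights, equation (14): u_i = w_i / 2 and v_i = -w_i / 2.
        c = np.concatenate([w / 2.0, -w / 2.0])

        alpha = np.concatenate([alpha_A, alpha_B])
        theta = np.concatenate([theta_A, theta_B])
        return alpha, theta, c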
Figure 8 shows the resulting architecture for six processing elements used to form three hat
functions for three knots plus two auxiliary knots, as in Figure 4. Figure 9 shows an alternate
network architecture that can be constructed using two hidden layers, where the second hidden
layer has linear transfer functions. Although the number of processing elements T is almost
the same for this architecture (T = 2N + 1), only N + 1 of the processing elements have hard
limit transfer functions associated with them. Note also the connections between the first and
second hidden layer are local and repeating. The economy on the use of hard limits and the local
connections between the first and second hidden layer should make this architecture particularly
attractive for hardware implementation. For the remainder of this paper, we will refer to the
first architecture, with the understanding that the transformation to the second architecture is
trivial.
The implication of Section 2 was that estimation of the weights of a FFANN (“training”) could
be viewed as the equivalent problem of finding a suitable basis expansion for the relationship to
be modelled by the net. This section adds the additional consideration that for suitable transfer
functions and constraints on the input and bias weights, the basis expansion being sought can be
a classical expansion in polynomial splines. Thus, training in the traditional sense is eliminated
in this approach. The construction of the network has been reduced to a question of finding
the appropriate basis expansion coefficients wi and then setting the values of the pairs of output
weights ui and vi from equation (14). The values of the input and bias weights are fixed for a
given number of knots in the problem domain and are effectively uncoupled from the problem of
finding the output weights.
The computation of the expansion coefficients wi can be made from a variety of numerical
methods. As mentioned in Section 1, we will use the method of weighted residuals [16] to
determine the output weight values.
4. THE METHOD OF WEIGHTED RESIDUALS

Substitution of an approximate solution of the form

y_a(s, t) = \sum_{i=1}^{N} \Phi_i(s)\, w_i(t)    (18)

into the time dependent governing differential equation L(y) = g(s, t) results in the equation
L(y_a) - g(s, t) = R(w_1, \ldots, w_N, s, t), where R is some function of nonzero value that can be
described as the error, or the differential equation residual. In addition, if equation (18) is subject
to the initial and boundary conditions of the governing differential equation, analogous initial and
boundary condition residuals, R_I and R_B, may be written.
In principle, basis coefficients w_i(t) can be found so that R becomes small in terms of some norm
over the problem domain D as N → ∞.
Let us require that y_a satisfy the initial and boundary conditions exactly so that R_I = R_B = 0.
The basis coefficients w_i(t) that satisfy the differential equation can be determined by requiring
that the equation residual R be multiplied, or weighted, by a function f(s) (often called the weight
function or test function), integrated over the problem domain D, and set to zero,

\int_{D} f\, R\, dD = (f, R) = 0,    (19)
where (f, R) is the inner product of functions f and R. It is from equation (19) that the method
of weighted residuals derives its name. It may be noted that equation (19) is closely related to
the weak form of the governing equation,

(f, L(y_a)) = (f, g).    (20)
This relation to the weak form has the benefit of allowing discontinuities in the exact solu-
tion [17], which is particularly advantageous when approximating the solution to problems with
large gradients or discontinuities.
Since linearly independent relationships are needed to solve for the basis coefficients of equa-
tion (19), it is clear that f must be made up of linearly independent functions f_k. By letting
k = 1, ..., N, a system of N equations for the basis coefficients is generated. For a time dependent
case, a system of ordinary differential equations in t results. For the steady-state case, the
basis coefficients are constants and a system of algebraic equations is generated.
Different choices of the weight function f_k give rise to different computational methods that
are subclasses of MWR. Some of these methods are:
(1) The subdomain method. The computational domain is discretized into N subdomains D_k,
which may overlap, where

f_k = \begin{cases} 1, & s \in D_k, \\ 0, & s \notin D_k. \end{cases}

The subdomain method is identical to the finite volume method when evaluating equa-
tion (19). With the subdomain method, equations (18) and (20) provide the appropriate
framework for enforcing conservation properties in the governing equation, both locally
and globally.
(2) The collocation method. The weight functions are set to

f_k = \delta(s - s_k), \qquad \text{for } k = 1, \ldots, N,

where δ is the Dirac delta function. Substitution of this relation into equation (19) gives

R(w_1, \ldots, w_N, s_k, t) = 0, \qquad \text{for } k = 1, \ldots, N.

Since the finite difference method requires the solution of the differential equation only
at nodal points, one can interpret the finite difference method as a collocation method
without the use of an approximate solution y_a.
(3) The method of moments. The weight functions are chosen from a set of linearly indepen-
dent functions such that successively higher moments of the equation residual are required
to be zero. For a one-dimensional problem, we have

\int_a^b x^k R\, dx = 0, \qquad \text{for } k = 0, \ldots, N-1.
(4) The least squares method. The weight functions are set to

f_k = \frac{\partial R}{\partial w_k}, \qquad \text{for } k = 1, \ldots, N,

where w_k are the basis coefficients. The application of f_k to equation (19) is identical to
finding the minimum of the square of the residual summed over the problem domain, i.e.,

\frac{\partial}{\partial w_k} \int_{D} R^2\, dD = 0.
(5) The generalized Galerkin method. The weight functions are set to

f_k = g_k(s), \qquad \text{for } k = 1, \ldots, N,

where the g_k(s) are analytic functions similar to the bases, but modified with additional
terms to satisfy the boundary and/or initial conditions. The finite element and spectral
methods can be considered subclasses of the generalized Galerkin method. Galerkin based
methods are considered particularly accurate if basis functions are the first N members of a
complete set, since equation (19) indicates that the residual is orthogonal to every member
of the complete set. Consequently, as N tends to infinity, the approximate solution ya will
converge to the exact solution y.
It should also be noted that MWR is not limited to cases where initial and boundary conditions
are satisfied exactly (R_I = R_B = 0). If we require that only the differential equation be satisfied
exactly (R = 0), we may use MWR to derive boundary methods such as the panel and boundary
element methods. The various subclasses of the method of weighted residuals are discussed and
compared at great length by Finlayson [16] and Fletcher [18].
By approaching the evaluation of the output weights through the method of weighted residuals,
we have access to both theoretical and computational results from the most commonly used
techniques in the fields of scientific and engineering computing. Each method has advantages
and disadvantages depending on the particular application. For this paper, we will make use
of the generalized Galerkin technique following the observation made by Fletcher that “. . . the
Galerkin method produces results of consistently high accuracy and has a breadth of application
as wide as any method of weighted residuals” [18, p. 38].
Specifically, the weight functions f_k(s) will be identical to the basis functions used to describe the
approximation y_a. This is known as the Bubnov-Galerkin technique [18]:

f_k = \Phi_k(s), \qquad \text{for } k = 1, \ldots, N.
With f_k so defined, evaluation of the integrals leads to a system of algebraic equations that
is usually linear if the differential equation to be approximated is linear. Otherwise, the algebraic
equations are nonlinear, though still tractable. These equations can be evaluated for a unique
solution through standard techniques [19], allowing the evaluation of both linear and nonlinear
differential equations.
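The step from equation (19) to an algebraic system can be made explicit with a short sketch. The
routine below assembles the matrix of Bubnov-Galerkin inner products for a generic linear operator
by simple trapezoidal quadrature; the names basis, d_basis, and L_apply are placeholders of this
sketch, and the actual computations reported in Section 5 evaluate the integrals with an adaptive
quadrature routine instead.

    import numpy as np

    def galerkin_matrix(basis, d_basis, L_apply, a, b, n_quad=2001):
        """M[k, i] = (Phi_k, L(Phi_i)) over [a, b] by the trapezoidal rule.

        basis, d_basis    : lists of callables Phi_i(x) and dPhi_i/dx(x)
        L_apply(p, dp, x) : values of the operator applied to one basis
                            function (with derivative dp) at the points x
        """
        x = np.linspace(a, b, n_quad)
        dx = x[1] - x[0]
        N = len(basis)
        M = np.empty((N, N))
        for k in range(N):
            fk = basis[k](x)                 # Bubnov-Galerkin weight f_k = Phi_k
            for i in range(N):
                g = fk * L_apply(basis[i], d_basis[i], x)
                M[k, i] = np.sum(0.5 * (g[:-1] + g[1:])) * dx
        return M

    # For the first order operator of the next section, L(y) = dy/dx - y:
    # M = galerkin_matrix(basis, d_basis, lambda p, dp, x: dp(x) - p(x), 0.0, 1.0)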
5. MODEL PROBLEMS
The most logical and practical model problem in which to first test the accuracy and conver-
gence properties of the network methodology is a first order linear ordinary differential equation
with its associated initial condition

\frac{dy}{dx} - y = 0, \qquad y(x = 0) = 1, \qquad 0 \le x \le 1.    (22)
This equation was also used as an example by [5]. A simple feedforward network is constructed
to approximate the nontrivial solution to equation (22). We select:
(1) A single input processing element using a linear transfer function for the independent
variable x.
(2) A single output processing element using a linear transfer function for the dependent
variable approximation ya.
(3) A number of hidden layer processing elements using the hard limit transfer function.
(4) A single bias node connected to each of the hidden layer processing elements.
Applying the results of the discussion in Section 3.3, we select the number of knots N to dis-
tribute in the problem domain and determine the values of the input and bias weight sets α_i^A, α_i^B,
θ_i^A, and θ_i^B, respectively. Using the hat basis functions Φ, we can now represent the approximation
of the dependent function,

y_a = \sum_{i=1}^{N} \Phi_i(x)\, w_i.    (23)
What remains to complete the network are the values of the output weights and, consequently,
the values of the coefficients wi.
Substituting equation (23) into the model equation results in

\frac{dy_a}{dx} - y_a = R.    (24)

Applying the Bubnov-Galerkin method, we require (\Phi_k, R) = 0, which yields

\sum_{i=1}^{N} \left( \Phi_k, \frac{d\Phi_i}{dx} \right) w_i = \sum_{i=1}^{N} (\Phi_k, \Phi_i)\, w_i,
\qquad \text{for } k = 1, \ldots, N.    (26)
Evaluating the inner products, we arrive at the following system of linear algebraic equations:

M\, \mathbf{w} = \mathbf{b},    (27)

where

M_{ki} = \left( \Phi_k, \frac{d\Phi_i}{dx} - \Phi_i \right) \quad \text{and} \quad b_k = 0,
\qquad \text{for } k = 1, \ldots, N.
An initial condition is required for a nontrivial solution to the model equation (22). Using our
approximation of y_a,

y_a(0) = \sum_{i=1}^{N} \Phi_i(0)\, w_i = y(0),

which, since only Φ_1 is nonzero at x = 0, yields the equation

w_1 = y(0) = 1.

This equation can be incorporated into the coefficient matrix M and the vector b of equation (27)
by replacing the first row with M_{11} = 1, M_{1i} = 0 (i = 2, ..., N), and b_1 = 1.
Solution of the modified algebraic system provides the basis expansion coefficients wi for the
differential equation being approximated.
It was mentioned earlier that the type of coefficient matrix created by the weighted residual
method (or specifically the Bubnov-Galerkin method) depended on the type of basis functions
employed. Analysis of the integral forming the coefficient matrix M of equation (27) shows that
M_{ki} = \left( \Phi_k, \frac{d\Phi_i}{dx} - \Phi_i \right) = 0,
\qquad \text{for } i < k-1 \text{ and } i > k+1.
This is due to the local nature of the hat functions as well as their distribution in the problem
domain and linear independence. The original matrix M of equation (27) is thus positive definite
and tridiagonal. Modification of the matrix by incorporation of the initial condition does not alter
these advantages. As a result, the modified algebraic system can be solved in O(N) operations
by the Thomas algorithm [20]. With the expansion coefficients known, the output weights of the
neural network can be easily computed by equation (14).
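The entire computation for the model problem is compact enough to sketch end to end. The
Python fragment below (an illustration, not a reproduction of the authors' Fortran program) uses
the standard analytic inner products of hat functions on a uniform mesh, one reasonable half-
support treatment of the boundary row at x = 1, the initial-condition row w_1 = 1, and the Thomas
algorithm without pivoting; with this simplified boundary treatment the coefficients track e^{x_i}
to within a few percent on the coarse N = 6 mesh and converge quadratically as N grows.

    import numpy as np

    def thomas(lower, diag, upper, rhs):
        """Solve a tridiagonal system in O(N) operations (Thomas algorithm).

        lower[i] multiplies w[i] in row i+1; upper[i] multiplies w[i+1] in row i.
        """
        n = len(diag)
        d = np.array(diag, dtype=float)
        r = np.array(rhs, dtype=float)
        for i in range(1, n):
            m = lower[i - 1] / d[i - 1]
            d[i] -= m * upper[i - 1]
            r[i] -= m * r[i - 1]
        w = np.empty(n)
        w[-1] = r[-1] / d[-1]
        for i in range(n - 2, -1, -1):
            w[i] = (r[i] - upper[i] * w[i + 1]) / d[i]
        return w

    # Uniform mesh of N knots on [0, 1], as in the example above (N = 6).
    N = 6
    x = np.linspace(0.0, 1.0, N)
    h = x[1] - x[0]

    # Galerkin rows built from the analytic hat-function inner products
    # (Phi_k, dPhi_i/dx) and (Phi_k, Phi_i) over [0, 1]; the last row uses the
    # half support of the boundary hat (one reasonable treatment).
    lower = np.full(N - 1, -0.5 - h / 6.0)   # M[k, k-1]
    diag  = np.full(N, -2.0 * h / 3.0)       # M[k, k], interior rows
    upper = np.full(N - 1,  0.5 - h / 6.0)   # M[k, k+1]
    rhs   = np.zeros(N)
    diag[-1] = 0.5 - h / 3.0                 # boundary row k = N

    # Incorporate the initial condition w_1 = y(0) = 1 into the first row.
    diag[0], upper[0], rhs[0] = 1.0, 0.0, 1.0

    w = thomas(lower, diag, upper, rhs)      # expansion coefficients w_i
    c = np.concatenate([w / 2.0, -w / 2.0])  # output weights, equation (14)

    # With this coarse mesh and simplified boundary row, the coefficients
    # agree with the exact nodal values e^{x_i} to within a few percent.
    assert np.max(np.abs(w - np.exp(x))) < 0.1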
The determination of the output weights completes the specification of the parameters for the
feedforward network that models equation (22). The accuracy of the output can be controlled by
increasing the number of knots in the problem domain, thereby creating additional hat functions.
This then translates into additional neurons.
5.2. Results
The exact solution of equation (22) with the initial condition y(0) = 1 is

y = e^x.
The feedforward network was constructed using 12 processing elements with a single hidden layer,
corresponding to an even spacing of six hat functions from six knots within the problem domain
and two auxiliary knots. Figure 10 compares the output of the constructed feedforward neural
network (denoted by triangles) with the exact solution (solid line) at 21 equally spaced sample
points.
Figure 10. Comparison of exact solution and network output (△) at S = 21 sample points with
T = 12 processing elements. RMS = 5.60 × 10^{-5}.
The RMS error of the network is 5.60 × 10^{-5}, where the RMS error is defined as

(E)_{\mathrm{RMS}} = \left[ \frac{1}{S} \sum_{s=1}^{S} \left( y(x_s) - y_a(x_s) \right)^2 \right]^{1/2},
and where S is the number of samples for the evaluation of the error. The lack of snaking of
the results about the exact solution can be attributed to the constraints on the input and bias
weights to guarantee linear interpolation, and to the use of the Bubnov-Galerkin method that
minimizes the error across the problem domain and not just at discrete points.
The input, bias, and output weights for the network were computed via a conventional Fortran
code. All computations were done in double precision. The IMSL subroutine DQDAG [21] was
used for numerical integration of the coefficient matrix components and the Thomas algorithm was
used for the solution of the tridiagonal linear system of algebraic equations. The program ran on
a standard SUN Microsystems SPARC 2 workstation with a run time of approximately 1 second.
When optimized for speed, similar programs used in computational physics and engineering
applications can run even faster for this problem size on comparable hardware.
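For completeness, the forward pass of the finished network can be written directly from the weight
definitions. The sketch below rebuilds the weights for the N = 6 mesh from equations (14), (16),
and (17), taking the exact nodal values e^{x_i} as the coefficients purely for illustration and again
assuming the saturating form of the hard limit, and confirms that the network output reproduces
those coefficients at the knots.

    import numpy as np

    def ffann_forward(x, alpha, theta, c):
        """y_a(x) = sum_q c_q * hard_limit(alpha_q x + theta_q), with the hard
        limit assumed to be the saturating function clip(., -1, 1)."""
        xi = np.outer(np.atleast_1d(x), alpha) + theta
        return np.clip(xi, -1.0, 1.0) @ c

    N = 6
    knots = np.linspace(0.0, 1.0, N)
    h = knots[1] - knots[0]
    ext = np.concatenate(([knots[0] - h], knots, [knots[-1] + h]))  # auxiliary knots
    w = np.exp(knots)                        # illustrative coefficients w_i

    x_im1, x_i, x_ip1 = ext[:-2], ext[1:-1], ext[2:]
    alpha = np.concatenate([2 / (x_i - x_im1), 2 / (x_ip1 - x_i)])          # (16), (17)
    theta = np.concatenate([-2 * x_im1 / (x_i - x_im1) - 1,
                            -2 * x_i / (x_ip1 - x_i) - 1])
    c = np.concatenate([w / 2, -w / 2])                                      # (14)

    # At each knot only the local pair of hard limits is unsaturated, so the
    # network output returns the expansion coefficient w_i there.
    assert np.allclose(ffann_forward(knots, alpha, theta, c), w)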
5.3. Convergence
The error bounds derived for the Bubnov-Galerkin method lead us to expect that the L_2 norm
of the error will be bounded by a quadratic term in the spacing of the knots (grid spacing) [22].
For uniform grid spacing, h = (b - a)/(N - 1). So, for the error E = y - y_a, and with the L_2 norm
defined as

\|E\| = \left( \int_a^b E^2\, dx \right)^{1/2},    (28)

we have

\|E\| \le C h^2, \qquad \text{where } C \text{ is a constant}.    (29)
In practice, the discrete L_2 norm of the error is often used to avoid having to perform the actual
integration in equation (28):

\|E\|_{L_2} \approx \left[ h \sum_{s=1}^{S} E(x_s)^2 \right]^{1/2},

where the integral has been approximated by a scaled summation; integration schemes of higher
accuracy could have been used if deemed necessary. The similarity in the definition of the
discrete L_2 error and the RMS error, when the network is sampled at the knots (S = N), allows
the following relation to be established:

(E)_{\mathrm{RMS}} = \frac{\|E\|_{L_2}}{\sqrt{hS}} = \sqrt{\frac{N-1}{N}}\, \|E\|_{L_2}.    (30)
Substitution of this relation into equation (29) would yield an expected convergence behavior in
the RMS norm for the network; it should be slightly greater than quadratic. The L_2 norm is used
here due to its wide use in function approximation and the simpler expression of equation (30)
in terms of the L_2 norm.
If we take the log of \|E\|, then

\log \|E\| \le \log C + 2 \log h.

Considering the form of equation (30), it would be reasonable to expect that a log plot of the L_2
norm of the error versus the log of the grid spacing would yield a straight line with slope of 2.0.
A suitable convergence plot of this type is shown in Figure 11. The size of the mesh spacing
was halved starting at h = 0.2 to a value of h = 0.00625, with the values marked by circles in
the figure. The line is found to have an actual slope closer to 2.03, showing slightly greater than
expected convergence. This is not completely unusual for some problems, although a slope of 2.0
is the most that can normally be expected from Bubnov-Galerkin utilizing a hat function basis on
an arbitrary problem, barring any attempts to force superconvergence at specific points through
special techniques outside the scope of this investigation.
Figure 11. Convergence of the L_2 error norm with grid spacing h for the first order model problem
(log-log axes).
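The slope quoted for Figure 11 is a least-squares fit in log-log coordinates. A sketch of that
calculation follows; the error values below are placeholders obeying an assumed quadratic decay,
standing in for the measured values plotted in the figure.

    import numpy as np

    # Mesh spacings halved from h = 0.2 down to h = 0.00625, with hypothetical
    # L2 errors obeying ||E|| = C h^2 (placeholders for the measured values).
    h = 0.2 / 2.0 ** np.arange(6)
    err = 0.7 * h ** 2

    slope, intercept = np.polyfit(np.log10(h), np.log10(err), 1)
    # slope ~ 2.0 is the convergence rate; 10**intercept estimates the constant C.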
A slightly more challenging problem of practical interest is the eigenvalue problem [23]. The
eigenvalue problem, familiar to workers in the dynamic systems analysis field, is described by the
following second order linear ordinary differential equation with associated boundary conditions:

\frac{d^2 y}{dx^2} + \lambda_j\, y = 0, \qquad 0 \le x \le 1,    (31)

y(0) = 0, \qquad y(1) = 0.    (32)

The solutions y(\lambda_j) corresponding to the eigenvalues \lambda_j are known as the eigenfunctions. An ad-
ditional condition we impose on the eigenfunctions is that they be orthonormalized, which is
described using the inner product

(y(\lambda_j), y(\lambda_k)) = \delta_{jk}.    (33)
The eigenvalue problem is a good test of the noniterative method, not only because of the higher
order of differentiation, but also because its homogeneous boundary conditions can easily force a
trivial solution (y_a = 0).
Application of the Bubnov-Galerkin method to the problem produces

\left( f_k, \frac{d^2 y_a}{dx^2} + \lambda_j\, y_a \right) = 0, \qquad \text{for } k = 1, \ldots, N.    (34)

Note that since the basis functions being used are linear, the second derivative is uniformly zero.
Therefore, we must reduce the order of the derivatives using integration by parts. We can then
rewrite equation (34) as

\left( \frac{d f_k}{dx}, \frac{d y_a}{dx} \right) - \lambda_j (f_k, y_a)
= \left[ f_k \frac{d y_a}{dx} \right]_0^1, \qquad \text{for } k = 1, \ldots, N.    (35)
Note that the use of integration by parts brings in boundary derivative information (Neumann
conditions). In this example, the value of the dependent variable is known at the boundaries
(Dirichlet conditions). In using hat functions, the boundary derivative term is nonzero only for
k = 1 and N.
Unlike Example 1, the approximate solution y_a of the eigenvalue problem requires a basis ex-
pansion that uses the independent variable x explicitly. This is because the problem with its
associated boundary conditions is such that the substitution of a simple expansion like equa-
tion (23) will lead to a zero known vector (b) and yield only a trivial solution for a nonsingular
matrix [19].
The following expansion is used for y_a:

y_a = \sum_{i=1}^{N} \Phi_i(x)\, w_i,    (36)

where

w_i = C_1 (x_i - r_i).    (37)

Since the x_i are the locations of the knots and the Φ_i act as linear interpolation functions,

\sum_{i=1}^{N} \Phi_i(x)\, x_i = x,
where C_1 is some scaling coefficient to be determined when the approximate solution is ortho-
normalized as in equation (33).
Note that with this expansion,

y_a(0) = C_1 (x_1 - r_1) = 0 \qquad \text{and} \qquad y_a(1) = C_1 (x_N - r_N) = 0,

and, therefore,

r_1 = 0 = b_1 \qquad \text{and} \qquad r_N = 1 = b_N.

This has the effect of creating a nonzero vector b. As a result, a nonsingular coefficient matrix
will yield a unique, nontrivial solution.
Applying the expansions to equation (35), we obtain the following linear system for the r_i,

M\, \mathbf{r} = \mathbf{b}, \qquad b_k = 0, \quad \text{for } k = 2, \ldots, N-1,    (41)

with the first and last rows enforcing the boundary values,

M_{11} = 1, \quad M_{1i} = 0, \quad \text{for } i = 2, \ldots, N, \qquad b_1 = r_1 = 0,

and

M_{NN} = 1, \quad M_{Ni} = 0, \quad \text{for } i = 1, 2, \ldots, N-1, \qquad b_N = r_N = 1.
Orthonormalizing the approximate solution as in equation (33) requires (y_a, y_a) = 1, or

\int_0^1 \left[ C_1 \sum_{i=1}^{N} \Phi_i\, u_i \right] \cdot \left[ C_1 \sum_{i=1}^{N} \Phi_i\, u_i \right] dx = 1,    (42)

where u_i = x_i - r_i, so that w_i = C_1 u_i. But

\Phi_i(x_r) = 1, \qquad \text{for } r = i,    (45)

and zero at every other knot, so the integral can be evaluated directly from the problem mesh and
the u_i. Thus, solving for C_1, we find that

C_1 = \left[ \int_0^1 \left( \sum_{i=1}^{N} \Phi_i\, u_i \right)^2 dx \right]^{-1/2}.
With the r_i and C_1 known, the w_i can be determined. Computing the network output weights
from the w_i completes the construction of the network.
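The hat-function approximation of the eigenpairs can also be checked independently of the network
bookkeeping. The sketch below is not the formulation used above (which fixes λ_j, solves the linear
system (41) for the r_i, and then orthonormalizes); instead it assembles the standard Galerkin
stiffness and mass matrices of the interior hat functions on a uniform mesh and solves the resulting
generalized eigenvalue problem with SciPy, recovering eigenvalues close to (jπ)² and nodal
eigenvectors that already satisfy the orthonormality condition (33).

    import numpy as np
    from scipy.linalg import eigh

    n = 59                                   # interior knots; h = 1 / (n + 1)
    h = 1.0 / (n + 1)

    # Stiffness K_ki = (dPhi_k/dx, dPhi_i/dx) and mass M_ki = (Phi_k, Phi_i)
    # for interior hat functions on a uniform mesh (Dirichlet end knots dropped).
    K = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h
    M = (4.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)) * h / 6.0

    lam, V = eigh(K, M)                      # generalized symmetric eigenproblem
    j = np.arange(1, 7)
    assert np.allclose(np.sqrt(lam[:6]) / np.pi, j, rtol=0.01)

    # Columns of V are M-orthonormal (V.T @ M @ V = I), so the corresponding
    # piecewise-linear eigenfunctions satisfy the orthonormality condition (33)
    # and approximate sqrt(2) * sin(j * pi * x) at the knots, up to sign.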
Results
Equation (31), with its associated boundary and orthonormality conditions, has the following
exact solution:

y(x) = \sqrt{2}\, \sin(j \pi x).
Figures 12-17 compare the sampled outputs of the generated feedforward neural networks (denoted
by triangles) with the exact eigenfunctions (solid line) for integers j = 1 to j = 6. The number of
sampled points varies from 15 to 45. The size of the networks in terms of the number of processing
elements T was kept as low as possible while keeping (E)_{RMS} ≤ 0.01. For instance, for j = 1,
the RMS error was 7.23 × 10^{-5} with T = 24, while for j = 6, the RMS error was 9.84 × 10^{-3}
with T = 90.
The program ran on a standard SUN Microsystems SPARC 2 workstation with a run time of
approximately 1 minute for the largest case of j = 6. Again, this run time could be significantly
shortened with properly optimized code.
Figure 14. Comparison of exact eigenfunctions and network output (△) for j = 3 with T = 44,
S = 35, and RMS = 2.70 × 10^{-3}.

Figure 15. Comparison of exact eigenfunctions and network output (△) for j = 4 with T = 58,
S = 35, and RMS = 5.00 × 10^{-3}.
Convergence Plots
Following the same arguments as for Example 1, we would again expect a logarithmic conver-
gence plot of the error versus the grid spacing to be quadratic in the L_2 norm. As Figure 18
shows, this is indeed the case. The slope of the line is 1.945, or approximately 2.0. The slightly
subquadratic convergence rate is due to the inaccuracies introduced by the process of orthonor-
malizing the solution. The convergence plot shown was produced for an eigennumber j = 1 and
for a sequence of mesh spacings between h = 0.1 and h = 0.00625. As before, the values are
marked with circles on the convergence plot.
6. CONCLUSION
In an effort to both develop more sophisticated engineering analysis software and enhance
understanding of connectionist systems, the popular feedforward artificial neural network was
applied to the solution of linear ordinary differential equations. Observing that even the most
basic FFANN architecture involved a large number of interacting parameters of uncertain effects,
an effort was made to impose constraints on the network system by attempting to assign specific
roles to the various parameters.
An analogy was made between supervised learning and function approximation. Following
this analogy, concepts from function approximation theory were brought to bear on the problem
of parameter determination in the net. It was observed that a clear analogy could be made
Figure 16. Comparison of exact eigenfunctions and network output (△) for j = 5 with T = 70,
S = 35, and RMS = 8.93 × 10^{-3}.

Figure 17. Comparison of exact eigenfunctions and network output (△) for j = 6 with T = 90,
S = 45, and RMS = 9.84 × 10^{-3}.

Figure 18. Convergence of the L_2 error norm with grid spacing h for the eigenvalue problem
(log-log axes).
between the basis functions of approximation theory and the transfer functions of the network.
The output weights of the network were viewed as basis expansion coefficients, and the input and
bias weights were seen as controlling the size and location of the interpolation functions in the
problem space.
Further analysis revealed that hard-limit transfer functions could be easily and economically
employed to represent the first order spline basis function, more commonly known as the hat
function in the function approximation literature. This allows nets to be viewed as straightforward
functional basis expansions for the relationships they are to model, using commonplace basis
functions.
The method of weighted residuals, and one of its variations, the Bubnov-Galerkin method, were
introduced as mathematical algorithms to determine the values of basis expansion coefficients
when approximating the solution to differential equations.
The analogies and equivalences thus uncovered allowed explicit formulae for the input and bias
weights to be formulated. Additionally, they revealed how to transform the results of applying
the Bubnov-Galerkin method into the output weights of a neural network.
Example problems were approached in the following manner: linear ordinary differential equa-
tions were solved using the Bubnov-Galerkin method with hat basis functions and the results used
to construct neural networks. Results of the output of the networks were shown to demonstrate
the accuracy of the approximation.
Thus, it is possible to construct directly and noniteratively a feedforward neural network to
approximate arbitrary linear ordinary differential equations. The methods used are all linear
(O(N)) in storage and processing time. The L_2 norm of the network approximation error de-
creases quadratically with the increasing number of hidden layer neurons. The construction
requires imposing certain constraints on the values of the input, bias, and output weights, and
the attribution of certain roles to each of these parameters.
All results presented used the hard limit transfer function. However, the noniterative approach
should also be applicable to the use of hyperbolic tangents, sigmoids, and radial basis functions.
REFERENCES
1. J. Freeman and D. Skapura, Neural Networks: Algorithms, Applications, and Programming Techniques,
Addison-Wesley, New York, (1991).
2. M. Takeda and J. Goodman, Neural networks for computation: Number representation and programming
complexity, Applied Optics 25 (18), 3033 (1986).
3. E. Barnard and D. Casasent, New optical neural system architectures and applications, Optical Computing 88
963, 537 (1988).
4. L. Wang and J.M. Mendel, Structured trainable networks for matrix algebra, In Proceedings of IEEE
International Joint Conference on Neural Networks, Vol. 2, p. 125, San Diego, (June 1990).
5. H. Lee and I. Kang, Neural algorithms for solving differential equations, Journal of Computational Physics
91, 110 (1990).
6. A.J. Meade, Jr., An application of artificial neural networks to experimental data approximation, AIAA-93-
0408, AIAA Aerospace Sciences Meeting, Reno, NV, (January 1993).
7. S. Omohundro, Efficient algorithms with neural network behaviour, Complex Systems 1, 237 (1987).
8. T. Poggio and F. Girosi, A theory for networks for approximation and learning, A.I. Memo No. 1140,
Artificial Intelligence Laboratory, Massachusetts Institute of Technology, (July 1989).
9. F. Girosi and T. Poggio, Networks for learning: A view from the theory of approximation of functions,
In Proceedings of The Genoa Summer School on Neural Networks and Their Applications, Prentice-Hall,
(1989).
10. G. Cybenko, Approximation by superpositions of a sigmoidal function, Math. Control Signals Systems 2,
303 (1989).
11. Y. Ito, Approximation of functions on a compact set by finite sums of a sigmoid function without scaling,
Neural Networks 4, 817 (1991).
12. L.O. Chua, C.A. Desoer and E.S. Kuh, Linear and Nonlinear Circuits, McGraw-Hill, New York, (1987).
13. P.J. Davis, Interpolation and Approximation, Blaisdell, New York, (1963).
14. P.M. Prenter, Splines and Variational Methods, Wiley, New York, (1989).
15. A. Maren, C. Harston and R. Pap, Handbook of Neural Computing Applications, Academic Press, New
York, (1990).
16. B.A. Finlayson, The Method of Weighted Residuals and Variational Principles, Academic Press, New York,
(1972).
17. P. Lax and B. Wendroff, Systems of conservation laws, Comm. Pure and Applied Mathematics 13, 217
(1960).
18. C.A.J. Fletcher, Computational Galerkin Methods, Springer-Verlag, New York, (1984).
19. G. Strang, Linear Algebra and Its Applications, Second Edition, Academic Press, New York, (1980).
20. D.A. Anderson, J.C. Tannehill and R.H. Pletcher, Computational Fluid Mechanics and Heat Transfer,
Hemisphere Publishing Corporation, New York, (1984).
21. IMSL User’s Manual, MATH/LIBRARY, Version 2, 1991.
22. C. Johnson, Numerical Solution of Partial Differential Equations by the Finite Element Method, Cambridge
University Press, Cambridge, (1990).
23. D. Trim, Applied Partial Differential Equations, PWS-KENT, Boston, (1990).