
COMMON MISCONCEPTIONS ABOUT NEURAL NETWORKS AS APPROXIMATORS

By William C. Carpenter,1 Member, ASCE, and Jean-Francois Barthelemy2



ABSTRACT: A current trend in scientific and engineering computing is to use neural-network approximations instead of polynomial approximations or other types of approximations involving mathematical functions. A number of misconceptions have arisen concerning neural networks as approximators. This paper eliminates these misconceptions. In so doing, the paper examines the computational efficiency of neural-network approximations compared to polynomial approximations, examines the effect of using underdetermined neural-network approximations, examines the effect of design point selection on the quality of neural-network approximations, and examines the computing time required to train neural networks compared to the time to develop polynomial approximations.

INTRODUCTION

Approximations have been used in engineering since the profession's inception. Very recently, the use of neural networks as approximators has become popular. The brain is an amazing organ with incredible cognitive and computational capacity. Since neural networks mimic the workings of the brain, neural networks have been attributed with numerous advantages over other types of approximators. A number of the supposed advantages of neural networks, however, are misconceptions. This paper addresses a number of these misconceptions.
Neural-network approximations are compared to polynomial approximations to ascertain their characteristics. The criteria used to compare the quality of approximations are discussed, and a brief description of the methodology for making neural-network and polynomial approximations is given. Various misconceptions about neural-network approximations are then discussed using examples to illustrate salient points.

QUALITY OF FIT
Consider a problem with n independent variables, the components of the vector {x} = (x1, x2, . . . , xn)t. A total of N sets of the independent variables, referred to as design points, will be considered, {x}j, j = 1, N. At the design point {x}j, let yj be the value of the function to be approximated. The pairing of {x}j and yj is referred to as the jth training pair. Let ŷj be the value of the approximating function. The approximating function, ŷj, should closely match the function, yj, not only at the design points, {x}j, but over the entire region of interest.

1Prof., Dept. of Civ. Engrg. and Mech., Univ. of South Florida, Tampa, FL 33620.
2Sr. Aerosp. Engr., NASA Langley Res. Center, Hampton, VA 23681.
Note. Discussion open until December 1, 1994. To extend the closing date one month, a written request must be filed with the ASCE Manager of Journals. The manuscript for this paper was submitted for review and possible publication on January 14, 1993. This paper is part of the Journal of Computing in Civil Engineering, Vol. 8, No. 3, July, 1994. © ISSN 0887-3801/94/0003-0345/$2.00 + $.25 per page. Paper No. 5442.


Fit at Design Points
The approximating function ŷ closely approximates the function y at the design points when δ², the sum of the squares of the residuals, is small, where

δ² = Σ_{j=1}^{N} (yj - ŷj)²   (1)

Let ȳ be the average value of y at the design points. In this study, one measure of the closeness of fit to be considered is the nondimensional value v, where

v = (1/ȳ) √[ Σ_{j=1}^{N} (yj - ŷj)²/N ] · 100   (2)

The coefficient v = nondimensional root-mean-square (RMS) error at the design points. Thus, v = 0 is a necessary and sufficient condition that the approximating function fit the actual function at the N design points.

Overall Fit
Just because the approximating function exactly fits the function at N design points does not guarantee that it gives a good fit over the region of interest. It is therefore desirable to have a measure of the quality of overall fit over the region of interest. Consider N_G design points, referred to as grid points, scattered throughout the region of interest, and assume that if the approximating function closely matches the exact function at these points, then a good approximation is obtained throughout the entire region of interest. A measure of the quality of overall fit is taken as

v_G = (1/ȳ_G) √[ Σ_{j=1}^{N_G} (yj - ŷj)²/N_G ] · 100   (3)

where ȳ_G = average value of y at the grid points. A small value of v_G indicates that the approximating function did a good job of approximation over the region of interest.
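As a concrete reading of (1)-(3), the short sketch below computes δ², v, and v_G for arrays of exact and approximated values. It is a minimal illustration only; the numerical values used are arbitrary and not taken from the paper.

```python
# Sketch: the fit measures of Eqs. (1)-(3). "design" arrays are evaluated at
# the N design points, "grid" arrays at the N_G grid points.
import numpy as np

def sum_sq_residuals(y, y_hat):
    """Eq. (1): sum of squared residuals at the design points."""
    return np.sum((y - y_hat) ** 2)

def v_percent(y, y_hat):
    """Eqs. (2)-(3): nondimensional RMS error, in percent."""
    return np.sqrt(np.mean((y - y_hat) ** 2)) / np.mean(y) * 100.0

# arbitrary illustrative values
y_design = np.array([1.0, 2.0, 3.0, 4.0])
y_hat_design = np.array([1.1, 1.9, 3.0, 4.2])
y_grid = np.linspace(1.0, 4.0, 50)
y_hat_grid = y_grid + 0.1 * np.sin(np.linspace(0.0, 6.0, 50))

print("delta^2 =", sum_sq_residuals(y_design, y_hat_design))    # Eq. (1)
print("v   =", v_percent(y_design, y_hat_design), "%")          # Eq. (2)
print("v_G =", v_percent(y_grid, y_hat_grid), "%")              # Eq. (3)
```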

APPROXIMATIONS

Neural-Network Approximations
While the initial motivation for developing artificial neural nets was to develop computer models that could imitate certain brain functions, neural nets can be thought of as another way of developing approximations (these approximations in many references are referred to as response surfaces). Different types of neural networks are available (Rumelhart et al. 1986; Anderson et al. 1988), but the type of neural nets considered in this paper are feed-forward networks with one hidden layer, as shown in Fig. 1. This type of neural net has been used previously to develop response surfaces (Vanluchene et al. 1990; Hajela et al. 1990; Swift et al. 1991; Berke et al. 1990; Rogers et al. 1992) and is capable, with enough nodes on the hidden layer, of approximating any continuous function (Hornik et al. 1989; Hornik et al. 1990; White 1990; Gallant et al. 1992).

FIG. 1. Neural Network (input, hidden, and output layers)
For the neural net of Fig. 1, associated with each node on the hidden layer, node j, and each output node, node k, are coefficients or weights, θj and θ̄k, respectively. These weights are referred to as the biases. Associated with each path from an input node i to node j on the hidden layer is a weight, wji, and from node j on the hidden layer to output node k is a weight w̄kj. Let qi be inputs entered at node i. Node j on the hidden layer receives weighted inputs, wji qi, which are summed and used with a bias θj and an activation function to yield an output rj. The activation function considered in this paper is the sigmoid function (Rumelhart et al. 1986; Anderson et al. 1988)

rj = 1 / [1 + e^-(Σi wji qi - θj)]   (4)

Output node k then receives inputs w̄kj rj, which are summed and used with a bias θ̄k and an activation function to yield an output ŷk. Some variation of the delta-error back-propagation algorithm (Rumelhart et al. 1986; Anderson et al. 1988) is then used to adjust the weights on each learning cycle so as to reduce the difference between the predicted and desired outputs. In this investigation, studies were performed using the program NEWNET (Carpenter 1992b). NEWNET minimizes the sum of the squares of the residuals in (1) with respect to the weights and biases of the net. Training of the net is thus formulated as an unconstrained minimization problem. Solution of this minimization problem is performed using the method of Davidon, Fletcher, and Powell, a quasi-Newton method (Reklaitis et al. 1983; Fox 1971). That algorithm performs a series of one-dimensional searches along search directions. Search directions are determined by building an approximation to the inverse Hessian matrix using gradient information. Gradients required by that algorithm are obtained using back-propagation. One-dimensional searches are performed along the search directions using an interval-shortening routine.
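The sketch below is a minimal illustration of that training formulation: the sum of squared residuals of (1) for a small one-hidden-layer sigmoid net is minimized as an unconstrained problem with a quasi-Newton routine. It is not the authors' NEWNET code; it uses SciPy's BFGS in place of the Davidon-Fletcher-Powell routine, lets the optimizer estimate gradients by finite differences rather than back-propagation, and assumes a linear output node for simplicity.

```python
# Sketch (not NEWNET): training a 1-input, n_hidden-sigmoid-node, 1-output net
# by minimizing the sum of squared residuals, Eq. (1), with a quasi-Newton method.
import numpy as np
from scipy.optimize import minimize

def net_output(params, x, n_hidden):
    """Feed-forward net of Fig. 1 with one input and a linear output node."""
    w_in = params[:n_hidden]                    # weights w_ji, input -> hidden
    b_in = params[n_hidden:2 * n_hidden]        # hidden biases theta_j
    w_out = params[2 * n_hidden:3 * n_hidden]   # weights w_kj, hidden -> output
    b_out = params[-1]                          # output bias
    r = 1.0 / (1.0 + np.exp(-(np.outer(x, w_in) - b_in)))   # Eq. (4)
    return r @ w_out - b_out

def training_error(params, x, y, n_hidden):
    """Objective: Eq. (1), sum of squared residuals at the design points."""
    return np.sum((y - net_output(params, x, n_hidden)) ** 2)

# 13 training pairs sampled from y = 2x + sin(pi x) + sin(2 pi x), Eq. (14)
x = np.linspace(0.0, 1.0, 13)
y = 2 * x + np.sin(np.pi * x) + np.sin(2 * np.pi * x)

n_hidden = 4                                        # 13 undetermined parameters
rng = np.random.default_rng(0)
p0 = rng.normal(scale=0.5, size=3 * n_hidden + 1)   # random initial weights and biases
result = minimize(training_error, p0, args=(x, y, n_hidden), method="BFGS")
print("sum of squared residuals after training:", result.fun)
```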



Polynomial Approximations
Polynomial approximations can be made using an m = k + 1 term polynomial expression (Box et al. 1987; Khuri et al. 1987; Myers 1971), thus

ŷ = b0 + b1X1 + . . . + bkXk   (5)

where Xj = some expression involving the design variables. For example, a second-order polynomial approximation in two variables could be of the form

ŷ = b0 + b1x1 + b2x2 + b3x1² + b4x1x2 + b5x2²   (6)

Values of the function to be approximated at the N design points can be used to determine the m = k + 1 undetermined coefficients in the polynomial expression. For N design points, (5) yields

{y1}   [1  X11  . . .  Xk1] {b0}
{y2} = [1  X12  . . .  Xk2] {b1}   (7)
{...}  [       . . .      ] {...}
{yN}   [1  X1N  . . .  XkN] {bk}

or

{Y} = [Z]{b}   (8)

where {Y} = an N x 1 matrix; [Z] = an N x m matrix; and {b} = an m x 1 matrix.
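For concreteness, the sketch below assembles [Z] of (8) for the six-term polynomial of (6). The 3 x 3 grid of design points is an arbitrary illustrative choice.

```python
# Sketch: the N x m matrix [Z] of Eq. (8) for the two-variable, second-order
# polynomial of Eq. (6), with columns ordered (1, x1, x2, x1^2, x1*x2, x2^2).
import numpy as np

def build_Z(points):
    """points: (N, 2) array of design points (x1, x2) -> (N, 6) matrix [Z]."""
    x1, x2 = points[:, 0], points[:, 1]
    return np.column_stack([np.ones_like(x1), x1, x2, x1**2, x1 * x2, x2**2])

# illustrative design: a 3 x 3 grid of N = 9 points over [0, 1] x [0, 1]
pts = np.array([[a, b] for a in (0.0, 0.5, 1.0) for b in (0.0, 0.5, 1.0)])
Z = build_Z(pts)
print(Z.shape)   # (9, 6): N = 9 design points, m = 6 undetermined coefficients
```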

Exactly Determined Approximation


When N = m, the approximation is exactly determined and the matrix
{b} can be determined from (8).

Overdetermined Approximation
With N > m, (8) can be solved in a least-squares sense, thus (Box et al. 1987; Khuri et al. 1987; Myers 1971)

[Z]t{Y} = [Z]t[Z]{b}   (9)

or

{b} = ([Z]t[Z])^-1[Z]t{Y} = [H]^-1[Z]t{Y}   (10)

Eq. (10), in effect, chooses the terms of {b} so as to minimize the sum of the squares of the residuals as defined in (1).
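A minimal sketch of (9)-(10) follows. It uses np.linalg.lstsq, which returns the same least-squares {b} as forming [H]^-1[Z]t{Y} explicitly but is better behaved numerically; the design points and test function are illustrative choices.

```python
# Sketch: overdetermined fit, Eqs. (9)-(10), for the six-term polynomial of
# Eq. (6) using N = 9 design points (N > m = 6).
import numpy as np

pts = np.array([[a, b] for a in (0.0, 0.5, 1.0) for b in (0.0, 0.5, 1.0)])
x1, x2 = pts[:, 0], pts[:, 1]
Z = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x1 * x2, x2**2])
y = 1.0 + x1 + x2**2                        # test function sampled at the design points

b, *_ = np.linalg.lstsq(Z, y, rcond=None)   # least-squares {b}, equivalent to Eq. (10)
print(np.round(b, 6))                       # recovers (1, 1, 0, 0, 0, 1) here
```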

Underdetermined Approximation
When N < m, the approximation is underdetermined. A solution can be obtained by choosing the terms of {b} so as to minimize the sum of the squares of the residuals as defined in (1). However, a direct solution can be obtained by using the concept of the pseudoinverse (Greville 1959; Penrose 1955). Assume that the rank of matrix [Z] is N and define the pseudoinverse of matrix [Z], [Z]*, thus

[Z]* = [Z]t([Z][Z]t)^-1   (11)

Solution of (8) is then

{b} = [Z]*{Y} + [Q]{w}   (12)

where {w} = an (m - N) column matrix of arbitrary coefficients and [Q] = an m x (m - N) matrix formed from any m - N independent columns of the matrix [R], thus

[R] = [I] - [Z]*[Z]   (13)

The approximating function using (12) exactly matches the function at the design points for any values of wi. Thus, nonunique approximations are obtained when approximations are underdetermined.
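The sketch below works through (11)-(13) on a deliberately underdetermined case (N = 4 design points, m = 6 coefficients) to make the nonuniqueness visible: every arbitrary {w} produces a different {b}, yet each one reproduces the data exactly at the design points. The design points and function values are illustrative, not taken from the paper.

```python
# Sketch: underdetermined fit via the pseudoinverse, Eqs. (11)-(13).
import numpy as np

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])        # N = 4 design points
x1, x2 = pts[:, 0], pts[:, 1]
Z = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x1 * x2, x2**2])  # 4 x 6, m = 6
y = 1.0 + x1 + 2.0 * x2                    # function values at the design points
N, m = Z.shape

Z_star = Z.T @ np.linalg.inv(Z @ Z.T)      # Eq. (11); rank of [Z] assumed to be N
R = np.eye(m) - Z_star @ Z                 # Eq. (13)

# form [Q] from any m - N linearly independent columns of [R]
Q_cols = []
for j in range(m):
    candidate = Q_cols + [R[:, j]]
    if np.linalg.matrix_rank(np.column_stack(candidate)) == len(candidate):
        Q_cols = candidate
    if len(Q_cols) == m - N:
        break
Q = np.column_stack(Q_cols)

for seed in range(3):                      # three different arbitrary {w}
    w = np.random.default_rng(seed).normal(size=m - N)
    b = Z_star @ y + Q @ w                 # Eq. (12)
    print("b =", np.round(b, 3),
          " max error at design points =", np.max(np.abs(Z @ b - y)))
```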

Types of Approximations
Overdetermined, exactly determined, and underdetermined approximations have, respectively, more, an equal number, or fewer training pairs than there are undetermined parameters associated with the approximation. To have an approximation, ŷ, that closely matches the function, y, not only at the design points but over the region of interest, one should use an overdetermined approximation. Underdetermined and exactly determined approximations may give a good approximation at the design points, but not necessarily a good approximation over the region of interest. Recent studies (Carpenter 1992a) have indicated that approximations that are from 20% to 50% overdetermined tend to be computationally efficient. In other words, they are a good compromise between performing as few function evaluations as possible and still obtaining a good approximation with the chosen approximating function.
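Read literally, that guideline just bounds the number of design points by the parameter count; the fragment below states it in code, assuming "x% overdetermined" means N exceeds m by x percent.

```python
# Sketch: suggested range of design points N for an approximation with m
# undetermined parameters, taking "20% to 50% overdetermined" to mean
# N between 1.2 m and 1.5 m (rounded up).
import math

def suggested_design_points(m: int) -> tuple[int, int]:
    return math.ceil(1.2 * m), math.ceil(1.5 * m)

print(suggested_design_points(6))    # 6-term quadratic of Eq. (6) -> (8, 9)
print(suggested_design_points(29))   # 7-hidden-node net of the truss example -> (35, 44)
```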

MISCONCEPTIONS
A number of presentations and publications on the application of neural
networks to engineering problems indicate that several misconceptions exist
concerning neural networks as approximators. This paper points out these
misconceptions.

Misconception: Neural-Network Approximations Are Superior to Other Types of Mathematical Approximations such as Polynomial Approximations

The undetermined parameters associated with a neural network are the weights and biases of the network. The undetermined parameters associated with a polynomial approximation are the coefficients associated with the terms of the polynomial approximation. Carpenter et al. (1993) found, on a limited number of test problems, that, in a general sense, the performance of an approximation depends upon the number of undetermined parameters associated with the approximation. An example elucidates this point.
Consider the problem of determining the minimum volume of the five-
bar truss of Fig. 2 subject to stress, stability, and member size constraints.
The form of the constraint equations is detailed by Swift et al. (1991) and
Carpenter et al. (1993). The design variables of the problem are the areas
of the five members of the truss and the xl and x2 coordinates of node 2.
One solution technique is to obtain the minimum volume (VOL) of the
truss for various fixed values of the coordinates of node 2, and then to obtain
the optimum truss volume using the functional relationship between VOL
and the coordinates of node 2.

FIG. 2. Five-Bar Truss

FIG. 3. Minimum Truss Volume VOL versus the x1 and x2 Coordinates of Node 2

FIG. 4. Five-Bar Truss, Performance of Approximations (v_G versus N, Number of Designs, for 2nd- through 5th-order polynomials and nets with 3, 5, and 7 nodes on the hidden layer)

Fig. 3 represents the functional relationship between VOL and the xl and
x2 coordinates of node 2. Using differing numbers of design points to build
the approximations, various order-polynomial approximations and various
neural-net approximations with varying numbers of nodes on the hidden
layer were developed to approximate VOL. The approximations were then
compared to the exact function at N G = 961 points (a 31 x 31 evenly
spaced grid of points) and the parameter v_G as defined in (3) was calculated.
Fig. 4 gives the value of v_G versus the number of design points used to build
the approximation for each of the approximations examined. The approx-
imations considered were polynomial approximations of orders two through
five and neural nets with three, five, and seven nodes on the hidden layer.
There are a different number of undetermined parameters associated with
the various approximations. These numbers (coefficient m) are shown next
to the curves on Fig. 4. Notice that the performance of the approximations
is directly related to the number of undetermined parameters associated
with that approximation. The 2nd-order-polynomial approximation with six
undetermined parameters performed the poorest. The artificial neural net
with seven nodes on the hidden layer with 29 undetermined parameters
performed the best.
Note that one method of forming an approximation is not inherently
superior to the other. On selected test problems, the performance of good
polynomial approximations and good neural-net approximations have been
found to be comparable. Obtaining a good approximation for a given func-
tion, however, is not a simple task. One must search over a series of network
architectures or over a family of polynomial functions to find a good ap-
proximation.
Searches over a family of polynomial functions can be accomplished, for example, by starting with a linear function and then examining functions of increasingly higher order. Some may find that such an approach yields more insight into the nature of the function being approximated than neural-network approximations do. Searches over network architecture can be readily accomplished by varying the number of nodes on a hidden layer or layers,
which may, in part, account for the current popularity of using neural net-
works as approximators.
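As a small illustration of such a search, the sketch below fits polynomials of increasing order to the same training pairs and ranks them with the v_G measure of (3); the one-variable test function of (14) and the point sets are illustrative stand-ins for the truss example, not the paper's data.

```python
# Sketch: search over polynomial order, comparing fits on a grid with v_G, Eq. (3).
import numpy as np
from numpy.polynomial import polynomial as P

def v_G(y_exact, y_approx):
    """Nondimensional RMS error over the grid points, in percent."""
    return np.sqrt(np.mean((y_exact - y_approx) ** 2)) / np.mean(y_exact) * 100.0

def f(x):
    """Test function of Eq. (14)."""
    return 2 * x + np.sin(np.pi * x) + np.sin(2 * np.pi * x)

x_design = np.linspace(0.0, 1.0, 9)        # N = 9 training pairs
x_grid = np.linspace(0.0, 1.0, 101)        # N_G = 101 grid points

for order in range(1, 6):                  # 1st- through 5th-order polynomials
    coeffs = P.polyfit(x_design, f(x_design), order)
    err = v_G(f(x_grid), P.polyval(x_grid, coeffs))
    print(f"order {order} (m = {order + 1} parameters): v_G = {err:6.2f}%")
```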

Misconception: Neural Networks Can be Trained with Fewer Training Pairs Than Other Types of Approximations

A number of papers have appeared recently reporting neural-network approximations that were made using very few training pairs to train the approximation (Vanluchene et al. 1990). The argument used to justify these approximations is that the approximations can be trained to give a very small or zero root-mean-square error at the design points [parameter v = 0 in (2)]. The key here is that such an approximation can match the exact function at the design points but be a poor and nonunique approximation over the region of interest.
Underdetermined polynomial approximations can be developed using the pseudoinverse technique (Greville 1959; Penrose 1955). However, underdetermined polynomial approximations are seldom, if ever, used because the approximations obtained are not unique. For underdetermined polynomial approximations, information is only available to determine some of the parameters associated with the approximation. Approximations obtained are then functions of the remaining associated parameters. Different values of these remaining parameters give different approximations. The nonuniqueness of the undetermined parameters is shown by (12). Different sets of these parameters give approximations that match the exact function at the design points but are different over the region of interest.
Underdetermined neural networks are networks in which the number of training pairs used to train the network is fewer than the number of weights and biases associated with the network. Such networks can be trained to
exactly duplicate the exact function at the design points. However, just as
with underdetermined polynomial approximations, approximations thus ob-
tained are not unique. Such networks, starting from different initial values
of the weights and biases, when trained will, in general, yield different
approximations. An example illustrates this point.
Consider the following one-variable function:
y = 2x + sin(πx) + sin(2πx); 0 ≤ x ≤ 1   (14)
A neural-net approximation was made of this function. The net had one
node on the input layer, which receives the x value of the design points;
had one node on the output layer, which predicts the y response; and had
one hidden layer with four nodes on that hidden layer. There are 13 un-
determined parameters associated with the net (eight weights and five biases).
Thus, at least 13 training pairs are required to uniquely determine these
parameters. For illustration purposes, however, the net was trained with
only four training pairs. As the number of training pairs is less than the
number of undetermined parameters, the net is underdetermined. One can
see in Fig. 5 the variability in predicted response. This variability points out
the nonunique nature of undetermined approximations. Fig. 6 shows results
using a net with two nodes on the hidden layer trained with seven training
pairs. Here, there are seven pieces of information to solve for the seven
undetermined parameters associated with the net (four weights and three
biases). Notice in Fig. 6 that each training of the net yielded the same
approximation.
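The bookkeeping behind those counts is simple enough to write down; the sketch below tallies the weights and biases of a single-hidden-layer net and compares the total with the number of training pairs, reproducing the 13- and 7-parameter figures quoted above.

```python
# Sketch: counting the undetermined parameters of a one-hidden-layer net and
# classifying the approximation by comparing that count with the number of
# training pairs.
def n_parameters(n_in: int, n_hidden: int, n_out: int) -> int:
    """Weights plus biases for a single-hidden-layer feed-forward net."""
    weights = n_in * n_hidden + n_hidden * n_out
    biases = n_hidden + n_out
    return weights + biases

def classify(n_training_pairs: int, n_params: int) -> str:
    if n_training_pairs < n_params:
        return "underdetermined"
    if n_training_pairs == n_params:
        return "exactly determined"
    return "overdetermined"

print(n_parameters(1, 4, 1), classify(4, n_parameters(1, 4, 1)))   # 13, underdetermined
print(n_parameters(1, 2, 1), classify(7, n_parameters(1, 2, 1)))   # 7, exactly determined
```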
FIG. 5. One Dimensional Example, Neural Network, ih = 4, Four Training Pairs (exact function and nets 1-3)

FIG. 6. One Dimensional Example, Neural Network, ih = 2, Seven Training Pairs (exact function and nets 1-3)

The conclusion to be drawn from this example is that it is not desirable
to train large nets with only a few training pairs. These nets can be trained
so that the approximation exactly fits the function to be approximated at
the design points and thus can be trained to have v = 0 but there can be
a great variation, from one training to the next, in the approximations over
a region of interest (i.e., there will be a large variation in the parameter
v_G). The next example reemphasizes this point.
Fox (1971) investigated a function
y = 10x1⁴ - 20x1²x2 + 10x2² + x1² - 2x1 + 5   (15)
which has banana-shaped contours, as seen in Fig. 7. Fig. 8 shows results using various neural-network approximations. In each case there are two input nodes receiving the x1 and x2 coordinates of the design and one output node predicting the response ŷ. One hidden layer was considered, with three, five, or seven nodes on the hidden layer. Each net was trained three times using 16 training pairs. Fig. 8 gives, for each neural net, the lowest value of the parameter v_G obtained in the three trainings and the difference of the highest value of v_G obtained minus the lowest value of v_G in the three trainings. The underdetermined nets (five and seven nodes on the hidden layer) did not yield unique approximations. Three training runs for each of these nets all gave v = 0 (exact fit at the design points) but yielded greatly different results over the region of interest.

FIG. 7. Fox's Banana Function (contours of y, levels 5.00-20.00, in the x1-x2 plane)

FIG. 8. Fox's Banana Function, Error Parameters for Three, Five, and Seven Nodes on Hidden Layer (lowest v_G and highest-minus-lowest v_G versus number of nodes on hidden layer, ih)

The net with three nodes on
the hidden layer (an overdetermined net) had a small variation in v_G of approximately 5% (a manifestation of the algorithm employed to terminate training). Nets with five and seven nodes on the hidden layer had large variations of v_G, indicating that nonunique approximations are obtained with underdetermined neural nets.
These examples demonstrate that a necessary condition for obtaining a unique approximation is to have the number of design points used to train the approximation equal to or greater than the number of parameters associated with the approximation.


Misconception: Neural Networks are Less Sensitive to Training Data
Than Other Types of Approximations
A deficient set of training pairs, which would give a singular [H] matrix in (10) when attempting a polynomial approximation, can be used to train a neural network. Depending on the architecture of the network, the neural-network approximation can be trained to yield a small or zero root-mean-square error at the design points. Thus, the misconception has arisen that neural networks are not as sensitive to which training pairs are selected as other types of approximations (Carpenter et al. 1993). The next example indicates that a deficient set of training pairs, which would not permit certain polynomial approximations, may not yield a unique neural-network approximation. In other words, different trainings of the network from different initial values of the associated weights and biases, using the deficient set of training pairs, may yield different approximations.
Consider the function

y = 1 + x1 + x2 + x3 + x1² + x1x2 + x1x3 + x2² + x2x3 + x3²   (16)

Twelve training pairs were used to make a polynomial approximation and a two-node neural-net approximation of this function. The design points associated with these training pairs are shown in Fig. 9. Information is not available from this design for determining the mixed derivatives of the function to be approximated. Thus, a complete second-order polynomial approximation is not possible (Carpenter 1992a). If a solution is attempted using (10), a singular [H] matrix is encountered. An approximation can be made, however, using a polynomial of the form

ŷ = b0 + b1x1 + b2x2 + b3x3 + b4x1² + b5x2² + b6x3²   (17)

As there are now more training pairs than there are undetermined parameters, an overdetermined approximation is obtained. Such an approximation was developed and yielded a value of v_G = 34.6.
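The singular [H] matrix is easy to reproduce: with design points lying only on the coordinate axes, every cross-product column of [Z] is identically zero. The sketch below uses axial points at +/-1 and +/-2 as an assumed stand-in for the 12-point star pattern of Fig. 9 (the paper does not give the coordinates) and checks the rank of [Z] for the full quadratic of (16) and the reduced form of (17).

```python
# Sketch: why a star (axial) design cannot support the full quadratic of
# Eq. (16) but does support the reduced polynomial of Eq. (17). The 12 axial
# points at +/-1 and +/-2 are an assumption, not the paper's exact design.
import numpy as np

pts = np.array([v * np.eye(3)[i] for i in range(3) for v in (-2, -1, 1, 2)])
x1, x2, x3 = pts.T

Z_full = np.column_stack([np.ones(12), x1, x2, x3,
                          x1**2, x1 * x2, x1 * x3, x2**2, x2 * x3, x3**2])   # 10 terms, Eq. (16) form
Z_reduced = np.column_stack([np.ones(12), x1, x2, x3, x1**2, x2**2, x3**2])  # 7 terms, Eq. (17)

# cross-product columns are identically zero on axial points, so [H] = [Z]t[Z]
# of Eq. (10) is singular for the full model; the reduced model has full rank.
print("rank of [Z], full quadratic:", np.linalg.matrix_rank(Z_full), "of", Z_full.shape[1])
print("rank of [Z], Eq. (17) form: ", np.linalg.matrix_rank(Z_reduced), "of", Z_reduced.shape[1])
```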
FIG. 9. Star Pattern of Design Points, 12 Design Points

A neural net with two nodes on the hidden layer was then trained with the 12 training pairs. The net was trained 10 times, starting from different randomly selected sets of weights and biases. Even though the number of training pairs, 12, is greater than the number of undetermined parameters associated with the net, 11, nonunique approximations were obtained. The value of v_G for these approximations ranged from 32.9 to 93.5. Some of the approximations were as good as or better than the polynomial approximation. However, a great deal of variability was encountered from one approximation to the next. Thus, it can be concluded that, for neural-net approximations, having more training pairs than the number of associated undetermined parameters is only a necessary condition for obtaining a unique approximation; it is not a sufficient condition. Deficient designs can lead to nonunique approximations.

Misconception: Neural-Network Approximations Are as Easy to Obtain as Other Types of Approximations such as Polynomial Approximations

Training of a neural network can be a lengthy procedure. Most neural-network algorithms use the delta-error back-propagation algorithm to adjust the weights and biases on each learning cycle so as to reduce the difference between the predicted and desired outputs. This method is basically a steepest-descent method of minimizing an error functional in which the amount that the weights are changed in a cycle is defined by a learning parameter. Steepest-descent methods are notoriously inefficient (Reklaitis et al. 1983; Fox 1971). For example, 20,000-100,000 training cycles were typically required to train the nets reported in Carpenter et al. (1993) when a variation of the steepest-descent method was employed. The use of a quasi-Newton method to minimize the error functional and the use of an algorithm for obtaining the optimum value of the learning parameter for each cycle greatly reduce the amount of training time required. These modifications were incorporated in the program NEWNET (Carpenter 1992b). Using NEWNET, the nets in Carpenter et al. (1993) were trained in several thousand cycles. Even with this improvement, training times for the neural nets were many times longer than those required with corresponding polynomial approximations.
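For contrast with the quasi-Newton sketch given earlier, the fragment below is a bare steepest-descent (delta-rule) training loop of the kind the passage describes: each cycle steps the weights down the gradient of the error in (1) by a fixed learning parameter, and many thousands of cycles are needed even on a tiny problem. The net size, data, and learning parameter are illustrative choices, not those of the paper.

```python
# Sketch: plain steepest-descent training with a fixed learning parameter.
# A 1-4-1 sigmoid net (linear output node) is fitted to Eq. (14) data; gradients
# of the Eq. (1) error are obtained by back-propagation through the two layers.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 13)
y = 2 * x + np.sin(np.pi * x) + np.sin(2 * np.pi * x)          # Eq. (14)

n_hidden, eta = 4, 0.001                                        # eta = learning parameter
w_in, b_in = rng.normal(size=n_hidden), rng.normal(size=n_hidden)
w_out, b_out = rng.normal(size=n_hidden), rng.normal()

def forward(x):
    r = 1.0 / (1.0 + np.exp(-(np.outer(x, w_in) - b_in)))       # hidden outputs, Eq. (4)
    return r, r @ w_out - b_out

for cycle in range(20000):                                      # note how many cycles are needed
    r, y_hat = forward(x)
    err = y_hat - y
    delta = 2.0 * np.outer(err, w_out) * r * (1.0 - r)          # back-propagated hidden error
    w_out -= eta * (2.0 * r.T @ err)                            # fixed-step gradient updates
    b_out -= eta * (-2.0 * err.sum())
    w_in -= eta * (delta.T @ x)
    b_in -= eta * (-delta.sum(axis=0))

print("sum of squared residuals after training:", np.sum((forward(x)[1] - y) ** 2))
```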
To give insight into the time involved in training a neural network, training times for a neural network investigated at NASA Langley Research Center are also reported. The network was to be used to yield a response surface for an engineering design optimization study. The network had four input nodes, 15 nodes on a single hidden layer, and one output node. It had previously been trained with 46 training pairs using the program NETS (Baffes 1989) to a root-mean-square error of 0.002. Training using the program NETS had taken from four to five days of continuous training on a Sun SPARCstation 1+ computer. The authors trained this net using the program NEWNET. Training of the net with program NEWNET took 22 central processing unit (CPU) min. Thus the use of the improved training scheme of NEWNET greatly improved the training time of the net. Still, this training time is much greater than that required with a polynomial approximation.
Training times, however, may not be good measures of the desirability
of using one type of approximation over another. In general, a researcher
does not initially know the correct network architecture or polynomial func-
tion. The researcher must systematically search over a series of network
architectures or over a family of polynomial functions to find a good ap-
proximation. Thus, training times for a given network or polynomial ap-
356

J. Comput. Civ. Eng., 1994, 8(3): 345-358


proximation may be insignificant compared to the total effort involved in
developing a good approximation. A systematic search of network archi-
tecture can be accomplished by simply varying the number of nodes on a
single hidden layer (Hornik et al. 1989, 1990). Thus, some researchers may
find searching over net architecture easier than searching over a family of
polynomials.

CONCLUSION
This paper points out the danger of using underdetermined approxima-
tions. This danger is present whether the approximations are underdeter-
mined polynomial approximations or underdetermined neural nets. The
authors are not aware of any studies using underdetermined polynomial
approximations. However, a number of reported studies have used nets with
a large number of nodes on a hidden layer or layers (and thus a large number
of associated undetermined parameters) trained with relatively few training pairs. Thus, these nets are underdetermined approximators.
The variability in the approximations that can be obtained with such underdetermined nets has been emphasized. The paper further points out that
design-point selection is important when training neural networks. Deficient
design-point selection can also give variability in an approximation.
For the examples in Carpenter et al. (1993) and those of this study,
overdetermined polynomials and neural nets with equivalent numbers of
associated coefficients gave comparable results. It should be emphasized
that only a limited set of examples and only feedforward nets with one
hidden layer were considered. While it is not possible to make a general
statement based on these limited studies, it does seem that the selection of
one type of approximation over the other can be reasonably based on per-
sonal preference. Training times of neural networks and polynomial ap-
proximations were discussed. While it was found that neural networks take
much longer, in general, to train than polynomial approximations, it was
pointed out that the ease with which a family of network architectures can be
examined may more than compensate for the lengthy training times of those
networks.

ACKNOWLEDGMENT
Selected examples from this paper were presented at The Third International Conference on the Application of Artificial Intelligence to Civil & Structural Engineering, Edinburgh, Scotland, 17-19 August 1993, and at ANNIE'93, Artificial Neural Networks in Engineering, Rolla, Missouri, November 14-17, 1993.

APPENDIX. REFERENCES
Anderson, J., and Rosenfeld, E. (1988). Neurocomputing: foundations of research.
MIT Press, Cambridge, Mass.
Baffes, P. T. (1989). NETS user's guide. Software Technology Branch, Lyndon B.
Johnson Space Center.
Berke, L., and Hajela, P. (1990). "Application of artificial neural nets in structural
mechanics." Shape and Layout Optimization of Structural Systems, Int. Center for
Mech. Sciences, Udine, Italy.
Box, G. E. P., and Draper, N. R. (1987). Empirical model-building and response
surfaces. John Wiley and Sons, New York, N.Y.
Carpenter, W. C. (1992a). "Effect of design selection on response surface performance." Final Rep., NASA Grant No. NAG-1-1378, Univ. of South Florida, Tampa, Fla.
Carpenter, W. C. (1992b). NEWNET user's guide. Univ. of South Florida, Tampa,
Fla.
Carpenter, W. C., and Barthelemy, J. F. M. (1993). "A comparison of polynomial approximations and artificial neural nets as response surfaces." Struct. Optimization, Vol. 5, 1-15.
Fox, R. L. (1971). Optimization methods for engineering design. Addison-Wesley
Publishing Co., Reading, Mass.
Gallant, A., and White, H. (1992). "On learning the derivatives of an unknown
mapping with multilayer feedforward networks." Neural networks 5, 129-138.
Greville, T. N. E. (1959). "The pseudoinverse of a rectangular or singular matrix
and its application to the solution of systems of linear equations." SIAM Rev., 1(1), 38-43.
Hajela, P., and Berke, L. (1990). "Neurobiological computational models in struc-
tural analysis and design." Paper AIAA-90-1133-CP, AIAA/ASME/ASCE/AHS/
ASC 31st Struct., Struct. Dynamics and Mat. Conf., New York, N.Y.
Hornik, K., Stinchcombe, M., and White, H. (1989). "Multilayer feedforward net-
works are universal approximators." Neural networks 2, 359-366.
Hornik, K., Stinchcombe, M., and White, H. (1990). "Universal approximation of
an unknown mapping and its derivatives using multilayer feedforward networks."
Neural networks 3, 551-560.
Khuri, A. I., and Cornell, J. A. (1987). Response surfaces, designs and analyses.
Marcel Dekker, Inc., New York, N.Y.
Penrose, R. (1955). "A generalized inverse for matrices." Proc. Cambridge Philosophical Soc., Vol. 51, 406-413.
Myers, R. H. (1971). Response surface methodology. Allyn and Bacon, Boston.
Reklaitis, G. V., Ravindran, A., and Ragsdell, K. M. (1983). Engineering optimi-
zation, methods and applications. John Wiley & Sons, New York, N.Y.
Rogers, J. L., and LaMarsh, W. J. (1992). "Application of a neural network to
simulate analysis in an optimization process." Artificial intelligence in design '92,
John S. Gero, ed., Kluwer Academic Publishers, Boston, Mass., 739-754.
Rumelhart, D., and McClelland, J. (1986). Parallel distributed processing, Vols. I and II. MIT Press, Cambridge, Mass.
Swift, R. A., and Batill, S. M. (1991). "Application of neural networks to preliminary
structural design." AIAA/ASME/AHS/ASC 32nd Struct., Struct. Dynamics and
Mat. Conf., Baltimore, Md., 335-343.
Vanluchene, R. D., and Sun, R. (1990). "Neural networks in structural engineering." Microcomputers in Civ. Engrg., 5(3), 207-215.
White, H. (1990). "Connectionist nonparametric regression: multilayer feedforward networks can learn arbitrary mappings." Neural networks 3, 535-549.
