Common Misconceptions About Neural Networks As Approximators
INTRODUCTION
QUALITY OF FIT
Consider a problem with n independent variables, the components of the vector {x} = (x_1, x_2, ..., x_n)'. A total of N sets of the independent variables, referred to as design points, will be considered, {x}_j, j = 1, ..., N. At the design point {x}_j, let y_j be the value of the function to be approximated. The pairing of {x}_j and y_j is referred to as the jth training pair. Let ŷ_j be the value of the approximating function. The approximating function, ŷ, should closely match the function, y, not only at the design points, {x}_j, but over the entire region of interest.
Let ȳ be the average value of y at the design points. In this study, one measure of the closeness of fit to be considered is the nondimensional value v, where

v = \frac{\sqrt{\frac{1}{N}\sum_{j=1}^{N}(y_j - \hat{y}_j)^2}}{\bar{y}} \cdot 100 \qquad (2)
Overall Fit
Just because the approximating function exactly fits the function at the N design points does not guarantee that it gives a good fit over the region of interest. It is therefore desirable to have a measure of the quality of the overall fit over the region of interest. Consider N_G design points, referred to as grid points, scattered throughout the region of interest, and assume that if the approximating function closely matches the exact function at these points, a good approximation has been obtained throughout the entire region of interest. A measure of the quality of overall fit is taken as
v_G = \frac{\sqrt{\frac{1}{N_G}\sum_{j=1}^{N_G}(y_j - \hat{y}_j)^2}}{\bar{y}} \cdot 100 \qquad (3)
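As a small illustration, the error measures in (2) and (3) can be computed as follows (a NumPy sketch; the normalization by the mean ȳ follows the form assumed above):

```python
import numpy as np

def fit_error(y_true, y_pred):
    """Nondimensional error of (2)/(3): the RMS residual expressed as a
    percentage of the mean of the true responses (normalization assumed)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rms = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return 100.0 * rms / np.mean(y_true)

# v   : fit_error evaluated at the N design points used for training
# v_G : fit_error evaluated at the N_G grid points covering the region of interest
```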
APPROXIMATIONS
Neural-Network Approximations
While the initial motivation for developing artificial neural nets was to
develop computer models that could imitate certain brain functions, neural
nets can be thought of as another way of developing approximations (these
approximations in many references are referred to as response surfaces).
Different types of neural networks are available (Rumelhart et al. 1986; Anderson et al. 1988), but the type of neural net considered in this paper is the feed-forward network with one hidden layer, as shown in Fig. 1. This type of neural net has been used previously to develop response surfaces (Vanluchene et al. 1990; Hajela et al. 1990; Swift et al. 1991; Berke et al. 1990; Rogers et al. 1992) and is capable, with enough nodes on the hidden layer, of approximating a wide range of functions.
FIG. 1. Neural Network (input, hidden, and output layers)
Output node k then receives inputs w_kj r_j, which are summed and used with a bias θ_k and an activation function to yield an output ŷ_k. Some variation of the delta-error back-propagation algorithm (Rumelhart et al. 1986; Anderson et al. 1988) is then used to adjust the weights on each learning cycle so as to reduce the difference between the predicted and desired outputs. In this investigation, studies were performed using the program NEWNET (Carpenter 1992b). NEWNET minimizes the sum of the squares of the residuals in (1) with respect to the weights and biases of the net. Training of the net is thus formulated as an unconstrained minimization problem. Solution of this minimization problem is performed using the method of Davidon, Fletcher, and Powell, a quasi-Newton method (Reklaitis et al. 1983; Fox 1971). That algorithm performs a series of one-dimensional searches along search directions. Search directions are determined by building an approximation to the inverse Hessian matrix using gradient information. Gradients required by that algorithm are obtained using back-propagation. One-dimensional searches are performed along the search directions using an interval-shortening routine.
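NEWNET itself is not reproduced here, but the training formulation above can be sketched as follows. This is a minimal illustration, assuming NumPy/SciPy, a tanh activation, and SciPy's BFGS routine standing in for the Davidon-Fletcher-Powell method; none of these choices is claimed to match NEWNET.

```python
import numpy as np
from scipy.optimize import minimize

def unpack(p, n_in, n_hid):
    """Split the flat parameter vector into weights and biases."""
    i = n_in * n_hid
    W1 = p[:i].reshape(n_hid, n_in)       # input -> hidden weights
    b1 = p[i:i + n_hid]                   # hidden-node biases
    W2 = p[i + n_hid:i + 2 * n_hid]       # hidden -> output weights w_kj
    b2 = p[-1]                            # output-node bias
    return W1, b1, W2, b2

def predict(p, X, n_hid):
    W1, b1, W2, b2 = unpack(p, X.shape[1], n_hid)
    r = np.tanh(X @ W1.T + b1)            # hidden-node outputs r_j
    return r @ W2 + b2                    # single output node

def sse(p, X, y, n_hid):
    """Sum of squared residuals, the quantity minimized during training."""
    return np.sum((y - predict(p, X, n_hid)) ** 2)

def train(X, y, n_hid, seed=0):
    """Training posed as unconstrained minimization over all weights and biases."""
    rng = np.random.default_rng(seed)
    n_par = X.shape[1] * n_hid + 2 * n_hid + 1
    p0 = rng.normal(scale=0.5, size=n_par)
    return minimize(sse, p0, args=(X, y, n_hid), method="BFGS").x
```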
Overdetermined Approximation
With N > m, (8) can be solved in a least-squares sense thus (Box et al.
1987; Khuri et al. 1987; Myers 1971)
[Z]'{Y} = [Z]'[Z]{b} (9)
or

{b} = ([Z]'[Z])^{-1} [Z]'{Y} (10)
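For example, a least-squares fit of an overdetermined full quadratic in two variables might be sketched as follows (a NumPy illustration; the particular response and design points are hypothetical):

```python
import numpy as np

def quadratic_design_matrix(X):
    """Columns of [Z] for a complete quadratic in two variables:
    1, x1, x2, x1^2, x1*x2, x2^2 (m = 6 undetermined parameters)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1 ** 2, x1 * x2, x2 ** 2])

X = np.random.default_rng(1).uniform(-1.0, 1.0, size=(12, 2))  # N = 12 > m = 6
y = 1.0 + X[:, 0] - 2.0 * X[:, 1] ** 2                         # response to be fit
Z = quadratic_design_matrix(X)
b, *_ = np.linalg.lstsq(Z, y, rcond=None)   # least-squares solution of (9)-(10)
```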
Underdetermined Approximation
When N < m, the approximation is underdetermined. A solution can be obtained by choosing the terms of {b} so as to minimize the square of the residual as defined in (1). However, a direct solution can be obtained by using the concept of the pseudoinverse (Greville 1959; Penrose 1955). Assume that the rank of matrix [Z] is N and define the pseudoinverse of matrix [Z], [Z]*, thus

[Z]* = [Z]'([Z][Z]')^{-1} (11)

Solution of (8) is then

{b} = [Z]*{Y} (12)
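A corresponding sketch of the pseudoinverse solution (illustrative only; it relies on the full-row-rank assumption stated above) is:

```python
import numpy as np

def underdetermined_fit(Z, y):
    """{b} = [Z]*{Y} with [Z]* = [Z]'([Z][Z]')^-1, valid when [Z] has full
    row rank (N < m).  The fit is exact at the N design points, but other
    coefficient vectors also fit them exactly; this is the minimum-norm one."""
    Z_star = Z.T @ np.linalg.inv(Z @ Z.T)   # pseudoinverse of (11)
    return Z_star @ y                       # solution (12)

# np.linalg.pinv(Z) @ y returns the same minimum-norm solution.
```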
Types of Approximations
Overdetermined, exactly determined, and underdetermined approximations have, respectively, more, an equal number of, or fewer training pairs than undetermined parameters associated with the approximation. To have an approximation, ŷ, that closely matches the function, y, not only at the design points but over the region of interest, one should use an overdetermined approximation. Underdetermined and exactly determined approximations may give a good approximation at the design points, but not necessarily a good approximation over the region of interest. Recent studies (Carpenter 1992a) have indicated that approximations that are from 20% to 50% overdetermined tend to be computationally efficient. In other words, they are a good compromise between performing as few function evaluations as possible and still obtaining a good approximation with the chosen approximating function.
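To make this bookkeeping concrete, the following sketch tallies the number of undetermined parameters m for a complete polynomial and for a one-hidden-layer net, together with a 20% to 50% overdetermined training-set size, assuming that phrase means N is roughly 1.2m to 1.5m:

```python
from math import comb

def m_polynomial(n_vars, order):
    """Coefficients in a complete polynomial of the given order in n_vars variables."""
    return comb(n_vars + order, order)

def m_network(n_in, n_hidden, n_out=1):
    """Weights and biases of a one-hidden-layer feed-forward net."""
    return n_hidden * (n_in + 1) + n_out * (n_hidden + 1)

# Two-variable case used in the example below:
print(m_polynomial(2, 2))        # 6  (2nd-order polynomial)
print(m_network(2, 7))           # 29 (net with 7 nodes on the hidden layer)

m = m_network(2, 7)
print(round(1.2 * m), round(1.5 * m))   # roughly 35 to 44 training pairs
```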
MISCONCEPTIONS
A number of presentations and publications on the application of neural
networks to engineering problems indicate that several misconceptions exist
concerning neural networks as approximators. This paper points out these
misconceptions.
FIG. 4. Legend: 2nd-order poly, 3rd-order poly, 4th-order poly, 5th-order poly, 3-node net, 5-node net, 7-node net
Fig. 3 represents the functional relationship between VOL and the x_1 and x_2 coordinates of node 2. Using differing numbers of design points to build the approximations, polynomial approximations of various orders and neural-net approximations with varying numbers of nodes on the hidden layer were developed to approximate VOL. The approximations were then compared to the exact function at N_G = 961 points (a 31 x 31 evenly spaced grid of points), and the parameter v_G as defined in (3) was calculated. Fig. 4 gives the value of v_G versus the number of design points used to build each of the approximations examined. The approximations considered were polynomial approximations of orders two through five and neural nets with three, five, and seven nodes on the hidden layer. The various approximations have different numbers of undetermined parameters; these numbers (the parameter count m) are shown next to the curves in Fig. 4. Notice that the performance of the approximations is directly related to the number of undetermined parameters associated with each approximation. The 2nd-order-polynomial approximation, with six undetermined parameters, performed the poorest. The artificial neural net with seven nodes on the hidden layer, with 29 undetermined parameters, performed the best.
Note that one method of forming an approximation is not inherently superior to the other. On selected test problems, the performance of good polynomial approximations and of good neural-net approximations has been found to be comparable. Obtaining a good approximation for a given function, however, is not a simple task. One must search over a series of network architectures or over a family of polynomial functions to find a good approximation.
Searches over a family of polynomial functions can be accomplished, for example, by starting with a linear function and then examining functions of increasingly higher order. Some may find that such an approach yields more insight into the nature of the function being approximated than neural-network approximations do. Searches over network architecture can be readily accomplished by varying the number of nodes on a hidden layer or layers.
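The search over a family of polynomial functions might be sketched, for a one-variable example, as follows (an illustration only; the exponential test function and the normalization by the mean are assumptions):

```python
import numpy as np

x = np.linspace(0.0, 1.0, 21)          # design points
y = np.exp(x)                          # example function to be approximated
x_grid = np.linspace(0.0, 1.0, 201)    # grid points over the region of interest
y_grid = np.exp(x_grid)

for order in range(1, 6):              # start low, examine increasingly higher orders
    coeffs = np.polyfit(x, y, order)   # least-squares polynomial fit
    y_hat = np.polyval(coeffs, x_grid)
    v_g = 100.0 * np.sqrt(np.mean((y_grid - y_hat) ** 2)) / np.mean(y_grid)
    print(order, v_g)                  # accept the lowest order with acceptable v_G
```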
A misconception addressed here is that it is acceptable to train large nets with only a few training pairs. These nets can be trained so that the approximation exactly fits the function to be approximated at the design points, and thus can be trained to have v = 0, but there can be great variation, from one training to the next, in the approximations over the region of interest (i.e., there will be a large variation in the parameter v_G). The next example reemphasizes this point.
Fox (1971) investigated the function

y = 10x_1^4 - 20x_1^2 x_2 + 10x_2^2 + x_1^2 - 2x_1 + 5 \qquad (15)
which has banana-shaped contours, as seen in Fig. 7. Fig. 8 shows results using various neural-network approximations. In each case there are two input nodes receiving the x_1 and x_2 coordinates of the design point and one output node predicting the response ŷ. One hidden layer was considered, with three, five, or seven nodes on the hidden layer. Each net was trained three times using 16 training pairs. Fig. 8 gives, for each neural net, the lowest value of the parameter v_G obtained in the three trainings and the difference between the highest and lowest values of v_G in the three trainings. The underdetermined nets (five and seven nodes on the hidden layer) did not yield unique approximations. The three training runs for each of these nets all gave v = 0 (an exact fit at the design points) but yielded greatly different results over the region of interest. The net with three nodes on the hidden layer, which is overdetermined, gave far more consistent results.
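The experiment behind Fig. 8 can be sketched as follows. This is an illustration only: the region [-1, 1] x [-1, 1], the random placement of the 16 design points, the tanh activation, and the BFGS optimizer are assumptions, not the choices made in the paper.

```python
import numpy as np
from scipy.optimize import minimize

def banana(X):
    x1, x2 = X[:, 0], X[:, 1]
    return 10 * x1**4 - 20 * x1**2 * x2 + 10 * x2**2 + x1**2 - 2 * x1 + 5   # (15)

def net(p, X, h):
    """One-hidden-layer net: two inputs, h tanh hidden nodes, one output."""
    W1 = p[:2 * h].reshape(h, 2); b1 = p[2 * h:3 * h]
    W2 = p[3 * h:4 * h];          b2 = p[-1]
    return np.tanh(X @ W1.T + b1) @ W2 + b2

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(16, 2))     # 16 training pairs
y = banana(X)
g = np.linspace(-1.0, 1.0, 31)
Xg = np.column_stack([a.ravel() for a in np.meshgrid(g, g)])   # grid points
yg = banana(Xg)

for h in (3, 5, 7):                          # 13, 21, and 29 parameters (4h + 1)
    v_gs = []
    for trial in range(3):                   # three trainings per architecture
        p0 = rng.normal(scale=0.5, size=4 * h + 1)
        p = minimize(lambda p: np.sum((y - net(p, X, h)) ** 2), p0, method="BFGS").x
        v_gs.append(100.0 * np.sqrt(np.mean((yg - net(p, Xg, h)) ** 2)) / np.mean(yg))
    print(h, min(v_gs), max(v_gs) - min(v_gs))   # lowest v_G and spread, as in Fig. 8
```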
FIG. 7. Fox's Banana Function
FIG. 8. Fox's Banana Function, Error Parameters for Three, Five, and Seven Nodes on Hidden Layer (lowest v_G and highest-minus-lowest v_G over three trainings; the three-node net is overdetermined, the five- and seven-node nets underdetermined)
CONCLUSION
This paper points out the danger of using underdetermined approximations. This danger is present whether the approximations are underdetermined polynomial approximations or underdetermined neural nets. The authors are not aware of any studies using underdetermined polynomial approximations. However, a number of reported studies have used nets with a large number of nodes on a hidden layer or layers (and thus a large number of associated undetermined parameters) trained with relatively few training pairs. These nets are thus underdetermined approximators. The variability in the approximations that can be obtained with such underdetermined nets has been emphasized. The paper further points out that design-point selection is important when training neural networks. Deficient design-point selection can also introduce variability into an approximation.
For the examples in Carpenter et al. (1993) and those of this study, overdetermined polynomials and neural nets with equivalent numbers of associated coefficients gave comparable results. It should be emphasized that only a limited set of examples, and only feed-forward nets with one hidden layer, were considered. While it is not possible to make a general statement based on these limited studies, it does seem that the selection of one type of approximation over the other can reasonably be based on personal preference. Training times of neural networks and polynomial approximations were discussed. While it was found that neural networks generally take much longer to train than polynomial approximations, it was pointed out that the ease with which a family of network architectures can be examined may more than compensate for the lengthy training times of those networks.
ACKNOWLEDGMENT
Selected examples from this paper were presented at the Third International Conference on the Application of Artificial Intelligence to Civil & Structural Engineering, Edinburgh, Scotland, August 17-19, 1993, and at ANNIE'93, Artificial Neural Networks in Engineering, Rolla, Missouri, November 14-17, 1993.
APPENDIX. REFERENCES
Anderson, J., and Rosenfeld, E. (1988). Neurocomputing: foundations of research.
MIT Press, Cambridge, Mass.
Baffes, P. T. (1989). NETS user's guide. Software Technology Branch, Lyndon B.
Johnson Space Center.
Berke, L., and Hajela, P. (1990). "Application of artificial neural nets in structural
mechanics." Shape and Layout Optimization of Structural Systems, Int. Center for
Mech. Sciences, Udine, Italy.
Box, G. E. P., and Draper, N. R. (1987). Empirical model-building and response
surfaces. John Wiley and Sons, New York, N.Y.
Carpenter, W. C. (1992a). "Effect of design selection on response surface performance."