containing the various distinct characteristics of the problem the neural network is likely to encounter in the finished application. A large training set reduces the risk of undersampling the nonlinear function but increases the training time. A general guide is to have at least five to ten training patterns for each weight.3 As neural networks learn linear relationships more efficiently, to reduce training time, 'one goal of data preparation is to reduce nonlinearity when we know its character and leave the hidden nonlinearities we don't understand for the neural network to resolve'.17 Hence, if it is known that input X is inversely related to the output, a more efficient approach would be to use (1/X) as the input.
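To put the five-to-ten-patterns guideline in concrete terms, the following C fragment (an illustrative sketch only, not part of the program described in this paper, and using a hypothetical network size) counts the connection weights of a small network and the corresponding recommended number of training patterns:

    #include <stdio.h>

    int main(void)
    {
        /* hypothetical network: 3 input, 4 hidden and 1 output neurons */
        int n_input = 3, n_hidden = 4, n_output = 1;

        /* connection weights (bias weights neglected for simplicity) */
        int n_weights = n_input * n_hidden + n_hidden * n_output;

        /* guideline: at least five to ten training patterns per weight */
        printf("weights = %d, suggested training patterns = %d to %d\n",
               n_weights, 5 * n_weights, 10 * n_weights);
        return 0;
    }

For this hypothetical 3-4-1 network the fragment reports 16 weights and therefore 80 to 160 training patterns.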
Preprocessing of the data is usually required before presenting the patterns to the neural network. This is necessary because the sigmoid transfer function modulates the output of each neuron to values between 0 and 1. Various normalization or scaling strategies have been proposed.16,18 The following normalization procedure is commonly adopted and was used in this study. For a variable with maximum and minimum values of Vmax and Vmin respectively, each value V is scaled to its normalized value A using

A = (V - Vmin)/(Vmax - Vmin)                (1)
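As an illustration of this preprocessing step, the scaling of eqn (1) and the sigmoid transfer function mentioned above could be coded in C as follows. This is only a sketch under the definitions given here and is not taken from the program used in the study:

    #include <math.h>

    /* sigmoid transfer function: modulates a neuron's output to the range 0-1 */
    double sigmoid(double x)
    {
        return 1.0 / (1.0 + exp(-x));
    }

    /* eqn (1): scale value v to its normalized value in the range 0-1 */
    double normalize(double v, double vmin, double vmax)
    {
        return (v - vmin) / (vmax - vmin);
    }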
The data are then randomly separated into a training set and a testing set. Usually about one-third of the data are used as the testing set.3 Initially, random scalar weights are assigned to the neurons. The neural network is then fed the training patterns and learns through the adjustment of the weights. Training is carried out iteratively until the average sum squared error over all the training patterns is minimized.
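The error measure that governs this iterative training can be written compactly. The following C sketch is illustrative only, with hypothetical array bounds and names, and computes the average sum squared error over all training patterns:

    #define MAX_OUT 8   /* hypothetical upper bound on the number of output neurons */

    /* average sum squared error over all training patterns:
       target[p][k] holds the desired outputs and output[p][k] the network outputs */
    double avg_sum_squared_error(int n_patterns, int n_outputs,
                                 double target[][MAX_OUT],
                                 double output[][MAX_OUT])
    {
        int p, k;
        double e, sse, total = 0.0;

        for (p = 0; p < n_patterns; p++) {
            sse = 0.0;
            for (k = 0; k < n_outputs; k++) {
                e = target[p][k] - output[p][k];
                sse += e * e;
            }
            total += sse;
        }
        return total / n_patterns;
    }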
There is currently no rule for determining the optimal number of neurons in the hidden layer except through experimentation. Using too few neurons impairs the neural network and prevents the correct mapping of input to output. Using too many neurons impedes generalization and increases training time.19 A common strategy, and the one used in this study, was to replicate the training several times, starting with two neurons and then increasing the number while monitoring the average sum squared error. Training is carried out until there is no significant improvement in the error.
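This trial-and-error search can be expressed as a simple outer loop around the training procedure. The C sketch below is illustrative only; train_network() is a hypothetical placeholder for a routine that trains a network with the given number of hidden neurons and returns its final average sum squared error:

    /* hypothetical training routine: trains a network with n_hidden neurons
       and returns the final average sum squared error over the training set */
    extern double train_network(int n_hidden);

    /* grow the hidden layer from two neurons upwards, stopping when the
       error no longer improves significantly                              */
    int select_hidden_neurons(int max_hidden, double tolerance)
    {
        int n_hidden, best_n = 2;
        double err, best_err = 1.0e30;

        for (n_hidden = 2; n_hidden <= max_hidden; n_hidden++) {
            err = train_network(n_hidden);      /* replicate the training      */
            if (best_err - err < tolerance)     /* no significant improvement  */
                break;
            best_err = err;
            best_n = n_hidden;
        }
        return best_n;
    }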
As described earlier, the testing set of patterns is then used to verify the performance of the neural network on the satisfactory completion of the training. The testing phase assesses the quality of the neural network model and determines whether the neural network can generalize correct responses for patterns that only broadly resemble the data in the training set.

The neural network program used in this study was based on that of Eberhart and Dobbins.15 The program was written in the C language. Training was carried out until the average sum squared error over all the training patterns was minimized. This occurred after about 10 000-20 000 cycles of training. Training time on an 80486-33 MHz personal computer was usually between 5 and 10 min.

5 EXAMPLE APPLICATIONS

Two examples are now presented to demonstrate the potential of this approach for capturing nonlinear interactions between various parameters in complex civil engineering systems. The first example involves the analysis of data obtained from calibration chamber tests on sand. The other example relates to the prediction of the ultimate load capacity of driven piles. In both examples, actual field data were used in training the neural network. For brevity, only samples of the training and testing data have been included.

5.1 Example A

Cone penetration test (CPT) measurements are often used to determine the soil engineering parameters for use in foundation design. The relationships between the measured cone stresses and the soil properties are determined from empirical correlations. In sands, these correlations are commonly derived from large scale laboratory calibration chamber tests. The sand sample of known density is prepared in the chamber and then consolidated to the desired stresses. The cone is then pushed into the sample, and the cone tip resistance qc and the sleeve friction fs are measured. The engineering properties of the sample are determined from laboratory testing. The cone measurements are then correlated directly to the engineering properties.

CPT calibration chamber tests have been carried out by a number of researchers including Holden20 and Baldi et al.21 For this study, the experimental results obtained by Baldi et al.21 were used. Their experiments involved a comprehensive study of the behaviour and properties of Ticino sand under different stress and boundary conditions. From statistical analysis, they were able to establish correlations between qc and a number of engineering parameters. In this paper, the correlation between the tangent constrained modulus M0 during compression and qc for normally consolidated sand is considered. The correlation determined by Baldi et al.21 from statistical analysis is shown below.
[DR    σ'm    qc    M0]
7 SUMMARY
REFERENCES
15. Eberhart, R. C. & Dobbins, R. W. Neural Network PC Tools: A Practical Guide. Academic Press, San Diego, 1990.
16. Stein, R. Selecting data for neural networks. AI Expert, 8(2) (1993) 42-7.
17. Crooks, T. Care and feeding of neural networks. AI Expert, 7(9) (1992) 36-41.
18. Masters, T. Practical Neural Network Recipes in C++. Academic Press, San Diego, 1993.
19. Bailey, D. & Thompson, D. How to develop neural network applications. AI Expert, 5(6) (1990) 38-47.
20. Holden, J. The calibration of electrical penetrometers in sand. Internal report, Norwegian Geotechnical Institute, Oslo, 152108-2, 1976.
21. Baldi, G., Bellotti, R., Ghionna, V. N., Jamiolkowski, M. & Pasqualini, E. Interpretation of CPTs and CPTUs - 2nd Part: Drained penetration of sands. Proc. 4th Int. Geotech. Seminar on Field Instrumentation and In-Situ Measurements, Nanyang Technological Institute, Singapore, 1986, pp. 143-56.
22. Garson, G. D. Interpreting neural-network connection weights. AI Expert, 6(7) (1991) 47-51.
23. Wellington, A. M. The iron wharf at Fort Monroe, VA. Trans. ASCE, 27 (1892) 129-37.
24. Hiley, A. The efficiency of the hammer blow, and its effects with reference to piling. Engineering, 2 June (1922) 673.
25. Janbu, N. Une analyse energetique du battage des pieux a l'aide de parametres sans dimension. Norwegian Geotechnical Institute, Oslo, 1953, pp. 63-4 (in Norwegian).
26. Chellis, R. D. Pile Foundations, 2nd edn. McGraw-Hill, New York, 1961.
27. Whitaker, T. The Design of Piled Foundations. Pergamon Press, Oxford, 1970.
28. Olson, R. E. & Flaate, K. S. Pile-driving formulas for friction piles in sand. J. Soil Mech. Foundat. Div. ASCE, 93(6) (1967) 279-96.
29. Flaate, K. S. An investigation of the validity of three pile driving formulae in cohesionless material. Norwegian Geotechnical Institute, Oslo, 1964, pp. 11-22.
APPENDIX

This appendix details the procedure for partitioning the connection weights to determine the relative importance of the various inputs, using the method proposed by Garson.22 The method essentially involves partitioning the hidden-output connection weights of each hidden neuron into components associated with each input neuron.
As an example, consider the neural network with three input neurons, four hidden neurons and one output neuron, with the connection weights as shown below.
(1) For each hidden neuron i, multiply the absolute value of the hidden-output layer connection weight by the
absolute value of the hidden-input layer connection weight. Do this for each input variable j. The following
products Pij are obtained:
             Input 1                    Input 2                    Input 3
Hidden 1     P11 = 1.67624 x 4.57857    P12 = 3.29022 x 4.57857    P13 = 1.32466 x 4.57857
Hidden 2     P21 = 0.51874 x 0.48815    P22 = 0.22921 x 0.48815    P23 = 0.25526 x 0.48815
Hidden 3     P31 = 4.01764 x 5.73901    P32 = 2.12486 x 5.73901    P33 = 0.08168 x 5.73901
Hidden 4     P41 = 1.75691 x 2.65221    P42 = 1.44702 x 2.65221    P43 = 0.58286 x 2.65221
(2) For each hidden neuron, divide Pij by the sum for all the input variables to obtain Qij. For example, for Hidden 1, Q11 = P11/(P11 + P12 + P13) = 0.266445.
(3) For each input neuron, sum the Qij obtained in the previous step to form Sj. For example, S1 = Q11 + Q21 + Q31 + Q41.
             Input 1              Input 2              Input 3
Hidden 1     Q11 = 0.266445       Q12 = 0.522994       Q13 = 0.210560
Hidden 2     Q21 = 0.517081       Q22 = 0.228478       Q23 = 0.254441
Hidden 3     Q31 = 0.645489       Q32 = 0.341388       Q33 = 0.013123
Hidden 4     Q41 = 0.463958       Q42 = 0.382123       Q43 = 0.153919
Sum          S1 = 1.892973        S2 = 1.474983        S3 = 0.632044
(4) Divide Sj by the sum for all the input variables. Expressed as a percentage, this gives the relative importance or distribution of all output weights attributable to the given input variable. For example, for input neuron 1, the relative importance is equal to (S1 x 100)/(S1 + S2 + S3) = 47.3%. (A C sketch of the complete procedure is given after the table below.)
                             Input 1    Input 2    Input 3
Relative importance (%)        47.3       36.9       15.8
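Collecting the four steps above, the complete partitioning can be carried out by a short routine. The following C fragment is an illustrative sketch only and is not part of the original program; it uses the absolute input-hidden weights and hidden-output weights of the example network listed in step (1) and reproduces the relative importances in the table above:

    #include <stdio.h>
    #include <math.h>

    #define N_INPUT  3
    #define N_HIDDEN 4

    int main(void)
    {
        /* absolute input-hidden weights w[i][j] and hidden-output weights v[i],
           taken from the worked example in this appendix                        */
        double w[N_HIDDEN][N_INPUT] = {
            {1.67624, 3.29022, 1.32466},
            {0.51874, 0.22921, 0.25526},
            {4.01764, 2.12486, 0.08168},
            {1.75691, 1.44702, 0.58286}
        };
        double v[N_HIDDEN] = {4.57857, 0.48815, 5.73901, 2.65221};

        double Q[N_HIDDEN][N_INPUT], S[N_INPUT] = {0.0, 0.0, 0.0};
        double total = 0.0;
        int i, j;

        for (i = 0; i < N_HIDDEN; i++) {
            double row_sum = 0.0;
            /* step (1): P[i][j] = |w[i][j]| * |v[i]| (the factor |v[i]| cancels
               in step (2) but is kept here to mirror the worked example)        */
            for (j = 0; j < N_INPUT; j++)
                row_sum += fabs(w[i][j]) * fabs(v[i]);
            /* step (2): Q[i][j] = P[i][j] / row sum over all input variables */
            for (j = 0; j < N_INPUT; j++)
                Q[i][j] = fabs(w[i][j]) * fabs(v[i]) / row_sum;
        }

        /* step (3): S[j] = sum of Q[i][j] over all hidden neurons */
        for (j = 0; j < N_INPUT; j++)
            for (i = 0; i < N_HIDDEN; i++)
                S[j] += Q[i][j];

        /* step (4): relative importance of each input as a percentage */
        for (j = 0; j < N_INPUT; j++)
            total += S[j];
        for (j = 0; j < N_INPUT; j++)
            printf("Input %d: %.1f%%\n", j + 1, 100.0 * S[j] / total);
        /* prints 47.3%, 36.9% and 15.8%, as in the table above */
        return 0;
    }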