Shallow and Deep Artificial Neural Networks For Structural Reliability Analysis
1 Introduction

Reliability analysis of real structural engineering systems is still a computationally demanding task. Although in some cases failure probabilities can be estimated at acceptable computational costs by using approximate methods such as first- and second-order reliability methods (FORM and SORM), in many other cases more demanding approaches such as Monte Carlo simulation (MCS) and other sampling-based methods are the only feasible alternatives. In these cases, surrogate models, also known as metamodels, have been widely employed as an attempt to keep the computational effort acceptable.

The basic idea of surrogate modeling for reliability analysis purposes is usually to replace the true, time-consuming limit state function by an approximation. In the literature, many different surrogate models have been applied to structural reliability analysis, for example: the response surface method [1], kriging [2], polynomial chaos expansions [3], and artificial neural networks (ANNs) [4,5]. The present paper focuses on ANNs.

A large number of applications of ANNs in the field of structural reliability is also available in the literature, as can be seen in the review paper by Chojaczyk et al. [6] and in many other references [7–9]. However, the vast majority of them, if not all, employ only the so-called shallow neural networks, which are those with just one hidden layer. The potential of deep neural networks, those with two or more hidden layers, in structural reliability is still to be explored, although these ANNs have been attracting a lot of research interest in many areas over the last years. In the context of structural engineering, a few papers with applications of deep ANNs may already be found in the literature [10,11].

As pointed out by Schmidhuber [12], it is not clear in the literature at which problem depth shallow learning ends and deep learning begins. An attempt to define shallow and deep ANNs is presented in Ref. [13], where it is said that deep architectures are composed of multiple levels of nonlinear operations. In the present paper, however, a simpler definition, that shallow networks are those with just a single hidden layer, is adopted. This definition is presented, for example, in Ref. [14].

Considering that many recent publications have shown advantages of deep ANNs over shallow ones [15,16], this paper presents a comparison between them in the context of structural reliability. To do so, a previously proposed adaptive ANN procedure [5], which aimed at shallow networks, is simplified, extended to the case of deep networks, and employed in the solution of four benchmark structural reliability problems.

It is noteworthy that most of the surrogate models found in the literature, including shallow neural networks, suffer from what is usually known as the curse of dimensionality [17–19]. This basically means that the surrogates rapidly lose their efficiency as the number of dimensions of the problem increases. However, recent developments in the area of deep learning have been leading to theoretical guarantees that deep neural networks can avoid the curse of dimensionality for some types of problems [20,21]. Dimensionality issues are not directly investigated herein, but this is another reason to consider the application of deep ANNs in the context of structural reliability, especially because it is common to find structural reliability problems with high dimensionality.

The fact that different layers of deep ANNs may have different roles, or in other words that different layer types with different goals may be employed [22,23], could also lead to advantages of these ANNs over the shallow ones. In the case of system reliability, for example, the first hidden layer could try to separate the different failure modes in such a way that each group of neurons of the next layers would be responsible for approximating one specific limit state function.

The remainder of this paper is organized as follows. In Sec. 2, some basic concepts related to structural reliability and Monte Carlo simulation are presented. Section 2 also presents a brief discussion about why the computational cost may become prohibitive and points out some alternatives to overcome this. Section 3 describes the artificial neural networks considered herein, as well as the adaptive procedure employed for the shallow and deep ANNs. Section 4 presents results obtained for the numerical examples and some discussions about these results. Finally, some concluding remarks are drawn in Sec. 5.
2 Structural Reliability

Let X be a vector of random variables, which represents all random or uncertain parameters of a structural system, and let x be a vector of realizations of these random variables. The boundary between desirable and undesirable structural responses is defined by limit state functions, g(X), in such a way that the failure and safe domains, Ωf and Ωs, respectively, are given by

Ωf = {x | g(x) ≤ 0}
Ωs = {x | g(x) > 0}    (1)

Each limit state describes one possible failure mode of the structure. The probability of undesirable structural responses for each failure mode, usually known as the probability of failure, is defined as

Pf = P[X ∈ Ωf] = ∫_{Ωf} fX(x) dx    (2)

where fX(x) is the joint probability density function of vector X. Equation (2) may also be employed to compute failure probabilities of structural systems. In this case, Ωf must be defined as a combination of all limit state functions involved.

The multidimensional integral in Eq. (2) may be solved by means of structural reliability methods such as FORM, SORM, and MCS. These methods are described, for example, in Refs. [24] and [25].

When simple MCS is employed, failure probabilities are estimated via Eq. (3). In this case, nMC samples of X are randomly generated according to the joint distribution, fX(x), and a so-called indicator function, I[x], which is equal to one if x belongs to the failure domain and zero otherwise, is considered. Application of Eq. (3) requires one limit state function evaluation per sample, and large numbers of samples are necessary when dealing with small failure probabilities. As engineering structures usually present very small failure probabilities, the computational burden easily becomes prohibitive

Pf = E[I[X]] ≅ (1/nMC) Σ_{i=1}^{nMC} I[x_i]    (3)
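For illustration, the estimator of Eq. (3) can be written in a few lines. The following sketch assumes a vectorized limit state function g and a sampler for fX(x); both are placeholders rather than the implementations used in the paper.

```python
import numpy as np

def crude_mcs(g, sample_x, n_mc=5_000_000, seed=0):
    """Crude Monte Carlo estimate of Pf = P[g(X) <= 0], as in Eq. (3).

    g        : vectorized limit state function, (n_mc, n) -> (n_mc,)
    sample_x : draws n_mc realizations of X according to fX(x)
    """
    rng = np.random.default_rng(seed)
    x = sample_x(rng, n_mc)             # realizations of the random vector X
    indicator = g(x) <= 0.0             # I[x]: 1 in the failure domain, 0 otherwise
    pf = indicator.mean()               # Eq. (3): sample mean of the indicator
    # coefficient of variation of the estimator, useful for judging n_mc
    cov = np.sqrt((1.0 - pf) / (n_mc * pf)) if pf > 0 else np.inf
    return pf, cov

# Illustration: g(x) = 3 - x with X ~ N(0, 1), so Pf = Phi(-3) ~ 1.35e-3
pf, cov = crude_mcs(lambda x: 3.0 - x[:, 0],
                    lambda rng, n: rng.standard_normal((n, 1)))
print(f"Pf ~ {pf:.3e} (c.o.v. ~ {cov:.1%})")
```

The coefficient of variation returned here makes the point of the preceding paragraph concrete: for a fixed relative accuracy, the required nMC grows inversely with Pf.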
In the literature, many methods have been proposed to reduce the number of samples required by MCS to achieve a given accuracy. These methods include, but are not limited to: importance sampling [26], asymptotic sampling [27], and subset simulation [28].

Another approach, which has been drawing a lot of attention from researchers over the last years, is the one based on surrogate models [2,9,29–31]. In this case, a common approach consists of replacing as many evaluations as possible of the time-consuming limit state function by evaluations of an accurate enough surrogate model, which presents smaller computational costs. Most of the time, the true model is evaluated on a number of points, which constitute the so-called experimental design (ED), and the surrogate is constructed by using this information. The fact that the choice of these points has a significant impact on the accuracy of the metamodel has paved the way for the development of a number of adaptive strategies for EDs, such as the one employed in the present paper. In these strategies, points are included in the ED in an iterative manner, trying to cover the most important regions of the domain. Identification of these regions takes into account probability densities as well as the accuracy of the limit state function approximation [3,29,31,32].
3 Artificial Neural Networks and Adaptive Designs

3.1 Artificial Neural Networks. Artificial neural networks were introduced by McCulloch and Pitts [33] based on a simplified analogy to the nervous system and have significantly evolved ever since. Most of the recent developments on ANNs are associated with the area known as deep learning.

In ANNs, information is processed by small processing units, corresponding to the neurons, mathematically represented by simple functions which are usually called activation functions. The processing units communicate with each other by means of weighted connections corresponding to the synapses of the brain [18]. Different networks can be constructed by choosing different numbers of neuron layers, the type and number of neurons in each layer, and the type of connection between neurons.

The most widely used network type for approximation problems, which is adopted herein, is the multilayer perceptron (MLP, see Ref. [18]). MLP networks are built with: one input layer with one neuron for each input parameter, one output layer with one neuron for each output parameter, and an arbitrary number of hidden layers, nhidden, with arbitrary numbers of (hidden) neurons, nneurons. In the present paper, following some references from the literature (for example, Ref. [14]), an ANN is classified as shallow if it has just one hidden layer and as deep otherwise.

In feedforward ANNs, the neurons of one layer are connected with each neuron of the previous layer, but information only flows in the forward direction, from the input toward the output layer.

The type of neuron in each layer is defined by the chosen activation function. Linear and sigmoid functions are usual choices, although the literature is filled with many different types of activation functions. In this paper, linear activation functions are used for the input and output layers. For the hidden layers, two different functions are tested for both types of ANNs: the tangent-sigmoid (tansig), very common in the context of shallow ANNs, and the rectified linear unit (ReLU), which is a common choice for deep ANNs.
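As an illustration of this architecture, the following minimal NumPy sketch (not the toolbox implementation used in the paper) evaluates the forward pass of an MLP with linear output neurons and tansig or ReLU hidden layers; the layer sizes and random weights are purely illustrative.

```python
import numpy as np

def tansig(a):
    return np.tanh(a)                    # tangent-sigmoid activation

def relu(a):
    return np.maximum(a, 0.0)            # rectified linear unit

def mlp_forward(x, weights, biases, hidden_act=tansig):
    """Forward pass of a feedforward MLP with a linear output layer.

    x       : (n_samples, n_inputs) input matrix
    weights : list of (n_in, n_out) matrices, one entry per layer;
              one hidden entry gives a shallow network, two or more
              hidden entries give a deep one in the sense of Sec. 3.1
    biases  : list of (n_out,) vectors, one entry per layer
    """
    h = x
    for w, b in zip(weights[:-1], biases[:-1]):
        h = hidden_act(h @ w + b)        # hidden layers: tansig or ReLU
    return h @ weights[-1] + biases[-1]  # output layer: linear

# Illustrative shallow network: 2 inputs -> 5 hidden (tansig) -> 1 output
rng = np.random.default_rng(0)
ws = [rng.normal(size=(2, 5)), rng.normal(size=(5, 1))]
bs = [np.zeros(5), np.zeros(1)]
y = mlp_forward(rng.normal(size=(10, 2)), ws, bs)
```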
For a given configuration and a given dataset, the so-called training of the network consists of adjusting its parameters in such a way that its performance is improved. In other words, during the training of the network, its parameters are modified in such a way that the differences between known outputs and the outputs provided by the ANN (the error) are reduced. Each iteration of the training process is called an epoch, and if a better approximation is required for some regions of the output space, the error to be reduced may be weighted by multiplying it component-wise by a vector of weights, eW.

In the present paper, the Levenberg–Marquardt training method [34], a common choice for shallow networks, is employed, and the mean-squared error is used as the performance function. Although, for deep ANNs, training algorithms such as the adaptive moment estimation method [35] have shown promising results, they usually aim at problems with large amounts of data, which is hardly the case for structural reliability problems. So, the Levenberg–Marquardt training method seems to be a good choice for the deep networks as well in the context of the present paper. In fact, the Levenberg–Marquardt method led to better results than the adaptive moment estimation method when both were briefly compared on the problems studied herein. However, a better tuning of the hyperparameters of the adaptive moment estimation method, in the context of reliability problems, could still be pursued in future studies.
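The core of the Levenberg–Marquardt method is a damped Gauss–Newton update. A minimal sketch of one iteration is given below, assuming a routine that returns the Jacobian J of the error vector e with respect to the network parameters (obtained, for example, by backpropagation or finite differences); the damping schedule is only indicated in a comment.

```python
import numpy as np

def lm_step(w, errors, jacobian, mu):
    """One Levenberg-Marquardt update of the parameter vector w.

    errors(w)   -> e, residual vector (may already carry the eW weighting)
    jacobian(w) -> J = de/dw, of shape (n_errors, n_params)
    mu          -> damping: large mu ~ gradient descent, small mu ~ Gauss-Newton
    """
    e = errors(w)
    J = jacobian(w)
    # solve (J^T J + mu I) dw = -J^T e for the update dw
    a = J.T @ J + mu * np.eye(w.size)
    return w + np.linalg.solve(a, -J.T @ e)

# In practice mu is adapted: it is decreased after a step that reduces the
# mean-squared error and increased (with the step retried) otherwise.
```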
The MATLAB neural network toolbox [36] is employed herein. Further details about ANNs can be obtained, for example, in Ref. [18].

3.2 Adaptive Artificial Neural Networks for Structural Reliability Analysis. The adaptive ANN procedure applied in this paper, for both shallow and deep networks, is similar to the one proposed for shallow networks in Ref. [5].

When using surrogate models for limit state function approximation in reliability analysis, an experimental design is employed to construct the approximation. The ED consists of nED points {xED^(1), xED^(2), …, xED^(nED)}, with xED^(i) ∈ R^n, and the respective function …
UFBR(x^(i)) = |Bs(x^(i)) − Bf(x^(i))| / B    (7)

where Bs(x^(i)) and Bf(x^(i)) are the numbers of surrogates which identify the sample x^(i) as being in the safe and in the failure regions, respectively. If UFBR(x^(i)) = 1, the classification of the sample is the same for all surrogates. If UFBR(x^(i)) is close to zero, the classifications of x^(i) differ among the surrogates and the point should be added to the ED.

In order to add nADD points to the ED at each iteration, the population is clustered into nADD different regions by using the k-means clustering method [38]. Each time the enrichment of the ED takes place, UFBR is evaluated on the entire population and one point of each cluster is selected, among those presenting the smallest values of UFBR.
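A minimal sketch of this enrichment step, reading Eq. (7) as the normalized vote margin reconstructed above and assuming SciPy's kmeans2 for the clustering (the paper's own implementation is not specified):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def enrich_ed(population, surrogates, n_add):
    """Select n_add new ED points using the UFBR criterion of Eq. (7).

    population : (n_pop, n) candidate samples of X
    surrogates : list of B trained networks; each maps the population to
                 approximate limit state values
    """
    B = len(surrogates)
    g_hat = np.array([s(population) for s in surrogates])  # (B, n_pop)
    b_s = (g_hat > 0).sum(axis=0)        # votes for the safe domain
    b_f = B - b_s                        # votes for the failure domain
    u_fbr = np.abs(b_s - b_f) / B        # Eq. (7): 1 = unanimous, ~0 = disagreement
    # cluster the population into n_add regions and pick, per cluster,
    # the point on which the surrogates disagree the most
    _, labels = kmeans2(population.astype(float), n_add, minit='++')
    picks = [population[labels == c][np.argmin(u_fbr[labels == c])]
             for c in range(n_add) if np.any(labels == c)]
    return np.array(picks)
```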
… applied in the computations of error performances during the entire process.

Results for this example are shown in Table 1 and Fig. 1. Note that the number of neurons given in Table 1, for example, is not an integer, since it refers to the average over five runs.

Table 1 Example 1: results

                                 Average results
Method        nhidden  Pf            nCLS     nneurons  Time (min)
ANN (Tansig)  1        4.452 × 10⁻³  125      17.4      27.7
              2        4.456 × 10⁻³  102      20.2      22.7
              3        4.458 × 10⁻³  97       20.1      22.3
              4        4.457 × 10⁻³  97       23.5      25.4
              5        4.457 × 10⁻³  103      24.0      30.3
ANN (ReLU)    1        4.457 × 10⁻³  145      20.6      30.6
              2        4.464 × 10⁻³  124      23.5      29.8
              3        4.460 × 10⁻³  132      22.3      33.2
              4        4.460 × 10⁻³  116      25.3      29.9
              5        4.459 × 10⁻³  121      25.7      32.3
MCS           –        4.458 × 10⁻³  5 × 10⁶  –         <1
Fig. 3 Difference between failure probabilities obtained by ANNs and by MCS (Example 2)

4.3 Example 3: High-Dimensional Problem. This example was proposed by Rackwitz [40] and was also studied by Echard et al. [29]. The limit state function is given by Eq. (10). Note that, for this limit state function, the number of variables can be easily changed without significantly modifying the level of failure probability

g(x1, x2, …, xn) = n + 3σ√n − Σ_{i=1}^{n} x_i    (10)
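The limit state of Eq. (10) is straightforward to implement; the sketch below pairs it with crude MCS. The distribution of the variables is an assumption here (independent unit-mean lognormal variables with σ = 0.2); the actual parameters follow Refs. [29,40].

```python
import numpy as np

def g_example3(x, sigma=0.2):
    """Eq. (10): g(x) = n + 3*sigma*sqrt(n) - sum(x_i), x of shape (n_mc, n)."""
    n = x.shape[1]
    return n + 3.0 * sigma * np.sqrt(n) - x.sum(axis=1)

# Illustration: the dimension n can be changed freely, which is what makes
# this a convenient high-dimensional benchmark. Unit-mean lognormal variables
# with sigma = 0.2 are an assumption of this sketch.
rng = np.random.default_rng(0)
n, n_mc, sigma = 40, 200_000, 0.2
sigma_ln = np.sqrt(np.log(1.0 + sigma**2))   # lognormal parameters giving
mu_ln = -0.5 * np.log(1.0 + sigma**2)        # mean 1 and std sigma
x = rng.lognormal(mu_ln, sigma_ln, size=(n_mc, n))
print(f"n = {n}: Pf ~ {(g_example3(x, sigma) <= 0).mean():.2e}")
```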
4.4 Example 4: Two-Dimensional Truss Structure. The last example consists of a finite-element model of a 23-bar truss structure, subject to random loads P1, P2, …, P6 (Fig. 5). This example was presented by Lee and Kwak [41] and has been studied in the literature, for example, in Refs. [3,30], and [32].

Ten independent random variables are considered, as summarized in Table 5. The vector of random variables is given by X = {E1, E2, A1, A2, P1, P2, P3, P4, P5, P6}. It is assumed that all horizontal members have perfectly correlated Young's moduli, E1, and cross-sectional areas, A1; the same is assumed for the
diagonal members, where the Young's moduli and cross-sectional areas are represented by E2 and A2, respectively.

Table 5 Example 4: random variables

Variable       P.D.F.     Mean         Standard deviation
E1, E2 (Pa)    Lognormal  2.1 × 10¹¹   2.1 × 10¹⁰
A1 (m²)        Lognormal  2.0 × 10⁻³   2.0 × 10⁻⁴
A2 (m²)        Lognormal  1.0 × 10⁻³   1.0 × 10⁻⁴
P1, …, P6 (N)  Gumbel     5.0 × 10⁴    7.5 × 10³
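Sampling the variables of Table 5 only requires converting the given means and standard deviations into the parameters of the lognormal and Gumbel distributions. A sketch with the moment conversions spelled out (the function names are illustrative, not from the paper):

```python
import numpy as np

def lognormal_from_moments(rng, mean, std, size):
    """Lognormal samples specified by their mean and standard deviation."""
    s2 = np.log(1.0 + (std / mean) ** 2)
    return rng.lognormal(np.log(mean) - 0.5 * s2, np.sqrt(s2), size)

def gumbel_from_moments(rng, mean, std, size):
    """Gumbel (maxima) samples specified by their mean and standard deviation."""
    beta = std * np.sqrt(6.0) / np.pi
    return rng.gumbel(mean - np.euler_gamma * beta, beta, size)

def sample_example4(rng, n_mc):
    """Samples of X = {E1, E2, A1, A2, P1, ..., P6} following Table 5."""
    e = lognormal_from_moments(rng, 2.1e11, 2.1e10, (n_mc, 2))    # E1, E2 (Pa)
    a1 = lognormal_from_moments(rng, 2.0e-3, 2.0e-4, (n_mc, 1))   # A1 (m^2)
    a2 = lognormal_from_moments(rng, 1.0e-3, 1.0e-4, (n_mc, 1))   # A2 (m^2)
    p = gumbel_from_moments(rng, 5.0e4, 7.5e3, (n_mc, 6))         # P1..P6 (N)
    return np.hstack([e, a1, a2, p])
```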
The limit state adopted is related to the allowable vertical displacement at midspan, and the structural displacements are obtained by means of linear elastic analyses. The limit state equation is given by

g(x) = vAdm − |vmax|    (11)

where vmax is the vertical displacement at midspan and vAdm is the admissible maximal deflection.
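Equation (11) simply wraps one linear elastic analysis per sample; a sketch of this wrapper, where truss_midspan_deflection stands for a hypothetical finite-element solver not given in the paper:

```python
import numpy as np

def g_example4(x, v_adm, truss_midspan_deflection):
    """Eq. (11): g(x) = vAdm - |vmax|, one FE analysis per realization.

    x : (n_samples, 10) array ordered as {E1, E2, A1, A2, P1, ..., P6}
    truss_midspan_deflection : hypothetical linear elastic solver returning
        the vertical midspan displacement vmax for one realization
    """
    v_max = np.array([truss_midspan_deflection(e1, e2, a1, a2, p)
                      for e1, e2, a1, a2, *p in x])
    return v_adm - np.abs(v_max)
```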
Results for two values of vAdm, 0.10 m and 0.12 m, respectively, are presented in Table 6 and Fig. 6.

Table 6 Example 4: results
Fig. 6 Difference between failure probabilities obtained by ANNs and by MCS (Example 4): (a) vAdm = 0.10 m and (b) vAdm = 0.12 m

4.5 Discussions. In most cases, the results obtained by the ANNs were very good, with differences from the reference values usually far below 1%. The nCLS required by the ANNs were comparable to, and in some cases smaller than, the ones required by other surrogate models using similar adaptive approaches, as can be seen in the literature [29,30,32]. Also, as expected, the ANNs required far fewer calls to the limit state function than crude MCS.

A comparison of the ANNs in terms of activation functions shows that only in the third example, where the limit state function is linear in the random variables, did the ReLU ANNs lead to better results. Note that the limit state functions for the first and second problems are clearly nonlinear in the random variables, as seen in Eqs. (8) and (9). Although in the last example vmax is computed via linear elastic structural analyses, the limit state function is also nonlinear in the random variables. By comparing failure …
[34] Hagan, M. T., and Menhaj, M. B., 1994, "Training Feedforward Networks With the Marquardt Algorithm," IEEE Trans. Neural Networks, 5(6), pp. 989–993.
[35] Kingma, D. P., and Ba, J. L., 2015, "ADAM: A Method for Stochastic Optimization," Proceedings of the Third International Conference on Learning Representations, San Diego, CA, May 7–9.
[36] Beale, M. H., Hagan, M. T., and Demuth, H. B., 2011, Neural Network Toolbox: User's Guide, MathWorks, Natick, MA, p. 404.
[37] Nguyen, D., and Widrow, B., 1990, "Improving the Learning Speed of 2-Layer Neural Networks by Choosing Initial Values of the Adaptive Weights," Proceedings of the International Joint Conference on Neural Networks, San Diego, CA, June 17–21.
[38] Haykin, S., 1996, Adaptive Filter Theory, 3rd ed., Prentice Hall, Upper Saddle River, NJ.
[39] Waarts, P.-H., 2000, "Structural Reliability Using Finite Element Methods: An Appraisal of DARS—Directional Adaptive Response Surface Sampling," Ph.D. thesis, Technical University of Delft, Delft, The Netherlands.
[40] Rackwitz, R., 2001, "Reliability Analysis—A Review and Some Perspective," Struct. Saf., 23(4), pp. 365–395.
[41] Lee, S. H., and Kwak, B. M., 2006, "Response Surface Augmented Moment Method for Efficient Reliability Analysis," Struct. Saf., 28(3), pp. 261–272.