Stripe-Based Fragility Analysis of Multispan Concrete Bridge Classes Using Machine Learning Techniques
All content following this page was uploaded by Sujith Mangalathu on 18 June 2019.
RESEARCH ARTICLE
1 Department of Civil and Environmental Engineering, University of California, Los Angeles, California, USA
2 Department of Civil and Environmental Engineering, Hanyang University, Seoul, Republic of Korea
Correspondence
Jong‐Su Jeon, Department of Civil and Environmental Engineering, Hanyang University, Seoul 04763, Republic of Korea.
Email: [email protected]
Funding information
National Research Foundation of Korea (NRF); Korea government (MSIT), Grant/Award Number: NRF‐2019R1C1C1007780
Summary
This paper presents a framework for generating bridge‐specific fragility curves that combines machine learning with the stripe‐based approach. The proposed methodology uses random forests to generate or update fragility curves for a new set of input parameters. It does so with less computational effort and without expensive resimulation. The methodology places no assumptions on the demand models of the various components. It also helps to identify the relative importance of each uncertain variable in their seismic demand models. The methodology is demonstrated through a case study of a multispan concrete bridge class in California. Geometric, material, and structural uncertainties are accounted for in the generation of the bridge numerical models and their fragility curves. The study also shows that the traditional lognormality assumption on the demand model leads to unrealistic fragility estimates. Fragility results obtained by the proposed methodology can be deployed in a risk assessment platform such as HAZUS for regional loss estimation.
KEYWORDS
bridge‐specific fragility, machine learning, multispan bridges, regional risk assessment
1 | INTRODUCTION
Past earthquakes have demonstrated that highway bridges, which are key components of transportation networks, are among the most vulnerable elements of those networks. The post‐earthquake state of a bridge is critical in deciding whether it can carry emergency or ordinary traffic. The likelihood of bridge damage during a seismic event can be obtained through fragility curves.
Fragility curves are conditional probability statements that give the likelihood that a structure or its component will
meet or exceed a certain level of damage for a given ground motion intensity measure (IM).
Extensive studies have been carried out to determine the fragility relationships of highway bridges using analytical methods.1-13 Existing analytical fragility methods convolve demand models with capacity models. The prevalent analytical approaches for generating bridge fragilities are the cloud, stripe, and incremental dynamic analysis (IDA) approaches. All three typically employ nonlinear time history analysis (NLTHA) of bridge models to extract seismic responses (demands). In the cloud approach, the probabilistic demand models are established by linear regression of engineering demand parameters (EDPs) on the IM in log‐log space. In the stripe approach, ground motions are scaled to the same intensity level to obtain the probability distribution of EDPs at that level. In the IDA approach, ground motions are successively scaled until the primary load‐carrying elements of the structural system suffer a significant loss of capacity (collapse prevention).14 A comparison of the three approaches for estimating seismic demand models is given in Mackie and Stojadinović.15
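The following is a minimal sketch of the cloud approach described above. It assumes the commonly used power‐law demand model EDP = a·IM^b with lognormally distributed residuals and fits the median demand and dispersion by ordinary least squares in log space. The function name `fit_cloud_psdm` and the synthetic records are hypothetical and are not taken from this study.

```python
import numpy as np

def fit_cloud_psdm(im, edp):
    """Cloud-type demand model: least-squares fit of ln(EDP) = ln(a) + b*ln(IM).

    Returns (a, b, beta_d), where beta_d is the standard deviation of the
    log-residuals, i.e. the demand dispersion conditioned on IM.
    """
    x = np.log(np.asarray(im, dtype=float))
    y = np.log(np.asarray(edp, dtype=float))
    b, ln_a = np.polyfit(x, y, 1)                  # slope and intercept in log-log space
    residuals = y - (ln_a + b * x)
    beta_d = np.sqrt(np.sum(residuals**2) / (len(x) - 2))  # n - 2: two fitted parameters
    return float(np.exp(ln_a)), float(b), float(beta_d)

# Hypothetical unscaled records (synthetic, for illustration only)
rng = np.random.default_rng(0)
im = rng.uniform(0.05, 1.5, size=160)
edp = 2.0 * im**1.2 * np.exp(rng.normal(0.0, 0.4, size=160))
a, b, beta_d = fit_cloud_psdm(im, edp)
```

The fitted median demand at a given IM is a·IM^b, and beta_d is the single dispersion value that the cloud approach applies at every IM level.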
Earthquake Engng Struct Dyn. 2019;1–18. wileyonlinelibrary.com/journal/eqe © 2019 John Wiley & Sons, Ltd. 1
2 MANGALATHU AND JEON
All three methods are used to determine the median (or mean) relationship between EDP and IM and the associated uncertainty. As noted in recent studies,16,17 single‐parameter fragility curves have several drawbacks:
(a) they demand expensive resimulations if an input parameter is updated from a field observation or a new database;
(b) they cannot identify the relative importance of each uncertain input variable on the fragility curves;
(c) they require bridges with statistically similar performance to be grouped before fragility analysis is performed.
In addition, Mangalathu et al17 questioned the predictive capability of the single‐parameter demand model, especially at high IMs. At high IMs, bridge components experience nonlinear behavior. Because IM is the only conditioning variable, this nonlinearity results in statistical errors in their demand models. Mangalathu et al17 used a cloud approach to generate fragility curves, running NLTHA of the bridges with various unscaled ground motions to estimate the seismic demand model. They concluded that lasso regression is a good choice for the cloud approach. However, the cloud approach involves an a priori assumption about the probabilistic distribution of seismic demands, which is a drawback. Karamlou and Bocchini18 noted that the lognormal assumption on the demand model introduces a significant error in the fragility curves and the associated loss estimation. However, the scope of their study was limited to multispan simply supported steel girder bridges. Further studies are therefore needed to check the lognormality assumption for concrete box‐girder bridges, since these bridges make up more than 60% of the California bridge inventory.7
This paper has multiple objectives.
(a) The fragility analysis of bridges has traditionally been carried out with parametric approaches only, which assume that the demand model is normally distributed in logarithmic space. However, no studies have evaluated how parametric and nonparametric approaches perform in generating seismic demand models of bridge components. This study compares the parametric and nonparametric approaches in the generation of seismic demand models for the regional seismic risk assessment of bridges.
(b) This study checks the validity of the lognormality assumption (or the normality assumption on the log‐transformed variables) placed on the demand model. As fragility curves are now implemented in earthquake alerting systems,12 it is critical to examine the validity of this assumption. No studies have yet examined it for the fragility analysis of the California bridge inventory. This study suggests a methodology for generating bridge fragility curves without placing any assumption on the demand model. It then compares the resulting nonparametric stripe‐based fragility curves with those from the traditional stripe‐based fragility analysis. Such an exercise helps to gauge the relative merits of the parametric and nonparametric approaches.
(c) A nonparametric multiparameter demand model is suggested for the generation of fragility curves intended for the regional risk assessment of bridges. Such a nonparametric approach has not previously been available in the bridge engineering community.
(d) This study proposes a methodology for addressing the regional risk of bridges. It enables assessment authorities to update the fragility curves in light of new information without expensive NLTHAs.
(e) The existing approaches17 can identify only the relative importance of the ground motion IM and other design variables on the seismic response of bridge components. However, as noted in the authors' previous work,17 the ground motion uncertainty overshadows the other uncertainties. A realistic assessment of these uncertainties therefore requires evaluating the effect of the uncertain input parameters at specific IMs. This paper identifies the relative importance of the input variables on the seismic demand model at various levels of IM.
(f) This study extends the stripe‐based fragility approach to account for the uncertainties in geometric, material, and structural properties. Various studies have reflected these uncertainties in the cloud‐based approach,8,16,17 but there is no methodology for including them in the stripe‐based fragility approach.
After showing that the demand does not follow a lognormal distribution, this paper demonstrates the superiority of a nonparametric machine learning technique called random forest (RF) in seismic demand estimation. It then presents a methodology that exploits the advantages of the stripe approach and RF to generate seismic fragility curves for regional risk assessment.
RF is a popular machine learning technique that is widely used in statistics19,20 and bioinformatics.21,22 It has substantial advantages over other machine learning techniques because of its flexibility, intuitive simplicity, and computational efficiency. Unlike techniques such as Lasso, Ridge, Naïve Bayes, and simple neural networks, RF is nonparametric and places no strong assumption on the mapping function.20 RF has shown superior performance compared with other machine learning techniques such as support vector machines and neural networks.19 Even so, it has not been fully explored in the field of structural engineering. Rokneddin23 explored the application of RF to the network reliability analysis of highway bridges. The author pointed out that using RF as a surrogate model can reduce the number of expensive simulations (computational effort). Tesfamariam and Liu24 investigated the application of RF and other machine learning techniques to classifying the damage state of buildings from past earthquakes. However, the capability of RF in generating multiparameter fragility curves with the stripe approach has not yet been explored, and this paper addresses that gap.
The proposed methodology is used to develop fragility curves for three‐span and four‐span bridges with seat abutments in California. Previous studies17,25 noted that three‐span and four‐span bridges have statistically similar performance during earthquakes and can be grouped together for fragility analysis. Interested readers are referred to these studies11,17,25 for the grouping of bridge classes in regional risk assessments. Together, the three‐span and four‐span bridge classes make up more than 40% of the California bridge inventory. Nevertheless, very few studies have explored the seismic vulnerability of these bridges. Coupling the bridge‐specific fragility curves with the seismic hazard in a region enables the regional risk assessment of transportation networks. Geometric, structural, and material uncertainties are accounted for in the current study to generate fragility curves of the selected bridge classes.
The paper is organized as follows. The next section briefly describes decision trees (DTs) and RF. The numerical modeling of the selected bridges and the associated uncertainties are then described. After that, the paper checks the lognormality assumption on the seismic demands and outlines the proposed methodology using RF. Finally, it compares the fragility results from RF with those of the existing stripe methodology and identifies the relative importance of the uncertain input variables.
2 | DECISION TREE AND RANDOM FOREST
A decision tree (DT) is a nonparametric regression approach in which the algorithm generates a tree‐like graph from the training data. The DT partitions the data into distinct and nonoverlapping regions. Its structure consists of a root node (formed from the entire data), interior nodes, and terminal nodes. Each node other than the root has exactly one parent node, and every split is binary. A typical DT is shown in Figure 1A. The tree first divides the region into two parts based on the variable x1:
- If x1 < a1, the right branch of the tree is activated, and the observation is assigned to predictor 1 when x2 ≥ a2.
- If x1 < a1 and x2 < a2, the observation is assigned to predictor 3 or 4, depending on whether x3 ≥ a3.
- If x1 ≥ a1, the left branch from the root node is activated, and the predictor then depends on the thresholds a3 for x3 and a2 for x2.
Building a DT involves two main steps:
1. Divide the training set space (X1, X2, …, Xp) into J distinct and nonoverlapping regions R1, R2, …, RJ.
2. Make the same prediction for every observation that falls into region Rj, namely the mean of the training observations in Rj.
The regions R1, …, RJ are determined by recursive binary splitting so that the residual sum of squares between the actual and predicted responses is minimized. The final tree structure is obtained by weakest‐link pruning.20
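The two steps above can be illustrated with a deliberately small Python sketch of recursive binary splitting for a regression tree. This is a teaching example under stated assumptions, not the implementation used in this study. A depth limit and a minimum leaf size stand in for weakest‐link pruning, which is not implemented here. All function names are hypothetical.

```python
import numpy as np

def best_split(X, y, min_leaf):
    """Search all features and midpoint thresholds for the binary split that
    minimizes the residual sum of squares (RSS) of the two child regions."""
    best_j, best_t, best_rss = None, None, np.inf
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        for t in (values[:-1] + values[1:]) / 2.0:   # midpoints between distinct values
            mask = X[:, j] < t
            left, right = y[mask], y[~mask]
            if left.size < min_leaf or right.size < min_leaf:
                continue
            rss = np.sum((left - left.mean())**2) + np.sum((right - right.mean())**2)
            if rss < best_rss:
                best_j, best_t, best_rss = j, t, rss
    return best_j, best_t

def grow_tree(X, y, min_leaf=5, max_depth=4, depth=0):
    """Recursive binary splitting; every terminal region R_j predicts mean(y in R_j)."""
    j, t = (None, None) if depth >= max_depth else best_split(X, y, min_leaf)
    if j is None:                                    # no admissible split: terminal node
        return {"value": float(y.mean())}
    mask = X[:, j] < t
    return {"feature": j, "threshold": float(t),
            "left": grow_tree(X[mask], y[mask], min_leaf, max_depth, depth + 1),
            "right": grow_tree(X[~mask], y[~mask], min_leaf, max_depth, depth + 1)}

def predict_tree(node, x):
    """Follow the splits down to a terminal node and return its mean."""
    while "value" not in node:
        node = node["left"] if x[node["feature"]] < node["threshold"] else node["right"]
    return node["value"]
```

Every observation that reaches the same terminal node receives the same prediction, which corresponds to step 2 above.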
Random forest (RF) is a learning method that consists of an ensemble of tree structures (Figure 1B). RF takes advantage of two powerful machine learning techniques: bagging and random feature selection.19 In bagging, each tree is constructed independently from a bootstrap sample of the training data, and the mean of the tree outputs is used for prediction.26 RF is a modified version of bagging. Instead of using all features, RF randomly selects a subset of features as split candidates at each node when growing a tree. This added randomness makes RF perform well compared with other machine learning techniques such as support vector machines and neural networks, and it makes RF robust against overfitting.19,27 Interested readers are directed to Breiman12 and Friedman et al20 for a more detailed description of RF. The RF prediction is the average of the individual tree predictions:
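$$\hat{f}^{\,n_t}_{\mathrm{RF}}(\mathbf{x}) = \frac{1}{n_t}\sum_{i=1}^{n_t} f_i(\mathbf{x}) \qquad (1)$$
where $\hat{f}^{\,n_t}_{\mathrm{RF}}(\mathbf{x})$ denotes the RF prediction (average value) from a total of $n_t$ trees, and $f_i(\mathbf{x})$ is the prediction of the $i$th tree for an input vector $\mathbf{x}$. Consider $n_t$ identically distributed random variables, each with standard deviation σ and pairwise correlation coefficient ρ. The variance of their average is20
$$\mathrm{var}_{n_t} = \rho\sigma^2 + \frac{1-\rho}{n_t}\sigma^2 \qquad (2)$$
As $n_t$ increases, the second term vanishes while the first remains. Further variance reduction therefore depends on lowering the correlation between trees, which is the purpose of random feature selection.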
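To make Equation 2 concrete, the following Python sketch evaluates it and checks it against a Monte Carlo simulation. The simulation builds equicorrelated Gaussian "tree outputs" as a shared component plus independent noise. This construction is an assumption made for illustration; real tree predictions need not be Gaussian or equicorrelated.

```python
import numpy as np

def ensemble_variance(rho, sigma, n_t):
    """Eq. (2): variance of the average of n_t identically distributed variables
    with standard deviation sigma and pairwise correlation rho."""
    return rho * sigma**2 + (1.0 - rho) / n_t * sigma**2

def simulated_variance(rho, sigma, n_t, n_draws=50_000, seed=0):
    """Monte Carlo check of Eq. (2) with equicorrelated Gaussian 'tree outputs'."""
    rng = np.random.default_rng(seed)
    shared = rng.standard_normal((n_draws, 1))      # component common to all trees
    own = rng.standard_normal((n_draws, n_t))       # tree-specific component
    z = sigma * (np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * own)  # corr(z_i, z_j) = rho
    return float(z.mean(axis=1).var())

for n_t in (1, 10, 100, 1000):
    print(n_t, ensemble_variance(0.3, 1.0, n_t))
```

With ρ = 0.3 and σ = 1, the printed variance falls toward ρσ² = 0.3 as more trees are added, but it never drops below that floor.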
An estimate of the error rate can be obtained from RF in the following steps:
1. For each bootstrap iteration, use the tree grown with that bootstrap sample to predict the observations that are in the original dataset but not in the bootstrap sample (out‐of‐bag, or OOB, data).
2. Aggregate the OOB predictions and calculate the error rate, which is the OOB estimate of the error rate.
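The two OOB steps can be sketched in Python with scikit‐learn's `DecisionTreeRegressor` as the base learner. The bagging loop, the helper name `oob_mse`, and the default `max_features=1/3` are illustrative assumptions, not the settings used in this study.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def oob_mse(X, y, n_trees=100, max_features=1.0 / 3.0, seed=0):
    """Out-of-bag MSE of a bagged ensemble of regression trees in which each
    split considers a random subset of features (random-forest style)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    pred_sum = np.zeros(n)   # running sum of OOB predictions for each observation
    pred_cnt = np.zeros(n)   # number of trees for which each observation was OOB
    for _ in range(n_trees):
        boot = rng.integers(0, n, size=n)              # bootstrap sample, with replacement
        oob = np.setdiff1d(np.arange(n), boot)         # observations left out of it
        tree = DecisionTreeRegressor(max_features=max_features,
                                     random_state=int(rng.integers(2**31 - 1)))
        tree.fit(X[boot], y[boot])
        if oob.size > 0:                               # step 1: predict the OOB data
            pred_sum[oob] += tree.predict(X[oob])
            pred_cnt[oob] += 1
    seen = pred_cnt > 0                                # step 2: aggregate OOB predictions
    oob_pred = pred_sum[seen] / pred_cnt[seen]
    return float(np.mean((y[seen] - oob_pred) ** 2))
```

In practice, scikit‐learn's `RandomForestRegressor(oob_score=True)` performs this bookkeeping internally and exposes the aggregated predictions as `oob_prediction_`. The explicit loop above is mainly illustrative.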
The current study selected three‐span and four‐span bridges in California constructed after 1970 to demonstrate
the proposed fragility methodology. Ramanathan7 indicated that these bridge types occupy more than 40% of the
California box‐girder bridge inventory. Three‐dimensional finite element models of these bridge types are developed
in OpenSees28 with realistic representations for abutments, deck, columns, foundations, bearings, and pounding.
Decks are modeled using linear elements and are connected to columns with rigid links. Displacement‐based
beam‐column elements with fiber‐defined cross sections comprising fibers of confined and unconfined concrete
and longitudinal reinforcement are used to model the columns (Figure 2). Each column is equally divided into nine
beam‐column elements along the clear height of the columns, each with five integration points. An elastoplastic model is used for the bearings, and the pounding effect is simulated with an inelastic compression element with a gap.29
Earth pressure comprises two types of resistance: active resistance when the abutment wall moves away from the
backfill and the passive resistance when the abutment wall compresses the backfill. The active resistance is assumed
to be provided only by the piles7 while the passive resistance is contributed by the soil and the piles. The hyperbolic soil
model proposed by Shamsabadi et al30 is used to simulate the passive response of the abutment. The response of the
abutment piles is simulated using a trilinear material model presented in Mangalathu et al.11 Exterior shear keys are simulated with a backbone curve based on the experimental results of Silva et al.31 The foundations are modeled using linear translational and rotational springs. The components described above are combined to generate the bridge system model for seismic fragility analysis. Figure 2 also shows the spring connections for the various components. Interested readers are directed to Jeon et al32 for a more detailed description of the numerical modeling of the various bridge components.
FIGURE 2 Numerical modeling of three‐span and four‐span seat‐type abutment bridges [Colour figure can be viewed at wileyonlinelibrary.com]
To capture possible uncertainties in the bridge models, the current study accounts for several sources of uncertainty: geometric, material, and system. The input variables are selected based on the numerical modeling technique for the various bridge components and on insights from a sensitivity study on bridge demand models.17
Table 1 presents the mean value, standard deviation, and associated probability distribution of the various input variables used in the current study. The geometric properties reported in Table 1 are as follows:
- span length (Lm);
- number of spans (Ns);
- deck width (Dw);
- ratio of approach span to main span (η);
- column height (Hc);
- longitudinal gap between the deck and the abutment (Δl);
- transverse gap between the deck and the shear key (Δt);
- abutment backwall height (Ha);
- mass factor (m).
These properties are generated from a statistical analysis of parameters obtained through a manual plan review of bridges in California. The plan review covered more than 1000 bridges in California, as reported in Mangalathu.12 The parameters in Table 1 are specific to three‐span and four‐span bridges. Interested readers are directed to Mangalathu12 for the parameters of other bridge configurations.
The distributions of the column properties are also determined from a statistical analysis of the values noted in the plan review. These properties are the concrete compressive strength (fc), the yield strength of reinforcement (fy), and the longitudinal and transverse reinforcement ratios (ρl and ρt). The foundation systems noted in the plan review were modeled in LPILE33 to determine the stiffness of the translational (Kft) and rotational (Kfr) springs. These springs are then placed at the base of the columns to represent the behavior of the foundation systems. The abutment pile stiffness (Kp) is calculated from the statistical distribution of the type and number of piles noted in the plan review. Because the abutments can be on either sand or clay, a binary variable is assigned for the backfill type (BT). The ground motion properties (earthquake direction, ED, and mean period, Tm) are taken from the ground motion suite provided by Baker et al.34 The acceleration for shear key capacity (in g) follows the recommendations of design engineers at the California Department of Transportation (Caltrans), as outlined in Mangalathu.12 The damping ratio (ξ) follows the recommendation of Ramanathan.7 Interested readers are directed to Mangalathu12 for a more in‐depth discussion of the modeling parameters.
The current study adopts the suite of ground motions developed by Baker et al34 as part of the PEER Transportation Research Program for the seismic risk assessment of infrastructure systems in California. Ramanathan7 conducted an extensive study on suitable IMs for bridges in California. Following that recommendation, spectral acceleration at 1 second (Sa‐1.0) is adopted as the IM in the current study.
Statistically significant yet nominally identical three‐dimensional bridge models are created by sampling across the range of parameters in Table 1 using Latin Hypercube Sampling (LHS). Compared with pure random sampling by naïve Monte Carlo simulation, LHS covers the probability space of the random variables more effectively.35 The generated bridge models are randomly paired with the selected suite of ground motions to obtain the bridge‐ground motion pairs for NLTHA. The two orthogonal components of each ground motion are randomly assigned to the longitudinal and transverse directions of the bridge. Table 2 presents the various EDPs and the associated capacity states of the bridge components. Because sufficient information is lacking, the capacity dispersion (βc) is assigned subjectively and kept constant across the components and damage states. Based on the column database summarized by Mangalathu,12 the adopted βc value is nevertheless a good estimate for columns.
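The LHS and pairing steps above can be sketched as follows. The distributions used in the usage example are hypothetical placeholders, not the values in Table 1. The helper names and the convention of picking one horizontal component for the longitudinal axis are also assumptions made for illustration.

```python
import numpy as np
from scipy import stats

def latin_hypercube(n, dists, seed=0):
    """Latin Hypercube Sampling: n samples of len(dists) independent variables.

    Each variable's probability range [0, 1) is cut into n equal strata, one
    uniform point is drawn inside every stratum, the strata are shuffled
    independently per variable, and the points are mapped through the inverse
    CDF (ppf) of the corresponding frozen scipy.stats distribution.
    """
    rng = np.random.default_rng(seed)
    columns = []
    for dist in dists:
        u = (rng.permutation(n) + rng.random(n)) / n   # one point per stratum
        columns.append(dist.ppf(u))
    return np.column_stack(columns)

def pair_with_ground_motions(n_models, n_gms, seed=0):
    """Randomly pair each bridge model with a distinct ground motion and pick which
    horizontal component (0 or 1) is applied along the longitudinal bridge axis."""
    rng = np.random.default_rng(seed)
    gm_index = rng.permutation(n_gms)[:n_models]
    longitudinal_component = rng.integers(0, 2, size=n_models)
    return gm_index, longitudinal_component

# Hypothetical distributions for illustration only (not the values of Table 1)
dists = [stats.uniform(loc=20.0, scale=30.0),      # e.g. a span length, uniform on [20, 50]
         stats.norm(loc=0.02, scale=0.005),        # e.g. a reinforcement ratio
         stats.lognorm(s=0.2, scale=4.0)]          # e.g. a strength, lognormal with median 4
models = latin_hypercube(160, dists)
gm_index, component = pair_with_ground_motions(160, 160)
```

Each column of `models` has exactly one sample in each of the 160 equal‐probability strata, which is how LHS covers the probability space more evenly than naïve Monte Carlo sampling.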
In the existing stripe‐based fragility methodology, ground motions are scaled to the same IM, and NLTHA of the bridge models is performed with the scaled ground motions. A lognormal distribution is fitted to the EDPs obtained from NLTHA at a single IM. This distribution is convolved with the capacity models to calculate the probability of failure at that IM. The fragility function computing a failure probability can be written as
where F(DLS|IM = x) is the cumulative probability of reaching a specified limit state (DLS) at IM = x. If both the demand and capacity models follow lognormal distributions, this cumulative distribution function for a specified IM can be expressed as (see Figure 3)
$$F(D_{LS}\mid IM = x) = \Phi\left[\frac{\ln D_{LS} - \lambda_{D_{LS}}}{\beta}\right] = \int_0^{D_{LS}} \frac{1}{\sqrt{2\pi}\,\beta z}\exp\left[-\frac{1}{2}\left(\frac{\ln z - \lambda_{D_{LS}}}{\beta}\right)^2\right] dz \qquad (4)$$
TABLE 2 Component | Median value, Sc: Slight (LS1), Moderate (LS2), Extensive (LS3), Complete (LS4) | Dispersion (βc)
FIGURE 3 Development of traditional stripe‐based fragility curves [Colour figure can be viewed at wileyonlinelibrary.com]
where μ is the mean of the response data and vx is the coefficient of variation. The overall uncertainty β (dispersion) is defined as the square root of the sum of the squares of two terms: the response uncertainty due to record‐to‐record variation (βD|IM) from the demand model and the capacity uncertainty (βC):
$$\beta = \sqrt{\beta_{D\mid IM}^2 + \beta_C^2} \qquad (6)$$
where
$$\beta_{D\mid IM} = \sqrt{\ln\left(1 + v_x^2\right)} \qquad (7)$$
The failure probability for a limit state at each IM, obtained from Equation 3 through Equation 5, is used to construct a fragility curve for that limit state over the entire range of IMs, as illustrated in Figure 3. This method can produce fragility curves with a different dispersion at each IM level. However, it requires the lognormality assumption to hold at each IM level within an acceptable margin of error.
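The per‐stripe calculation can be sketched in Python as follows. Two textbook lognormal relations are adopted here as assumptions rather than taken from the paper. First, the log‐median of the demand is taken as λD = ln μ − β²D|IM/2. Second, the failure probability is taken as P(D ≥ C) = Φ((λD − ln Sc)/β), with β from Equation 6. The function names are hypothetical.

```python
import math
import numpy as np

def stripe_failure_probability(mu, v_x, s_c, beta_c):
    """P(demand >= capacity) at one IM stripe, assuming lognormal demand and capacity.

    mu, v_x : mean and coefficient of variation of the demand at this IM
    s_c     : median capacity of the limit state
    beta_c  : capacity dispersion
    """
    beta_d = math.sqrt(math.log(1.0 + v_x**2))        # Eq. (7)
    lam_d = math.log(mu) - 0.5 * beta_d**2             # assumed: log-median from the lognormal mean
    beta = math.sqrt(beta_d**2 + beta_c**2)            # Eq. (6)
    z = (lam_d - math.log(s_c)) / beta
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF

def stripe_fragility(demands_by_im, s_c, beta_c):
    """Repeat the calculation for every stripe; demands_by_im maps IM -> EDP samples."""
    curve = {}
    for im, d in sorted(demands_by_im.items()):
        d = np.asarray(d, dtype=float)
        curve[im] = stripe_failure_probability(d.mean(), d.std(ddof=1) / d.mean(), s_c, beta_c)
    return curve
```

Evaluating `stripe_fragility` over all IM stripes gives the discrete points of the fragility curve for one limit state. The sketch inherits the limitation noted above: it is only as good as the lognormal fit at each stripe.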
The predictive capabilities of two groups of machine learning techniques are first compared to find the best model for seismic demand estimation. The parametric techniques are Lasso regression and elastic net (EN); the nonparametric techniques are support vector machine (SVM) and RF. This comparison identifies the best machine learning model for the stripe‐based approach to generating bridge‐specific fragility curves. A penalty factor of 0.01 is chosen for Lasso and EN, following the study by Mangalathu et al.17 RF uses 100 DTs because adding further DTs did not change the accuracy (R2 and MSE). Each tree is pruned by weakest‐link pruning. This procedure produces a sequence of subtrees indexed by a nonnegative tuning parameter, which is chosen to minimize the cross‐validation error. Interested readers are directed to Mangalathu12 and Friedman et al20 for a more detailed explanation of the machine learning methods.
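The weakest‐link pruning with a cross‐validated tuning parameter described above can be sketched with scikit‐learn. Its `cost_complexity_pruning_path` returns the sequence of complexity parameters (alphas) for minimal cost‐complexity pruning. The helper name, the 10‐fold default, and the thinning of the alpha grid are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

def prune_tree_by_cv(X, y, cv=10, seed=0, max_candidates=30):
    """Weakest-link (cost-complexity) pruning: the pruning path gives a sequence of
    subtrees indexed by alpha >= 0, and alpha is chosen to minimize k-fold CV MSE."""
    path = DecisionTreeRegressor(random_state=seed).cost_complexity_pruning_path(X, y)
    alphas = np.unique(np.clip(path.ccp_alphas, 0.0, None))   # guard against round-off below 0
    if alphas.size > max_candidates:                           # thin the grid to limit cost
        alphas = alphas[np.linspace(0, alphas.size - 1, max_candidates).astype(int)]
    cv_mse = [
        -cross_val_score(DecisionTreeRegressor(ccp_alpha=a, random_state=seed), X, y,
                         cv=cv, scoring="neg_mean_squared_error").mean()
        for a in alphas
    ]
    best_alpha = float(alphas[int(np.argmin(cv_mse))])
    tree = DecisionTreeRegressor(ccp_alpha=best_alpha, random_state=seed).fit(X, y)
    return tree, best_alpha
```

A larger alpha removes more weak links and gives a smaller tree. The cross‐validation loop selects the alpha that balances fit against tree size.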
The usual strategy in machine learning is to randomly divide the data into a training set (70%‐80%) and a test set (20%‐30%). The model fitted to the training set is then evaluated on the test set. This strategy works well when a designated test set is available, but that is not the case in this study. Depending on which random samples fall in the training set, a model evaluated on a single training set can either underestimate or overestimate the prediction error.20 To avoid this problem, the authors use a “resampling” method to fit the demand model in the fragility estimate. The resampling method helps (a) to estimate the variability of the machine learning model across different samples of the training and test sets and (b) to obtain insights that would not be possible by fitting the model to a single training sample. To avoid bias in the selection of the training samples, this study uses k‐fold cross validation to establish the demand model. The same procedure estimates the mean square error (MSE) and coefficient of determination (R2) for the test samples. k‐fold cross validation splits the data into k nonoverlapping groups. In each fold, the model is fitted on k − 1 parts of the data, and the prediction error is estimated on the held‐out part. This process yields k estimates of MSE and R2 for the training and test sets. The k‐fold MSE and R2 can be estimated as
$$\mathrm{MSE}_{k\text{-fold}} = \frac{1}{k}\sum_{i=1}^{k}\mathrm{MSE}_i, \qquad R^2_{k\text{-fold}} = \frac{1}{k}\sum_{i=1}^{k} R^2_i \qquad (8)$$
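Equation 8 and the four models compared in Table 3 can be sketched in Python with scikit‐learn. The text does not specify several settings, so the following mappings are assumptions: the paper's "penalty factor of 0.01" to scikit‐learn's `alpha`, the default elastic‐net mixing ratio, and the default RBF kernel for `SVR`. The sketch returns test‐fold scores only; training‐fold scores could be collected the same way.

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet, Lasso
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import KFold
from sklearn.svm import SVR

def kfold_scores(model, X, y, k=10, seed=0):
    """Eq. (8): average test-fold MSE and R^2 over k nonoverlapping folds."""
    mse, r2 = [], []
    for train, test in KFold(n_splits=k, shuffle=True, random_state=seed).split(X):
        fitted = clone(model).fit(X[train], y[train])   # fresh, unfitted copy per fold
        pred = fitted.predict(X[test])
        mse.append(mean_squared_error(y[test], pred))
        r2.append(r2_score(y[test], pred))
    return float(np.mean(mse)), float(np.mean(r2))

# Assumed mapping of the text's settings onto scikit-learn hyperparameters
MODELS = {
    "Lasso": Lasso(alpha=0.01),
    "EN": ElasticNet(alpha=0.01),        # l1_ratio left at its default (0.5)
    "SVM": SVR(),                        # RBF kernel by default
    "RF": RandomForestRegressor(n_estimators=100, random_state=0),
}
```

Looping over `MODELS` with `kfold_scores` at each IM stripe reproduces the structure of Table 3 (one MSE and one R2 per method and IM level).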
Ten folds are selected in this study based on a sensitivity study (Figure 4), which shows that the MSE remains essentially constant beyond 10 folds. This choice is consistent with the recommendations of Breiman and Spector.36 Interested readers are directed to Friedman et al20 for a more detailed explanation of the bias in training set selection and of k‐fold cross validation. Table 3 presents R2 and MSE for the training and test sets with 10‐fold cross validation at various levels of IM. Results are given for the column curvature ductility (COL) and bearing displacement (BRD). Table 3 and Figure 4 show the following:
(a) the nonparametric RF model has the best predictive capability, with the highest R2 and lowest MSE of all methods across all IM levels (marked in bold in the table);
(b) Lasso, EN, and SVM overfit, as shown by their much lower R2 and higher MSE on the test set than on the training set for BRD;
(c) the overfitting tendency of a prediction model needs to be evaluated with a randomly assigned test set;
(d) k‐fold cross validation is needed for an unbiased estimate of the MSE.
Although results are shown here for only two bridge components, these observations also hold for the other components. The following section examines the validity of the lognormality assumption placed on the demand model.
Fragility analysis has traditionally adopted a lognormality assumption (or the normality assumption on the log‐transformed variables) for the demand model.2,3,7,16 To investigate the validity of this traditional assumption, the Kolmogorov‐Smirnov test37 is carried out, which checks the null hypothesis
FIGURE 4 Variation of MSE for the training and test set with the number of folds [Colour figure can be viewed at wileyonlinelibrary.com]
TABLE 3 Comparison of average R2 and MSE for various machine learning methods with 10‐fold validation (training set and test set)
EDP Error measure Method Sa‐1.0 = 0.1 g 0.2 g 0.3 g 0.4 g 0.5 g 0.6 g 0.7 g 0.8 g 0.9 g 1.0 g 1.1 g 1.2 g
COL R2 (training set) Lasso 0.56 0.64 0.68 0.72 0.74 0.74 0.75 0.74 0.73 0.69 0.68 0.67
EN 0.55 0.64 0.68 0.72 0.74 0.74 0.75 0.74 0.72 0.69 0.68 0.67
SVM 0.47 0.57 0.64 0.69 0.71 0.71 0.71 0.71 0.69 0.65 0.63 0.63
RF 0.73 0.78 0.82 0.83 0.83 0.83 0.84 0.83 0.83 0.81 0.79 0.80
R2 (test set) Lasso 0.38 0.32 0.37 0.51 0.56 0.62 0.60 0.63 0.66 0.64 0.62 0.59
EN 0.41 0.36 0.38 0.51 0.57 0.63 0.61 0.64 0.66 0.64 0.61 0.58
SVM 0.57 0.47 0.46 0.56 0.55 0.61 0.54 0.60 0.66 0.65 0.60 0.61
RF 0.74 0.77 0.81 0.82 0.84 0.84 0.85 0.85 0.83 0.83 0.82 0.81
MSE (training set) Lasso 0.12 0.19 0.24 0.20 0.19 0.18 0.17 0.17 0.16 0.18 0.18 0.18
EN 0.12 0.19 0.24 0.20 0.19 0.18 0.17 0.17 0.16 0.18 0.18 0.18
SVM 0.15 0.22 0.28 0.22 0.21 0.19 0.19 0.19 0.18 0.20 0.21 0.20
RF 0.08 0.11 0.13 0.13 0.11 0.11 0.10 0.10 0.10 0.10 0.11 0.11
MSE (test set) Lasso 0.13 0.18 0.27 0.32 0.27 0.26 0.27 0.25 0.22 0.24 0.25 0.28
EN 0.12 0.17 0.26 0.31 0.26 0.26 0.26 0.24 0.21 0.22 0.23 0.26
SVM 0.12 0.19 0.24 0.28 0.25 0.22 0.25 0.26 0.25 0.28 0.27 0.27
RF 0.07 0.10 0.13 0.13 0.13 0.12 0.11 0.10 0.09 0.10 0.11 0.10
BRD R2 (training set) Lasso 0.73 0.75 0.72 0.75 0.67 0.65 0.61 0.59 0.59 0.58 0.58 0.55
EN 0.72 0.74 0.72 0.74 0.67 0.65 0.61 0.58 0.58 0.57 0.57 0.54
SVM 0.71 0.71 0.69 0.71 0.63 0.61 0.57 0.55 0.54 0.53 0.50 0.46
RF 0.78 0.80 0.80 0.82 0.78 0.79 0.79 0.77 0.76 0.74 0.74 0.72
R2 (test set) Lasso 0.35 0.29 0.12 0.15 0.22 0.28 0.26 0.30 0.34 0.36 0.32 0.33
EN 0.36 0.34 0.15 0.18 0.24 0.30 0.28 0.32 0.36 0.38 0.34 0.36
SVM 0.37 0.34 0.18 0.15 0.33 0.38 0.26 0.21 0.26 0.23 0.19 0.32
RF 0.76 0.78 0.77 0.78 0.75 0.79 0.79 0.79 0.78 0.76 0.76 0.73
MSE (training set) Lasso 0.05 0.05 0.06 0.05 0.07 0.08 0.09 0.09 0.09 0.11 0.12 0.15
EN 0.05 0.05 0.06 0.06 0.07 0.08 0.09 0.09 0.09 0.12 0.13 0.15
SVM 0.06 0.06 0.07 0.06 0.07 0.09 0.10 0.10 0.10 0.13 0.15 0.18
RF 0.04 0.04 0.04 0.04 0.04 0.05 0.05 0.05 0.05 0.07 0.08 0.09
MSE (test set) Lasso 0.12 0.10 0.12 0.12 0.10 0.10 0.11 0.12 0.14 0.15 0.16 0.16
EN 0.12 0.09 0.11 0.11 0.10 0.10 0.11 0.12 0.14 0.14 0.16 0.16
SVM 0.12 0.09 0.11 0.12 0.09 0.09 0.11 0.14 0.16 0.18 0.19 0.17
RF 0.05 0.04 0.04 0.04 0.05 0.04 0.05 0.05 0.05 0.07 0.07 0.09
that the data are lognormally distributed. If the p‐value is below the cutoff value of 0.05, the null hypothesis is rejected, meaning there is enough evidence that the data do not come from a lognormally distributed population. Figure 5 shows the histogram and probability plot of the column curvature ductility (μϕ) and bearing deformation (δb) at Sa‐1.0 = 1.0 g. As the p‐values for μϕ and δb are close to zero, these data do not follow a lognormal distribution. Although not shown here, the lognormality assumption is likewise rejected for all the EDPs considered in this study. The lognormality assumption therefore leads to unrealistic demand models.
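A minimal sketch of this test in Python uses SciPy's `stats.kstest` on the log‐transformed demands. The normal parameters are estimated from the same sample, which is an assumption about how the test is set up. This makes the nominal KS p‐value conservative, so a rejection under this setup is, if anything, understated. The helper name is hypothetical.

```python
import numpy as np
from scipy import stats

def lognormality_ks(edp, alpha=0.05):
    """Kolmogorov-Smirnov test of H0: ln(EDP) is normal (i.e. EDP is lognormal).

    The normal parameters are estimated from the same sample, which makes the
    nominal KS p-value conservative; a Lilliefors-type test would be stricter.
    Returns (p_value, reject_h0).
    """
    z = np.log(np.asarray(edp, dtype=float))
    result = stats.kstest(z, "norm", args=(z.mean(), z.std(ddof=1)))
    return float(result.pvalue), bool(result.pvalue < alpha)
```

For the estimated‐parameter case, statsmodels offers `statsmodels.stats.diagnostic.lilliefors` as an alternative with corrected p‐values.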
These results show two things. First, the RF model with 10‐fold validation has high R2 and low MSE in both the training and test sets. Second, the traditional lognormality assumption leads to a biased estimate of the demand model. RF is therefore adopted as the machine learning algorithm for the remainder of the study. As mentioned before, RF places no assumption on the demand models. The following section presents a new methodology that uses the capabilities of RF and the stripe approach.
This section suggests a new fragility methodology that combines the features of RF and the stripe method. The proposed stripe‐based fragility methodology has several advantages over the traditional stripe‐based fragility method:
FIGURE 5 A, Histogram and B, probability plot for column curvature ductility at Sa‐1.0 = 1.0 g and C, histogram and D, probability plot for
bearing deformation at Sa‐1.0 = 1.0 g [Colour figure can be viewed at wileyonlinelibrary.com]
1. The proposed methodology can identify the relative importance of variables at each IM. The identification of the
relative importance of uncertain parameters helps bridge owners to spend their resources judiciously in updating
the input parameter database for future fragility analysis.
2. The methodology can be used to update the fragility curves without expensive resimulations if the input parameters
are required to be updated in future.
3. The proposed methodology does not place any assumptions on the demand model as RF‐based demand model is
nonparametric.
4. Although the proposed methodology involves sampling the bridge models across the range of uncertain input
parameters using LHS, the trained RF model can be later used to develop fragility curves for a fixed set of input
parameters (called bridge‐specific fragility curves).
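Both the fourth advantage listed above and Step 1 rely on Latin hypercube sampling (LHS) of the uncertain input parameters. The following is a generic, minimal LHS sketch using only the Python standard library; the variable names and the two marginal distributions are hypothetical placeholders, whereas the actual uncertain parameters and their distributions are those of Table 1.

```python
import random
from statistics import NormalDist


def latin_hypercube(n_samples, n_vars, seed=None):
    """Latin hypercube sample on the unit hypercube [0, 1)^n_vars.

    Each variable's range is split into n_samples equal-probability strata;
    one point is drawn uniformly inside every stratum, and the strata are
    shuffled independently for each variable before pairing them.
    """
    rng = random.Random(seed)
    columns = []
    for _ in range(n_vars):
        column = [(k + rng.random()) / n_samples for k in range(n_samples)]
        rng.shuffle(column)
        columns.append(column)
    return [list(row) for row in zip(*columns)]  # one row per bridge model


# 160 bridge models, one per ground motion. The two marginal distributions
# below are hypothetical placeholders, NOT the Table 1 values.
u = latin_hypercube(160, 2, seed=1)
span_length = [20.0 + 30.0 * row[0] for row in u]          # uniform on [20, 50)
fc = [NormalDist(35.0, 4.0).inv_cdf(row[1]) for row in u]  # normal, via inverse CDF
```

Non-uniform marginals are obtained by passing each stratified uniform value through the inverse CDF of the target distribution, which preserves exactly one sample per equal-probability stratum of every variable.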
Step 1: Generate bridge models using the LHS technique and conduct NLTHA of the bridge models with the ground motions scaled to the desired IMs. Because the ground motion suite selected in this study consists of 160 GMs, 160 statistically significant yet nominally identical bridge models are created.
Step 2: Establish a predictive model connecting the input parameters (IM, Tm, and the modeling parameters in Table 1) and the output parameter (demand) using RF at each IM. The model can be established through a k‐fold validation approach, and its accuracy is estimated by comparing the R2 and MSE of the training and test sets. This step also identifies the relative importance of the uncertain input variables at each IM level. Note that the predictive model is established on the logarithms of the input and output data to reduce the nonlinearity in the relationship between the input and output parameters.16
Step 3: Generate a large number of demand estimates (N; one million in the current study) for each component, ki, using its respective RF demand model, with N sets of input parameters randomly generated from their probabilistic distributions. If the fragility curve is intended for a specific bridge, establish the demand model by only accounting for the material nonlinearity and Tm.
Step 4: Generate N capacity values for a specific damage state for each bridge component based on the assumed
distribution of the capacity state of the bridge components (Table 2).
Step 5: Obtain the probability of failure ($p_{f,\mathrm{IM}=x}$) by comparing the capacity values (Step 4) with the demand values (Step 3). That is, $p_{f,\mathrm{IM}=x} = N_f / N$, where $N_f$ is the number of samples in which the demand value is greater than the capacity value.
Step 6: Repeat Steps 2 to 5 for different levels of IM to estimate the probability of failure at the specified IMs. Estimate the fragility curves by fitting a lognormal cumulative distribution function to the probability of failure versus IM estimates.
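Steps 3 to 5 amount to a Monte Carlo comparison of demand and capacity samples at one IM level. The sketch below assumes a lognormal capacity model, as stated for this study, with hypothetical median and dispersion values. The demand samples, which in the proposed method come from the RF demand model of Step 3, are replaced here by a hypothetical lognormal generator only so that the estimate can be checked against the closed-form result Φ[ln(λD/λC)/√(βD² + βC²)], which holds when demand and capacity are independent and both lognormal.

```python
import math
import random
from statistics import NormalDist


def probability_of_failure(demands, cap_median, cap_beta, seed=0):
    """Estimate p_f = N_f / N at one IM level.

    Each demand sample is paired with one capacity sample drawn from a
    lognormal distribution (median cap_median, log-standard deviation
    cap_beta); N_f counts the pairs in which demand exceeds capacity.
    """
    rng = random.Random(seed)
    n_fail = 0
    for d in demands:
        capacity = cap_median * math.exp(rng.gauss(0.0, cap_beta))
        if d > capacity:
            n_fail += 1
    return n_fail / len(demands)


# Hypothetical inputs (not values from this study)
N = 200_000  # the study uses one million samples per component
demand_rng = random.Random(123)
# Stand-in for RF-generated demands: lognormal, median 1.0, dispersion 0.4
demands = [1.0 * math.exp(demand_rng.gauss(0.0, 0.4)) for _ in range(N)]

p_mc = probability_of_failure(demands, cap_median=1.5, cap_beta=0.3)
# Closed form when demand and capacity are both lognormal and independent
p_exact = NormalDist().cdf(math.log(1.0 / 1.5) / math.sqrt(0.4 ** 2 + 0.3 ** 2))
print(f"Monte Carlo p_f = {p_mc:.4f}, closed form = {p_exact:.4f}")
```

The Monte Carlo estimator itself makes no assumption about the shape of the demand distribution; the closed form is used only as a check for this synthetic case.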
Note that the machine learning‐based demand models are available only at the predefined IMs; estimating the demand model for a new IM requires NLTHA of the bridge samples with ground motions scaled to that IM (Step 1). However, the probability of failure for a new IM can easily be obtained from the lognormal‐fitted fragility curves without resimulation. Although no assumption is placed on the demand model, the capacity model is assumed to be lognormal in the current study, following previous work on bridge fragilities.3,7,12 Developing capacity models that depend on bridge attributes requires extensive experimental data and is beyond the scope of the current study. Also, Mangalathu12 suggested that the lognormal distribution provides the best fit for capacity models based on the available experimental results. It is noted that the cloud‐based approach is computationally less expensive than the stripe‐based (existing and proposed) approaches. However, the cloud‐based approach involves an a priori assumption about the probabilistic distribution of seismic demand, which is a drawback. Compared with the existing stripe approach, the proposed approach requires additional computational effort to generate the RF models at each IM.
Figure 6 compares component fragility curves for the selected bridge classes (combined three‐span and four‐span bridges with the uncertainties listed in Table 1) obtained using the existing stripe method and the proposed RF‐based method. As Figure 6 shows, the dispersion (lognormal standard deviation) associated with the proposed method is smaller than that of the existing stripe method, as reflected by the steeper slope of the proposed fragility curves. This observation holds for all the bridge components at all the limit states under consideration. In contrast, the median values of the fragilities obtained from the two methods show no appreciable statistical difference at the various limit states for the selected bridge components.
RF can also be used to estimate the relative importance of the uncertain input variables used in the DTs. RF estimates the relative importance of an input variable by measuring the increase in the OOB error when the values of that variable are randomly permuted while the other variables are kept unchanged. Figure 7 shows the relative importance of the uncertain variables considered in the current study for the column curvature ductility at three levels of IM: Sa‐1.0 = 0.2, 0.6, and 1.0 g. The relative importance of the input variables for all the EDPs at Sa‐1.0 = 0.6 g is given in Figure 8. Figures 7 and 8 should be interpreted as the relative importance of the variables in the estimation of the seismic demand model, given the uncertainties reported in Table 1. The following inferences can be drawn from Figures 7 and 8.
1. Span length (Lm), longitudinal reinforcement ratio (ρl), height of the column (Hc), approach span to main span
ratio (η), and deck width (Dw) are the variables that have a significant influence on the seismic demand of bridge
components for all the EDPs.
2. Depending on the EDP under consideration, different variables affect the demand model to different degrees. For example, the abutment pile stiffness (Kp) has a significant influence on the abutment response in the passive, active, and transverse directions (Figure 8), whereas Kp does not have a significant influence on the column curvature ductility.
FIGURE 6 Comparison of component fragility curves using the existing and proposed methods
3. The relative importance of the variables changes with the change in IM. For example, the relative importance of
Lm is higher at Sa‐1.0 = 0.6 g in comparison to Sa‐1.0 = 0.2 and 1.0 g.
4. Figure 7 underscores the need to include various uncertainties in the seismic demand model, if the fragility curves
are intended for a regional risk assessment. The input parameters including Lm, ρl, Hc, η, and Dw have a significant
influence on the demand model, and thus neglecting the uncertainty in these parameters leads to unreliable
fragility estimates.
5. The relative importance helps quantify the associated error if the uncertainties associated with a specific input
parameter are not properly evaluated in the estimation of seismic vulnerability. For example, the error in the
uncertainty estimation of m f might have a minimal impact on the fragility curves, while the error in the uncer-
tainty estimation of Lm leads to unrealistic fragility estimates.
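The permutation-based importance measure described before the list above can be sketched in a model-agnostic way. In this minimal sketch, a fixed hypothetical predictor stands in for a trained RF and a held-out data set stands in for the OOB samples of each tree; the mechanics are the same: permute one input variable, keep the others unchanged, and record the increase in prediction error.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in MSE when one input column is randomly permuted
    while all other columns are kept unchanged."""
    rng = random.Random(seed)
    base = mse(y, predict(X))
    importances = []
    for col in range(len(X[0])):
        increase = 0.0
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            increase += (mse(y, predict(X_perm)) - base) / n_repeats
        importances.append(increase)
    return importances


# Hypothetical held-out data with three inputs in [0, 1)
rng = random.Random(7)
X = [[rng.random(), rng.random(), rng.random()] for _ in range(300)]


def predict(rows):
    """Hypothetical trained model: strongly driven by x0, weakly by x1, ignores x2."""
    return [2.0 * r[0] + 0.2 * r[1] for r in rows]


y = predict(X)
importance = permutation_importance(predict, X, y)
print([round(v, 4) for v in importance])
```

An input the model ignores gets zero importance, and the ranking of the remaining inputs reflects how strongly the predictions depend on them, which is how the rankings in Figures 7 and 8 should be read.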
The influence of Lm and η on the demand models can be attributed to the fact that they increase the mass and flexibility of the deck (and of the bridge system), which increases the seismic demands. μϕ is significantly affected by D, Hc, and ρl at all the IM levels, because the strength and stiffness of the columns are functions of D, Hc, and ρl. The abutment response is significantly influenced by the backfill soil type (BT), abutment height (Ha), and abutment pile stiffness (Kp), because the force‐displacement relationship of the abutments is a function of these variables.
Fragility curves are generated for three‐span and four‐span bridge configurations using the proposed methodology
reflecting the uncertainties listed in Table 1. Figure 9 shows the comparison of the fragility curves for three‐span and
four‐span bridges at the moderate damage state. Previous studies12,25 based on an extensive statistical analysis noted
that the three‐span and four‐span bridges have statistically similar performance. Such a conclusion is substantiated
by the fragility curves presented in Figure 9, as there is not much statistical difference between the fragilities of these
bridges. However, the grouping of three‐span and four‐span bridges in Mangalathu12 and Mangalathu et al25 was based
on the comparison of the seismic demand models, while the current inference is based on the comparison of bridge
fragilities.
For practical use in regional risk assessment platforms such as HAZUS, the RF‐based fragility curves developed in Figure 6 are expressed as lognormal cumulative distribution functions with median (λ) and dispersion (β). Each lognormal cumulative distribution function is fitted to the discrete points of failure probability by minimizing the sum of the squared residuals between the actual and fitted values.
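The least-squares fit described above can be sketched with a simple derivative-free pattern search. This is an illustration under stated assumptions: the fragility function is written as P_f(IM) = Φ[ln(IM/λ)/β], the pattern search is chosen only to avoid external dependencies (a library optimizer would normally be used instead), and the discrete points are synthetic values generated from a known curve so that the fit can be checked.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def lognormal_fragility(im, median, beta):
    """Lognormal fragility curve: Phi(ln(im / median) / beta)."""
    return _STD_NORMAL.cdf(math.log(im / median) / beta)


def fit_fragility(ims, pfs, n_iter=500):
    """Fit (median, beta) by minimising the sum of squared residuals between
    the discrete failure probabilities and the lognormal CDF.

    Pattern search over (ln median, beta): evaluate the 8 neighbouring points,
    move to the best one and double the step if it improves the fit,
    otherwise halve the step.
    """
    def sse(ln_med, beta):
        return sum((p - lognormal_fragility(x, math.exp(ln_med), beta)) ** 2
                   for x, p in zip(ims, pfs))

    ln_med = math.log(sorted(ims)[len(ims) // 2])  # start at the middle IM
    beta, step = 0.5, 0.25
    best = sse(ln_med, beta)
    for _ in range(n_iter):
        moves = [(ln_med + i * step, beta + j * step)
                 for i in (-1, 0, 1) for j in (-1, 0, 1)
                 if (i, j) != (0, 0) and beta + j * step > 1e-3]
        val, cand = min((sse(*c), c) for c in moves)
        if val < best:
            (ln_med, beta), best = cand, val
            step = min(step * 2.0, 1.0)
        else:
            step *= 0.5
        if step < 1e-10:
            break
    return math.exp(ln_med), beta


# Synthetic check: points generated from a known curve (median 0.6 g, beta 0.35)
ims = [0.1 * k for k in range(1, 21)]
pfs = [lognormal_fragility(x, 0.6, 0.35) for x in ims]
median, beta = fit_fragility(ims, pfs)
print(f"median = {median:.3f} g, beta = {beta:.3f}")
```

Fitting in probability space, as described in the text, weights all IM levels equally; an alternative is a linear regression of Φ⁻¹(p_f) on ln(IM), which weights the tails differently.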
Figures 10 and 11 show the difference between the actual and fitted fragility curves for all the EDPs of the bridge class (combined three‐span and four‐span) at the moderate (LS2) and complete (LS3) damage states for the proposed RF‐based and traditional approaches, respectively. The median (λ) and dispersion (β) of the fitted curves for all bridge components across the four limit states are provided in Table 4. As Table 4 shows, the median values obtained from the two approaches differ by less than 8% for all components except deck unseating. The failure probabilities of deck unseating over the entire range of IM are very small, however, so its median and dispersion differences can be considered negligible. In contrast, the fragility dispersion of all components except unseating differs by 36% to 43% between the two approaches, with the proposed methodology generating fragility curves with less dispersion. Table 4 thus substantiates the need for nonparametric approaches in the generation of reliable fragility curves for seismic risk estimation.
FIGURE 9 Fragility curves at moderate damage state for three‐span and four‐span bridges
FIGURE 10 Comparison of actual and fitted RF‐based fragility curves for selected bridge class
FIGURE 11 Comparison of actual and fitted traditional fragility curves for selected bridge class
TABLE 4 Log‐fitted fragility characteristics for all components of selected bridge class
Method          Component  LS1 λ   LS1 β   LS2 λ   LS2 β   LS3 λ   LS3 β   LS4 λ    LS4 β
Random Forest   COL        0.175   0.350   0.479   0.301   0.621   0.300   0.745    0.302
                ABP        0.824   0.296   1.640   0.251   –       –       –        –
                ABA        0.651   0.299   1.114   0.245   –       –       –        –
                ABT        0.199   0.303   0.520   0.316   –       –       –        –
                BRD        0.107   0.450   0.506   0.431   –       –       –        –
                UST        –       –       –       –       4.419   0.612   20.275   0.881
Traditional     COL        0.179   0.547   0.468   0.469   0.611   0.471   0.735    0.474
                ABP        0.817   0.479   1.656   0.419   –       –       –        –
                ABA        0.640   0.486   1.103   0.429   –       –       –        –
                ABT        0.185   0.488   0.520   0.512   –       –       –        –
                BRD        0.107   0.599   0.525   0.652   –       –       –        –
                UST        –       –       –       –       3.461   0.711   5.510    0.697
λ = median (Sa‐1.0, g); β = dispersion (lognormal standard deviation); LS1 to LS4 = limit states 1 to 4; COL = column; ABP, ABA, ABT = abutment in the passive, active, and transverse directions; BRD = bearing deformation; UST = deck unseating.
6 | CONCLUSIONS
The existing fragility methodologies often place a lognormal assumption on the seismic demand model. It is critical to check the validity of such an assumption, as fragility curves are now being implemented in earthquake alerting systems. This paper (a) investigates the validity of the lognormal assumption on the seismic demand model for bridges in
California, (b) evaluates the performance and predictive capability of parametric and nonparametric approaches in
the generation of seismic demand model, (c) suggests a methodology for the generation of fragility curves without
placing any assumption on the seismic demand model, and (d) extends the stripe‐based approach to account for
the material, geometric, and structural uncertainties in the generation of fragility curves intended for regional risk
assessment. To achieve the abovementioned objectives, this paper suggests a new fragility methodology for the
generation of bridge‐specific fragility curves using a stripe approach. The proposed methodology is demonstrated
by generating fragility curves for three‐span and four‐span bridges in California reflecting the material, structural,
and geometric uncertainties.
Numerical bridge models including the above uncertainties are created in OpenSees and randomly paired with scaled ground motions. A set of nonlinear time history analyses is carried out to estimate the seismic demand of bridge components at each level of IM. Various demand parameters, such as column curvature ductility, abutment displacement in the passive, active, and transverse directions, superstructure unseating displacement, and elastomeric bearing displacement, are considered in the current study. To evaluate the performance of parametric and nonparametric approaches in the generation of seismic demand models, an initial study is carried out in which the seismic demand data are divided into training and test sets. The training set is used to establish the prediction model, which is then evaluated on the test set. The nonparametric RF approach yields the lowest MSE and the highest coefficient of determination for the test set. This conclusion holds for all the seismic demand parameters considered in the current study. Moreover, unlike other approaches prevalent in seismic fragility analysis, the RF‐based seismic demand model does not place any assumption on the demand model. Based on the insights from the RF‐based seismic demand model, a new fragility methodology is suggested in this paper. The proposed methodology combines the capabilities of the stripe approach (capturing the transition of structural response from elastic to inelastic behavior through simulation with scaled ground motions) and the RF‐based demand model (no a priori assumption about the probabilistic distribution of the seismic demand).
The proposed stripe‐based fragility methodology can thus be used to estimate the seismic demand and fragility for a new set of input parameters without expensive resimulation (ie, to generate and update bridge‐specific fragility curves). The proposed methodology also helps identify the relative importance of uncertain input parameters on the seismic demand model of various bridge components at each IM. For the bridges considered in the current study, variables such as span length, approach span‐to‐main span ratio, longitudinal reinforcement ratio, deck width, and column height have a significant influence on the seismic demand and fragility of all bridge components for a given IM. It is also noted that the traditional lognormal assumption on the seismic demand model leads to unrealistic fragility estimates. For the selected three‐span and four‐span bridges, the traditional lognormal assumption results in a higher dispersion (lognormal standard deviation) of the component fragility curves. The dispersion differs by 36% to 43% between the traditional and proposed approaches, depending on the component under consideration, whereas the median values of the fragility curves differ by less than 8% between the two approaches. This conclusion holds for all the bridge components except deck unseating at all the limit states.
Using the proposed fragility methodology, fragility parameters are suggested for three‐span and four‐span bridges in California. The suggested fragility curves can be implemented in risk assessment platforms such as HAZUS for more accurate and reliable seismic loss estimation. Note that the methodology is evaluated only for selected bridge classes in this study, and further studies are needed to check the relevance of this methodology for other bridge configurations and infrastructure systems (eg, buildings, pipelines).
ORCID
Sujith Mangalathu https://orcid.org/0000-0001-8435-3919
Jong‐Su Jeon https://orcid.org/0000-0001-6657-7265
REFERENCES
1. Nielson BG. Analytical Fragility Curves for Highway Bridges in Moderate Seismic Zones. GA: Ph.D. thesis, School of Civil and
Environmental Engineering, Georgia Institute of Technology; 2005.
2. Mackie K, Stojadinović B. Post‐earthquake functionality of highway overpass bridges. Earthq Eng Struct Dyn. 2006;35(1):77‐93.
3. Padgett JE. Seismic Vulnerability Assessment of Retrofitted Bridges Using Probabilistic Methods. GA: Ph.D. thesis, School of Civil and
Environmental Engineering, Georgia Institute of Technology; 2007.
4. Banerjee S, Shinozuka M. Mechanistic quantification of RC bridge damage states under earthquake through fragility analysis. Probab
Eng Mech. 2008;23(1):12‐22.
5. Zhang J, Huo YL. Evaluating effectiveness and optimum design of isolation devices for highway bridges using the fragility function
method. Eng Struct. 2009;31(8):1648‐1660.
6. Vosooghi A, Saiidi MS. Seismic damage states and response parameters for bridge columns. ACI Spec Publ. 2010;271:29‐46.
7. Ramanathan KN. Next Generation Seismic Fragility Curves for California Bridges Incorporating the Evolution in Seismic Design Philosophy.
GA: Ph.D. thesis, School of Civil and Environmental Engineering, Georgia Institute of Technology; 2012.
8. Ghosh J. Parameterized Seismic Fragility Assessment and Life‐Cycle Analysis of Aging Highway Bridges. TX: Ph.D. thesis, Department of
Civil Engineering, Rice University; 2013.
9. Mangalathu S, Jeon J‐S, Soleimani F, DesRoches R, Padgett J, Jiang J. Seismic vulnerability of multi‐span bridges: an analytical
perspective. 10th Pacific Conference on Earthquake Engineering, Sydney, Australia, 2015.
10. Monteiro R. Sampling based numerical seismic assessment of continuous span RC bridges. Eng Struct. 2016;118:407‐420.
11. Mangalathu S, Jeon J‐S, DesRoches R, Padgett JE. ANCOVA‐based grouping of bridge classes for seismic fragility assessment. Eng Struct.
2016;123:379‐394.
12. Mangalathu S. Performance Based Grouping and Fragility Analysis of Box‐Girder Bridges in California. GA: Ph.D. thesis, School of Civil
and Environmental Engineering, Georgia Institute of Technology; 2017.
13. Monteiro R, Zelaschi C, Silva A, Pinho R. Derivation of fragility functions for seismic assessment of RC bridge portfolios using different
intensity measures. J Earthq Eng. 2017; in press;1‐17.
14. Vamvatsikos D, Cornell CA. Incremental dynamic analysis. Earthq Eng Struct Dyn. 2002;31(3):491‐514.
15. Mackie K, Stojadinović B. Comparison of incremental dynamic, cloud, and stripe methods for computing probabilistic seismic demand
models. Proceedings of Structures Congress 2005, New York, 1–11, 2005.
16. Jeon J‐S, Mangalathu S, Song J, DesRoches R. Parameterized seismic fragility curves for curved multi‐frame concrete box‐girder bridges
using Bayesian parameter estimation. J Earthq Eng. 2017; in press;1‐26.
17. Mangalathu S, Jeon J‐S, DesRoches R. Critical uncertainty parameters influencing seismic performance of bridges using Lasso regression.
Earthq Eng Struct Dyn. 2018;47(3):784‐801.
18. Karamlou A, Bocchini P. Computation of bridge seismic fragility by large‐scale simulation for probabilistic resilience analysis. Earthq Eng Struct Dyn. 2015;44(12):1959‐1978.
19. Breiman L. Random forests. Mach Learn. 2001;45(1):5‐32.
20. Friedman J, Hastie T, Tibshirani R. The Elements of Statistical Learning. Springer Series in Statistics. Berlin: Springer; 2001.
21. Díaz‐Uriarte R, de Andres SA. Gene selection and classification of microarray data using random forest. BMC Bioinformatics. 2006;7(1):3.
22. Svetnik V, Liaw A, Tong C, Culberson JC, Sheridan RP, Feuston BP. Random forest: a classification and regression tool for compound
classification and QSAR modeling. J Chem Inf Comput Sci. 2003;43(6):1947‐1958.
23. Rokneddin K. Reliability and Risk Assessment of Networked Urban Infrastructure Systems Under Natural Hazards. Houston, TX: Ph.D.
thesis, Department of Civil and Environmental Engineering, Rice University; 2013.
24. Tesfamariam S, Liu Z. Earthquake induced damage classification for reinforced concrete buildings. Struct Saf. 2010;32(2):154‐164.
25. Mangalathu S, Soleimani F, Jeon J‐S. Bridge classes for regional seismic risk assessment: improving HAZUS models. Eng Struct.
2017;148:755‐766.
26. Breiman L. Bagging predictors. Mach Learn. 1996;24(2):123‐140.
27. Liaw A, Wiener M. Classification and regression by random forest. R News. 2002;2(3):18‐22.
28. McKenna F. OpenSees: a framework for earthquake engineering simulation. Comput Sci Eng. 2011;13(4):58‐66.
29. Muthukumar S, DesRoches R. A Hertz contact model with non‐linear damping for pounding simulation. Earthq Eng Struct Dyn.
2006;35(7):811‐828.
30. Shamsabadi A, Khalili‐Tehrani P, Stewart JP, Taciroglu E. Validated simulation models for lateral response of bridge abutments with
typical backfills. J Bridg Eng. 2010;15(3):302‐311.
31. Silva PF, Megally S, Seible F. Seismic performance of sacrificial exterior shear keys in bridge abutments. Earthq Spectra.
2009;25(3):643‐664.
32. Jeon J‐S, Choi E, Noh M‐H. Fragility characteristics of skewed concrete bridges accounting for ground motion directionality. Struct Eng
Mech. 2017;63(5):647‐657.
33. LPILE. LPILE v6.0—A Program for the Analysis and Design of Piles and Drilled Shafts Under Lateral Loads. Austin, TX: Ensoft, Inc.
Engineering Software; 2012.
34. Baker JW, Lin T, Shahi SK, Jayaram N. New Ground Motion Selection Procedures and Selected Motions for the PEER Transportation
Research Program. PEER Report 2011/03. Berkeley, CA: Pacific Earthquake Engineering Research Center, University of California; 2011.
35. McKay MD, Beckman RJ, Conover WJ. Comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics. 1979;21(2):239‐245.
36. Breiman L, Spector P. Submodel selection and evaluation in regression. The X‐random case. International Statistical Review/Revue
Internationale de Statistique, 291–319, 1992.
37. Vidakovic B. Statistics for Bioengineering Sciences: With MATLAB and WinBUGS Support. Springer Science & Business Media; 2011.
How to cite this article: Mangalathu S, Jeon J‐S. Stripe‐based fragility analysis of multispan concrete bridge classes using machine learning techniques. Earthquake Engng Struct Dyn. 2019;1–18. https://doi.org/10.1002/eqe.3183