Physics-Guided Convolutional Neural Network (PhyCNN) for Data-Driven Seismic Response Modeling
Engineering Structures 215 (2020) 110704
Keywords: Deep learning; Physics-guided convolutional neural network; PhyCNN; K-means clustering; Seismic response prediction; Serviceability assessment

Abstract: Accurate prediction of a building's response subjected to earthquakes makes it possible to evaluate building performance. To this end, we leverage the recent advances in deep learning and develop a physics-guided convolutional neural network (PhyCNN) for data-driven structural seismic response modeling. The concept is to train a deep PhyCNN model based on limited seismic input–output datasets (e.g., from simulation or sensing) and physics constraints, and thus establish a surrogate model for structural response prediction. Available physics (e.g., the law of dynamics) can provide constraints to the network outputs, alleviate overfitting issues, reduce the need of big training datasets, and thus improve the robustness of the trained model for more reliable prediction. The surrogate model is then utilized for fragility analysis given certain limit state criteria. In addition, an unsupervised learning algorithm based on K-means clustering is also proposed to partition the datasets into training, validation and prediction categories, so as to maximize the use of limited datasets. The performance of PhyCNN is demonstrated through both numerical and experimental examples. Convincing results illustrate that PhyCNN is capable of accurately predicting a building's seismic response in a data-driven fashion without the need of a physics-based analytical/numerical model. The PhyCNN paradigm also outperforms non-physics-guided neural networks.
⁎ Corresponding author at: Department of Civil and Environmental Engineering, Northeastern University, Boston, MA 02115, USA. E-mail address: [email protected] (H. Sun).
https://doi.org/10.1016/j.engstruct.2020.110704
Received 25 October 2019; Received in revised form 19 February 2020; Accepted 21 April 2020
Available online 30 April 2020
community for a long time; however, issues exist in regard to accuracy as well as the stationarity and linearity hypotheses.

Recently, considerable attention has been focused on Artificial Intelligence (AI), which has been proven to be a powerful response modeling tool and approximator [31–35]. In particular, support vector machines (SVM) and artificial neural networks (ANN) have been used for the identification and modeling of dynamic responses during the past decade. For example, Zhang et al. [36] employed SVM to identify structural parameters. Dong et al. [37] predicted the dynamic response of an oscillator and a frame structure based on an SVM-based two-stage method. In addition, the multi-layer perceptron (MLP) ANN has been applied for predicting structural response under static or dynamic loading conditions. For instance, Lightbody and Irwin [38] applied MLP to predict the viscosity of an industrial polymerization reactor. Wang et al. [39] developed an MLP back-propagation network to predict the seismic response of a bridge structure. Christiansen et al. [40] used previous time step information as input to a one-layer MLP network to predict the dynamic response of a simplified model of a wind turbine. Huang et al. [41] identified structural dynamic characteristics and performed damage diagnosis of a building using a back-propagation MLP taking previous time step information as input. Lagaros and Papadrakakis [42] proposed an MLP network to predict the structural nonlinear behavior of 3D buildings when earthquake excitations with increasing intensities are considered.

Nevertheless, a very limited number of studies have been reported in the literature for structural response modeling and prediction using more advanced deep learning models such as the recurrent neural network (RNN) and the convolutional neural network (CNN). RNN is designed to learn sequential and time-varying patterns for regression problems [43–46], while CNN is known for its capability in classification of data with grid-like topology (e.g., 1D sequences and 2D images) [47,48]. CNN can also be used for solving regression problems. Recently, Sun et al. [49] proposed a virtual sensor model using CNN to estimate the dynamic responses of two numerical structures given measurements at other locations. However, the network was trained using the partial response measurements as input to predict responses of the remaining DOFs, limiting its applications in structural response prediction under new inputs. Another remarkable work by Wu and Jahanshahi [50] used CNN to estimate structural dynamic response and perform system identification, which showed great capability of CNN for sequence regression. Nevertheless, the hypothesis was that the data is sufficient to train a reliable predictive model. Challenges arise when the available training data is scarce. A potential solution to overcome this limitation is to incorporate scientific laws (e.g., partial differential equations, boundary conditions) into deep neural networks to guide the deep learning from scarce data [51–54]. We herein address limitations in data-driven structural response modeling through developing a novel physics-guided CNN (i.e., PhyCNN), which is capable of accurately predicting nonlinear structural seismic time-history responses in a data-driven manner. The basic concept is to (1) embed available physics knowledge into the deep learning model, (2) train a PhyCNN based on available seismic input–output datasets (e.g., from simulation or sensing), and (3) use the trained PhyCNN as a surrogate model for response prediction. The surrogate model can further be utilized for fragility analysis given certain limit state criteria (e.g., the serviceability state). It is noted that the available physics (e.g., the law of dynamics) can provide constraints to the network outputs, alleviate overfitting issues, reduce the need of big training datasets, and thus improve the robustness of the trained model for more reliable prediction.

This paper is organized as follows. Section 2 presents the proposed PhyCNN architecture for structural response modeling. In Section 3, the performance of PhyCNN is verified through two numerical examples of a nonlinear system. Section 4 presents the experimental validation of PhyCNN based on field sensing measurements, where data-driven serviceability analysis of a building is also discussed. Section 5 summarizes the conclusions.

2. Physics-guided convolutional neural network (PhyCNN)

Neural networks have been widely recognized as a powerful tool to deal with problems like classification and regression. Among many other neural networks, CNN, which is inspired by the visual cortex of animals [55], can effectively model the grid-structured topology of data (e.g., images), making it especially powerful for image classification. However, CNN is also capable of dealing with regression problems, although this is often overlooked given its prominence in classification. Traditionally, deep neural networks are trained solely based on data. However, by adding physics (e.g., the governing law of dynamics) into the training phase, the robustness and reliability of learning from the data can be further enhanced. In other words, the embedded physics can inform the learning and constrain the training to a feasible space. In this paper, a 1D regression-oriented PhyCNN architecture is proposed for time series modeling.

To illustrate the concept, let us consider a dynamic system subjected to ground excitation following the equation of motion that obeys Newton's second law:

M\ddot{x}(t) + h(t) = -M\Gamma\ddot{x}_g(t)    (1)

where M is the mass matrix; x, ẋ, and ẍ are the relative displacement, velocity, and acceleration vectors with respect to the ground; ẍ_g represents the ground acceleration; Γ is the force distribution vector; and h is the latent generalized restoring force vector. Normalizing Eq. (1) by M, the governing equation can be expressed as

f := \ddot{x}(t) + g(t) + \Gamma\ddot{x}_g(t) \rightarrow 0    (2)

where g(t) is the mass-normalized restoring force, namely, g(t) = M^{-1}h(t), whose closed form is assumed unknown. Hence, it will be learned by the proposed PhyCNN.

A PhyCNN framework is developed for surrogate modeling of such a nonlinear dynamic system under ground motion excitation. The proposed deep learning framework consists of a 1D CNN and a graph-based tensor differentiator. Fig. 1 shows the basic concept and architecture of PhyCNN in the context of structural response modeling given the ground acceleration as input, which contains n sample points from t_1 to t_n. The outputs are state space variables z(t) including the structural displacement x(t), the velocity ẋ(t), and the normalized restoring force g(t), namely, z(t) = {x(t), ẋ(t), g(t)}, each of which has the same number of n sample points ranging from t_1 to t_n. With the convolution operation illustrated in Section 2.1, zero-padding is added to the output sequence of each convolution layer to ensure identical input/output length. The proposed PhyCNN architecture consists of multiple hidden layers besides the input and output layers, namely, the feature learning layers and the fully connected layers [56,57]. A typical feature learning layer usually includes a convolution layer, a nonlinear layer (or nonlinear activation function), and a feature pooling layer. The output of each layer is called a feature map since the feature learning layers are used for extracting features from the input or from the output of the previous layer. In the proposed PhyCNN architecture, the dimension of height in a classical CNN is reduced to one, making it possible to take a time-series signal as input, while the dimension of width represents the temporal space. In addition, a graph-based tensor differentiator (e.g., the finite difference method) is developed to calculate the derivative of the state space outputs ż(t) = {x_t(t), ẋ_t(t), g_t(t)} to construct the physics loss from the governing equation, where the subscript t represents the derivative of the state with respect to time. The basic concept here is to optimize the network parameters θ = {W_θ, b_θ} such that the PhyCNN can interpret the measurement data (e.g., x^m, ẋ^m, g^m) while satisfying the physical equation of motion in Eq. (2), e.g., f → 0. Here, W_θ and b_θ are the neural network weight and bias parameters. The total loss J(θ) is then defined as

J(\theta) = J_D(\theta) + J_P(\theta)    (3)
Fig. 1. The proposed physics-guided convolutional neural network (PhyCNN) for time-series modeling. The PhyCNN architecture includes the input layer, the feature learning layers, the fully-connected layers, the output layer, and the graph-based tensor differentiator. The inputs are ground accelerations (or ground displacements) and the outputs are state space variables z(t) including displacement x(t), velocity ẋ(t), and restoring force g(t), namely, z(t) = {x(t), ẋ(t), g(t)}. The derivatives of the state space outputs ż(t) are calculated through a tensor differentiator using the central finite difference method. The total loss consists of the data loss from the measurements and the physics loss which models the dependency between the output features. Both the input and output of the network are time sequences, with p_0 (e.g., ground acceleration and/or ground displacement) and p_o (e.g., state space variables at different floor levels) features, respectively. The size of each layer's input and output is given. The convolution layer is defined as "height × width × depth × filters (output channels)". An identical kernel size is used for all three convolution layers in this study. Zero-padding is added to the output sequence of each convolution layer due to the convolution operation as illustrated in Section 2.1. Note that the nonlinear activation functions are not shown in this figure.
with

J_D(\theta) = \frac{1}{N}\|x^p - x^m\|_2^2 + \frac{1}{N}\|\dot{x}^p - \dot{x}^m\|_2^2 + \frac{1}{N}\|g^p - g^m\|_2^2    (4)

J_P(\theta) = \frac{1}{N}\|\dot{x}^p - x_t^p\|_2^2 + \frac{1}{N}\|\dot{x}_t^p + g^p + \Gamma\ddot{x}_g\|_2^2    (5)

where J_D(θ) denotes the data loss based on the measurements while J_P(θ) represents the physics loss which introduces a constraint for the neural network that models the dependency in-between the output features; the superscripts p and m denote the prediction and measurement, respectively. Note that the measurements are not necessarily required for the complete state; they could cover only part of the state variables (e.g., x^m only) or the accelerations (e.g., ẍ^m). In such a case, the data loss in Eq. (4) should be adjusted accordingly. Note that equal weights are used for J_D(θ) and J_P(θ) in this study, which leads to satisfactory convergence. Weights may be adjusted to improve the training performance in different problems. During training, θ will be updated and determined by solving the optimization problem θ̂ := arg min_θ J(θ). Note that the proposed PhyCNN architecture used in this study has five convolution layers (c = 5) with identical kernel size and three fully-connected layers. Details of each layer are discussed in the following subsections.
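For illustration only, Eqs. (3)–(5) can be assembled in TensorFlow as sketched below; this is not the authors' released implementation. The network output is assumed to be a tensor of shape (batch, n, 3) holding [x^p, ẋ^p, g^p], the term Γẍ_g is assumed to be pre-multiplied and passed as ag, and a central finite difference with time step dt plays the role of the graph-based tensor differentiator. The names central_diff and physics_guided_loss are illustrative.

```python
import tensorflow as tf

def central_diff(y, dt):
    """Central finite difference along the time axis (one-sided at the two ends)."""
    mid = (y[:, 2:, :] - y[:, :-2, :]) / (2.0 * dt)
    first = (y[:, 1:2, :] - y[:, 0:1, :]) / dt
    last = (y[:, -1:, :] - y[:, -2:-1, :]) / dt
    return tf.concat([first, mid, last], axis=1)

def physics_guided_loss(z_pred, x_meas, xdot_meas, g_meas, ag, dt):
    """Total loss J = J_D + J_P of Eqs. (3)-(5), in mean-squared form with equal weights."""
    x_p, xdot_p, g_p = z_pred[..., 0], z_pred[..., 1], z_pred[..., 2]
    # Data loss J_D (Eq. 4): misfit between predicted and measured states
    J_D = (tf.reduce_mean(tf.square(x_p - x_meas))
           + tf.reduce_mean(tf.square(xdot_p - xdot_meas))
           + tf.reduce_mean(tf.square(g_p - g_meas)))
    # Graph-based differentiation of the predicted states
    zdot = central_diff(z_pred, dt)
    x_t, xdot_t = zdot[..., 0], zdot[..., 1]
    # Physics loss J_P (Eq. 5): kinematic consistency and the equation of motion,
    # with ag standing for Gamma * xg_ddot (Gamma = 1 for a SDOF system)
    J_P = (tf.reduce_mean(tf.square(xdot_p - x_t))
           + tf.reduce_mean(tf.square(xdot_t + g_p + ag)))
    return J_D + J_P
```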
2.1. Convolution layer

The convolution (Conv) layer is the basis of the CNN architecture, which performs the core operations of feature learning. The size of a convolution layer is defined as "height × width × depth × filters (output channels)", e.g., 1 × k × p_0 × p_1 for the first Conv layer as shown in Fig. 1. Each Conv layer consists of a set of learnable kernels (also known as filters) with a size of 1 × k, which are parameterized by a hyperparameter called the receptive field containing a group of weights shared over the entire input temporal space. The initial weights of a receptive field are typically randomly generated. During the forward pass, the kernels convolve across the temporal space of the input and compute dot products between the entries of a receptive field and a local region of the input which represents a sequence of the input time series. The dot products are summed, and the bias is added to the summed value, forming a single entry of the output. The full input space is scanned by sliding the kernels along the temporal space with a single stride. In this way, the time dependency is captured by convolving a sequence of the input across the entire temporal space. The stride operation will lead to a smaller output length if zero padding is not applied. To ensure the output has the same length n as the input in the temporal space, zero-padding is added at the end of the input time series. The number of required zero-padding entries, P, is given by P = k − 1. The dimension of the convolution layer output z^{(l)} can be different from the layer input z^{(l−1)}, where l denotes the layer index. The number of filters, p, specifies the dimensionality of the output space. For the jth (j = 1, 2, …, p) output feature (i.e., what is called a channel in a standard CNN), the corresponding output of a convolution layer can be written as

z_j^{(l)} = \sum_{i=1}^{p_0} W_i^{(l)} * z_i^{(l-1)} + b_j^{(l)}    (6)

which takes the output of the previous layer z^{(l−1)} with zero-padding as input. Here, i represents the ith input feature/channel (i = 1, 2, …, p_0); W_i^{(l)} is the receptive field with the kernel size of k for the ith input feature; b_j^{(l)} is the bias vector added to the summed term; and * represents the 1D convolution operator. Note that for each output feature map, the same bias is shared across the temporal space.

A simple example is presented in Fig. 2 to illustrate the convolution operation in the temporal space. Herein, a single-channel input sequence (p_0 = 1) with a length of n = 10 is considered, with a receptive field size of k = 5 for the jth filter. The kernel slides across the temporal space with a stride s = 1, resulting in an output with a length of 10. The weights of the receptive field are [1, 0, −1, 0, 1], shared across the entire temporal space.
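The sliding-kernel operation of Eq. (6) and the P = k − 1 zero-padding rule can be reproduced in a few lines of NumPy. The sketch below mirrors the single-channel example of Fig. 2 (n = 10, k = 5, s = 1, weights [1, 0, −1, 0, 1]); it is a didactic re-implementation, not the Keras layer used in the actual model.

```python
import numpy as np

def conv1d_same(x, w, b=0.0):
    """1D convolution (cross-correlation) with stride 1; P = k - 1 zeros are
    appended so that the output keeps the input length n."""
    k = len(w)
    x_pad = np.concatenate([x, np.zeros(k - 1)])   # zero-padding at the end
    return np.array([np.dot(x_pad[i:i + k], w) for i in range(len(x))]) + b

x = np.arange(1.0, 11.0)                   # single-channel input, n = 10
w = np.array([1.0, 0.0, -1.0, 0.0, 1.0])   # receptive field of size k = 5
print(conv1d_same(x, w))                   # output also has length 10
```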
Fig. 2. Illustration of the convolution process in a single convolution layer with i = p0 = 1, n = 10, k = 5 and s = 1.
Conventionally, a convolution layer is always followed by a nonlinear activation function, which introduces nonlinearity and makes learning easier by adapting to a variety of data and differentiating between the outputs. The activation layer increases the nonlinearity of the model and the overall network without affecting the receptive fields of the convolution layer. Common nonlinear activation functions include the rectified linear unit (ReLU) [58], the hyperbolic tangent (Tanh), and the sigmoid function. In this paper, the ReLU function shown in Eq. (7) is employed, which gives slightly better performance compared to Tanh and sigmoid.

f(x) = \begin{cases} 0, & \text{for } x < 0 \\ x, & \text{for } x \geq 0 \end{cases}    (7)

2.2. Pooling layer

The pooling layer is often used to reduce the spatial size of the feature maps when dealing with classification problems with large input data. Popular pooling layers include max pooling and average pooling, which take either the maximum or the mean value from a pooling window. For example, Fig. 3 shows the max pooling operation in a standard CNN. In PhyCNN, a pooling layer would perform a down-sampling operation in the temporal space, resulting in a smaller output length, which is undesired for regression problems like time-series prediction. Although zero-padding can be applied to keep the same output length as the input, the time dependencies are altered. Therefore, the pooling layer is excluded from the proposed PhyCNN architecture for structural response modeling.

2.3. Fully-connected layer

Exactly as its name implies, the fully-connected (FC) layer has full connections to all activations in the previous layer, as observed in regular neural networks. An FC layer multiplies the input by a weight matrix and then adds a bias vector. FC layers are typically used in the last stage of a CNN to connect to the target output layer and construct the desired number of output classes. For regression problems like time-series prediction, nonlinear activation functions such as ReLU, Tanh, and sigmoid are inappropriate for the last FC layer since they map the output into the range of [0, ∞), (−1, 1), and (0, 1), respectively. Other potential alternatives include the parametric rectified linear unit (PReLU) [59] and the exponential linear unit (ELU) [60]. In this paper, the Tanh function is used as the activation within the FC layers and the linear activation function is applied for the output layer.

2.4. Dropout layer

Dropout layers can be added after each convolution layer and fully-connected layer to reduce overfitting by preventing complex co-adaptations on training data [61], which has remained a common issue in machine learning. The key idea is to randomly disconnect connections and drop units from the connected layer with a certain dropout rate during training. Dropout layers can also improve the training speed. Typically, they are applied before the FC layers, which have more learnable parameters and are more likely to cause overfitting. Although convolution layers are less likely to overfit due to their particular structure where weights are shared over the spatial space, dropout layers can still be applied to convolution layers which have a huge number of parameters. In this study, dropout layers with a dropout rate of 0.2 are applied before the FC layers.
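Putting Sections 2.1–2.4 together, a plausible Keras sketch of the resulting stack is given below: Conv1D blocks with 'same' zero-padding and ReLU (no pooling), a dropout layer with rate 0.2 before the FC layers, Tanh within the FC layers, and a linear output layer. The function name build_phycnn and its default arguments are illustrative assumptions; note also that Keras 'same' padding distributes the zeros around the sequence rather than appending all of them at the end as described above.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_phycnn(n_steps, n_in, n_out, n_conv=5, filters=64, kernel=50,
                 n_fc=3, fc_units=64, dropout=0.2):
    """CNN backbone sketch: Conv1D feature-learning blocks (no pooling) +
    dropout + fully-connected layers with a linear output."""
    inputs = keras.Input(shape=(n_steps, n_in))            # (time steps, input features)
    x = inputs
    for _ in range(n_conv):
        # 'same' padding keeps the output length equal to the input length n
        x = layers.Conv1D(filters, kernel, strides=1, padding='same',
                          activation='relu')(x)
    x = layers.Dropout(dropout)(x)                          # dropout before the FC layers
    for _ in range(n_fc - 1):
        x = layers.Dense(fc_units, activation='tanh')(x)    # Tanh within the FC layers
    outputs = layers.Dense(n_out, activation='linear')(x)   # one node per output feature
    return keras.Model(inputs, outputs)
```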
2.5. PhyCNN for response modeling

The proposed PhyCNN architecture takes the ground motion (e.g., ground accelerations) as input and the structural responses (e.g., story displacements) as output to learn the feature mapping between the input and output. First, the model is trained with either a synthetic database or field sensing measurements. Then the trained model can be used to predict the structural responses under new seismic excitations. To train the proposed PhyCNN architecture, both the input and output datasets must be formatted as three-dimensional arrays, where the entries are samples in the first dimension, time history steps in the
second dimension, and input or output features in the last dimension. The detailed neural network architecture is illustrated in Fig. 1, including five convolution layers and three fully-connected layers in addition to the input and output layers. Each convolution layer has 64 filters with a kernel size of 50 in this study. The number of filters (nodes) and the kernel length can be adjusted to get better performance for different problems. Note that the number of nodes for the last FC layer must be equal to the number of output features.
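As a hedged illustration of the three-dimensional data format and of one possible training loop, the snippet below reuses the build_phycnn and physics_guided_loss sketches given earlier. The file names, the sampling interval dt, the Adam optimizer and the epoch count are assumptions; the paper specifies only that training is carried out in Keras/TensorFlow, and the same loss could instead be wired into a Keras custom loss or add_loss call.

```python
import numpy as np
import tensorflow as tf

# Assumed array shapes: ag_train -> (n_samples, n_steps, 1) ground accelerations,
# z_train  -> (n_samples, n_steps, 3) measured states [x, x_dot, g] where available.
ag_train = np.load('ag_train.npy').astype('float32')   # hypothetical file names
z_train = np.load('z_train.npy').astype('float32')
dt = 0.02                                               # assumed sampling interval (s)

model = build_phycnn(n_steps=ag_train.shape[1], n_in=1, n_out=3)
optimizer = tf.keras.optimizers.Adam(1e-3)              # optimizer choice assumed

@tf.function
def train_step(ag, z_meas):
    with tf.GradientTape() as tape:
        z_pred = model(ag, training=True)
        loss = physics_guided_loss(z_pred, z_meas[..., 0], z_meas[..., 1],
                                   z_meas[..., 2], ag[..., 0], dt)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

for epoch in range(500):                                # epoch count assumed
    loss = train_step(ag_train, z_train)
```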
The entire training process is performed in a Python environment using Keras [62]. Keras is a high-level open source deep learning library built on top of TensorFlow which offers easy and fast prototyping of neural networks. TensorFlow, a symbolic math library for machine learning applications developed by the Google Brain Team [63], serves as the backend engine in Keras. It offers a flexible data flow architecture enabling high-performance training of various types of neural networks across a variety of platforms (CPUs, GPUs, TPUs). The simulations are performed on a standard PC with 28 Intel Core i9-7940X CPUs and 2 NVIDIA GTX 1080 Ti video cards. It is noted that, for all case studies presented in this paper, it takes about 20 min for model training and less than 0.01 s for inference. The data and codes used in this paper will be publicly available on GitHub at https://github.com/zhry10/PhyCNN after the paper is published.

CNN. Fig. 4(a) shows the regression analysis across all 90 prediction datasets for both PhyCNN (top-left) and CNN (bottom-left). It can be clearly seen that the prediction accuracy is greatly increased by embedding the physics constraints into deep learning. The time histories of predicted displacements are presented in Fig. 4(b), corresponding to four different levels of correlation coefficients (denoted by r), namely, 0.95, 0.92, 0.87, 0.61 using PhyCNN and 0.60, 0.72, 0.66, 0.37 using CNN. The PhyCNN prediction matches the reference well in both magnitude and phase. Even for the worst case of r = 0.61, the proposed PhyCNN approach is able to reasonably predict the structural dynamics. On the contrary, CNN produces less satisfactory prediction, especially in predicting the displacement magnitudes. Another salient feature of PhyCNN is that it also accurately predicts the states of velocity ẋ and nonlinear restoring force g, as illustrated by the regression analysis in Fig. 5(a) and (c). Fig. 5(b) shows examples of predicted time histories of ẋ and g using PhyCNN, indicating a good agreement with the ground truth. The predicted nonlinearity is given in Fig. 6, which shows an example of the predicted hysteresis of the normalized nonlinear restoring force versus displacement and velocity.

3.2. Case 2: available measurements of ẍ only
Fig. 4. Regression analysis in (a) and four examples of predicted displacements in (b) for unknown earthquakes using PhyCNN and CNN.
Fig. 5. Prediction performance of ẋ and g using PhyCNN: (a, c) regression analysis of ẋ and g respectively; (b) an example of predicted time history of ẋ and g.
4. Experimental validation of PhyCNN performance

The PhyCNN architecture is further demonstrated using field sensing data. A 6-story hotel building in San Bernardino, CA, from the Center for Engineering Strong Motion Data (CESMD), is selected and investigated [65]. The deep PhyCNN model illustrated in Fig. 1 was trained for the instrumented building with ground accelerations as input and the structural displacements as output. However, the measurement data used to train the PhyCNN are the acceleration time histories only (referred to as the modified PhyCNN in Fig. 7). A data clustering technique is proposed to partition the datasets for training, validation and prediction. Based on the trained PhyCNN model, the serviceability of the 6-story hotel building can be further analyzed given new seismic inputs.

4.1. 6-story hotel building in San Bernardino, CA

The 6-story hotel building in San Bernardino, California (CA) is a mid-rise concrete building designed in 1970 with a total of nine accelerometers installed on the 1st floor, 3rd floor and roof in both directions. The sensors, with their locations shown in Fig. 10, have recorded multiple seismic events from 1987 to 2018. Table 1 summarizes a total of 23 available datasets on CESMD used in this example. The historically recorded data is then used to train the
Fig. 6. Predicted hysteresis of nonlinear restoring force versus displacement (left) and nonlinear restoring force versus velocity (right).
PhyCNN. The trained surrogate model is then used to predict structural displacement time histories given new ground motions and to develop a fragility function for serviceability assessment of the building.

Selecting training/validation datasets plays a critical role in deep learning. Commonly, the database is divided into training, validation and prediction datasets randomly (e.g., with ratios of 70%, 15%, 15%, respectively). In the case when the database is very limited, the dataset partition could have a significant influence on the generalizability of the trained model. To extensively utilize the limited sensing data in this study, an unsupervised learning technique based on the K-means algorithm is proposed for data clustering. To better illustrate the concept, the 6-story hotel building in San Bernardino, CA is taken as an example.

4.2. K-means clustering

Before clustering, the raw sensing measurements with different sampling rates and high-frequency noise were first preprocessed. The measured accelerations were passed through a 2-pole Butterworth high-pass filter with a cutoff frequency of 0.1 Hz to remove the low-frequency behavior. The displacement time series are further obtained from the high-pass filtered accelerations and used to model the input–output (ground acceleration–structural displacement) relationship. The entire historical dataset summarized in Table 1 is divided into training, validation and prediction datasets using the K-means clustering and the convex envelope technique discussed in the following.
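A minimal SciPy sketch of the preprocessing step described above is given below. The 2-pole Butterworth high-pass filter with a 0.1 Hz cutoff follows the text; the zero-phase filtering via filtfilt and the double numerical integration used to recover displacements are assumptions made for illustration, since the paper does not detail how displacements are derived from the filtered accelerations.

```python
import numpy as np
from scipy import signal

def preprocess(acc, fs, fc=0.1):
    """High-pass filter an acceleration record (2-pole Butterworth, 0.1 Hz cutoff)
    and integrate twice to obtain a displacement time series (assumed approach)."""
    b, a = signal.butter(2, fc, btype='highpass', fs=fs)
    acc_f = signal.filtfilt(b, a, acc)          # zero-phase filtering (assumed)
    dt = 1.0 / fs
    vel = np.cumsum(acc_f) * dt                 # numerical integration
    vel = signal.filtfilt(b, a, vel)            # re-apply high-pass to limit drift
    disp = np.cumsum(vel) * dt
    disp = signal.filtfilt(b, a, disp)
    return acc_f, disp
```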
Both the training and validation datasets are considered as known, where both input and output are fully given during the training process, while the prediction dataset is considered as unknown, where only the ground acceleration is given. The historical data is clustered based on both input excitations and output structural responses. Fig. 11(a) shows the relationship of peak ground acceleration (PGA) versus peak structural displacement for all datasets in logarithmic scale. It can be seen that the peak structural displacements of most samples are less than 1 cm, as shown in the green region, which is considered as the boundary of interest for training. The other two samples in the yellow region, which yield large displacements under the Northridge and Landers earthquakes, are used to test the performance of the trained PhyCNN model under a larger level of response whose information might not be fully covered during the training process.

The 21 samples within the boundary of interest (green region) are partitioned into several clusters using the K-means algorithm [66], which is a popular data mining approach in the unsupervised learning setting that groups datasets into a certain number of clusters. It starts with the random selection of a set of k cluster centroids, e.g., C = {c_1, c_2, …, c_k}. Next, each observation is assigned to the cluster whose mean has the least squared Euclidean distance, given by

\arg\min_{c_i \in C} \mathrm{dist}(c_i, x)^2    (10)

where dist calculates the Euclidean distance. The new centroid is then determined by taking the mean of all the observations assigned to that
Fig. 7. The modified PhyCNN for structural displacement prediction without displacement measurements for training. The only available measurements are the
structural accelerations which are used to train the PhyCNN model.
Fig. 8. Prediction performance of structural acceleration ẍ using PhyCNN: (a) regression analysis; (b) examples of predicted time history of ẍ .
cluster, as shown in Eq. (11):

c_i = \frac{1}{|S_i|} \sum_{x_i \in S_i} x_i    (11)

where S_i is the set of all observations assigned to the ith cluster. The algorithm converges when the cluster assignments no longer change. To determine the optimal number of clusters k for the given observations, the elbow method is used, which calculates the distortions for different numbers of clusters [67]. As shown in Fig. 12, the optimal value is determined as k = 4, where adding another cluster does not give much better modeling of the data. The limitation of this method is that the elbow cannot always be unambiguously identified. In such cases, other approaches such as the Silhouette method [68,69] and cross-validation [70] can be used to find the optimal number of clusters, which will be investigated in future work.
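The clustering of Eqs. (10)–(11) and the elbow check can be sketched with scikit-learn as follows; the feature pairs are the PGA and peak floor displacement values of the 21 in-boundary samples from Table 1, and the log scaling mirrors Fig. 11(a) (the exact feature scaling used by the authors is not stated).

```python
import numpy as np
from sklearn.cluster import KMeans

# The 21 samples inside the boundary of interest (Ind. 1-21 in Table 1):
pga = np.array([0.024, 0.054, 0.034, 0.008, 0.007, 0.094, 0.028, 0.024, 0.025,
                0.004, 0.003, 0.019, 0.005, 0.019, 0.010, 0.005, 0.011, 0.010,
                0.008, 0.010, 0.012])                 # peak ground acceleration (g)
disp = np.array([0.406, 0.319, 0.102, 0.052, 0.135, 0.852, 0.058, 0.181, 0.047,
                 0.015, 0.021, 0.102, 0.017, 0.145, 0.040, 0.026, 0.107, 0.034,
                 0.034, 0.052, 0.058])                # peak floor displacement (cm)
X = np.log10(np.column_stack([pga, disp]))            # logarithmic scale as in Fig. 11(a)

# Elbow method: distortion (inertia) for candidate numbers of clusters
inertia = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
           for k in range(1, 9)]

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)   # k = 4 at the elbow
labels, centroids = kmeans.labels_, kmeans.cluster_centers_
```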
The 21 samples in the green region are divided into four clusters shown in Fig. 11(b). A total of 11 datasets are selected for training by picking the datasets on the convex envelope, which defines the boundary of interest, as well as the datasets closest to the cluster centroids. The validation datasets can be determined by randomly and evenly picking from each cluster. In this study, since the datasets in Cluster 2 and Cluster 4 are insufficient, the validation datasets are selected from Cluster 1 and Cluster 3 (two from each). The remaining 6 datasets plus the 2 datasets outside the boundary (in the yellow region) are considered as the prediction datasets to demonstrate the performance of the proposed PhyCNN architecture both within and outside the boundary of interest. A summary of the datasets for training, validation and prediction purposes is illustrated in Fig. 11(c). The training and prediction performance for this 6-story hotel building is presented in the following subsection.
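The convex-envelope and nearest-to-centroid selection of the training sets described above can be sketched with scipy.spatial.ConvexHull, reusing X, labels and centroids from the clustering sketch; this is an illustration of the selection rule, not the authors' exact script.

```python
import numpy as np
from scipy.spatial import ConvexHull

hull = ConvexHull(X)                         # X: log-scaled (PGA, peak disp.) features
on_envelope = set(hull.vertices)             # samples defining the boundary of interest

# One sample per cluster: the observation closest to its cluster centroid
nearest_to_centroid = []
for c in range(centroids.shape[0]):
    members = np.where(labels == c)[0]
    d = np.linalg.norm(X[members] - centroids[c], axis=1)
    nearest_to_centroid.append(members[np.argmin(d)])

train_idx = sorted(on_envelope | set(nearest_to_centroid))   # candidate training sets
```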
4.3. Predicted displacements using PhyCNN

The training and validation datasets discussed above are used to train the PhyCNN for the 6-story hotel building in San Bernardino, CA, consisting of 11 and 4 samples respectively, each of which contains sequences (with 7,200 data points) of ground motion accelerations as input and the story accelerations as the measurement data. During training, the training datasets are fed into the PhyCNN architecture used in the previous numerical example in Section 3.2 with accelerations as the only field measurements. The trained PhyCNN model is then used to predict structural displacements under new earthquakes. By simply feeding a new ground acceleration into the trained PhyCNN model, it accurately predicts the structural displacements under that excitation. Fig. 13 shows the predicted story displacements of the 3rd floor and roof for the Big Bear Lake 2014 and Loma Linda 2016 earthquakes. It can be clearly seen that the PhyCNN prediction matches the historical sensing data very well for earthquakes with different magnitudes and frequency contents. To better illustrate the prediction error, the probability density function (PDF) of the normalized error
Fig. 9. Prediction performance of structural displacement x using PhyCNN: (a) regression analysis; (b) examples of predicted time history of x.
Fig. 10. Sensor layout of the 6-story hotel in San Bernardino, California (Station Number: 23287) (http://www.strongmotioncenter.org/).
distribution defined in Eq. (12) is presented in Fig. 14. It can be seen that the prediction error is mainly located within 5% for the 3rd floor and roof with a confidence interval (CI) of 97% and 93%, respectively. This demonstrates the high prediction accuracy of the proposed PhyCNN approach.

P = \mathrm{PDF}\left\{ \frac{y_{\mathrm{true}} - y_{\mathrm{predict}}}{\max(|y_{\mathrm{true}}|)} \right\}    (12)

The extrapolation ability of the proposed PhyCNN is further verified using the two samples outside the boundary of interest (in the yellow region as shown in Fig. 11(a)). Fig. 15 shows the predicted structural displacements under the Northridge 1994 and Landers 1992 earthquakes. It is observed that the proposed PhyCNN model is able to predict structural responses well for larger earthquakes, which offers confidence in applying the proposed method for building serviceability or fragility assessment.

4.4. Seismic serviceability analysis of the building

The trained PhyCNN model is used as a surrogate model for structural seismic response prediction, which can be further employed to develop fragility functions based on certain limit states for seismic serviceability analysis. The use of limit states in seismic risk assessment reflects the vulnerability of building structures against earthquakes. The serviceability limit state, as one of the limit states, indicates the structural performance under operational service conditions and aims to minimize any future structural damage due to relatively low-intensity earthquakes [71]. For serviceability assessment, the fragility function can be used to describe the probability of exceedance of the serviceability limit state for a specific earthquake intensity measure (IM). The probability of exceeding a given damage level (DL) is defined as a cumulative lognormal distribution function as follows:

P(\mathrm{DL} \mid \mathrm{IM} = x) = \Phi\left[ \frac{\ln(x/\mu)}{\beta} \right]    (13)

where P(DL | IM = x) denotes the probability that a ground motion with IM = x exceeds a given performance level (e.g., the serviceability limit state); Φ(·) is the standard normal cumulative distribution function (CDF); μ is the median of the fragility function (the IM level with 50% probability of exceeding the given DL); and β is the standard deviation of the natural logarithm of the IM, which describes the variability of structural damage states.

To calibrate the fragility function, we need to estimate the parameters μ and β. Shinozuka et al. [72,73] estimated μ and β using maximum likelihood estimation (MLE), denoted by L(·). In the MLE approach, the damage state is related to a Bernoulli random variable. If the limit state is reached, y_i is set as 1; otherwise, y_i = 0. The likelihood function is given by

L(\mu, \beta) = \prod_{i=1}^{N} \Phi\left[\frac{\ln(x_i/\mu)}{\beta}\right]^{y_i} \left[1 - \Phi\left[\frac{\ln(x_i/\mu)}{\beta}\right]\right]^{1-y_i}    (14)

where Π denotes a product over N earthquake ground motions. Using an optimization algorithm, the two parameters μ and β can be obtained when the likelihood function in the logarithmic space is maximized.
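A compact SciPy sketch of the MLE calibration of Eqs. (13)–(14) is given below, where im holds the intensity measures (e.g., PGA) of the N ground motions and y the binary exceedance indicators; the initial guess, bounds and optimizer settings are assumptions.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def fit_fragility(im, y):
    """Maximum likelihood estimate of (mu, beta) in the lognormal fragility model."""
    def neg_log_like(params):
        mu, beta = params
        p = norm.cdf(np.log(im / mu) / beta)                 # Eq. (13)
        p = np.clip(p, 1e-12, 1.0 - 1e-12)                   # numerical safeguard
        return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))   # -log of Eq. (14)
    res = minimize(neg_log_like, x0=[np.median(im), 0.4],
                   bounds=[(1e-6, None), (1e-6, None)])
    return res.x                                             # (mu, beta)

# Example use (im, y assumed given):
# mu, beta = fit_fragility(im, y)
# p_exceed_at_0p1g = norm.cdf(np.log(0.1 / mu) / beta)
```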
Table 1
Historical data information.

Ind.  Earthquake             Epicenter Distance (km)  PGA (g)  Peak Floor Disp. (cm)

Training dataset:
1     Borrego Springs 2010   102.5    0.024    0.406
2     Devore 2015            18.6     0.054    0.319
3     Fontana 2014           17.3     0.034    0.102
4     Inglewood 2009         99.4     0.008    0.052
5     Ocotillo 2010          197.3    0.007    0.135
6     San Bernardino 2009    5.1      0.094    0.852
7     Beaumont 2011          22.6     0.028    0.058
8     Lahabra 2014           60.7     0.024    0.181
9     Loma Linda 2017        4.8      0.025    0.047
10    Ontario 2011           27.8     0.004    0.015
11    Yorba Linda 2012       50.3     0.003    0.021

Validation dataset:
12    Banning 2016           38.2     0.019    0.102
13    Banning 2010           38.9     0.005    0.017
14    Redlands 2010          11.5     0.019    0.145
15    Trabuco Canyon 2018    40.9     0.010    0.040

Prediction dataset:
16    Beaumont 2010          28.1     0.005    0.026
17    Big Bear Lake 2014     33.4     0.011    0.107
18    Fontana 2015           15.5     0.010    0.034
19    Loma Linda 2016        6.9      0.008    0.034
20    Devore 2012            23.5     0.010    0.052
21    Loma Linda 2013        4.6      0.012    0.058
22    Northridge 1994        117.4    0.070    2.67
23    Landers 1992           79.9     0.080    9.38
Fig. 11. Data clustering using K-means clustering: (a) overview of historical datasets; (b) illustration of clusters (k = 4 ) and convex envelope; (c) training/validation/
prediction datasets determined based on K-means algorithm and convex envelope.
Fig. 12. Identification of the optimal number of clusters at the elbow point.

The serviceability assessment is conducted based on the performance-based engineering method. According to the ASCE/SEI 41-06 standard [74], the building performance levels include operational, immediate occupancy, life safety, and collapse prevention. Both the operational and immediate occupancy performance levels can be considered as the serviceability limit state. ASCE/SEI 41-06 also provides recommended values of the maximum inter-story drift for each performance level and type of structure. An alternative standard for fragility analysis is HAZUS [75], in which the damage states of both structural and nonstructural components are defined. However, the inter-story drifts are typically unavailable due to the limitation of the sensor locations. Instead, the drift angle, defined as the ratio of the story deflection to the story height, can be calculated and used as the threshold for serviceability assessment. Typical values of the drift angle for the serviceability check lie in the range of [1/600, 1/100] for different building types and materials [76]. In this paper, the threshold of the serviceability limit state is defined as 0.5% for the maximum drift angle under the earthquake with a 10% probability of exceedance in a 50-year period. For serviceability assessment of a building, the structural responses can be predicted under a group of new ground motions using the trained PhyCNN model. Thus, the fragility function is obtained using Eq. (13) based on the serviceability limit state.

The serviceability of the 6-story hotel building in San Bernardino, CA is assessed based on the trained PhyCNN model described in Section 4.3. A suite of 100 ground motion records is input to the trained PhyCNN model to predict the structural displacements in an incremental dynamic analysis (IDA) setting. The 100 earthquake ground motions are selected from the PEER strong motion database [64] in the area of San Bernardino with a 10% probability of exceedance in 50 years. The mean response spectrum of the selected ground motion records matches the design spectrum of the 6-story hotel building. Fig. 16 shows the predicted displacements under two example new earthquakes. All the predicted displacements are then used to determine the fragility function with respect to the serviceability limit state. The fragility curve of the serviceability limit state is obtained based on Eq. (13) and Eq. (14) and shown in Fig. 17. It is seen that the probabilities of exceeding the serviceability limit state are around 47%, 78%, and 90% for future earthquakes with PGA of 0.1 g, 0.2 g, and 0.3 g, respectively. It is noted that the data-driven fragility curve can provide valuable information to guide the design of maintenance and rehabilitation strategies for the building.
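To connect the predicted displacements to the fragility inputs, the sketch below forms the binary exceedance indicator from the 0.5% drift-angle threshold; approximating the drift angle from the roof/3rd-floor displacement difference and the corresponding height is an assumption made for illustration, since not every inter-story drift is available from the sensor layout.

```python
import numpy as np

DRIFT_LIMIT = 0.005   # 0.5% maximum drift-angle threshold (serviceability limit state)

def exceeds_limit(disp_3rd, disp_roof, height_3rd_to_roof):
    """Binary indicator y_i for one ground motion: 1 if the peak drift angle,
    approximated from the roof minus 3rd-floor displacement histories (assumption),
    exceeds the 0.5% threshold."""
    drift = np.max(np.abs(np.asarray(disp_roof) - np.asarray(disp_3rd)))
    return int(drift / height_3rd_to_roof > DRIFT_LIMIT)

# For the 100-record IDA suite (hypothetical variable names):
# y = np.array([exceeds_limit(d3, dr, h) for d3, dr in predicted_pairs])
# mu, beta = fit_fragility(pga_values, y)   # MLE sketch given earlier
```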
Fig. 14. Error distribution of the prediction datasets using the proposed PhyCNN model.
Fig. 15. Prediction performance of the proposed PhyCNN model under larger seismic intensities.
Fig. 16. Predicted structural responses under two example earthquakes using PhyCNN.
5. Conclusions

This paper presents a novel physics-guided convolutional neural network (PhyCNN) architecture to develop data-driven surrogate models for modeling/prediction of the seismic response of building structures. The deep PhyCNN model includes several convolution layers and fully-connected layers to interpret the data, a graph-based tensor differentiator, and physics constraints. The key concept is to leverage available physics (e.g., the law of dynamics) that can provide constraints to the network outputs, alleviate overfitting issues, reduce the need of big training datasets, and thus improve the robustness of the trained model for more reliable prediction. The performance of the proposed approach was illustrated by both numerical and experimental examples with limited datasets either from simulations or field sensing. The results show that the proposed deep PhyCNN model is an effective, reliable and computationally efficient approach for seismic structural
response modeling. The trained model can further serve as a basis for developing fragility functions for building serviceability assessment. Overall, the proposed algorithm is fundamental in nature and is scalable to other structures (e.g., bridges) under other types of hazard events.

Fig. 17. Predicted fragility curve of the serviceability limit state for the 6-story hotel building in San Bernardino using PhyCNN.

Declaration of Competing Interest

None.

Acknowledgement

This work was supported by the Engineering for Civil Infrastructure program at the National Science Foundation under grant CMMI-2013067, and the TIER 1 Seed Grant program at Northeastern University, which are greatly acknowledged. The data and codes used in this paper will be publicly available on GitHub at https://github.com/zhry10/PhyCNN after the paper is published.

Appendix A. Supplementary material

Supplementary data associated with this article can be found, in the online version, at https://doi.org/10.1016/j.engstruct.2020.110704.

References

[1] Yuen K-V. Bayesian methods for structural dynamics and civil engineering. John Wiley & Sons; 2010.
[2] Yuen K-V. Updating large models for mechanical systems using incomplete modal measurement. Mech Syst Signal Process 2012;28:297–308.
[3] Yuen K-V, Kuok S-C. Efficient Bayesian sensor placement algorithm for structural identification: a general approach for multi-type sensory systems. Earthquake Eng Struct Dynam 2015;44(5):757–74.
[4] Sun H, Büyüköztürk O. Probabilistic updating of building models using incomplete modal data. Mech Syst Signal Process 2016;75:27–40.
[5] Yan G, Sun H, Büyüköztürk O. Impact load identification for composite structures using Bayesian regularization and unscented Kalman filter. Struct Control Health Monit 2017;24(5):e1910.
[6] Chen Z, Zhang R, Zheng J, Sun H. Sparse Bayesian learning for structural damage identification. Mech Syst Signal Process 2020;140:106689.
[7] Yang JN, Lin S, Huang H, Zhou L. An adaptive extended Kalman filter for structural damage identification. Struct Control Health Monit 2006;13(4):849–67.
[8] Wu M, Smyth AW. Application of the unscented Kalman filter for real-time nonlinear structural system identification. Struct Control Health Monit 2007;14(7):971–90.
[9] Xie Z, Feng J. Real-time nonlinear structural system identification via iterated unscented Kalman filter. Mech Syst Signal Process 2012;28:309–22.
[10] Nakata N, Snieder R, Kuroda S, Ito S, Aizawa T, Kunimi T. Monitoring a building using deconvolution interferometry. I: Earthquake-data analysis. Bull Seismol Soc Am 2013;103(3):1662–78.
[11] Nakata N, Snieder R. Monitoring a building using deconvolution interferometry. II: Ambient-vibration analysis. Bull Seismol Soc Am 2013;104(1):204–13.
[12] Nakata N, Tanaka W, Oda Y. Damage detection of a building caused by the 2011 Tohoku-oki earthquake with seismic interferometry. Bull Seismol Soc Am 2015;105(5):2411–9.
[13] Mordret A, Sun H, Prieto GA, Toksöz MN, Büyüköztürk O. Continuous monitoring of high-rise buildings using seismic interferometry. Bull Seismol Soc Am 2017;107(6):2759–73.
[14] Sun H, Mordret A, Prieto GA, Toksöz MN, Büyüköztürk O. Bayesian characterization of buildings using seismic interferometry on ambient vibrations. Mech Syst Signal Process 2017;85:468–86.
[15] Sjöberg J, Zhang Q, Ljung L, Benveniste A, Delyon B, Glorennec P-Y, et al. Nonlinear black-box modeling in system identification: a unified overview. Automatica 1995;31(12):1691–724.
[16] Braun JE, Chaturvedi N. An inverse gray-box model for transient building load prediction. HVAC&R Res 2002;8(1):73–99.
[17] Moaveni B, Conte JP, Hemez FM. Uncertainty and sensitivity analysis of damage identification results obtained using finite element model updating. Comput-Aided Civil Infrastruct Eng 2009;24(5):320–34.
[18] Belleri A, Moaveni B, Restrepo JI. Damage assessment through structural identification of a three-story large-scale precast concrete structure. Earthquake Eng Struct Dynam 2014;43(1):61–76.
[19] Yousefianmoghadam S, Behmanesh I, Stavridis A, Moaveni B, Nozari A, Sacco A. System identification and modeling of a dynamically tested and gradually damaged 10-story reinforced concrete building. Earthquake Eng Struct Dynam 2018;47(1):25–47.
[20] Sohn H, Farrar CR, Hemez FM, Shunk DD, Stinemates DW, Nadler BR, et al. A review of structural health monitoring literature. USA: Los Alamos National Laboratory; 1996–2001.
[21] Kerschen G, Worden K, Vakakis AF, Golinval J-C. Past, present and future of nonlinear system identification in structural dynamics. Mech Syst Signal Process 2006;20(3):505–92.
[22] Wu R-T, Jahanshahi MR. Data fusion approaches for structural health monitoring and system identification: past, present, and future. Struct Health Monit 2018. 1475921718798769.
[23] Brownjohn JM, Xia P-Q. Dynamic assessment of curved cable-stayed bridge by model updating. J Struct Eng 2000;126(2):252–60.
[24] Yuen K-V, Katafygiotis LS. Model updating using noisy response measurements without knowledge of the input spectrum. Earthquake Eng Struct Dynam 2005;34(2):167–87.
[25] Weber B, Paultre P. Damage identification in a truss tower by regularized model updating. J Struct Eng 2009;136(3):307–16.
[26] Song W, Dyke S. Real-time dynamic model updating of a hysteretic structural system. J Struct Eng 2013;140(3):04013082.
[27] Sun H, Betti R. A hybrid optimization algorithm with Bayesian inference for probabilistic model updating. Comput-Aided Civil Infrastruct Eng 2015;30(8):602–19.
[28] Skolnik D, Lei Y, Yu E, Wallace JW. Identification, model updating, and response prediction of an instrumented 15-story steel-frame building. Earthquake Spectra 2006;22(3):781–802.
[29] Fishwick PA. Neural network models in simulation: a comparison with traditional modeling approaches. In: Proceedings of the 21st conference on winter simulation. ACM; 1989. p. 702–9.
[30] Ediger VŞ, Akar S. ARIMA forecasting of primary energy demand by fuel in Turkey. Energy Policy 2007;35(3):1701–8.
[31] Irie B, Miyake S. Capabilities of three-layered perceptrons. In: IEEE international conference on neural networks, Vol. 1; 1988. p. 218.
[32] Hornik K. Approximation capabilities of multilayer feedforward networks. Neural Networks 1991;4(2):251–7.
[33] Chen S, Billings S. Neural networks for nonlinear dynamic system modelling and identification. Int J Control 1992;56(2):319–46.
[34] Tianping C, Hong C. Approximations of continuous functions by neural networks with application to dynamic system. IEEE Trans Neural Networks 1993;4(6):910–8.
[35] Chen T, Chen H. Approximation capability to functions of several variables, nonlinear functionals, and operators by radial basis function neural networks. IEEE Trans Neural Networks 1995;6(4):904–10.
[36] Zhang J, Sato T, Iai S. Novel support vector regression for structural system identification. Struct Control Health Monit 2007;14(4):609–26.
[37] Yinfeng D, Yingmin L, Ming L, Mingkui X. Nonlinear structural response prediction based on support vector machines. J Sound Vib 2008;311(3–5):886–97.
[38] Lightbody G, Irwin GW. Multi-layer perceptron based modelling of nonlinear systems. Fuzzy Sets Syst 1996;79(1):93–112.
[39] Ying W, Chong W, Hui L, Renda Z. Artificial neural network prediction for seismic response of bridge structure. In: 2009 International conference on artificial intelligence and computational intelligence, Vol. 2. IEEE; 2009. p. 503–6.
[40] Christiansen NH, Høgsberg JB, Winther O. Artificial neural networks for nonlinear dynamic response simulation in mechanical systems. In: NSCM-24; 2011.
[41] Huang C, Hung S, Wen C, Tu T. A neural network approach for structural identification and diagnosis of a building from seismic response data. Earthquake Eng Struct Dynam 2003;32(2):187–206.
[42] Lagaros ND, Papadrakakis M. Neural network based prediction schemes of the nonlinear seismic response of 3D buildings. Adv Eng Softw 2012;44(1):92–115.
[43] Mandic DP, Chambers J. Recurrent neural networks for prediction: learning algorithms, architectures and stability. John Wiley & Sons Inc; 2001.
[44] Medsker LR, Jain L. Recurrent neural networks: design and applications, Vol. 5.
[45] Yu Y, Yao H, Liu Y. Aircraft dynamics simulation using a novel physics-based learning method. Aerosp Sci Technol 2019;87:254–64.
[46] Zhang R, Chen Z, Chen S, Zheng J, Büyüköztürk O, Sun H. Deep long short-term memory networks for nonlinear structural seismic response prediction. Comput Struct 2019;220:55–68.
[47] Cha Y-J, Choi W, Büyüköztürk O. Deep learning-based crack damage detection using convolutional neural networks. Comput-Aided Civil Infrastruct Eng 2017;32(5):361–78.
[48] Atha DJ, Jahanshahi MR. Evaluation of deep learning approaches based on convolutional neural networks for corrosion detection. Struct Health Monit 2018;17(5):1110–28.
[49] Sun S-B, He Y-Y, Zhou S-D, Yue Z-J. A data-driven response virtual sensor technique with partial vibration measurements using convolutional neural network. Sensors 2017;17(12):2888.
[50] Wu R-T, Jahanshahi MR. Deep convolutional neural network for structural dynamic response estimation and system identification. J Eng Mech 2018;145(1):04018125.
[51] Raissi M. Deep hidden physics models: deep learning of nonlinear partial differential equations. J Mach Learn Res 2018;19(1):932–55.
[52] Raissi M, Perdikaris P, Karniadakis GE. Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J Comput Phys 2019;378:686–707.
[53] Sun L, Gao H, Pan S, Wang JX. Surrogate modeling for fluid flows based on physics-constrained deep learning without simulation data. arXiv preprint arXiv:1906.02382.
[54] Zhu Y, Zabaras N, Koutsourelakis P-S, Perdikaris P. Physics-constrained deep learning for high-dimensional surrogate modeling and uncertainty quantification without labeled data. J Comput Phys 2019;394:56–81.
[55] Ciresan DC, Meier U, Masci J, Gambardella LM, Schmidhuber J. Flexible, high performance convolutional neural networks for image classification. In: Twenty-second international joint conference on artificial intelligence; 2011.
[56] Lawrence S, Giles CL, Tsoi AC, Back AD. Face recognition: a convolutional neural-network approach. IEEE Trans Neural Networks 1997;8(1):98–113.
[57] LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521(7553):436.
[58] Nair V, Hinton GE. Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th international conference on machine learning (ICML-10); 2010. p. 807–14.
[59] Xu B, Wang N, Chen T, Li M. Empirical evaluation of rectified activations in convolutional network. arXiv preprint arXiv:1505.00853.
[60] Trottier L, Gigu P, Chaib-draa B, et al. Parametric exponential linear unit for deep convolutional neural networks. In: 2017 16th IEEE international conference on machine learning and applications (ICMLA). IEEE; 2017. p. 207–14.
[61] Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 2014;15(1):1929–58.
[62] Chollet F, et al. Keras; 2015.
[63] Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, et al. TensorFlow: large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467.
[64] Chiou B, Darragh R, Gregor N, Silva W. NGA project strong-motion database. Earthquake Spectra 2008;24(1):23–44.
[65] Haddadi H, Shakal A, Stephens C, Savage W, Huang M, Leith W, et al. Center for Engineering Strong-Motion Data (CESMD). In: Proceedings of the 14th world conference on earthquake engineering; 2008.
[66] Ng A. Clustering with the k-means algorithm. Machine Learning.
[67] Kodinariya TM, Makwana PR. Review on determining number of cluster in k-means clustering. Int J 2013;1(6):90–5.
[68] Pollard KS, Van Der Laan MJ. A method to identify significant clusters in gene expression data.
[69] Kaufman L, Rousseeuw PJ. Finding groups in data: an introduction to cluster analysis, Vol. 344. John Wiley & Sons; 2009.
[70] Smyth P. Clustering using Monte Carlo cross-validation. In: KDD, Vol. 1; 1996. p. 26–133.
[71] Dymiotis-Wellington C, Vlachaki C. Serviceability limit state criteria for the seismic assessment of RC buildings. In: Proceedings of the 13th world conference on earthquake engineering. Vancouver, Canada; 2004. p. 1–10.
[72] Shinozuka M, Feng MQ, Kim H-K, Kim S-H. Nonlinear static procedure for fragility curve development. J Eng Mech 2000;126(12):1287–95.
[73] Shinozuka M, Feng MQ, Lee J, Naganuma T. Statistical analysis of fragility curves. J Eng Mech 2000;126(12):1224–31.
[74] ASCE/SEI Seismic Rehabilitation Standards Committee. Seismic rehabilitation of existing buildings (ASCE/SEI 41-06). Reston, VA: American Society of Civil Engineers.
[75] FEMA. HAZUS: multi-hazard loss estimation methodology, earthquake model. Washington, DC, USA: Federal Emergency Management Agency.
[76] Griffis LG. Serviceability limit states under wind load. Eng J 1993;30(1):1–16.