
Computer-Aided Civil and Infrastructure Engineering 32 (2017) 1025–1046

Structural Damage Detection with Automatic Feature-Extraction through Deep Learning

Yi-zhou Lin & Zhen-hua Nie
School of Mechanics and Construction Engineering, Jinan University & Key Lab of Disaster Forecast and Control in Engineering, Ministry of Education, Guangzhou, China

&

Hong-wei Ma*
Dongguan University of Technology, Dongguan, China & Department of Civil Engineering, Qinghai University, Xining, China

Abstract: Structural damage detection is still a challenging problem owing to the difficulty of extracting damage-sensitive and noise-robust features from the structure response. This article presents a novel damage detection approach to automatically extract features from low-level sensor data through deep learning. A deep convolutional neural network is designed to learn features and identify damage locations, leading to an excellent localization accuracy on both noise-free and noisy data sets, in contrast to another detector using wavelet packet component energy as the input feature. Visualization of the features learned by hidden layers in the network is implemented to get a physical insight into how the network works. It is found that the learned features evolve with depth from rough filters to the concept of vibration mode, implying that the good performance results from the network's ability to learn essential characteristics behind the data.

* To whom correspondence should be addressed. E-mail: tmahw@jnu.edu.cn.

© 2017 Computer-Aided Civil and Infrastructure Engineering.
DOI: 10.1111/mice.12313

1 INTRODUCTION

In recent years, the aging of large-scale structures has become an unavoidable issue, especially for those large-scale bridges performing an irreplaceable function in present-day society. Hence, ensuring structural integrity and detecting structural damage have attracted increasing attention from researchers in the mechanics and civil engineering fields (An et al., 2015; Mu and Yuen, 2016). In practice, a real bridge may be equipped with multiple sensors to monitor its vibration response. However, limited information can be obtained from time-domain data directly. Moreover, the form of excitation is almost impossible to obtain, giving rise to difficulty in accurately estimating the frequency response function (FRF) and the vibration modes. As a result, the first and most important procedure is feature extraction (Amezquita-Sanchez and Adeli, 2015); that is, the problems may become more accessible when the low-level sensor signals are transformed into significant and damage-sensitive features.

One way of feature extraction is computing dynamic signatures of the structure, or designing their derivatives, to obtain a higher sensitivity to damage. Subsequently, damage detection can be implemented by monitoring the changes in these signatures. As a primary feature, natural frequency was investigated first due to its convenience of measurement. Cawley and Adams (1979) started related studies by directly dealing with damage detection based on the changes in the natural frequencies of the structure. However, Salawu (1997), presenting an overview of the publications on this topic before 1997, suggested that natural frequency changes may be confused by multiple damages. Although the application of natural frequency changes has been studied extensively, the damage detection methods based on

frequency changes still have several limitations: the changes are usually so small that they are buried by environmental and operational conditions, and natural frequencies seem "too general" for local damage. Unlike natural frequencies, mode shapes contain more local information, leading to a better sensitivity to local damage. Besides, they are less sensitive to environmental effects, such as temperature, than natural frequencies are (Farrar and Iii, 1997). As a result, methods that try to link damage to mode shapes or their derivatives have been developed. Pandey et al. (1991) first suggested that mode shape curvatures, the second derivatives of mode shapes, perform well in damage localization. Dutta and Talukdar (2004) presented that mode shape curvatures are more sensitive to damage location than natural frequencies are, resulting in a better localization accuracy. The susceptibility of the modal curvatures to measurement noise was also discussed (Cao et al., 2014). Another derivative of the mode shape is the modal strain energy, which is directly related to mode shape curvatures for beam-like structures. Shi et al. (2000) proposed a damage localization method based on modal strain energy change for beam, truss, or frame type structures. These dynamic signatures are also the foundation of finite element (FE) model updating, playing a role as a metric to estimate the differences between the FE model and the real structure (Shabbir and Omenzetter, 2015). Nevertheless, in practice, the unreliability of damage identification using noisy measurements from accelerometers is still a problem (Adewuyi et al., 2009). In addition, from response signals alone, it is difficult to precisely measure some of these signatures, such as the FRF and vibration modes, giving rise to the requirement of an accurate FE model, which is also hard to establish.

In contrast to the approaches cited above, another methodology, called "data-driven," extracts damage-sensitive features by concentrating only on the perturbation in sensor data instead of requiring a model of the structure. Some papers in this category used modern signal processing methods (Amezquita-Sanchez and Adeli, 2016; Huang et al., 2014) and tried to catch a sudden change in signals caused by the occurrence of damage. Hou et al. (2000) analyzed the characteristics of response signals under the wavelet transformation and found that the wavelet coefficients can clearly show the moment when damage occurs. Law et al. (2005) proposed a concept of wavelet packet transform component energy, and its sensitivity to local changes in the system parameters was discussed. Another modern signal processing method, the Hilbert-Huang transformation, was also introduced in structural health monitoring (Yang et al., 2004). In addition, the fusion of these signal processing methods was discussed by Li et al. (2017). Blachowski et al. (2017) proposed a damage localization method in truss structures using statistical signal processing. Another methodology in this category is treating the damage detection task as a pattern recognition problem based on time series. Sohn and Farrar (2001) proposed a damage detection approach by modeling the dynamic signal using an autoregressive moving-average (ARMA) model and monitoring the changes in the model coefficients. Cavadas et al. (2013) presented a data-driven moving-load method using moving principal component analysis and robust regression analysis, which detects damage by statistical features from time-domain data. However, the features used in the above-mentioned studies may not be optimized for damage, as almost all of these methods are based on signal processing or statistics. They may also catch anomalies caused by other factors. Besides, even an experienced researcher may spend a long time testing which statistical index or signal transformation, or which derivative of them, is sensitive to structural damage.

On the other hand, artificial neural networks (ANN) have been utilized in the structural damage detection field for many years (Adeli and Jiang, 2009). The features mentioned above are usually used as their inputs. Lee et al. (2005) presented a neural network–based damage detection method using the modal properties. The wavelet neural network (WNN), an adaptation of the ANN using the concept of the wavelet, was employed to detect damage in high-rise buildings (Adeli and Jiang, 2005; Jiang and Adeli, 2007). Mehrjoo et al. (2008) used the neural network to overcome the problems caused by incompleteness of the measured data. Although partial successes were obtained, the performance of neural networks is greatly influenced by the input features; that is, drawbacks in the input features, such as susceptibility to noise, may prevent the neural network from achieving better performance.

To address these limitations, this article uses the convolutional neural network (CNN), a type of neural network in the deep learning field, as a feature extractor and a classifier to detect structural damage. Using this method, it is possible to detect damage from the waveform signal directly without any hand-crafted features.

Deep learning refers to methods that learn representations of data with multiple levels of abstraction via the composition of multiple processing layers (LeCun et al., 2015). A CNN can make use of a large amount of data, which means the ability to improve itself continuously with more data. As a result, such methods have become pivotal to the concept of "big data." The state of the art in various fields has been renewed by deep learning algorithms in recent years, for example, in speech recognition, visual object recognition, object

detection, and many other domains. In particular, the CNN is most widely utilized in applications of speech recognition and computer vision. It is designed to process data that come in the form of multiple arrays, such as one-dimensional (1-D) for time sequences and 2-D for signal data with multiple channels and for images. LeCun et al. (1989) proposed the concept of the CNN and successfully applied it to a handwritten zip code recognition task. More recently, Krizhevsky et al. (2012) reported a large deep CNN for image classification in the ImageNet LSVRC-2010 contest, and it outperformed the previous state of the art by reducing the error rate by 10%. Simonyan and Zisserman (2014) achieved new state-of-the-art results by utilizing a deeper CNN. To deal with audio data, Lee et al. (2009) applied the CNN to the audio classification task. Abdel-Hamid et al. (2012) used the CNN concept in a hybrid NN–HMM model, a traditional robust framework for speech recognition, and it also renewed the state of the art for this problem. In the structural damage detection field, vision-based crack detection is an important topic (Chen et al., 2017), and the CNN also achieves a good performance there (Cha et al., 2017). The literature review above shows that CNNs have a great potential for problems that take low-level sensor data as input and require better features to achieve the final goal, just as structural damage detection does. A CNN can learn hierarchical features automatically from examples in a data-specific way, an excellent property for tackling our issue.

This study presents a new approach for structural damage detection using a CNN with a customized architecture. As a pilot study, a numerical model of a simply supported Euler–Bernoulli beam is considered. Some new techniques of deep learning proposed by recent studies are used to improve its performance, resulting in an excellent accuracy, better than that obtained using wavelet packet transform component energy (Law et al., 2005) as the input feature.

However, neural networks are known as black boxes; that is, it is hard to understand how they work. To address this issue, this article conducts a visualization of the hidden layers in the network by maximizing the activation of the target layer over the input. It is found that these features become more abstract layer by layer. What is more, they are understandable to humans and have a mechanical meaning to some extent. For instance, the bottom layers play a role as band-pass filters centered on the natural frequencies of the structure. Multiband filters can be found in the middle layers. In the deeper layers, it can be discovered that the network has even learned the concept of the structural mode independently.

Fig. 1. The overview of the methodology.

It implies that the neural network can learn features of the structure automatically from response data alone. Its findings, the structural modes, are helpful for the damage detection task, which is also consistent with mechanical knowledge. Due to this promising property, this method has the potential to be used in more complex problems, in which hand-crafted features are hard to apply.

It is noteworthy that these findings are based on data from a numerical model with different damage levels and locations. However, this condition is hard to satisfy in practice. Further work is still needed to achieve real-world application of this method.

2 METHODOLOGY

This article presents an approach to detect structural damage by directly inputting low-level waveform signals. It can automatically extract features from the signals. In practice, low-level waveform signals are processed by signal processing, structural model estimation, or other hand-crafted feature-extraction methods. Then the representations of the structure condition can be obtained. However, there are some drawbacks to this methodology, as shown in Section 1. In this article, to address these issues, a CNN is designed to process low-level sensor data, to learn features, and to achieve damage detection simultaneously. Although damage localization is mainly discussed in this work, the ability to detect multiple damages and the corresponding damage degree is also investigated.

The overview of the proposed method, also shown in Figure 1, is organized as follows: (1) numerical simulation is conducted to obtain structural response data; (2) the data are processed by preprocessing procedures including data augmentation, an operation to improve the performance of the neural network by creating more data; (3) a deep CNN, as presented in Figure 2, is trained on the augmented data set; (4) for comparison, another detector using wavelet packet component energy as the input feature is also trained; (5) their classification performance for single damage localization is

Fig. 2. Convolutional neural network architecture.

Table 1
The model natural frequency and damping ratio

Modal order    1st     2nd     3rd     4th     5th
Freq./Hz       5.79    23.16   52.13   92.77   145.29
Damp./%        1.39    0.39    0.27    0.29    0.37

Fig. 3. The numerical model of the simply supported beam.
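The values in Table 1 can be sanity-checked against the closed-form bending frequencies of an undamped simply supported Euler–Bernoulli beam, f_n = (nπ/L)² √(EI/(ρA)) / (2π). The sketch below (plain Python, using the geometry and material data of Section 2.1) reproduces the first mode to within about 0.001 Hz; the higher modes deviate slightly, since the table values come from the damped 10-element FE model.

```python
import math

# Beam properties as given in Section 2.1.
E = 206e9          # Young's modulus, Pa
rho = 7900.0       # density, kg/m^3
L = 10.0           # span, m
b, h = 0.1, 0.25   # cross-section width and height, m
A = b * h                  # cross-sectional area, m^2
I = b * h ** 3 / 12.0      # second moment of area, m^4

def natural_frequency(n):
    """n-th closed-form bending frequency (Hz) of the undamaged,
    simply supported Euler-Bernoulli beam."""
    return (n * math.pi / L) ** 2 * math.sqrt(E * I / (rho * A)) / (2.0 * math.pi)

print([round(natural_frequency(n), 2) for n in range(1, 6)])
```

All five values agree with Table 1 to within 1%, which confirms that the FE model is set up consistently with the stated section and material properties.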

estimated by a test procedure; (6) the application of the CNN to multiple damage detection is also discussed; and (7) to find a mechanical interpretation of how the neural network works, visualization of the internal representations of the network is carried out.

2.1 Numerical simulation

A simply supported beam with a length of 10 m, subjected to burst random excitation, is considered in this study. It is simple, so its dynamic characteristics are clear, leading to convenience in validating the methodology against our knowledge, such as finding comprehensible features from the hidden layers of the CNN. The beam has a rectangular cross-section with 0.1 m width and 0.25 m height, and it is divided into 10 equal Euler–Bernoulli beam elements, as shown in Figure 3.

The material properties are as follows: the Young's modulus is 206 GPa and the density is 7,900 kg/m³. Rayleigh damping is used to simulate the damping effect, as shown in Equation 1,

C = αK + βM    (1)

where the α and β used in this article are 7 × 10⁻⁶ s and 1.0 s⁻¹, respectively, resulting in weak damping. Detailed dynamical properties of the numerical model, including the natural frequencies and damping ratios of the first five orders, are shown in Table 1.

Damage is simulated by decreasing the height of the target beam element. There are six damage conditions, including five damage levels and an intact condition. The five levels of damage are prepared by reducing the bending rigidity of the target element by 10%, 20%, 30%, 40%, and 50%, respectively, corresponding to reducing the

Table 2
The relative variation ratio of natural frequencies when element no. 4 is damaged

Damage level    1st       2nd       3rd       4th       5th
10%             −0.38%    −0.71%    −0.36%    −0.09%    −0.36%
50%             −3.58%    −5.76%    −2.63%    −0.92%    −2.59%

height of the element to 96.5%, 92.8%, 88.8%, 84.3%, and 79.4%. Table 2 exhibits that the natural frequencies change only slightly when element no. 4 is damaged at levels 10% and 50% (the bending rigidity of element no. 4 is 90% and 50% of the intact condition, respectively).

Random excitation, meaning that the amplitude of the excitation at each time step follows a Gaussian distribution, is applied on one of the nine nodes from no. 1 to no. 9 to simulate environmental load. It has equal intensity at different frequencies, leading to the excitation of multiple structural modes. However, the environmental load in the real world does not have a spectrum with infinite bandwidth, and the first several modes of a structure normally predominate in the structural response. To simulate this phenomenon, a low-pass filter is applied to the random excitation. Concretely, an eighth-order Butterworth filter with cutoff frequency 512 Hz is applied to the 3-second-long random excitation with a standard deviation of 200 N.

Newmark's method (Newmark, 1959) is used to compute the response. Appropriate parameters are selected to obtain second-order accuracy with no numerical damping. To get a reasonable accuracy, 2⁻¹³ s is chosen as the time step. However, such a small time step leads to a large volume of data, inducing computational issues for subsequent procedures. A down-sampling process, therefore, is required to reduce the final sampling frequency. The original computation results are down-sampled to 1,024 Hz, and only the vertical accelerations at nodes no. 1 to no. 9 are measured, resulting in a data shape of 9 × 3,072 for each measurement.

On account of the nine positions for loading (node no. 1 to no. 9), 10 positions for damage (element no. 0 to no. 9), and six damage conditions (one intact condition and five damage levels), the number of combinations is 459 (9 load positions × (10 damage locations × 5 damage levels + 1 intact condition)), because the damage location makes no sense for the intact condition. To generate a large enough data set, 15 calculations are carried out for each configuration; finally, the size of the whole data set is 6,885 (459 × 15).

2.2 Data augmentation

Getting more data is always the best way to make a neural network model generalize better, although the amount of data is usually limited in practice. One way to tackle this issue is creating artificial data based on the true data set and adding it to the training set, which is what the term "data augmentation" means.

Data augmentation is an efficient approach for different classification tasks. State-of-the-art image recognition systems have also used data augmentation techniques, such as randomly cropping a piece from the original image, randomly rotating by a small angle, horizontal flipping, adjusting contrast, and so forth (Szegedy et al., 2015; Wu et al., 2015). For speech recognition tasks, injecting noise into the input is also a good way of data augmentation to improve robustness (Sietsma and Dow, 1991).

During the training of a neural network, a data batch, a subset of the whole data set, is usually utilized to

Fig. 4. Data augmentation procedure.
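The procedure in Figure 4 can be sketched in a few lines of plain Python. This is a minimal single-channel illustration, not the authors' implementation: the function names are our own, normalization is applied per record here rather than over the whole data set (cf. Equation 2), and the noise injection of Section 2.3 (Equations 3 and 4, σ = 0.5) is included as a separate helper.

```python
import random
import statistics

def normalize(record):
    """Zero-mean, unit-variance scaling (per record here; the paper
    normalizes the whole data set, cf. Equation 2)."""
    mu = statistics.fmean(record)
    sd = statistics.pstdev(record)
    return [(x - mu) / sd for x in record]

def random_crop(record, length=1024):
    """Cut a random contiguous 1 s piece (1,024 samples at 1,024 Hz)
    out of a 3 s record (3,072 samples)."""
    start = random.randrange(len(record) - length + 1)
    return record[start:start + length]

def add_noise(record, sigma=0.5):
    """Corrupt a normalized record with white Gaussian noise
    (Equations 3 and 4; sigma = 0.5 is the 50% noise level)."""
    return [x + random.gauss(0.0, sigma) for x in record]

def augmented_batch(dataset, n):
    """Draw n records at random and return one cropped piece of each."""
    return [random_crop(normalize(random.choice(dataset))) for _ in range(n)]
```

In the real data set each record has nine channels (9 × 3,072); the same crop indices would then be applied to all channels of a record so that the sensors stay aligned in time.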



Fig. 5. An example of 1-D convolution along the time axis: two input channels [a b c d] and [e f g h] are convolved with the two-channel kernel [w x], [y z], giving the output [aw+bx+ey+fz, bw+cx+fy+gz, cw+dx+gy+hz].

calculate the gradient of the objective function with respect to the weights in the neural network in the current iteration step. The idea of data augmentation here is to introduce valuable variation into every data batch in the training process by randomly sampling pieces from the original signals. Specifically, if the size of a data batch is n, this procedure randomly chooses n measurement records from the whole data set. Subsequently, a piece is randomly sampled from each selected record, as illustrated in Figure 4. Here, the length of the pieces is 1,024, a power of two, leading to a better computational efficiency. In other words, a 1-second-long part of the signal is cut out from the whole 3-second time sequence. It should be noticed that the data augmentation operation is carried out after a data normalization process, as demonstrated in Equation 2,

Dn = (Do − D̄o) / σ(Do)    (2)

where Do is the original data set, D̄o and σ(Do) are its mean and standard deviation, respectively, and Dn is the normalized data set.

2.3 Noise simulation

In practical measurement or monitoring, noise is an inevitable factor, and the robustness to measurement noise should be discussed for every structural damage detection method.

Sensor noise, normally caused by various environmental factors, is simulated to estimate the robustness of the proposed method. It decays the signal by reducing the signal-to-noise ratio, leading to difficulties in identifying valuable information. As a result, a neural network has to learn essential features against the interference of noise.

Concretely, the signals from all the sensors are corrupted by adding white Gaussian noise. The noise level is determined by giving a standard deviation to the white Gaussian noise, which is injected into the normalized data set, as shown in Equations 3 and 4,

D̂n = Dn + ξ    (3)

ξ ∼ N(0, σ²)    (4)

where D̂n is the noisy data set, and ξ is a random variable following a Gaussian distribution with zero mean and σ² variance.

Here, σ is set to 0.5, meaning that Gaussian distributed noise with 50% standard deviation is injected into the data set. Challenging the method with such a high noise level is an efficient way to assess it.

2.4 Convolutional neural network

The only difference between a CNN (LeCun, 1989) and a simple neural network is that the former uses convolution instead of general matrix multiplication in at least one of its layers. As a result, CNNs are specialized to process data having a known grid-like topology, such as image data in 2-D and time-series data in one dimension. Notice that structural response data are also time series, so they can take advantage of the CNN.

In this article, a CNN is designed for the feature extraction and damage localization tasks at once. Typically, a convolutional network may consist of multiple convolutional layers. Furthermore, a convolutional layer usually contains three stages: convolution, nonlinear activation, and pooling. To achieve better performance, a straightforward and efficient method of regularization named "Batch Normalization" (BN) (Ioffe and Szegedy, 2015) is also employed. These normalization layers are inserted before the nonlinear activation stage in some convolutional layers, as shown in Figure 2. All these components will be introduced in the following sections.

As illustrated in Figure 2, the particular architecture of the network is as follows: (1) a convolutional layer with nonlinear activation is followed by another one with a normalization layer before its nonlinear activation, and then a pooling layer is added; (2) the same substructure is replicated twice; (3) the output of the previous architecture is flattened into a 1-D vector; (4) two fully connected (FC) layers lead to the output layer. There are a total of 1,072,267 parameters and eight nonlinear hidden layers in this deep CNN.

2.4.1 Convolution and pooling. Convolution and pooling are the most important and unique operations in a CNN.

(a) (b) (c)

Fig. 6. Sigmoid (a), ReLU (b), and leaky ReLU (c) (α = 0.1).

Generally, convolution is an operation on two real-


vk

valued functions, such as Equation 5.
F (i) = S (i + n) K (n) (7)
 ∞
n=1

f (i) = s(n)k(i − n) dn (5)


−∞ However, it is still called “convolution” in machine
In CNN terminology, the first function (for instance, the learning field. This article follows this convention. This
function s in Equation 5) is usually known as the in- adaptation makes sense because appropriate parame-
put and the second (e.g., the function k in Equation 5) ters in the kernel will be learned by the learning algo-
is known as the kernel or filter. Besides, the output here rithm whether the input is flipped or not.
is referred as feature map. Nevertheless, data processed Equation 7 shows the 1-D convolution with single
on a computer, both time and sensor data are discrete. channel in machine learning, however, can be extended
Assuming the functions s and k are defined on inte- to multichannels situation by simply applying convolu-
ger domain, discrete convolution then can be defined as tion on multiple channels in parallel, then adding the re-
Equation 6, where vs and vk are the max valid indexes in sults of those channels up. Figure 5 demonstrates what
function S and K, respectively, assuming the value is always zero when the index is out of the valid range:

    F(i) = Σ_{n=1}^{v_s} S(n) K(i − n) = Σ_{n=1}^{v_k} S(i − n) K(n)    (6)

Due to convolution being commutative, the two representations in Equation 6 are equivalent. However, the latter one is more appropriate for machine learning implementations because, in the CNN context, the kernel K, usually a multidimensional array of parameters, is much smaller than the input S, leading to a much smaller maximum valid index in the kernel. In other words, the fact that v_k is much smaller than v_s brings convenience in algorithm implementation.
Furthermore, in the latter term of Equation 6, the input S is flipped relative to the kernel K; that is, when n increases, the index into the kernel increases, but the index into the input S decreases. This flipping is what gives convolution its commutativity. Nevertheless, the commutativity is only useful for mathematical derivation and is not necessary for neural network implementation. As a result, a version of convolution without flipping, which should properly be called cross-correlation, is adopted by most machine learning libraries, as shown in Equation 7.
An illustrative example shows what happens in 1-D convolution with two channels along the time axis. In this example, the input is a sequence of two-element vectors (two channels), and the kernel has a kernel size of two. The output is computed by multiplying the kernel with the corresponding part of the input and summing the products across channels. This operation is then applied step by step along the convolution axis (the time axis). In this article, 1-D convolution with multiple channels is used in all the convolutional layers.
As mentioned above, a pooling operation is part of a typical convolutional layer. It provides a statistical summary of the nearby output elements.
One type of pooling operation is max pooling (Zhou and Chellappa, 1988). It is similar to convolution; however, instead of multiplication and summation, it picks out the maximum within its "kernel size," called the pool length here, step by step. The pool stride is the length of the gap between two neighboring steps; it is usually set equal to the pool length, whereas the corresponding stride in convolution is usually configured to be one. For instance, assuming the output before a max pooling layer is the vector [1, 2, 3, 4] and both the pool length and the pool stride are two, the output of the pooling will be [2, 4].
14678667, 2017, 12, Downloaded from https://onlinelibrary.wiley.com/doi/10.1111/mice.12313 by Karadeniz Technical Universitaet, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
1032 Lin, Nie & Ma
The pooling operation mainly brings two benefits. First, it helps make the representation more invariant to small variations of the input: because the result of pooling is a statistical summary, a small variation in the input changes that summary only slightly. This is a useful property for making a neural network more robust in detecting whether some feature is present. Second, the pooling operation reduces the size of the feature map, which is essential for improving the computational efficiency of the network.
This work employs max pooling, and the pool stride is set to four.

2.4.2 Leaky rectified linear unit (ReLU). Activation functions are an essential part of a neural network; they transform the network from a linear mapping into a nonlinear one, resulting in a higher representation capacity.
Historically, the sigmoid function, having the form σ(x) = 1/(1 + e^(−x)) as shown in Figure 6a, had frequently been used for its convenient properties, such as easy-to-calculate derivatives and an output range between zero and one. However, its defects have made it rarely used in recent years. For instance, the sigmoid function "kills" gradients: when its output is close to either zero or one, the gradient is approximately zero, meaning the parameters can hardly move anymore in back-propagation. This is a dangerous property, especially as neural networks become deeper and deeper, and it leads to difficulties in training.
The ReLU (Glorot et al., 2011; Nair and Hinton, 2010) was then proposed and has become very popular. It has the simple form f(x) = max(0, x), as shown in Figure 6b. This form mainly brings two merits: first, gradients are retained instead of vanishing over the whole right side of the function, which yields a better training speed than sigmoid functions (Krizhevsky et al., 2012); second, it is cheaper to compute. Nevertheless, a ReLU may "die" during training; that is, some weights may be updated by a large gradient, after which their activations become zero and never activate again, because the gradient on the left side of the ReLU is always zero.
The leaky rectified linear unit (leaky ReLU) (Maas et al., 2013) is a variant of the ReLU designed to solve the "dying ReLU" problem. To gain insight into the difference, Figure 6 is prepared. In Figure 6c, it can be seen that the left (negative) part also has a small slope. Its mathematical expression is shown in Equation 8,

    f(x) = x    if x > 0
    f(x) = αx   otherwise    (8)

where α is a parameter with a small value. When a unit is not active, the nonzero gradient brings the ability to escape from the "dying ReLU" state. Although Figure 6c shows a large α (0.1) for clarity, α = 0.01 is more widely used in machine learning applications. In this article, the leaky ReLU is applied as the nonlinearity, and the parameter α is 0.01 in every activation function.

2.4.3 Batch normalization. BN is also a type of layer, one specialized to solve the "internal covariate shift" problem in neural network training (Ioffe and Szegedy, 2015). In brief, the distribution of internal activations continuously varies with the changes of the network weights during training, which forces the learning algorithm to fit these unstable distributions at every training step, leading to a low convergence rate. The BN layer is designed to alleviate this problem at a cheap computational cost.
The details of the algorithm are shown in Equation 9 to Equation 12:

    μ_D = (1/m) Σ_{i=1}^{m} x_i                 (9)

    σ_D² = (1/m) Σ_{i=1}^{m} (x_i − μ_D)²       (10)

    x̂_i = (x_i − μ_D) / √(σ_D² + ε)             (11)

    y_i = γ x̂_i + β                             (12)

where ε is a small number used to guard against the numerical problem of division by zero. During training, for every mini-batch of data, D = {x_1, ..., x_m}, the algorithm calculates the mean and variance, then shifts and scales the original data x_i to zero mean and unit variance. Finally, it introduces two learnable parameters, γ and β, to retain the model's flexibility.
The BN layer keeps its output following a similar distribution throughout training. As a result, the difficulty of training the next layer is reduced, giving rise to fast convergence. In practice, it is found that BN not only brings fast convergence but also enhances generalization.
A total of three BN layers are inserted into the utilized network architecture to promote the training progress, as illustrated in Figure 2.

2.5 Structural damage detection

In this section, the introduced CNN is applied to the structural damage detection task. Although damage localization is mainly discussed, the ability of the CNN to detect multiple damages and the corresponding damage degrees is also investigated.
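As a concrete reference for Sections 2.4.2 and 2.4.3, the leaky ReLU of Equation 8 and the training-time BN transform of Equations 9 to 12 can be sketched in a few lines of NumPy (a toy illustration with made-up batch values, not the authors' implementation):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # Equation 8: identity for x > 0, slope alpha elsewhere, so the
    # gradient on the negative side is small but never exactly zero.
    return np.where(x > 0, x, alpha * x)

def batch_norm_train(x, gamma=1.0, beta=0.0, eps=1e-5):
    # Equations 9-12 for one mini-batch D = {x_1, ..., x_m}.
    mu = x.mean(axis=0)                    # Eq. 9: batch mean
    var = x.var(axis=0)                    # Eq. 10: batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # Eq. 11: standardize
    return gamma * x_hat + beta            # Eq. 12: learnable shift and scale

batch = np.array([[1.0, -2.0],
                  [3.0,  0.0],
                  [5.0,  2.0]])            # made-up mini-batch, m = 3
out = batch_norm_train(leaky_relu(batch))
print(out.mean(axis=0))                    # approximately zero per feature
```

With γ = 1 and β = 0, each feature of the output is standardized to (approximately) zero mean and unit variance, which is what keeps the next layer's input distribution stable during training.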
Structural damage detection with automatic feature-extraction through deep learning 1033
Table 3
The detailed configuration of the convolutional neural network architecture
Layer Type Input shape Output shape Kernel num. Kernel size Stride Padding With BN Activation
1 Convo. (1024, 9) (1024, 32) 32 16 1 Same False Leaky ReLU
2 Convo. (1024, 32) (1024, 32) 32 16 1 Same True Leaky ReLU
3 Pooling (1024, 32) (256, 32) None 4 4 Valid False None
4 Convo. (256, 32) (256, 64) 64 16 1 Same False Leaky ReLU
5 Convo. (256, 64) (256, 64) 64 16 1 Same True Leaky ReLU
6 Pooling (256, 64) (64, 64) None 4 4 Valid False None
7 Convo. (64, 64) (64, 128) 128 16 1 Same False Leaky ReLU
8 Convo. (64, 128) (64, 128) 128 16 1 Same True Leaky ReLU
9 Pooling (64, 128) (16, 128) None 4 4 Valid False None
10 Flatten (16, 128) (2048) None None None None None None
11 FC (2048) (256) None None None None False Leaky ReLU
12 FC (256) (128) None None None None False Leaky ReLU
13 Softmax (128) (11) None None None None None None
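For readers who prefer code, Table 3 can be transcribed into Keras roughly as follows. This is a sketch based solely on the table; details such as the ordering of BN relative to the activation are our assumptions, not the authors' published code:

```python
from tensorflow import keras
from tensorflow.keras import layers

def conv_unit(x, filters, with_bn):
    # One "Convo." row of Table 3: kernel size 16, stride 1, same padding.
    x = layers.Conv1D(filters, kernel_size=16, strides=1, padding="same")(x)
    if with_bn:
        x = layers.BatchNormalization()(x)  # BN before activation is an assumption
    return layers.LeakyReLU(0.01)(x)        # leaky ReLU with alpha = 0.01

inputs = keras.Input(shape=(1024, 9))       # 1,024 time steps x 9 channels
x = inputs
for filters in (32, 64, 128):               # the three replicated substructures
    x = conv_unit(x, filters, with_bn=False)            # layers 1, 4, 7
    x = conv_unit(x, filters, with_bn=True)             # layers 2, 5, 8
    x = layers.MaxPooling1D(pool_size=4, strides=4)(x)  # layers 3, 6, 9
x = layers.Flatten()(x)                     # layer 10: (16, 128) -> 2048
x = layers.LeakyReLU(0.01)(layers.Dense(256)(x))        # layer 11
x = layers.LeakyReLU(0.01)(layers.Dense(128)(x))        # layer 12
outputs = layers.Dense(11, activation="softmax")(x)     # layer 13: 11 categories
model = keras.Model(inputs, outputs)
```

The loop reproduces the table's pattern of two stacked convolutions followed by one pooling layer, with the filter count doubling at each substructure to compensate for the fourfold reduction of the time axis.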
Fig. 7. Overview of the damage detection procedure.
2.5.1 Single damage localization. Figure 2 shows an overview of the CNN. It contains two parts: convolutional layers and FC layers. The convolutional layers are designed to extract features from local regions of the original data and to repeat the feature-extraction process on the previous feature map. To achieve this goal, two convolutional layers are stacked to obtain a good nonlinear transformation, and a pooling layer follows to produce a statistical summary of the feature map. All the convolutional layers operate along the time axis, and the kernel number increases with depth to retain enough capacity for the information in the feature map, because the pooling operation keeps reducing the size of the feature map along the time axis. The same substructure is replicated twice to obtain an implicit hierarchical feature structure in the neural network. On the other hand, the top FC layers work as a traditional classifier; that is, they take the feature map from the previous convolutional layers as input and map it to the probabilities of every final category. More details are listed in Table 3.
As illustrated in Figure 7, there are two phases in this method: a training phase and a testing phase. In the training phase, the neural network is updated by reducing the difference between the predicted damage location and the real damage location. An objective function is utilized to estimate the difference between the prediction and the true answer, and its derivatives with respect to the model parameters are used to update the neural network. In the testing phase, on the other hand, the trained neural network receives new response data never seen during training, and a prediction of the damage location is made. Furthermore, the model performance can be estimated by checking whether the prediction is correct.
Concretely, the accelerations in the vertical direction are taken as input data, and more details are presented as follows. As mentioned in Section 2.1, the data set contains a total of 6,885 measurements, and each
Table 4
Configurations of training process

Name                    Value      Description
Batch size 128 The size of data batch used in every training iteration
Steps per epoch 128 The number of steps in a single training epoch
Initial learning rate 5 × 10−6 The initial learning rate of Adam algorithm
β1 0.9 A parameter of Adam, weight of the momentum term
β2 0.999 A parameter of Adam, decay rate of the second-moment (squared-gradient) estimate
Patience 300 A parameter of early stopping, the number of epochs without improvement tolerated before training stops
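One plausible way to wire Table 4's settings into a Keras training call is sketched below. The function and dataset names are illustrative stand-ins (the paper does not publish its training script), and the monitored quantity for early stopping is an assumption:

```python
from tensorflow import keras

def train(model, train_batches, val_batches):
    # Hypothetical helper: `model`, `train_batches` (built with batch size
    # 128), and `val_batches` are assumed to be prepared elsewhere.
    model.compile(
        optimizer=keras.optimizers.Adam(
            learning_rate=5e-6, beta_1=0.9, beta_2=0.999),  # Table 4
        loss="categorical_crossentropy",                    # Equation 13
        metrics=["accuracy"])
    early_stop = keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=300,   # Table 4: patience = 300
        restore_best_weights=True)
    return model.fit(train_batches, validation_data=val_batches,
                     steps_per_epoch=128,   # Table 4: steps per epoch
                     epochs=100_000,        # effectively unbounded; early stopping ends training
                     callbacks=[early_stop])
```

The large `epochs` value simply lets the early-stopping callback, rather than a fixed epoch budget, decide when to stop.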
measurement has a shape of 9 × 3,072. After shuffling the whole data set, 70% (4,820 samples) of it is used as the training set, the testing set takes 20% (1,377 samples), and the validation set occupies 10% (688 samples). All these data sets are processed by the normalization and data augmentation procedures discussed in Section 2.2. As a result, the network receives a batch of pieces of the original response sequences in every training iteration, each piece with a shape of 9 × 1,024.
As a supervised learning algorithm, the CNN needs a known label for every datum. In this section, the labels are the damage locations, which are encoded into one-hot form. That is, the label vector has all zero elements except at position i, where i is the number of the damaged element, as shown in Figure 3. In this way, each element in the label vector represents the probability that its position is the true damage location: the element with value one in the true label vector means its corresponding beam element is damaged with 100% possibility, and the zero elements mean their positions have 0% possibility of damage. This representation is required because the CNN uses a Softmax layer as the output layer, which squashes the input into a vector whose elements each lie between zero and one. These elements are then treated as the probabilities of damage predicted by the CNN. In other words, taking a sample as input, the network outputs a conditional probability distribution of the damage location given the input sample.
Categorical cross entropy is used as the objective function to estimate the difference between the true damage location and the predicted damage location. It measures the difference between two probability distributions p and q over the same underlying set of events. When the distributions p and q are discrete, it is defined as Equation 13,

    H(p, q) = − Σ_x p(x) log(q(x))    (13)

where H(p, q) is the cross entropy and x is a valid discrete value of the distributions. In this article, x is the index of the elements in the label vector, because a valid index represents a valid discrete damage location. p(x) is the xth element of the true label vector, meaning the true possibility of the existence of damage in the xth element; similarly, q(x) is the xth element of the prediction vector, indicating the predicted possibility of the existence of damage in the xth element.
To minimize the output of the objective function, an adaptation of mini-batch stochastic gradient descent
Fig. 8. Comparing features by the same classifier.
algorithm, named Adam (Kingma and Ba, 2014), is employed as the optimization algorithm. It uses both an adaptive learning rate and a momentum term during training, leading to better performance. Early stopping is utilized to control the training process and avoid overfitting: it records the recent best performance of the neural network on the validation set during training, and if the performance over several further training epochs does not exceed the recent best, it stops the training process. Detailed configurations of training are summarized in Table 4.

2.5.2 Multiple damages identification. In this section, the application of the CNN to multiple damages identification is discussed. Both the damage level and the damage location are considered by designing a new representation of the damage label and using our CNN to regress it. Specifically, it is assumed that at most three damages exist and that the possible damage levels are the five levels mentioned in Section 2.1.
Almost the same neural network architecture described in Section 2.5.1 and all the configurations of training are still used in this section, except for three aspects: the strategy of response data generation, the output layer of the CNN, and the objective function.
To obtain response data with multiple damages, multiple stiffness degradations of the beam elements are simulated on the simply supported beam model mentioned in Section 2.1. The computation strategy is as follows: (1) randomly choose the number of damages in the range from zero to three; (2) randomly select the beam elements to be decayed according to the number of damages; (3) randomly choose the damage level for each selected element from the five damage levels described in Section 2.1; (4) randomly choose a load position, then compute the structural response under this damage configuration; and (5) repeat steps (1) to (4) 10,000 times to get enough data. As a result, the data set consists of 10,000 examples, including ones with the intact condition, a single damage, and multiple damages. Following the setting in Section 2.5.1, 70%, 20%, and 10% of the data set are used as the training set, testing set, and validation set, respectively.
The last layer of the CNN is also modified to fit the requirement of detecting multiple damages. Instead of Softmax activation, the last layer now uses linear activation to fit a new definition of the label, and the output size is changed to 10. Now the label is a vector of 10 elements, each representing the damage level of the corresponding beam element. For example, if a label vector has all zero elements except that the second element is 0.1 and the fourth is 0.3, it means the second and fourth elements of the beam are damaged with damage levels of 10% and 30%, respectively, and the others are intact.
Similarly, the objective function also needs a modification to follow the change in the label definition. Mean squared error is used in this task, as shown in Equation 14,

    L(y_p, y_t) = (1/k) Σ_{n=1}^{k} (y_p(n) − y_t(n))²    (14)

where y_p and y_t are the predicted label and the true label, respectively, and k is the length of the label vector, which is 10 in this case.
Having been trained, the CNN can give a prediction label in which each element indicates a predicted damage level of its corresponding beam element, leading to multiple damages identification. Similarly, the method is tested on both the noise-free and the noisy data sets.

2.6 Visualization for physical interpretation

Deep neural networks have shown excellent performance in a variety of fields. However, they are "black boxes," and humans may find it hard to understand how they work. It is vital for researchers to get qualitative interpretations of these systems so that an estimation of their reliability can be carried out. In the structural damage detection task, meaningful indexes have been discussed in previous works, such as vibration modes and their derivatives, as mentioned in Section 1. These indexes map the structural response into a new representation that shows the health condition of the structure more efficiently. If similar representations are utilized inside the neural network, it can be more easily understood from a physical point of view.
How can a hidden layer in a neural network be visualized to reveal its intermediate representations? A straightforward idea is to generate a synthetic signal that maximally activates the target hidden neuron (Erhan et al., 2009). The activation value of a neuron can be regarded as a score evaluating how similar the input signal is to the feature learned by the neuron. In other words, the signal that makes the target neuron most active is the one most similar to the learned feature; as a result, it is chosen as an intuitive expression of that neuron.
Specifically, Simonyan et al. (2013) used an additional ℓ2 norm penalty term to forbid the amplitude of the input signal from increasing infinitely, as shown in Equation 15,

    x̂ = arg max_I  s_ij(I, θ) − λ‖I‖₂²    (15)
Table 5
Scenario setting

                              Feature extractor
Scenario no.          Convolutional net.    Wavelet packet
Noise   Noise free            1                   2
        With noise            3                   4
where s_ij is the score, that is, the activation value of the ith neuron in layer j of the neural network, and I and θ are the input signal and the model parameters, respectively.
In this work, the CNN is utilized; as a result, activations have the form of feature maps (refer to Figure 2 for insight) instead of a single value. A criterion in the form of a single value is thus needed to estimate the activity level of the whole feature map, and the mean value is considered here. A modification is therefore introduced that uses the mean of the ith feature map in layer j instead of the activation value s_ij, as shown in Equation 16,

    x̂ = arg max_I  mean(S_ij(I, θ)) − λ‖I‖₂²    (16)

    x* = (x̂ − mean(x̂)) / σ(x̂)    (17)

where S_ij(I, θ) is the ith feature map in layer j, and the weight parameter λ is set to 10 in this article. In addition, a normalization process is carried out to ensure the same scale in the visualization results, as shown in Equation 17. Subsequently, intuition into how the neural network works can be gained by observing these results through the depth of the network.

3 RESULTS

As mentioned in Section 1, there are defects in the hand-crafted features used for structural damage detection. Some of them are susceptible to noise or hard to measure directly from response data, and some may be too general instead of concentrating on local damage. Besides, designing these features costs a considerable amount of time and requires expert knowledge.
The primary purpose of this work is to extract features of the structural condition directly from response data by the learning algorithm. These features are specialized for damage detection by supervised training to locate the damage. However, how to evaluate the extracted features is an important topic. To tackle this issue, the classification accuracy is considered, and the visualization of hidden layers is implemented to get qualitative interpretations of these features as well.
In this section, the performances of the features given by the proposed method and by a powerful hand-crafted feature extractor, the wavelet packet transform component energy (Law et al., 2005), are compared. Although directly comparing the performance of features is hard, the following approach is considered. Ordinarily, shallow neural networks have been used as classifiers with hand-crafted features in the structural damage detection field for years. The proposed architecture can also be recognized as a combination of two parts: the convolutional layers at the bottom can be regarded as a hierarchical feature extractor, and the top FC neural network plays the role of a classifier. In this way, the comparison can be implemented by replacing the bottom feature extractor with the hand-crafted features, which are the wavelet packet transform component energies in this article, as shown in Figure 8. More details will be discussed in the next subsection.

3.1 Scenarios and configuration

To evaluate the learned features, two schemes are considered. The first is comparing performance with another feature extractor. A comparison between the proposed method and the wavelet packet transform component energy is then implemented. As shown in Figure 8, the two feature extractors are judged by the same neural network classifier; that is, the better features will lead to a better performance. Second, the robustness to noise is investigated. Both original and noisy data are arranged in the comparative tests. The definition of the noise level here has already been discussed in Section 2.3. All the scenario plans are summarized in Table 5.

Fig. 9. Accuracy of scenarios defined in Table 5.

Here are some details about the parameters of the wavelet packet analysis used in this study. The Db2
Table 6
The accuracy of scenario 1

Predicted damage location
Count 0 1 2 3 4 5 6 7 8 9 10 Total %
Actual damage 0 745 13 26 5 0 0 0 0 0 0 1 790 94.30
location 1 9 677 13 6 3 1 0 0 0 0 2 711 95.22
2 5 9 769 4 3 0 0 0 0 0 0 790 97.43
3 4 3 14 728 8 0 1 0 0 1 1 760 95.79
4 1 7 10 10 629 2 4 0 1 1 3 668 94.16
5 1 0 2 5 4 646 3 8 2 0 3 674 95.85
6 0 0 0 5 8 6 638 13 19 1 1 691 92.33
7 0 0 2 4 3 5 13 694 5 8 3 737 94.17
8 0 0 0 0 2 4 10 16 716 14 3 765 93.59
9 0 0 0 0 0 1 7 20 7 762 0 797 95.61
10 4 4 23 6 8 2 8 2 4 5 743 809 91.84
Total 769 713 859 773 668 667 684 753 754 792 760 8192 Overall: 94.57
Table 7
The accuracy of scenario 2

Predicted damage location
Count 0 1 2 3 4 5 6 7 8 9 10 Total %
Actual damage 0 792 26 3 4 2 0 0 0 0 0 2 829 95.54
location 1 4 615 2 2 6 2 0 0 0 0 9 640 96.09
2 3 7 772 7 3 1 0 1 0 0 3 797 96.86
3 0 1 0 751 2 2 0 0 0 0 5 761 98.69
4 0 1 1 5 627 2 1 0 0 1 3 641 97.82
5 1 0 0 0 0 682 0 0 1 0 2 686 99.42
6 0 0 0 2 2 18 638 4 0 5 9 678 94.10
7 0 0 0 0 0 1 4 680 6 3 8 702 96.87
8 0 0 0 1 0 9 0 3 721 8 16 758 95.12
9 0 0 0 0 0 2 3 3 10 732 2 752 97.34
10 12 33 18 48 29 65 9 9 21 22 682 948 71.94
Total 812 683 796 820 671 784 655 700 759 771 741 8192 Overall: 93.90
wavelet (Daubechies and Heil, 1992) is used as the wavelet-generating function, and four is selected as the level of wavelet packet decomposition; as a result, there are 16 component coefficients per signal. Finally, the original response data, containing nine channels, can be represented by a vector of wavelet packet transform component energies of length 144 (16 × 9).
In the scenarios considering noise, white Gaussian noise is injected into all three data sets; that is, the network is trained on noisy data and also tested on noisy data.
As for the configuration of the computing platform, all the numerical experiments are performed on the same software and hardware platform. MATLAB is used to program the structural response numerical simulation. The Python packages Tensorflow and Keras (Chollet et al., 2015) are employed to establish the neural network architecture with GPU acceleration. The hardware platform has an Intel Core i7-4720HQ processor and an NVIDIA GTX970M video card.
The training of the CNN is accelerated by the GPU in all scenarios. Meanwhile, the computation of the wavelet packet transform component energy is executed on the CPU, leading to a much lower computational efficiency. Detailed numbers are listed as follows: a single training epoch costs the same time, 13 seconds, for all the CNNs; however, different numbers of epochs are consumed to achieve the best performance on validation. In the single damage localization task, the scenario with noise needs less time to reach its best: concretely, 954 and 737 epochs are expended in the noise-free and noisy situations, respectively. Similarly, the situation with noise in
Table 8
The accuracy of scenario 3

Predicted damage location
Count 0 1 2 3 4 5 6 7 8 9 10 Total %
Actual damage 0 677 22 59 3 4 0 0 0 0 0 2 767 88.27
location 1 17 626 6 8 14 1 0 1 10 2 1 686 91.25
2 29 13 711 8 7 7 0 4 1 2 0 782 90.92
3 20 26 13 706 14 6 9 0 5 3 0 802 88.03
4 1 26 30 7 547 16 5 4 6 5 0 647 84.54
5 1 10 6 5 30 621 17 21 14 11 1 737 84.26
6 2 2 3 14 3 17 574 14 40 29 2 700 82.00
7 2 1 4 1 8 9 5 631 11 58 0 730 86.44
8 0 4 0 1 7 7 9 14 717 29 0 788 90.99
9 3 0 1 0 1 3 2 26 13 706 0 755 93.51
10 13 26 29 13 22 4 5 17 21 38 610 798 76.44
Total 765 756 862 766 657 691 626 732 838 883 616 8192 Overall: 86.99
Table 9
The accuracy of scenario 4

Predicted damage location
Count 0 1 2 3 4 5 6 7 8 9 10 Total %
Actual damage 0 632 73 19 6 9 8 2 1 2 5 9 766 82.51
location 1 26 579 18 6 9 7 4 1 2 1 14 667 86.81
2 33 29 678 10 12 7 2 6 1 1 8 787 86.15
3 18 12 32 616 14 10 10 5 5 5 7 734 83.92
4 11 22 10 14 577 30 9 5 1 2 7 688 83.87
5 6 15 14 6 13 618 22 6 5 5 11 721 85.71
6 6 10 13 3 14 19 551 19 4 4 5 648 85.03
7 2 7 6 2 7 19 21 594 21 15 4 698 85.10
8 4 2 1 3 14 30 16 38 588 57 15 768 76.56
9 7 4 8 5 7 23 14 22 23 619 10 742 83.42
10 69 138 68 22 46 86 54 88 46 30 326 973 33.50
Total 814 891 867 693 722 857 705 785 698 744 416 8192 Overall: 77.86
multiple damages identification spends 1,609 epochs, less than the 3,462 epochs spent in the noise-free situation. It may be more difficult to train on a noisy data set, so that a further current-best performance is harder to achieve than in the case without noise. As for the detector with wavelet packet component energy, it consumes 190 seconds per epoch in training because of the low efficiency of computing the wavelet packets on the CPU; 63 and 91 epochs are spent in the noise-free and noisy situations, respectively.
The testing process takes much less time than training does. Every CNN has the same testing speed, 8.92 × 10⁻⁴ seconds per example. Meanwhile, a speed of 2.07 × 10⁻² seconds per example is measured in the cases of the detectors with wavelet packet component energy.

3.2 Single damage localization accuracy

Damage localization is the focus here: the neural network should "observe" the response data and then "diagnose" the damage position. A total of 11 location categories are prepared. Specifically, categories no. 0 to no. 9 represent damage located on elements no. 0 to no. 9, and category no. 10 represents the intact condition.
Figure 9 shows the accuracies of the four scenarios. It can be found that the CNN achieves a high accuracy, 94.57%, in the noise-free situation, and the feature extractor of wavelet packet energy also obtains a good result with 93.90% accuracy. When white noise is injected into the data set, the CNN still gets a considerable accuracy of 86.99%. It largely exceeds the
accuracy, 77.86%, in scenario 4. It seems that the


wavelet packet transform component energy is more
sensitive to noise.
In addition, more details can be found in Table 6 to
Table 9. A total of 8,192 examples are randomly chosen
from the testing data set to estimate the model performance. These tables show, from a statistical view, how the models predict damage locations. Table 6 shows that the neural network makes good predictions with only a small number of mistakes, which mainly neighbor the true damage location. Like the previous scenario, scenario 2 is also carried out without noise, and Table 7 presents its performance. It can be observed that errors again occur in the elements adjacent to the diagonal, implying that locations near the true damage location are more error prone. Moreover, detecting the intact condition appears more difficult: only 71.94% accuracy is obtained. Overall, the performance in this scenario is similar to scenario 1, except for the poor accuracy in detecting the intact condition. When noise is considered, the CNN still performs well, as detailed in Table 8. Apart from the errors near the true damage location, the remaining errors mainly appear in the elements on the antidiagonal of the table, meaning the network predicts the damage position symmetric to the true one. Besides, the lower accuracy in detecting the intact condition indicates that the noise blurs the gap between the intact condition and the damage conditions, making the intact condition difficult to identify. As for scenario 4, a worse performance is shown in Table 9: the noise largely deteriorates the performance of the wavelet packet transform component energy in detecting the intact condition.

To gain further intuition, a visualization technique called "t-SNE" (Maaten and Hinton, 2008) is employed to visualize the distribution of the predicted damage location labels. The t-SNE algorithm is designed to create a 2- or 3-D map of high-dimensional data while trying to preserve both the local and the global data structure: nearby points in the original high-dimensional space remain close in the new low-dimensional space, and faraway points remain far apart. It is a useful technique to reveal data structure in a single map.

Fig. 10. 2-D visualization of prediction for scenario no. 1.

Fig. 11. 2-D visualization of prediction for scenario no. 2.

The 2-D maps of the predicted location distributions for the four scenarios are displayed in Figure 10 to Figure 13; each figure shows 256 points downsampled from the test data set. The two dimensions, called "t-SNE axis-1" and "t-SNE axis-2" in these figures, are created by the t-SNE algorithm to best represent the relationships of the data in the original high-dimensional space. In Figure 10, points in the same class are close to each other and there are clear gaps between each pair of clusters. A few points fall in the wrong cluster, such as the point of class 10 in the cluster of zeros and the points of class 3 in the cluster of twos. In scenario 2, the different classes are approximately separated, as illustrated in Figure 11; nevertheless, there is some confusion in several clusters. In particular, category 10, the intact condition, is dispersed across multiple clusters. In scenario 3, the performance of the CNN is influenced by noise. The corresponding 2-D map, Figure 12, shows that points of the same category gather in the same cluster, although a few points are misplaced. Besides, points in close categories are more fallible, such as the points of other categories in the cluster of zeros. Finally, in scenario 4, the wide spread of cluster 10 in Figure 13 makes the identification hard, reflecting the poor accuracy.
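A map of this kind can be produced with scikit-learn; the following is a minimal sketch under assumed shapes (the article does not state which t-SNE implementation was used, and the 64-dimensional features with 3 classes are illustrative stand-ins for the network outputs):

```python
import numpy as np
from sklearn.manifold import TSNE  # assumed implementation of Maaten and Hinton (2008)

rng = np.random.default_rng(0)

# Stand-in for high-dimensional predictions: 3 classes, 10 samples each,
# 64-dimensional feature vectors separated by a per-class offset.
labels = np.repeat([0, 1, 2], 10)
features = rng.normal(size=(30, 64)) + labels[:, None] * 3.0

# Map to 2-D; perplexity must be smaller than the number of samples.
embedding = TSNE(n_components=2, perplexity=5, init="random",
                 random_state=0).fit_transform(features)
# embedding[:, 0] and embedding[:, 1] play the role of "t-SNE axis-1/2".
```

The perplexity parameter loosely controls how many neighbors each point attends to; values far larger than the cluster size tend to merge clusters.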
3.3 Multiple damages identification results

A trial of multiple damage identification is also conducted. As described in Section 2.5.2, at most three damages are introduced into the numerical model, and both the noise-free and noisy situations are tested. To achieve multiple damage identification, the label definition is modified so that each element of the label vector indicates the damage level of its corresponding beam element.
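A minimal sketch of this modified label and its objective, with an assumed 10-element beam (the element count and damage values are illustrative, not the article's model):

```python
import numpy as np

# Label vector: entry i holds the damage level of beam element i;
# intact elements stay at 0 (hypothetical 10-element beam).
true_label = np.zeros(10)
true_label[0] = 0.10        # 10% damage on element 1
true_label[2] = 0.30        # 30% damage on element 3

# A hypothetical network output, slightly off the true label.
predicted = true_label + np.array(
    [0.01, 0.0, -0.02, 0.0, 0.0, 0.0, 0.0, 0.01, 0.0, 0.0])

# Mean squared error between true and predicted label vectors,
# the objective used for the multiple damage identification task.
mse = np.mean((true_label - predicted) ** 2)
```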
14678667, 2017, 12, Downloaded from https://onlinelibrary.wiley.com/doi/10.1111/mice.12313 by Karadeniz Technical Universitaet, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
1040 Lin, Nie & Ma

Fig. 12. 2-D visualization of prediction for scenario no. 3.

Fig. 13. 2-D visualization of prediction for scenario no. 4.

Fig. 14. Prediction of damage level for zero damage case on noise-free (a) and noisy data set (b).

Fig. 15. Prediction of damage level for single damage case on noise-free (a) and noisy data set (b).

Fig. 16. Prediction of damage level for two damages case on noise-free (a) and noisy data set (b).

Fig. 17. Prediction of damage level for three damages case on noise-free (a) and noisy data set (b).

Subsequently, mean squared error is utilized as the objective function to measure the distance between the true label and the predicted label.

The performance of the trained neural network is tested on the testing data set. A tiny mean squared error, 1.27 × 10⁻⁴, is achieved in the noise-free situation, whereas 8.33 × 10⁻⁴ is obtained in the noisy situation. To gain further insight into the identification performance, four examples are prepared to test the learned neural network, covering zero to three damages and low to high damage levels.
Structural damage detection with automatic feature-extraction through deep learning 1041

Figure 14 to Figure 17 show comparisons between the predicted damage labels and the true damage labels. In Figure 14, the models for both noise conditions exhibit an excellent prediction on an example of the intact condition: both predicted damage levels are close to zero.

The prediction of a single tiny damage of level 10% is demonstrated in Figure 15. An excellent result, with only 2.8% error and correct localization, is obtained by the neural network trained on the noise-free data set. However, when noise is considered, the network appears to be confused, judging the example with tiny damage to be in the intact condition.

When another damage of level 30% is added on element 3, as shown in Figure 16, the noise-free model again gives a good prediction, that is, correct localization and little error on the level 30% damage. A reasonable result is obtained in the noisy case, where the error on element 3 is less than 8%; nevertheless, the tiny damage is still underestimated and a small mistake is made on element 1.

In the case of three damages, as illustrated in Figure 17, the noise-free model achieves an almost perfect prediction, except that the underestimation of the level 10% damage persists. Although the model trained on the noisy data set still cannot recognize the tiny damage on element 3, it gives an excellent prediction on the other damages.

In general, the CNN also achieves a reasonable result on the multiple damage identification task. Despite the difficulty in detecting tiny damage, our method shows excellent performance in damage localization and damage degree prediction. In addition, its reliability is exhibited by the models giving low damage estimates on all the intact elements.

Fig. 18. Spectrum of the learned feature in visualization point 1; the peaks are centered at 52.1 Hz, 210.1 Hz, and 484.7 Hz, the third, sixth, and ninth natural frequencies of the structure.

Fig. 19. Spectrum of the learned feature in visualization point 1; the peaks are centered at 52.1 Hz and 287.7 Hz, the third and seventh natural frequencies.

3.4 Physical interpretation through visualization

As discussed in Section 2.6, although the CNN shows excellent performance, it is inscrutable to humans, leading to distrust in its reliability. In other words, one may neither understand how a neural network works nor know when it will make mistakes. This problem is particularly severe when a neural network works on a responsible task such as structural damage detection. Hence, the effort of finding physical interpretations that help humans understand how the neural network works is significant.

In this section, using the method mentioned in Section 2.6, an interpretation of our CNN is demonstrated by showing what features are learned by each hidden layer. The representations of the learned features are obtained by maximizing the average of the target feature map over the input; that is, the optimized input makes the target neuron most active, and if the input deviates from the optimized state, the corresponding activation value of the neuron declines rapidly. Specifically, the CNN in scenario 3 is discussed in detail here for its robustness to noise, which implies the network has learned some essential characteristics behind the data. From bottom to top, four visualization points in our CNN are chosen, as displayed in Figure 2. On account of the difficulty of directly gaining insight from time-sequence data for humans, all the representations of the learned features are transformed into the frequency domain.
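The activation-maximization step described above can be sketched in plain numpy for a single 1-D convolution filter followed by a ReLU. This is an illustrative reimplementation of the idea, not the authors' code; the kernel, signal length, and step count are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "learned" kernel: a short sinusoid, so the filter
# responds to one frequency band (stand-in for a trained conv filter).
K, N = 16, 256
w = np.sin(2 * np.pi * 3 * np.arange(K) / K)

def forward(x):
    """Valid 1-D convolution with one filter, then ReLU."""
    pre = np.correlate(x, w, mode="valid")      # length N - K + 1
    return pre, np.maximum(pre, 0.0)

def objective(x):
    """Mean activation of the feature map: the quantity being maximized."""
    _, act = forward(x)
    return act.mean()

def grad(x):
    """Analytic gradient of the mean activation w.r.t. the input."""
    pre, _ = forward(x)
    mask = (pre > 0).astype(float) / pre.size   # ReLU gate and mean factor
    g = np.zeros_like(x)
    for i, m in enumerate(mask):                # scatter the kernel back onto x
        g[i:i + K] += m * w
    return g

x = rng.normal(scale=0.1, size=N)               # start from random noise
before = objective(x)
for _ in range(200):                            # plain gradient ascent
    x += 0.5 * grad(x)
after = objective(x)                            # mean activation has grown
```

After the ascent, `x` is the optimized input whose spectrum reveals what the filter responds to, which is exactly how the spectra in the following figures are read.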

Fig. 20. Spectrum of the learned feature in visualization point 2; the peaks are centered at 210.1 Hz and 379.0 Hz, the sixth and eighth natural frequencies of the structure.

Fig. 21. Spectrum of the learned feature in visualization point 2; the peak is centered at 379.0 Hz, the eighth natural frequency.

Fig. 22. Spectrum of the learned feature in visualization point 3; the peak is centered at 52.1 Hz, the third natural frequency of the structure.

Fig. 23. Spectrum of the learned feature in visualization point 3; the peaks are centered at 52.1 Hz and 210.1 Hz, the third and sixth natural frequencies.

At the bottom of the network, both layers at visualization points 1 and 2 in Figure 2 have 32 kernels. Some of these kernels are visualized and demonstrated first. Figure 18 and Figure 19 illustrate the spectra of two kernels at visualization point 1. Both figures show that the kernels in the first convolutional layer seem to have learned band-pass filters. Moreover, although the bandwidths are wide and many burrs can be found over a large range of frequency, the peaks are centered at natural frequencies of the structure. Concretely, Figure 18 shows the third, sixth, and ninth natural frequencies (52.1 Hz, 210.1 Hz, and 484.7 Hz), and Figure 19 shows a band-pass filter with two peaks centered on the third and seventh natural frequencies (52.1 Hz and 287.7 Hz). Besides, there is also an all-pass filter in this layer, which means the original signals can be received by subsequent layers. At the next visualization point, it is obvious that better filters are learned, as displayed in Figure 20 and Figure 21; specifically, both the single-band and multiband filters have narrower bandwidths, and burrs are fewer than in the last layer.

It can therefore be inferred that the bottom layers have learned elemental features of the data, which in this case are the main frequency components. Furthermore, the learned features become more distinct in the deeper layers.
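Reading a peak off one of these spectra amounts to taking the magnitude spectrum of a kernel representation and locating its maximum. A sketch with a hypothetical kernel and an assumed 1,000 Hz sampling rate (neither value is from the article):

```python
import numpy as np

fs = 1000.0                        # assumed sampling rate of the response data
K = 64
t = np.arange(K) / fs
# Hypothetical kernel dominated by one sinusoid, like the band-pass
# filters seen in Figures 18 to 23; the Hann window tapers the ends.
kernel = np.sin(2 * np.pi * 210.1 * t) * np.hanning(K)

# Zero-pad the FFT to refine the frequency grid, then find the peak.
spectrum = np.abs(np.fft.rfft(kernel, n=1024))
freqs = np.fft.rfftfreq(1024, d=1.0 / fs)
peak_hz = freqs[spectrum.argmax()]             # close to 210.1 Hz
```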

Fig. 24. Spectrum of the learned feature in visualization point 4; the third natural frequency can be found at 52.1 Hz.

Fig. 25. Spectrum of the learned feature in visualization point 4; the third, sixth, and seventh natural frequencies can be found at 52.1 Hz, 210.1 Hz, and 287.7 Hz.

Fig. 26. A shape of the third mode in a kernel representation.

Subsequently, some kernels at visualization point 3 are illustrated in Figure 22 and Figure 23. The number of kernels in this layer, 64 in total, is larger than in the last layer. In general, the learned features become more distinct than at point 2. The bandwidth is so narrow that the peak becomes a "line" in Figure 23, and the single-band filter also has a narrow bandwidth, as illustrated by Figure 22. Furthermore, there is a significant relationship between the different channels in Figure 22: if the steeple in every figure is picked out, redrawn with appropriate signs, and connected, a shape of the third mode of the structure appears. To confirm this assumption, the corresponding optimized time-sequence data are visualized as a video. According to this video, the structure dominantly vibrates in the third mode, though some high-frequency components can also be observed. Figure 26 shows a comparison between the shape of the structure in one frame of the video and the true third mode; they are extremely similar.

It is reasonable to suppose that not only are the features learned by previous layers refined further, but the links between channels are also intensified in the middle layers. As a result, the learned features become high level and understandable with mechanical meaning, such as the concept of the third mode.

Finally, the kernels at visualization point 4, the top of the convolutional layers, show more abstract and higher-level features. As shown in Figure 24, the third-mode filter can also be found in this layer, and its bandwidth is narrower. Figure 25 demonstrates a multiband filter holding three frequency bands with narrow bandwidths. Besides, different amplitude weights can be found in different channels. This implies that a concept of "mode combination" may be learned in this layer and that these features are specialized in recognizing a combination of specific modes; that is, if the balance of the components changes in the input data, the activation of the neuron also changes.

In this layer, the learned features become high level and understandable with mechanical meaning. In mechanics, the response of a linear system is a linear combination of its modes. This mechanism hides behind the data; however, the network has learned it independently to achieve damage localization, for example, the concept of mode combination described previously. Notice that all these features are learned on the noisy data set: the network uses only those essential features that can fight against the noise, which is the reason it keeps a reasonable performance on the noisy data set.

It is noteworthy that the other features not shown in this section have properties similar to the ones shown. For instance, most of them are also band-pass filters, but with different amplitude weights in different channels or different levels of coarseness.

Table 10
Hierarchical features learned by deep learning models in different tasks

Goal Original input Low-level feature Medium-level feature High-level feature


Structural damage detection Structure responses Frequency bands Vibration modes Combinations of modes
Speech recognition Speech records Frequency bands Phonemes Words
Image recognition Pixels Edges Object parts Object models

In brief, the learned features show a characteristic of hierarchy: they become more and more distinct and abstract layer by layer, and mechanical concepts are even learned in the deep layers. Moreover, the features in high layers are constituted from the ones in low layers in a nonlinear way, so that abstract features can be learned. This implies that a deep neural network can independently learn essential characteristics hidden behind the data to achieve its goal. Table 10 may lend further insight into how deep learning models work with learned hierarchical features by comparing these features across different tasks, such as speech and image recognition.

4 DISCUSSION AND CONCLUSION

In this article, a novel structural damage detection approach using a deep CNN is proposed. It automatically extracts features from low-level waveform signals instead of relying on hand-crafted features. Interesting results are obtained: (1) the CNN shows excellent accuracy, even though the data set is noisy; (2) a reasonable performance is also observed in multiple damage identification; and (3) by visualizing hidden layers, it is found that hierarchical features are learned layer by layer, and mechanical concepts, such as vibration modes and their combinations, are even learned in deep layers.

This study extends the application of neural network methods in structural damage detection. As mentioned in Section 1, neural network approaches have been applied in the structural damage detection field for years. However, almost all of these studies are based on hand-crafted features from mechanics or statistics and use a neural network as a classifier or regressor to achieve damage detection. In contrast, this article uses only a CNN to link the response data and the damage information. It is a fresh attempt at applying neural network technology to this issue.

On the other hand, it provides a new framework to tackle this problem. To solve a scientific problem, traditional methodology tends to analyze it from "reason" to "result," then leading to "application." In other words, researchers usually interpret the data ("result") by establishing principles ("reason") using their knowledge and logic, then use these principles to solve problems ("application"). However, the real world is complex, and a phenomenon is always caused by many factors. Instead of considering only some of these factors and establishing a hypothesis, another methodology is to consider all the factors implicitly by directly linking data ("result") and problem solving ("application"). A relationship can be established between two or more arbitrary factors by a learning algorithm, provided the relevant data have been prepared; the learned relationship can then be used in various applications. This is the key idea behind the term "big data." This article follows this methodology: the network achieves structural damage detection without knowledge from human experts, depending only on the data.

The proposed method has demonstrated remarkable performance and provides a new framework for future studies of structural damage detection.

First, excellent classification accuracy is obtained even in the situation with noise. Such accuracy is vital for a detection system. In addition, a reasonable performance is also achieved in multiple damage identification.

Second, the neural network can automatically extract features of the structure directly from low-level sensor data. As a result, a lot of time could be saved. Moreover, the learned features are optimized to facilitate the final classification accuracy instead of being limited to a specific physical definition. For instance, the concept of vibration mode is found in a high layer of the convolutional layers; however, extracting vibration modes is not the goal of the network, which simply found that some specific modes are helpful for damage detection in this case. Besides, these vibration mode features are used in more complex ways in the later layers, that is, they are combined and transformed nonlinearly in subsequent layers. In contrast, human researchers usually derive new indices from extant concepts and then validate their performance.

Third, the methodology of learning from data has good extensibility. Notice that the network knows no structural information, only response data; in other words, it can be applied to any structure.

Similarly, although damage localization is the focus of this work, the method can also be easily generalized to the detection task, as shown in Section 3.3. Besides, though a linear structure is discussed in this article and hand-crafted features still show a reasonable performance, there are other situations: if the problem becomes more complex and existing human knowledge cannot represent its characteristics perfectly, the neural network may still keep the potential to achieve a good result, because of its ability of self-evolution by constantly learning from data. Nevertheless, human experts remain significant. A supervisor is necessary for any artificial intelligence system, and human experts should validate whether the system behaves irregularly. Moreover, scientists might be able to gain insight into complex problems from a neural network by visualizing its layers.

Future work will focus on the problems caused by the dependence on a numerical model. The proposed method needs a large amount of data covering different damage scenarios. However, a real-world structure has only one life cycle, and data from various damage scenarios are unavailable. In addition, almost all structures on active duty are in a condition near intact, leading to scarcity of data carrying explicit and distinct damage information. An accessible approach to this situation may be building an accurate FE model of the intact structure and using this model to generate data for different damage scenarios. Nevertheless, a distinction between the FE model and the real-world structure always exists. This means a CNN has to learn from the FE model and predict on real-world data, even though the two data sets follow different distributions. This is a problem in the transfer learning domain, a topic of machine learning concerned with learning and predicting on differently distributed data sets, and it may be the key point in applying this method to real-world structures.

ACKNOWLEDGMENTS

The authors acknowledge the financial support from the National Natural Science Foundation of China (11402098), International Science and Technology Cooperation Fund of Qing Hai Province (2014-HZ-822), Science and Technology Plan of Guang Dong Province (2013B021500008), and National Key Laboratory Open Fund of China (SV2014-KF-16).

REFERENCES

Abdel-Hamid, O., Mohamed, A., Jiang, H. & Penn, G. (2012), Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition, in IEEE International Conference on Acoustics, Speech and Signal Processing, Kyoto, Japan, 4277–80.
Adeli, H. & Jiang, X. (2005), Dynamic wavelet neural network for nonlinear identification of highrise buildings, Computer-Aided Civil and Infrastructure Engineering, 5, 316–30.
Adeli, H. & Jiang, X. (2009), Intelligent Infrastructure: Neural Networks, Wavelets, and Chaos Theory for Intelligent Transportation Systems and Smart Structures, CRC Press, Boca Raton, FL.
Adewuyi, A. P., Wu, Z. S. & Serker, N. H. M. K. (2009), Assessment of vibration-based damage identification methods using displacement and distributed strain measurements, Structural Health Monitoring, 8(6), 443–61.
Amezquita-Sanchez, J. & Adeli, H. (2015), Feature extraction and classification techniques for health monitoring of structures, Scientia Iranica. Transaction A, Civil Engineering, 22(6), 1931.
Amezquita-Sanchez, J. P. & Adeli, H. (2016), Signal processing techniques for vibration-based health monitoring of smart structures, Archives of Computational Methods in Engineering, 23(1), 1–15.
An, Y., Spencer, B. F. & Ou, J. (2015), A test method for damage diagnosis of suspension bridge suspender cables, Computer-Aided Civil and Infrastructure Engineering, 10, 1–14.
Blachowski, B., An, Y., Spencer, B. F. & Ou, J. (2017), Axial strain accelerations approach for damage localization in statically determinate truss structures, Computer-Aided Civil and Infrastructure Engineering, 32(4), 304–18.
Cao, M., Xu, W., Ostachowicz, W. & Su, Z. (2014), Damage identification for beams in noisy conditions based on Teager energy operator-wavelet transform modal curvature, Journal of Sound and Vibration, 333(6), 1543–53.
Cavadas, F., Smith, I. F. C. & Figueiras, J. (2013), Damage detection using data-driven methods applied to moving-load responses, Mechanical Systems and Signal Processing, 39(1-2), 409–25.
Cawley, P. & Adams, R. D. (1979), The location of defects in structures from measurements of natural frequencies, Journal of Strain Analysis for Engineering Design, 14(2), 49–57.
Cha, Y.-J., Choi, W. & Buyukozturk, O. (2017), Deep learning-based crack damage detection using convolutional neural networks, Computer-Aided Civil and Infrastructure Engineering, 32(3), 2013–14.
Chen, F., Jahanshahi, M. R., Wu, R. & Joffe, C. (2017), A texture-based video processing methodology using Bayesian data fusion for autonomous crack detection on metallic surfaces, Computer-Aided Civil and Infrastructure Engineering, 32(4), 271–87.
Chollet, F., et al. (2015), Keras. Available at: https://github.com/fchollet/keras, accessed October 2017.
Daubechies, I. & Heil, C. (1992), Ten Lectures on Wavelets, Society for Industrial and Applied Mathematics, Philadelphia, PA.
Dutta, A. & Talukdar, S. (2004), Damage detection in bridges using accurate modal parameters, Finite Elements in Analysis and Design, 40(3), 287–304.
Erhan, D., Bengio, Y., Courville, A. & Vincent, P. (2009), Visualizing higher-layer features of a deep network, University of Montreal, 1341, 3.
Farrar, C. R. & James III, G. H. (1997), System identification from ambient vibration measurements on a bridge, Journal of Sound and Vibration, 205(1), 1–18.

Glorot, X., Bordes, A. & Bengio, Y. (2011), Deep sparse rectifier neural networks, in Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, University of Lleida, Lleida (Catalonia), Spain, 315–23.
Hou, Z. K., Noori, M. N. & Amand, R. S. (2000), Wavelet-based approach for structural damage detection, Journal of Engineering Mechanics, 126(7), 677–83.
Huang, Y., Beck, J. L., Wu, S. & Li, H. (2014), Robust Bayesian compressive sensing for signals in structural health monitoring, Computer-Aided Civil and Infrastructure Engineering, 29(3), 160–79.
Ioffe, S. & Szegedy, C. (2015), Batch normalization: accelerating deep network training by reducing internal covariate shift, Computer Science.
Jiang, X. & Adeli, H. (2007), Pseudospectra, MUSIC, and dynamic wavelet neural network for damage detection of highrise buildings, International Journal for Numerical Methods in Engineering, 71(5), 606–29.
Kingma, D. P. & Ba, J. (2014), Adam: a method for stochastic optimization, arXiv preprint arXiv:1412.6980.
Krizhevsky, A., Sutskever, I. & Hinton, G. E. (2012), ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems, 25(2), 2012.
Law, S. S., Li, X. Y., Zhu, X. Q. & Chan, S. L. (2005), Structural damage detection from wavelet packet sensitivity, Engineering Structures, 27(9), 1339–48.
Le, Q. V. (2011), Building high-level features using large scale unsupervised learning, in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vancouver, BC, Canada, 8595–98.
LeCun, Y. (1989), Generalization and network design strategies, in Pfeifer, R., Schreter, Z., Fogelman, F. & Steels, L. (eds.), Connectionism in Perspective, Elsevier, Zurich, Switzerland, 143–55.
LeCun, Y., Bengio, Y. & Hinton, G. (2015), Deep learning, Nature, 521(7553), 436–44.
LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W. & Jackel, L. D. (1989), Backpropagation applied to handwritten zip code recognition, Neural Computation, 1(4), 541–51.
Lee, H., Pham, P. T., Yan, L. & Ng, A. Y. (2009), Unsupervised feature learning for audio classification using convolutional deep belief networks, in Advances in Neural Information Processing Systems 22, Vancouver, British Columbia, Canada, 1096–1104.
Lee, J. J., Lee, J. W., Yi, J. H., Yun, C. B. & Jung, H. Y. (2005), Neural networks-based damage detection for bridges considering errors in baseline finite element models, Journal of Sound and Vibration, 280(3-5), 555–78.
Li, Z., Park, H. S. & Adeli, H. (2017), New method for modal identification of super high-rise building structures using discretized synchrosqueezed wavelet and Hilbert transforms, The Structural Design of Tall and Special Buildings, 26(3), https://doi.org/10.1002/tal.1312.
Maas, A. L., Hannun, A. Y. & Ng, A. Y. (2013), Rectifier nonlinearities improve neural network acoustic models, in Proceedings of ICML, 30(1).
Maaten, L. v. d. & Hinton, G. (2008), Visualizing data using t-SNE, Journal of Machine Learning Research, 9(Nov), 2579–605.
Mehrjoo, M., Khaji, N., Moharrami, H. & Bahreininejad, A. (2008), Damage detection of truss bridge joints using artificial neural networks, Expert Systems with Applications, 35(3), 1122–31.
Mu, H. & Yuen, K. (2016), Ground motion prediction equation development by heterogeneous Bayesian learning, Computer-Aided Civil and Infrastructure Engineering, 31(10), 761–76.
Nair, V. & Hinton, G. E. (2010), Rectified linear units improve restricted Boltzmann machines, in Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, 807–14.
Newmark, N. M. (1959), A method of computation for structural dynamics, Journal of the Engineering Mechanics Division, 85(1), 67–94.
Pandey, A. K., Biswas, M. & Samman, M. M. (1991), Damage detection from changes in curvature mode shapes, Journal of Sound and Vibration, 145(2), 321–32.
Salawu, O. S. (1997), Detection of structural damage through changes in frequency: a review, Engineering Structures, 19(9), 718–23.
Shabbir, F. & Omenzetter, P. (2015), Particle swarm optimization with sequential niche technique for dynamic finite element model updating, Computer-Aided Civil and Infrastructure Engineering, 30(5), 359–75.
Shi, Z. Y., Law, S. S. & Zhang, L. M. (2000), Structural damage localization from modal strain energy change, Journal of Sound and Vibration, 126(12), 825–44.
Sietsma, J. & Dow, R. J. F. (1991), Creating artificial neural networks that generalize, Neural Networks, 4(1), 67–79.
Simonyan, K., Vedaldi, A. & Zisserman, A. (2013), Deep inside convolutional networks: visualising image classification models and saliency maps, arXiv preprint arXiv:1312.6034.
Simonyan, K. & Zisserman, A. (2014), Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556.
Sohn, H. & Farrar, C. R. (2001), Damage diagnosis using time series analysis of vibration signals, Smart Materials and Structures, 10(3), 446–51.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D. & Rabinovich, A. (2015), Going deeper with convolutions, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, 1–9.
Wu, R., Yan, S., Shan, Y., Dang, Q. & Sun, G. (2015), Deep image: scaling up image recognition, arXiv preprint arXiv:1501.02876, 7(8).
Yang, J. N., Lei, Y., Lin, S. & Huang, N. E. (2004), Hilbert-Huang based approach for structural damage detection, Journal of Engineering Mechanics, 130(1), 85–95.
Young-Jin, C. & Oral, B. (2015), Structural damage detection using modal strain energy and hybrid multiobjective optimization, Computer-Aided Civil and Infrastructure Engineering, 30(5), 347–58.
Zhou, Y. T. & Chellappa, R. (1988), Computation of optical flow using a neural network, in IEEE International Conference on Neural Networks, San Diego, CA, 71–78.
