Automatic Pixel-Level Multiple Damage Detection of Concrete Structure Using Fully Convolutional Network
Computer vision-based methods for damage detection primarily adopt image processing techniques (IPTs). IPTs can detect some specific types of structural damage, such as concrete cracks (Abdel-Qader, Abudayyeh, & Kelly, 2003; Fujita, Mitani, & Hamamoto, 2006; Nishikawa, Yoshida, Sugiyama, & Fujino, 2012), concrete spalling (German, Brilakis, & DesRoches, 2012), and potholes and cracks in asphalt pavement (Koch & Brilakis, 2011; Wang, Zhang, Wang, Braham, & Qiu, 2018; Zhang et al., 2016a). To identify or represent damages in images, IPTs rely on strong assumptions. For instance, cracks can be found because crack pixels are thinner than other textural patterns and darker than the background (Cheng, Chen, Glazier, & Hu, 1999; Yamaguchi, Nakamura, Saegusa, & Hashimoto, 2008). With respect to morphology, the simplest method for detecting damage is to use a histogram and a threshold (Kirschke & Velinsky, 1992; Oliveira & Correia, 2009). To increase the robustness of damage detection, general global transforms and local edge detectors have been applied to damage contour detection, such as the fast Haar transform (FHT), the fast Fourier transform (FFT), and the Sobel and Canny edge detectors. According to Abdel-Qader et al. (2003), the FHT method is relatively more reliable for crack detection, and its accuracy is potentially improvable. This study was followed by several proposed improvements (Huang & Xu, 2006; Li, Zhao, Du, Ru, & Zhang, 2017; Sinha & Fieguth, 2006; Subirats, Dumoulin, Legeay, & Barba, 2006; Ying & Salari, 2010; Zalama, Gómez-García-Bermejo, Medina, & Llamas, 2014). Although IPTs-based damage detection is effective and fast, it can detect only one damage type, and its robustness leaves much to be desired when noise, primarily from lighting and distortion, substantially affects the results (Koziarski & Cyganek, 2017; Ziou & Tabbone, 1998). One possible route for removing the noise is denoising techniques, such as total variation denoising (Rudin, Osher, & Fatemi, 1992). However, these techniques have little effect because the images are taken under extensively varying real-world conditions.
To improve the adaptability of IPTs-based methods to real-world situations, machine learning (ML)-based approaches have been used to provide feasible solutions for damage detection (Amezquita-Sanchez & Adeli, 2015; Butcher et al., 2014; Jiang & Adeli, 2007). To detect specific damage, these approaches first extract damage features from images and then classify the extracted features using ML algorithms. The artificial neural network (ANN), a typical supervised ML model, was developed to detect concrete cracks and other structural damage (Lee & Lee, 2004; Moon & Kim, 2011; Moselhi & Shehab-Eldeen, 2000). However, owing to hardware limitations on computational capability, the architectures of ANNs were simple, which led to insufficient damage detection. Therefore, researchers resorted to other ML-based methods for better detection results. The Support Vector Machine (SVM) has been used for detecting cracks (Gavilán et al., 2011; O'Byrne, Schoefs, Ghosh, & Pakrashi, 2013; Zou, Cao, Li, Mao, & Wang, 2012), loose bolts (Cha, You, & Choi, 2016), rusting (Chen, Shen, Lei, & Chang, 2012), and spalling (German et al., 2012). In addition, the restricted Boltzmann machine has been applied to structural health monitoring (Rafiei & Adeli, 2017, 2018; Rafiei, Khushefati, Demirboga, & Adeli, 2017). These approaches first extract features from images using IPTs and then evaluate whether the extracted features indicate damage. However, the results of the aforementioned approaches are inevitably affected by flawed hand-picked feature extraction in the IPTs.
Although many types of ML-based methods have been developed and adapted to research and industrial fields, convolutional neural networks (CNNs) stand out in image classification and object recognition (Krizhevsky, Sutskever, & Hinton, 2012; Ren, He, Girshick, & Sun, 2017). Deep learning-based methods have been developed to detect railway defects (Soukup & Huber-Mörk, 2014), road cracks (Zhang et al., 2017; Zhang, Yang, Zhang, & Zhu, 2016b), concrete cracks (Cha, Choi, & Büyüköztürk, 2017a; Zhao & Li, 2017), and other damage (Lin, Nie, & Ma, 2017). Undeniably, these methods achieve excellent performance in realistic situations when only one damage type is detected. However, the CNN method has to adopt a sliding-window technique to localize the detected damage (Cha et al., 2017a; Xue & Li, 2018). Subsequently, Cha et al. proposed a Faster Region-based Convolutional Neural Network (Faster R-CNN) for multiple damage types (Cha, Choi, Suh, Mahmoudkhani, & Büyüköztürk, 2017b; Ren et al., 2017). However, the Faster R-CNN method still detects multiple damages at the grid-cell level, which means that the images have to be cut into small patches and the damage characteristics they contain cannot be detected directly.
To achieve higher detection performance for multiple damage types, a Fully Convolutional Network (FCN)-based method is used here to detect damage at the pixel level. The FCN is an end-to-end, pixel-to-pixel convolutional network for semantic segmentation (Chen, Papandreou, Kokkinos, Murphy, & Yuille, 2015; Long, Shelhamer, & Darrell, 2015). It predicts the class of each pixel by adopting a deconvolution layer to upsample the last convolutional layer. Therefore, the FCN can be treated as an extended CNN in which the prediction has been converted from a class number to a semantic segmentation image. Compared with nonpixel methods, a pixel-level method can provide the class of each pixel of a damage image, and therefore the damage features can be completely separated from the background. Based on the separated results, a series of postprocessing techniques can then be applied to extract specific damage features (e.g., a crack width). At present, the FCN method has been used to detect concrete cracks (Ni, Zhang, & Chen, 2018; Yang, Li, Yu, Luo, & Huang, 2018). In this study, we propose a novel FCN to detect four types of concrete damage.
Concatenation layers appear between Maxpool1 and Conv2 and are used to concatenate features. When concatenating features, the feature maps from previous layers are stacked by channel; that is, the number of output channels of a concatenation layer is the sum of its input channels. Based on this concatenation, the DenseNet-121 alleviates the vanishing-gradient problem, strengthens feature propagation, encourages feature reuse, and substantially reduces the number of parameters.
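As a minimal sketch of this channel-wise stacking (PyTorch is used purely for illustration; the tensors and channel counts below are invented, not the actual layers of our network):

    import torch

    # Feature maps from two previous layers: (batch, channels, height, width)
    x1 = torch.randn(1, 64, 56, 56)
    x2 = torch.randn(1, 32, 56, 56)

    # A concatenation layer stacks feature maps along the channel axis,
    # so the output channel count is the sum of the input channel counts.
    out = torch.cat([x1, x2], dim=1)
    print(out.shape)  # torch.Size([1, 96, 56, 56]); 64 + 32 = 96 channels

This is the mechanism that lets every layer in a dense block see the features of all preceding layers without summing them away.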
To achieve good deconvolution performance, all the average pooling layers are changed into max pooling layers, and the global average pooling in layer 14 is also replaced by a max pooling layer with a kernel size of 2 × 2 and a stride of 2. The final classifier layer of the DenseNet-121 is discarded, and the fully connected layer is converted to a convolution layer followed by a dropout layer with a dropout rate of 0.5. Convolutions with a 1 × 1 kernel and five output channels (layers 18, 21, 24, 27, and 30) are appended to predict the scores for each of the classes (cracks, spalling, efflorescence, holes, and the background) at each of the previous output locations, followed by a deconvolution layer to upsample the previous outputs to pixel-dense outputs. The FCN fuses predictions from the final layer of DenseNet-121, all the pooling layers, and the first convolution layer. In each deconvolution layer, the previous output is doubled in size by upsampling with a stride of 2. Finally, the output size of the FCN is the same as the input size.
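The following is a minimal sketch of this score-and-fuse pattern (PyTorch for illustration; the channel counts are assumptions, and the element-wise addition follows the standard FCN fusion rather than being confirmed as our exact implementation):

    import torch
    import torch.nn as nn

    num_classes = 5  # cracks, spalling, efflorescence, holes, background

    deep = torch.randn(1, 1024, 16, 16)    # assumed deeper, coarser feature map
    shallow = torch.randn(1, 512, 32, 32)  # assumed shallower, finer feature map

    # 1 x 1 convolutions predict per-class scores at every location.
    score_deep = nn.Conv2d(1024, num_classes, kernel_size=1)
    score_shallow = nn.Conv2d(512, num_classes, kernel_size=1)

    # A stride-2 transposed convolution doubles the spatial size of the scores.
    upsample2x = nn.ConvTranspose2d(num_classes, num_classes,
                                    kernel_size=4, stride=2, padding=1)

    # Upsampled deep scores are fused with shallow scores (FCN-style addition).
    fused = upsample2x(score_deep(deep)) + score_shallow(shallow)
    print(fused.shape)  # torch.Size([1, 5, 32, 32])

Repeating this doubling until the score map matches the input size yields the pixel-dense output described above.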
2.2 Convolutional layer (Conv)
In CNNs, a convolutional layer performs the convolution operation using a set of kernels (filters) with learnable weights, as shown in Figure 4. The depth of the convolutional kernels is always equal to the depth of the convolutional layer inputs, and their height and width, which are identical, are generally smaller than those of the inputs. The convolution is implemented between the inputs and the kernels, where each kernel slides over the input with a specific step size. The sliding step size is defined as the stride, which is usually identical in the height and width directions. The computed convolution values of each kernel are added together with a bias to generate the results of that kernel. These results are amalgamated to produce the spatial output of the convolutional layer. To maintain the output size, zero-padding (Pad) is usually applied to the input. The output size of a convolutional layer depends on the input size, Pad, kernel size, and stride, and can be calculated using the equation shown in Figure 4.
FIGURE 4 Convolutional layer example: Output size = (Input size + 2 × Pad − Kernel size) / Stride + 1; for instance, 3 = (3 + 2 × 1 − 3) / 1 + 1
After the convolutional layer, a nonlinear operation is implemented by applying a nonlinear activation function to the convolution results. In ANNs, typical nonlinear activation functions such as sigmoid, tanh, and arctan are adopted, but their saturating nonlinearities slow computations. The rectified linear unit (ReLU) was introduced (Nair & Hinton, 2010) as a nonlinear activation function; it performs well because the gradients of the ReLU are always zero or one. To facilitate much faster computation during the training process, the ReLU is selected in our study because of its simple derivative function.
To reduce the parameters of subsequent layers and the probability of overfitting, a downsampling operation in the max pooling (Maxpool) layer is applied to the output of the ReLU nonlinear activation function. The output size of the max pooling layer can be calculated from the input size, kernel size, and stride: Output size = (Input size − Kernel size) / Stride + 1.
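As a quick numeric check of these two formulas, consider the small helpers below (plain Python; the function names are ours, not from the paper):

    def conv_output_size(input_size, kernel_size, pad, stride):
        """Convolutional layer: (n + 2p - k) / s + 1."""
        return (input_size + 2 * pad - kernel_size) // stride + 1

    def maxpool_output_size(input_size, kernel_size, stride):
        """Max pooling layer: (n - k) / s + 1."""
        return (input_size - kernel_size) // stride + 1

    # The worked example from Figure 4: 3 = (3 + 2*1 - 3) / 1 + 1
    print(conv_output_size(3, kernel_size=3, pad=1, stride=1))  # 3
    # The 2 x 2, stride-2 max pooling used in the modified DenseNet-121
    print(maxpool_output_size(32, kernel_size=2, stride=2))     # 16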
FIGURE 9 Training and validation losses over iterations: (a) learning rate = 5 × 10⁻¹¹, (b) learning rate = 1 × 10⁻¹⁰, and (c) learning rate = 2 × 10⁻¹⁰ (each panel plots loss against iterations from 0 to 100,000)
Images with multiple damages are scarce, because it is not easy to find a concrete surface that includes multiple damages that can be captured in one image, so all of them are included in the testing set to evaluate the detection ability of the FCN.
3.2 Model initialization
When training the FCN, a strategy of model-based transfer learning (Gao & Mosalam, 2018; Pan & Yang, 2010), rather than training the model from scratch, is adopted to accelerate and optimize the learning efficiency of the model. Following this strategy, the weights and biases of the DenseNet-121 part of the FCN are initialized from a pretrained DenseNet-121. In addition, all the weights of the deconvolutional layers in the FCN are initialized using the "Bilinear" method. The learning rates of the weights and biases are twice the base learning rate and equal to it, respectively, and the learning rate and weight decay of the deconvolutional layers are set to zero to keep the coefficient values of the bilinear interpolation unchanged during training. Moreover, all the bias terms of the deconvolutional layers are fixed at zero and not trained.
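A minimal sketch of the "Bilinear" initialization (NumPy; this is the standard bilinear-kernel construction used in FCN reference implementations, which we assume matches the method named above):

    import numpy as np

    def bilinear_kernel(size):
        """2D bilinear interpolation kernel of shape (size, size).

        Weights decay linearly with distance to the kernel center, so a
        stride-2 deconvolution initialized this way (and kept frozen)
        performs plain bilinear upsampling.
        """
        factor = (size + 1) // 2
        center = factor - 1 if size % 2 == 1 else factor - 0.5
        rows, cols = np.ogrid[:size, :size]
        return ((1 - abs(rows - center) / factor) *
                (1 - abs(cols - center) / factor))

    # A 4 x 4 kernel for a stride-2 (x2) deconvolution layer
    print(bilinear_kernel(4))

Freezing the learning rate and weight decay of these layers at zero, as described above, keeps the deconvolutions acting as fixed bilinear interpolation throughout training.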
3.3 Evaluation metrics of accuracy
Many evaluation criteria have been proposed and are frequently used to assess the accuracy of any type of technique for semantic segmentation. The most popular metrics for semantic segmentation, currently used to measure how per-pixel labeling methods perform on a task, are the pixel accuracy (PA), mean pixel accuracy (MPA), mean intersection over union (MIoU), and frequency weighted intersection over union (FWIoU) (Garcia-Garcia, Orts-Escolano, Oprea, Villena-Martinez, & Garcia-Rodriguez, 2017). For explanatory purposes, the following notation is used: for the k + 1 classes, p_uv is the number of pixels of class u predicted to belong to class v. That is, p_uu represents the number of true positives, whereas p_uv and p_vu are false positives and false negatives, respectively. With this notation, the PA, MPA, MIoU, and FWIoU, respectively, can be formulated as follows:

PA = \frac{\sum_{u=0}^{k} p_{uu}}{\sum_{u=0}^{k} \sum_{v=0}^{k} p_{uv}}    (1)
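A minimal sketch of these metrics computed from a confusion matrix (NumPy; the helper is ours, with MPA, MIoU, and FWIoU following their usual definitions from the survey cited above, since only Equation (1) is reproduced here):

    import numpy as np

    def segmentation_metrics(p):
        """PA, MPA, MIoU, FWIoU from a (k+1) x (k+1) confusion matrix p,
        where p[u, v] counts pixels of class u predicted as class v."""
        tp = np.diag(p)                      # true positives per class, p_uu
        gt = p.sum(axis=1)                   # ground-truth pixels per class
        pred = p.sum(axis=0)                 # predicted pixels per class
        iou = tp / (gt + pred - tp)          # per-class intersection over union

        pa = tp.sum() / p.sum()              # Equation (1)
        mpa = (tp / gt).mean()               # mean per-class pixel accuracy
        miou = iou.mean()                    # mean IoU
        fwiou = (gt / p.sum() * iou).sum()   # frequency-weighted IoU
        return pa, mpa, miou, fwiou

    # Toy three-class example (rows: ground truth, columns: prediction)
    p = np.array([[50., 2., 3.],
                  [4., 40., 1.],
                  [2., 2., 46.]])
    print(segmentation_metrics(p))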
FIGURE 10 (a) PAs, (b) MPAs, (c) MIoUs, and (d) FWIoUs over iterations
Figure 10 shows that the FCN eventually yields lower accuracy when adopting either the smaller or the larger learning rate. As a result, the intermediate learning rate of 1 × 10⁻¹⁰ is chosen as the best learning rate. In our training, we also attempted to use larger learning rates, but this led to losses surging unavoidably, and the trainings were difficult to converge.
After the training process, the trained FCN model with the 1 × 10⁻¹⁰ learning rate achieves the highest PA of 98.61%, MPA of 91.59%, MIoU of 84.53%, and FWIoU of 97.34%. Therefore, the trained model with the 1 × 10⁻¹⁰ learning rate is used in the testing process and for extracting damages from images, which is detailed in Section 5.
5 CALIBRATING EXPERIMENT AND TESTING THE TRAINED AND VALIDATED FCN
To compute the areas of detected damages, a ratio between the number of damage pixels (the pixel area) and the true area of the damage is necessary. Considering that this ratio changes with the distance of the smartphone camera from the surface of the detected objects, the relation between the ratio and the distance is calibrated under laboratory conditions. To examine the performance of the trained and validated FCN from the previous section, extensive tests are conducted, and the damages in the tested images are extracted according to the predicted testing results and the calibrated equation.
5.1 Calibrating the relation between the ratio and the distance
As shown in Figure 11a, to calibrate the relation between the ratio (R) of the pixel area to the true area and the distance (D) from the smartphone to the surface of the detected target, experiments are performed in a quasi-static sense. During the experiments, a target (a black solid circle with a diameter of 50 mm) is set, and the smartphone, fixed on a linear guide, is moved from its initial position (100 mm) to its maximum position (550 mm) in steps of 10 mm and then back to complete one cycle, while the distance between the target and the smartphone is measured using a laser range finder. The experiment is repeated five times, and the experimental results are curve-fitted in Figure 11b, where the unit of R is 1 w pixels/mm² (10,000 pixels/mm²).
FIGURE 11 Calibration experiment for the relation between the ratio and the distance: (a) apparatus, (b) fitted curve R = 2,144.7 × D^(−2.123) with a coefficient of determination of 0.9998 (R in w pixels/mm², D in mm)
In Figure 11b, the fitted equation serves as the calibration equation relating the ratio R to the actual distance D from the target to the smartphone camera. Based on this equation, the true area, rather than the number of pixels, can be computed accordingly.
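A minimal sketch of how the calibration is applied in practice (plain Python; the function names are ours, and the constants come from the fitted curve in Figure 11b):

    def pixel_ratio(distance_mm):
        """Calibrated ratio R in w pixels/mm^2 (1 w = 10,000 pixels)."""
        return 2144.7 * distance_mm ** -2.123

    def true_area_mm2(damage_pixels, distance_mm):
        """Convert a predicted damage pixel count into a physical area.

        Because R is expressed in units of 10,000 pixels/mm^2, the pixel
        count is divided by 10,000 * R to obtain mm^2.
        """
        return damage_pixels / (pixel_ratio(distance_mm) * 10_000)

    # Example: 120,000 damage pixels predicted at a shooting distance of 300 mm
    print(true_area_mm2(120_000, 300.0))  # roughly 1.0e3 mm^2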
5.2 Testing the FCN and extracting damages from images
The remaining 550 images, which are not used in the training and validation processes, are used to test the trained FCN.
FIGURE 12 True areas and prediction areas of testing images using the trained FCN: (a) cracks, (b) spalling, (c) efflorescence, and (d) holes (each panel plots the prediction area against the true area in mm², with the line y = x for reference)
FIGURE 13 Examples of predicted results and extracted cracks: (a) normal surface, (b) rough surface, (c) two cracks, and (d) shadowed image
FIGURE 14 Examples of predicted results and extracted spalling: (a) normal surface, (b) rough surface, (c) surface with crack-like disturbance, and (d) surface with two spalling damages
FIGURE 15 Examples of predicted results and extracted efflorescence: (a) normal surface, (b) rough surface, (c) decentralized efflorescence, and (d) unclear efflorescence
FIGURE 16 Examples of predicted results and extracted holes: (a) normal surface, (b) rough surface, (c) surface with stain, and (d) shadowed image
The testing duration is recorded as 0.05 s for each image. Figure 12 presents the true areas (damage areas in the ground truths) and the prediction areas (damage areas in the prediction results) for the 550 testing images, where the areas are computed according to the recorded distance from the smartphone to the concrete surface and the calibrated equation in Figure 11b (where the ratio of the resized images is also considered). It is observed that the prediction areas PA_FCN essentially agree with the true areas TA in Figures 12a, 12b, and 12d, but, in Figure 12c, the accuracy of detecting efflorescence damages is not as high as expected. One possible reason for this phenomenon is that efflorescence damages are very similar to undamaged concrete surfaces. Especially when the light intensity is very strong, it is difficult to distinguish efflorescence damage from a concrete surface with reflected light. Some of the tested images, which include the four types of concrete damage and are taken under various conditions, are chosen and presented in Figures 13–18, where the prediction results are the testing outputs of the trained and validated FCN.
Figure 13 shows images of concrete cracks under various conditions. Figures 13a, 13b, and 13c show suitable prediction results on images from normal and rough surfaces with one crack and from a surface with two cracks. However, in Figure 13d, a black stain is treated as a crack because it is linked to the crack and its color is similar to that of the crack, although the crack itself is also correctly detected.
Images with concrete spalling are shown in Figure 14. Because of the large area of the concrete spalling damage, the spalling is successfully detected on normal and rough surfaces. Images from surfaces with a crack-like disturbance and with two spalling damages are also used for testing in Figures 14c and 14d, and all the spalling damages are recognized correctly.
Figure 15 presents images with concrete efflorescence, where the efflorescence damage is satisfactorily detected using the trained FCN. In Figures 15a and 15b, the edges of the efflorescence damage are wrongly identified as background, and some efflorescence with very small areas is not detected in Figures 15c and 15d.
Images containing concrete holes under real-world conditions are shown in Figure 16, where the concrete holes are satisfactorily extracted. As mentioned, holes smaller than 1,000 pixels are neglected. As a result, some tiny holes are not detected, because the FCN did not learn the features of tiny holes.
FIGURE 19 True areas and prediction areas of testing images using the trained SegNet: (a) cracks, (b) spalling, (c) efflorescence, and (d) holes (each panel plots the prediction area against the true area in mm², with the line y = x for reference)
Figure 17 shows the predicted results for images with multiple damages; similar to Figure 16, all the damages are satisfactorily detected in Figures 17b and 17c. However, the FCN fails to detect the hazy part of the efflorescence in Figure 17a, and the thin part of a crack is not detected in Figure 17d, where the width of the crack at the thin part is approximately 2 pixels (about 0.2 mm).
Typical examples of incorrectly predicted results and extracted damages are shown in Figure 18. In Figure 18a, a small number of gray-white background pixels are classified as efflorescence damage because the color of the background pixels is very similar to efflorescence. An image with a thin crack is presented in Figure 18b, where the FCN fails to detect the thin crack pixels; the width of the broken parts of the thin crack is approximately 2 pixels. In Figure 18c, some spalling pixels are wrongly classified as a hole. In Figure 18d, the background pixels at the top edge are incorrectly detected as crack pixels; Figure 18d also shows a truly poor prediction result, where the FCN incorrectly predicts small holes as background and a continuous crack is disconnected. These errors can be minimized by developing a bigger training database. Besides, compared with the ground truth, the FCN provides a smoother boundary of the efflorescence damage, which leads to incorrect detection of damage edges.
Despite these minor errors, the results demonstrate the robust performance of our FCN-based method for the detection of multiple damages of concrete structures. The minor errors may be caused by the small training database. Therefore, a larger database with more images of concrete damages under various conditions will be generated to improve the method's capacity and generalization in our future studies.
To apply this trained model in practice, the shooting distance D between the smartphone and the concrete surface must be measured when taking damage images; the measuring process can be easily completed using a tape measure or a laser range finder. Then the ratio R of the pixel area to the true area can be calculated according to the calibrated equation in Figure 11b. After that, the obtained damage images can be input to the trained FCN model to predict the class and location of the damages in the input images. Finally, the damages in the images can be extracted according to the prediction results, and the damage areas are computed using the prediction and the ratio R.
FIGURE 20 Comparison of prediction results for the proposed FCN and the SegNet
6 COMPARATIVE STUDY
To compare the performance of the proposed FCN-based approach with a state-of-the-art semantic segmentation method, the built database, including concrete cracks, spalling, efflorescence, and holes, is used for training the SegNet model (Badrinarayanan, Kendall, & Cipolla, 2017). The SegNet is a deep convolutional encoder–decoder architecture for image segmentation that performs well on road-scene and indoor-scene segmentation tasks. To adapt to the input size of the SegNet, all the images and their labels in the training, validation, and testing sets are resized to a 480 × 360 pixel resolution to generate a new database for SegNet training.
The SegNet is trained with a momentum of 0.9 and a weight decay of 0.0005 for 100,000 iterations. Because of the smaller input size, the batch size is set to four images to accelerate the convergence speed and add to the robustness of the trained model.
To choose the best initial learning rate for the SegNet, three fixed learning rates (0.01, 0.001, and 0.0001) are set. The trained SegNet models are validated and saved every 1,000 iterations. With a GPU boosting the training processes, the recorded training time for the SegNet is approximately 11 hr. In the training process, the highest validation PA, MPA, MIoU, and FWIoU are 98.62%, 98.16%, 83.82%, and 96.84%, recorded with the 0.001 learning rate. A comparison of the best results in the validation processes of the FCN and the SegNet is listed in Table 4. It shows that although the PA and MPA of the FCN are lower than those of the SegNet, the FCN achieves higher MIoU and FWIoU than the SegNet. Therefore, our FCN presents good performance in the concrete damage detection problem.
TABLE 4 Comparison results of the FCN and SegNet

Method    PA (%)    MPA (%)    MIoU (%)    FWIoU (%)
FCN       98.61     91.59      84.53       97.34
SegNet    98.62     98.16      83.82       96.84

Note: Boldface values present the higher PA, MPA, MIoU, and FWIoU of the two methods.
Similar to the FCN testing, the trained SegNet model is tested using the testing set in the new database. The recorded testing duration is 0.05 s for each image, which is shorter than that of the FCN because of the smaller input size. Figure 19 shows the true areas and prediction areas for the 550 testing images using the trained SegNet, where the areas are computed according to the recorded distance from the smartphone to the concrete surface and the calibrated equation in Figure 11b (where the ratio of the resized images is also considered). It can be found that the SegNet provides good predictions for spalling damages, but the prediction areas for crack, efflorescence, and hole damages are generally larger than their true areas.
Figure 20 presents a comparison of the prediction results of the proposed FCN and the SegNet, where PA_SegNet is the prediction area of the SegNet. Figure 20a shows images with crack damage, where the crack is separated into two fragments by the FCN because the thin part of the crack is not detected. The SegNet recognizes the whole crack, but the prediction area is larger than its true area. In Figure 20b, the FCN provides a satisfactory prediction for a crack, but the SegNet incorrectly detects background as spalling and a hole. For the detection of the spalling and efflorescence damages in Figures 20c and 20d, both the FCN and the SegNet provide good predictions. In Figure 20e, a stain on the background is wrongly recognized as spalling damage by the SegNet, but the FCN successfully detects the holes in the image.
From the comparison of the FCN and the SegNet, it can be concluded that the FCN has better performance in detecting concrete damages, because the SegNet predicts larger damage areas than the true areas for crack, efflorescence, and hole damages. In addition, a crucial advantage of the FCN is that it applies DenseNet-121 to extract damage features, which significantly reduces the number of parameters and the size of the trained model. As a result, the small FCN model can be integrated into a smartphone, and a smartphone equipped with our FCN model can be used to detect concrete damages conveniently. This advantage will significantly add to the applicability of the FCN-based method.
7 CONCLUSIONS
A damage detection method based on the FCN is proposed to detect four types of concrete damage: cracks, spalling, efflorescence, and holes. A smartphone was used to collect 1,375 raw images with a 4,032 × 3,016 pixel resolution. For the collected images, the pixel locations of all four damage types and their corresponding labels were specified. The raw images and labeled images were resized to a 504 × 376 pixel resolution to reduce the computation of the training processes. Then, horizontal flipping was performed on the database for data augmentation. After data augmentation, the numbers of images used for training, validation, and testing were 2,000, 200, and 550, respectively, among the labeled images. The database is open source and can be downloaded from https://drive.google.com/open?id=1Odq4jzZyj-urfxC25bvgtLLO1yqbC-89. A strategy of model-based transfer learning was used to initialize the weight and bias parameters of the FCN. To find the best training model of the FCN, the best learning rate was selected via trial and error. Based on the best training, the FCN achieved the highest PA of 98.61%, MPA of 91.59%, MIoU of 84.53%, and FWIoU of 97.34%. The robustness of the trained FCN was tested using the 550 testing images not used for training and validation. In addition, the performance of the trained FCN was compared with that of a SegNet-based method. The comparative study showed that the proposed FCN-based method can provide good detection results for damages and has the significant advantage that the number of parameters of the FCN is smaller than that of the SegNet, which leads to a smaller trained model for the FCN.
The proposed FCN was strong at detecting the concrete damages (cracks, spalling, efflorescence, and holes) and showed low levels of noise. It is a significant advantage that the FCN can learn damage features from a large amount of training data. However, it also means that an FCN-based method requires a large number of images to train a robust FCN model. One common shortcoming of almost all vision-based approaches, including the methods of IPTs, CNNs, and FCNs, is the inability to detect the depth of damages due to the nature of flattened photographic images.
In future studies, more images with more types of concrete damage under various conditions will be provided and added to the existing database to increase the robustness of the proposed method, and comparative studies will also be performed.
ACKNOWLEDGMENT
This work was supported by the National Key Research and Development Programs of China during the Thirteenth Five-Year Plan Period (Grant 2016YFC0802002-03 and Grant 2016YFE0202400).
REFERENCES
Abdel-Qader, I., Abudayyeh, O., & Kelly, M. (2003). Analysis of edge-detection techniques for crack identification in bridges. Journal of Computing in Civil Engineering, 17(4), 255–263.
Amezquita-Sanchez, J., & Adeli, H. (2015). Synchrosqueezed wavelet transform-fractality model for locating, detecting, and quantifying damage in smart highrise building structures. Smart Materials and Structures, 24(6), 065034.
Badrinarayanan, V., Kendall, A., & Cipolla, R. (2017). SegNet: A deep convolutional encoder-decoder architecture for scene segmentation. IEEE Transactions on Pattern Analysis & Machine Intelligence, 39(12), 2481–2495.
Butcher, J., Day, C., Austin, J., Haycock, P., Verstraeten, D., & Schrauwen, B. (2014). Defect detection in reinforced concrete using random neural architectures. Computer-Aided Civil and Infrastructure Engineering, 29(3), 191–207.
Cha, Y.-J., Choi, W., & Büyüköztürk, O. (2017a). Deep learning-based crack damage detection using convolutional neural networks. Computer-Aided Civil and Infrastructure Engineering, 32(5), 361–378.
Cha, Y.-J., Choi, W., Suh, G., Mahmoudkhani, S., & Büyüköztürk, O. (2017b). Autonomous structural visual inspection using region-based deep learning for detecting multiple damage types. Computer-Aided Civil and Infrastructure Engineering, 33(9), 731–747.
Cha, Y.-J., You, K., & Choi, W. (2016). Vision-based detection of loosened bolts using the Hough transform and support vector machines. Automation in Construction, 71, 181–188.
Chen, L., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. (2015). Semantic image segmentation with deep convolutional nets and fully connected CRFs. arXiv preprint arXiv:1412.7062.
Chen, P., Shen, H., Lei, C., & Chang, L. (2012). Support-vector-machine-based method for automated steel bridge rust assessment. Automation in Construction, 23(5), 9–19.
Cheng, H., Chen, J., Glazier, C., & Hu, Y. (1999). Novel approach to pavement cracking detection based on fuzzy set theory. Journal of Computing in Civil Engineering, 13(4), 270–280.
Fu, J., Liu, J., Wang, Y., & Lu, H. (2017). Stacked deconvolutional network for semantic segmentation. arXiv preprint arXiv:1708.04943.
Fujita, Y., Mitani, Y., & Hamamoto, Y. (2006). A method for crack detection on a concrete structure. Proceedings of the International Conference on Pattern Recognition, pp. 901–904.
Gao, Y., & Mosalam, K. (2018). Deep transfer learning for image-based structural damage recognition. Computer-Aided Civil and Infrastructure Engineering, 33(9), 748–768.
Garcia-Garcia, A., Orts-Escolano, S., Oprea, S., Villena-Martinez, V., & Garcia-Rodriguez, J. (2017). A review on deep learning techniques applied to semantic segmentation. arXiv preprint arXiv:1704.06857.
Gavilán, M., Balcones, D., Marcos, O., Llorca, D. F., Sotelo, M. A., Parra, I., … Amírola, A. (2011). Adaptive road crack detection system by pavement classification. Sensors, 11(10), 9628–9657.
German, S., Brilakis, I., & DesRoches, R. (2012). Rapid entropy-based detection and properties measurement of concrete spalling with machine vision for post-earthquake safety assessments. Advanced Engineering Informatics, 26(4), 846–858.
Graybeal, B. A., Phares, B. M., Rolander, D. D., Moore, M., & Washer, G. (2002). Visual inspection of highway bridges. Journal of Nondestructive Evaluation, 21(3), 67–83.
He, K., Zhang, X., Ren, S., & Sun, J. (2014). Spatial pyramid pooling in deep convolutional networks for visual recognition. Proceedings of the 13th European Conference on Computer Vision, Zurich, Switzerland, September 6–12, pp. 346–361.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
Huang, G., Liu, Z., van der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
Huang, Y., & Xu, B. (2006). Automatic inspection of pavement cracking distress. Journal of Electronic Imaging, 15(1), 013017.
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., … Darrell, T. (2014). Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093.
Jiang, X., & Adeli, H. (2007). Pseudospectra, MUSIC, and dynamic wavelet neural network for damage detection of highrise buildings. International Journal for Numerical Methods in Engineering, 71(5), 606–629.
Kirschke, K., & Velinsky, S. (1992). Histogram-based approach for automated pavement-crack sensing. Journal of Transportation Engineering, 118(5), 700–710.
Koch, C., & Brilakis, I. (2011). Pothole detection in asphalt pavement images. Advanced Engineering Informatics, 25(3), 507–515.
Koziarski, M., & Cyganek, B. (2017). Image recognition with deep neural networks in presence of noise—Dealing with and taking advantage of distortions. Integrated Computer-Aided Engineering, 24(4), 337–350.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Proceedings of the Neural Information Processing Systems Conference, Stateline, NV, December 3–8.
Lee, B., & Lee, H. (2004). Position-invariant neural network for digital pavement crack analysis. Computer-Aided Civil and Infrastructure Engineering, 19(2), 105–118.
Li, G., Zhao, X., Du, K., Ru, F., & Zhang, Y. (2017). Recognition and evaluation of bridge cracks with modified active contour model and greedy search-based support vector machine. Automation in Construction, 78, Supplement C, 51–61.
Lin, Y., Nie, Z., & Ma, H. (2017). Structural damage detection with automatic feature-extraction through deep learning. Computer-Aided Civil and Infrastructure Engineering, 32(12), 1025–1046.
Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440.
Moon, H., & Kim, J. (2011). Intelligent crack detecting algorithm on the concrete crack image using neural network. Proceedings of the 28th ISARC, pp. 1461–1467.
Moselhi, O., & Shehab-Eldeen, T. (2000). Classification of defects in sewer pipes using neural networks. Journal of Infrastructure Systems, 6(3), 97–104.
Nair, V., & Hinton, G. E. (2010). Rectified linear units improve restricted Boltzmann machines. Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, June 21–24, pp. 807–814.
Ni, F., Zhang, J., & Chen, Z. (2018). Pixel-level crack delineation in images with convolutional feature fusion. Structural Control and Health Monitoring, e2286, 1–18.
Nishikawa, T., Yoshida, J., Sugiyama, T., & Fujino, Y. (2012). Concrete crack detection by multiple sequential image filtering. Computer-Aided Civil and Infrastructure Engineering, 27(1), 29–47.
O'Byrne, M., Schoefs, F., Ghosh, B., & Pakrashi, V. (2013). Texture analysis based damage detection of ageing infrastructural elements. Computer-Aided Civil and Infrastructure Engineering, 28(3), 162–177.
Oliveira, H., & Correia, P. (2009). Automatic road crack segmentation using entropy and image dynamic thresholding. Proceedings of the IEEE Signal Processing Conference, pp. 622–626.
Pan, S., & Yang, Q. (2010). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10), 1345–1359.
Rafiei, M., & Adeli, H. (2017). A novel machine learning-based algorithm to detect damage in high-rise building structures. Structural Design of Tall and Special Buildings, 26(18), e1400.
Rafiei, M., & Adeli, H. (2018). A novel unsupervised deep learning model for global and local health condition assessment of structures. Engineering Structures, 156, 598–607.
Rafiei, M., Khushefati, W., Demirboga, R., & Adeli, H. (2017). Supervised deep restricted Boltzmann machine for estimation of concrete compressive strength. ACI Materials Journal, 114(2), 237–244.
Ren, S., He, K., Girshick, R., & Sun, J. (2017). Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis & Machine Intelligence, 39(6), 1137–1149.
Rudin, L., Osher, S., & Fatemi, E. (1992). Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena, 60(1–4), 259–268.
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, May 7–9.
Sinha, S., & Fieguth, P. (2006). Automated detection of cracks in buried concrete pipe images. Automation in Construction, 15(1), 58–72.
Soukup, D., & Huber-Mörk, R. (2014). Convolutional neural networks for steel surface defect detection from photometric stereo images. Proceedings of the International Symposium on Visual Computing, pp. 668–677.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1), 1929–1958.
Subirats, P., Dumoulin, J., Legeay, V., & Barba, D. (2006). Automation of pavement surface crack detection using the continuous wavelet transform. Proceedings of the IEEE Image Processing Conference, pp. 3037–3040.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., … Rabinovich, A. (2015). Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9.
Wang, W., Zhang, A., Wang, K., Braham, A., & Qiu, S. (2018). Pavement crack width measurement based on Laplace's equation for continuity and unambiguity. Computer-Aided Civil and Infrastructure Engineering, 33(12), 110–123.
Wilson, D., & Martinez, T. (2001). The need for small learning rates on large problems. Proceedings of the International Joint Conference on Neural Networks, pp. 115–119.
Xue, Y., & Li, Y. (2018). A fast detection method via region-based fully convolutional neural networks for shield tunnel lining defects. Computer-Aided Civil and Infrastructure Engineering, 33(8), 638–654.
Yamaguchi, T., Nakamura, S., Saegusa, R., & Hashimoto, S. (2008). Image-based crack detection for real concrete surfaces. IEEJ Transactions on Electrical and Electronic Engineering, 3(1), 128–135.
Yang, X., Li, H., Yu, Y., Luo, X., & Huang, T. (2018). Automatic pixel-level crack detection and measurement using fully convolutional network. Computer-Aided Civil and Infrastructure Engineering, 33(12), 1090–1109.
Ying, L., & Salari, E. (2010). Beamlet transform-based technique for pavement crack detection and classification. Computer-Aided Civil and Infrastructure Engineering, 25(8), 572–580.
Zalama, E., Gómez-García-Bermejo, J., Medina, R., & Llamas, J. (2014). Road crack detection using visual features extracted by Gabor filters. Computer-Aided Civil and Infrastructure Engineering, 29(5), 342–358.
Zhang, A., Wang, K., Li, B., Yang, E., Dai, X., Peng, Y., … Chen, C. (2017). Automated pixel-level pavement crack detection on 3D asphalt surfaces using a deep-learning network. Computer-Aided Civil and Infrastructure Engineering, 32(10), 805–819.
Zhang, D., Li, Q., Chen, Y., Cao, M., He, L., & Zhang, B. (2016a). An efficient and reliable coarse-to-fine approach for asphalt pavement crack detection. Image and Vision Computing, 57, 130–146.
Zhang, L., Yang, F., Zhang, Y., & Zhu, Y. (2016b). Road crack detection using deep convolutional neural network. Proceedings of the IEEE International Conference on Image Processing, pp. 3708–3712.
Zhao, X., & Li, S. (2017). A method of crack detection based on convolutional neural networks. Proceedings of the 11th International Workshop on Structural Health Monitoring, pp. 978–984.
Ziou, D., & Tabbone, S. (1998). Edge detection techniques—An overview. International Journal of Pattern Recognition and Image Analysis, 8(4), 537–559.
Zou, Q., Cao, Y., Li, Q., Mao, Q., & Wang, S. (2012). CrackTree: Automatic crack detection from pavement images. Pattern Recognition Letters, 33(3), 227–238.

How to cite this article: Li S, Zhao X, Zhou G. Automatic pixel-level multiple damage detection of concrete structure using fully convolutional network. Comput Aided Civ Inf. 2019;34:616–634. https://doi.org/10.1111/mice.12433