Automated Detection of Underwater Cracks Based On Fusion - 2024 - Engineering S
Engineering Structures
journal homepage: www.elsevier.com/locate/engstruct
A R T I C L E I N F O

Keywords:
Underwater crack detection
Optical images
Texture information
Data fusion
Semantic segmentation

A B S T R A C T

This paper presents a novel method for underwater crack detection that fuses optical and texture information. Underwater crack images are degraded by the harsh underwater environment, which makes crack detection challenging and makes it especially difficult to capture crack details accurately. To improve detection accuracy, more feature information related to cracks is needed. Therefore, this paper proposes a dual-input-branch semantic segmentation model that fuses optical and texture information, and introduces the Convolutional Block Attention Module (CBAM) to enhance the performance of the semantic segmentation model. The optimal network architecture was determined by selecting the backbone network and optimizer, and a custom Tversky loss function was introduced to make the semantic segmentation model pay more attention to the crack area. The results show that the detection accuracy, IoU, and F1-Score reach 96.07 %, 0.95, and 0.96, respectively. Multiple comparative experiments validated the effectiveness of the proposed method; in particular, compared with the method without texture fusion, the accuracy, IoU, and F1-Score increased by 3.30 %, 6.74 %, and 7.88 %, respectively. Finally, the operational mechanism of the proposed method was explained by visualizing how crack features vary within the detection model. This confirms that the proposed method significantly improves the accuracy of underwater crack detection and provides a novel approach for underwater defect detection.
* Corresponding author.
E-mail address: [email protected] (A. Liu).
https://doi.org/10.1016/j.engstruct.2024.118515
Received 18 April 2024; Received in revised form 11 June 2024; Accepted 24 June 2024
Available online 28 June 2024
0141-0296/© 2024 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
S. Teng et al. Engineering Structures 315 (2024) 118515
low-quality images pose challenges to the crack feature extraction process, resulting in high false positive or false negative rates in traditional image processing techniques [12]. Additionally, because of their complex computational processes, traditional digital image processing methods suffer from drawbacks such as slow detection speed and a high level of manual intervention.

The development of deep learning technology has provided an opportunity to overcome the limitations of traditional image processing techniques [13,14]. Deep learning algorithms can automatically extract image features to detect cracks in infrastructure [15,16]. Convolutional neural networks (CNNs) are the most representative deep learning algorithms and have achieved remarkable results in crack classification [17], object detection [18], semantic segmentation [19], and other fields, providing a potential solution for underwater crack detection. Ma et al. [20] used underwater images obtained by underwater robots to train YOLO-v3 networks, achieved relatively accurate crack segmentation results by combining them with grayscale fluctuation analysis, and calculated the crack width from the segmentation results. To improve segmentation efficiency, Cao et al. [21] stitched underwater crack images with an image stitching algorithm and implemented crack segmentation using graph CNNs. To overcome the problem of insufficient underwater samples, Li et al. [22] enhanced the segmentation accuracy of semantic segmentation models in small-sample scenarios through a two-stage transfer learning strategy, achieving satisfactory segmentation results. The above studies confirm that the powerful feature extraction capabilities of deep learning provide a more intelligent approach to underwater crack detection.

The aforementioned studies still rely solely on optical images, which are affected by factors such as lighting and turbidity; these factors are the main causes of reduced detection accuracy. To improve detection accuracy, it is necessary to obtain more feature information related to defects. Optical images consist of three channels: R (Red), G (Green), and B (Blue). Red light, which has the longest wavelength of the three, is rapidly absorbed in water, causing the R channel to lose contrast and detail and resulting in color distortion in the image. The G channel usually retains the details and contrast of underwater scenes well because green light propagates relatively well in water. The B channel emphasizes underwater colors and details because blue light can penetrate further in water and maintain visibility in deep water. Therefore, this paper reduces the influence of the underwater environment by eliminating the red channel. Meanwhile, the image texture provides rich information about surface details and structures in the image, which helps to distinguish different objects or scenes. This paper fuses the texture information with the green and blue channel information of optical images to enrich the features of cracks, thereby improving the accuracy
of crack detection.

2. Method

The main research idea of this paper is as follows: the optical information from the G and B channels of underwater images, together with texture information calculated by the stdfilt algorithm, serves as the input to the deep learning model. A custom-designed deep learning model then fuses this information. Additionally, several improvement strategies are proposed to enhance the accuracy of crack detection. The implementation process is illustrated in Fig. 1.

2.1. Texture information acquisition

In this paper, the stdfilt algorithm is used to obtain texture information from underwater crack images. The stdfilt algorithm is a filtering technique that computes the local standard deviation around each pixel in an image. The standard deviation measures the dispersion of a set of data, so the stdfilt algorithm can be used to highlight details such as texture and edges in an image. It has wide applications in image processing, including edge detection, texture analysis, and feature extraction. The calculation is as follows (Eq. 1): for each pixel position (i, j), the algorithm computes the standard deviation within a chosen window. Taking the current pixel position as the center, the algorithm extracts the pixel values within the window and then calculates the standard deviation of these pixel values:

$$\sigma(i,j) = \sqrt{\frac{1}{n^{2}} \sum_{k=1}^{n} \sum_{l=1}^{n} \left( I\left(i + k - \frac{n+1}{2},\; j + l - \frac{n+1}{2}\right) - \mu_{i,j} \right)^{2}} \tag{1}$$

where σ(i, j) represents the local standard deviation at position (i, j), I(i + k − (n+1)/2, j + l − (n+1)/2) represents the pixel value within the window, k and l iterate over the window, μi,j represents the mean of the pixel values within the window, and n represents the size of the window.

The local standard deviation calculated by the stdfilt algorithm is used as the pixel value of the output image, forming a new image, as shown in Fig. 2. This new image displays the edges of objects more clearly. Each pixel value in the output image represents the standard deviation of the local pixel values at its corresponding position, which helps highlight the detail information in the image. In this paper, the local standard deviation of underwater crack images is calculated by the stdfilt algorithm as texture information, and this texture information is combined with the G and B channels of optical images as inputs to the deep learning model to achieve underwater crack detection.

2.2. Semantic segmentation network

This paper adopts the DeepLabv3+ model [23] as the base framework and improves it to achieve high-precision detection of underwater crack images. DeepLabv3+ aims to address the lack of object boundary details and contextual information in the semantic segmentation task. It is an evolution of the DeepLab series of models, which further enhances the performance of semantic segmentation by introducing techniques such as dilated convolution, multi-scale feature fusion, and decoder modules. Its basic architecture is illustrated in Fig. 3.

DeepLabv3+ employs dilated convolutions to expand the receptive field of CNNs, thereby increasing the capability to capture contextual information from input images [24]. Dilated convolutions introduce dilation rates in the convolutional kernels, allowing for an increased receptive field without increasing the number of parameters. Underwater images often suffer from issues such as granularity and blurriness. Dilated convolutions help in extending the receptive field, better capturing contextual information from the image, which aids in
improving the accuracy of underwater crack detection.

To fully utilize information at different scales within images, DeepLabv3+ employs a strategy called multi-scale feature fusion. By introducing dilated convolution layers with different dilation rates into the network, along with the Atrous Spatial Pyramid Pooling (ASPP) module, DeepLabv3+ is capable of simultaneously leveraging features at both global and local scales [25]. Underwater cracks may exhibit different scales and shapes, thus necessitating a model capable of effectively capturing information at various scales. Through the multi-scale feature fusion technique, DeepLabv3+ can effectively utilize features from both global and local scales, thereby enhancing its ability to detect underwater cracks.

DeepLabv3+ also introduces a decoder module, which is used to restore image resolution and refine segmentation results. The decoder module generates the final semantic segmentation result through up-sampling and feature-level fusion, contributing to improved segmentation accuracy and detail [26]. Underwater images are often affected by factors such as water quality and lighting, resulting in poor image quality. By incorporating the decoder module, DeepLabv3+ can restore image resolution and refine segmentation results, thereby enhancing the accuracy of underwater crack detection.

Building upon the basic architecture of DeepLabv3+, this paper proposes a dual-input semantic segmentation model, named FOT-DeepLabv3+, to fuse the optical and texture information. Additionally, it introduces the CBAM to enhance the crack feature extraction performance of the semantic segmentation model. This model is depicted in Fig. 4.

The improvements in this paper mainly include the following two aspects:

(1) The improved network model consists of dual-input branches (Branch A and Branch B). The G and B channels of optical images are input into Branch A (the raw DeepLabv3+ network is used to extract features from optical images), while the texture information is input into Branch B (a lightweight CNN is used to extract features from texture information). Finally, the feature fusion is performed after the ‘Concat’ layer. The optical image provides contrast and color information of the cracks, while the texture information provides more details and edge information. Fusion of these two types of information enhances the image’s contrast, making it easier to detect the presence of underwater cracks.

(2) The CBAM is an attention mechanism used to enhance the performance of CNNs [27], as depicted in Fig. 5. It dynamically adjusts the weights of feature maps by introducing two mechanisms, channel attention and spatial attention, enabling the network to focus more on crucial features and improving both feature representation and network generalization capabilities, as shown in Fig. 5a. The channel attention mechanism weights the feature maps along the channel dimension to dynamically adjust the importance of different channels, enabling the network to better capture important features, as illustrated in Fig. 5b. The output Mc(F) of the channel attention module is computed using Eq. 2. The spatial attention mechanism, on the other hand, weights the feature maps along the spatial dimension to enhance the network’s attention to different regions, effectively improving the network’s representation capabilities, as shown in Fig. 5c. The output Ms(F) of the spatial attention module is calculated using Eq. 3. In this paper, the CBAM is embedded after the ASPP module with the aim of enhancing the network’s perception and discrimination of crack features through spatial and channel attention mechanisms. The CBAM can help networks adaptively weight important features on each feature map, thereby further improving the representation ability of features.
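The two attention steps of the CBAM can be illustrated with a small, dependency-free sketch. It follows the original CBAM formulation [27], applying channel attention (Eq. 2) first and spatial attention (Eq. 3) second; the nested-list feature map, the random weights, the reduction ratio, and all function names are illustrative assumptions rather than the authors' implementation.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def shared_mlp(vec, w1, w2):
    # Two-layer perceptron with a ReLU hidden layer; the same weights are
    # applied to the average-pooled and the max-pooled channel descriptors.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(feat, w1, w2):
    # Eq. 2: one weight in (0, 1) per channel of feat[C][H][W].
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]
    mx = [max(map(max, ch)) for ch in feat]
    a, m = shared_mlp(avg, w1, w2), shared_mlp(mx, w1, w2)
    return [sigmoid(x + y) for x, y in zip(a, m)]


def spatial_attention(feat, kernel):
    # Eq. 3: pool along the channel axis, stack the two maps, apply a
    # k x k convolution (zero padding) and a sigmoid: one weight per pixel.
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    avg = [[sum(feat[n][i][j] for n in range(c)) / c for j in range(w)] for i in range(h)]
    mx = [[max(feat[n][i][j] for n in range(c)) for j in range(w)] for i in range(h)]
    k = len(kernel[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for plane, ker in zip((avg, mx), kernel):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - r, j + dj - r
                        if 0 <= y < h and 0 <= x < w:
                            s += ker[di][dj] * plane[y][x]
            out[i][j] = sigmoid(s)
    return out


def cbam(feat, w1, w2, kernel):
    # Channel attention rescales each channel, then spatial attention
    # rescales each pixel of the channel-refined map.
    mc = channel_attention(feat, w1, w2)
    refined = [[[mc[n] * v for v in row] for row in ch] for n, ch in enumerate(feat)]
    ms = spatial_attention(refined, kernel)
    return [[[ms[i][j] * v for j, v in enumerate(row)] for i, row in enumerate(ch)]
            for ch in refined]


# Toy example with random (untrained) weights, for shape checking only.
random.seed(0)
C, H, W, REDUCTION = 4, 6, 6, 2
feat = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C // REDUCTION)]
w2 = [[random.uniform(-1, 1) for _ in range(C // REDUCTION)] for _ in range(C)]
kernel = [[[random.uniform(-0.1, 0.1) for _ in range(7)] for _ in range(7)] for _ in range(2)]
out = cbam(feat, w1, w2, kernel)
```

Because both attention maps lie in (0, 1), the output keeps the shape of the input and only rescales it; in a trained network the weights would be learned so that crack-related channels and regions receive the larger factors.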
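As a rough illustration of how the two inputs of the dual-branch model could be prepared, the sketch below keeps the G and B channels for Branch A and computes a local-standard-deviation texture map following Eq. 1 for Branch B. Several details are assumptions of this sketch and are not specified in the text: the texture map is computed on a grayscale image obtained by averaging the three channels, borders are handled by mirroring the image, and the names `local_std` and `prepare_inputs` are hypothetical.

```python
import math


def local_std(gray, n=3):
    # Local standard deviation over an n x n window (n odd), as in Eq. 1.
    # Border pixels are mirrored; this padding rule is an assumption.
    h, w = len(gray), len(gray[0])
    r = n // 2

    def reflect(idx, size):
        if idx < 0:
            return -idx - 1
        if idx >= size:
            return 2 * size - idx - 1
        return idx

    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            window = [gray[reflect(i + di, h)][reflect(j + dj, w)]
                      for di in range(-r, r + 1) for dj in range(-r, r + 1)]
            mu = sum(window) / len(window)
            out[i][j] = math.sqrt(sum((v - mu) ** 2 for v in window) / len(window))
    return out


def prepare_inputs(rgb, n=3):
    # rgb is a list of rows, each row a list of (R, G, B) tuples.
    g = [[px[1] for px in row] for row in rgb]
    b = [[px[2] for px in row] for row in rgb]
    gray = [[sum(px) / 3.0 for px in row] for row in rgb]  # assumed grayscale
    return (g, b), local_std(gray, n)


# Toy 5 x 7 image: uniform background with a darker vertical line in column 3.
background, crack = (10, 100, 120), (0, 20, 30)
rgb = [[crack if col == 3 else background for col in range(7)] for _ in range(5)]
(optical_g, optical_b), texture = prepare_inputs(rgb)
```

In this toy image the texture map is close to zero over the uniform background and large in the columns touching the dark line, which is the edge-highlighting behaviour that the texture branch relies on.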
This, in turn, helps improve the accuracy and robustness of underwater crack detection.

Overall, the contribution of this paper lies in the fusion of optical and texture information through a semantic segmentation model with dual-input branches, and in the introduction of the CBAM to enhance the model’s ability to extract underwater crack features.

$$M_c(F) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big) \tag{2}$$

where F represents the input feature map, AvgPool and MaxPool denote the global average pooling and max pooling operations, respectively, MLP represents a multi-layer perceptron, and σ denotes the Sigmoid activation function.

$$M_s(F) = \sigma\big(f^{7\times 7}([\mathrm{AvgPool}(F);\, \mathrm{MaxPool}(F)])\big) \tag{3}$$

where f^{7×7} denotes a 7 × 7 convolution operation, and [AvgPool(F); MaxPool(F)] represents the concatenation of the results of average pooling and max pooling along the channel axis.

2.3. Architecture optimization

To achieve optimal detection performance, this paper compares the effects of different backbone feature extraction networks (resnet18, resnet50, and mobilenetv2) on the detection results to determine the best image feature extraction strategy. The potential advantages of ResNet are as follows: the ResNet series consists of deep residual networks, which have excellent feature learning capabilities and parameter efficiency [28]. In underwater crack detection, using the ResNet series as the backbone feature extraction network typically improves the model’s performance and robustness. The deep structure of ResNet helps capture richer image features, thereby enhancing detection accuracy. The mobilenetv2 is a lightweight CNN designed for mobile devices, characterized by its smaller model size and lower computational cost [29]. In resource-constrained underwater environments, using the MobileNet series as the backbone feature extraction network can accelerate the model’s inference speed and reduce hardware resource requirements, thus enabling real-time underwater crack detection.

The choice of network training optimizer also significantly impacts detection results. This paper compares the effects of different optimizers (adaptive moment estimation (adam) [30], stochastic gradient descent with momentum (sgdm) [31], and root-mean-square propagation (rmsprop) [32]) on the detection results.

The adam optimizer is an adaptive learning rate optimization algorithm known for its fast convergence speed and good generalization performance. In underwater crack detection, the adam optimizer often enables the model to converge faster to local optima and is relatively easy to tune, making it suitable for most scenarios (Eqs. 9–10).

The sgdm accelerates the optimization process by introducing momentum when updating parameters, especially in directions with large curvature. This helps avoid getting stuck in local minima or saddle points and allows faster convergence to the global or local optima. In tasks such as underwater crack detection, sgdm, as an improvement over stochastic gradient descent, can speed up the model training process, enhance the convergence speed and stability, and facilitate more effective crack detection (Eqs. 11–12).

The rmsprop optimizer is another adaptive learning rate optimization algorithm suitable for optimizing non-stationary objective functions. In underwater crack detection, if the dataset exhibits certain
$$\theta_{t+1} = \theta_t - v_t \tag{10}$$

where θ represents the parameters, g denotes the gradient, α is the learning rate, γ represents the momentum coefficient, and v denotes the velocity (momentum).

$$v_t = \beta \cdot v_{t-1} + (1-\beta)\cdot g_t^{2} \tag{11}$$

$$\theta_{t+1} = \theta_t - \frac{\alpha}{\sqrt{v_t} + \epsilon}\cdot g_t \tag{12}$$

In this paper, four performance indicators were used to evaluate the detection performance of the semantic segmentation model [33]: (a) Accuracy, which measures the accuracy of crack detection (Eq. 14); (b) F1-Score, a comprehensive metric that combines Precision and Recall (Eq. 15); (c) Intersection over Union (IoU), which assesses the overlap ratio between the pixels of the predicted results (Ap) and the ground truth labels (Ar) (Eq. 16); (d) Frames Per Second (FPS), which evaluates the computational efficiency of the semantic segmentation model.

Fig. 8. Detection results with mobilenetv2 as the backbone network and rmsprop as the optimizer.
$$\mathrm{Accuracy} = \frac{1}{2}\left(\frac{TP}{TP+FP} + \frac{TN}{FN+TN}\right) \tag{14}$$

stable power supply guarantee system stability, providing reliable support for training deep learning models of this paper.
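The accuracy of Eq. 14 and the other pixel-level indicators can be computed from the four confusion counts. The following is a minimal sketch on a toy mask: the accuracy follows Eq. 14 as written, while the F1-Score and IoU use their standard definitions, which is an assumption because Eqs. 15 and 16 are not present in this part of the text. The function names are illustrative, and the sketch assumes that none of the denominators is zero.

```python
def confusion_counts(pred, label):
    # Count TP, FP, FN, TN over two binary masks of equal shape (1 = crack).
    tp = fp = fn = tn = 0
    for pred_row, label_row in zip(pred, label):
        for p, y in zip(pred_row, label_row):
            if p and y:
                tp += 1
            elif p:
                fp += 1
            elif y:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def segmentation_metrics(pred, label):
    tp, fp, fn, tn = confusion_counts(pred, label)
    accuracy = 0.5 * (tp / (tp + fp) + tn / (fn + tn))  # Eq. 14
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # assumed standard F1
    iou = tp / (tp + fp + fn)  # assumed standard IoU: overlap / union
    return {"accuracy": accuracy, "f1": f1, "iou": iou}


# Toy 2 x 4 masks: three labelled crack pixels, one missed, one false alarm.
label = [[0, 1, 1, 0],
         [0, 1, 0, 0]]
pred = [[0, 1, 1, 1],
        [0, 0, 0, 0]]
scores = segmentation_metrics(pred, label)
```

For this toy case the counts are TP = 2, FP = 1, FN = 1, and TN = 4, so the accuracy of Eq. 14 is 0.5 × (2/3 + 4/5) ≈ 0.73, the F1-Score is 2/3, and the IoU is 0.5.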
and 0.95, respectively. Some examples of the detection results are shown in Fig. 8, where the detected crack regions align well with the labels. The lower resolution and some non-crack linear features do not affect the detection results; in particular, the FOT-DeepLabv3+ maintains high detection performance for small and narrow cracks. The detection results also confirm that good results can still be achieved when cracks are in an extremely blurry environment. Moreover, owing to its lightweight design, mobilenetv2 offers a significant computational efficiency advantage (Fig. 7d), with an FPS close to 40 under the current hardware conditions, which suggests potential for real-time detection.

This paper uses the Tversky loss as the training loss function. The above results were obtained with α and β both equal to 0.5. According to Section 2.2, α and β have different effects on training effectiveness, so this paper explores the impact of different values on the detection results. (1) With β fixed at 0.5, α ranges from 0.1 to 0.9 (with an interval of 0.1). The results, shown in Fig. 9, indicate that the best performance was achieved when α was 0.1, with an accuracy, IoU, and F1-Score of 96.13 %, 0.93, and 0.95, respectively. The overall trend suggests that detection results tend to become poorer as α increases. Fig. 9d demonstrates that the value of α does not affect computational efficiency. (2) With α fixed at 0.1, β ranges from 0.1 to 0.9 (with an interval of 0.1). The results, shown in Fig. 10, indicate that the best performance was achieved when β was 0.9, with an accuracy, IoU, and F1-Score of 96.07 %, 0.95, and 0.96, respectively. When β was small, the detection performance was poor. The changes in α and β have almost no impact on the FPS (Figs. 9d and 10d).

In summary, by selecting a lightweight backbone network and an optimizer suitable for training on underwater crack images, the FOT-DeepLabv3+ can achieve better detection performance for underwater cracks. Furthermore, adjusting the loss function enables the FOT-DeepLabv3+ to adapt to underwater crack detection scenarios. Specifically, reducing the α value and increasing the β value of the Tversky loss function enhances the feature extraction capability of FOT-DeepLabv3+ for underwater cracks and suppresses the influence of non-cracks on the detection results.

3.2. Comparative Studies

To validate the effectiveness of the proposed method, several comparative experiments were conducted in this study. Firstly, the detection performance without texture fusion was compared. Fig. 11 illustrates the variations in detection results between non-fused and fused texture information. The non-fused texture information achieved an accuracy, IoU, and F1-Score of 93.00 %, 0.89, and 0.89, respectively, while the fused texture information (our method) achieved an accuracy, IoU, and F1-Score of 96.07 %, 0.95, and 0.96, respectively. Thus, the strategy of fusing texture information proposed in this study increased accuracy, IoU, and F1-Score by 3.30 %, 6.74 %, and 7.88 %, respectively. Fig. 12 presents partial examples of the detection results for non-fused and fused texture information. The images show that the detection results without texture fusion contain many noise points, and some non-crack linear objects were incorrectly detected as cracks. For blurry images, many crack details may be lost, especially for small crack targets. In contrast, the fused texture information can address these issues, resulting in detection results that closely match the labels. This confirms that the texture information can provide more robust crack features for the FOT-

Fig. 11. Detection results of non-fused texture information.
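The relative gains quoted in Section 3.2 can be reproduced approximately from the rounded scores given there. The sketch below is a simple check, assuming that each gain is the increase divided by the non-fused score; the helper name `relative_gain` is illustrative.

```python
def relative_gain(baseline, fused):
    # Relative improvement of the fused result over the baseline, in percent.
    return (fused - baseline) / baseline * 100.0


# (non-fused, fused) scores as reported in the comparison.
reported = {
    "accuracy": (93.00, 96.07),
    "iou": (0.89, 0.95),
    "f1": (0.89, 0.96),
}
gains = {name: relative_gain(base, fused) for name, (base, fused) in reported.items()}
```

With these rounded inputs the gains come out at about 3.30 % for accuracy, 6.74 % for IoU, and 7.87 % for F1-Score. The last value differs slightly from the reported 7.88 %, presumably because the published scores are rounded.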
Table 2
Comparison results with other semantic segmentation algorithms.
Evaluation indicators | U-Net | SegNet | FCN (8 s) | Our method (FOT-DeepLabv3+)

Table 3. The results demonstrate that the Tversky loss function used in this study can achieve the best detection results. This confirms that adjusting the α and β values of the Tversky loss function enables FOT-DeepLabv3+ to focus more on the crack regions.
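The effect of α and β can be illustrated with a small numerical sketch. The definition of the Tversky loss is not present in this part of the text, so the sketch assumes the commonly used form TI = TP / (TP + α·FP + β·FN) with loss = 1 − TI, in which α weights false positives and β weights false negatives; the function name, the smoothing term `eps`, and the toy masks are illustrative.

```python
def tversky_loss(prob, label, alpha=0.1, beta=0.9, eps=1e-7):
    # prob: predicted crack probabilities in [0, 1]; label: 0/1 ground truth.
    tp = sum(p * y for p, y in zip(prob, label))
    fp = sum(p * (1 - y) for p, y in zip(prob, label))
    fn = sum((1 - p) * y for p, y in zip(prob, label))
    return 1.0 - (tp + eps) / (tp + alpha * fp + beta * fn + eps)


label = [1, 1, 0, 0, 0, 0, 0, 0]
missed = [1, 0, 0, 0, 0, 0, 0, 0]   # one crack pixel missed (false negative)
extra = [1, 1, 1, 0, 0, 0, 0, 0]    # one background pixel flagged (false positive)

loss_missed_balanced = tversky_loss(missed, label, alpha=0.5, beta=0.5)
loss_missed_weighted = tversky_loss(missed, label, alpha=0.1, beta=0.9)
loss_extra_balanced = tversky_loss(extra, label, alpha=0.5, beta=0.5)
loss_extra_weighted = tversky_loss(extra, label, alpha=0.1, beta=0.9)
```

Under this assumed definition, moving from α = β = 0.5 to α = 0.1, β = 0.9 raises the loss for the missed crack pixel (from about 0.33 to about 0.47) and lowers it for the extra pixel (from 0.20 to about 0.05), which is consistent with the statement that the chosen setting makes the model focus more on the crack regions.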
Fig. 17. Areas of focus in Layer L2.
Fig. 19. Areas of focus in Layer L3.
4. Conclusion

This paper proposes a crack detection method that constructs a specialized semantic segmentation network to fuse the optical and texture information of underwater images, achieving high-precision segmentation of underwater crack images. The texture information contains the boundary information of cracks and is input into the semantic segmentation model to guide it toward more accurate crack detection results. Comparative studies confirm the excellent performance of the proposed method. Finally, the feature visualization analysis reveals the changing patterns of crack features within the semantic segmentation model.

The following conclusions were drawn from the research results:

(1) The underwater crack detection method proposed in this paper, which integrates texture information, achieves satisfactory detection results. Compared with the method without texture fusion, the detection accuracy, IoU, and F1-Score are improved by 3.30 %, 6.74 %, and 7.88 %, respectively.

(2) Adjusting the α and β parameters of the Tversky loss function alters the proposed semantic segmentation model’s focus on underwater cracks. The optimal values of α and β are 0.1 and 0.9, respectively, which yield an accuracy, IoU, and F1-Score of 96.07 %, 0.95, and 0.96, respectively.

(3) Comparison with other popular semantic segmentation models confirms the outstanding performance of the proposed method, which achieves the best detection accuracy and efficiency.

(4) The feature visualization confirms that the shallow layers of the semantic segmentation model primarily extract low-level features (such as edges, colors, and spots), while deeper layers can extract shape and spatial distribution features. Combining features from both shallow and deep layers enables the extraction of more complete crack information.

CRediT authorship contribution statement

Sritawat Kitiporncha: Writing – review & editing, Supervision, Resources, Methodology, Conceptualization. Jiyang Fu: Writing – original draft, Methodology, Investigation. Xijun Ye: Writing – review & editing, Methodology, Investigation. Bingcong Chen: Writing – review & editing, Project administration, Investigation. Jie Yang: Writing – review & editing, Supervision, Resources, Methodology, Conceptualization. Airong Liu: Supervision, Resources, Methodology, Investigation, Funding acquisition, Conceptualization. Shuai Teng: Writing – original draft, Validation, Methodology, Investigation, Formal analysis, Conceptualization. Zhihua Wu: Writing – original draft, Investigation, Formal analysis.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data Availability

Data will be made available on request.

Acknowledgements

This paper was funded by the National Natural Science Foundation of China (No. 52279127), 111 Project (No. D21021), Guangzhou Basic Research Program Jointly Funded by Municipal Schools (Institutes) and Enterprises (No. 2024A03J0318), the National key Research and Development Plan (No. 2022YFB2603300), China Postdoctoral Science Foundation (No. 2023M740805), Postdoctoral Fellowship Program of CPSF (No. GZC20230593).

References

[1] Teng S, Liu A, Ye X, Wang J, Fu J, Wu Z, et al. Review of intelligent detection and health assessment of underwater structures. Eng Struct 2024;308:117958. https://doi.org/10.1016/j.engstruct.2024.117958.
[2] Chen D, Huang B, Kang F. A review of detection technologies for underwater cracks on concrete dam surfaces. Appl Sci 2023;13:3564. https://doi.org/10.3390/app13063564.
[3] Cui B, Wang C, Li Y, Li H, Li C, Cui B. Image enhancement-based detection of concrete cracks under turbid water bodies. Archit Eng Des Manag 2024:1–22. https://doi.org/10.1080/17452007.2024.2324037.
[4] Tian B, Liu C, Guo J, Yuan S, Wang L, Xu Z. Research on the dynamic positioning of remotely operated vehicles applied to underwater inspection and repair of hydraulic structures. Phys Fluids 2023;35:097123. https://doi.org/10.1063/5.0167445.
[5] Zhang C, Ma H, Chen Z, Li S, Ma Z, Huang H, et al. YOLOX-DG robotic detection systems for large-scale underwater concrete structures. iScience 2024;27:109337. https://doi.org/10.1016/j.isci.2024.109337.
[6] Li W, Yuan Xa, Chen G, Ge J, Yin X, Li K. High sensitivity rotating alternating current field measurement for arbitrary-angle underwater cracks. NDT E Int 2016;79:123–31. https://doi.org/10.1016/j.ndteint.2016.01.003.
[7] Zhang Y, Sidibé Y, Maze G, Leon F, Druaux F, Lefebvre D. Detection of damages in underwater metal plate using acoustic inverse scattering and image processing methods. Appl Acoust 2016;103:110–21. https://doi.org/10.1016/j.apacoust.2015.10.013.
[8] Lei M, Liu L, Shi C, Tan Y, Lin Y, Wang W. A novel tunnel-lining crack recognition system based on digital image technology. Tunn Undergr Space Technol 2021;108:103724. https://doi.org/10.1016/j.tust.2020.103724.
[9] Shi P, Fan X, Ni J, Khan Z, Li M. A novel underwater dam crack detection and classification approach based on sonar images. PloS One 2017;12:e0179627. https://doi.org/10.1371/journal.pone.0179627.
[10] Shi P, Fan X, Ni J, Wang G. A detection and classification approach for underwater dam cracks. Struct Health Monit 2016;15:541–54. https://doi.org/10.1177/1475921716651039.
[11] Mucolli L, Krupinski S, Maurelli F, Mehdi SA, Mazhar S. Detecting cracks in underwater concrete structures: an unsupervised learning approach based on local feature clustering. Oceans 2019 MTS/IEEE Seattle 2019:1–8. https://doi.org/10.23919/OCEANS40490.2019.8962401.
[12] Huang Y, Zhuo Q, Fu J, Liu A. Research on evaluation method of underwater image quality and performance of underwater structure defect detection model. Eng Struct 2024;306:117797. https://doi.org/10.1016/j.engstruct.2024.117797.
[13] Wu P, Liu A, Fu J, Ye X, Zhao Y. Autonomous surface crack identification of concrete structures based on an improved one-stage object detection algorithm. Eng Struct 2022;272:114962. https://doi.org/10.1016/j.engstruct.2022.114962.
[14] Su Z, Zhou F, Liang J, Liu A, Wang J, Liang J, et al. Fractal theory based identification model for surface crack of building structures. Eng Struct 2024;305:117708. https://doi.org/10.1016/j.engstruct.2024.117708.
[15] Situ Z, Teng S, Feng W, Zhong Q, Chen G, Su J, et al. A transfer learning-based YOLO network for sewer defect detection in comparison to classic object detection methods. Dev Built Environ 2023;15:100191. https://doi.org/10.1016/j.dibe.2023.100191.
[16] Wan C, Xiong X, Wen B, Gao S, Fang D, Yang C, et al. Crack detection for concrete bridges with imaged based deep learning. Sci Prog 2022;105:00368504221128487. https://doi.org/10.1177/00368504221128487.
[17] Zhou Q, Situ Z, Teng S, Chen G. Convolutional neural networks–based model for automated sewer defects detection and classification. J Water Resour Plan Manag 2021;147:04021036.
[18] Teng S, Liu Z, Chen G, Cheng L. Concrete crack detection based on well-known feature extractor model and the YOLO_v2 network. Appl Sci 2021;11:813. https://doi.org/10.3390/app11020813.
[19] Teng S, Chen G. Deep convolution neural network-based crack feature extraction, detection and quantification. J Fail Anal Prev 2022;22:1308–21. https://doi.org/10.1007/s11668-022-01430-9.
[20] Ma Y, Wu Y, Li Q, Zhou Y, Yu D. ROV-based binocular vision system for underwater structure crack detection and width measurement. Multimed Tools Appl 2023;82:20899–923. https://doi.org/10.1007/s11042-022-14168-1.
[21] Cao W, Li J. Detecting large-scale underwater cracks based on remote operated vehicle and graph convolutional neural network. Front Struct Civ Eng 2022;16:1378–96. https://doi.org/10.1007/s11709-022-0855-8.
[22] Li Y, Bao T, Huang X, Chen H, Xu B, Shu X, et al. Underwater crack pixel-wise identification and quantification for dams via lightweight semantic segmentation and transfer learning. Autom Constr 2022;144:104600. https://doi.org/10.1016/j.autcon.2022.104600.
[23] Chen L-C, Zhu Y, Papandreou G, Schroff F, Adam H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. Cham: Springer International Publishing; 2018. p. 833–51.
[24] Situ Z, Wang Q, Teng S, Feng W, Chen G, Zhou Q, et al. Improving urban flood prediction using LSTM-DeepLabv3+ and Bayesian optimization with spatiotemporal feature fusion. J Hydrol 2024;630:130743. https://doi.org/10.1016/j.jhydrol.2024.130743.
[25] Cao Q, Li M, Yang G, Tao Q, Luo Y, Wang R, et al. Urban vegetation classification for unmanned aerial vehicle remote sensing combining feature engineering and improved DeepLabV3+. Forests 2024;15:382. https://doi.org/10.3390/f15020382.
[26] Liu Z, Zeng Z, Li J, Teng S. Automatic detection and quantification of hot-rolled steel surface defects using deep learning. Arab J Sci Eng 2023;48:10213–25. https://doi.org/10.1007/s13369-022-07567-x.
[27] Woo S, Park J, Lee J-Y, Kweon IS. CBAM: Convolutional Block Attention Module. Cham: Springer International Publishing; 2018. p. 3–19.
[28] Shafiq M, Gu Z. Deep residual learning for image recognition: a survey. Appl Sci 2022;12:8972. https://doi.org/10.3390/app12188972.
[29] Sandler M, Howard A, Zhu M, Zhmoginov A, Chen LC. MobileNetV2: Inverted Residuals and Linear Bottlenecks. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2018. p. 4510–4520. https://doi.org/10.1109/CVPR.2018.00474.
[30] Talib LF, Amin J, Sharif M, Raza M. Transformer-based semantic segmentation and CNN network for detection of histopathological lung cancer. Biomed Signal Process Control 2024;92:106106. https://doi.org/10.1016/j.bspc.2024.106106.
[31] Adige S, Kurban R, Durmuş A, Karaköse E. Classification of apple images using support vector machines and deep residual networks. Neural Comput Appl 2023;35:12073–87. https://doi.org/10.1007/s00521-023-08340-3.
[32] Kumar Y, Garg P, Moudgil MR, Singh R, Woźniak M, Shafi J, et al. Enhancing parasitic organism detection in microscopy images through deep learning and fine-tuned optimizer. Sci Rep 2024;14:5753. https://doi.org/10.1038/s41598-024-56323-8.
[33] Zhou Q, Situ Z, Teng S, Liu H, Chen W, Chen G. Automatic sewer defect detection and severity quantification based on pixel-level semantic segmentation. Tunn Undergr Space Technol 2022;123:104403. https://doi.org/10.1016/j.tust.2022.104403.
[34] Ronneberger O, Fischer P, Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation. Cham: Springer International Publishing; 2015. p. 234–41.
[35] Badrinarayanan V, Kendall A, Cipolla R. SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell 2015;39:2481–95. https://doi.org/10.1109/TPAMI.2016.2644615.
[36] Shelhamer E, Long J, Darrell T. Fully convolutional networks for semantic segmentation. IEEE Trans Pattern Anal Mach Intell 2017;39:640–51. https://doi.org/10.1109/TPAMI.2016.2572683.
[37] Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: visual explanations from deep networks via gradient-based localization. Int J Comput Vis 2020;128:336–59. https://doi.org/10.1007/s11263-019-01228-7.