
Engineering Structures 315 (2024) 118515

Contents lists available at ScienceDirect

Engineering Structures
journal homepage: www.elsevier.com/locate/engstruct

Automated detection of underwater cracks based on fusion of optical and texture information
Shuai Teng ᵃ, Airong Liu ᵃ,*, Zhihua Wu ᵃ, Bingcong Chen ᵃ, Xijun Ye ᵃ, Jiyang Fu ᵃ, Sritawat Kitipornchai ᵃ, Jie Yang ᵇ
ᵃ Research Centre for Wind Engineering and Engineering Vibration, Guangzhou University, Guangzhou 510006, China
ᵇ School of Engineering, RMIT University, PO Box 71, Bundoora, VIC 3083, Australia

ARTICLE INFO

Keywords:
Underwater crack detection
Optical images
Texture information
Data fusion
Semantic segmentation

ABSTRACT

This paper presents a novel method for underwater crack detection that fuses optical and texture information. The harsh underwater environment degrades underwater crack images. This makes crack detection challenging, and capturing crack details accurately is especially difficult. Improving detection accuracy therefore requires more feature information related to cracks. To this end, this paper proposes a semantic segmentation model with dual input branches that fuses optical and texture information. It also introduces the Convolutional Block Attention Module (CBAM) to enhance the performance of the semantic segmentation model. The optimal network architecture was determined by selecting the backbone network and optimizer, and a custom Tversky loss function was introduced to make the model pay more attention to the crack area. The results show that the detection accuracy, IoU, and F1-Score reach 96.07 %, 0.95, and 0.96, respectively. Multiple comparative experiments validated the effectiveness of the proposed method. In particular, compared with the method without fused texture information, accuracy, IoU, and F1-Score increased by 3.30 %, 6.74 %, and 7.88 %, respectively. Finally, visualizing how crack features vary within the detection model explains the operational mechanism of the proposed method. These results confirm that the proposed method significantly improves the accuracy of underwater crack detection and provides a novel approach for underwater defect detection.

* Corresponding author.
E-mail address: [email protected] (A. Liu).

https://doi.org/10.1016/j.engstruct.2024.118515
Received 18 April 2024; Received in revised form 11 June 2024; Accepted 24 June 2024
Available online 28 June 2024
0141-0296/© 2024 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.

1. Introduction

Underwater structures such as bridges, offshore platforms, and subsea pipelines bear important functions and are responsible for critical tasks [1]. Cracks can weaken their structural integrity, thereby jeopardizing structural safety [2]. Timely crack detection allows potential safety hazards to be identified early, so that the necessary maintenance and repair measures can be taken to ensure the safe operation of underwater structures [3].

The most direct inspection method relies on divers or underwater robots [4]. Divers or robots equipped with cameras descend to the surface of an underwater structure and visually inspect the presence, morphology, and size of cracks [5]. This method is intuitive and real-time, but it is constrained by underwater visibility and operating conditions and places high skill and safety demands on divers. It is also prone to high rates of human misjudgment because of the complex underwater filming environment [6,7]. Subsequently, various digital image processing techniques have been used to improve the accuracy of crack detection from images captured by the cameras of underwater robots.

Traditional digital image processing techniques mainly rely on feature extraction and edge detection to obtain information about defects [8]. Shi et al. [9] segmented sonar images into image blocks, used clustering analysis to extract crack fragments, and combined improved evidence theory with fuzzy rule reasoning to classify cracks. The same authors also proposed an improved crack detection and classification algorithm based on evidence theory that uses the local and global features of images [10]. As a further development, Mucolli et al. [11] introduced an unsupervised crack detection method for underwater concrete structures based on a local feature clustering algorithm.

Although these methods attempt to overcome the limitations of underwater crack detection, they still face challenges in practical engineering applications. Because underwater inspection scenarios are harsh and uncontrolled, the imaging quality of underwater crack images is poor. These low-quality images complicate crack feature extraction, resulting in high false positive or false negative rates for traditional image processing techniques [12]. In addition, owing to their complex computational processes, traditional digital image processing methods suffer from drawbacks such as slow detection speed and a high level of manual intervention.

Fig. 1. Research ideas of the proposed method.

Fig. 2. Computation results of texture information.

The development of deep learning technology has provided an opportunity to overcome the limitations of traditional image processing techniques [13,14]. Deep learning algorithms can automatically extract image features for infrastructure crack detection [15,16]. Convolutional neural networks (CNNs) are the most representative deep learning algorithms. They have achieved remarkable results in crack classification [17], object detection [18], semantic segmentation [19], and other fields, offering a potential solution for underwater crack detection.

Ma et al. [20] trained YOLO-v3 networks on underwater images obtained by underwater robots. They combined the network with grayscale fluctuation analysis to obtain relatively accurate crack segmentation results and then calculated the crack width from those results. To improve segmentation efficiency, Cao et al. [21] stitched underwater crack images with an image stitching algorithm and segmented the cracks using graph CNNs. To overcome the problem of insufficient underwater samples, Li et al. [22] improved the accuracy of semantic segmentation models in small-sample scenarios through a two-stage transfer learning strategy, achieving satisfactory segmentation results. These studies confirm that the powerful feature extraction capabilities of deep learning provide a more intelligent approach to underwater crack detection.

However, these studies still rely solely on optical images, which are affected by factors such as lighting and turbidity. These factors are the main limits on detection accuracy, so improving accuracy requires more feature information related to defects. Optical images consist of three channels: R (Red), G (Green), and B (Blue).

- Red light has a longer wavelength and is rapidly absorbed in water. The R channel therefore loses contrast and detail, producing color distortion in the image.
- The G channel usually retains the details and contrast of underwater scenes well, because green light propagates relatively well in water.
- The B channel emphasizes underwater colors and details, because blue light penetrates further in water and maintains visibility at depth.

This paper therefore reduces the influence of the underwater environment by discarding the red channel. Meanwhile, image texture provides rich information about surface details and structures, which helps to distinguish different objects or scenes. This paper fuses the texture information with the green and blue channels of the optical images to enrich the crack features, thereby improving the accuracy
of crack detection.

2. Method

The main research ideas of this paper are as follows. Optical information from the G and B channels of underwater images, together with texture information computed by the stdfilt algorithm, serves as the input to a deep learning model. A custom-designed deep learning model performs the information fusion, and several improvement strategies are proposed to further enhance the accuracy of crack detection. The implementation process is illustrated in Fig. 1.

2.1. Texture information acquisition

In this paper, the stdfilt algorithm is used to obtain texture information from underwater crack images. The stdfilt algorithm is a filtering technique that computes the local standard deviation around each pixel of an image. Because the standard deviation measures the dispersion of a set of data, stdfilt can highlight details such as texture and edges. It is widely used in image processing for edge detection, texture analysis, and feature extraction.

The calculation proceeds as follows. For each pixel position (i, j), the algorithm takes a window of chosen size centered on the current pixel, extracts the pixel values within the window, and calculates their standard deviation (Eq. 1):

$$ \sigma(i,j) = \sqrt{\frac{1}{n^{2}} \sum_{k=1}^{n} \sum_{l=1}^{n} \left( I\left(i + k - \frac{n+1}{2},\ j + l - \frac{n+1}{2}\right) - \mu_{i,j} \right)^{2}} \tag{1} $$

where σ(i, j) is the local standard deviation at position (i, j), I(i + k − (n+1)/2, j + l − (n+1)/2) is a pixel value within the window, k and l iterate over the window, μ_{i,j} is the mean of the pixel values within the window, and n is the size of the (n × n) window.

The local standard deviation computed by stdfilt is used as the pixel value of the output image, forming a new image, as shown in Fig. 2. This image displays object edges more clearly. Each output pixel is the standard deviation of the local pixel values at the corresponding position, which highlights detail information in the image. In this paper, the local standard deviation of underwater crack images calculated by stdfilt serves as texture information. This texture information is combined with the G and B channels of the optical images as inputs to the deep learning model for underwater crack detection.

2.2. Semantic segmentation network

This paper adopts the DeepLabv3+ model [23] as the base framework and improves it to achieve high-precision detection of underwater cracks. DeepLabv3+ aims to address the lack of object boundary detail and contextual information in semantic segmentation. It is an evolution of the DeepLab series that further improves segmentation performance by introducing techniques such as dilated convolution, multi-scale feature fusion, and a decoder module. Its basic architecture is illustrated in Fig. 3.

Fig. 3. The DeepLabv3+ model.

DeepLabv3+ employs dilated convolutions to expand the receptive field of CNNs, thereby improving their ability to capture contextual information from input images [24]. Dilated convolutions introduce dilation rates into the convolutional kernels, enlarging the receptive field without increasing the number of parameters. Underwater images often suffer from issues such as granularity and blurriness. By extending the receptive field, dilated convolutions better capture contextual information from the image, which aids in
improving the accuracy of underwater crack detection.

To fully utilize information at different scales within images, DeepLabv3+ employs a strategy called multi-scale feature fusion. It introduces dilated convolution layers with different dilation rates into the network and uses the Atrous Spatial Pyramid Pooling (ASPP) module. Together, these allow DeepLabv3+ to leverage features at both global and local scales simultaneously [25]. Underwater cracks may exhibit different scales and shapes, so the model must capture information effectively at various scales. Through multi-scale feature fusion, DeepLabv3+ can exploit both global and local features, thereby enhancing its ability to detect underwater cracks.

DeepLabv3+ also introduces a decoder module, which restores image resolution and refines the segmentation results. The decoder generates the final semantic segmentation result through up-sampling and feature-level fusion, contributing to improved segmentation accuracy and detail [26]. Underwater images are often affected by factors such as water quality and lighting, which degrade image quality. By incorporating the decoder module, DeepLabv3+ can restore image resolution and refine the segmentation results, thereby enhancing the accuracy of underwater crack detection.

Building on the basic architecture of DeepLabv3+, this paper proposes a dual-input semantic segmentation model that fuses optical and texture information and introduces the CBAM module to enhance crack feature extraction. The resulting model, named FOT-DeepLabv3+, is depicted in Fig. 4.

Fig. 4. The proposed FOT-DeepLabv3+ model.

The improvements in this paper mainly include the following two aspects.

First, the improved network model consists of dual-input branches (Branch A and Branch B):

- The G and B channels of the optical images are input into Branch A, where the raw DeepLabv3+ network extracts features from the optical images.
- The texture information is input into Branch B, where a lightweight CNN extracts features from the texture information.

Feature fusion is then performed after the 'Concat' layer. The optical image provides contrast and color information about the cracks, while the texture information provides additional detail and edge information. Fusing these two types of information enhances the image contrast, making underwater cracks easier to detect.

Second, the CBAM, an attention mechanism for enhancing the performance of CNNs [27], is introduced, as depicted in Fig. 5. It dynamically adjusts the weights of feature maps through two mechanisms, channel attention and spatial attention. This enables the network to focus on crucial features and improves both feature representation and generalization, as shown in Fig. 5a.

- The channel attention mechanism weights the feature maps along the channel dimension, dynamically adjusting the importance of different channels so that the network better captures important features, as illustrated in Fig. 5b. Its output Mc(F) is computed using Eq. 2.
- The spatial attention mechanism weights the feature maps along the spatial dimension to enhance the network's attention to different regions, effectively improving its representation capability, as shown in Fig. 5c. Its output Ms(F) is calculated using Eq. 3.

In this paper, the CBAM is embedded after the ASPP module to strengthen the network's perception and discrimination of crack features through spatial and channel attention. The CBAM helps the network adaptively weight important features on each feature map, further improving the representation ability of features.
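The two branch inputs described above can be sketched in NumPy: G and B channels for Branch A, and an Eq. 1 texture map for Branch B. This is a minimal illustration under stated assumptions, not the authors' implementation; the paper names MATLAB's stdfilt but gives no code.

- `local_std` follows Eq. 1 literally, dividing by n².
- The default window n = 3 and the edge-replicated borders are assumptions. MATLAB's own stdfilt may differ in normalization and padding.
- Computing the texture map from the mean of the three color channels is also an assumption, since the paper does not say which grayscale image is filtered.
- The helper names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_std(gray: np.ndarray, n: int = 3) -> np.ndarray:
    """Local standard deviation of a 2-D image over an n x n window (Eq. 1).

    Divides by n**2 (population standard deviation), as Eq. 1 is written.
    Borders use edge replication, which is an assumption of this sketch.
    """
    if n % 2 == 0:
        raise ValueError("window size n must be odd")
    r = n // 2
    padded = np.pad(gray.astype(np.float64), r, mode="edge")
    # windows[i, j] is the n x n neighbourhood centred on pixel (i, j).
    windows = sliding_window_view(padded, (n, n))
    return windows.std(axis=(-2, -1))  # ddof=0, i.e. the 1/n**2 factor of Eq. 1


def build_inputs(rgb: np.ndarray, n: int = 3):
    """Split an H x W x 3 RGB image into the two branch inputs (hypothetical helper)."""
    rgb = rgb.astype(np.float64)
    optical = rgb[..., 1:]              # G and B channels only -> Branch A
    gray = rgb.mean(axis=-1)            # assumed grayscale source for the texture map
    texture = local_std(gray, n)[..., np.newaxis]  # H x W x 1 -> Branch B
    return optical, texture


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.integers(0, 256, size=(8, 8, 3))
    optical, texture = build_inputs(image)
    print(optical.shape, texture.shape)  # (8, 8, 2) (8, 8, 1)
```

Keeping the texture map as a separate single-channel array, rather than stacking it with G and B, mirrors the dual-branch design in which each input gets its own feature extractor before the 'Concat' layer.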


Fig. 5. The CBAM module.
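The two attention steps of the CBAM module, formalized as Eqs. 2 and 3, can be written out as a plain NumPy forward pass. This is a sketch for illustration only, and all function names are hypothetical.

- The shared two-layer MLP with a ReLU and reduction ratio r, and the channel-then-spatial order, follow the original CBAM design [27] rather than details stated in this paper.
- The bias-free 7 × 7 kernel is a simplification.
- A trained network would learn `W1`, `W2`, and `K`; here they are random.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def shared_mlp(z, W1, W2):
    """Two-layer perceptron with ReLU, shared by both pooled descriptors."""
    return W2 @ np.maximum(W1 @ z, 0.0)


def channel_attention(F, W1, W2):
    """Eq. 2 for F of shape (C, H, W); returns weights of shape (C, 1, 1)."""
    avg = F.mean(axis=(1, 2))   # global average pooling -> (C,)
    mx = F.max(axis=(1, 2))     # global max pooling -> (C,)
    return sigmoid(shared_mlp(avg, W1, W2) + shared_mlp(mx, W1, W2))[:, None, None]


def conv2d_same(x, k):
    """Naive zero-padded 'same' cross-correlation: (Cin, H, W) with (Cin, kh, kw) -> (H, W)."""
    _, kh, kw = k.shape
    xp = np.pad(x, ((0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2)))
    H, W = x.shape[1:]
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[:, i:i + kh, j:j + kw] * k)
    return out


def spatial_attention(F, K):
    """Eq. 3: pool along the channel axis, then a 7 x 7 conv; returns (1, H, W)."""
    pooled = np.stack([F.mean(axis=0), F.max(axis=0)])  # (2, H, W)
    return sigmoid(conv2d_same(pooled, K))[None, :, :]


def cbam(F, W1, W2, K):
    """Channel attention, then spatial attention, each applied multiplicatively."""
    F1 = channel_attention(F, W1, W2) * F
    return spatial_attention(F1, K) * F1


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, H, W, r = 8, 6, 6, 2                        # r: assumed reduction ratio
    F = rng.standard_normal((C, H, W))
    W1 = rng.standard_normal((C // r, C))
    W2 = rng.standard_normal((C, C // r))
    K = 0.1 * rng.standard_normal((2, 7, 7))
    print(cbam(F, W1, W2, K).shape)  # (8, 6, 6)
```

Because both attention maps lie in (0, 1), CBAM can only rescale features, never amplify them. The learned emphasis comes from suppressing less relevant channels and locations.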

The CBAM's adaptive feature weighting, in turn, helps improve the accuracy and robustness of underwater crack detection.

Overall, this paper makes two contributions. It fuses optical and texture information through a semantic segmentation model with dual input branches, and it introduces the CBAM module to enhance the model's ability to extract underwater crack features.

$$ M_c(F) = \sigma\left(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\right) \tag{2} $$

where F is the input feature map, AvgPool and MaxPool denote global average pooling and global max pooling, respectively, MLP is a multi-layer perceptron, and σ is the Sigmoid activation function.

$$ M_s(F) = \sigma\left(f^{7\times 7}\left([\mathrm{AvgPool}(F);\ \mathrm{MaxPool}(F)]\right)\right) \tag{3} $$

where f^{7×7} denotes a 7 × 7 convolution, and [AvgPool(F); MaxPool(F)] denotes the concatenation of the average-pooling and max-pooling results along the channel axis.

2.3. Architecture optimization

To achieve optimal detection performance, this paper compares the effects of different backbone feature extraction networks (resnet18, resnet50, and mobilenetv2) on the detection results to determine the best image feature extraction strategy.

- ResNet: the ResNet series consists of deep residual networks, which have excellent feature learning capability and parameter efficiency [28]. In underwater crack detection, a ResNet backbone typically improves the model's performance and robustness, because its deep structure captures richer image features and thereby enhances detection accuracy.
- MobileNetV2: a lightweight CNN designed for mobile devices, characterized by a smaller model size and lower computational cost [29]. In resource-constrained underwater settings, a MobileNet backbone can accelerate inference and reduce hardware requirements, thus enabling real-time underwater crack detection.

The choice of training optimizer also significantly affects the detection results. This paper compares three optimizers: adaptive moment estimation (adam) [30], stochastic gradient descent with momentum (sgdm) [31], and root-mean-square propagation (rmsprop) [32].

The adam optimizer is an adaptive learning rate optimization algorithm known for its fast convergence and good generalization performance. In underwater crack detection, adam often lets the model converge quickly to a local optimum and is relatively easy to tune, making it suitable for most scenarios. Its update rules are given in Eqs. 4–8.

The sgdm optimizer accelerates optimization by introducing momentum into the parameter updates, especially in directions of large curvature. This helps avoid getting stuck in local minima or saddle points and allows faster convergence to the global or a local optimum. As an improvement over stochastic gradient descent, sgdm can speed up training on tasks such as underwater crack detection, enhancing convergence speed and stability and facilitating more effective crack detection (Eqs. 9–10).

The rmsprop optimizer is another adaptive learning rate optimization algorithm, suitable for non-stationary objective functions. In underwater crack detection, if the dataset exhibits certain imbalances or dynamic changes, rmsprop may help the model converge to the optimal solution more quickly (Eqs. 11–12).

$$ m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t \tag{4} $$

$$ v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^{2} \tag{5} $$

$$ \hat{m}_t = \frac{m_t}{1-\beta_1^{t}} \tag{6} $$

$$ \hat{v}_t = \frac{v_t}{1-\beta_2^{t}} \tag{7} $$

$$ \theta_{t+1} = \theta_t - \frac{\alpha}{\sqrt{\hat{v}_t}+\epsilon}\, \hat{m}_t \tag{8} $$

where θ denotes the parameters, g the gradient, α the learning rate, β1 and β2 the momentum decay coefficients, m and v the estimates of the first and second moments, and ϵ a small value added for numerical stability.

$$ v_t = \gamma v_{t-1} + \alpha\, g_t \tag{9} $$

$$ \theta_{t+1} = \theta_t - v_t \tag{10} $$

where θ denotes the parameters, g the gradient, α the learning rate, γ the momentum coefficient, and v the velocity (momentum).

$$ v_t = \beta v_{t-1} + (1-\beta)\, g_t^{2} \tag{11} $$

$$ \theta_{t+1} = \theta_t - \frac{\alpha}{\sqrt{v_t}+\epsilon}\, g_t \tag{12} $$

where θ denotes the parameters, g the gradient, α the learning rate, β the decay coefficient, v the exponentially weighted moving average of squared gradients, and ϵ a small value added for numerical stability.

Fig. 6. Training processes for different architectures.

Fig. 7. Mean value of evaluation indicators for cracks and non-cracks.

Table 1
Evaluation indicators of detected cracks.

| Backbone network and optimizer | Accuracy (%) | IoU | F1-Score |
|---|---|---|---|
| resnet18-sgdm | 93.15 | 0.93 | 0.95 |
| resnet18-adam | 87.85 | 0.87 | 0.94 |
| resnet18-rmsprop | 93.88 | 0.88 | 0.91 |
| resnet50-sgdm | 91.38 | 0.91 | 0.96 |
| resnet50-adam | 91.40 | 0.89 | 0.95 |
| resnet50-rmsprop | 87.01 | 0.86 | 0.90 |
| mobilenetv2-sgdm | 95.00 | 0.93 | 0.96 |
| mobilenetv2-adam | 94.93 | 0.94 | 0.95 |
| mobilenetv2-rmsprop | 95.59 | 0.94 | 0.95 |

For underwater crack detection with FOT-DeepLabv3+, the choice of loss function can significantly affect performance. The Tversky loss function addresses class imbalance by flexibly adjusting parameters that balance the importance of different classes and by considering pixel-level spatial relationships. It improves the model's detection accuracy at crack boundaries and its resistance to noise interference, thereby enhancing the performance and stability of underwater crack detection with FOT-DeepLabv3+. The Tversky loss was designed specifically for imbalanced datasets: two adjustable parameters balance the importance of the classes and so overcome the data imbalance. It is calculated as follows:

$$ \text{Tversky Loss} = \frac{\sum_{i=1}^{N} p_i g_i}{\sum_{i=1}^{N} p_i g_i + \alpha \sum_{i=1}^{N} p_i (1-g_i) + \beta \sum_{i=1}^{N} (1-p_i) g_i} \tag{13} $$

where the sums run over all N pixels, p_i is the model's predicted value (ranging from 0 to 1), g_i is the ground-truth label (0 or 1), and α and β are two control parameters that balance the importance of positive and negative classes. The advantage of the Tversky loss is the flexibility that tuning α and β provides. This makes it adaptable to different datasets and tasks and effective against class imbalance. Based on this property, this paper aims to derive a loss function that performs well for underwater crack detection.

2.4. Evaluation indicator

In this paper, four performance indicators are used to evaluate the detection performance of the semantic segmentation model [33]:

(a) Accuracy, which measures the accuracy of crack detection (Eq. 14);
(b) F1-Score, a comprehensive metric that combines Precision and Recall (Eq. 15);
(c) Intersection over Union (IoU), which assesses the overlap ratio between the predicted pixels (Ap) and the ground-truth label pixels (Ar) (Eq. 16);
(d) Frames Per Second (FPS), which evaluates the computational efficiency of the semantic segmentation model.
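The first three indicators listed above can be computed from binary crack masks, with crack pixels as the positive class, as in the following sketch. It mirrors Eqs. 14–16, with Eq. 14 implemented exactly as written (a crack term and a non-crack term, averaged). FPS is measured by timing repeated calls to a placeholder `model_fn`. The function names, the boolean-mask convention, and the absence of zero-division guards (both classes are assumed present) are assumptions of this illustration.

```python
import time

import numpy as np


def confusion_counts(pred, label):
    """TP, TN, FP, FN for binary masks in which True/1 marks a crack pixel."""
    pred = np.asarray(pred, dtype=bool)
    label = np.asarray(label, dtype=bool)
    tp = int(np.sum(pred & label))
    tn = int(np.sum(~pred & ~label))
    fp = int(np.sum(pred & ~label))
    fn = int(np.sum(~pred & label))
    return tp, tn, fp, fn


def evaluate(pred, label):
    """Accuracy (Eq. 14), F1-Score (Eq. 15) and crack-class IoU (Eq. 16)."""
    tp, tn, fp, fn = confusion_counts(pred, label)
    accuracy = 0.5 * (tp / (tp + fp) + tn / (fn + tn))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)   # area(Ap ∩ Ar) / area(Ap ∪ Ar)
    return accuracy, f1, iou


def frames_per_second(model_fn, images):
    """Images processed per second by a hypothetical callable model_fn."""
    start = time.perf_counter()
    for image in images:
        model_fn(image)
    return len(images) / (time.perf_counter() - start)


if __name__ == "__main__":
    label = np.array([[1, 1, 0, 0], [0, 0, 0, 0]])
    pred = np.array([[1, 0, 1, 0], [0, 0, 0, 0]])
    # TP=1, TN=5, FP=1, FN=1 -> accuracy 0.666..., F1 0.5, IoU 0.333...
    print(evaluate(pred, label))
```

Note that the IoU here is for the crack class only. A mean over cracks and non-cracks, as plotted in Fig. 7, would average it with the same ratio computed for the non-crack class.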

Fig. 8. Detection results with mobilenetv2 as the backbone network and rmsprop as the optimizer.
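The configuration in Fig. 8 was trained with rmsprop. For reference, the three update rules compared in Section 2.3 (Eqs. 4–12) can be sketched in NumPy and run on a toy quadratic. The hyperparameter defaults below are common textbook values, not settings reported in this paper. The toy loss (θ − 3)² and the helper names are purely illustrative.

```python
import numpy as np


def adam_step(theta, g, m, v, t, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update (Eqs. 4-8); t is the 1-based iteration count."""
    m = beta1 * m + (1 - beta1) * g                  # Eq. 4
    v = beta2 * v + (1 - beta2) * g ** 2             # Eq. 5
    m_hat = m / (1 - beta1 ** t)                     # Eq. 6
    v_hat = v / (1 - beta2 ** t)                     # Eq. 7
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)  # Eq. 8
    return theta, m, v


def sgdm_step(theta, g, vel, alpha=0.01, gamma=0.9):
    """One SGD-with-momentum update (Eqs. 9-10)."""
    vel = gamma * vel + alpha * g                    # Eq. 9
    return theta - vel, vel                          # Eq. 10


def rmsprop_step(theta, g, v, alpha=1e-3, beta=0.9, eps=1e-8):
    """One RMSprop update (Eqs. 11-12)."""
    v = beta * v + (1 - beta) * g ** 2               # Eq. 11
    return theta - alpha * g / (np.sqrt(v) + eps), v  # Eq. 12


def grad(theta):
    """Gradient of the toy loss (theta - 3)**2."""
    return 2.0 * (theta - 3.0)


if __name__ == "__main__":
    theta, m, v = 0.0, 0.0, 0.0
    for t in range(1, 501):
        theta, m, v = adam_step(theta, grad(theta), m, v, t, alpha=0.1)
    print("adam", round(theta, 4))

    theta, vel = 0.0, 0.0
    for _ in range(500):
        theta, vel = sgdm_step(theta, grad(theta), vel)
    print("sgdm", round(theta, 4))

    theta, v = 0.0, 0.0
    for _ in range(1000):
        theta, v = rmsprop_step(theta, grad(theta), v, alpha=0.01)
    print("rmsprop", round(theta, 4))
```

The sketch makes the structural difference visible. Adam and rmsprop divide by a running root-mean-square of the gradient, so their step size is roughly α regardless of gradient scale. Sgdm's step scales directly with the gradient.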


Fig. 9. The impact of α on detection results.

$$ \mathrm{Accuracy} = \frac{1}{2}\left(\frac{TP}{TP+FP} + \frac{TN}{FN+TN}\right) \tag{14} $$

$$ \mathrm{F1\text{-}Score} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{15} $$

$$ \mathrm{IoU} = \frac{\mathrm{area}(A_p \cap A_r)}{\mathrm{area}(A_p \cup A_r)} \tag{16} $$

where:

- TP (true positives) is the number of crack pixels correctly predicted as crack pixels by the model;
- TN (true negatives) is the number of non-crack pixels correctly predicted as non-crack pixels;
- FP (false positives) is the number of non-crack pixels incorrectly predicted as crack pixels;
- FN (false negatives) is the number of crack pixels incorrectly predicted as non-crack pixels.

Precision = TP/(TP + FP); Recall = TP/(TP + FN).

2.5. Dataset partitioning and computer configuration

This paper uses an underwater robot to capture images of test blocks containing cracks in turbid water. The robot carries an 8-megapixel camera and two 15 W light-emitting diode (LED) lights, which capture detailed images of the underwater structures. These images are essential for visual inspection and for detecting and documenting cracks. The high-resolution camera ensures clear underwater images and videos for real-time monitoring and post-mission analysis.

A total of 180 underwater crack images were collected, of which 90 were used as the testing dataset. The remaining 90 images underwent image augmentation (rotation, contrast transformation, and color jittering), resulting in 360 images for the training dataset. All images are 256 × 256 pixels.

The computer setup includes:

- an NVIDIA GeForce RTX 3080 Ti GPU, providing strong computing and graphics performance for fast, efficient training of deep learning models;
- an Intel Core i7 multi-core processor and 32 GB of DDR4 memory;
- a 1 TB SSD for rapid storage access;
- a cooling system and stable power supply that ensure system stability, providing reliable support for training the deep learning models in this paper.

3. Results and discussion

This part consists of three sections. Section 3.1 clarifies the underwater crack detection performance of the proposed method. Section 3.2 confirms its performance through comparative studies. Section 3.3 explains how crack features vary within the model through feature visualization and feature importance analysis.

3.1. Crack detection results

Fig. 6 illustrates the training performance of FOT-DeepLabv3+ with different combinations of backbone networks and optimizers. During training, the resnet18-rmsprop and resnet50-rmsprop combinations showed significant fluctuations in both the accuracy and the loss of the network. These fluctuations did not diminish with continued training, which suggests that these two combinations may not be suitable for the current underwater crack detection task.

The testing samples were used to evaluate the crack detection performance of the semantic segmentation model. Fig. 7 illustrates the detection results of FOT-DeepLabv3+ with different combinations of backbone networks and optimizers. Figs. 7a, 7b, and 7c show the mean accuracy, mean IoU, and mean F1-Score over cracks and non-cracks, respectively. The results indicate that mobilenetv2 performs well as the backbone network of FOT-DeepLabv3+. The combination of mobilenetv2 and rmsprop performs best, with a mean accuracy, mean IoU, and mean F1-Score of 97.77 %, 0.97, and 0.97, respectively.

The crack detection results in Table 1 also show that mobilenetv2 and rmsprop yield the best detection performance as the backbone network and optimizer for FOT-DeepLabv3+, with an accuracy, IoU, and F1-Score of 95.59 %, 0.94, and 0.95, respectively. Some examples of the detection results are shown in Fig. 8. The detected crack regions align well with the labels, and neither the lower resolution nor some non-crack linear features affect the detection results. Even for small and narrow cracks, FOT-DeepLabv3+ maintains high detection performance. The detection results also confirm that good results can still be achieved when cracks appear in an extremely blurry environment. Moreover, owing to the lightweight nature of mobilenetv2, the model has a significant computational efficiency advantage (Fig. 7d). Its FPS is close to 40 on the current hardware, which suggests potential for real-time detection.

This paper uses the Tversky loss as the training loss function, and the results above were obtained with α and β both equal to 0.5. According to Section 2.3, α and β affect training differently, so their impact on the detection results is explored here.

(1) With β fixed at 0.5, α ranges from 0.1 to 0.9 in steps of 0.1. The results, shown in Fig. 9, indicate that the best performance is achieved when α is 0.1, with an accuracy, IoU, and F1-Score of 96.13 %, 0.93, and 0.95, respectively. The overall trend suggests that detection results tend to worsen as α increases, and Fig. 9d shows that the value of α does not affect computational efficiency.

(2) With α fixed at 0.1, β ranges from 0.1 to 0.9 in steps of 0.1. The results, shown in Fig. 10, indicate that the best performance is achieved when β is 0.9, with an accuracy, IoU, and F1-Score of 96.07 %, 0.95, and 0.96, respectively. When β is small, the detection performance is poor.

Changes in α and β have almost no impact on the FPS (Figs. 9d and 10d).

Fig. 10. The impact of β on detection results.

In summary, selecting a lightweight backbone network and an optimizer suited to training on underwater crack images lets FOT-DeepLabv3+ achieve better detection performance for underwater cracks. Adjusting the loss function further adapts FOT-DeepLabv3+ to underwater crack detection scenarios. Specifically, reducing α and increasing β in the Tversky loss enhances the model's feature extraction capability for underwater cracks and suppresses the influence of non-cracks on the detection results.

3.2. Comparative Studies

To validate the effectiveness of the proposed method, several comparative experiments were conducted. First, the detection performance without fused texture information was compared. Fig. 11 illustrates the differences in detection results between non-fused and fused texture information.

- Without texture fusion, the model achieved an accuracy, IoU, and F1-Score of 93.00 %, 0.89, and 0.89, respectively.
- With texture fusion (our method), it achieved 96.07 %, 0.95, and 0.96, respectively.

Thus, the proposed texture-fusion strategy increased accuracy, IoU, and F1-Score by 3.30 %, 6.74 %, and 7.88 %, respectively.

Fig. 12 presents examples of the detection results with and without texture fusion. Without texture fusion, the detection results contain many noise points, and some non-crack linear objects are incorrectly detected as cracks. In blurry images, many crack details may also be lost, especially for small cracks. Fusing texture information addresses these issues, producing detection results that closely match the labels. This confirms that the texture information can provide more robust crack features for FOT-DeepLabv3+.

Fig. 11. Detection results of non-fused texture information.

9
S. Teng et al. Engineering Structures 315 (2024) 118515

Fig. 12. Example of partial detection results.

Table 3. The results demonstrate that the Tversky loss function used in
Table 2 this study can achieve the best detection results. This confirms that
Comparison results with other semantic segmentation algorithms. adjusting the α and β values of the Tversky loss function enables FOT-­
Evaluation U- SegNet FCN Our method (FOT- DeepLabv3 + to focus more on the crack regions.
indicators Net (8 s) DeepLabv3 +)

Accuracy (%) 10.01 93.54 77.64 96.07 3.3. Feature interpretation


IoU 0.10 0.82 0.77 0.95
F1-Score 0.38 0.85 0.83 0.96
FPS 6.72 30.08 7.71 42.09
To clarify the variation of crack features in the proposed semantic
segmentation model, this paper visualized the internal features of the
semantic segmentation model and analyzed the importance of features
using the Grad-CAM (Gradient-weighted Class Activation Mapping)
Table 3
[37]. The Grad-CAM was an interpretable method for deep learning
Detection results of different loss functions.
models used to understand the decision-making process of the model in
Evaluation indicators Dice Focal Cross- Our method (Tversky) classification tasks. For semantic segmentation results, the Grad-CAM
entropy
can explain how the predicted regions for each class were formed. It
Accuracy (%) 94.12 91.94 96.53 96.07 can show the regions of interest in the image that the model focuses on
IoU 0.93 0.88 0.91 0.95
when predicting specific classes and visualize the active regions of the
F1-Score 0.94 0.89 0.92 0.96
model under different categories. This explanation can help users un­
derstand the decision-making process of the model and how the model
DeepLabv3 + , thereby improving the crack detection accuracy. utilizes different parts of the image for semantic segmentation. This
To validate the excellent performance of the constructed FOT- study analyzed some representative layers, such as Layers L1, L2, L3,
DeepLabv3 + , this paper also compared it with several other commonly and L4 as shown in Fig. 13.
used semantic segmentation algorithms (such as: U-Net [34], SegNet For Layer L1, the image features obtained by this convolutional layer
[35], and FCN [36]), as shown in Table 2. The results indicate that the were visualized first. Fig. 14 shows that different convolution channels
U-Net performed the worst, with the lowest detection indicators and can capture different features. However, this layer mainly captures low-
slowest computational efficiency. The FCN (8 s) also exhibited poor level features (such as edges, colors, and spots), and there were many
performance, while SegNet, although decent, still did not surpass the noise points in channels related to crack features, which were far from
method proposed in this paper. This confirms that the semantic seg­ sufficient for crack detection. Fig. 15 illustrates the interpretation results
mentation model adopted in this paper was the most suitable, as it both of Grad-CAM. From the figure, it can be observed that this layer mainly
achieves the highest accuracy and exhibits the fastest computational focuses on the edge information of the crack, and the active region
efficiency due to the use of a lightweight backbone network. To further mainly surrounds the crack’s edges, unable to fully capture the entire
validate the superiority of the loss function adopted in this paper, region of the crack.
comparisons were made with several commonly used semantic seg­ For Layer L2, Fig. 16 illustrates the feature images obtained from 256
mentation loss functions (Dice, Focal, and Cross-entropy), shown in convolution channels. Some convolution channels were dedicated to
extracting features related to cracks. Due to the feature extraction

10
S. Teng et al. Engineering Structures 315 (2024) 118515

Fig. 13. The positions of L1, L2, L3 and L4.

Fig. 14. Layer L1 activation results.

performed by many convolutional and pooling layers, much of the noise


information has been eliminated, making the cracks more clearly visible.
By activating several strong feature maps (brighter in crack areas), it can
be observed that this layer can extract crack shape features and spatial
distribution features. The visualization results of Grad-CAM (as shown in
Fig. 17) also indicate that this layer has a high focus on crack areas,
confirming that the ASPP and CBAM modules enable the model to pay
more attention to the regions where cracks were located.
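Grad-CAM [37], which produces the focus maps discussed for Layers L1 and L2 (Figs. 15 and 17), weights each feature map of a layer by the spatially averaged gradient of a target score and keeps the positive part of the weighted sum. The NumPy sketch below shows only that combination step. It assumes the layer activations and the gradients of the target score have already been obtained from a deep learning framework's automatic differentiation; for segmentation it is further assumed, as is common, that the target score is the crack-class logit summed over the pixels of interest, which the paper does not specify. The function name `grad_cam` is illustrative.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Combine one layer's feature maps into a Grad-CAM heat map.

    activations: feature maps A^k of the chosen layer, shape (K, H, W).
    gradients:   dS/dA^k for the target score S, same shape as activations.
    Returns an (H, W) map scaled to [0, 1].
    """
    if activations.shape != gradients.shape or activations.ndim != 3:
        raise ValueError("expected two arrays of identical shape (K, H, W)")
    # Channel importance: global average of the gradient over each map.
    weights = gradients.mean(axis=(1, 2))  # shape (K,)
    # Weighted sum over channels; ReLU keeps only evidence for the target class.
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Toy check: for S = sum over pixels of sum_k w_k * A^k, the gradient dS/dA^k
# is the constant w_k, so Grad-CAM should recover ReLU(sum_k w_k * A^k).
rng = np.random.default_rng(0)
A = rng.random((4, 8, 8))
w = np.array([1.0, -0.5, 0.0, 2.0])
G = np.broadcast_to(w[:, None, None], A.shape)
heat = grad_cam(A, G)
print(heat.shape, float(heat.max()))
```

In practice the gradients would come from a backward pass through the trained network; the toy linear score is used only because its gradient is known in closed form.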
Layer L3 combines the shallow crack-edge information with the deeper shape and spatial distribution features, giving a more nuanced description of the cracks (as shown in Fig. 18). The upsampling layer increases the feature resolution, so crack details are represented more finely. The Grad-CAM results (Fig. 19) also indicate that the edge, shape, and spatial distribution features of the crack all receive particular attention, which explains the better detection results. Finally, the Softmax layer classifies each pixel on the basis of these features, separating cracks from non-cracks, as depicted in Fig. 20, where the left image shows the detection results for non-crack regions and the right image shows the detection results for crack regions.

Fig. 15. Areas of focus in Layer L1.

Fig. 16. Layer L2 activation results.

Fig. 17. Areas of focus in Layer L2.

Fig. 19. Areas of focus in Layer L3.
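The final per-pixel classification performed by the Softmax layer can be illustrated with a short NumPy sketch. It assumes a two-channel logit map ordered as (non-crack, crack), matching the left and right maps of Fig. 20; the channel order and the function names are assumptions for illustration only.

```python
import numpy as np


def pixelwise_softmax(logits: np.ndarray) -> np.ndarray:
    """Softmax over the class axis of a (C, H, W) logit map."""
    shifted = logits - logits.max(axis=0, keepdims=True)  # avoid overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum(axis=0, keepdims=True)


def crack_mask(logits: np.ndarray, crack_channel: int = 1) -> np.ndarray:
    """Assign each pixel its most probable class; True where that class is 'crack'."""
    probs = pixelwise_softmax(logits)
    return probs.argmax(axis=0) == crack_channel


# Two channels (assumed order: 0 = non-crack, 1 = crack) on a 2 x 2 image.
logits = np.array([[[2.0, 0.0], [0.5, -1.0]],
                   [[0.0, 3.0], [0.4, 1.0]]])
probs = pixelwise_softmax(logits)
mask = crack_mask(logits)
print(mask)  # crack predicted at pixels (0, 1) and (1, 1)
```

Because softmax is monotonic, the hard mask equals the argmax of the raw logits; the probabilities matter when a threshold or a soft loss such as the Tversky loss is applied.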
Overall, the proposed deep learning model can obtain complete
features of cracks through continuous feature extraction and fusion.
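The training loss tuned earlier in Section 3 and the figures reported in Section 3.2 can be made concrete with a minimal NumPy sketch. It assumes the widely used Tversky formulation TI = TP / (TP + α·FP + β·FN), in which α weights false positives and β weights false negatives, so that α = β = 0.5 reduces to the Dice coefficient; the authors' exact definition is given in Section 2.2, and if it assigns the weights the other way round the two arguments should simply be swapped. IoU and F1-Score are computed here for the crack class of binary masks, which is also an assumption, and all helper names are illustrative.

```python
import numpy as np


def tversky_loss(pred, target, alpha=0.1, beta=0.9, eps=1e-7):
    """Soft Tversky loss for a binary crack mask (assumed formulation).

    pred:   predicted crack probabilities in [0, 1], any shape
    target: ground-truth mask of the same shape (1 = crack, 0 = background)
    alpha weights false positives and beta weights false negatives.
    """
    p = np.asarray(pred, dtype=float).ravel()
    g = np.asarray(target, dtype=float).ravel()
    tp = np.sum(p * g)            # soft true positives
    fp = np.sum(p * (1.0 - g))    # soft false positives
    fn = np.sum((1.0 - p) * g)    # soft false negatives
    index = (tp + eps) / (tp + alpha * fp + beta * fn + eps)
    return float(1.0 - index)


def iou_f1(pred_mask, true_mask):
    """IoU and F1-Score of the crack class for two binary masks."""
    p = np.asarray(pred_mask, dtype=bool)
    g = np.asarray(true_mask, dtype=bool)
    tp = int(np.sum(p & g))
    fp = int(np.sum(p & ~g))
    fn = int(np.sum(~p & g))
    if tp + fp + fn == 0:         # both masks empty: treat as perfect agreement
        return 1.0, 1.0
    return tp / (tp + fp + fn), 2 * tp / (2 * tp + fp + fn)


def relative_gain(baseline, improved):
    """Relative improvement in percent."""
    return (improved - baseline) / baseline * 100.0


# With alpha = 0.1 and beta = 0.9, missing a crack pixel costs far more than a false alarm.
target = np.array([1, 1, 0, 0])
miss = tversky_loss(np.array([1, 0, 0, 0]), target)         # one crack pixel missed
false_alarm = tversky_loss(np.array([1, 1, 1, 0]), target)  # one background pixel flagged
print(round(miss, 3), round(false_alarm, 3))  # → 0.474 0.048

# Relative gains of fused over non-fused texture information (Section 3.2 values).
for name, before, after in [("Accuracy", 93.00, 96.07), ("IoU", 0.89, 0.95), ("F1-Score", 0.89, 0.96)]:
    print(name, round(relative_gain(before, after), 2))
```

Computed from the rounded values given in Section 3.2, the relative gains come out at about 3.30 %, 6.74 % and 7.87 %; the reported 7.88 % for F1-Score presumably reflects unrounded scores.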
Fig. 18. Layer L3 activation results.

Fig. 20. Output features of non-crack and crack.

4. Conclusion

This paper proposes a crack detection method that constructs a specialized semantic segmentation network to fuse the optical and texture information of underwater images, achieving high-precision segmentation of underwater crack images. The texture information, which contains the boundary information of cracks, is input into the semantic segmentation model to guide it toward more accurate crack detection results. Comparative studies confirm the strong performance of the proposed method. Finally, the feature visualization analysis reveals how crack features change within the semantic segmentation model.

The following conclusions were drawn from the research results:

(1) The proposed underwater crack detection method, which integrates texture information, achieved satisfactory detection results. Compared with the method without texture fusion, the detection accuracy, IoU, and F1-Score were improved by 3.30 %, 6.74 %, and 7.88 %, respectively.

(2) Adjusting the α and β parameters of the Tversky loss function alters the proposed semantic segmentation model’s focus on underwater cracks. The optimal values of α and β are 0.1 and 0.9, respectively, which yield an accuracy, IoU, and F1-Score of 96.07 %, 0.95, and 0.96, respectively.

(3) Comparison with other popular semantic segmentation models confirms the outstanding performance of the proposed method, which achieves the best detection accuracy and computational efficiency.

(4) The feature visualization confirms that the shallow layers of the semantic segmentation model primarily extract low-level features (such as edges, colors, and spots), while deeper layers extract shape and spatial distribution features. Combining features from both shallow and deep layers enables the extraction of more complete crack information.

CRediT authorship contribution statement

Sritawat Kitiporncha: Writing – review & editing, Supervision, Resources, Methodology, Conceptualization. Jiyang Fu: Writing – original draft, Methodology, Investigation. Xijun Ye: Writing – review & editing, Methodology, Investigation. Bingcong Chen: Writing – review & editing, Project administration, Investigation. Jie Yang: Writing – review & editing, Supervision, Resources, Methodology, Conceptualization. Airong Liu: Supervision, Resources, Methodology, Investigation, Funding acquisition, Conceptualization. Shuai Teng: Writing – original draft, Validation, Methodology, Investigation, Formal analysis, Conceptualization. Zhihua Wu: Writing – original draft, Investigation, Formal analysis.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data Availability

Data will be made available on request.

Acknowledgements

This paper was funded by the National Natural Science Foundation of China (No. 52279127), 111 Project (No. D21021), Guangzhou Basic Research Program Jointly Funded by Municipal Schools (Institutes) and Enterprises (No. 2024A03J0318), the National Key Research and Development Plan (No. 2022YFB2603300), China Postdoctoral Science Foundation (No. 2023M740805), and the Postdoctoral Fellowship Program of CPSF (No. GZC20230593).

References

[1] Teng S, Liu A, Ye X, Wang J, Fu J, Wu Z, et al. Review of intelligent detection and health assessment of underwater structures. Eng Struct 2024;308:117958. https://doi.org/10.1016/j.engstruct.2024.117958.
[2] Chen D, Huang B, Kang F. A review of detection technologies for underwater cracks on concrete dam surfaces. Appl Sci 2023;13:3564. https://doi.org/10.3390/app13063564.
[3] Cui B, Wang C, Li Y, Li H, Li C, Cui B. Image enhancement-based detection of concrete cracks under turbid water bodies. Archit Eng Des Manag 2024:1–22. https://doi.org/10.1080/17452007.2024.2324037.
[4] Tian B, Liu C, Guo J, Yuan S, Wang L, Xu Z. Research on the dynamic positioning of remotely operated vehicles applied to underwater inspection and repair of hydraulic structures. Phys Fluids 2023;35:097123. https://doi.org/10.1063/5.0167445.
[5] Zhang C, Ma H, Chen Z, Li S, Ma Z, Huang H, et al. YOLOX-DG robotic detection systems for large-scale underwater concrete structures. iScience 2024;27:109337. https://doi.org/10.1016/j.isci.2024.109337.
[6] Li W, Yuan Xa, Chen G, Ge J, Yin X, Li K. High sensitivity rotating alternating current field measurement for arbitrary-angle underwater cracks. NDT E Int 2016;79:123–31. https://doi.org/10.1016/j.ndteint.2016.01.003.
[7] Zhang Y, Sidibé Y, Maze G, Leon F, Druaux F, Lefebvre D. Detection of damages in underwater metal plate using acoustic inverse scattering and image processing methods. Appl Acoust 2016;103:110–21. https://doi.org/10.1016/j.apacoust.2015.10.013.
[8] Lei M, Liu L, Shi C, Tan Y, Lin Y, Wang W. A novel tunnel-lining crack recognition system based on digital image technology. Tunn Undergr Space Technol 2021;108:103724. https://doi.org/10.1016/j.tust.2020.103724.
[9] Shi P, Fan X, Ni J, Khan Z, Li M. A novel underwater dam crack detection and classification approach based on sonar images. PloS One 2017;12:e0179627. https://doi.org/10.1371/journal.pone.0179627.
[10] Shi P, Fan X, Ni J, Wang G. A detection and classification approach for underwater dam cracks. Struct Health Monit 2016;15:541–54. https://doi.org/10.1177/1475921716651039.
[11] Mucolli L, Krupinski S, Maurelli F, Mehdi SA, Mazhar S. Detecting cracks in underwater concrete structures: an unsupervised learning approach based on local feature clustering. Oceans 2019 MTS/IEEE Seattle 2019:1–8. https://doi.org/10.23919/OCEANS40490.2019.8962401.
[12] Huang Y, Zhuo Q, Fu J, Liu A. Research on evaluation method of underwater image quality and performance of underwater structure defect detection model. Eng Struct 2024;306:117797. https://doi.org/10.1016/j.engstruct.2024.117797.
[13] Wu P, Liu A, Fu J, Ye X, Zhao Y. Autonomous surface crack identification of concrete structures based on an improved one-stage object detection algorithm. Eng Struct 2022;272:114962. https://doi.org/10.1016/j.engstruct.2022.114962.
[14] Su Z, Zhou F, Liang J, Liu A, Wang J, Liang J, et al. Fractal theory based identification model for surface crack of building structures. Eng Struct 2024;305:117708. https://doi.org/10.1016/j.engstruct.2024.117708.
[15] Situ Z, Teng S, Feng W, Zhong Q, Chen G, Su J, et al. A transfer learning-based YOLO network for sewer defect detection in comparison to classic object detection methods. Dev Built Environ 2023;15:100191. https://doi.org/10.1016/j.dibe.2023.100191.
[16] Wan C, Xiong X, Wen B, Gao S, Fang D, Yang C, et al. Crack detection for concrete bridges with imaged based deep learning. Sci Prog 2022;105:00368504221128487. https://doi.org/10.1177/00368504221128487.
[17] Zhou Q, Situ Z, Teng S, Chen G. Convolutional neural networks–based model for automated sewer defects detection and classification. J Water Resour Plan Manag 2021;147:04021036.
[18] Teng S, Liu Z, Chen G, Cheng L. Concrete crack detection based on well-known feature extractor model and the YOLO_v2 network. Appl Sci 2021;11:813. https://doi.org/10.3390/app11020813.
[19] Teng S, Chen G. Deep convolution neural network-based crack feature extraction, detection and quantification. J Fail Anal Prev 2022;22:1308–21. https://doi.org/10.1007/s11668-022-01430-9.
[20] Ma Y, Wu Y, Li Q, Zhou Y, Yu D. ROV-based binocular vision system for underwater structure crack detection and width measurement. Multimed Tools Appl 2023;82:20899–923. https://doi.org/10.1007/s11042-022-14168-1.
[21] Cao W, Li J. Detecting large-scale underwater cracks based on remote operated vehicle and graph convolutional neural network. Front Struct Civ Eng 2022;16:1378–96. https://doi.org/10.1007/s11709-022-0855-8.
[22] Li Y, Bao T, Huang X, Chen H, Xu B, Shu X, et al. Underwater crack pixel-wise identification and quantification for dams via lightweight semantic segmentation and transfer learning. Autom Constr 2022;144:104600. https://doi.org/10.1016/j.autcon.2022.104600.
[23] Chen L-C, Zhu Y, Papandreou G, Schroff F, Adam H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. Cham: Springer International Publishing; 2018. p. 833–51.
[24] Situ Z, Wang Q, Teng S, Feng W, Chen G, Zhou Q, et al. Improving urban flood prediction using LSTM-DeepLabv3+ and Bayesian optimization with spatiotemporal feature fusion. J Hydrol 2024;630:130743. https://doi.org/10.1016/j.jhydrol.2024.130743.
[25] Cao Q, Li M, Yang G, Tao Q, Luo Y, Wang R, et al. Urban vegetation classification for unmanned aerial vehicle remote sensing combining feature engineering and improved DeepLabV3+. Forests 2024;15:382. https://doi.org/10.3390/f15020382.
[26] Liu Z, Zeng Z, Li J, Teng S. Automatic detection and quantification of hot-rolled steel surface defects using deep learning. Arab J Sci Eng 2023;48:10213–25. https://doi.org/10.1007/s13369-022-07567-x.
[27] Woo S, Park J, Lee J-Y, Kweon IS. CBAM: Convolutional Block Attention Module. Cham: Springer International Publishing; 2018. p. 3–19.
[28] Shafiq M, Gu Z. Deep residual learning for image recognition: a survey. Appl Sci 2022;12:8972. https://doi.org/10.3390/app12188972.
[29] Sandler M, Howard A, Zhu M, Zhmoginov A, Chen L-C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2018. p. 4510–20. https://doi.org/10.1109/CVPR.2018.00474.
[30] Talib LF, Amin J, Sharif M, Raza M. Transformer-based semantic segmentation and CNN network for detection of histopathological lung cancer. Biomed Signal Process Control 2024;92:106106. https://doi.org/10.1016/j.bspc.2024.106106.
[31] Adige S, Kurban R, Durmuş A, Karaköse E. Classification of apple images using support vector machines and deep residual networks. Neural Comput Appl 2023;35:12073–87. https://doi.org/10.1007/s00521-023-08340-3.
[32] Kumar Y, Garg P, Moudgil MR, Singh R, Woźniak M, Shafi J, et al. Enhancing parasitic organism detection in microscopy images through deep learning and fine-tuned optimizer. Sci Rep 2024;14:5753. https://doi.org/10.1038/s41598-024-56323-8.
[33] Zhou Q, Situ Z, Teng S, Liu H, Chen W, Chen G. Automatic sewer defect detection and severity quantification based on pixel-level semantic segmentation. Tunn Undergr Space Technol 2022;123:104403. https://doi.org/10.1016/j.tust.2022.104403.

[34] Ronneberger O, Fischer P, Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation. Cham: Springer International Publishing; 2015. p. 234–41.
[35] Badrinarayanan V, Kendall A, Cipolla R. SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell 2015;39:2481–95. https://doi.org/10.1109/TPAMI.2016.2644615.
[36] Shelhamer E, Long J, Darrell T. Fully convolutional networks for semantic segmentation. IEEE Trans Pattern Anal Mach Intell 2017;39:640–51. https://doi.org/10.1109/TPAMI.2016.2572683.
[37] Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: visual explanations from deep networks via gradient-based localization. Int J Comput Vis 2020;128:336–59. https://doi.org/10.1007/s11263-019-01228-7.
