information

Article
Deep Learning-Based Semantic Segmentation Methods for
Pavement Cracks
Yu Zhang 1 , Xin Gao 1, * and Hanzhong Zhang 2

1 College of Mechanical Engineering, Shenyang University of Technology, Shenyang 110870, China


2 Hong Kong Community College, The Hong Kong Polytechnic University, Hong Kong 999077, China
* Correspondence: [email protected]

Abstract: As road mileage continues to expand, the number of disasters caused by growing pavement cracks is increasing. Two main approaches, image processing and deep learning, are used to detect these cracks and to improve the efficiency and quality of pavement crack segmentation. The classical segmentation network, UNet, is weak at extracting target edge information and at segmenting small targets, and is susceptible to distracting objects in the environment, so it fails to segment tiny pavement cracks well. To resolve this problem, we propose a U-shaped network, ALP-UNet, which adds an attention module to each encoding layer. In the decoding phase, we incorporate the Laplacian pyramid so that the feature maps contain more boundary information. We also add a PAN auxiliary head that provides an additional loss for the backbone to improve the overall segmentation performance. The experimental results show that the proposed method effectively reduces the interference of other objects on the pavement and improves the mIou and mPA values compared with previous methods.

Keywords: attention module; Laplacian pyramid; PAN

Citation: Zhang, Y.; Gao, X.; Zhang, H. Deep Learning-Based Semantic Segmentation Methods for Pavement Cracks. Information 2023, 14, 182. https://doi.org/10.3390/info14030182

Academic Editor: Gennady Agre

Received: 14 February 2023
Revised: 9 March 2023
Accepted: 13 March 2023
Published: 15 March 2023

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Information 2023, 14, 182. https://doi.org/10.3390/info14030182 https://www.mdpi.com/journal/information

1. Introduction
Pavement cracking is a common form of pavement distress, yet it often goes unnoticed and is generally regarded as a normal sign of pavement aging [1]. However, traffic accidents, such as pavement collapse caused by the continuous expansion of pavement cracks, occur frequently. Rainwater erosion and vehicle loads cause cracks to multiply; if these cracks are not repaired in time, they affect traffic safety and, in more serious cases, even lead to landslides on panhandle roads. If road maintenance intervenes at an early stage of pavement damage, it not only reduces the repair cost and repair time but also greatly extends the service life of the road and reduces the disasters caused by road aging. The traditional manual maintenance method is risky and inefficient. It requires substantial human and material resources, which makes it difficult to complete many road maintenance tasks in a timely manner [2]. In contrast, an automated pavement crack detection system developed using computer vision and deep learning technologies can do the job quickly and accurately while eliminating the subjective factor [3]. Such systems can use low-cost devices such as smartphones or drones to capture high-resolution images and identify crack locations and types through methods such as deep convolutional neural networks and adaptive threshold segmentation. This not only improves the efficiency and accuracy of detection but also reduces labor costs and risks. Therefore, road maintenance should develop in the direction of intelligence and efficiency.
In the field of computer vision-based pavement crack detection and segmentation [4], research is broadly divided into two directions. The first is based on image processing and relies mainly on manually designed recognition of the collected data [5]: a variety of feature rules such as HOG (Histogram of Oriented Gradients), frequency, greyscale, edge, texture, and entropy are applied, and recognition conditions are then designed on these features. The second establishes a convolutional network based on deep learning to extract features from the dataset [6] and lets the network continuously adjust itself according to a specific loss function so that its output equals or approximates the label.
We next provide a comparative review of previous studies on pavement crack seg-
mentation, which can be divided into two main categories, i.e., image processing-based
and deep learning-based approaches.

1.1. Image Processing-Based Methods


Early crack detection methods mainly combined or improved traditional image processing techniques, such as thresholding and edge detection, which led to automatic detection methods for pavement cracks. However, the result obtained in this way is the centerline of the crack, which does not include the width of the crack. To obtain more information about cracks, image processing techniques were used to preprocess images, segment them, and extract features, and fast, automatic detection and segmentation methods were developed. Shi, Y. et al. [7] proposed a new forest structure-based road crack detection system, CrackForest, to address the problems of severe crack inhomogeneity, complex topology, texture similarity, and noise, and their experimental results showed that CrackForest achieves advanced detection accuracy. Oliveira and Correia [8] applied appropriate smoothing to the images to reduce false positive detections and then iteratively classified the binary pixels into cracked and noncracked classes to identify complete pavement cracks. Lu et al. [9] proposed a pavement crack identification method based on automatic threshold iteration. They improved the peak threshold selection method by completing image enhancement, smoothing, and denoising before iterative threshold selection. The improved method realizes real-time automatic threshold selection and ensures the stability of the detection process. Dinh, T.H., Ha, Q.P. et al. [10] addressed concrete crack detection by using a histogram thresholding approach to extract regions of interest from the background.

1.2. Deep Learning-Based Methods


With the technological breakthrough of deep learning in recent years, detection algo-
rithms based on deep learning and convolutional neural networks have achieved better
results in pavement crack identification.
Zhang, L. first applied deep learning to pavement crack detection [11] and demonstrated that convolutional neural networks are superior to support vector machines in classifying crack images. However, the designed network is primitive, the image scenes in the training data are not complete enough, and the recognition efficiency and accuracy are low. Henrique Oliveira et al. [3] proposed a fully integrated system for the automatic detection of pavement cracks, which eliminates the need for manual labeling of samples, minimizes the human subjectivity of traditional visual inspection, and achieves crack detection and crack-type classification based on image blocks. Cha, Y.J. proposed a crack detection algorithm incorporating sliding windows and convolutional neural networks [6], which reduces the disturbance of image cutting on crack recognition. Experimental results show that the algorithm outperforms Sobel and Canny edge detection and achieves higher crack classification accuracy. Zhang, A., Wang, K.C.P. et al. [12] proposed CrackNet, an efficient architecture based on convolutional neural networks (CNNs). It uses a CNN to predict the class of each pixel of an image and performs significantly better than state-of-the-art image processing methods and machine learning-based classifiers, although further improvement is needed to detect finer cracks. Vishal Mandal proposed an automatic pavement distress analysis system based on YOLOv2 [13]. The system computes a final average score from precision and recall to evaluate the classification and detection accuracy of the proposed distress analyzer. König, J. proposed an architecture based on a fully convolutional network, UNet, and a residual module [14] and achieved good segmentation results for crack images. Garbowski and Gajewski [15] proposed a semiautomatic inspection tool based on 3D profile scan data that can identify and measure defects such as cracks, potholes, and crumbling on road surfaces. They combined laser scanners, image processing techniques, support vector machine classifiers, and a human–computer interface. The tool achieved a fast, accurate, and visual assessment of the road surface condition, and its effectiveness and accuracy were demonstrated by comparison with manual inspection methods. Seo, H. and Huang, C. improved the U-Net by proposing mU-Net [16], introducing a feature extraction module between the encoder and decoder, fusing high-level features with low-level features, and using a region-based growth method to generate an initial liver segmentation mask to improve segmentation accuracy and robustness. Combining gradient descent and Newton's method accelerates network convergence and avoids local optimal solutions.
These inspection methods use different schemes to process and identify crack images, and each detection system has a specific scope of application. Most existing detection techniques are designed to classify crack types and assess pavement damage. However, because asphalt pavements have good crack resistance and stability, cracks are mostly small and located on rough, gray, and disturbed surfaces, which affects the accuracy of the algorithms to varying degrees. In practical applications, the existing methods still have the following shortcomings: they are not effective in segmenting small cracks, they are prone to misjudging road surface disturbances such as manhole covers and oil stains, and they easily oversegment the edges of lane lines. In this paper, these characteristics of pavement crack images are analyzed and a novel method based on deep learning networks is proposed. The method simultaneously improves the segmentation of crack details and reduces the effect of interfering objects on the segmentation. Extensive experiments are conducted on a test dataset with representative features, and the method shows significant advantages in several metrics compared with existing methods. This paper provides a novel and effective technical means for road surface crack image analysis and also offers insight and reference for other similar problems.

2. Proposed Method
This section describes the structural components of the improved algorithm and explains the reasons for the modifications. The focus of this paper is to improve the accuracy with which the model segments crack data in the presence of disturbances. The UNet model [17] serves as the main backbone, and a CBAM (convolutional block attention module) [18] is added to each layer of the encoder (left path) to better perform initial feature extraction on the input data. The decoder, which is based on UNet, concatenates Laplacian residuals [19] to incorporate more detailed information.

2.1. CBAM-Unet Model


Different from the SE (squeeze-and-excitation) attention module [20], the CBAM module infers attention maps along two separate dimensions, channel and spatial [18]. Channel attention determines "what is important", and spatial attention determines "where is important". CBAM provides adaptive feature refinement and improves the representation of the features of interest, which is helpful for small objects and difficult samples. Compared with BAM (bottleneck attention module) [21], which is placed at the bottlenecks of the network, CBAM can be used as a plug-and-play module after any intermediate convolutional layer.
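The channel-then-spatial weighting described above can be sketched in a few lines. This is a minimal pure-Python illustration, assuming the usual CBAM formulation from [18]: channel attention from average- and max-pooled descriptors passed through a shared two-layer MLP, followed by spatial attention from a convolution over the channel-wise average and max maps. The function names and the parameters `w1`, `w2`, and `kernel` are hypothetical and are not taken from the authors' implementation.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def shared_mlp(vec, w1, w2):
    # Two-layer MLP with ReLU; w1 is [hidden][C], w2 is [C][hidden] (hypothetical weights).
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(feat, w1, w2):
    # feat is a feature map stored as nested lists [C][H][W].
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]
    mx = [max(map(max, ch)) for ch in feat]
    a, m = shared_mlp(avg, w1, w2), shared_mlp(mx, w1, w2)
    return [sigmoid(x + y) for x, y in zip(a, m)]  # one weight per channel


def spatial_attention(feat, kernel):
    # kernel is [2][k][k]; it convolves the channel-wise average and max maps (zero padding).
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    avg = [[sum(feat[i][y][x] for i in range(c)) / c for x in range(w)] for y in range(h)]
    mx = [[max(feat[i][y][x] for i in range(c)) for x in range(w)] for y in range(h)]
    k = len(kernel[0])
    r = k // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            s = 0.0
            for ch, plane in enumerate((avg, mx)):
                for dy in range(k):
                    for dx in range(k):
                        yy, xx = y + dy - r, x + dx - r
                        if 0 <= yy < h and 0 <= xx < w:
                            s += kernel[ch][dy][dx] * plane[yy][xx]
            row.append(sigmoid(s))
        out.append(row)
    return out  # one weight per spatial position


def cbam(feat, w1, w2, kernel):
    # Channel attention first ("what"), then spatial attention ("where").
    mc = channel_attention(feat, w1, w2)
    refined = [[[mc[i] * v for v in row] for row in ch] for i, ch in enumerate(feat)]
    ms = spatial_attention(refined, kernel)
    return [[[ms[y][x] * v for x, v in enumerate(row)] for y, row in enumerate(ch)]
            for ch in refined]
```

With all weights set to zero, both attention maps equal sigmoid(0) = 0.5, so every value is scaled by 0.25; trained weights would instead emphasize crack-related channels and positions.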
From the perspective of data features, pavement cracks are small objects, and the area ratio between the foreground and the background is highly unbalanced. To enable Unet to better extract detailed features from limited data, we propose CBAM-Unet, which adds an attention module before the down-sampling of each encoding layer of the contraction path. In this way, the feature map obtained by convolution is weighted in the channel dimension and spatial dimension, which enables the network to pay more attention to the target region when extracting features. This process can be formulated as follows:

$$E_i = \mathrm{DouConv}(x), \qquad E_{i+1} = \mathrm{Att}(E_i) \tag{1}$$

where $E_i$ denotes the i-th encoding layer of the network model, and $x$ is the feature map input from the previous layer. $\mathrm{DouConv}(\cdot)$ denotes repeated application of two 3 × 3 convolutions (unpadded convolutions), and $\mathrm{Att}(\cdot)$ is the CBAM attention module. The decoding layer includes a deconvolution layer and a convolutional layer to restore the feature map after convolutional downsampling to its original detail and size. On this basis, we add weight standardization to the preactivation convolution blocks [22] of the decoder architecture, which helps slightly improve the mIou indicator.

2.2. Laplacian Pyramid
The Laplace pyramid has been applied in various fields of scene understanding since it preserves local information about given data [23] and it emphasizes the differences between different scale spaces. This feature is exactly what is missing in the higher-order features generated in the encoding phase. After applying the Laplace pyramid transform to the image, the results at different scales contain more distinct boundary information [24]. Incorporating this boundary information into the decoding stage can improve the segmentation of small objects by the network structure. The decoding process incorporates multiscale Laplace features of the input three-channel color image. The structure of the proposed method is shown in Figure 1. Based on this decoding scheme containing the multiscale Laplacian residuals of the input images, this coding structure can be more effective for feature extraction of fine cracks.

Figure 1. AL-Unet combining CBAM and Laplace pyramid.

The Laplacian residual is calculated on the input image as follows:

$$L_k = I_k - \mathrm{Up}(I_{k+1}), \quad k = 1, 2, 3, 4 \tag{2}$$

where $k$ indexes the levels of the Laplacian pyramid, $I_k$ is obtained by downsampling the original input image by a factor of $1/2^{k-1}$, and $\mathrm{Up}(\cdot)$ resizes an image using bilinear interpolation. The decoder features are then computed as

$$D_k = \mathrm{cat}(E_k, L_k, \mathrm{UpwsConv}(D_{k+1})), \quad k = 1, 2, 3, 4 \tag{3}$$

After the encoding and decoding calculations, $D_k$ carries the feature information of the different encoder levels and the boundary features of the Laplacian residuals. $\mathrm{UpwsConv}(\cdot)$ denotes preactivation convolution blocks with weight standardization. $D_1$ is the final feature output and is passed to the lead head for loss calculation. We call the UNet combining CBAM and the Laplacian pyramid AL-UNet.
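Equation (2) can be illustrated with a small pure-Python sketch. It assumes a single-channel image stored as a list of rows and uses the same bilinear resize for both the downsampling that produces $I_k$ and the upsampling $\mathrm{Up}(\cdot)$; the paper only specifies bilinear interpolation for $\mathrm{Up}(\cdot)$, so the downsampling choice and the helper names `resize_bilinear` and `laplacian_residuals` are assumptions made for illustration.

```python
def resize_bilinear(img, out_h, out_w):
    """Resize a 2D image (list of rows of floats) with bilinear interpolation."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for y in range(out_h):
        # Map the output pixel centre back to input coordinates and clamp.
        sy = min(max((y + 0.5) * in_h / out_h - 0.5, 0.0), in_h - 1.0)
        y0 = int(sy)
        y1 = min(y0 + 1, in_h - 1)
        wy = sy - y0
        row = []
        for x in range(out_w):
            sx = min(max((x + 0.5) * in_w / out_w - 0.5, 0.0), in_w - 1.0)
            x0 = int(sx)
            x1 = min(x0 + 1, in_w - 1)
            wx = sx - x0
            top = img[y0][x0] * (1 - wx) + img[y0][x1] * wx
            bottom = img[y1][x0] * (1 - wx) + img[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


def laplacian_residuals(image, levels=4):
    """Return [L_1, ..., L_levels] with L_k = I_k - Up(I_{k+1})."""
    h, w = len(image), len(image[0])
    # I_k is the input downsampled by 1 / 2^(k-1); I_1 is the input itself.
    pyramid = [resize_bilinear(image, h // 2 ** k, w // 2 ** k)
               for k in range(levels + 1)]
    residuals = []
    for k in range(levels):
        cur = pyramid[k]
        up = resize_bilinear(pyramid[k + 1], len(cur), len(cur[0]))
        residuals.append([[a - b for a, b in zip(r1, r2)]
                          for r1, r2 in zip(cur, up)])
    return residuals
```

A constant image yields residuals that are zero everywhere, while an image with a sharp edge yields nonzero values near the edge, which is the boundary information the decoder concatenates in Equation (3).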

2.3. PAN Path-Aggregation Auxiliary Head


Deep supervision is a commonly used technique in deep network training. Its main idea is to add auxiliary heads to the middle layers of the network, whose outputs provide an auxiliary loss that guides the weights of the shallow layers. In this paper, AL-UNet is used as the main feature extraction module, and PAN [25] is used as the auxiliary head. We modified the input and output of the original PAN without changing the performance, and D2, D3, and D4 from the decoder stage of AL-UNet are used as the inputs of the PAN feature extraction module to extract auxiliary features from the input image. Each Di (i = 1, 2, 3, 4) also carries shallow features, deep features, and the boundary information of the Laplacian residuals. The three outputs of PAN are concatenated at the same scale and passed through a layer consisting of two 1 × 1 CBL blocks to provide the auxiliary loss for AL-UNet. The architecture of the PAN is shown in Figure 2.

2.4. Loss Function


The pavement crack segmentation task involves fine cracks, and its data are characterized by an extremely unbalanced proportion of positive and negative samples, i.e., a very low proportion of crack pixels and a high proportion of background pixels. In this case, if only BCELoss [26] is used as the loss function, the network may overfit the background category and ignore the crack category. Therefore, other loss functions are needed to supplement BCELoss and improve the network's focus on the crack category and its segmentation accuracy.
DiceLoss [27] is a loss function based on the proportion of overlapping regions. It measures the similarity between the segmentation result and the label and is balanced with respect to positive and negative samples. Because BCELoss alone may leave the network insensitive to the differences between positive and negative samples when their ratio is extremely unbalanced, adding DiceLoss lets the network pay more attention to these differences and optimize the overlap between the segmentation result and the label.
FocalLoss [28] is an improved cross-entropy loss function that dynamically adjusts the weights according to the difficulty of the samples, allowing the network to focus more on hard-to-classify samples and less on easy-to-classify samples. In the pavement crack segmentation task, crack pixels tend to be the hard-to-classify samples because of their low proportion, while background pixels tend to be the easy-to-classify samples because of their high proportion. Using FocalLoss makes the network focus more on detecting the crack category, thus improving the recall and F1 value.
Therefore, in the pavement crack segmentation task, the loss functions are chosen according to the scale difference between foreground and background as follows:
$$\mathrm{BCELoss}(y_i, p_i) = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log(p_i) + (1 - y_i)\log(1 - p_i)\right] \tag{4}$$

$$\mathrm{DiceLoss}(y_i, p_i) = 1 - \frac{2\sum_{i=1}^{N} y_i p_i + \varepsilon}{\sum_{i=1}^{N} y_i + \sum_{i=1}^{N} p_i + \varepsilon} \tag{5}$$

$$\mathrm{FocalLoss}(p_i) = -\frac{1}{N}\sum_{i=1}^{N}(1 - p_i)^{\gamma}\log(p_i) \tag{6}$$

Here $y_i$ is the true value of the i-th pixel in the pavement crack image, $p_i$ is the probability predicted by the network for the i-th pixel, and $N$ is the number of pixels in the image. $\varepsilon$ in DiceLoss is a smoothing term that prevents the denominator from going to zero. $\gamma$ in FocalLoss is the weight parameter that balances hard and easy samples. For an easy-to-classify sample, $p_i$ is large, so its loss is significantly reduced by the $\gamma$ power. The $p_i$ of a hard-to-classify sample is around 0.5, so its loss is reduced much less than that of an easy-to-classify sample.

Figure 2. The ALP-UNet structure.

In this segmentation task, we set the weights of BCELoss, DiceLoss, and FocalLoss as shown in Equation (7). BCELoss and DiceLoss receive equal weights because the two loss functions have different objectives: BCELoss is more concerned with classification accuracy, while DiceLoss is more concerned with the similarity of the predicted segmentation to the true segmentation, so equal weights balance these two objectives. FocalLoss can enhance the performance of the model by improving the learning of hard-to-classify samples. The weight of FocalLoss is set to two because the pavement crack dataset of this paper contains a large number of hard-to-classify samples, so the model needs to learn these samples more strongly to improve the segmentation accuracy.

$$L = L_{\mathrm{BCE}} + L_{\mathrm{dice}} + 2L_{\mathrm{focal}} \tag{7}$$
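The combined loss of Equations (4) to (7) can be sketched in pure Python for a flattened list of pixel labels and predicted probabilities. Several details below are assumptions rather than values from the paper: the smoothing term `eps=1.0`, the focusing parameter `gamma=2.0`, the clipping constant, and the reading of $p_i$ in Equation (6) as the probability assigned to the true class of pixel i.

```python
import math

CLIP = 1e-7  # assumed clipping constant that keeps log() finite


def _clip(p):
    return min(max(p, CLIP), 1.0 - CLIP)


def bce_loss(y, p):
    # Equation (4): mean binary cross-entropy over all pixels.
    total = sum(t * math.log(_clip(q)) + (1 - t) * math.log(1.0 - _clip(q))
                for t, q in zip(y, p))
    return -total / len(y)


def dice_loss(y, p, eps=1.0):
    # Equation (5): one minus the smoothed overlap ratio.
    inter = sum(t * q for t, q in zip(y, p))
    return 1.0 - (2.0 * inter + eps) / (sum(y) + sum(p) + eps)


def focal_loss(y, p, gamma=2.0):
    # Equation (6), assuming p_t is the probability of the true class.
    total = 0.0
    for t, q in zip(y, p):
        pt = _clip(q if t == 1 else 1.0 - q)
        total += (1.0 - pt) ** gamma * math.log(pt)
    return -total / len(y)


def total_loss(y, p, eps=1.0, gamma=2.0):
    # Equation (7): L = L_BCE + L_dice + 2 * L_focal.
    return bce_loss(y, p) + dice_loss(y, p, eps) + 2.0 * focal_loss(y, p, gamma)
```

For a confidently correct prediction such as `y = [1, 0]`, `p = [0.9, 0.1]`, the focal term is about one hundredth of the cross-entropy term, which shows how easy pixels are down-weighted.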

3. Results
In this section, 150 road surface images containing manhole covers, oil stains, and lane lines were selected for experiments with different network structures to evaluate the performance of the proposed method.

3.1. Training
The proposed method was tested on an Ubuntu 18.04 system with an AMD Ryzen 7 5800H with Radeon Graphics CPU @ 3.20 GHz and an NVIDIA GeForce RTX 3070 Laptop GPU. It is implemented in the PyTorch [29] framework. The network was trained from scratch for 100 epochs using the SGD optimizer [30] with a batch size of four, a learning rate of 0.01, a momentum of 0.9, and a weight decay factor of 0.0005. Polynomial learning rate decay was used with a power of 0.9 and a minimum learning rate of $10^{-4}$.
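The learning rate schedule can be sketched as follows. The paper gives only the power (0.9), the initial rate (0.01), and the minimum rate ($10^{-4}$); the exact formula below is one common form of polynomial decay and is an assumption, as is the hypothetical function name `poly_lr`.

```python
def poly_lr(iteration, max_iters, base_lr=0.01, min_lr=1e-4, power=0.9):
    """Polynomial decay from base_lr at iteration 0 down to min_lr at max_iters."""
    # Assumed form: (base_lr - min_lr) * (1 - progress) ** power + min_lr.
    progress = min(iteration, max_iters) / max_iters
    return (base_lr - min_lr) * (1.0 - progress) ** power + min_lr
```

Under this form the rate starts at 0.01, decreases monotonically, and reaches the minimum rate at the last iteration.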
In the training phase, we used 700 images of cracked asphalt pavement, each 480 × 320 pixels, that we acquired ourselves as training data. The data consisted mostly of fine cracks and contained distractions such as manhole covers, oil stains, and lane lines. Online data augmentation was performed during training to reduce overfitting. The training samples were randomly resized with a multiplier of 0.5 or 2 and randomly cropped from 480 × 320 pixels to 256 × 256 pixels. The input image was also flipped horizontally with a probability of 0.5. Moreover, photometric distortion was added, such as adjusting the brightness, chroma, contrast, and saturation of the image and adding noise.
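The geometric part of this augmentation must be applied identically to the image and to its label mask. The sketch below covers only the random crop and horizontal flip; the random resize and photometric distortion are omitted, and the function name `random_crop_flip` and its parameters are hypothetical. Images are assumed to be lists of rows.

```python
import random


def random_crop_flip(image, mask, crop_h=256, crop_w=256, flip_prob=0.5, rng=None):
    """Apply the same random crop and horizontal flip to an image and its mask."""
    rng = rng or random.Random()
    h, w = len(image), len(image[0])
    top = rng.randint(0, h - crop_h)    # inclusive bounds
    left = rng.randint(0, w - crop_w)
    img = [row[left:left + crop_w] for row in image[top:top + crop_h]]
    msk = [row[left:left + crop_w] for row in mask[top:top + crop_h]]
    if rng.random() < flip_prob:
        # Flip both so that crack pixels stay aligned with their labels.
        img = [row[::-1] for row in img]
        msk = [row[::-1] for row in msk]
    return img, msk
```

Passing a seeded `random.Random` instance as `rng` makes the augmentation reproducible, which is convenient when debugging a data pipeline.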

3.2. Performance Evaluation


The model uses UNet as the backbone network for feature extraction. Based on the characteristics of the asphalt pavement dataset used, UNet was combined with CBAM and a Laplacian pyramid and trained using PAN as an auxiliary head.
Pavement crack segmentation is a pixel-level classification task that judges whether each pixel belongs to a crack or to the pavement. Cracks are the targets to be detected and segmented and are called the positive class; everything else is called the negative class. By comparing the segmentation result with the ground truth, the true positives NTP, false positives NFP, true negatives NTN, and false negatives NFN of the confusion matrix can be obtained (written TP, FP, TN, and FN in the equations below). NTP is the number of crack pixels correctly classified as crack. NFP is the number of noncrack pixels misclassified as crack. NTN is the number of noncrack pixels correctly classified as noncrack. NFN is the number of crack pixels misclassified as noncrack. To evaluate the results of pavement crack segmentation, recall, precision, the mean F-score (mFscore), the mean pixel accuracy (mPA), the mean intersection over union (mIou), and the Dice coefficient are selected as evaluation indexes. They are calculated as follows:

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{8}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{9}$$

$$\mathrm{mFscore} = \frac{1}{k+1}\sum_{i=0}^{k}\frac{2 \times \mathrm{Precision}_i \times \mathrm{Recall}_i}{\mathrm{Precision}_i + \mathrm{Recall}_i} \tag{10}$$

$$\mathrm{mPA} = \frac{1}{k+1}\sum_{i=0}^{k}\mathrm{Precision}_i \tag{11}$$

$$\mathrm{mIou} = \frac{1}{k+1}\sum_{i=0}^{k}\frac{TP_i}{TP_i + FP_i + FN_i} \tag{12}$$

$$\mathrm{Dice} = \frac{2TP}{2TP + FP + FN} \tag{13}$$

In Equations (10)–(12), the sums run over the $k + 1$ classes (here, crack and background), and the subscript $i$ denotes the quantity computed with class $i$ treated as the positive class.
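Equations (8) to (13) can be checked with a short pure-Python sketch for the two-class case. Predictions and labels are assumed to be flat lists of 0 (background) and 1 (crack); the function names are hypothetical. The mean metrics average the per-class values with each class treated as positive in turn, following the equations as written in this paper.

```python
def confusion_counts(pred, truth, positive=1):
    """Count TP, FP, TN, FN for the given positive class."""
    tp = fp = tn = fn = 0
    for p, t in zip(pred, truth):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def class_metrics(tp, fp, fn):
    """Per-class recall, precision, F-score, IoU, and Dice (0.0 when undefined)."""
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    fscore = 2 * precision * recall / denom if denom else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    dice = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return {"recall": recall, "precision": precision,
            "fscore": fscore, "iou": iou, "dice": dice}


def evaluate(pred, truth):
    tp, fp, tn, fn = confusion_counts(pred, truth, positive=1)
    crack = class_metrics(tp, fp, fn)
    # With background as the positive class, the roles of the counts swap.
    background = class_metrics(tn, fn, fp)
    classes = (background, crack)
    return {
        "recall": crack["recall"],                                   # Eq. (8)
        "precision": crack["precision"],                             # Eq. (9)
        "mFscore": sum(c["fscore"] for c in classes) / len(classes), # Eq. (10)
        "mPA": sum(c["precision"] for c in classes) / len(classes),  # Eq. (11)
        "mIou": sum(c["iou"] for c in classes) / len(classes),       # Eq. (12)
        "dice": crack["dice"],                                       # Eq. (13)
    }
```

Note that Equation (13) is algebraically the F-score of the crack class, so the Dice value can be recovered from the crack-class recall and precision alone.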
We selected several open-source segmentation algorithms with good segmentation performance, namely UNet, UperNet, ResUNet, and Pointrend, and compared them on the collected pavement segmentation dataset. We used the same crop size and batch size and trained for the same number of iterations. The segmentation results are shown in Table 1 and Figure 3. Among these baselines, Pointrend achieves the highest Precision, mFscore, mPA, mIou, and Dice, while ResUNet achieves the highest Recall.
Table 1. Quantitative evaluations on the pavement crack dataset.

Architectures   Recall   Precision   mFscore   mPA      mIou     Dice
UNet            0.7595   0.5273      0.8089    0.7624   0.7213   0.6225
UperNet         0.7369   0.5543      0.8144    0.7731   0.7261   0.6301
ResUNet         0.8261   0.5265      0.8192    0.7625   0.7324   0.6431
Pointrend       0.7559   0.5717      0.8234    0.7846   0.7372   0.6510

Figure 3. Results of pavement segmentation dataset. (a): input color images. (b): ground truth. (c): results of the UNet. (d): results of the UperNet. (e): results of the ResUNet. (f): results of the Pointrend.

3.3. 1.
Table Ablation Studyevaluations on the pavement crack dataset.
Quantitative
In this section, we𝐑𝐞𝐜𝐚𝐥𝐥
performed ablation 𝐦𝐅𝐬𝐜𝐨𝐫𝐞
𝐏𝐫𝐞𝐜𝐢𝐬𝐢𝐨𝐧 experiments 𝐦𝐏𝐀
by the proposed
𝐦𝐈𝐨𝐮 method𝐃𝐢𝐜𝐞 on the
Architectures
pavement crack dataset to verify the effectiveness of the proposed architecture, i.e., the
UNet 0.7595 0.5273 0.8089 0.7624 0.7213 0.6225
attention mechanism of the encoding layer (Att-UNet), the addition of weight standard-
UperNet 0.7369 0.5543 0.8144 0.7731 0.7261 0.6301
ization in the Att-UNet decoding layer (AttWS-UNet), the addition of multiscale Laplace
ResUNet 0.8261 0.5265 0.8192 0.7625 0.7324 0.6431
Pointrend 0.7559 0.5717 0.8234 0.7846 0.7372 0.6510

3.3. Ablation Study


Table 2. Comparison results of ablation experiments on different modules.

Architectures 𝐑𝐞𝐜𝐚𝐥𝐥 𝐏𝐫𝐞𝐜𝐢𝐬𝐢𝐨𝐧 𝐦𝐅𝐬𝐜𝐨𝐫𝐞 𝐦𝐏𝐀 𝐦𝐈𝐨𝐮 𝐃𝐢𝐜𝐞


Information 2023, 14, 182 Pointrend 0.7559 0.5717 0.8234 0.7846 0.7372 0.65109 of 11
Att-UNet 0.7987 0.5491 0.8232 0.7735 0.7368 0.6508
AttWS-UNet 0.7965 0.5514 0.8237 0.7747 0.7373 0.6482
AL-UNet 0.8125 0.5653 0.8313 0.7817 0.7459 0.6667
residuals in the AttWS-UNet decoding layer (AL-UNet), and add the PAN auxiliary head
ALP-UNet 0.8208 0.5802 0.8379 0.7889 0.7464 0.6798
module in AL-UNet (ALP-UNet). We trained them with the same crop size, batch size,
and the same loss function. We used 150 test images to test the segmentation results. The
results of the comparison between these methods are shown in Tables 1 and 2
and Figures 3 and 4. It can be seen that the previous methods are susceptible to
unexpected interfering objects other than cracks, while the ALP-UNet with CBAM module,
Laplacian pyramid, and PAN structure reduces such effects and achieves better performance.
Therefore, we believe that ALP-UNet can effectively reduce interference and can better
extract detailed features.

Table 2. Comparison of results of ablation experiments on different modules.

Architectures   Recall   Precision   mFscore   mPA      mIou     Dice
Pointrend       0.7559   0.5717      0.8234    0.7846   0.7372   0.6510
Att-UNet        0.7987   0.5491      0.8232    0.7735   0.7368   0.6508
AttWS-UNet      0.7965   0.5514      0.8237    0.7747   0.7373   0.6482
AL-UNet         0.8125   0.5653      0.8313    0.7817   0.7459   0.6667
ALP-UNet        0.8208   0.5802      0.8379    0.7889   0.7464   0.6798
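As a reading aid for the columns of Table 2, the following minimal sketch shows how such pixel-level metrics are commonly derived from the confusion counts of a two-class (crack versus background) segmentation. The function name `segmentation_metrics` and the example counts are hypothetical, and the formulas are the standard textbook definitions; the exact averaging used for mFscore, mPA, and mIou in this paper may differ.

```python
def segmentation_metrics(tp, fp, fn, tn):
    """Compute common pixel-level metrics from binary confusion counts.

    Crack pixels are treated as the positive class (an assumption made
    for this illustration).
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    dice = 2 * tp / (2 * tp + fp + fn)

    # Per-class pixel accuracy: correctly labelled pixels of a class
    # divided by all ground-truth pixels of that class.
    pa_crack = tp / (tp + fn)
    pa_background = tn / (tn + fp)

    # Per-class IoU: intersection over union of prediction and ground truth.
    iou_crack = tp / (tp + fp + fn)
    iou_background = tn / (tn + fp + fn)

    return {
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
        "dice": dice,
        "mPA": (pa_crack + pa_background) / 2,
        "mIoU": (iou_crack + iou_background) / 2,
    }


# Hypothetical counts for a 1000-pixel image, chosen only for illustration.
metrics = segmentation_metrics(tp=80, fp=20, fn=20, tn=880)
```

Because crack pixels are rare, the background class dominates plain pixel accuracy; averaging the per-class values, as mPA and mIoU do here, gives the crack class equal weight.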

Figure 4. Results of the pavement segmentation dataset with different modules added. (a): input
color images; (b): ground truth; (c): results of the Pointrend; (d): results of the Att-UNet;
(e): results of the AttWS-UNet; (f): results of the AL-UNet; (g): results of the ALP-UNet.

From the above evaluation results, it can be seen that the segmentation results of
UNet, UperNet, ResUNet, and Pointrend were all affected by manhole covers and stains
on the pavement, and the Pointrend algorithm, which performs best in the evaluation
metrics in Figure 3, was also severely disturbed by manhole covers and lane lines. The
output accuracy of the model can be improved by adding a CBAM attention mechanism
in each coding layer stage. After weight standardization and multilayer Laplace
residuals are added in the coding layer, the segmentation results show that the
segmentation accuracy of pavement cracks improves and the interference of other
factors is reduced. With the PAN auxiliary head added, our model achieves the best
results in the mPA and mIou metrics and the smallest error between the predicted and
actual values, which effectively reduces the interference of other pavement factors on
segmentation accuracy. It also performs best among the compared models in every metric.
The results validate the effectiveness and superiority of the pavement crack segmentation
model based on the attention mechanism, weight standardization, Laplace pyramid, and PAN
auxiliary head.
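To make the weight standardization step in the ablation more concrete, the sketch below standardizes each output channel of a convolution weight tensor to zero mean and unit variance before it is used. This is the general technique only: the helper names, the flattened list-of-rows layout, and the `eps` value are assumptions made for illustration, not the implementation used in this paper.

```python
import math


def standardize_kernel(weights, eps=1e-5):
    """Return a zero-mean, (approximately) unit-variance copy of one
    output channel's weights, given as a flat list."""
    n = len(weights)
    mean = sum(weights) / n
    var = sum((w - mean) ** 2 for w in weights) / n
    # eps keeps the division stable when all weights are nearly equal.
    return [(w - mean) / math.sqrt(var + eps) for w in weights]


def weight_standardize(conv_weights, eps=1e-5):
    """Standardize every output channel (one row per channel) independently."""
    return [standardize_kernel(row, eps) for row in conv_weights]


# Hypothetical weights: 2 output channels, each flattened to 4 values.
raw = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 1.5, 1.5]]
standardized = weight_standardize(raw)
```

The standardization acts on the weights rather than on the activations, so it can be combined with an activation normalization layer in the same block.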

4. Conclusions
In this paper, we proposed a novel pavement crack segmentation network based on
deep learning. The main idea of this method is to improve segmentation accuracy and
reduce the influence of pavement distractors on segmentation results by enhancing the
feature extraction and fusion capabilities of the UNet network. Specifically, we added a
CBAM attention module to capture the crack information more effectively during model
training. We added weight standardization to the decoding process to stabilize the training
process and improve accuracy. The boundary information in the feature map was fused into
the decoding layer using multiscale Laplacian residuals to refine the segmentation results.
We used the PAN structure to assist training by generating auxiliary supervision signals to
further improve the segmentation accuracy. Finally, we chose a reasonable loss function
for training, and the experimental results showed that our method achieves significant
advantages in several metrics compared to existing methods. Our method can effectively
segment small cracks and reduce the oversegmentation problems caused by lane lines or
other objects. This paper provides a novel and effective technical method for the field of
pavement crack image segmentation, and also provides insight and reference for other
similar problems. In future work, we plan to apply our method to more complex scenarios
with different types of pavement cracks and disturbances, and explore more effective
network architectures and crack segmentation methods.
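The role of the Laplacian residuals as carriers of boundary information can be illustrated with a deliberately simplified one-dimensional sketch. Each residual is the difference between a signal and the upsampled version of its downsampled copy, so it is non-zero mainly where the intensity changes. The pair averaging, the sample repetition, and all function names are simplifying assumptions; an image pyramid would normally use two-dimensional blurring and interpolation, and this is not the network code of the paper.

```python
def downsample(signal):
    """Halve the resolution by averaging neighbouring pairs."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]


def upsample(signal):
    """Double the resolution by repeating every sample."""
    return [value for value in signal for _ in range(2)]


def laplacian_pyramid(signal, levels):
    """Return per-level residuals and the coarsest signal.

    The signal length must be divisible by 2 ** levels.
    """
    residuals = []
    current = list(signal)
    for _ in range(levels):
        coarse = downsample(current)
        # Residual: detail lost when going to the coarser level.
        residuals.append([a - b for a, b in zip(current, upsample(coarse))])
        current = coarse
    return residuals, current


def reconstruct(residuals, coarsest):
    """Invert the pyramid by adding the residuals back, coarse to fine."""
    current = list(coarsest)
    for residual in reversed(residuals):
        current = [a + b for a, b in zip(upsample(current), residual)]
    return current


# Hypothetical intensity profile across a single edge.
profile = [0, 0, 0, 1, 1, 1, 1, 1]
residuals, coarsest = laplacian_pyramid(profile, levels=2)
```

For this profile, the finest residual is `[0, 0, -0.5, 0.5, 0, 0, 0, 0]`: it vanishes in the flat regions and is non-zero only around the step. This is the property that makes such residuals useful for sharpening thin crack boundaries when they are fused into a decoder.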

Author Contributions: Conceptualization, X.G.; methodology, Y.Z. and X.G.; software, X.G.;
validation, X.G. and H.Z.; formal analysis, X.G.; investigation, X.G.; resources, Y.Z. and H.Z.; data
curation, Y.Z. and X.G.; writing—original draft preparation, X.G.; writing—review and editing, X.G.
and H.Z.; visualization, X.G.; supervision, Y.Z.; project administration, Y.Z.; funding acquisition, Y.Z.
All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Data Availability Statement: Not applicable.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Huyan, J.; Li, W.; Tighe, S.; Xu, Z.; Zhai, J. CrackU-net: A novel deep convolutional neural network for pixelwise pavement crack
detection. Struct. Control Health Monit. 2020, 27, e2551. [CrossRef]
2. Wang, K.C.P.; Elliott, R.P. Investigation of Image Archiving for Pavement Surface Distress Survey; Springer: Berlin/Heidelberg,
Germany, 2016; Volume 9999, pp. 1–13.
3. Oliveira, H.; Correia, P.L. Automatic road crack detection and characterization. IEEE Trans. Intell. Transp. Syst. 2012, 14, 155–168.
[CrossRef]
4. Gopalakrishnan, K.; Khaitan, S.K.; Choudhary, A.; Agrawal, A. Deep convolutional neural networks with transfer learning for
computer vision-based data-driven pavement distress detection. Constr. Build. Mater. 2017, 157, 322–330. [CrossRef]
5. Hoang, N.D.; Nguyen, Q.L.; Tien Bui, D. Image processing–based classification of asphalt pavement cracks using support vector
machine optimized by artificial bee colony. J. Comput. Civ. Eng. 2018, 32, 04018037. [CrossRef]
6. Cha, Y.J.; Choi, W.; Büyüköztürk, O. Deep learning-based crack damage detection using convolutional neural networks. Comput.-
Aided Civ. Infrastruct. Eng. 2017, 32, 361–378. [CrossRef]
7. Shi, Y.; Cui, L.; Qi, Z.; Meng, F.; Chen, Z. Automatic road crack detection using random structured forests. IEEE Trans. Intell.
Transp. Syst. 2016, 17, 3434–3445. [CrossRef]
8. Oliveira, H.; Correia, P.L. Road surface crack detection: Improved segmentation with pixel-based refinement. In Proceedings of
the IEEE 2017 25th European Signal Processing Conference (EUSIPCO), Kos, Greece, 28 August–2 September 2017.
9. Lu, G.; Zhao, Q.; Liao, J.; He, Y. Pavement crack identification based on automatic threshold iterative method. In Proceedings
of the Seventh International Conference on Electronics and Information Engineering, Nanjing, China, 17–18 September 2016;
Volume 10322, pp. 320–325.
10. Dinh, T.H.; Ha, Q.P.; La, H.M. Computer vision-based method for concrete crack detection. In Proceedings of the 2016 14th
international conference on control, automation, robotics and vision (ICARCV), Phuket, Thailand, 13–15 November 2016; pp. 1–6.
11. Zhang, L.; Yang, F.; Zhang, Y.D.; Zhu, Y.J. Road crack detection using deep convolutional neural network. In Proceedings of the
2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 3708–3712.
12. Zhang, A.; Wang, K.C.; Li, B.; Yang, E.; Dai, X.; Peng, Y.; Fei, Y.; Liu, Y.; Li, J.Q.; Chen, C. Automated pixel-level pavement crack
detection on 3D asphalt surfaces using a deep-learning network. Comput.-Aided Civ. Infrastruct. Eng. 2017, 32, 805–819. [CrossRef]
13. Mandal, V.; Uong, L.; Adu-Gyamfi, Y. Automated road crack detection using deep convolutional neural networks. In Proceedings
of the 2018 IEEE International Conference on Big Data (Big Data), Seattle, WA, USA, 10–13 December 2018; pp. 5212–5215.
14. König, J.; Jenkins, M.D.; Barrie, P.; Mannion, M.; Morison, G. A convolutional neural network for pavement surface crack
segmentation using residual connections and attention gating. In Proceedings of the 2019 IEEE International Conference on
Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 1460–1464.
15. Garbowski, T.; Gajewski, T. Semi-automatic inspection tool of pavement condition from three-dimensional profile scans. Procedia
Eng. 2017, 172, 310–318. [CrossRef]
16. Seo, H.; Huang, C.; Bassenne, M.; Xiao, R.; Xing, L. Modified U-Net (mU-Net) with incorporation of object-dependent high level
features for improved liver and liver-tumor segmentation in CT images. IEEE Trans. Med. Imaging 2019, 39, 1316–1325. [CrossRef]
[PubMed]
17. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of
the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich,
Germany, 5–9 October 2015; Springer International Publishing: New York, NY, USA, 2015; pp. 234–241.
18. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference
on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
19. Lai, W.S.; Huang, J.B.; Ahuja, N.; Yang, M.H. Deep laplacian pyramid networks for fast and accurate super-resolution. In Proceed-
ings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 624–632.
20. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
21. Park, J.; Woo, S.; Lee, J.Y.; Kweon, I.S. Bam: Bottleneck attention module. arXiv 2018, arXiv:1807.06514.
22. Song, M.; Lim, S.; Kim, W. Monocular depth estimation using laplacian pyramid-based depth residuals. IEEE Trans. Circuits Syst.
Video Technol. 2021, 31, 4381–4393. [CrossRef]
23. Wang, Z.; Cui, Z.; Zhu, Y. Multi-modal medical image fusion by Laplacian pyramid and adaptive sparse representation. Comput.
Biol. Med. 2020, 123, 103823. [CrossRef] [PubMed]
24. Ghiasi, G.; Fowlkes, C.C. Laplacian pyramid reconstruction and refinement for semantic segmentation. In Proceedings of the
European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; Springer: Cham, Switzerland, 2016;
pp. 519–534.
25. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE conference on
Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768.
26. Zhao, R.; Qian, B.; Zhang, X.; Li, Y.; Wei, R.; Liu, Y.; Pan, Y. Rethinking dice loss for medical image segmentation. In Proceedings
of the 2020 IEEE International Conference on Data Mining (ICDM), Sorrento, Italy, 17–20 November 2020; pp. 851–860.
27. Crum, W.R.; Camara, O.; Hill, D.L.G. Generalized overlap measures for evaluation and validation in medical image analysis.
IEEE Trans. Med. Imaging 2006, 25, 1451–1461. [CrossRef] [PubMed]
28. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020,
42, 318–327. [CrossRef] [PubMed]
29. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. Pytorch:
An imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 2019, 32.
30. Bottou, L. Stochastic gradient descent tricks. In Neural Networks: Tricks of the Trade; Springer: Berlin/Heidelberg, Germany, 2012;
pp. 421–436.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
