Research Article
A Pavement Crack Detection Method Based on Multiscale
Attention and HFS
Received 15 October 2021; Revised 19 December 2021; Accepted 28 December 2021; Published 27 January 2022
Copyright © 2022 Chun Li et al. This is an open access article distributed under the Creative Commons Attribution License, which
permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
To solve the problem of low detection accuracy caused by the loss of detailed information when extracting pavement crack features in
traditional U-shaped networks, a pavement crack detection method based on multiscale attention and hesitant fuzzy sets (HFS) is
proposed. First, an encoding-decoding structure is used to construct a pavement crack segmentation network: ResNeXt50 extracts
features in the encoding stage, and a multiscale feature fusion module (MFF) is designed to obtain multiscale context
information. Second, in the decoding stage, an efficient dual attention module (EDA) is used to enhance the ability to
capture crack details while suppressing background noise. Finally, the membership degree of the crack is calculated based
on the advantages of the HFS in multiattribute decision-making to obtain the similarity of the crack, and the segmented binary
image is classified by the hesitant fuzzy measure. The experiment was conducted on the public road crack dataset Crack500.
In terms of segmentation performance, the Intersection over Union (IoU), Precision, and Dice coefficient of
the proposed network reached 55.56%, 74.26%, and 67.43%, respectively; in terms of classification performance, the accuracy
for transversal and longitudinal cracks was 84% ± 0.5%, while for block and alligator cracks it was 78% ± 0.5%. The
experimental results show that the proposed method detects richer crack details and performs better on images with
complex topological structures and small cracks.
selection method performs crack detection from a global perspective, its detection performance is still unsatisfactory when dealing with cracks with disordered shapes or low contrast with surrounding pixels. It can be seen that automatic detection of road cracks is still a difficult task for researchers.

In recent years, deep learning has been applied to road crack detection tasks due to its outstanding feature extraction capabilities. Pauly et al. [6] cropped each crack image into patches, and each patch was classified as crack or noncrack after neural network training. Although this method was very efficient, it produced false detections. To further improve detection accuracy, semantic segmentation algorithms based on the encoding-decoding architecture are widely used. Lau et al. [7] introduced U-Net to road crack detection. The network introduced skip connections into the encoding-decoding architecture, which helped to preserve rich image details, thereby improving the detection accuracy. Although U-Net performs well in the field of image segmentation, the crack area of a crack image is much smaller than the background area. Cao et al. [8] replaced the U-Net encoder with ResNet34 to deal with the loss of spatial information caused by continuous pooling, effectively avoiding gradient disappearance or gradient explosion. Chen et al. [9] embedded a global context module in the U-Net network structure to give the network the ability to capture global context information, which is conducive to the detailed segmentation of pavement crack images. Augustauskas and Lipnickas [10] introduced an attention gate model based on the U-shaped network. The attention gate suppresses background noise and strengthens the ability of the network to capture detailed features of cracks. Fan et al. [11] proposed an end-to-end pixel-level road crack detection network. Multiple dilated convolution modules help the network obtain the multiscale context information of the cracks, and a hierarchical feature learning module is designed to integrate low-level features and high-level features. The designed multiscale output feature map has better performance in fracture information inference, thereby improving the robustness and universality of the network. Ali et al. [12] implemented a deep fully convolutional neural network based on residual blocks. For the extreme imbalance between target and background pixels in crack images, a local weighting factor was proposed to effectively reduce the trouble caused by pixel imbalance to the network; a crack image dataset with different crack width directions and a location dataset were developed for researchers to use for training, validation, and testing. Fan et al. [13] proposed a road crack automatic detection and measurement network based on probability fusion. Through the designed integrated neural network model, satisfactory crack detection accuracy is obtained; according to the predicted crack map, the width and length of the crack can be measured effectively. Wang et al. [14] proposed a semisupervised semantic segmentation network for crack detection. The model extracts multiscale crack feature information through Efficient-UNet; it greatly reduces the workload of labeling while maintaining high labeling accuracy. Wang et al. [15] used a neural network to detect pavement cracks and applied principal component analysis to classify the detected pavement cracks. The crack types were divided into transversal, longitudinal, and cracked cracks. The accuracy was higher than 90%. Nevertheless, patch classification is only suitable for rougher classification tasks. Cubero-Fernandez et al. [16] classified the discontinuous cracks in an image as a whole, though they did not consider the spatial distribution relationship between the cracks.

Existing road crack detection methods enhance the extraction and classification capabilities of crack features through global context modules, attention mechanisms, and principal component analysis methods to improve detection and classification accuracy. In crack images, the foreground pixels occupy a relatively small area and vary in length and width. If a detection method designed for natural images with a large proportion of foreground pixels is used, the effect is often poor, and eventually the information of the detected cracks will be lost, thereby affecting the detection effect. Therefore, this paper proposes a road crack detection method based on multiscale attention and HFS. The method is mainly divided into two tasks: semantic segmentation of the crack image [1] to separate the crack area from the noncrack area, and classification of the segmented binary image.

For the first task, the proposed solution uses rectangular soft pooling instead of global average pooling, which effectively extracts long and narrow fracture feature information. Rectangular pooling is used to fuse multiscale feature information to expand the receptive field of the network, so that the small proportion of crack information in the image is also noticed, thereby improving the accuracy of segmentation. Channel and spatial attention assign importance weights in two dimensions, and this importance is used to strengthen the information that is useful for crack identification and suppress useless information. Different from existing segmentation methods, our proposed solution is more suitable for segmenting images with unbalanced aspect ratios such as cracks.

For the second task, the core of the classification algorithm of the proposed solution is to define the number of crack branches, the number of inflection points, and the centroid distance index and use the multiattribute decision-making of HFS to calculate the similarity used for classification. Compared with the existing classification algorithms, our method provides a more detailed qualitative classification method, which can derive the crack image category from the comprehensive analysis of multiple indicators.

The main contributions of this article are as follows:

(1) Based on the encoding-decoding architecture, a multiscale feature fusion module is designed to obtain more receptive fields, so as to improve the network's ability to recognize disordered cracks.

(2) An efficient dual attention module is designed to realize the information interaction between spatial features and channel features, so as to improve the network's anti-interference ability and feature extraction ability.
Computational Intelligence and Neuroscience 3
(3) The segmented binary image is analyzed by the connected domain algorithm, and the advantages of hesitant fuzzy sets in multiattribute decision-making are used to calculate the fracture multiattribute membership degree to obtain the similarity of the fractures and determine the fracture category.

The rest of the paper is structured as follows: the second part elaborates the fracture segmentation network based on multiscale attention and the crack classification method based on hesitant fuzzy sets; the third part analyzes and discusses the experimental results; the fourth part summarizes the paper.

2. Materials and Methods

2.1. Fracture Segmentation Network Based on Multiscale Attention. The overall structure of our proposed solution, the Multiscale Attention Crack Segmentation Network (MACSNet), is shown in Figure 1.

[Figure 1: Overall structure of MACSNet. Recoverable labels: Input, E1, E2, E3, E4, MFF, D1, D2, D3, D4, EDA, Output.]

The network consists of Encoders (E1, E2, E3, E4), Decoders (D1, D2, D3, D4), an EDA module, and an MFF module. When designing the network structure, considering that the proportion of crack pixels in the image is small, the network structure should not be too deep, but a certain degree of accuracy must be ensured. Therefore, the encoder uses ResNeXt50 as the basic network to extract the characteristics of the input crack image. Its essence is grouped convolution, and the algorithm performance is improved by increasing the number of branches. The encoder in this structure retains the first five feature extraction modules of ResNeXt50, which are named pooling and E1-E4, respectively, as shown in Figure 1.

In addition, to obtain multiscale features, a multiscale feature fusion operation is performed after the E4 encoder to better extract the multiscale context information from the crack image and optimise the segmentation effect, and efficient dual attention is incorporated into the skip connections between encoding and decoding. The module allows the network to effectively integrate the low-level spatial resolution and the high-level semantic information, while further paying attention to the area where the crack is located. The decoders D1–D4 combine the advantages of subpixel convolution and bilinear interpolation in a parallel feature fusion structure to sequentially restore image resolution and detailed information. To further integrate spatial resolution and high-level semantic information, inspired by dense connection [17], D3 and D4 are, respectively, upsampled twice. Then, the results of the 3 levels are superimposed in a concatenate manner.

2.1.1. Efficient Dual Attention Module. The features extracted in the fracture segmentation network must not only contain enough spatial information to locate small-scale cracks, but also contain rich semantic information to effectively distinguish between cracks and other interference information. The Efficient Channel Attention (ECA) [18] strengthens the feature propagation ability of the channel dimension. Compared with the classic channel attention mechanism [19–21], it does not require dimensionality reduction operations to capture rich semantic information. However, in the high-level features of the cracks, it lacks sufficient spatial information. Inspired by the cascade of channel and spatial attention in the Convolutional Block Attention Module (CBAM) [22], this paper introduces local importance-based pooling [23] into the ECA attention mechanism to fuse channel weight information and spatial position information, yielding an efficient dual attention module, EDA. Its operating principle is shown in Figure 2; it enables the network to better distinguish the importance of crack features and further improves the accuracy of the segmentation network.

[Figure 2: Operating principle of the EDA module. Recoverable labels: GAP, K=3, 1×1×C, 2D Conv, W×H×1, Fi: W×H×C, Fs: W×H×C.]

The module is divided into upper and lower branches:

(1) The upper branch is used to capture channel attention features. This branch first obtains the global receptive field through a global average pooling operation and then uses 2D convolution to achieve channel interaction, taking each channel and its K = 3 neighbouring channels to generate local cross-channel attention information, as shown in

$$F_i = \sigma\left(\sum_{j=1}^{K} F_j \cdot Y_{i,j}\right), \quad Y_{i,j} \in \Omega_{i,k}, \tag{1}$$

where $F_j$ represents the one-dimensional convolution of size $K$, $\Omega_{i,k}$ represents the set of $K$ adjacent channels of the input feature $Y_i$, $\sigma$ represents the sigmoid activation function, and $F_i$ represents the attention information of $K$ adjacent channels to the current channel.

(2) The lower branch is used to capture spatial attention features. It trains a weight function on the original feature map, similar to the attention function, and then performs a weighted average with the original image. Then, the weight function distributes the spatial feature weights through the sigmoid to obtain the importance space attention, as shown in

$$F_s = \sigma\left(\mathrm{conv2d}\left(\frac{\sum_{i \in R} a_i \cdot \exp\left(G(a_i)\right)}{\sum_{j \in R} \exp\left(G(a_j)\right)}\right)\right), \tag{2}$$

where $a_i$ represents the original feature map, $\exp$ makes the weight value nonnegative and easy to optimise, and $G$ represents the weight function that the network obtains through training to enhance specific features.

2.1.2. Multiscale Feature Fusion Module. ASPP [24] obtains multiscale fracture feature information through hole convolution with different sampling rates and has achieved good results in classification and segmentation tasks. Because the fracture shape is long and narrow, adaptive average pooling alone is not enough to obtain the global context information of the fracture, while strip pooling [25] can capture long-distance dependence. Inspired by this, the Soft-Pool [26] of strip shape is introduced on the basis of
ASPP, which effectively increases the global features. This research combines the advantages of ASPP and Soft-Pool to design a multiscale feature fusion module MFF, which effectively combines global information and multiscale context information, reducing the discontinuity problem in fracture segmentation, as shown in Figure 3.

[Figure 3: Structure of the MFF module. Recoverable labels: input W×H×C; three 3×3 Conv branches with rate=6, rate=12, and rate=18, each W×H×(C/4); two SoftPool branches followed by upsampling (up), one labelled W/4×H×(C/4), each restored to W×H×(C/4); concatenation W×H×(5×C/4); output W×H×C.]

The MFF first undergoes a 1 × 1 convolution to reduce the dimensionality and then obtains a multiscale parallel structure through a variety of sampling rates and pooling methods. The first three branches of the parallel structure fuse hole convolutions with different sampling rates to obtain multiscale information. The latter two branches obtain global information that is more suitable for the shape of the crack through soft pooling of the strip shape. Strip soft pooling uses a smooth approximation of the maximum within the activation region R. Each activation $a_i$ with index $i$ is assigned a weight $W_i$, and the weight is equal to the natural exponent of that activation value divided by the sum of the natural exponents of all activation values in R, as shown in the following equation:

$$W_i = \frac{e^{a_i}}{\sum_{j \in R} e^{a_j}}. \tag{3}$$

This weight is multiplied by the corresponding activation value to make a nonlinear transformation together, and the higher activation has a more obvious impact on the output. Since pooling is performed in a high-dimensional feature space, highlighting the maximum activation effect is more reasonable than directly selecting the maximum value. The output value of the Soft-Pool operation is obtained by summing all of the weighted activations in the kernel neighbourhood R, as shown in the following equation:

$$\tilde{a} = \sum_{i \in R} W_i \ast a_i. \tag{4}$$

2.1.3. Decoder Module. The decoder restores the image resolution through an upsampling method. A common upsampling method is bilinear interpolation, which restores the resolution through the neighbouring pixel values, but the restored boundary is blurry. The introduction of subpixel convolution in superresolution [27] can make the details of the image clearer. Therefore, the decoding block shown in Figure 4 is designed by combining bilinear interpolation and subpixel convolution. The upper branch undergoes general operations, such as a 1 × 1 convolution, batch normalisation, and ReLU, and then applies bilinear interpolation for upsampling; the lower branch is subpixel convolution. After the fusion of the two branch features, the detailed information of the detected cracks is more complete, and the amount of calculation added is small. The structure of the decoder module is shown in Figure 4.

[Figure 4: Structure of the decoder module. Recoverable labels: upper branch Conv 1×1, BatchNorm, ReLU, Interpolate; lower branch Sub-pixel Convolution; the two branches are fused (+).]

2.2. Crack Classification Algorithm Based on Hesitant Fuzzy Sets. The type of crack is an important index to evaluate the quality of the pavement. The evaluation of different types of cracks directly affects the decision-making of different maintenance strategies. This section extracts the five features of the number of cracks in the image, the number of
inflection points, the average centroid distance, the angle between centroids, and the area of the cracks while using the decision-making advantages of hesitant fuzzy set theory to realize crack classification.

On the basis of Zadeh's fuzzy set theory [28], Torra proposed the hesitant fuzzy set [29], which allows the degree to which an element belongs to a set to be given in the form of a set of multiple possible values, in order to effectively characterize the uncertainty in decision-making. In the formula below, X is a nonempty set, and E is called the hesitant fuzzy set:

$$E = \left\{ \langle x, h_E(x) \rangle \mid x \in X \right\}, \tag{5}$$

where $h_E(x)$ represents the set of membership degrees of the element $x$ in the set $X$ to the set $E$, that is, a set of possible membership values in [0,1]; $h = h_E(x)$ is called the hesitant fuzzy element. If $P$ and $Q$ are two hesitant fuzzy sets on $X = \{x_1, x_2, x_3, \ldots, x_n\}$, then the generalized hesitant fuzzy distance measure between them is shown in

$$d_{ghn}(P, Q) = \left[ \frac{1}{n} \sum_{i=1}^{n} \left( \frac{1}{l_{x_i}} \sum_{j=1}^{l_{x_i}} \left| h_P^{\sigma(j)}(x_i) - h_Q^{\sigma(j)}(x_i) \right|^{\lambda} \right) \right]^{1/\lambda}, \qquad s(P, Q) = 1 - d_{ghn}(P, Q), \tag{6}$$

where $n$ is the number of elements in the set $X$, $h_P^{\sigma(j)}(x_i)$ and $h_Q^{\sigma(j)}(x_i)$ are the $j$-th largest values in $h_P(x_i)$ and $h_Q(x_i)$, respectively, $s(P, Q)$ is the corresponding similarity, and $\lambda$ is the control parameter. The generalized hesitant fuzzy distance measure gives the distance calculation formula of two hesitant fuzzy sets under multiple attributes and multiple indicators. The smaller the distance, the greater the similarity between them. This helps to realize the measurement of multiattribute similarity.

After detecting the pixel area containing the crack, the connected area algorithm is used to divide the crack into independent crack branch targets. After analysis, the number, area, centre position coordinates, and approximate length and width of the crack branches are obtained.
Definition 2. NP is the hesitant fuzzy evaluation attribute of the crack image.

Definition 3. The cracks are divided into four types: transversal, longitudinal, block, and alligator, which are denoted T, V, M, and C, respectively.

Figure 5: Fracture image analysis.

to 0.5, it is M type. Figure 5 shows the average distance between the judgement and the calculated centroids. When the average distance between the centroids of Cqj and Cq is closer, the similarity is higher.
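The branch attributes used above (number of branches, area, centroid, and average centroid distance) can be obtained from a binary mask with a connected-component pass. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, 8-connectivity is assumed, and the average centroid distance is taken as the mean pairwise distance between branch centroids, since the exact definition used in the paper is not given in this section.

```python
from collections import deque
import math


def label_components(mask):
    """Return the pixel lists of the 8-connected foreground components of a 0/1 mask."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    components = []
    for r in range(height):
        for c in range(width):
            if not mask[r][c] or seen[r][c]:
                continue
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def branch_features(mask):
    """Count, areas, centroids, and mean pairwise centroid distance of crack branches."""
    comps = label_components(mask)
    areas = [len(p) for p in comps]
    centroids = [(sum(y for y, _ in p) / len(p), sum(x for _, x in p) / len(p))
                 for p in comps]
    # Assumption: "average centroid distance" = mean over all centroid pairs
    dists = [math.hypot(a[0] - b[0], a[1] - b[1])
             for i, a in enumerate(centroids) for b in centroids[i + 1:]]
    avg = sum(dists) / len(dists) if dists else 0.0
    return {"count": len(comps), "areas": areas,
            "centroids": centroids, "avg_centroid_distance": avg}
```

Attributes computed this way would then be mapped to membership values before the hesitant fuzzy similarity is evaluated; that mapping is not described in this section and is not attempted here.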
ALGORITHM 1.
of different modules on the segmentation results. The following modules are added to verify the effectiveness of the modules based on the U-shaped network with the ResNeXt encoder. The training parameters of each network containing different modules are consistent with the proposed network.

Table 1: The effect of weight value changes on the results.
c    α      Accuracy   Precision   Dice    IoU
-    -      97.12      69.81       65.20   54.12
1    0.70   96.21      70.43       63.75   52.69
1    0.75   96.90      70.49       64.12   53.24
1    0.80   97.12      72.06       66.18   53.73
1    0.85   97.33      73.89       66.93   54.56
1    0.90   97.29      73.60       66.87   54.51
2    0.10   98.35      71.85       65.50   53.22
2    0.15   98.48      74.30       66.24   53.90
2    0.20   98.57      74.27       66.91   55.23
2    0.25   98.62      74.26       67.43   55.56
2    0.30   98.61      74.09       67.38   55.51
Bold values are the best performing values.

EDA: the high-efficiency dual attention module is added. As seen in Table 2, Precision, Dice, and IoU have increased by 0.47%, 0.12%, and 0.51%, respectively. Therefore, we can conclude that the attention module is effective for pavement crack detection tasks.

MFF: the multiscale feature fusion module is added. As seen in Table 2, Precision, Dice, and IoU have increased by 2.25%, 1.11%, and 1.23%, respectively, which proves the effectiveness of adding the multiscale feature module.

Focal loss: after replacing the cross-entropy loss function with the focal loss function, the experimental results show that Precision, Dice, and IoU increase by 1.66%, 0.87%, and 0.60%, respectively.

The results of the ablation experiment are shown in Table 2. The focal loss function and the multiscale feature fusion module improve the network performance most substantially. The high-efficiency dual attention module contributes to the improvement of Precision and IoU. Because pavement cracks are small and have a complex topological structure, the focal loss function improves the segmentation quality of the small cracks. The multiscale feature fusion module obtains multiscale context information to handle the complex topological structure presented by the crack image. The high-efficiency dual attention mechanism suppresses noise information, such as shadows and scratches, through the importance of the channel and space features, effectively enhancing the feature representation ability of the network.

(2) Compared with Existing Algorithms. Qualitative analysis: To verify the performance of MACSNet in road crack detection, the algorithm in this paper is compared with other algorithms on the public dataset Crack500, including U-Net [35], CE-Net [21], DeepLabv3 [24], and DeepLabv3+ [36]. Their data enhancement and training methods use the methods described in 3.1 and 3.2. Figures 8(a)–8(g) show some of the output results.

The segmentation results of each algorithm can be seen directly in Figure 8. When the crack topology in the image is simple, the above five algorithms can segment the cracks well, as shown in the first row of the figure. When there are shadows, scratches, and other noises in the background of the image, as in the second and third rows, U-Net, DeepLabv3, DeepLabv3+, and CE-Net all show crack segmentation discontinuities to different degrees, whereas MACSNet segments continuous cracks. The reason may be that it considers the global context information. When the topological structure of the crack in the image is complex, such as in rows 4–6, the missed detections of U-Net, DeepLabv3, and CE-Net are more serious. Although DeepLabv3+ missed only a few cracks, as seen in the fifth row, it does not preserve the integrity of the crack contours. MACSNet also adds a local importance attention mechanism, which can accurately segment small cracks and extract feature information more abundantly, which qualitatively demonstrates the effectiveness of MACSNet.

Quantitative analysis: According to the evaluation indicators in Section 3.1.4, the test results are obtained on the public dataset Crack500, as shown in Table 3. Accuracy and Precision alone are not enough to judge the performance of each algorithm for segmenting cracks. Therefore, the comprehensive evaluation indices Dice and IoU are also used to evaluate the performance of each algorithm. It can be seen from Table 3 that MACSNet's Dice is 2.10%, 3.58%, 2.17%, and 1.25% higher than that of U-Net, CE-Net, DeepLabv3, and DeepLabv3+, respectively, and its IoU is 2.34%, 3.97%, 2.22%, and 1.13% higher than that of U-Net, CE-Net, DeepLabv3, and DeepLabv3+, respectively. Therefore, it is verified that the effect of MACSNet is substantial, which is consistent with the results of the qualitative analysis.

The time complexity of the MACSNet algorithm proposed in this paper is shown in Table 3. Frames Per Second (FPS) represents how many frames of images the algorithm can process in one second, and we use FPS to represent the time complexity. Although our method is slower than the general segmentation methods U-Net and CE-Net, it is faster than the advanced segmentation method DeepLabv3+. The reason is that we have made a compromise in time complexity in order to improve the segmentation accuracy of the algorithm, but our method can still reach the real-time standard and has obvious advantages in time complexity.

(3) Compared with Other Advanced Algorithms. To further illustrate the effectiveness of MACSNet, MACSNet is compared with other advanced methods on the same dataset, and the results are shown in Table 4. The Accuracy, Precision, and IoU values of the MACSNet algorithm are better than those of the other road crack segmentation algorithms.

3.2. Crack Classification Experiment Analysis

3.2.1. Dataset Introduction. In order to verify the effectiveness of the crack classification method based on the hesitant fuzzy set, the dataset in this section selects 948 images from the dataset in Section 3.1. It is divided into four
Table 2: Test results of each algorithm’s segmentation index on the Crack500 dataset.
EDA MFF Focal loss Precision % Dice % IoU %
✓ 70.35 65.45 53.73
✓ ✓ 72.60 66.56 54.96
✓ ✓ 73.01 66.90 54.89
✓ 71.43 65.36 53.82
✓ ✓ 73.86 67.21 55.01
✓ ✓ ✓ 74.26 67.43 55.56
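The "Focal loss" column above refers to replacing cross-entropy with a focal loss. As a hedged illustration, the following sketch implements the standard binary focal loss, FL = -alpha_t (1 - p_t)^gamma log(p_t). The loss form actually used in the paper is not reproduced in this section, and treating the Table 1 parameters c and α as gamma and alpha is an assumption; the function name is hypothetical.

```python
import math


def binary_focal_loss(probs, targets, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean binary focal loss over flat sequences of probabilities and 0/1 labels.

    probs:   predicted crack probabilities in (0, 1)
    targets: ground-truth labels, 1 for crack pixels and 0 for background
    """
    losses = []
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)          # clamp to avoid log(0)
        p_t = p if t == 1 else 1.0 - p           # probability of the true class
        alpha_t = alpha if t == 1 else 1.0 - alpha
        # (1 - p_t)^gamma down-weights pixels that are already well classified
        losses.append(-alpha_t * (1.0 - p_t) ** gamma * math.log(p_t))
    return sum(losses) / len(losses)
```

With gamma = 0 the modulating factor disappears and the expression becomes an alpha-weighted cross-entropy, which makes the effect of gamma on easy background pixels straightforward to compare.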
Figure 8: Segmentation results of each algorithm on the Crack500 dataset. (a) Image. (b) GT. (c) MACSNet. (d) U-Net. (e) DeepLabv3.
(f ) DeepLabv3+. (g) CE-Net.
Table 3: Test results of each algorithm’s segmentation index on the Crack500 dataset.
Algorithm Accuracy% Precision% Dice% IoU% FPS
U-Net [35] 96.12 69.88 65.33 53.22 47.49
CE-Net [21] 96.89 70.47 63.85 51.59 43.90
DeepLabv3 [24] 96.91 70.41 65.26 53.34 23.12
DeepLabv3+ [36] 96.94 71.06 66.18 54.43 23.80
MACSNet (our) 98.62 74.26 67.43 55.56 30.91
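For reference, the indicators reported in the table above can be computed from a predicted mask and its ground truth as sketched below. The definitions of Section 3.1.4 are not included in this section, so the standard pixel-wise definitions are assumed, and the function name is hypothetical.

```python
def segmentation_metrics(pred, gt):
    """Pixel-wise Accuracy, Precision, Dice, and IoU for two 0/1 masks of equal shape."""
    tp = fp = fn = tn = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                tp += 1      # crack pixel predicted as crack
            elif p and not g:
                fp += 1      # background predicted as crack
            elif g:
                fn += 1      # crack pixel that was missed
            else:
                tn += 1      # background predicted as background
    total = tp + fp + fn + tn

    def ratio(num, den):
        # Return 0.0 for empty denominators instead of raising
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, total),
        "precision": ratio(tp, tp + fp),
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "iou": ratio(tp, tp + fp + fn),
    }
```

Because crack pixels are a small fraction of each image, Accuracy is dominated by the background count, which is why Dice and IoU are the more informative columns.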
categories: 404 transversal cracks; 276 longitudinal cracks; 57 block cracks; 40 alligator cracks.

Table 4: Comparison with other advanced methods on the Crack500 dataset.
Algorithm                  Accuracy %   Precision %   IoU %
Chen et al. [9]            -            -             51.40
Augustauskas et al. [10]   98.32        64.47         53.34
Cao et al. [8]             N.A          68.05         54.92
MACSNet (our)              98.62        74.26         55.56

3.2.2. Evaluation Index. In order to analyze the effectiveness of the multiattribute fuzzy classification method on crack images, the Recall (R) and Precision (P) are selected to evaluate the image classification results.

The time complexity of the hesitant fuzzy set classification algorithm proposed in this paper is shown in Table 6. It can be seen from the table that our method is faster than method 1 and slightly slower than method 2. The reason is that we introduced the hesitant fuzzy attribute on the basis of the connected domain labeling algorithm, which increased the amount of computation and made our method slightly slower than method 2. However, our method achieves a better balance between accuracy and time complexity and has obvious advantages compared to methods 1 and 2.

4. Conclusion

Most current crack detection methods only segment the crack images and do not involve classification, but the type of crack is very important to the evaluation of road health status, so we propose a crack detection and classification method based on multiscale attention and HFS. This method distributes the weights of the two dimensions of channel and space through cross-channel attention and local importance pooling, so that the network automatically pays more attention to the characteristic information of the crack area and further improves the detection accuracy of the crack. A multiscale feature fusion module is designed to fuse multiscale context information, and the rectangular pooling method is used instead of average pooling to retain important fracture information. This detection method is more suitable for the detection of crack images with unbalanced aspect ratios than existing methods. On this basis, a road crack image classification method based on HFS is designed. On the basis of the connected domain algorithm, using the advantages of HFS in multiattribute decision-making, the membership degree of the cracks is calculated, and the similarity of the cracks is obtained for classification judgement. This classification method uses fuzzy multiattribute features to further improve classification accuracy. Through comparative experiments, the effectiveness of the above methods is verified. Experimental results show that this method has good crack detection and classification effects, and it has a certain auxiliary effect on the evaluation of road health.

The follow-up work will mainly be carried out in the following two aspects. First, the network training time cost will be considered, and a lightweight semantic segmentation network will be introduced into the multiscale attention segmentation network to produce binary images faster and more accurately. Second, given the complex topology of the cracks, it is necessary to improve the attribute index and optimise the classification method of crack images, especially for classifying block cracks and alligator cracks.

Data Availability

The data that support the findings of this study are available upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work is supported by the Natural Science Foundation of Hebei Province, China (Grant Nos. F2019201329 and F2019201451), and the Science and Technology Project of Hebei Education Department, China (Grant Nos. ZD2019131 and QN2018214).

References

[1] Z. Liu, Y. Cao, Y. Wang, and W. Wang, “Computer vision-based concrete crack detection using U-net fully convolutional networks,” Automation in Construction, vol. 104, pp. 129–139, 2019.
[2] W. Song, G. Jia, H. Zhu, D. Jia, and L. Gao, “Automated pavement crack damage detection using deep multi-scale convolutional features,” Journal of Advanced Transportation, vol. 2020, Article ID 6412562, 11 pages, 2020.
[3] A. Akagic, E. Buza, S. Omanovic, and A. Karabegovic, “Pavement crack detection using Otsu thresholding for image segmentation,” in Proceedings of the 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), pp. 1092–1097, IEEE, Opatija, Croatia, May 2018.
[4] R. Medina, J. Llamas, E. Zalama, and J. Gómez-García-Bermejo, “Enhanced automatic detection of road surface cracks by combining 2D/3D image processing techniques,” in Proceedings of the 2014 IEEE International Conference on Image Processing (ICIP), pp. 778–782, IEEE, Paris, France, October 2014.
[5] R. Amhaz, S. Chambon, J. Idier, and V. Baltazart, “A new minimal path selection algorithm for automatic crack detection on pavement images,” in Proceedings of the 2014 IEEE International Conference on Image Processing (ICIP), pp. 788–792, IEEE, Paris, France, October 2014.
[6] L. Pauly, D. Hogg, R. Fuentes, H. David, and R. Fuentes, “Deeper networks for pavement crack detection,” in Proceedings of the 34th ISARC, pp. 479–485, IAARC, Taipei, Taiwan, June 2017.
[7] S. L. H. Lau, E. K. P. Chong, X. Yang, and X. Wang, “Automated pavement crack segmentation using u-net-based convolutional neural network,” IEEE Access, vol. 8, Article ID 114899, 2020.
[8] Y. H. Cao, G. T. Yang, and X. Y. Yang, “Deep learning pavement crack detection based on attention mechanism,” Journal of Computer-Aided Design & Computer Graphics, vol. 32, no. 8, pp. 1324–1333, 2020.
[9] J. Chen, G. Liu, and X. Chen, “Road crack image segmentation using global context U-net,” in Proceedings of the 2019 3rd International Conference on Computer Science and Artificial
Vision and Pattern Recognition, pp. 7132–7141, Salt Lake City, UT, USA, June 2018.
[20] X. Li, W. Wang, X. Hu, and J. Yang, “Selective kernel networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 510–519, California, USA, June 2019.
[21] Z. Gu, J. Cheng, H. Fu et al., “Ce-net: context encoder network for 2d medical image segmentation,” IEEE Transactions on Medical Imaging, vol. 38, no. 10, pp. 2281–2292, 2019.
[22] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, “Cbam: convolutional block attention module,” in Proceedings of the European Conference on Computer Vision (ECCV), pp. 3–19, Munich, Germany, July 2018.
[23] Z. Gao, L. Wang, and G. Wu, “Lip: local importance-based pooling,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 3355–3364, Seoul, Korea, November 2019.
[24] L. C. Chen, G. Papandreou, F. Schroff, and H. Adam, “Rethinking atrous convolution for semantic image segmentation,” 2017, https://arxiv.org/abs/1706.05587.
[25] Q. Hou, L. Zhang, M. M. Cheng, and J. Feng, “Strip pooling: rethinking spatial pooling for scene parsing,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4003–4012, Seattle, Washington, USA, June 2020.
Intelligence, pp. 181–185, Association for Computing Ma- [26] A. Stergiou, R. Poppe, and G. Kalliatakis, “Refining Activation
chinery, New York, USA, December2019. Downsampling with Softpool,” 2021, https://arxiv.org/abs/
[10] R. Augustauskas and A. Lipnickas, “Improved pixel-level
2101.00440.
pavement-defect segmentation using a deep autoencoder,” [27] W. Shi, J. Caballero, F. Huszár et al., “Real-time single image
Sensors, vol. 20, no. 9, 2020.
and video super-resolution using an efficient sub-pixel con-
[11] Z. Fan, C. Li, Y. Chen et al., “Automatic crack detection on
volutional neural network,” in Proceedings of the IEEE
road pavements using encoder-decoder architecture,” Mate-
Conference on Computer Vision and Pattern Recognition,
rials, vol. 13, no. 13, 2020.
pp. 1874–1883, New York, USA, September2016.
[12] R. Ali, J. H. Chuah, M. S. A. Talip, N. Mokhtar, and
[28] L. A. Zadeh, “Fuzzy sets,” Information and Control, vol. 8,
M. A. Shoaib, “Automatic pixel-level crack segmentation in
no. 1, pp. 338–353, 1965.
images using fully convolutional neural network based on
[29] V. Torra, “Hesitant fuzzy sets,” International Journal of In-
residual blocks and pixel local weights,” Engineering Appli-
telligent Systems, vol. 25, no. 6, pp. 529–539, 2010.
cations of Artificial Intelligence, vol. 104, Article ID 104391,
[30] W. Song, G. Jia, D. Jia, and H. Zhu, “Automatic pavement
2021.
crack detection and classification using multiscale feature
[13] Z. Fan, C. Li, Y. Chen et al., “Ensemble of deep convolutional
neural networks for automatic pavement crack detection and attention network,” IEEE Access, vol. 7, Article ID 171012,
measurement,” Coatings, vol. 10, no. 2, p. 152, 2020. 2019.
[14] W. Wang and C. Su, “Semi-supervised semantic segmentation [31] L. Zhang, F. Yang, Y. D. Zhang, and Y. J. Zhu, “Road crack
network for surface crack detection,” Automation in Con- detection using deep convolutional neural network,” in
struction, vol. 128, Article ID 103786, 2021. Proceedings of the 2016 IEEE International Conference on
[15] X. Wang and Z. Hu, “Grid-based Pavement Crack Analysis Image Processing (ICIP), pp. 3708–3712, IEEE, Phoenix, AZ,
Using Deep Learning,” in Proceedings of the 2017 4th Inter- USA, September 2016.
national Conference on Transportation Information and Safety [32] Z. Zhu, H. Wei, G. Hu, Y. Li, G. Qi, and N. Mazur, “A novel
(ICTIS), pp. 917–924, IEEE, Alberta, Canada, August 2017. fast single image dehazing algorithm based on artificial
[16] A. Cubero-Fernandez, F. J. Rodriguez-Lozano, R. Villatoro, multiexposure image fusion,” IEEE Transactions on Instru-
J. Olivares, and J. M. Palomares, “Efficient pavement crack mentation and Measurement, vol. 70, pp. 1–23, 2020.
detection and classification,” EURASIP Journal on Image and [33] M. Zheng, G. Qi, Z. Zhu, Y. Li, H. Wei, and Y. Liu, “Image
Video Processing, vol. 2017, no. 1, pp. 1–11, 2017. dehazing by an artificial image fusion method based on
[17] H. Huang, L. Lin, R. Tong et al., “Unet 3+: a full-scale con- adaptive structure decomposition,” IEEE Sensors Journal,
nected unet for medical image segmentation,” in Proceedings vol. 20, no. 14, pp. 8062–8072, 2020.
of the IEEE International Conference on Acoustics, Speech and [34] T. Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal
Signal Processing (ICASSP), May 2020. loss for dense object detection,” in Proceedings of the IEEE
[18] Q. Wang, B. Wu, P. Zhu, P. Li, W. Zuo, and Q. Hu, “Eca-net: international conference on computer vision, pp. 2980–2988,
efficient channel attention for deep convolutional neural Venice, Italy, October 2017.
networks,” in Proceedings of the IEEE/CVF Conference on [35] O. Ronneberger, P. Fischer, and T. Brox, “U-net: convolu-
Computer Vision and Pattern Recognition (CVPR), IEEE, tional networks for biomedical image segmentation,” in
Seattle, WA, USA, June 2020. Proceedings of the International Conference on Medical Image
[19] J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation net- Computing and Computer-Assisted Intervention, pp. 234–241,
works,” in Proceedings of the IEEE Conference on Computer Springer, Munich, Germany, October 2015.
14 Computational Intelligence and Neuroscience