
Journal of Traffic and Transportation Engineering (English Edition) 2022; 9(6): 945-968

Available online at www.sciencedirect.com

ScienceDirect

journal homepage: www.keaipublishing.com/jtte

Review Article

A review of deep learning methods for pixel-level crack detection

Hongxia Li a,b, Weixing Wang a,*, Mengfei Wang a, Limin Li c,**, Vivian Vimlund d

a School of Information Engineering, Chang'an University, Xi'an 710064, China
b School of Computer, Baoji University of Arts and Sciences, Baoji 721013, China
c School of Electrical and Electronic Engineering, Wenzhou University, Wenzhou 325035, China
d Department of Computer Science and Technology, Linköping University, Linköping, Sweden

Highlights

• A survey of deep learning-based pixel-level crack image segmentation (CIS) methods.
• This survey groups the CIS methods into 10 topics based on the backbone network architecture.
• Databases, evaluation metrics and loss functions are systematically summarized.
• Six common problems are discussed, and eight possible research directions are suggested.

Article info

Article history:
Received 5 July 2022
Received in revised form 15 November 2022
Accepted 28 November 2022
Available online 5 December 2022

Keywords:
Crack image segmentation
Crack detection
Convolutional neural networks
Deep learning
Systematic literature review

Abstract

Cracks are a major sign of aging transportation infrastructure. The detection and repair of cracks is key to ensuring the overall safety of transportation infrastructure. In recent years, owing to the remarkable success of deep learning (DL) in the field of crack detection, much research has been devoted to developing DL-based pixel-level crack image segmentation (CIS) models to improve crack detection accuracy, but as far as we know there is no review of DL-based CIS methods yet. To address this gap, we present a comprehensive thematic survey of DL-based CIS techniques. Our review offers several contributions to the CIS area. First, more than 40 journal and top-conference papers, most published in the last three years, are identified and collected based on the systematic literature review method. Second, according to the backbone network architecture of the models proposed in them, they are grouped into 10 topics for review: FCN, U-Net, encoder-decoder model, multi-scale, attention mechanism, transformer, two-stage detection, multi-modal fusion, unsupervised learning and weakly supervised learning. Our survey focuses on discussing the strengths and limitations of the models in each topic so as to reveal the latest research progress in the CIS field. Third, publicly accessible data sets, evaluation metrics, and loss functions that can be used for pixel-level crack detection are systematically introduced and summarized to help researchers select suitable components according to their own research tasks. Finally, we discuss six common problems in the field of DL-based CIS and the existing solutions to them, and then suggest eight possible future research directions in this field.

* Corresponding author. Tel.: +86 29 82334562
** Corresponding author.
E-mail addresses: [email protected] (H. Li), [email protected] (W. Wang), [email protected] (L. Li).
Peer review under responsibility of Periodical Offices of Chang'an University.
https://doi.org/10.1016/j.jtte.2022.11.003
2095-7564/© 2022 Periodical Offices of Chang'an University. Publishing services by Elsevier B.V. on behalf of KeAi Communications Co. Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction

Cracks are the most typical visual manifestation of aging transportation infrastructure. Automatic crack detection (ACD) technology has attracted wide attention from researchers in this field and from traffic infrastructure management and maintenance personnel. In the past few years, considerable research effort has been devoted to crack detection methods based on deep learning (DL), owing to the successful application of DL in computer vision (CV) and image processing. A review is needed to sort out these works and examine the progress of DL research in the crack detection area.

To find an entry point for our study, we surveyed and compared 10 reviews related to crack detection published from 2017 to 2021 (Azimi et al., 2020; Cao et al., 2020a, b; Elghaish et al., 2021; Gopalakrishnan, 2018; Hsieh and Tsai, 2020; Mohan and Poobal, 2018; Munawar et al., 2021; Zakeri et al., 2017; Zawad et al., 2021). We summarized the publication year, main theme and type of crack detection method of the 10 reviews, and compared whether the reviews made a detailed analysis of the leading research methods, data sets, future research directions and challenges; details are given in Table 1.

Although these reviews have combed through the papers related to crack detection from their respective research perspectives, namely traditional image processing (IP), DL (usually covering three CV tasks: image classification, object detection, and image segmentation), or both, no review has yet specifically surveyed DL-based crack image segmentation (CIS) methods. In contrast to image segmentation based on traditional IP, DL-based image-level classification and DL-based object detection, DL-based image segmentation techniques can produce dense pixel-wise predictions with a semantic label for each input image (semantic segmentation) or detect and partition individual objects of interest within the image (instance segmentation), often reaching the highest detection accuracy on benchmark data sets. In recent years, many new image segmentation algorithms have emerged in the field of DL, and they are now the major and more widely used techniques adopted by researchers to improve crack detection performance; however, as far as we know, a comprehensive review of DL-based CIS methods is still lacking.

By offering a thorough survey of the DL-based CIS literature released in recent years, we hope to fill this gap. It is commonly known that image segmentation is the most important and challenging approach in the field of CV and image processing, while also being the most widely used and successful technique for automatic crack detection. In the research literature, a wide range of algorithms have been proposed, ranging from image segmentation methods based on older traditional IP techniques, such as edge detection, thresholding, region growing, and histogram analysis, to more advanced algorithms combined with particular theories and methods, such as cluster analysis, fuzzy sets, genetic algorithms, and wavelet transforms.

However, in just the last few years, DL has opened a new era of image segmentation models owing to its outstanding improvements in segmentation performance. DL is a machine learning method. It uses deeper and wider neural networks and can theoretically approximate arbitrary functions to solve various complex problems. The most widely used DL algorithms in the computer vision and image fields are convolutional neural networks (CNNs).

Fig. 1 shows a standard CNN structure for crack detection carrying out three different CV tasks (i.e., image classification, object detection and image segmentation), which can capture the nonlinear and dynamic relationship between the input image and the predicted output. In the feature extraction part, low-level features such as edges and patterns are extracted by the first few convolutional layers, shapes and colors are extracted by the middle layers, and more abstract deep-level features such as complete objects are extracted by the higher convolutional layers. When the CNN is employed for an image-level classification task, its input is an image like Fig. 1(a), and the final output of the feature extractor enters a fully connected layer to predict whether the input image belongs to the crack class or not. When the CNN is applied to an object detection task, its input is as shown in Fig. 1(b), the last fully connected layer is used to compute the predicted bounding box coordinates and the class of the objects in each bounding box, and its output is illustrated in Fig. 1(d). For a CIS task, the final output of the feature extractor enters a final convolutional layer followed by a softmax layer for multi-class classification or a sigmoid layer for binary classification, which achieves image segmentation by classifying every pixel. For instance, in semantic segmentation of a crack image the pixels are split into two categories, crack pixels and non-crack pixels, as shown in Fig. 1(e); instance segmentation of a crack image is illustrated in Fig. 1(f), where each identified crack in the image is indicated with a different color. Compared with DL-based image classification (e.g., Fig. 1(c)) and DL-based object detection (e.g., Fig. 1(d)) models, DL-based CIS models can extract the complete pixel-level crack shape (e.g., Fig. 1(e) and (f)) and so provide more accurate information for the subsequent classification, quantitative analysis and evaluation of cracks. Therefore, DL-based CIS methods have become the mainstream approach to crack detection.
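The pixel-classification step described above (a softmax layer for multi-class output, a sigmoid layer for binary output) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from any of the reviewed models: the function names, the toy score values and the 0.5 threshold are introduced here, and in a real network the per-pixel scores would come from the final convolutional layer.

```python
import math


def sigmoid(x):
    """Logistic function mapping a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def segment_binary(scores, threshold=0.5):
    """Binary case: label each pixel 1 (crack) or 0 (non-crack).

    `scores` is a 2D list of raw per-pixel scores (assumed network output).
    """
    return [[1 if sigmoid(v) >= threshold else 0 for v in row] for row in scores]


def softmax(class_scores):
    """Numerically stable softmax over the class scores of one pixel."""
    m = max(class_scores)
    exps = [math.exp(s - m) for s in class_scores]
    total = sum(exps)
    return [e / total for e in exps]


def segment_multiclass(scores):
    """Multi-class case: pick the most probable class index for each pixel.

    `scores[r][c]` is a list with one raw score per class.
    """
    result = []
    for row in scores:
        labels = []
        for class_scores in row:
            probs = softmax(class_scores)
            labels.append(probs.index(max(probs)))
        result.append(labels)
    return result


# Toy 3x3 score map (hypothetical values): positive scores favour "crack".
toy_scores = [
    [-3.0, 2.0, -1.5],
    [-2.0, 4.0, -0.5],
    [0.3, 1.0, -4.0],
]
mask = segment_binary(toy_scores)
for row in mask:
    print(row)
```

The binary mask produced this way corresponds to the kind of output shown in Fig. 1(e); lowering or raising the threshold trades missed thin cracks against false positives.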
Table 1 - Survey of published review articles.

Survey | Year | Main theme | Type of learning methods covered | Compared research methods | Analyzed data sets in detail | Detailed future research
Zakeri et al. (2017) | 2017 | Crack detection, classification and quantification in asphalt pavement | Traditional IP | Yes | No | Yes
Mohan and Poobal (2018) | 2018 | Engineering structures crack detection | Traditional IP | Yes | Yes | No
Gopalakrishnan (2018) | 2018 | Crack detection and classification | DL | Yes | No | Yes
Azimi et al. (2020) | 2020 | Structural damages detection | DL | Yes | No | Yes
Cao et al. (2020a) | 2020 | Road damages detection | DL | Yes | No | Yes
Hsieh and Tsai (2020) | 2020 | Crack detection | ML and DL | Yes | Yes | Yes
Cao et al. (2020b) | 2020 | Pavement defect detection | Traditional IP and DL | Yes | Yes | Yes
Munawar et al. (2021) | 2021 | Crack detection | Traditional IP and ML | Yes | Yes | No
Elghaish et al. (2021) | 2021 | Building distress detection | DL | Yes | No | Yes
Zawad et al. (2021) | 2021 | Civil engineering structure crack detection | Traditional and DL | Yes | No | Yes

Main contribution of each survey:
Zakeri et al. (2017): Based on the asphalt pavement crack detection method, the research status of IP technology is explained from three stages of distress detection, classification and quantification.
Mohan and Poobal (2018): Fifty research papers related to crack detection are grouped according to the type of images used to detect them, and various papers are analyzed based on objective, data set, error and accuracy level.
Gopalakrishnan (2018): This paper is the first to review 12 papers in DL-based automatic pavement damage detection from 2016 to 2018. The network architectures, hyperparameters and crack detection performances used in them are compared.
Azimi et al. (2020): The latest applications of DL technology in vibration-based and vision-based structural health monitoring (SHM) are reviewed.
Cao et al. (2020a): This study extensively evaluates eight DL-based road damage detection models.
Hsieh and Tsai (2020): 68 ML-based crack detection methods and the performance evaluation of 8 ML-based crack segmentation models are reviewed.
Cao et al. (2020b): The application of DL neural networks in crack detection is reviewed and compared from three methods: based on classification, based on object detection and based on segmentation.
Munawar et al. (2021): Thirty papers published in top journals and conferences are reviewed for their crack detection techniques, features, performance, data sets and limitations.
Elghaish et al. (2021): An overview of methods for detecting and classifying cracks in pavements and buildings using DL, highlighting the focus of each study, the methods employed, and limitations.
Zawad et al. (2021): The experimental structure, steps followed, results and limitations of 30 IP-based crack detection studies are reviewed.


Fig. 1 - Structure of convolutional neural network. (a) Input image for the image-level crack classification task. (b) Input image for crack detection or pixel-level crack classification. (c) Output of image-level crack classification, which determines whether the input image contains cracks and assigns it to the crack or non-crack class. (d) Output of the crack detection task: the image with rectangular boxes enclosing the crack areas. (e) Output of the crack semantic segmentation task: pixel-level annotation of cracks in the image. (f) Output of the crack instance segmentation task: detected cracks marked in different colors to distinguish individual cracks.
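The relationship between the outputs in Fig. 1(c) to (f) can be illustrated with a small Python sketch. Starting from a binary crack mask like the one in Fig. 1(e), it derives an image-level label, one bounding box per crack and a per-crack label map. This is a hedged illustration, not a method from the reviewed papers: connected-component labelling is assumed here as a simple stand-in for a learned instance segmentation model, and the function names, the 8-connectivity choice and the toy mask are hypothetical.

```python
from collections import deque


def label_instances(mask):
    """Assign a distinct positive label to each 8-connected group of crack pixels.

    Returns (labels, count), where labels has the same shape as mask and
    background pixels keep the label 0.
    """
    height, width = len(mask), len(mask[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for r in range(height):
        for c in range(width):
            if mask[r][c] != 1 or labels[r][c] != 0:
                continue
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            # Breadth-first flood fill over the 8 neighbours of each pixel.
            while queue:
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny][nx] == 1 and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def bounding_boxes(labels):
    """Return {label: (row_min, col_min, row_max, col_max)} for each instance."""
    boxes = {}
    for r, row in enumerate(labels):
        for c, k in enumerate(row):
            if k:
                r0, c0, r1, c1 = boxes.get(k, (r, c, r, c))
                boxes[k] = (min(r0, r), min(c0, c), max(r1, r), max(c1, c))
    return boxes


# Toy semantic mask (hypothetical): 1 = crack pixel, 0 = background.
mask = [
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
labels, n_cracks = label_instances(mask)
has_crack = n_cracks > 0        # image-level classification, cf. Fig. 1(c)
boxes = bounding_boxes(labels)  # detection-style boxes, cf. Fig. 1(d)
print(has_crack, n_cracks, boxes)
```

The sketch also shows why a pixel-level mask is the richest of the three outputs: the image-level label and the boxes can be computed from it, but not the other way round.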

The objectives of this survey are to: (1) systematically collect, investigate and analyze the research literature on DL applications in the field of CIS and (2) pinpoint major obstacles in the research field and suggest future research directions. Our review offers several contributions to the CIS area. First, we collect and refine more than 40 journal or top conference articles indexed by SCI, most published in the last three years, based on a systematic literature review (SLR) (Khallaf et al., 2018; Kitchenham et al., 2009). We hope this step provides high-quality literature for our survey. Second, we address how DL-based methods achieve pixel-wise CIS. The DL-based CIS methods proposed in the selected literature are grouped into 10 categories based on their network backbone architectures: fully convolutional network (FCN), U-Net, encoder-decoder model, multi-scale and dilated convolution, attention mechanism, Transformer, two-stage detection, multi-modal fusion, unsupervised learning and weakly supervised learning. We hope this review helps to reveal the latest research progress in this field by comparing and summarizing the advantages, disadvantages, and detection performance of the newer and classical CIS algorithms belonging to each category, and makes it easier for researchers to pick out appropriate techniques for their own research. Third, we collate the data sets used in the literature, focusing on providing the resources of publicly accessible data sets. Furthermore, we survey the evaluation metrics and loss functions used in image segmentation (IS) and compare their applicability in specific tasks. We hope this helps researchers to choose suitable data sets, evaluation metrics and loss functions according to their own research tasks. Lastly, we extend the discussion to six challenges of CIS and their existing coping strategies based on the analysis of the literature, and develop eight potential avenues for future research. We hope the ideas we put forth encourage scholars to explore improvement schemes for CIS.

The structure of this paper is as follows: Section 2 clarifies the method and steps we employ to report this review; Section 3 reviews the literature grouped into 10 categories; Section 4 summarizes the data sets, evaluation metrics and loss functions used in the surveyed literature; Section 5 discusses the existing problems and the possible prospects of CIS methods.

2. Methodology

With the aim of systematically and efficiently screening, reporting and assessing the high-quality literature related to DL-based CIS published in recent years, this review adopts the systematic literature review (SLR) method proposed by Kitchenham et al. (2009). Our research results are reported in five steps: identifying the research question, defining the literature search process and the literature inclusion and exclusion criteria, collecting literature, assessing literature quality, and performing a descriptive analysis of the literature (Khallaf et al., 2018).

According to the research goals of this paper, three research questions are identified. Q1: What were the search
questions and screening criteria for the literature included in this study? Q2: How does the DL-based method achieve pixel-level segmentation of crack images? Q3: What challenges and opportunities does DL face in the field of CIS?

Fig. 2 shows the research process of our SLR method. Following this process, we answered Q1 by collecting, selecting and refining research articles as described in Fig. 2, and screened more than 40 journal articles published between 2018 and 2022 for this survey. To answer Q2, we first extracted the following items from the selected literature: year, journal, research purpose, methods and models, findings and limitations. Then, based on the structures and strategies of the backbone networks proposed in them, the articles are grouped into 10 topics, so that the different facets of the IS methods used in them, such as network architecture, data set, loss function and evaluation metrics, are reviewed, discussed and compared. Finally, on the basis of the review and summary of the latest and classic methods, the existing problems and potential future research directions for DL-based CIS methods are presented, so as to answer Q3.

Fig. 2 - Systematic literature review process.

3. Literature review for DL-based crack image segmentation

This section addresses Q2 and reviews in detail more than 40 articles related to DL-based CIS, most of which were published during the last three years. Table 2 shows the year distribution of articles on CIS using DL. According to the strategies and architectures adopted by their network backbones, the models in the literature are divided into 10 topics, namely, CIS based on FCN, U-Net, encoder-decoder model, multi-scale, attention mechanism, transformer, two-stage detection, multi-modality fusion, unsupervised learning, and weakly supervised learning. Table 3 shows the strategies and network architectures of the backbone networks of the DL-based CIS models proposed in the reviewed articles.

Table 2 - Year distribution of published articles on DL-based CIS.

Year | Article
2018 | Yang et al. (2018)
2019 | König et al. (2019); Liu et al. (2019a)
2020 | Chen and Jahanshahi (2020); Choi and Cha (2020); Fan et al. (2020); Huyan et al. (2020); Kang et al. (2020); Li et al. (2020); Sun et al. (2020); Wang et al. (2020a, b); Wang and Cheng (2020); Yang et al. (2020a); Zhang et al. (2020)
2021 | Ali et al. (2021); Chen and Lin (2021); Guo et al. (2021); Jiang et al. (2021); Kang and Cha (2021); Li and Tang (2021); Liu et al. (2021); Palermo et al. (2021); Park et al. (2021); Qiao et al. (2021); Qu et al. (2021a); Wu et al. (2021a, b); Yang and Ji (2021); Zhou et al. (2021)
2022 | Fan et al. (2022); Han et al. (2022); König et al. (2022); Li et al. (2022a, b); Pang et al. (2022); Wang and Su (2022)

Table 3 - Overview of the strategies and network architectures of the backbone networks of the models in the reviewed articles.

Strategy | Network architecture | Article
FCN | FCN (VGG19) | Yang et al. (2018)
FCN | FCN + DSN + CRF | Liu et al. (2019)
FCN | FCN + active rotating filters (ARFs) | Chen and Jahanshahi (2020)
FCN | FCN + AtrousConvolution + CRF | Wang and Cheng (2020)
U-Net | I-UNet | Wang et al. (2020a)
U-Net | CrackU-Net | Huyan et al. (2020)
U-Net | Residual style U-Net | Lau et al. (2020)
U-Net | Dual U-Net | Guo et al. (2021)
U-Net | CrackW-Net | Han et al. (2022)
U-Net | U-Net + IRACblock + wavelet transform | Li and Tang (2021)
Encoder-decoder model | SDD-Net | Choi and Cha (2020)
Encoder-decoder model | HED + Feature Pyramid | Yang et al. (2020)
Encoder-decoder model | Customize the CNN | Ali et al. (2021)
Encoder-decoder model | A morphology branch (MB) + a shallow detail branch with a small stride | Pang et al. (2022)
Multi-scale and dilated convolutional | U-Net + PSPNet | Sun et al. (2020)
Multi-scale and dilated convolutional | DeepLab + multiscale feature fusion module | Qu et al. (2021a)
Multi-scale and dilated convolutional | HybridASPP | Chen and Lin (2021)
Multi-scale and dilated convolutional | MS-DPDL | Wu et al. (2021b)
Attention mechanism | U-Net + Attention Gates | König et al. (2019b)
Attention mechanism | Xception model | Qiao et al. (2021)
Attention mechanism | VGG + hybrid pooling + spatial attention module | Zhou et al. (2021)
Attention mechanism | STRNet | Kang and Cha (2021)
Attention mechanism | SegNet + Scal AB + self-AB | Liu et al. (2021)
Transformer | ResNet-34 + Transformer | Guo and Markoni (2021)
Transformer | Hierarchical transformer + Feature pyramid decoder | Wang and Su (2022)
Two-stage detection | Faster R-CNN + TuFF | Kang et al. (2020)
Two-stage detection | YOLOv4 + HDCB-Net | Jiang et al. (2021)
Two-stage detection | U-Net + RefineNet | Park et al. (2021)
Two-stage detection | Vgg16 classification model and U-Net++ with backbone ResNet-18 | Yang and Ji (2021)
Two-stage detection | YOLOv4 + ResNet-34 | Li et al. (2022a, b)
Multi-modality fusion | Three individual CNN (without a pooling layer) | Fan et al. (2020)
Multi-modality fusion | Parallel ResNet | Fan et al. (2022)
Multi-modality fusion | Faster-RCNN + Random forest classifier | Palermo et al. (2021)
Multi-modality fusion | FCN + SFD | Wang et al. (2020b)
Unsupervised learning | U-Net + GAN | Duan et al. (2020)
Unsupervised learning | K-means clustering + Otsu | Mubashshira et al. (2020)
Unsupervised learning | CNN + K Means Clustering | Li et al. (2021)
Unsupervised learning | ResNet-18 + MemAE | Wu et al. (2021a)
Weakly/semi supervised learning | FC-DenseNet + discriminator network | Li et al. (2020)
Weakly/semi supervised learning | U-Net Generator + One-Class Discriminator | Zhang et al. (2020)
Weakly/semi supervised learning | Asymmetric U-Net Generator | Zhang et al. (2021)
Weakly/semi supervised learning | ResNet-50 + Thresholding Segmentation | König et al. (2022)

3.1. Crack image segmentation based on fully convolutional network

Classical convolutional neural networks (CNNs) (Lecun, 1998) use fully connected layers after the convolutional layers, which limits them to fixed-size images. FCN (Long et al., 2015) replaced the last fully connected layers of CNNs with convolutional layers and added an up-sampling process, thereby extending the image-level classification prediction of CNNs to dense pixel-level classification and opening the era of CNN-based IS. The framework of FCN consists of two parts: the down-sampling part and the up-sampling part. The down-sampling part of FCN is the VGG19 network (Simonyan and Zisserman, 2015a, b); it is worth noting that the part of the network that implements down-sampling can be replaced by other state-of-the-art DNNs, such as GoogLeNet (Szegedy et al., 2014) and ResNet (He et al., 2015). The training time of the entire network is significantly reduced by loading pre-trained weights into the parameters of the down-sampling part. The core novelty of FCN is its up-sampling part. Through the fusion of information at different scales (FCN-8s, FCN-16s and FCN-32s), the reduced feature map finally obtained by down-sampling is
expanded to the original image size, which is the premise of realizing pixel-level image classification. Some studies have improved FCN for CIS.

The method proposed in Yang et al. (2018) is one of the earliest algorithms for pixel-level crack detection based on FCN. Yang collected more than 800 crack images and annotated them at the pixel level, thus constructing a crack data set for pixel-level detection. Compared with the existing crack detection methods, the proposed method achieves end-to-end pixel-level crack detection and reduces the training time from days to hours. Moreover, crack morphological characteristics can be extracted from the cracks predicted by the proposed model for assessing structural health. However, the model based on the current version of FCN cannot accurately predict thin cracks in images of various scenes, and it needs to be improved to obtain the physical and geometric properties of cracks from pixel-level prediction maps in real time.

DeepCrack (Liu et al., 2019) improved the generalization ability of the FCN structure by inserting a batch normalization (BN) layer between the convolutional layer and the ReLU layer (Nair and Hinton, 2010) of VGG-16 and adding a side network that supervises the prediction of each layer to speed up network convergence. Its final fused prediction map is refined using a conditional random field (Zheng et al., 2015) and guided filtering (He et al., 2013). This study contributes a publicly available crack detection data set called DeepCrack, consisting of 537 images with their pixel-level annotations. Compared to Yang's method, this method improved the precision of crack prediction in images of diverse scenes by four percentage points. Meanwhile, the crack segmentation performance of DeepCrack outperforms that of HED (Xie and Tu, 2015), RoadCNN and SegNet.

Chen and Jahanshahi (2020) revised DeepCrack to exploit the rotation-invariant property of cracks by adding active rotating filters (ARFs) to the encoder module. The proposed model, called ARF-crack, not only reduced the number of network parameters but also enhanced the ability of the model to identify rotation-invariant defects. Its generalization ability was evaluated on images of concrete cracks, pavement cracks and corrosion, where it achieved higher average precision values than DeepCrack.

Since the decoder structure of FCN is relatively simple, which affects the prediction of cracks against complex backgrounds, Wang and Cheng (2020) proposed a model called DilaSeg-CRF based on FCN to improve the prediction accuracy of the details and boundaries of pipe defects. DilaSeg-CRF uses dilated convolution to decrease the spatial resolution loss of the feature maps, and its backbone is followed by recurrent neural network (RNN) layers that implement the dense conditional random field (CRF) inference algorithm to obtain better segmentation accuracy. As shown in Fig. 3, the DilaSeg-CRF network consists of a modified FCN (DilaSeg) and an RNN converted from the dense CRF model. DilaSeg is mainly built from ordinary convolution, dilated convolution, multi-scale dilated convolution and bilinear interpolation. Compared with FCN-8s, DilaSeg-CRF increases the mean intersection over union (mIoU) by 32%.

Fig. 3 - The network architecture of DilaSeg-CRF (Wang and Cheng, 2020).

3.2. Crack image segmentation based on U-Net

Compared with FCN, U-Net (Ronneberger et al., 2015) requires fewer training images for more accurate CIS. The first feature of U-Net is that its left encoder and right decoder are almost completely symmetrical, while FCN's decoder is relatively simple and only uses one deconvolution operation. The second feature of U-Net is that the convolution results of each encoder layer are concatenated with the feature maps of the corresponding decoder layer through the concatenation (long-skip) operation, which differs from the element-wise addition of FCN. The convolutional layers that follow fuse this information. Many studies have found that models based on U-Net perform well at extracting crack features from images and can produce good dense predictions for cracks.

The work of Cheng et al. (2018) is one of the first methods to use U-Net for full CIS, and it performs better than other ML methods. Later, more methods proved that U-Net is very suitable for the crack segmentation task. I-UNet (Wang et al., 2020a) applied dilated convolution and
inception module to improve the encoder structure of U-Net, weights for different types of cracks in the loss function. The
and replaces the ReLU activation function in the U-Net with overall architecture of the network is visualized in Fig. 4.
Elu. This UNet-based improved network for road crack Another similar study is shown in Han et al. (2022) where
detection is more robust and accurate than U-Net methods. an UNet-based CrackW-Net similar to the shape “W"
Huyan et al. (2020) proposed an UNet-based CrackU-Net structure was proposed for pavement cracks IS. This model
model to improve pixel-level detection accuracy of adds a temporary up-sampling and down-sampling
pavement cracks. Being different from I-UNet, CrackU-Net component used a custom pooling function at the
changes the number of convolutions in each stage of the U- bottleneck of U-Net for first increasing the size of feature
Net encoder from 2 to 4 to enhance the prediction of small- maps, and then decreasing it. This method can effectively
scale features. The CrackU-Net model outperforms FCN and improve the CIS accuracy compared with FCN, U-Net and
U-Net in crack detection accuracy when there are a lot of ResU-Net. However, it could not handle the problem of
noise or tiny cracks, however, it requires more training continuous cracks interruption well.
samples. Some information is lost during the process of dimension
Another branch of U-Net-based approaches use U-Net style reduction by the pooling layer, which has a certain impact on
architectures. That is, the encoder of U-Net can be replaced by the accuracy location prediction required by the segmenta-
popular image classification architectures such as ResNet (He tion. To address this problem, the work in Li and Tang (2021)
et al., 2015) or EfficientNet (Tan et al., 2020), VGG (Simonyan replaced the pooling layer in U-Net with wavelet transform
and Zisserman, 2015a,b), and Inception (Szegedy et al., 2016) to exploit the better feature extraction capabilities, and the decoder can be customized to create IS outputs. Lau et al. (2020) proposed that including residual style blocks in the U-Net style architecture could further improve the performance of crack detection. Moreover, König et al. (2021) showed that if classical image classification structures are used as encoders, which utilize the weights pre-trained on data sets such as ImageNet, the network can obtain a stronger feature extraction baseline, thereby improving its crack detection performance.

Recently, Guo et al. (2021) extended the U-Net architecture to a dual U-Net architecture to overcome the problem of inaccurate crack boundaries in the prediction results. The first U-Net in the sequence, acting as a base crack prediction module, appends an edge detection branch in parallel, where the Sobel edge detection filter is followed by a spatial and channel attention module. The output of the first U-Net, fused with the output of the edge detection chain, is fed to the second U-Net to refine the predicted cracks. Compared to the conventional U-Net structure, the proposed architecture improved the prediction accuracy of crack boundaries. However, it needs manually to adjust the penalty

and integrated the proposed IRAC blocks with atrous convolution structure into the encoder to reduce the loss of information and obtain richer information in the receptive field, thereby improving CIS performance on the network. However, it fails to detect thin cracks.

3.3. Crack image segmentation based on encoder-decoder model

Encoder-decoder is a very common model framework in DL. Based on it, a variety of CIS algorithms are designed for end-to-end learning.

Rather than using only the final prediction result of U-Net to calculate the segmentation loss, DeepCrack (Zou et al., 2019a) is based on another promising encoder-decoder model called SegNet (Badrinarayanan et al., 2017) and calculates the pixel-wise prediction loss at each scale. Different scale feature maps from every corresponding stage in the encoder and the decoder are merged, and each scale prediction loss is calculated. Prediction maps with the five distinct scales are fused through a series of operations of deconvolution, crop, and concatenation. At last, the model produces the prediction maps at each stage and the last overall fused

Fig. 4 – Road crack detection consists of three parts, i.e., Base Predictor Module, Edge Adaptation Module, and Refinement
Module (Guo et al., 2021).
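
The Sobel edge detection filter used in the edge branch of Guo et al. (2021) can be illustrated with a minimal sketch. The code below is not the authors' implementation: it only computes the classical 3 × 3 Sobel gradient magnitude on a toy grayscale patch, and the function name `sobel_magnitude` and the patch values are assumptions introduced for illustration.

```python
import math

# 3x3 Sobel kernels for horizontal (x) and vertical (y) intensity changes.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(image):
    """Return the gradient magnitude of a 2-D grayscale image (list of rows).

    Border pixels are left at 0.0 because the 3x3 window does not fit there.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = gy = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    pixel = image[r + dr][c + dc]
                    gx += SOBEL_X[dr + 1][dc + 1] * pixel
                    gy += SOBEL_Y[dr + 1][dc + 1] * pixel
            out[r][c] = math.hypot(gx, gy)
    return out


# Toy 5x5 patch (hypothetical): a dark vertical "crack" (0) on a bright background (1).
patch = [[1, 1, 0, 1, 1] for _ in range(5)]
edges = sobel_magnitude(patch)
```

In this toy patch the response is strongest on the two columns flanking the dark line and zero on the line itself, which is the kind of boundary cue an edge branch can pass on to a refinement stage.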

layer. This work shows that fusing the prediction maps at each stage can achieve better performance than basic U-Net or SegNet in a CIS task.

To achieve real-time segmentation of cracks in images, Choi and Cha (2020) proposed a semantic damage detection network called SDDNet, which includes DenSep, ASPP, and a decoder using separable convolutions. The DenSep module with four separable convolutions reduces the computational cost by about 70% and is faster in computation than standard convolutions. The ASPP module removes the original global average pooling (GAP). Choi built a dataset of 200 crack images and divided the crack images into 5 categories: thick crack, thin crack, blurry crack, severe crack, and crack-like features. The model has far fewer parameters, higher detection accuracy and faster detection speed. However, this method produces crack disconnections in blurry images.

Aiming to improve the generalizability and precision of the crack detection model, Yang et al. (2020) advocated an FPHBN network for pavement crack detection that uses HED (Xie and Tu, 2015) as the network backbone. The feature pyramid architecture aims to alleviate the cluttered and ambiguous output problem in the bottom-up architecture due to the lack of contextual information in lower-level layers. Side networks are appended after the feature pyramid structure to perform crack prediction separately at each level. Hierarchical boosting is proposed to deal with the hard sample problem. This study proposes the average intersection over union (AIU) as a complementary metric for evaluating crack detection. The proposed FPHBN can produce sharper and better detection results in complex backgrounds, but its predicted cracks are thicker and blurrier than the GTs.

To explore how to choose the right DL architecture and data set size for the specific problem of crack detection and localization, Ali et al. (2021) proposed a customized encoder-decoder CNN model for crack segmentation. The proposed new architecture is compared to VGG-16, VGG-19, ResNet-50, and Inception-v3 to measure each model's detection accuracy, network complexity, and training time based on the amount of training data, data heterogeneity, network complexity, and number of epochs. The work states that increasing the number of model layers, parameters and training samples with minimal heterogeneity has no significant effect on the crack detection performance of the proposed model. However, these factors increase computation time and lead to model overfitting.

To achieve a balance between the speed and the accuracy of CIS, Pang et al. (2022) proposed a novel model called DcsNet (Fig. 5). DcsNet includes two feature extraction branches, a morphology branch (MB) and a shallow detail branch (DB) with a short stride. MB can preserve the morphology information of scale invariance; DB with a small stride can obtain supplementary detailed information. MB is composed of a lightweight convolutional network, a pyramid pooling module (PPM) and an attention module CSA. CSA consists of a channel attention module that continuously processes feature maps in the channel dimension and a spatial attention module that continuously processes feature maps in the spatial dimension. DcsNet achieves a better balance between accuracy and inference speed than state-of-the-art methods on four challenging data sets (Crack500, DeepCrack, Gaps384, and Damcrack). However, the model is sensitive to noise.

3.4. Crack image segmentation based on multi-scale

A fixed sampling scale makes the network scale-dependent, which is not conducive to distinguishing crack image features of different scales. Because cracks in images have dissimilar sizes, many models improve detection performance by enhancing their ability to extract multi-scale crack features in the CIS task.

Sun et al. (2020) suggested a new multi-scale convolution block composed of kernels of different sizes to increase the receptive field and found that it is effective to remove noise

Fig. 5 – An overview of DcsNet. Each box corresponds to a multi-channel feature map, with size ratios given relative to the full-resolution input.
“PPM” stands for the pyramid pooling module, “FFM” stands for the fusion of two feature maps, “CSA” stands for the
channel and spatial attention module, and “UPS” stands for the final ×2 up-sampling. The red characters “Label” represent
the deep supervision applied only during training (Pang et al., 2022).
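
Why separable convolutions, as used in the DenSep module and decoder of Choi and Cha (2020), are cheaper than standard convolutions can be seen from a simple weight count. The sketch below is a generic per-layer comparison under assumed channel sizes; it does not reproduce the roughly 70% figure reported for DenSep, which depends on the full module design, and the function names are hypothetical.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a k x k standard convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def reduction(c_in, c_out, k):
    """Fraction of weights saved by the separable form."""
    return 1.0 - separable_conv_params(c_in, c_out, k) / standard_conv_params(c_in, c_out, k)


# Assumed layer size for illustration: 64 input channels, 64 output channels, 3 x 3 kernel.
full = standard_conv_params(64, 64, 3)
separable = separable_conv_params(64, 64, 3)
saved = reduction(64, 64, 3)
```

For this assumed layer the separable form needs 4672 weights instead of 36864, a saving of about 87% in that single layer.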

in the practical problems by merging features at various levels. Meanwhile, the author found that combining the DICE coefficient with binary cross entropy helps to alleviate the problem of unbalanced samples in a small number of training images. However, the model did not address the issue of the recognition of features close to a certain small scale crack.

Since single-scale features hardly capture the complicated geometric structures of cracks, Qu et al. (2021a, b) advocated a CNN model with a multi-scale feature fusion module for pavement crack detection. To address the problem that deep semantic features are continuously lost during the up-sampling process, the model employs the DeepLab architecture as the feature extractor in the down-sampling stage, and a proposed multi-scale convolution feature fusion block is used to fuse the distinct scale convolution features learned in different convolution stages. This work shows that combining the proposed multi-scale fusion block and the deep supervision mechanism helps to improve the model convergence speed and the accuracy of crack prediction.

HACNet (Chen and Lin, 2021) is a pixel-level crack detection network with fewer parameters and more accurate predictions. This work explored not using any down-sampling and up-sampling operations but instead using a hybrid atrous convolution to keep a higher spatial precision in the predicted image and distinguish cracks of different sizes. The core of HACNet is the proposed HybridASPP module that employs four atrous convolutional layers in cascade, each using a 3 × 3 kernel but with a different dilation rate. HACNet made improvements to the detection of thin cracks, crack edges and crack integrity on the public data sets DeepCrack and YCD; however, there are some failure cases of not properly segmenting the thinner cracks and cracks with complex backgrounds. The architecture of the proposed HybridASPP is shown in Fig. 6. HybridASPP keeps the same spatial resolution throughout the whole process (Chen and Lin, 2021). Ca and Co represent the number of input and output channels, respectively. Cm represents the number of output channels in each atrous convolutional layer.

Wu et al. (2021b) proposed a multi-scale pixel distribution learning model for concrete crack detection. The proposed model combines the pixel distribution learning method in background subtraction and the CNN model to address fine crack detection in a complex background. The network input is the patches of different scales generated by random arrangement of spatial pixels. The output is the prediction of the center pixel in the input patch, i.e., cracked (1) and non-cracked (0). After it is trained on CRACK500, the method is used to classify another data set, and the prediction results verify its good generalization performance. This method demonstrates that the combination of multi-scale pixel distribution learning ability and CNN can effectively detect thin cracks.

3.5. Crack image segmentation based on attention mechanism

Although the above DL-based models can obtain pixel-by-pixel segmentation results, their disadvantage is that the models tend to ignore the relationship between pixels during training, which hampers satisfactory estimation of crack segmentation maps. To address this issue, some researchers implanted a special structure, the attention mechanism, into the considered model to learn automatically and calculate the contribution of the input data to the crack prediction. These attention mechanisms include attention gates, squeeze and excitation, concurrent spatial and channel attention module, bottleneck attention module, and self-attention module.

König et al. (2019) established a new CNN based on the U-Net architecture for pavement surface crack segmentation. The model integrates residual connections into each encoder-decoder block and adds attention gates before the decoder. Attention

Fig. 6 – Architecture of the proposed HybridASPP (Chen and Lin, 2021).
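
How a cascade of atrous (dilated) 3 × 3 convolutions, as in HybridASPP, enlarges the receptive field without any down-sampling can be sketched with a short calculation. The dilation rates below are assumptions chosen for illustration (the rates used by Chen and Lin (2021) are not given in this section), and `cascaded_receptive_field` is a hypothetical helper name.

```python
def cascaded_receptive_field(kernel_size, dilations):
    """Receptive field (pixels per axis) of stride-1 dilated convolutions applied in cascade.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    field = 1
    for d in dilations:
        field += (kernel_size - 1) * d
    return field


# Four cascaded 3x3 layers: ordinary convolutions versus assumed dilation rates 1, 2, 4, 8.
plain = cascaded_receptive_field(3, [1, 1, 1, 1])
dilated = cascaded_receptive_field(3, [1, 2, 4, 8])
```

With these assumed rates the four-layer cascade sees a 31 × 31 neighbourhood instead of 9 × 9, while the feature map keeps the input resolution because every layer has stride 1.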



gates preserve spatially correlated features and generate scale coefficients. They are added to decoder blocks via skip connections. The work adopts two loss functions, cross-entropy loss and dice loss, to reduce the difference between ground truth and predicted mask.

Inspired by scSE, Qiao et al. (2021) put forward a network structure called DFANet combined with the scSE attention mechanism for CIS. The backbone of this model is built on a lightweight Xception network that directly fuses the deepest features from the three backbones using bilinear up-sampling. It uses the scSE attention module to emphasize crack regions of interest and suppress irrelevant background regions by learning a set of weight coefficients, and enhances the ability to capture detailed features of cracks. CrackDFANet shows strong anti-interference ability and generalization under interference conditions of light, parking line, water spot, plant disturbance, oil spot, and shadow.

Zhou et al. (2021) proposed a crack detection network to obtain richer crack features and edge information by introducing the spatial and channel attention module. The network utilizes the channel-wise attention module to capture the rich contextual information in the high-level features, and utilizes the spatial attention module to filter out some background details in the low-level features. It replaces the traditional spatial pooling with a hybrid pooling module, consisting of vertical pooling, horizontal pooling and average pooling, to capture both short-range and long-range feature dependence. The test results on the DeepCrack data set show that the F1-score of the new model is better than that of HED, SegNet, DeepCrack and FPHBN.

STRNet (Kang and Cha, 2021) is a network structure for improving the accuracy and real-time performance of crack detection in complicated scenes. The proposed STR module integrated with the spatial attention is employed in this network encoder to reduce the computational cost of real-time processing of complex scenes and improve segmentation accuracy. The decoder uses bottleneck attention modules with coarse up-sampling blocks and applies the swish activation function. Compared with the other six methods, STRNet achieves some performance improvements as follows: (1) it has a smaller number of parameters; (2) it can process large-size input crack images in real time; (3) it segments cracks in complex scenes (different regions, structures and lighting conditions) with better results; and (4) it alleviates the effect of shadows. However, the problem of false positives and false negatives in predictions still needs to be addressed.

CrackFormer (Liu et al., 2021) is a CIS network. Its network structure is similar to SegNet, combining the proposed self-attention block (Self-AB) and the scaled attention block (Scal-AB) to detect fine cracks. Self-AB focuses on the detection of fine cracks that are easily disturbed by various noises. Scal-AB reduces noise interference by emphasizing crack features and suppressing non-crack features. CrackFormer explores how to exploit the advantages of the Transformer model (Vaswani et al., 2017) to capture long-range interactions while employing small convolution kernels for fine-grained attention perception.

Another interesting work (Ong et al., 2022) incorporated a self-attention mechanism module atop a feature pyramid network to generate long-distance context information and improve the crack segmentation accuracy (Fig. 7). The novel self-guided attention refinement module helps the model restrain noise and reduce the loss of crack details, and allows the model to get better prediction performance for fine cracks. This work finds that the segmentation performance of some backbone networks such as PANet, U-Net and DeepLab v3+ can be upgraded if they incorporate the proposed attention refinement module.

3.6. Crack image segmentation based on transformer

With Transformers extended to enhance computer vision tasks, they have recently attracted researchers within the field of crack detection. Transformer (Vaswani et al., 2017) is a classic NLP model. It replaces what the RNN does with self-attention layers; each of its outputs can see the entire input sequence and can be computed in parallel. Therefore, it enables the model to be trained in parallel and has global information. By leveraging attention across the entire sequence, the transformer can highlight the most important step in the sequence as an important output determinant. Recently, some scholars applied Transformers to optimize crack recognition tasks.

Fig. 7 – Self-guided attention refinement module architecture (Ong et al., 2022).
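
Several of the networks reviewed so far are trained with a combined loss; König et al. (2019), for instance, adopt cross-entropy loss and dice loss, and Sun et al. (2020) combine the DICE coefficient with binary cross entropy to cope with unbalanced samples. A minimal sketch of such a combination on flattened pixel probabilities is given below. The equal weighting, the smoothing constant and the function names are assumptions for illustration, not the settings of either paper.

```python
import math


def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy over flattened pixel probabilities."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(pred)


def dice_loss(pred, target, smooth=1.0):
    """1 - soft Dice coefficient; `smooth` avoids division by zero."""
    intersection = sum(p * t for p, t in zip(pred, target))
    denominator = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)


def combined_loss(pred, target, dice_weight=0.5):
    """Weighted sum of the two losses (the 0.5 weight is an assumed default)."""
    return (1.0 - dice_weight) * bce_loss(pred, target) + dice_weight * dice_loss(pred, target)


# Toy mask with one crack pixel among four (hypothetical values).
target = [1, 0, 0, 0]
perfect = combined_loss([1.0, 0.0, 0.0, 0.0], target)
missed = dice_loss([0.0, 0.0, 0.0, 0.0], target)
```

Because the Dice term is computed from the overlap with the crack pixels rather than averaged over all pixels, an all-background prediction is still penalized even when crack pixels are a small minority, which is the motivation usually given for adding it to cross entropy.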

Dosovitskiy et al. (2021) are the first to explore the use of standard Transformer encoders for image classification tasks. They proposed an image classification model based entirely on the Transformer architecture called the Vision Transformer (ViT). The structure of ViT completely replaces the convolution operation in image recognition and adds upfront processing operations that interpret the image as a series of image patches by using the standard Transformer encoder.

Unlike the ViT for image classification, Wang et al. (2021) offered a pure transformer backbone called pyramid vision Transformer (PVT) for dense prediction tasks. The pyramid structure was introduced into PVT to enhance the local continuity of the target features, which inherits the advantages of CNN and Transformer, and can replace the CNN backbone with a non-convolution structure to complete visual tasks such as object detection, instance and semantic segmentation. This paper shows that under the same number of parameters, the proposed PVT has higher detection accuracy than the ResNet-50 network.

Another work about PVT (Cao et al., 2021) ports the Transformer to learn the global and remote semantic interaction information for a dense prediction task. Unlike the work of Wang, a proposed hierarchical transformer called Swin Transformer can be used as a general backbone for CV tasks. The proposed hierarchical structure can flexibly handle images of various scales with linear computational complexity. The proposed shifted window scheme restricts the self-attention computation to non-overlapping local windows while allowing cross-window connections, which improves the prediction accuracy and efficiency. This hierarchical design and shifted window approach are beneficial for all MLP architectures. Recently, a few researchers tried to apply the Transformer technology to the crack segmentation task.

To address the inaccurate detection of crack boundaries predicted by most models in complex environments, Guo et al. (2021) suggested a new ViT to concentrate on refining the coarse crack prediction output by the encoder based on the ResNet-34 network, with the aim of generating clearer crack boundaries. Compared to the original ViT, the proposed ViT does not send all embedded features produced by the transformer back to the MLP for producing classification predictions in its refinement procedure. In contrast to convolution, the transformer refines the coarse cracks in each patch and explores the dependency of each patch in their sequence order. The enhanced ViT method in this research can provide more accurate crack boundaries than the former schemes. However, the inductive bias of the two-dimensional structure of the crack image needs to be injected into the ViT manually.

Most recently, Wang and Su (2022) developed a SegCrack model for pixel-level crack segmentation by incorporating Transformer blocks. The overall architecture of SegCrack is illustrated in Fig. 8. First, the overlapping patch embedding module splits the input 608 × 608 image into small patches, and then these patches are fed into the encoder. A hierarchical Transformer used as an encoder contains four sets of Transformer blocks and the overlap patch-merging module. The proposed Transformer block is composed of a layer norm, efficient self-attention, and the Mix Feed-Forward Network (Mix-FFN). SegCrack exhibits suitable performance in concrete crack segmentation, but it needs to improve segmentation efficiency.

3.7. Crack image segmentation based on two-stage detection

Some studies implement a two-stage detection strategy for improving crack segmentation results. They have introduced

Fig. 8 – Overall structure of SegCrack (Wang and Su, 2022).
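
The self-attention operation on which these Transformer-based models build can be sketched as plain scaled dot-product attention in the sense of Vaswani et al. (2017). This is a simplified illustration, not the efficient self-attention variant used in SegCrack: the learned query, key and value projections are omitted (an assumption, so that Q = K = V), and the patch embeddings are hypothetical values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with Q = K = V = tokens.

    Assumption: the learned query/key/value projections of a real
    Transformer layer are omitted to keep the sketch short.
    """
    dim = len(tokens[0])
    weights = []
    outputs = []
    for query in tokens:
        # Similarity of this token to every token, scaled by sqrt(dim).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in tokens]
        row = softmax(scores)
        weights.append(row)
        # Output is the attention-weighted average of all tokens.
        outputs.append([sum(w * tok[i] for w, tok in zip(row, tokens)) for i in range(dim)])
    return outputs, weights


# Three hypothetical 2-D patch embeddings.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended, attention = self_attention(patches)
```

Every output row mixes information from all patches in a single step, which is the "global information" property described for the Transformer; the quadratic number of pairwise scores is also why hierarchical and windowed variants are used for large images.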



object detection algorithms such as Faster R-CNN (Ren et al., 2016) and YOLOv4 (Bochkovskiy et al., 2020) for the first-stage crack detection. Then, the pixel-level crack segmentation is performed on the extracted regions containing cracks in the second stage. The two-stage methods enhance the network's segmentation accuracy of fuzzy cracks.

Kang et al. (2020) presented a hybrid two-step approach to crack detection, segmentation and quantification. First, various bounding boxes are used to locate the crack areas detected by the Faster R-CNN algorithm, and then the integrated CLAHE algorithm is used to improve the smoothing function, the tubularity flow field (TuFF). Thus, the modified TuFF algorithm can segment cracks with complex backgrounds from the detected crack regions. Finally, the crack width and length are measured at the pixel level by using an improved distance transform method (DTM). The advantage of this method is that the cost of building a bounding box-based training data set used by Faster R-CNN is much lower than that of building pixel-level annotated training images. Compared with DeepCrack and Mask R-CNN, the proposed method is more suitable for crack segmentation in complicated backgrounds. However, this method produces poor contour predictions for large cracks, especially those with complex contours.

To improve the pixel-level detection accuracy of fuzzy cracks and realize the analysis of refined pavement damage, Jiang et al. (2021) proposed a pixel-wise crack detection network HDCB-Net based on a two-stage strategy. In the first stage of the network, YOLOv4 is applied to identify images with or without cracks. An overlapping sliding window strategy is applied on the images with cracks to generate coarse proposal regions containing cracks. In the second stage, HDCB-Net uses the proposed hybrid dilated convolution block (HDCB) to extract features from the rough proposal area. HDCB-Net expands the receptive field of the convolution kernel while reducing the computational complexity and realizes pixel-wise crack detection. The test results on five data sets show that the recall of the two-stage strategy is higher than that of the one-stage method and the two-stage strategy detects 7.5 times faster on test images than the one-stage segmentation network. However, the new model is harder to train than the one-stage model.

Park et al. (2021) studied a reinforcement learning-based framework to refine the predictions of common crack segmentation models for identifying specific small crack details. The model adopts a two-step strategy. As shown in Fig. 9, the first step uses U-Net with ResNet-50 to obtain initial crack predictions. The framework contains two stages for both training and inference (Park et al., 2021). The second refinement stage is based upon the advantage of the actor-critic method and incorporates an agent that improves the initial segmentation map and bridges gaps in predictions. The method alleviates the problem of disconnected segmentation along boundaries.

Yang and Ji (2021) investigated a two-stage crack segmentation model using transfer learning to improve the precision and processing speed of detecting cracks in complex scenes. The knowledge base is responsible for storing two types of data: (1) crack images on different buildings and crack images interfered by various noises (such as shadows, stains, uneven lighting); (2) model weights obtained for VggNet and ResNet models on the ImageNet data set. In the first stage, the VggNet model for crack classification is trained on the crack classification image data set in the knowledge base. After its parameters are fine-tuned, VggNet can accurately determine whether the newly input image contains cracks. In the second stage, the U-Net++ model for crack segmentation is trained on the crack segmentation image data set in the knowledge base. The input data consists of crack data sets with various shapes and background disturbances. The encoder part of the U-Net++ model is replaced with ResNet, which is preset to the weights obtained on the ImageNet data set. Finally, the Vgg16 classification model, U-Net++ and the backbone network ResNet-18 constitute a two-stage crack detection model. Using the combined loss function, this crack detection model achieves 98.6% classification accuracy, 99.21% pixel segmentation accuracy and 84.62% segmentation mIoU on the test data set.

Another recent approach based on two stages for sleeper crack identification and refinement of crack segmentation results is proposed by Li et al. (2022a, b) to improve the prediction efficiency and accuracy of the model. First, the sleeper images enter a CEDNet based on the improved

Fig. 9 – Overview of the framework proposed by Park (Park et al., 2021).
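
The overlapping sliding window strategy that Jiang et al. (2021) apply to images with cracks can be sketched as follows. The window size, stride and border handling are assumptions for illustration (the section does not state the values used in HDCB-Net), and `sliding_windows` is a hypothetical helper.

```python
def sliding_windows(height, width, window, stride):
    """Top-left corners (row, col) of window x window crops covering the image.

    A stride smaller than the window makes neighbouring crops overlap; the last
    row/column of crops is shifted back so that it ends exactly at the border.
    """
    def starts(length):
        if length <= window:
            return [0]
        positions = list(range(0, length - window + 1, stride))
        if positions[-1] != length - window:
            positions.append(length - window)  # extra crop flush with the border
        return positions

    return [(r, c) for r in starts(height) for c in starts(width)]


# Hypothetical sizes: 4x4 crops with stride 3, so neighbouring crops share one pixel.
corners = sliding_windows(10, 11, window=4, stride=3)
```

Each crop can then be passed to the second-stage segmentation network, and overlapping predictions can be merged, for example by averaging, so that cracks crossing a crop border are not cut.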



YOLOv3 network. By adding SE and SPP modules to the end of YOLOv3, CEDNet improves the accuracy of the rough localization prediction map of the sleeper crack region. Then, the prediction map is input into the CEDNet that is a coarse extraction network of saliency features based on ResNet-34, and the coarse saliency prediction map of sleeper cracks is output. In these rough saliency maps, crack boundaries are blurred, some salient regions are missing, the background is incorrectly labeled as an object, and the localization is inaccurate. Next, these rough saliency maps are fed into a crack residual refinement network (CRRNet) that contains an encoder, a decoder and a bridge connection for further optimizing segmentation results. The proposed method can identify details of thin cracks; however, the parameters of the model need to be adjusted twice, which increases the time cost of manual processing.

3.8. Crack image segmentation based on multi-modality fusion

Another successful crack segmentation method is to use an ensemble model that combines the outputs of several models and exhibits better generalization ability because each model may have learned to focus on different features.

Aiming to make the CIS accuracy meet the need of practical applications, Wang et al. (2020b) proposed a neural network ensemble segmentation (NNES) method to detect cracks in the image. The NNES combines a fully convolutional network CRACK-FCN and multi-scale structured forest edge detection (SFD). The CRACK-FCN uses FCN to extract the overall shape of the crack, and then the SFW classifier is used to capture the details of the crack. The SFW combines the SFD with the semi-reconstruction method of antisymmetric biorthogonal wavelets to overcome the limitations of the disconnection of crack edge detection. The proposed NNES can reduce the detection error caused by complex backgrounds. However, the NNES is not as ideal as other methods in terms of MAE and running time due to the output maps union.

Palermo et al. (2021) explored locating and identifying cracks in different extreme environments using combined visual and force sensing modalities. In this work, the proposed Faster R-CNN-based algorithm can analyze the captured images to identify the location of potential cracks, while a random forest classifier is used for tactile identification of cracks to confirm their existence. The coordinates of the detected cracks are sent to the robot that can approach the area to be inspected. It is found that using only tactile data to detect cracks would be time-consuming, taking about 199 s, as the robot would need to perform multiple surface scan operations to find and characterize cracks. However, when the model fused the visual and tactile surface detection, it took only 31 s to detect cracks and could detect 92.85% of the cracks.

Aiming to create a high-performance pavement crack detection and measurement system, the work (Fan et al., 2020) explored a pooling-free CNN based on probabilistic fusion to enhance the ability to detect small cracks in the image. The proposed method consists of four key steps: (1) the outputs of three individual CNNs were averaged to obtain a predicted probability map; (2) a morphological operation was used to segment the image; (3) a medial-axis algorithm was applied to obtain the crack skeleton; (4) the crack length and width were measured according to different crack types (complex, common, thin and intersecting cracks).

In order to achieve efficient segmentation of cracks and potholes in practical environments, Guan et al. (2021) proposed an improved U-Net deep learning architecture with the introduction of depthwise separable convolution to segment cracks and potholes in an image set composed of color images, depth images and color depth-overlapping images. Under the condition of illumination variation, this improved U-Net architecture has the best segmentation effect for the overlapped image, followed by the depth image and the color image. Based on the proposed deep learning architecture and volume measurement algorithm, this work further realizes pixel-level automatic pothole detection and volume measurement.

Another recent architecture (Fan et al., 2022) utilized three parallel ResNets to detect cracks. It used mathematical morphology to further reduce noise, correctly identify thin and complex cracks, and improve segmentation accuracy to build a high-performance pavement crack detection and measurement system. This method performs effectively on static crack images but it isn't suitable for detecting cracks in end-to-end video streams. The overall procedure of the proposed approach is summarized in Fig. 10.

Fig. 10 – Overview of the automated pavement crack detection and measurement approach (Fan et al., 2022).
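
The first step of the probabilistic fusion described for Fan et al. (2020), averaging the outputs of three individual CNNs into one probability map, can be sketched as below. The toy probability maps are made up for illustration, and the fixed 0.5 threshold is only a placeholder assumption for the subsequent segmentation step, which the original work performs with a morphological operation.

```python
def fuse_probability_maps(maps):
    """Pixel-wise mean of several equally sized probability maps."""
    count = len(maps)
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(m[r][c] for m in maps) / count for c in range(cols)] for r in range(rows)]


def binarize(prob_map, threshold=0.5):
    """Turn a probability map into a 0/1 crack mask (threshold is an assumed value)."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob_map]


# Hypothetical 2x2 outputs of three networks for the same image.
net_a = [[0.9, 0.2], [0.6, 0.0]]
net_b = [[0.6, 0.1], [0.3, 0.0]]
net_c = [[0.9, 0.3], [0.9, 0.3]]

fused = fuse_probability_maps([net_a, net_b, net_c])
mask = binarize(fused)
```

In the bottom-left pixel one network votes below the threshold (0.3) while the other two are confident, and the averaged value of 0.6 still marks it as crack; this is the sense in which an ensemble can compensate for the weaknesses of its individual members.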

3.9. Crack image segmentation based on unsupervised learning

Unsupervised learning is another way of training ML models. It is a statistical method to find some underlying structure in the unlabeled data, so this method does not require labeling the training data set. Currently, few ML- and DL-based segmentation methods for pavement cracks employ unsupervised solutions.

Duan et al. (2020) proposed one of the first unsupervised crack segmentation methods. Aiming at the problems of manual labeling, with its strong subjectivity and heavy workload, and of over-reliance on labels in supervised DL training, Duan proposed an unsupervised-learning crack detection method based on the generative adversarial network (GAN) that converts crack images into binary images. A cycle-consistent loss is introduced to improve the accuracy of crack localization. A CNN connected by eight residual blocks is used as a generator for feature extraction, and a five-layer FCN is used as a discriminator. Post-processing can achieve crack enhancement and smoothing of crack boundaries by using the morphological opening and separate bright spots around cracks by the top-hat operation. The proposed unsupervised crack segmentation method outperforms rule-based and ML-based methods.

Mubashshira et al. (2020) established an unsupervised model to detect and localize pavement cracks. Firstly, the pavement area is detected by analyzing the color histogram on the pavement. Then, the dent noise on the pavement image is thresholded by Otsu. The K-means clustering method is used to segment the cracks in the image, and the broken crack pixels are connected by the morphological closing operation after the crack feature is extracted. Finally, contour selection removes very small contours that may be noise and particularly large contours that are unlikely to be crack regions, thereby removing noise and preserving crack edges. This unsupervised method outperforms edge detection and Otsu's segmentation method in crack detection.

To improve the intelligence and generalization of the automatic classification method of pavement cracks, Li et al. (2021) studied an automated pavement crack classification method that combines a deep CNN and K-means clustering with unsupervised learning. The proposed method does not require manual labels of ground truth images for model training, and employs mini-batch gradient descent (MBGD) to train the network. The model's classification results for transverse (AP 0.806), longitudinal (AP 0.792), and alligator cracks (AP 0.913) show the promise of unsupervised learning for crack image classification.

Since pavement cracks are not concentrated in one place, it is hard to get more images containing cracks. The motivation behind Wu et al. (2021a) was to overcome the above difficulty. Wu et al. (2021a) established a new unsupervised crack classification model called MemAE, which can acquire more features near the target from the road images. A self-supervised pre-training strategy and a post-processing method are combined to improve the performance of MemAE for pavement crack classification. A memory-augmented convolutional auto-encoder and a hard-threshold module are integrated to increase the prediction accuracy of the model. The memory-augmented module helps the network weaken the influence of irregular features on image reconstruction, and the hard-threshold module enlarges the difference between the inputs and the reconstructions of abnormal images. The proposed method realizes unsupervised learning of the whole network and greatly reduces the time to collect and label crack images, and the network shows good performance at the task of pavement crack classification.

3.10. Crack image segmentation based on weakly supervised learning

Most CIS methods use supervised learning and rely on a large number of masked crack images with pixel-level annotations. However, a common problem with supervised learning is that manual annotations at the pixel level can be time-consuming and costly. Therefore, some semi-supervised or weakly supervised learning-based CIS methods have recently been proposed to solve this problem. These approaches aim to train models that perform CIS tasks by using data sets with image-level classification labels, annotations in the form of bounding boxes, or only a small subset of pixel-level annotations.

To lower the cost of labeling the training data set, Li et al. (2020) established a semi-supervised learning-based pavement crack detection model that can use unlabeled pavement images for model training. The model consists of a fully convolutional segmentation network FC-DenseNet and a discriminator that distinguishes predictions from target samples. FC-DenseNet is suitable for pavement crack detection. It requires fewer parameters than traditional convolutional networks. The discriminator network can use unlabeled road images during model training and generate additional supervisory signals for unlabeled crack images, which allows the model to detect low-resolution information. The model can better adapt to crack detection in complex environments, and can accurately detect crack regions in diverse environments. The limitation of this model is that the results of detecting relatively fine cracks in complex environments are not very satisfactory.

Aiming to no longer depend on pixel-annotated crack data sets, Zhang et al. (2020) put forward a crack detection network with a self-supervised structure. The proposed network consists of two GANs: a forward GAN F trained to generate a structured image (Y) from an input crack image (X), and a reverse GAN R that converts the input structured image (Y) back to the crack image (X). GAN F and GAN R use the same network architecture where each GAN consists of two parts: U-Net Generator and One-Class Discriminator. The generator is a U-shaped image-to-image translation network, and the discriminator is a classification network with a larger receptive field that is implemented by adding another strided convolutional layer on top of the original discriminator. The proposed method achieves the best Hausdorff (HD) score and F1-score due to cycle-consistent supervision and the proposed one-class discriminator.

Aiming at the “All Black” problem that is due to partially accurate labeling of ground truths (GTs), Zhang et al. (2021)
proposed a CrackGAN network with the following contributions: (1) it addresses this problem through the proposed crack-patch-only supervised adversarial learning and the asymmetric U-network to achieve monitoring at the image level; (2) even though it is trained with small image patches and partially annotated GT images, it can handle crack detection/segmentation tasks on full-size crack images with good performance; (3) the method can alleviate the data imbalance problem.

König et al. (2021) developed a weakly supervised crack segmentation network. They explored a classification CNN to create pseudo-labels of cracks, which can be merged into the training data set of the CIS model. The proposed method selects ResNet-50-152 as the image classification CNN; the last convolutional layer of the classifier uses Grad-CAM++ to generate sufficient gradient activation maps, and the modified ResNet classifier generates only two output classes to determine whether a patch contains cracks. The corresponding pixel-level segmentation pseudo-labels can thus be generated from crack images that only have image-level classification labels. These pseudo-labels are not as accurate as the “correct” segmentation labels, but they are sufficient to train an end-to-end segmentation algorithm.

4. Data sets, evaluation metrics and loss functions

This section further answers Q2. First, it provides some publicly available data set resources that can be used to train and test CIS models, then summarizes popular metrics used to evaluate the performance of segmentation models, and finally introduces loss functions commonly used for CIS models.

4.1. Data sets

Most DL-based supervised learning CIS methods rely on reliably annotated data sets. Most crack data sets are gathered and built by researchers for their own work, so they are not publicly accessible. Table 4 lists the publicly available data sets commonly used for crack detection in current research. It gives the number of images in each data set, a brief description of the data set, the annotation type of the images, and whether the data set is publicly accessible.

4.2. Evaluation metrics

Models should be assessed by appropriate metrics to measure their performance on the data sets. This section introduces the assessment metrics used in the literature to evaluate DL-based CIS algorithms.

4.2.1. Pixel accuracy, precision, recall, F1-score (dice coefficient)

In classification problems, there are four first-level indicators, i.e., true positive (TP), true negative (TN), false positive (FP), and false negative (FN). They measure whether the predicted value of a sample is consistent with its actual value.

Second-level indicators derived from the counts of the first-level indicators include pixel accuracy, precision, and recall. To make standardized measurement easier, each of them is a ratio between 0 and 1. As in Eq. (1), pixel accuracy is the ratio of the number of pixels assigned to the correct category to the total number of pixels in the image. Precision, as in Eq. (2), is the proportion of correctly predicted crack pixels among all pixels predicted as crack. Recall, as in Eq. (3), is the proportion of actual crack pixels that are correctly predicted.

Ideally, both precision and recall should be high, but in general, when precision is high, recall is low, and vice versa. Therefore, in practice, it is often necessary to weigh these two indicators together, which leads to a new indicator, the $F_\beta$-score, as shown in Eq. (4). This is the weighted harmonic mean of precision and recall. When $\beta = 1$, it is called the $F_1$-score, as in Eq. (5), in which precision and recall have the same weight; when $\beta < 1$, precision is more important; and when $\beta > 1$, recall is more important.

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{2}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{3}$$

$$F_\beta = \left(1 + \beta^2\right) \frac{\mathrm{Precision} \times \mathrm{Recall}}{\beta^2 \times \mathrm{Precision} + \mathrm{Recall}} \tag{4}$$

$$F_1 = 2\,\frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} = \frac{2\,TP}{2\,TP + FN + FP} \tag{5}$$

where TP denotes the number of true positives, FP the number of false positives, TN the number of true negatives and FN the number of false negatives. In Eq. (4), recall is weighted $\beta$ times as much as precision.

4.2.2. Intersection over union and mean intersection over union

Intersection over union (IoU), as in Eq. (6), is the ratio of the intersection to the union of the model's prediction for a certain category and the true value.

Mean intersection over union (mIoU) is obtained by computing this ratio for each category and averaging over the categories. Taking binary classification as an example, mIoU = (IoU_positive + IoU_negative)/2, as in Eq. (7).

$$\mathrm{IoU} = \frac{\text{Area of overlap}}{\text{Area of union}} = \frac{TP}{TP + FN + FP} \tag{6}$$

$$\mathrm{mIoU} = \frac{1}{2}\left(\frac{TP}{TP + FP + FN} + \frac{TN}{TN + FN + FP}\right) \tag{7}$$

4.2.3. F1-based metrics: ODS and OIS

However, IoU is not an appropriate choice for evaluating CIS algorithms because of the extreme imbalance of foreground-
Table 4 – Overview of commonly used openly available surface crack data sets.

| Reference | Name | Num. images (resolution) | Description | Annotations |
|---|---|---|---|---|
| Yang et al. (2020) | CRACK500 | 500 (2000 × 1500 px) | (Public) pavement cracks, captured with smartphone. (Train/Val/Test split: 250/50/200) | Pixel-level |
| Eisenbach et al. (2017) | GAPs v1 | 1969 (1920 × 1080 px) | (Public) data captured by specialized vehicle with multiple damage types such as cracks and potholes. (Train/Val/Test split: 1418/51/500) | Detection |
| Stricker et al. (2019) | GAPs v2 | 50 (1920 × 1080 px); 50 k (160 × 160 px) | (Public) enlarges GAPs v1 by 500 additional images. (Train/Val/Test split: 1417/551/500) | Detection |
| Stricker et al. (2021) | GAPs 10 m | 2 (10 m × 4.5 m) | (Not publicly available) 193 images (4 m × 3 m), 201 images (10 m × 4.5 m) | Pixel-level |
| Yang et al. (2020) | GAPs 509 | 509 (640 × 540 px) | (Public) subset of 384 (1920 × 1080 px) crack images from GAPs v1 | Pixel-level |
| Zou et al. (2012) | CrackTree200 | 200 (800 × 600 px) | (Public) cracks with shadows, shading | Pixel-level |
| Zou et al. (2019b) | Cracktree260 | 260 (various sizes) | (Public) expansion of CrackTree | Pixel-level |
| Zou et al. (2019b) | Stone331 | 331 (512 × 512 px) | Stone surfaces | Pixel-level |
| Zou et al. (2019b) | CRKWH100 | 100 (512 × 512 px) | 100 images of road surface taken in visible light | Pixel-level |
| Zou et al. (2019b) | CrackLS315 | 315 (512 × 512 px) | Road pavement images, captured under laser illumination | Pixel-level |
| Shi et al. (2016) | Crackforest (CFD) | 118 (320 × 480 px) | (Public) images of road surfaces in Beijing containing noise such as water stains, oil spots and shadows | Pixel-level |
| Amhaz et al. (2016) | AELLT | Aigle RN (38), ESAR (15), LCMS (5), LRIS (3), and Tempest (7) | (Public) 68 photos of road cracks taken by five systems: Aigle RN, ESAR, LCMS, LRIS, and Tempest | Pixel-level |
| Liu et al. (2019) | DeepCrack (DCD) | 537 (544 × 384 px) | (Public) diverse surface textures and scenes. The cracks range from 1 to 180 pixels wide. | Pixel-level |
| Yang et al. (2018) | Yang. CD | 800 (72–300 dpi) | (Public) pavement and concrete walls. The cracks range from 1 to 100 pixels wide. | Pixel-level |
| Oliveira and Correia (2014) | CrackIT | 84 (1536 × 2048) | (Public) pavement surface grayscale images | Pixel-level |
| Tabernik et al. (2020) | TabernikSDD | 399 (1408 × 512) | (Public) the data set consists of 50 defected electrical commutators | Pixel-level |
| Maeda et al. (2018) | Road Damage 9053 | 9053 (600 × 600 px) | (Public) 9053 smartphone camera images annotated with bounding boxes of damage | Detection & Image-level |
| Dorafshan et al. (2018) | SDNet2018 | 56 k (256 × 256 px) | (Public) images of pavements, walls and bridge decks with optional cracks | Image-level |
| Ozgenel and Sorguç (2018) | CCIC | 40 k (227 × 227 px) | (Public) images of concrete with optional cracks | Image-level |
| Chen and Jahanshahi (2018) | BDC | 5326 (120 × 120 px) | (Public) 5326 crack and non-crack image patches | Image-level |

background data at the pixel level. Some studies (König et al., 2021; Liu et al., 2019, 2021; Qu et al., 2021b; Yang et al., 2020) use $F_1$-score-based metrics: optimal data set scale (ODS) and optimal image scale (OIS).

To calculate the OIS and ODS indicators, the prediction for a sample i is binarized at a confidence threshold $t \in [0, 1]$. A value greater than t is set to 1 and counted as a positive sample; a value less than t is set to 0 and counted as a negative sample.

ODS and OIS represent different ways of setting the threshold t. ODS, also known as the global optimum, selects one fixed threshold t that is applied to all images to maximize the F-score on the entire data set. OIS, also called the best single image, determines for each image the optimal threshold t that maximizes the F-score of that image.

4.3. Loss functions

The loss function measures the degree of difference between the predicted value f(x) of the model and the real value Y of the sample. During training, the model continually adjusts its parameters according to the value of the loss function, so that it predicts the correct category of a new sample as well as possible. Therefore, it is very important to choose a loss function that is appropriate for the task.

Loss functions commonly used in IS algorithms include cross-entropy loss, weighted binary cross-entropy loss, Dice loss, Tversky loss, and Focal loss.

The cross-entropy loss function assigns equal weight to background pixels and crack pixels, so it is not suitable for crack data sets with imbalanced samples. In much of the CIS literature, this problem is solved by using a weighted binary cross-entropy loss function, which can be written as Eq. (8).

$$\mathrm{WBCE}\left(W, w^{(m)}\right) = -\beta \sum_{i \in Y_{+}} \log P_i\left(y_i = 1 \mid X; W, w^{(m)}\right) - (1 - \beta) \sum_{i \in Y_{-}} \log P_i\left(y_i = 0 \mid X; W, w^{(m)}\right) \tag{8}$$

where, during image-to-image training, the loss function is computed over all pixels in a training image $X = (x_i, i = 1, \ldots, |X|)$ and crack map $Y = (y_i, i = 1, \ldots, |Y|)$. $P_i\left(y_i = 1 \mid X; W, w^{(m)}\right) = \sigma\left(a_i^{m}\right) \in [0, 1]$ is computed by a sigmoid function $\sigma(\cdot)$ on the activation at pixel i. The imbalance between crack and non-crack pixels can be offset through the weight $\beta$, with $\beta = |Y_{-}|/|Y|$ and $1 - \beta = |Y_{+}|/|Y|$, where $|Y_{+}|$ and $|Y_{-}|$ denote the numbers of crack and non-crack pixels in the ground truth image, respectively.

The Dice loss function is based on the dice coefficient (Milletari et al., 2016) and is designed for images with highly imbalanced data. It has two versions. Generalized dice loss is used for multi-class segmentation, with the weight of each class inversely proportional to the class frequency. For binary segmentation, the dice coefficient is defined as in Eq. (9), and the loss is commonly taken as $1 - D(P, G)$.

$$D(P, G) = \frac{2\,|P \cap G|}{|P| + |G|} = \frac{2\,TP}{2\,TP + FN + FP} \tag{9}$$

where P and G are the sets of predicted binary labels and ground truth labels, respectively. If it is used in the loss layer during training, it weights precision and recall equally.

Tversky loss (Salehi et al., 2017) is based on the Tversky index, which is a generalization of the Dice coefficient. It assigns different weights to FP and FN to trade off precision against recall, which addresses the imbalanced data problem in segmentation tasks.

Focal loss (Lin et al., 2020) adapts the cross-entropy criterion by multiplying the original loss value by a modulating factor that depends on the predicted probability. This reduces the weight of well-classified pixels (mostly the dominant background class), so that hard pixels, including crack pixels, get more attention from the network. Focal loss helps the network focus on poorly trained pixels, so it performs well on extremely imbalanced data sets.

A survey of the loss functions used in the CIS literature is shown in Table 5. BCE and Dice loss are the most widely used for CIS models based on supervised learning algorithms. For weakly/semi-supervised algorithms, L1 loss and adversarial loss can be adopted.

5. Comparison between crack detection and crack segmentation

5.1. Application

This subsection discusses the applications of crack detection and crack segmentation.

Crack detection applies classification techniques to serve two purposes: (1) to distinguish the cracks in the image; (2) to indicate where each crack is in the image by drawing bounding boxes around the cracks found. This technique is effective if you simply want to recognize cracks in the scene and obtain their approximate locations. However, if you need to know more about the shape or curvature of the cracks, crack detection is not enough, because the bounding boxes around the cracks cannot provide information on their area or perimeter.

Table 5 – Loss functions used by published articles on DL-based CIS.

| Loss function | Article |
|---|---|
| Cross entropy | Chen et al. (2021); Fan et al. (2022); Huyan et al. (2020); Liu et al. (2019a); Wang et al. (2020a); Wang and Cheng (2020); Yang et al. (2018) |
| BCE | Guo et al. (2021); Jiang et al. (2021); Li and Tang (2021); Li et al. (2021a); Park et al. (2021); Yang et al. (2020a); Zhou et al. (2021) |
| Dice loss | Pang et al. (2022); Wang and Su (2022); Zhang et al. (2021) |
| BCE + Dice | Chen and Lin (2021); König et al. (2019); Li et al. (2022b); Sun et al. (2020); Yang and Ji (2021) |
| Focal loss | Qiao et al. (2021) |
| mIoU loss | Choi and Cha (2020) |
| MSE loss | Wu et al. (2021a) |
| L1 loss | Zhang et al. (2020, 2021) |
| Adversarial loss | König et al. (2022); Li et al. (2020) |
| Cyclic consistency loss | Duan et al. (2020); Zhang et al. (2020) |
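
As a rough illustration of the loss functions of Section 4.3, the following NumPy sketch implements the weighted binary cross-entropy of Eq. (8), a Dice loss built on Eq. (9), a Tversky loss, and a focal loss for a single probability map. It is a minimal sketch under stated assumptions, not the implementation of any cited work: the function names, the constant `EPS`, the default values of `alpha`, `beta` and `gamma`, and the "soft" formulation (sums of predicted probabilities in place of the set cardinalities of Eq. (9)) are all choices made here for illustration.

```python
import numpy as np

EPS = 1e-7  # numerical guard against log(0) and 0/0; an assumption, not from the paper


def weighted_bce(prob, target):
    """Eq. (8): crack pixels weighted by beta = |Y-|/|Y|, background by 1 - beta."""
    prob = np.clip(prob, EPS, 1.0 - EPS)
    beta = 1.0 - float(np.mean(target))  # fraction of non-crack pixels
    crack_term = -beta * np.sum(target * np.log(prob))
    background_term = -(1.0 - beta) * np.sum((1.0 - target) * np.log(1.0 - prob))
    return float(crack_term + background_term)


def dice_loss(prob, target):
    """1 - D(P, G), with D from Eq. (9) computed on soft predictions."""
    overlap = np.sum(prob * target)
    dice = (2.0 * overlap + EPS) / (np.sum(prob) + np.sum(target) + EPS)
    return float(1.0 - dice)


def tversky_loss(prob, target, alpha=0.3, beta=0.7):
    """1 - Tversky index; alpha weights FP and beta weights FN (illustrative defaults)."""
    tp = np.sum(prob * target)
    fp = np.sum(prob * (1.0 - target))
    fn = np.sum((1.0 - prob) * target)
    return float(1.0 - (tp + EPS) / (tp + alpha * fp + beta * fn + EPS))


def focal_loss(prob, target, gamma=2.0):
    """Cross-entropy scaled by (1 - p_t)^gamma, which down-weights easy pixels."""
    prob = np.clip(prob, EPS, 1.0 - EPS)
    p_t = np.where(target == 1, prob, 1.0 - prob)  # probability of the true class
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t)))


if __name__ == "__main__":
    # Toy 2 x 3 ground truth (1 = crack) and predicted crack probabilities.
    target = np.array([[0, 0, 1], [0, 1, 0]], dtype=float)
    prob = np.array([[0.1, 0.2, 0.9], [0.1, 0.8, 0.3]])
    for name, fn in [("WBCE", weighted_bce), ("Dice", dice_loss),
                     ("Tversky", tversky_loss), ("Focal", focal_loss)]:
        print(name, round(fn(prob, target), 4))
```

With `alpha = beta = 0.5` the Tversky loss reduces to the Dice loss, and with `gamma = 0` the focal loss reduces to the mean binary cross-entropy, which is one way to check such an implementation.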
Crack segmentation is an extension of crack detection, since cracks can be marked through the pixel-wise masks predicted by an image segmentation model. Crack segmentation helps us obtain more information about the shape or curvature of cracks by predicting which class each pixel in the image belongs to. In particular, it helps to further measure and evaluate the distribution parameters of crack length, width, and aggregate area based on each segmented crack.

You can turn to crack segmentation models if you require additional details about cracks. If not, crack detection is sufficient.

5.2. Data set

Crack detection and crack segmentation based on supervised learning both need an abundance of labeled training images. Compared with marking the position of a crack in the image with a rectangular box, accurately labeling each pixel that belongs to the crack is a more complicated, time-consuming and challenging task. Therefore, as of now, the data sets used for crack segmentation are fewer in number and smaller in scale than those used for crack detection, and publicly available crack segmentation data sets in particular are relatively scarce.

5.3. Loss functions

The loss functions used for crack detection and crack segmentation are discussed below.

Object detection uses two basic loss functions: classification loss and regression loss. Classification loss usually includes cross entropy loss, focal loss, distributional ranking loss and average precision loss. Regression loss includes smooth L1 loss, balanced L1 loss, KL divergence loss and region-based loss. The region-based loss works very well for bounding box regression.

In the crack segmentation task, loss functions can be divided into distribution-based, region-based, boundary-based and compound loss functions. A distribution-based loss function aims to minimize the difference between two distributions. The most basic function in this category is cross-entropy, and all the other functions in it can be viewed as derived from cross-entropy. A region-based loss aims to minimize the mismatch, or maximize the overlap, between the ground truth and the predicted segmentation. The key function of this kind is the dice loss. A boundary-based loss is a newer type of loss function designed to minimize the distance between the ground truth and the predicted segments. In general, compound loss functions perform better, but there is no perfect loss function for every situation.

5.4. Evaluation metrics

Below is a discussion of the assessment metrics used in crack segmentation and crack detection tasks.

Confusion matrix, accuracy, recall, precision, PR curve, AP/mAP, F-score, ROC curve, and AUC value are the evaluation metrics used for crack detection tasks. The majority of these indicators are based on the metrics for evaluating image classification.

Pixel accuracy, mean pixel accuracy, class pixel accuracy, intersection over union (IoU), mean intersection over union, and dice coefficient are the assessment metrics used for crack segmentation. These evaluation measures are usually variants of pixel accuracy and IoU.

6. Common problems and future research opportunities

This section addresses Q3. It first introduces the common problems in DL-based CIS work, and then gives possible research directions for this field.

6.1. Common problems

The following are some general problems encountered by DL in the field of CIS, together with the corresponding existing solutions.

Data sets: although there are some publicly available data sets for CIS, their scale is limited to a few hundred crack images with pixel-wise annotation, and openly attainable benchmark data sets for training, testing and evaluating crack detection and segmentation models are still lacking. Small data sets make crack segmentation difficult. Creating data sets for crack segmentation faces the following obstacles. First, it is difficult to curate a data set that reflects every possible background circumstance. Second, producing the ground truth for crack image segmentation is costly, and labels of tiny cracks may be missing. Third, crack segmentation is a small data problem, in contrast to image classification using natural images, which is a huge data problem. Additionally, some GTs of images in the data sets have been mislabeled.

Over-fitting: the complex models used for crack segmentation are prone to over-fitting not only on the training data but also on the validation set, owing to the relatively small size of the data sets. This ultimately lowers the model's stability. Data augmentation techniques and fine-tuning of pre-trained models are typical solutions to this issue.

Class imbalance: one of the main challenges in CIS is dealing with imbalanced data, i.e., the number of crack pixels in a crack image is much smaller than the number of background pixels (the background is absolutely dominant), which biases the classification results towards the more frequently observed background class. There are two common ways to improve the classification accuracy of the model for imbalanced classes: improving the data and optimizing model training (Kuhn and Johnson, 2013). Improving the data includes two schemes: adjusting the sample weights and adjusting the sampling strategies. Optimizing model training can be achieved by optimizing cost functions.

Metrics: the applicability and reliability of any crack detection or crack segmentation method can be evaluated by reviewing its performance results. In the reviewed articles, the researchers used different performance metrics to validate their methods on their own data sets. Therefore, without a standard evaluation benchmark, it is hard to compare their experimental results, although they perform similar
tasks. The comparability of algorithms would increase greatly if public benchmark data sets, uniform evaluation procedures, and well-defined metrics were used.

Semi, weakly and unsupervised learning: the number of semi-/weakly-supervised and unsupervised learning methods used for CIS is still relatively modest; in particular, there are essentially no CIS methods based on unsupervised learning.

Temporal and depth data: the severity of a crack can be estimated by measuring the depth of the crack, but there is currently no mature DL-based technology to measure crack depth. Meanwhile, the availability of temporal data showing crack growth is another issue. The currently available data sets only show cracks at a single point in time. Since DL has made remarkable progress in other fields such as video sequence prediction (Guen and Thome, 2020) and time series prediction (Lim and Zohren, 2021), crack detection, segmentation and assessment could benefit greatly from data sets that record the crack propagation process, as they can support predictive maintenance measures.

6.2. Future research opportunities

DL-based algorithms are expected to replace traditional methods to achieve more accurate and robust CIS. Although they have achieved remarkable successes, DL-based crack detection methods still need to be improved, especially for pixel-level IS. Possible research opportunities to be considered in future work are as follows.

Data set: more publicly available standard data sets with accurate pixel-wise annotation are required as benchmark data sets for training, testing and evaluating crack detection and segmentation models. We suggest that the data sets used for crack segmentation can be improved in the following ways. (1) Purely enlarging the data set to improve performance is not a practical solution. However, models can achieve higher test scores on small training data sets if there is sufficient variance among the data samples used for model training. Complexity and variance in the data samples used to train the network have a greater impact on model performance than the size of the data set (Ali et al., 2021). (2) A new paradigm is needed to replace manual labeling of cracked regions, especially for accurate annotation of boundary regions and fine cracks. (3) In particular, crack data sets with temporal data showing crack extension need to be constructed. (4) In addition, GAN or meta-learning can be used to create synthetic data. These techniques can be used in the data pre-processing phase. (5) Valuable 3D CIS data sets, which are more challenging to create than their low-dimensional counterparts, are in high demand as 3D image segmentation gains popularity in the crack image analysis field.

Optimization of neural network: since the data sets used to train CIS networks are generally not very large, the crack segmentation models cannot be improved much by increasing their depth and width (Beyer et al., 2022). A more effective strategy would be to focus on the optimization of the neural network (Li et al., 2017; Smith and Topin, 2017). Although there are some existing techniques for neural network optimization, such as careful initialization, normalization methods and skip connections, it remains unclear, for both theoretical and empirical researchers, whether the optimization steps used in training neural networks have been carried out properly.

Integrated attention mechanism and ViT: new attention mechanisms combined with CNNs, such as DaViT (Ding et al., 2022), NeuFA (Li et al., 2022a), and FLANet (Song et al., 2022), also deserve attention as a way to improve the pixel-level prediction accuracy of CIS models.

The segmentation method based on two-stage: some current research shows that two-stage crack segmentation methods can generally achieve higher prediction accuracy than one-stage schemes, but their training efficiency is lower. In the future, researchers could try to improve the model training process and the crack segmentation efficiency of two-stage segmentation methods.

The method based on multi-modality fusion: this scheme can effectively improve the performance of CIS. Therefore, effective fusion models, such as the fusion of traditional detection methods and DL methods, can also be explored to overcome the shortcomings of DL-based CIS methods.

Semi, weakly and unsupervised learning: research on semi-/weakly-supervised learning and unsupervised learning could be a very valuable route to automatic CIS, given the difficulty and cost of obtaining large amounts of correctly pixel-wise labeled data.

Quantitative cracks: current research mostly focuses on the detection and segmentation of cracks. Based on the CIS results, further research can be extended to quantify the length, width, and shape of cracks, and then classify the severity of cracks according to the quantification results. Moreover, the measurement of crack depth can help determine the severity of cracks, so subsequent studies could focus on integrating DL and 3D technologies to detect and estimate 3D information of cracks.

Implementation: DL-based crack segmentation architectures are highly parameterized and require large memory and fast processing devices. More attention can be paid to the optimization and real-time detection of CIS models, so that the algorithms can be ported to mobile phones and industrial-grade chips.

7. Conclusions

As stated in the Introduction, our main purpose was to help researchers who want to implement crack detection through pixel-level CIS methods to identify directions for improvement in future work, so that DL methods can be better used to enhance the performance and efficiency of CIS. Our survey has several leading implications for research into DL-based CIS. First, this study has identified and collected more than 40 articles, mainly published in the last three years, related to DL-based pixel-level CIS and grouped them into 10 topics to present the state of development in the field. In addition, a comprehensive summary and comparison of the data sets, evaluation indicators, and loss functions used in the CIS field provides a reference for researchers to select appropriate components for their specific research task. We analyze and
compare the differences between segmentation and detection of cracks to help scholars choose between them based on the goal of their tasks. Finally, six common problems currently faced in the pixel-level segmentation of crack images are pointed out, and eight possible future research directions in this field are suggested.

This study is mainly aimed at the task of 2D pavement CIS. In our future research, we plan to expand the research scope to surface crack segmentation tasks of various infrastructures and to the quantification and measurement of cracks.

Conflict of interest

The authors do not have any conflict of interest with other entities or researchers.

Acknowledgments

This work was sponsored by the National Natural Science Foundation of China (No. 61971005), the Scientific Research Project of Department of Transport of Shaanxi Province in 2020 (No. 20-24K), the Key Project of Baoji University of Arts and Science (ZK2018013), Research Project of Department of Education of Zhejiang Province (Y202146796), and Major Scientific and Technological Innovation Project of Wenzhou City (ZG2021029).

References

Ali, L., Alnajjar, F., Jassmi, H.A., et al., 2021. Performance evaluation of deep CNN-based crack detection and localization techniques for concrete structures. Sensors 21 (5), 1688.
Amhaz, R., Chambon, S., Idier, J., et al., 2016. Automatic crack detection on two-dimensional pavement images: an algorithm based on minimal path selection. IEEE Transactions on Intelligent Transportation Systems 17, 2718–2729.
Azimi, M., Eslamlou, A.D., Pekcan, G., 2020. Data-driven structural health monitoring and damage detection through deep learning: state-of-the-art review. Sensors 20 (10), 2778.
Badrinarayanan, V., Kendall, A., Cipolla, R., 2017. SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (12), 2481–2495.
Beyer, L., Zhai, X., Kolesnikov, A., 2022. Better Plain ViT Baselines for ImageNet-1k. Cornell University, Ithaca.
Bochkovskiy, A., Wang, C., Liao, H., 2020. YOLOv4: Optimal Speed and Accuracy of Object Detection. Cornell University, Ithaca.
Cao, H., Wang, Y., Chen, J., et al., 2021. Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation. Cornell University, Ithaca.
Cao, M., Tran, Q., Nguyen, N., et al., 2020a. Survey on performance of deep learning models for detecting road damages using multiple dashcam image resources. Advanced Engineering Informatics 46, 101182.
Cao, W., Liu, Q., He, Z., 2020b. Review of pavement defect detection methods. IEEE Access 8, 14531–14544.
Chen, F., Jahanshahi, M.R., 2018. NB-CNN: deep learning-based crack detection using convolutional neural network and Naive Bayes data fusion. IEEE Transactions on Industrial Electronics 65 (5), 4392–4400.
Chen, F., Jahanshahi, M.R., 2020. ARF-Crack: rotation invariant deep fully convolutional network for pixel-level crack detection. Machine Vision and Applications 31, 47.
Chen, H., Lin, H., 2021. An effective hybrid atrous convolutional network for pixel-level crack detection. IEEE Transactions on Instrumentation and Measurement 70, 1–12.
Cheng, J., Xiong, W., Chen, W., et al., 2018. Pixel-level crack detection using U-Net. In: TENCON 2018-2018 IEEE Region 10 Conference, Pack Ville, 2018.
Choi, W., Cha, Y., 2020. SDDNet: real-time crack segmentation. IEEE Transactions on Industrial Electronics 67, 8016–8025.
Ding, M., Xiao, B., Codella, N., et al., 2022. DaViT: Dual Attention Vision Transformers. Cornell University, Ithaca.
Dorafshan, S., Thomas, R.J., Maguire, M., 2018. SDNET2018: an annotated image dataset for non-contact concrete crack detection using deep convolutional neural networks. Data in Brief 21, 1664–1668.
Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al., 2021. An image is worth 16×16 words: transformers for image recognition at scale. In: International Conference on Learning Representations 2021, Vienna, 2021.
Duan, L., Geng, H., Pang, J., et al., 2020. Unsupervised pixel-level crack detection based on generative adversarial network. In: Proceedings of the 2020 5th International Conference on Multimedia Systems and Signal Processing, Chengdu, 2020.
Eisenbach, M., Stricker, R., Seichter, D., et al., 2017. How to get pavement distress detection ready for deep learning? A systematic approach. In: 2017 International Joint Conference on Neural Networks (IJCNN), Alaska, 2017.
Elghaish, F., Matarneh, S.T., Talebi, S., et al., 2021. Deep learning for detecting distresses in buildings and pavements: a critical gap analysis. Construction Innovation 22 (2), 554–579.
Fan, Z., Li, C., Chen, Y., et al., 2020. Ensemble of deep convolutional neural networks for automatic pavement crack detection and measurement. Coatings 10 (2), 152.
Fan, Z., Lin, H., Li, C., et al., 2022. Use of parallel ResNet for high-performance pavement crack detection and measurement. Sustainability 14 (3), 1825.
Gopalakrishnan, K., 2018. Deep learning in data-driven pavement image analysis and automated distress detection: a review. Data 3 (3), 28.
Guan, J., Yang, X., Ding, L., et al., 2021. Automated pixel-level pavement distress detection based on stereo vision and deep learning. Automation in Construction 129, 103788.
Guen, V.L., Thome, N., 2020. Disentangling physical dynamics from unknown factors for unsupervised video prediction. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, 2020.
Guo, J., Markoni, H., Lee, J., 2021. BARNet: boundary aware refinement network for crack detection. IEEE Transactions on Intelligent Transportation Systems 23 (7), 7343–7358.
Han, C., Ma, T., Huyan, J., et al., 2022. CrackW-net: a novel pavement crack image segmentation convolutional neural network. IEEE Transactions on Intelligent Transportation Systems 23 (11), 22135–22144.
He, K., Sun, J., Tang, X., 2013. Guided image filtering. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (6), 1397–1409.
He, K., Zhang, X., Ren, S., et al., 2015. Deep residual learning for image recognition. In: 2016 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Alaska, 2016.
Hsieh, Y.-A., Tsai, Y.J., 2020. Machine learning for crack detection: review and model performance comparison. Journal of Computing in Civil Engineering 34 (5), 04020038.
Huyan, J., Li, W., Tighe, S., et al., 2020. CrackU-net: a novel deep convolutional neural network for pixelwise pavement crack detection. Struct Control Health Monit 27 (8), 00002551.
Jiang, W., Liu, M., Peng, Y., et al., 2021. HDCB-net: a neural network with the hybrid dilated convolution for pixel-level crack detection on concrete bridges. IEEE Transactions on Industrial Informatics 17, 5485–5494.
Kang, D., Benipal, S.S., Gopal, D.L., et al., 2020. Hybrid pixel-level concrete crack segmentation and quantification across complex backgrounds using deep learning. Automation in Construction 118, 103291.
Kang, D., Cha, Y., 2021. Efficient attention-based deep encoder and decoder for automatic crack segmentation. Structural Health Monitoring 21 (5), 2190–2205.
Khallaf, R., Naderpajouh, N., Hastak, M., 2018. A systematic approach to develop risk registry frameworks for complex projects. Built Environment Project and Asset Management 8 (4), 334–347.
Kitchenham, B., Brereton, O.P., Budgen, D., et al., 2009. Systematic literature reviews in software engineering – a systematic literature review. Information and Software Technology 51, 7–15.
König, J., Jenkins, M.D., Barrie, P., et al., 2019. A Convolutional Neural Network for Pavement Surface Crack Segmentation Using Residual Connections and Attention Gating. In: 2019 IEEE International Conference on Image Processing (ICIP), Taipei, 2019.
König, J., Jenkins, M.D., Mannion, M., et al., 2021. Optimized deep encoder-decoder methods for crack segmentation. Digital Signal Processing 108, 102907.
König, J., Jenkins, M.D., Mannion, M., et al., 2022. Weakly-supervised surface crack segmentation by generating pseudo-labels using localization with a classifier and thresholding. IEEE Transactions on Intelligent Transportation Systems 99, 1–12.
Kuhn, M., Johnson, K., 2013. Applied Predictive Modeling. Springer, New York.
Lau, S.L.H., Chong, E.K.P., Yang, X., et al., 2020. Automated pavement crack segmentation using U-Net-based convolutional neural network. IEEE Access 8, 114892–114899.
Lecun, Y., 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 (11), 47.
Li, E., Tang, H., 2021. A novel convolutional neural network for pavement crack segmentation. In: 20th IEEE International Conference on Machine Learning and Applications (ICMLA), Pasadena, 2021.
Li, G., Wan, J., He, S., et al., 2020. Semi-supervised semantic segmentation using adversarial learning for pavement crack detection. IEEE Access 8, 51446–51459.
Li, H., Xu, Z., Taylor, G., et al., 2017. Visualizing the Loss Landscape of Neural Nets. Cornell University, Ithaca.
Li, J., Meng, Y., Wu, Z., et al., 2022. NeuFA: neural network based end-to-end forced alignment with bidirectional attention mechanism. In: 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 2022.
Li, L., Zheng, S., Wang, C., et al., 2022b. Crack detection method of sleeper based on cascade convolutional neural network. Journal of Advanced Transportation 2022, 1–14.
Li, W., Huyan, J., Gao, R., et al., 2021. Unsupervised deep learning for road crack classification by fusing convolutional neural network and K-means clustering. Journal of Transportation Engineering, Part B: Pavements 147, 04021066.
Lim, B., Zohren, S., 2021. Time series forecasting with deep learning: a survey. Philosophical Transactions of the Royal Society A 379, 20200209.
Lin, T., Goyal, P., Girshick, R., et al., 2020. Focal loss for dense object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 42, 318–327.
Liu, H., Miao, X., Mertz, C., et al., 2021. CrackFormer: transformer network for fine-grained crack detection. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, 2021.
Liu, Y., Yao, J., Lu, X., et al., 2019. DeepCrack: a deep hierarchical feature learning architecture for crack segmentation. Neurocomputing 338, 139–153.
Long, J., Shelhamer, E., Darrell, T., 2015. Fully convolutional networks for semantic segmentation. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, 2015.
Maeda, H., Sekimoto, Y., Seto, T., et al., 2018. Road damage detection and classification using deep neural networks with smartphone images. Computer-Aided Civil and Infrastructure Engineering 33, 1127–1141.
Milletari, F., Navab, N., Ahmadi, S.-A., 2016. V-net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. Cornell University, Ithaca.
Mohan, A., Poobal, S., 2018. Crack detection using image processing: a critical review and analysis. Alexandria Engineering Journal 57, 787–798.
Mubashshira, S., Azam, M.M., Masudul Ahsan, S.M., 2020. An unsupervised approach for road surface crack detection. In: 2020 IEEE Region 10 Symposium (TENSYMP), Dhaka, 2020.
Munawar, H.S., Hammad, A.W., Haddad, A., et al., 2021. Image-based crack detection methods: a review. Infrastructures 6 (8), 115.
Nair, V., Hinton, G.E., 2010. Rectified linear units improve restricted Boltzmann machines. In: 27th International Conference on Machine Learning, Madison, 2010.
Oliveira, H., Correia, P.L., 2014. CrackIT: an image processing toolbox for crack detection and characterization. In: 2014 IEEE International Conference on Image Processing (ICIP), Paris, 2014.
Ong, J.C., Lau, S.L., Ismadi, M.-Z., et al., 2022. Feature pyramid network with self-guided attention refinement module for crack segmentation. Structural Health Monitoring, https://doi.org/10.1177/14759217221089571.
Özgenel, Ç.F., Sorguç, A.G., 2018. Performance comparison of pretrained convolutional neural networks on crack detection in buildings. In: Presented at the 34th International Symposium on Automation and Robotics in Construction, Taipei, 2018.
Palermo, F., Rincon-Ardila, L., Oh, C., et al., 2021. Multi-modal robotic visual-tactile localisation and detection of surface cracks. In: 2021 IEEE 17th International Conference on Automation Science and Engineering (CASE), Lyon, 2021.
Pang, J., Zhang, H., Zhao, H., et al., 2022. DcsNet: a real-time deep network for crack segmentation. Signal, Image and Video Processing 16 (4), 911–919.
Park, J., Chen, Y., Li, Y., et al., 2021. Crack detection and refinement via deep reinforcement learning. In: 2021 IEEE International Conference on Image Processing (ICIP), Anchorage, 2021.
Qiao, W., Liu, Q., Wu, X., et al., 2021. Automatic pixel-level pavement crack recognition using a deep feature aggregation segmentation network with a scSE attention mechanism module. Sensors 21 (9), 2902.
Qu, Z., Cao, C., Liu, L., et al., 2021a. A deeply supervised convolutional neural network for pavement crack detection with multiscale feature fusion. IEEE Transactions on Neural Networks and Learning Systems 33 (9), 4890–4899.
Qu, Z., Chen, W., Wang, S., et al., 2021b. A crack detection algorithm for concrete pavement based on attention mechanism and multi-features fusion. IEEE Transactions on Intelligent Transportation Systems 23 (8), 11710–11719.
Ren, S., He, K., Girshick, R., et al., 2016. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (6), 1137–1149.
Ronneberger, O., Fischer, P., Brox, T., 2015. U-net: Convolutional Networks for Biomedical Image Segmentation. Cornell University, Ithaca.
Salehi, S.S.M., Erdogmus, D., Gholipour, A., 2017. Tversky Loss Function for Image Segmentation Using 3D Fully Convolutional Deep Networks. Springer, Cham.
Shi, Y., Cui, L., Qi, Z., et al., 2016. Automatic road crack detection using random structured forests. IEEE Transactions on Intelligent Transportation Systems 17, 3434–3445.
Simonyan, K., Zisserman, A., 2015a. Very Deep Convolutional Networks for Large-Scale Image Recognition. Cornell University, Ithaca.
Simonyan, K., Zisserman, A., 2015b. Very Deep Convolutional Networks for Large-Scale Image Recognition. Cornell University, Ithaca.
Smith, L.N., Topin, N., 2017. Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates. Cornell University, Ithaca.
Song, Q., Li, J., Li, C., et al., 2022. Fully Attentional Network for Semantic Segmentation. Cornell University, Ithaca.
Stricker, R., Eisenbach, M., Sesselmann, M., et al., 2019. Improving visual road condition assessment by extensive experiments on the extended gaps dataset. In: 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, 2019.
Stricker, R., Aganian, D., Sesselmann, M., et al., 2021. Road surface segmentation - pixel-perfect distress and object detection for road assessment. In: 2021 IEEE 17th International Conference on Automation Science and Engineering (CASE), Lyon, 2021.
Sun, M., Guo, R., Zhu, J., et al., 2020. Roadway crack segmentation based on an encoder-decoder deep network with multi-scale convolutional blocks. In: 2020 10th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, 2020.
Szegedy, C., Liu, W., Jia, Y., et al., 2014. Going Deeper with Convolutions. Cornell University, Ithaca.
Szegedy, C., Vanhoucke, V., Ioffe, S., et al., 2016. Rethinking the inception architecture for computer vision. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 2016.
Tabernik, D., Sela, S., Skvarč, J., et al., 2020. Segmentation-based deep-learning approach for surface-defect detection. Journal of Intelligent Manufacturing 31, 759–776.
Tan, M., Pang, R., Le, Q.V., 2020. EfficientDet: scalable and efficient object detection. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, 2020.
Vaswani, A., Shazeer, N., Parmar, N., et al., 2017. Attention Is All You Need. Cornell University, Ithaca.
Wang, L., Ma, X., Ye, Y., 2020a. Computer vision-based road crack detection using an improved I-UNet convolutional networks. In: 2020 Chinese Control and Decision Conference (CCDC), Hefei, 2020.
Wang, M., Cheng, J., 2020. A unified convolutional neural network integrated with conditional random field for pipe defect segmentation. Computer-Aided Civil and Infrastructure Engineering 35 (2), 162–177.
Wang, S., Wu, X., Zhang, Y., et al., 2020b. A neural network ensemble method for effective crack segmentation using fully convolutional networks and multi-scale structured forests. Machine Vision and Applications 31 (7–8), 1–18.
Wang, W., Su, C., 2022. Automatic concrete crack segmentation model based on transformer. Automation in Construction 139, 104275.
Wang, W., Xie, E., Li, X., et al., 2021. Pyramid vision transformer: a versatile backbone for dense prediction without convolutions. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, 2021.
Wu, T., Zhang, H., Liu, J., et al., 2021a. Memory-augment convolutional Autoencoder for unsupervised pavement crack classification. In: 2021 China Automation Congress (CAC), Beijing, 2021.
Wu, X., Ma, J., Sun, Y., et al., 2021b. Multi-scale deep pixel distribution learning for concrete crack detection. In: 25th International Conference on Pattern Recognition (ICPR), Milan, 2020.
Xie, S., Tu, Z., 2015. Holistically-Nested Edge Detection. Cornell University, Ithaca.
Yang, F., Zhang, L., Yu, S., et al., 2020. Feature pyramid and hierarchical boosting network for pavement crack detection. IEEE Transactions on Intelligent Transportation Systems 21, 1525–1535.
Yang, Q., Ji, X., 2021. Automatic pixel-level crack detection for civil infrastructure using U-Net++ and deep transfer learning. IEEE Sensors Journal 21, 19165–19175.
Yang, X., Li, H., Yu, Y., et al., 2018. Automatic pixel-level crack detection and measurement using fully convolutional network: pixel-level crack detection and measurement using FCN. Computer-Aided Civil and Infrastructure Engineering 33, 1090–1109.
Zakeri, H., Nejad, F.M., Fahimifar, A., 2017. Image based techniques for crack detection, classification and quantification in asphalt pavement: a review. Archives of Computational Methods in Engineering 24, 935–977.
Zawad, M.R.S., Zawad, M.F.S., Rahman, M.A., et al., 2021. A comparative review of image processing based crack detection techniques on civil engineering structures. Journal of Soft Computing in Civil Engineering 5, 58–77.
Zhang, K., Zhang, Y., Cheng, H., 2021. CrackGAN: pavement crack detection using partially accurate ground truths based on generative adversarial learning. IEEE Transactions on Intelligent Transportation Systems 22 (2), 1306–1319.
Zhang, K., Zhang, Y., Cheng, H., 2020. Self-supervised structure learning for crack detection based on cycle-consistent generative adversarial networks. Journal of Computing in Civil Engineering 34 (3), 04020004.
Zheng, S., Jayasumana, S., Romera-Paredes, B., et al., 2015. Conditional random fields as recurrent neural networks. In: 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, 2015.
Zhou, Q., Qu, Z., Cao, C., 2021. Mixed pooling and richer attention feature fusion for crack detection. Pattern Recognition Letters 145, 96–102.
Zou, Q., Cao, Y., Li, Q., et al., 2012. CrackTree: automatic crack detection from pavement images. Pattern Recognition Letters 33, 227–238.
Zou, Q., Zhang, Z., Li, Q., et al., 2019a. DeepCrack: learning hierarchical convolutional features for crack detection. IEEE Transactions on Image Processing 28, 1498–1512.
Zou, Q., Zhang, Z., Li, Q., et al., 2019b. DeepCrack: learning hierarchical convolutional features for crack detection. IEEE Transactions on Image Processing 28, 1498–1512.
Hongxia Li is a PhD candidate at Chang'an University. She received her master's degree in computer engineering from Xi'an Technological University in 2014. Her current research interests are mainly in the areas of computer vision in artificial intelligence and digital image processing.

Weixing Wang received the PhD degree in computer vision from Royal Institute of Technology, Sweden, in 1997. He is currently a visiting professor in the School of Information Engineering, Chang'an University, Xi'an, China. His research interests include pattern recognition and intelligence, image and graph processing and analysis, and computer vision.

Limin Li received his BS degree in electronics science and technology at Zhejiang University in 2006 and PhD degree in communication and information systems at Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences in 2012. He is currently an associate professor in the College of Electrical and Electronic Engineering, Wenzhou University, China. His current research interests include intelligent information processing, signal detection and analysis.
