
1498 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 28, NO. 3, MARCH 2019

DeepCrack: Learning Hierarchical Convolutional Features for Crack Detection

Qin Zou, Member, IEEE, Zheng Zhang, Qingquan Li, Xianbiao Qi, Qian Wang, and Song Wang, Senior Member, IEEE

Abstract— Cracks are typical line structures that are of interest in many computer-vision applications. In practice, many cracks, e.g., pavement cracks, show poor continuity and low contrast, which brings great challenges to image-based crack detection using low-level features. In this paper, we propose DeepCrack, an end-to-end trainable deep convolutional neural network for automatic crack detection that learns high-level features for crack representation. In this method, multi-scale deep convolutional features learned at hierarchical convolutional stages are fused together to capture the line structures, with more detailed representations in larger-scale feature maps and more holistic representations in smaller-scale feature maps. We build the DeepCrack net on the encoder-decoder architecture of SegNet and pairwisely fuse the convolutional features generated in the encoder network and in the decoder network at the same scale. We train the DeepCrack net on one crack dataset and evaluate it on three others. The experimental results demonstrate that DeepCrack achieves an average F-measure over 0.87 on the three challenging datasets and outperforms the current state-of-the-art methods.

Index Terms— Line detection, edge detection, contour grouping, crack detection, convolutional neural network.

Manuscript received March 1, 2018; revised September 15, 2018; accepted October 25, 2018. Date of publication October 31, 2018; date of current version November 21, 2018. This work was supported in part by the National Natural Science Foundation of China under Grant 61872277, Grant 61301277, and Grant 91546106, in part by the National Key Research and Development Program of China under Grant 2016YFB0502203, and in part by the Hubei Provincial Natural Science Foundation under Grant 2018CFB482. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Yonggang Shi. (Corresponding author: Qingquan Li.)

Q. Zou, Z. Zhang, and Q. Wang are with the School of Computer Science, Wuhan University, Wuhan 430072, China (e-mail: qzou@whu.edu.cn; zhangzheng@whu.edu.cn; qianwang@whu.edu.cn).

Q. Li is with the Shenzhen Key Laboratory of Spatial Smart Sensing and Service, Shenzhen University, Shenzhen 518060, China (e-mail: liqq@szu.edu.cn).

X. Qi is with the Shenzhen Research Institute of Big Data, Shenzhen 518172, China (e-mail: qixianbiao@gmail.com).

S. Wang is with the Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29200 USA (e-mail: songwang@cec.sc.edu).

Digital Object Identifier 10.1109/TIP.2018.2878966

1057-7149 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Authorized licensed use limited to: Hebei University of Technology. Downloaded on November 21,2023 at 11:16:41 UTC from IEEE Xplore. Restrictions apply.

I. INTRODUCTION

CRACKS are common defects that can be found on the surfaces of various types of physical structures, e.g., road pavement [1], [2], the walls of nuclear power plants [3], the ceilings of tunnels [4], etc. Repairing cracks is an important task for preventing the expansion of damage and keeping engineering infrastructures safe. For example, a crack on a highway pavement can easily become a hole in just one rainy night, which will then be hazardous for high-speed vehicles. In a country like China or the US, there are over 100,000 km of highways to be inspected and maintained periodically, so automatic inspection methods are greatly desired to improve efficiency and reduce cost. The crack is one of the most common defects, and fixing a crack before it deteriorates can greatly reduce the cost of maintenance. To date, fully automatic crack detection against noisy backgrounds is still a challenge.

As a crack is visually a linear/curvilinear structure, crack detection can be formulated as line detection, which is a fundamental problem in computer vision [5]–[7]. In visual perception, a crack can be characterized from two perspectives. From a global perspective, it looks like a one-pixel-wide edge in the image, as it is thin and often shows a jump in intensity against the background. From a local perspective, it is a line object that has a certain width. Accordingly, crack detection methods can be roughly divided into two categories: edge-detection based ones and image-segmentation based ones. In the ideal case, if a crack has good continuity and high contrast, traditional edge detection and image segmentation methods could detect it with high accuracy.

However, in practice cracks may constantly suffer from noise in the background, leading to poor continuity and low contrast. For example, in the pavement image shown in Fig. 1(a), impulse noise brought by the grain-like pavement texture breaks the crack and undermines its continuity, while the shadow reduces the contrast between the crack and the background. In addition, the direction of exposure may also impact the imaging quality of the crack. These complications commonly lead to degraded performance of traditional low-level feature based crack detection methods.

Fig. 1. A real example of crack detection using DeepCrack. The bottom row shows the feature maps generated by convolutional feature fusion at different scales in the DeepCrack net (for the image patch denoted by the rectangle in the input image).

In recent years, deep convolutional neural networks (DCNNs) have demonstrated state-of-the-art, human-competitive, and sometimes better-than-human performance in solving many computer vision problems, e.g., image classification [8], object detection [9], image segmentation [10], [11], etc. For line detection, DCNN-based methods have also been proposed for tasks such as edge detection [12], [13], contour detection [14], [15], boundary segmentation [16], [17], and so on. These deep architectures build high-level features from low-level primitives by hierarchically convolving the sensory inputs.

In particular, when using deep learning for edge detection, it has been observed that the convolutional features become coarser and coarser along the convolving-pooling pipeline, and the detailed features in larger-scale layers and the abstracted features in smaller-scale layers can be fused together to improve the performance of edge detection [13], [18], [19]. When using deep learning for image segmentation, for example in SegNet [20], the convolutional features in the decoder network have been found to be useful for improving the performance of semantic image segmentation, and the indexing of pooling positions can further improve the accuracy of boundary localization.

Inspired by these observations, we propose to fuse the convolutional features in both the encoder and decoder networks, and construct a new DeepCrack network for crack detection. We build DeepCrack on the encoder-decoder architecture proposed in SegNet [20]. In SegNet, a convolution stage in the encoder network corresponds to a convolution stage in the decoder network at the same scale. In DeepCrack, we first pairwisely fuse the convolutional features of the encoder network and decoder network at each scale, which produces a single-scale fused feature map, and then combine the fused feature maps at all scales into a multi-scale fusion map for crack detection. An example is shown in Fig. 1; the bottom row shows the fused feature maps at different scales. The sparse features at smaller scales and the continuous features at larger scales are fused to achieve better crack-detection performance.

The contributions of this work are three-fold:
• Our main contribution is the design of a new neural network architecture for crack detection. This new network takes full advantage of the information in the encoder and decoder networks, and builds a trainable end-to-end network for crack detection.
• In the proposed network, a convolutional layer of the encoder network and a convolutional layer of the decoder network at the same scale are fused to compute the training loss at the corresponding scale. The fusion of hierarchical convolutional features is found to be very effective for inferring cracks from the image background.
• Four datasets are constructed for performance evaluation, where one dataset containing 260 pavement images is used for training the network, and three others are used for testing. Of the three test datasets, two are pavement image datasets and one is a stone surface image dataset. The ground-truth cracks are manually labeled by human experts, and the datasets are shared with the community to promote research on crack detection. Extensive experiments are conducted and the results demonstrate the effectiveness of the proposed method.

The rest of this paper is organized as follows. Section II briefly reviews the related work. Section III describes the deep neural network architecture for crack detection. Section IV demonstrates the effectiveness of the proposed method by experiments. Finally, Section V concludes the paper.

II. RELATED WORK

A. Line Detection

Line detection is a fundamental problem in computer vision. In a broad sense, line detection includes edge/contour detection and line object detection. When edges and contours can be built and perceived on the gradient, their detection can be treated as line object detection or line grouping in the gradient map. Over the past several decades, the research in edge and contour detection has experienced three main stages.

The first stage is featured by computing first-order or second-order gradients of the pixel intensity, where a representative of this stage is the Canny edge detector [21].
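The first-stage, gradient-based detectors mentioned above can be illustrated with a minimal sketch (plain Python, not code from the paper): compute a first-order gradient magnitude by central finite differences and binarize it with a threshold. A full Canny detector additionally applies Gaussian smoothing, non-maximum suppression, and hysteresis thresholding; the threshold value and the toy image below are arbitrary assumptions for illustration.

```python
# Illustrative sketch only (not code from the paper): a first-stage,
# gradient-based edge detector. The gradient magnitude is computed with
# central finite differences and binarized with a fixed threshold.
def gradient_edges(image, thresh):
    """image: 2-D list of intensities; returns a binary edge map."""
    h, w = len(image), len(image[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (image[y][x + 1] - image[y][x - 1]) / 2.0  # d/dx
            gy = (image[y + 1][x] - image[y - 1][x]) / 2.0  # d/dy
            mag = (gx * gx + gy * gy) ** 0.5
            edges[y][x] = 1 if mag >= thresh else 0
    return edges

# Toy input (values are arbitrary): a dark vertical "crack" of intensity
# 40 running through a bright background of intensity 200.
img = [[200, 200, 40, 200, 200] for _ in range(5)]
edge_map = gradient_edges(img, thresh=30)
```

Note how such a detector responds on both sides of the dark line rather than on the line itself, which echoes the paper's distinction between the global (edge-like) and local (line-object) views of a crack.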


In the second stage, edge detection and contour grouping are featured by energy minimization methods and middle-level feature learning algorithms. The global Pb [22] is a representative learning method for edge detection, and the sketch token [23] and structured edge detector [24] promoted the learning ability to a peak in this stage. For contour detection, the ratio contour [25], level set [26], [27] and untangling cycles [28] are among the representatives, which model the line clutters with a graph and minimize an energy function to infer the contour. In the third stage, the detection of edges and contours is featured by deep learning, e.g., the deep learning methods for edge detection [12], [13], [29], contour detection [14], [15], and boundary segmentation [11], [16], [17]. In [29], line sections are predicted from image patches under a deep learning framework, and a multi-scale version was constructed for edge detection. In [30], DCNN feature abstraction and neighbor search are combined to handle edge detection and line object extraction. In [12], edges are detected by a deep convolutional network in an end-to-end manner, and the convolutional features in multiple convolutional stages are found to be useful for improving the edge detection results. Similarly, in [13], richer convolutional features generated by a fully convolutional network are fused to further improve the performance.

A number of line object detection methods have also been developed for different application purposes. In [31], a path-voting based method was proposed for wire-line detection from vessel X-ray images. The minimal paths were calculated on image patches and were aggregated to construct a line probability map. In [32], road network extraction from satellite images was studied by regression learning and optimization. In [33], an edge detector was built on CNNs and used to provide information for semantic image segmentation. Convolutional features at different scales were also investigated for some other applications, e.g., video segmentation [19] and symmetry detection [34].

B. Crack Detection

Under normal illuminance, a crack is generally darker than the background. Therefore, image thresholding is a straightforward way for crack detection. For example, in [35], the threshold value was figured out by examining the difference between the cracks and their neighboring non-crack pixels. In [36], the threshold value was calculated in a heuristic way. However, pavement shadows and uneven illumination would undermine the robustness of the thresholding-based methods. As a crack is thin and displays as an edge, many methods stemming from edge detection and wavelet transformation have been developed for crack detection [37]–[40]. However, the edge information can easily be tangled by heavy noise.

As a branch of energy minimization methods, minimal path searching has also been studied for crack detection. In [35] and [41], seed-growing methods built on minimal path searching were proposed for pavement crack detection. In [42], minimal path searching was performed in a path-voting way. In [43], minimal path searching was used to track cracks in complex backgrounds. In these minimal-path-based methods, the main limitation is that the seed points for path tracking should be set in advance.

Machine learning based methods have also been investigated for crack detection. In [2], a deep convolutional neural network was used to classify image patches into crack blocks and non-crack ones. In [4], the detection of bridge cracks was studied by using a modified active contour model and a greedy search-based support vector machine. In [3], fully convolutional neural networks were studied to infer cracks in nuclear power plants using multi-view images. Many other methods were also proposed for crack detection, e.g., the saliency detection method [44], and the structure analysis methods using the minimal spanning tree [45] and the random structured forest [46]. Generally, deep learning based methods produce better results than traditional methods. However, there is still a lack of investigation on end-to-end trainable CNN models for robust crack detection.

III. DEEPCRACK NETWORK

In this section, we first introduce the architecture of DeepCrack, then the design of the loss function, and finally the differences between DeepCrack and other deep convolutional networks.

A. Network Architecture

The DeepCrack network is built on the SegNet network [20]. SegNet is a deep convolutional encoder-decoder architecture designed for pixel-wise semantic segmentation, which contains an encoder network and a corresponding decoder network. The encoder network is inspired by the convolutional layers in the VGG16 network [47], and consists of 13 convolutional layers and 5 down-sampling pooling layers. The decoder network also has 13 convolutional layers, and each decoder layer has a corresponding layer in the encoder network. Thus, the encoder network is almost symmetric to the decoder network; the only difference is that the first encoder layer, i.e., the first convolution operation, produces a multi-channel feature map, while the corresponding last decoder layer, i.e., the last convolution operation, produces a c-channel feature map, with c the number of classes in the image segmentation task.

After each convolution operation, a batch-normalization step is applied to the feature maps. The max-pooling operation with a stride larger than 1 reduces the scale of the feature maps while not causing translation variance over small spatial shifts, but the sub-sampling causes a loss of spatial resolution, which may bias the boundaries. To avoid the absence of detail representation, max-pooling indices are used to capture and record the boundary information in the encoder feature maps when sub-sampling is performed. Then, in the decoder network, the corresponding decoder layer uses the max-pooling indices to perform non-linear up-sampling. This up-sampling step produces sparse feature maps. However, compared with continuous and dense feature maps, the sparse feature maps provide more precise localization of region boundaries.

Fig. 2. An illustration of the DeepCrack network. The feature maps of the encoder network and decoder network are pairwisely connected and fused at each convolution stage, which produces fused maps of different scales. At each scale, the pixel-wise prediction loss is calculated by a skip-layer fusion procedure, independently. Meanwhile, the fused maps at all scales are concatenated and fused to produce a multi-scale fusion map, which is the output of the DeepCrack network. This output is a crack probability map for crack detection.

Meanwhile, due to the hierarchical nature of learning in deep convolutional neural networks, multi-scale convolutional features can be learnt in the form of increasingly larger receptive fields in the down-sampled layers. The fusion of multi-scale convolutional features has been proved to be useful for improving the performance of line detectors [13], [16], [18]. In this work, we consider the scale changes caused by both the pooling operation and the upsampling operation, and build DeepCrack on SegNet's encoder-decoder architecture. In SegNet, there exist five different scales, which correspond to the 5 down-sampling pooling layers. In order to utilize both sparse and continuous feature maps at each scale, DeepCrack conducts a skip-layer fusion to connect the encoder network and decoder network. As illustrated in Fig. 2, the convolutional layer before the pooling layer at each scale in the encoder network is concatenated to the last convolutional layer at the corresponding scale in the decoder network. The skip-layer fusion handles the concatenated convolutional features with a sequence of operations.

Fig. 3. An illustration of skip-layer fusion at scale K. At each scale, the last conv layer in the encoder network and the last conv layer in the decoder network are concatenated, followed by a 1×1 conv layer with 1-channel output. Then, a deconv layer is used to up-sample the feature map. After being cropped to the size of the label map, the output is passed to a sigmoid cross-entropy layer to calculate the loss.

Figure 3 illustrates the skip-layer fusion in detail. First, the feature maps from the encoder network and decoder network are concatenated, followed by a 1×1 conv layer which reduces the multi-channel feature maps to 1 channel. Then, in order to calculate the pixel-wise prediction loss at each scale, a deconv layer is added to up-sample the feature map and a crop layer is used to crop the up-sampled result to the size of the input image. After these operations, we can get prediction maps at each scale with the same size as the ground-truth crack maps. The prediction maps generated at the five different scales are further concatenated, and a 1×1 conv layer is added to fuse the outputs at all scales. At last, we obtain the prediction maps at each skip-layer fusion and the overall fused layer in the end.

B. Loss Function

Given a training data set containing $N$ images as $S = \{(X_n, Y_n), n = 1, \dots, N\}$, where $X_n = \{x_i^{(n)}, i = 1, \dots, I\}$ denotes the raw input image and $Y_n = \{y_i^{(n)}, i = 1, \dots, I,\ y_i^{(n)} \in \{0, 1\}\}$ denotes the ground-truth crack label map corresponding to $X_n$, with $I$ the number of pixels in each image, our goal is to train the network to produce prediction maps approaching the ground truth. In the encoder-decoder architecture, let $K$ be the number of convolution stages; then at stage $k$, the feature map generated by the skip-layer fusion can be formulated as $F^{(k)} = \{f_i^{(k)}, i = 1, \dots, I\}$, where $k = 1, \dots, K$. Further, the multi-scale fusion map can be defined as $F^{fuse} = \{f_i^{fuse}, i = 1, \dots, I\}$.

Different from semantic segmentation on Pascal VOC, there are only two classes in crack detection, so it can be seen as a binary classification problem. We adopt a cross-entropy loss to measure the prediction error. Generally, the ground-truth crack pixels form a minority class in the crack image, which makes it an imbalanced classification or segmentation problem. Some works [12], [13] deal with this problem by adding larger weights to the minority class. However, in crack detection, we find that adding larger weights to the cracks results in more false positives. Thus, we define the pixel-wise prediction loss as

$$ l(F_i; W) = \begin{cases} \log(1 - P(F_i; W)), & \text{if } y_i = 0, \\ \log(P(F_i; W)), & \text{otherwise,} \end{cases} \quad (1) $$

where $F_i$ is the output feature map of the network at pixel $i$, $W$ is the set of standard parameters in the network layers, and $P(F)$ is the standard sigmoid function, which transforms the feature map into a crack probability map. Then, the total loss can be formulated as

$$ L(W) = \sum_{i=1}^{I} \Big( \sum_{k=1}^{K} l\big(F_i^{(k)}; W\big) + l\big(F_i^{fuse}; W\big) \Big). \quad (2) $$

C. Comparison With Other Architectures

The proposed DeepCrack has two main differences from the original SegNet [20]. First, the original SegNet has no connection between the convolutional features in the encoder network and decoder network, which would cause sparse outputs. In DeepCrack, skip-layer fusion is applied to connect the encoder network and decoder network. Second, the original SegNet is designed for semantic segmentation, and sets up a softmax loss layer to measure the prediction error in each object channel, while the output of the DeepCrack network is a 1-channel prediction map that indicates the probability of each pixel belonging to a crack, using a cross-entropy loss. DeepCrack is also quite different from U-Net [11]. U-Net performs skip-layer fusion by copying convolution layers of an early stage as part of a corresponding later stage in the main network, which results in a single loss. DeepCrack performs skip-layer fusion at each stage independently and assigns each a loss, which leads to multiple losses and to effectively capturing information about thin objects at each scale. Compared with DeepEdge [29], DeepContour [14] and N4-Fields [30], which perform convolution on image patches, DeepCrack performs convolution on the whole image and generates results in an end-to-end manner.

We also compare the DeepCrack network with two end-to-end deep edge detection architectures, i.e., HED [12] and RCF [13]. Both HED and RCF have their main architectures built on VGG16, which is similar to the encoder network in DeepCrack. Besides lacking the pool5 layer, RCF changes the stride of the pool4 layer to 1 and uses the atrous algorithm to fill the holes. In the five convolution stages, HED connects the last convolution layers at each scale to produce the fused


prediction map, while RCF connects all convolution layers in illumination. The line-array camera captures the pave-
each scale at first, and then fuses multi-scale feature maps. Dif- ment at a ground sampling distance of 1 millimeter.
ferent from HED and RCF which do not have a corresponding • CrackLS315 It contains 315 road pavement images
decoder network, the proposed DeepCrack pairwisely fuses captured under laser illumination. These images are also
convolutional features in the encoder network and decoder captured by a line-array camera, at the same ground
network at the same scale. Due to the absence of sparse sampling distance.
and non-linear up-sampling features in the decoder network, • Stone331 It contains 331 images of stone surface. When
feature maps generated by HED and RCF are often continuous cutting the stone, cracks may occur on the cutting surface.
and dense, which would lead to inaccurate localization and These images are captured by an area-array camera under
error prediction. We will illustrate this point in the experiment. visible-light illumination. We produce a mask for the area
of each stone surface in the image. Then the performance
IV. E XPERIMENTS AND R ESULTS evaluation can be constrained in the stone surface.
In this section, we first introduce the experimental settings, 3) Evaluation Metrics: For each image, Pr eci si on and
and then report crack detection results obtained by DeepCrack Recall can be calculated by comparing the detected cracks
and the comparison methods. At last, we investigate the against the human annotated ground truth. Then, the
Precision·Recall
performance of DeepCrack at different settings. F-measure ( 2· Precision+Recall ) can be computed as an overall
metric for performance evaluation. Specifically, three different
A. Experimental Settings F-measure-based metrics are employed in the evaluation: the
best F-measure on the data set for a fixed threshold (ODS),
1) Implementations Details: We implement our network
the aggregate F-measure on the data set for the best threshold
using the publicly available Caffe [48] which is well-known
on each image (OIS), and the average precision (AP), which
in this community. In our network, batch normalization is
is equivalent to the area under the precision-recall curve [24].
used after each convolutional layer in both the encoder and
Considering that cracks have a certain width, a detected crack
decoder network, which is convinced to speed the convergence
pixel is still taken as a true positive if it is no more than
in training process. The weights of conv layer in the entire
2 pixels away from human annotated crack curves.
network are initialized by the ‘msra’ method and the biases are
4) Comparison Methods: We compare the performance of
initialized to 0. The up-sampling operation in decoder network
DeepCrack with current state-of-the-art methods. In these
is achieved by using the pooling indices stored in max-pooling
methods, the CrackTree is a traditional low-level feature based
layer, and in the skip-layer fusion is conducted by bi-linear
method, and the other ones are deep learning based methods.
interpolation. In training, the initial global learning rate is set
• HED [12]. It fuses multi-scale convolutional features by
to 1e-5 and will be divided by 10 after every 10k iterations.
using the last convolutional feature map at each stage in
The momentum and weight decay are set to 0.9 and 0.0005,
VGG16. We train HED on CrackTree260.
respectively. The stochastic gradient descent method (SGD) is
• RCF [13]. It fuses multi-scale convolutional features
employed to update the network parameters with mini-batch
by using all convolutional feature maps at each stage in
size of 2 in each iteration. We train the network with 100k
VGG16. We train RCF on CrackTree260.
iterations in total. All experiments in this paper are performed
• SegNet [20]. It achieves an end-to-end learning and seg-
by using a single GeForce GTX TITAN-X GPU.
mentation by sequentially using an encoder network and
2) Datasets1 : Four crack datasets are used in this study,
a decoder network. We train SegNet on CrackTree260.
in which the pavement crack dataset CrackTree260 is used for
• SRN [34]. It is originally designed for end-to-end object
training the deep networks, and the other three ones are used
symmetry detection, which uses the similar feature fusion
for test. The images in the test datasets share the same size
strategy in HED. We train SRN on CrackTree260.
of 512×512. The ground-truth cracks are annotated by four
• U-Net [11]. It performs skip-layer fusion for end-to-end
persons using a specialized labeling tool.
• CrackTree260 It contains 260 road pavement images -
boundary segmentation and formulates the training target
an expansion of the dataset used in [45]. These pave- with one single loss. We train U-Net on CrackTree260.
• SE [24]. It learns edges and line structures using the
ment images are captured by an area-array camera under
visible-light illumination. We use all 260 images for train- random decision forests. We train SE on CrackTree260 by
ing. Data augmentation has been performed to enlarge using a number of 8 decision trees and default parameters
the size of the training set. We rotate the images with released by [24].
• CrackTree [45]. It is a method specifically designed for
9 different angles (from 0-90 degrees at an interval of 10),
flip the image in the vertical and horizontal direction at pavement crack detection. The edge-length threshold for
each angle, and crop 5 subimages (with 4 at the corners graph construction is 10, and the tree-pruning threshold
and 1 in the center) on each flipped image with a size is 50, for all test images.
• CrackForest [46]. It uses SE architecture to generate the
of 512×512. After augmentation, we get a training set
of 35,100 images in total. crack map, and post-processes the crack map to obtain
• CRKWH100 It contains 100 road pavement images the final crack.
• DeepCrack. DeepCrack is trained on CrackTree260.
captured by a line-array camera under visible-light
Note that, the results generated by RCF, HED, SRN and SE
1 https://sites.google.com/site/qinzoucn are thick crack maps, as shown in Fig. 4, which require to be

Authorized licensed use limited to: Hebei University of Technology. Downloaded on November 21,2023 at 11:16:41 UTC from IEEE Xplore. Restrictions apply.
1504 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 28, NO. 3, MARCH 2019

Fig. 4. Crack maps produced by different methods. Note that, DeepCrack, SegNet [20] and U-Net [11] produce thin crack maps, and RCF [13], SRN [34],
HED [12] and SE [24] produce thick crack maps.

Fig. 5. Precision-Recall curves on the three test datasets. (a) CRKWH100. (b) CrackLS315. (c) Stone331.

post-processed. As in these methods, we employ the standard non-maximum suppression (NMS) [24] to thin the soft crack maps, and take the post-processed results in the comparison. The results generated by DeepCrack are already thin crack maps and can be evaluated directly.

B. Overall Performance

Figure 5 shows the precision-recall curves of the nine methods on the three test datasets, six of which are deep-learning-based. A small rectangle is plotted at the position corresponding to the best F-measure of each curve. As the CrackTree and CrackForest methods produce hard crack curves, they are marked as points (denoted by triangles) on the chart using their average precision and recall values.
1) CRKWH100: It can be seen from Fig. 5(a) that DeepCrack holds the curve closest to the upper-right corner of the chart, and achieves the best precision and recall values, as denoted by the best F-measure rectangle. The performances of RCF, SRN and HED are very close. SegNet shows the lowest performance among these deep learning methods, which indicates that the combination of convolutional features at different scales is an effective way to improve crack-detection performance. Note that the deep-learning-based methods achieve a significant performance boost over the low-level-feature-based methods - CrackTree, CrackForest and SE.
Table I shows the quantitative results of the compared methods. The best result is achieved by DeepCrack, with an ODS F-measure of 0.9095. Compared with RCF, SRN, HED and U-Net, this is a performance improvement on ODS of 4.74%, 4.93%, 6.92% and 6.36%, respectively. Although the performance of SegNet is relatively lower than that of the other deep learning methods, it still achieves an ODS of 0.8184.
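The ODS (optimal dataset scale) F-measure used in these comparisons can be sketched as below. The per-threshold, per-image counts `tp`, `fp`, `fn` are assumed to come from a separate pixel-matching step between the thinned detections and the ground truth, which is not shown here; all names are illustrative:

```python
import numpy as np

def ods_f_measure(tp, fp, fn):
    """ODS F-measure: pick the single threshold that maximizes F
    over the whole dataset.

    tp, fp, fn: arrays of shape (num_thresholds, num_images) holding
    per-threshold, per-image true-positive, false-positive and
    false-negative pixel counts (hypothetical inputs).
    """
    # Aggregate counts over all images at each threshold.
    TP, FP, FN = tp.sum(axis=1), fp.sum(axis=1), fn.sum(axis=1)
    precision = TP / np.maximum(TP + FP, 1e-12)
    recall = TP / np.maximum(TP + FN, 1e-12)
    f = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    best = int(np.argmax(f))
    return f[best], precision[best], recall[best]
```

The OIS (optimal image scale) variant would instead pick the best threshold per image before averaging; only the aggregation order changes.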


TABLE I
QUANTITATIVE EVALUATION OF DIFFERENT METHODS ON THE THREE TEST DATASETS

Among the methods, none of the three low-level-feature-based methods - CrackTree, CrackForest and SE - achieves an ODS value over 0.7: SE obtains 0.6888, and CrackTree obtains the lowest ODS value, 0.6269.
2) CrackLS315: Images in this dataset are captured under laser illumination, which makes them more different from the training images than those in CRKWH100. The precision-recall curves are shown in Fig. 5(b). DeepCrack achieves the best performance on CrackLS315. HED, SRN, RCF and SegNet all show commendable results, with RCF performing better than HED, SRN and SegNet. It can be observed from Table I that the ODS of DeepCrack reaches 0.8449, outperforming all compared methods. RCF holds an ODS value of 0.7878, which ranks second. The ODS values of HED, SRN, SegNet and U-Net are 8.16%, 9.00%, 8.39% and 17.31% lower than that of DeepCrack, respectively. Compared with CrackTree, DeepCrack obtains an improvement of 20.20% in terms of ODS. The performance of SE suffers a surprising decline on this dataset, holding an ODS value of only 0.4586.
3) Stone331: It can be seen from Fig. 5(c) that DeepCrack outperforms the other comparison methods, with an ODS value of 0.8559. Surprisingly, the second rank is achieved by SegNet with an ODS value of 0.7938, a small improvement of 0.52% over RCF. The other deep learning methods, such as HED, SRN, and U-Net, obtain better performance than the traditional low-level-feature-based methods.
We also make visual comparisons on the results. In Fig. 6, crack-detection results on six typical input images are given for the proposed method and the comparison methods. In the first two columns, the input images, selected from CRKWH100, contain shadows and obvious noise; DeepCrack can still generate crack maps very close to the ground truth. In the middle two columns, two images are selected from CrackLS315: one contains tiny cracks, and the other contains cracks embedded in the road lane, which can hardly be observed without careful inspection. It can be seen that all these methods can detect the tiny cracks. However, except for DeepCrack, the methods produce many false detections. For the stone surface images in the last two columns, DeepCrack obtains crack-detection results close to the ground truth, while the comparison methods suffer from many false positives.
Among the deep models, RCF, HED and SRN produce thick crack maps, while DeepCrack, SegNet and U-Net produce thin crack maps, as illustrated by Fig. 4. One possible reason is that the backbone networks of DeepCrack, SegNet and U-Net contain an almost symmetrical decoder network corresponding to the encoder network. The decoder network explicitly up-samples the feature maps stage by stage, resulting in an output with the same size as the input, which can help recover thin crack structures, as imposed by the ground truth. The backbone networks of RCF, HED and SRN do not contain a decoder network, and are thus less capable of producing thin cracks.
In Fig. 6, DeepCrack, SegNet and U-Net are observed to suppress more background artifacts than HED, RCF and SRN, which indicates that the decoder network can also improve the precision of crack prediction. As DeepCrack fuses low-level and high-level features in the convolution stages of different scales, it can further improve the precision of crack extraction and the robustness of background-artifact suppression. U-Net also adopts skip-layers, but it applies one sole loss on the final prediction, which makes it hard to converge and prone to producing incomplete predictions.
In summary, better results can be achieved by DeepCrack, which fuses the multi-scale convolutional features in both the encoder and decoder networks. More results of the proposed method are shown in Fig. 11.

C. Constructing DeepCrack With Different Scales

The experimental results above demonstrate that DeepCrack has notable advantages over the compared methods in crack detection. In this part, we study the effect of fusing the multi-scale convolutional features. Specifically, we want to know how important each scale is in the multi-scale fusion architecture. Each time, we remove one scale connection of the five scales, and re-train the modified model with the same parameter setting. We repeat this modification five times and obtain five 'incomplete' multi-scale DeepCrack models. Finally, we test these models on the above three test datasets.
It can be seen from Fig. 7 that removing the skip-layer connection at any scale results in decreased performance. It indicates that each scale contributes to the final results. Meanwhile, the connection at scale one is observed to make a significant contribution to the final result. This is because scale one has the same resolution as the input image and holds most of the crack details.
For further exploration, we set different weights to different scales to test how this influences the performance of DeepCrack.
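The skip-layer fusion probed by these ablations can be sketched as follows. This is a simplified stand-in, not the paper's implementation: nearest-neighbor upsampling replaces the decoder's learned, index-guided upsampling, and an unweighted average replaces the learned fusion convolution; all function names are illustrative.

```python
import numpy as np

def upsample_nearest(x, factor):
    # Nearest-neighbor upsampling of a 2D map (stand-in for the
    # decoder's learned, index-guided upsampling).
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

def fuse_scales(side_outputs, keep=(0, 1, 2, 3, 4)):
    # side_outputs: list of 2D score maps; scale k has 1/2**k the
    # input resolution. 'keep' lists the skip connections retained;
    # dropping one index mimics the 'incomplete' ablation models.
    h, w = side_outputs[0].shape
    maps = [upsample_nearest(side_outputs[k], 2 ** k)[:h, :w]
            for k in keep]
    fused = np.mean(maps, axis=0)        # simplified fusion (average)
    return 1.0 / (1.0 + np.exp(-fused))  # sigmoid -> probability map
```

Re-running the evaluation with one index removed from `keep` corresponds to one of the five modified models tested above.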


Fig. 6. Comparison of results obtained by different methods on six sample images (from left to right) selected from CRKWH100, CrackLS315 and Stone331, respectively (with two images from each dataset). Note that the results of HED, SRN, RCF and SE have been post-processed by NMS. The ground-truth cracks are highlighted in blue.


Fig. 7. Comparison of DeepCrack with its modified versions by removing the information from a convolution scale.

TABLE II
PERFORMANCE OF DEEPCRACK BY SETTING DIFFERENT WEIGHTS TO THE LOSS AT DIFFERENT SCALES

For a clear presentation, we rewrite the loss function Eq. (2) as:

L(W) = \sum_{i=1}^{I} \Big( \sum_{k=1}^{K} \alpha^{(k)} \cdot l(F_i^{(k)}; W) + l(F_i^{fuse}; W) \Big),   (3)

where \alpha^{(k)} denotes the weight placed on scale k (1 ≤ k ≤ 5). While it is very difficult to find an optimal parameter setting, we choose several representative parameter settings to explore the influence of different weights on the scales. The parameter settings of \alpha^{(k)} and the corresponding results are listed in Table II. Four equal-ratio series are used, with ratios of 1/3, 1/2, 1, and 2, respectively. In the first two cases, larger weights are set to smaller scales, while in the last case, larger weights are set to larger scales. In Table II we can see that the ratio of 1/3 produces lower results than the ratio of 1/2, and the ratio of 1/2 produces lower results than the ratio of 1. This indicates that the information fused at larger scales does make some contribution to the final results. In the opposite direction, a ratio of 2 brings no performance improvement, and on the contrary leads to slightly lower performance than the standard version, i.e., a ratio of 1. It indicates that scale one holds a dominant influence on predicting the cracks, and setting larger weights on the other scales does not improve the performance.

D. Training DeepCrack With and Without Pre-Trained Model

In this part, we study with experiments whether it is better to train DeepCrack from a pre-trained model or from scratch. As a matter of fact, great differences exist between the crack images and natural images, especially the natural images of ImageNet and PASCAL. The natural images are often colorful and contain visually recognizable objects, while the crack images are always grayscale and often contain heavy impulse noise. We compare the results of DeepCrack trained from scratch and fine-tuned from a SegNet model pre-trained on PASCAL VOC2012. The results are plotted in Fig. 8. They show that the model trained from scratch obtains better performance than the one fine-tuned from the pre-trained model, on all three test datasets. This may be because the pre-trained model fits natural-image segmentation well and is very difficult to fine-tune for crack detection.

Fig. 8. DeepCrack results with and without pre-trained model. Note that '(ft)' denotes the fine-tuned version.
scratch. As a matter of fact, great difference exists between impossible or very difficult to be fine-tuned for crack detection.

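A numerical sketch of the scale-weighted loss in Eq. (3) for a single image, under two simplifying assumptions: the side outputs are already upsampled to ground-truth resolution, and the per-map loss l(·) is mean binary cross-entropy. Function names are illustrative:

```python
import numpy as np

def pixel_loss(pred, gt):
    # Per-map loss l(F; W): mean binary cross-entropy between a
    # predicted probability map and the 0/1 ground truth.
    eps = 1e-12
    return -np.mean(gt * np.log(pred + eps)
                    + (1 - gt) * np.log(1 - pred + eps))

def multiscale_loss(fused, sides, gt, alphas):
    # Eq. (3) for one image: alpha^(k)-weighted sum of the K
    # side-output losses plus the loss on the fused map.
    loss = sum(a * pixel_loss(s, gt) for a, s in zip(alphas, sides))
    return loss + pixel_loss(fused, gt)
```

With `alphas = [1] * 5` this reduces to the unweighted sum; a geometric series such as `[1, 1/2, 1/4, 1/8, 1/16]` would correspond to the ratio-1/2 setting in Table II (the exact normalization used in the experiments is an assumption here).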

TABLE III
QUANTITATIVE EVALUATION OF DEEPCRACK BY USING DIFFERENT SETTINGS

Fig. 9. Different loss weights on the crack and the background.

Fig. 10. Detection of bright cracks using DeepCrack.

E. Influence of Incorrect Labels and Upsampling Strategies

We also conduct several experiments to explore the sensitivity of DeepCrack to noisy ground-truth crack labeling. First, for each image, we randomly remove 20% of the ground-truth crack pixels, or add 20% noisy ground-truth crack pixels, and name the retrained models 'DeepCrack-reduce (20%)' and 'DeepCrack-noise (20%)', respectively. Second, for each image, we randomly shift the crack labels left, right, up and down, by 4 and 6 pixels, and name the retrained models 'DeepCrack-bias (4 pixels)' and 'DeepCrack-bias (6 pixels)', respectively. The test results are presented in Table III.

From Table III we can see that, on CRKWH100 and CrackLS315, reducing 20% of the ground-truth pixels or adding 20% noisy crack labels has very little influence on DeepCrack's performance. On Stone331, adding noisy crack labels has little effect, but reducing ground-truth crack labels leads to a declined performance. The results show that DeepCrack is generally not sensitive to noisy crack labels, and is less sensitive on CRKWH100 and CrackLS315 than on Stone331. The reason may be that DeepCrack is trained on pavement images, and is therefore more robust in handling pavement images than stone images.

From Table III we can also see that shifting the crack labels by 4 pixels leads to largely decreased performance on all three datasets, and shifting by 6 pixels leads to even worse results. It indicates that the proposed method is sensitive to spatial bias in the ground truth.

To explore the influence of the max-pooling indices used in the upsampling operation, we replace the max-pooling indices with bilinear interpolation for upsampling the feature maps. We retrain the model and predict cracks on the three test datasets. As shown in Table III, upsampling with bilinear interpolation results in a declined performance as compared to the case with max-pooling indices (in the last row of Table III). It indicates that the max-pooling indices are helpful to locate the crack pixels in the upsampling procedure.

F. Different Weights on the Crack and Non-Crack Background

In Section III-B, we formulate the loss function by assigning the same weight to the crack label and the background, although their distributions are imbalanced. We show the advantage of this setting with experiments. Specifically, we re-define the weighted loss function as Eq. (4):

l(F_i; W) = \begin{cases} \frac{2\alpha}{\alpha+\beta} \cdot \log(1 - P(F_i; W)), & \text{if } y_i = 0, \\ \frac{2\beta}{\alpha+\beta} \cdot \log(P(F_i; W)), & \text{otherwise}, \end{cases}   (4)

where \alpha and \beta are the weights assigned to the background and the cracks, respectively. We set the label of a pixel belonging to the background as y_i = 0 and to a crack as y_i = 1. For a convenient comparison, we set the background weight \alpha to 1 and set \beta to different values in {1, 10, 50, 100}. Notice that when \beta = 1, the weighted loss function is equivalent to the loss function defined by Eq. (1), which we call the 'standard weight'. We also compare with the balance-weight setting used in [12], where \alpha is the number of ground-truth crack pixels and \beta is the number of background pixels. We call it the 'balance weight'.

It can be seen from Fig. 9 that DeepCrack equipped with a small weight (\beta < 1) has decreased performance, which indicates that it is not good to allocate more weight to the non-crack background. When larger weights are set to the crack, lower ODSs can be observed on CRKWH100 and


Fig. 11. More results of DeepCrack on the three test datasets. Full resolution results can be accessed at our website.

CrackLS315. Compared with the balance weight, the standard weight generally obtains a higher ODS. The reason is that when a larger weight is given to the crack, false-negative predictions receive heavier punishment. As a result, more pixels are predicted as crack, while not bringing much impact on the whole loss. Thus, when placing larger weights on the crack, the overall performance is not improved but undermined.
On Stone331, irregular variations of ODS can be observed, as compared with those on CRKWH100 and CrackLS315. This may be because the DeepCrack model is trained on pavement images, and the rule learned on pavement images is not strictly consistent with that on stone images. Such results indicate that assigning different weights to the crack and non-crack background does not guarantee a stable improvement in DeepCrack's performance.
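The class-weighted loss in Eq. (4) can be sketched as below. The minus signs are written explicitly here so the returned value is a non-negative loss, and \alpha = \beta recovers the standard weight; function names are illustrative:

```python
import numpy as np

def weighted_pixel_loss(pred, gt, alpha=1.0, beta=1.0):
    # Eq. (4): per-pixel cross-entropy with weight alpha on the
    # background (y_i = 0) and beta on the cracks (y_i = 1). The
    # coefficients are normalized so alpha == beta recovers the
    # unweighted 'standard weight' loss.
    eps = 1e-12
    w_bg = 2 * alpha / (alpha + beta)
    w_ck = 2 * beta / (alpha + beta)
    loss = np.where(gt == 0,
                    -w_bg * np.log(1 - pred + eps),
                    -w_ck * np.log(pred + eps))
    return loss.mean()
```

Setting `beta` to 10, 50 or 100 with `alpha = 1` reproduces the weighting experiments above: under-predicted crack pixels are punished more heavily, which drives the network toward over-prediction.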
Fig. 12. Sample edge-detection results on BSDS500.
G. Detection of Bright Cracks

In the experiments, we find that the DeepCrack model trained on CrackTree260 cannot detect bright cracks. We conjecture that this is because there are very few bright cracks in the training dataset. To verify this point, we invert the brightness of the training images, such that the cracks in them have higher intensity than the background and display as bright cracks. We retrain DeepCrack with the new training dataset. We then select four original images from the CRKWH100 dataset that contain bright cracks, and perform crack detection using the newly trained model. The results are shown in Fig. 10. It can be seen that DeepCrack handles the bright cracks well.

H. Running Efficiency

The proposed DeepCrack, as well as HED, SRN, RCF and SegNet, does not have fully connected layers, which


leads to largely reduced weight parameters. At test time, these networks do not suffer the heavy computational load of gradient calculation as in training. Thus, DeepCrack and the four others can predict crack maps efficiently. It can be seen from Table I (last column) that DeepCrack handles images of 512×512 at a speed of 6 FPS, exactly 0.153 second per image. SegNet is a little faster than DeepCrack, needing 0.141 second per image. With fewer network layers, HED, RCF and SRN achieve even faster speeds of about 25 FPS, 20 FPS and 17 FPS, respectively. For the traditional methods, SE and CrackForest can process about 5 images and 4 images per second, respectively, and CrackTree needs 2 seconds per image on average. Note that the running time for HED, RCF, SRN and DeepCrack is based on a GeForce GTX TITAN-X GPU, and the running time for SE, CrackTree and CrackForest is based on a 2.3GHz E5-2630 CPU.
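The FPS figures above can be reproduced with a simple wall-clock loop of the following shape; `predict` is a placeholder for any of the models' inference calls, not the paper's benchmarking code:

```python
import time

def measure_fps(predict, images, warmup=2):
    # Average inference speed in frames per second.
    # predict: callable mapping one image to a crack map (placeholder).
    # images: sequence of test images.
    for img in images[:warmup]:      # warm-up runs, excluded from timing
        predict(img)
    start = time.perf_counter()
    for img in images:
        predict(img)
    elapsed = time.perf_counter() - start
    return len(images) / elapsed
```

The warm-up runs matter on a GPU, where the first calls include memory allocation and kernel compilation overhead.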

V. CONCLUSION

In this work, a novel end-to-end trainable convolutional network - DeepCrack - was proposed for crack detection. In DeepCrack, convolutional features at each scale were pairwisely fused, and the fused feature maps at all scales were further fused into a multi-scale feature-fusion map for crack detection. For performance evaluation, four crack datasets were constructed. Under the same evaluation protocol, one dataset was used for training, and the other three were used for test. Experimental results showed that the proposed DeepCrack achieved over 0.87 ODS F-measure on the test datasets on average, and outperformed the competing methods that do not have a decoder network. It indicates that the convolutional features in both the encoder and decoder networks are useful for crack detection. Experimental results also showed that DeepCrack was not sensitive to noisy crack labeling and could well handle bright cracks.

Fig. 13. Edge-detection performance on BSDS500.

Fig. 14. Results on the DRIVE dataset. Row 1: retinal vessel images. Row 2: ground truth labeled by a human expert. Row 3: results produced by DeepCrack.

APPENDIX
DEEPCRACK'S PERFORMANCE ON OTHER TASKS

We also examine the capability of DeepCrack on two other line-detection tasks: one is edge detection, and the other is vessel detection.

A. Edge Detection

On BSDS500, we augmented the 300 training images to train DeepCrack, and used the other 200 images for test. In Fig. 13, we can see that DeepCrack obtains an ODS of 0.778, which is slightly higher than DeepEdge and DeepContour, but lower than RCF (with NMS) and HED (with NMS). However, DeepCrack obtains better results than RCF and HED on some images, for example the ones shown in Fig. 12. From Fig. 12 we can see that DeepCrack produces clean edge maps while HED and RCF produce thick ones. DeepCrack is found to be adept at detecting thin edges, but tends to omit fine structures, which leads to a relatively lower recall in edge detection. However, this characteristic makes DeepCrack more suitable for crack detection against noisy and grain-like textured backgrounds.

B. Vessel Detection

Vessel detection/segmentation is an important task in medical image processing. We run the proposed DeepCrack on the DRIVE dataset [49] for retinal vessel detection. DRIVE contains 20 images for training and 20 images for test. Since the number of training samples is too small, we randomly select 15 images from the test set and add them to the training set; the remaining 5 images are used for test, as shown in the top row of Fig. 14. We apply data augmentation to the 35 training images, where each image is augmented into 54 images, yielding 1,890 images in total to train DeepCrack. The results are displayed in Fig. 14. It is surprising that the DeepCrack model trained on such a small-scale dataset performs very well in detecting the main blood-vessel structures. Some small vessel branches are found to be missed; we believe this could be solved by giving DeepCrack enough training data.

ACKNOWLEDGMENT

The authors would like to thank Yuanhao Yue and Qin Sun from Wuhan University for their help in labeling the crack ground truth and plotting some of the figures.


REFERENCES

[1] C. Koch, K. Georgieva, V. Kasireddy, B. Akinci, and P. Fieguth, "A review on computer vision based defect detection and condition assessment of concrete and asphalt civil infrastructure," Adv. Eng. Inform., vol. 29, no. 2, pp. 196–210, 2015.
[2] L. Zhang, F. Yang, Y. D. Zhang, and Y. J. Zhu, "Road crack detection using deep convolutional neural network," in Proc. IEEE Int. Conf. Image Process., Sep. 2016, pp. 3708–3712.
[3] S. J. Schmugge, L. Rice, J. Lindberg, R. Grizziy, C. Joffey, and M. C. Shin, "Crack segmentation by leveraging multiple frames of varying illumination," in Proc. IEEE Winter Conf. Appl. Comput. Vis. (WACV), Mar. 2017, pp. 1045–1053.
[4] Z. Qu, L. Bai, S.-Q. An, F.-R. Ju, and L. Liu, "Lining seam elimination algorithm and surface crack detection in concrete tunnel lining," J. Electron. Imag., vol. 25, no. 6, p. 063004, 2016.
[5] J. Geusebroek, A. W. M. Smeulders, and H. Geerts, "A minimum cost approach for segmenting networks of lines," Int. J. Comput. Vis., vol. 43, no. 2, pp. 99–111, 2001.
[6] A. Sironi, E. Türetken, V. Lepetit, and P. Fua, "Multiscale centerline detection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 7, pp. 1327–1341, Jul. 2016.
[7] Z. Zhang, F. Xing, X. Shi, and L. Yang, "SemiContour: A semi-supervised learning approach for contour detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2016, pp. 251–259.
[8] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 1097–1105.
[9] R. Girshick, "Fast R-CNN," in Proc. IEEE Int. Conf. Comput. Vis., Dec. 2015, pp. 1440–1448.
[10] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2015, pp. 3431–3440.
[11] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent., 2015, pp. 234–241.
[12] S. Xie and Z. Tu, "Holistically-nested edge detection," in Proc. IEEE Int. Conf. Comput. Vis., Dec. 2015, pp. 1395–1403.
[13] Y. Liu, M.-M. Cheng, X. Hu, K. Wang, and X. Bai, "Richer convolutional features for edge detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jul. 2017, pp. 5872–5881.
[14] W. Shen, X. Wang, Y. Wang, X. Bai, and Z. Zhang, "DeepContour: A deep convolutional feature learned by positive-sharing loss for contour detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jul. 2015, pp. 3982–3991.
[15] J. Yang, B. Price, S. Cohen, H. Lee, and M.-H. Yang, "Object contour detection with a fully convolutional encoder-decoder network," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2016, pp. 193–202.
[16] K.-K. Maninis, J. Pont-Tuset, P. Arbeláez, and L. Van Gool, "Convolutional oriented boundaries," in Proc. Eur. Conf. Comput. Vis., 2016, pp. 580–596.
[17] A. Khoreva, R. Benenson, M. Omran, M. Hein, and B. Schiele, "Weakly supervised object boundaries," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2016, pp. 183–192.
[18] B. Yang, J. Yan, Z. Lei, and S. Z. Li, "Convolutional channel features," in Proc. IEEE Int. Conf. Comput. Vis., Dec. 2015, pp. 82–90.
[19] A. Khoreva, R. Benenson, F. Galasso, M. Hein, and B. Schiele, "Improved image boundaries for better video segmentation," in Proc. Eur. Conf. Comput. Vis. Workshops, Nov. 2016, pp. 773–788.
[20] V. Badrinarayanan, A. Kendall, and R. Cipolla, "SegNet: A deep convolutional encoder-decoder architecture for image segmentation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 12, pp. 2481–2495, Dec. 2017.
[21] J. Canny, "A computational approach to edge detection," IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-8, no. 6, pp. 679–698, Nov. 1986.
[22] P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik, "Contour detection and hierarchical image segmentation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 5, pp. 898–916, May 2011.
[23] J. J. Lim, C. L. Zitnick, and P. Dollár, "Sketch tokens: A learned mid-level representation for contour and object detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2013, pp. 3158–3165.
[24] P. Dollár and C. L. Zitnick, "Fast edge detection using structured forests," IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 8, pp. 1558–1570, Aug. 2015.
[25] S. Wang, T. Kubota, J. M. Siskind, and J. Wang, "Salient closed boundary extraction with ratio contour," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 4, pp. 546–561, Apr. 2005.
[26] P. Martin, P. Refregier, F. Goudail, and F. Guerault, "Influence of the noise model on level set active contour segmentation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 6, pp. 799–803, Jun. 2004.
[27] C. Li, C. Xu, C. Gui, and M. D. Fox, "Level set evolution without re-initialization: A new variational formulation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., vol. 1, Jun. 2005, pp. 430–436.
[28] Q. Zhu, G. Song, and J. Shi, "Untangling cycles for contour grouping," in Proc. IEEE Int. Conf. Comput. Vis., Oct. 2007, pp. 1–8.
[29] G. Bertasius, J. Shi, and L. Torresani, "DeepEdge: A multi-scale bifurcated deep network for top-down contour detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2015, pp. 4380–4389.
[30] Y. Ganin and V. S. Lempitsky, "N4-fields: Neural network nearest neighbor fields for image transforms," in Proc. Asian Conf. Comput. Vis., 2014, pp. 536–551.
[31] V. Bismuth, R. Vaillant, H. Talbot, and L. Najman, "Curvilinear structure enhancement with the polygonal path image - Application to guide-wire segmentation in X-ray fluoroscopy," in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent. (MICCAI), 2012, pp. 9–16.
[32] A. Sironi, V. Lepetit, and P. Fua, "Multiscale centerline detection by learning a scale-space distance transform," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2014, pp. 2697–2704.
[33] L.-C. Chen, J. T. Barron, G. Papandreou, K. Murphy, and A. L. Yuille, "Semantic image segmentation with task-specific edge detection using CNNs and a discriminatively trained domain transform," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2016, pp. 4545–4554.
[34] W. Ke, J. Chen, J. Jiao, G. Zhao, and Q. Ye, "SRN: Side-output residual network for object symmetry detection in the wild," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jul. 2017, pp. 302–310.
[35] Q. Li, Q. Zou, D. Zhang, and Q. Mao, "FoSA: F* seed-growing approach for crack-line detection from pavement images," Image Vis. Comput., vol. 29, no. 12, pp. 861–872, 2011.
[36] M. Kamaliardakani, L. Sun, and M. K. Ardakani, "Sealed-crack detection algorithm using heuristic thresholding approach," J. Comput. Civil Eng., vol. 30, no. 1, p. 04014110, 2014.
[37] P. Subirats, J. Dumoulin, V. Legeay, and D. Barba, "Automation of pavement surface crack detection using the continuous wavelet transform," in Proc. Int. Conf. Image Process., Oct. 2006, pp. 3037–3040.
[38] G. Zhao, T. Wang, and J. Ye, "Anisotropic clustering on surfaces for crack extraction," Mach. Vis. Appl., vol. 26, no. 5, pp. 675–688, 2015.
[39] M. Salman, S. Mathavan, K. Kamal, and M. Rahman, "Pavement crack detection using the Gabor filter," in Proc. IEEE Conf. Intell. Transp. Syst., Oct. 2013, pp. 2039–2044.
[40] H. Oliveira and P. L. Correia, "Automatic road crack detection and characterization," IEEE Trans. Intell. Transp. Syst., vol. 14, no. 1, pp. 155–168, Mar. 2013.
[41] R. Amhaz, S. Chambon, J. Idier, and V. Baltazart, "Automatic crack detection on two-dimensional pavement images: An algorithm based on minimal path selection," IEEE Trans. Intell. Transp. Syst., vol. 17, no. 10, pp. 2718–2729, Oct. 2016.
[42] Q. Zou, Q. Li, F. Zhang, Z. Xiong, and Q. Wang, "Path voting based pavement crack detection from laser range images," in Proc. Int. Conf. Digit. Signal Process., Oct. 2016, pp. 432–436.
[43] V. Kaul, A. Yezzi, and Y. C. Tsai, "Detecting curves with unknown endpoints and arbitrary topology using minimal paths," IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 10, pp. 1952–1965, Oct. 2012.
[44] W. Xu, Z. Tang, J. Zhou, and J. Ding, "Pavement crack detection based on saliency and statistical features," in Proc. IEEE Int. Conf. Image Process., Sep. 2013, pp. 4093–4097.
[45] Q. Zou, Y. Cao, Q. Li, Q. Mao, and S. Wang, "CrackTree: Automatic crack detection from pavement images," Pattern Recognit. Lett., vol. 33, no. 3, pp. 227–238, 2012.
[46] Y. Shi, L. Cui, Z. Qi, F. Meng, and Z. Chen, "Automatic road crack detection using random structured forests," IEEE Trans. Intell. Transp. Syst., vol. 17, no. 12, pp. 3434–3445, Dec. 2016.
[47] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," 2014. [Online]. Available: https://arxiv.org/abs/1409.1556
[48] Y. Jia et al., "Caffe: Convolutional architecture for fast feature embedding," in Proc. ACM Int. Conf. Multimedia, Nov. 2014, pp. 675–678.
[49] J. Staal, M. D. Abramoff, M. Niemeijer, M. A. Viergever, and B. Van Ginneken, "Ridge-based vessel segmentation in color images of the retina," IEEE Trans. Med. Imag., vol. 23, no. 4, pp. 501–509, Apr. 2004.


Qin Zou (M’13) received the B.E. degree in information engineering and the Ph.D. degree in photogrammetry and remote sensing (computer vision) from Wuhan University, China, in 2004 and 2012, respectively. From 2010 to 2011, he was a Visiting Ph.D. Student with the Computer Vision Lab, University of South Carolina, USA. He is currently an Associate Professor with the School of Computer Science, Wuhan University. His research activities involve computer vision, pattern recognition, and machine learning. He is a member of the ACM. He was a co-recipient of the National Technology Invention Award of China 2015.

Zheng Zhang received the B.S. degree in computer science from Wuhan University, China, in 2015, where he is currently pursuing the master’s degree with the School of Computer Science. He received the first prize from the China Undergraduate Contest in Internet of Things in 2015. His research interests include deep learning and its applications in image classification and retrieval.

Qingquan Li received the Ph.D. degree in geographic information science and photogrammetry from the Wuhan Technical University of Surveying and Mapping, China. From 1988 to 1996, he was an Assistant Professor with Wuhan University, where he became an Associate Professor. Since 1998, he has been a Professor with Wuhan University. He is currently the President and a Professor with Shenzhen University, China. He is a Professor with the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University. He is also the Director of the Shenzhen Key Laboratory of Spatial Smart Sensing and Service, Shenzhen University. He is an Academician of the International Academy of Sciences for Europe and Asia. His research areas include precision engineering survey, pattern recognition, and intelligent transportation systems.

Xianbiao Qi received the B.E. degree in information engineering and the Ph.D. degree in information and signal processing from the Beijing University of Posts and Telecommunications, in 2008 and 2015, respectively. He was an Intern with the Web Search and Mining Group, Microsoft Research Asia, from 2011 to 2012. He was also a Researcher with the University of Oulu, Finland, from 2014 to 2016. He held a post-doctoral position with the Department of Computing, The Hong Kong Polytechnic University, from 2016 to 2018. He is currently a Research Scientist with the Shenzhen Research Institute of Big Data. His current research interests include face analysis, object detection, and scene text detection.

Qian Wang received the Ph.D. degree from the Illinois Institute of Technology, USA. He is currently a Professor with the School of Computer Science, Wuhan University. His research interests include search and computation outsourcing security, wireless systems security, big data security and privacy, and applied cryptography. He is an Expert of the national 1000 Young Talents Program of China. He received the National Science Fund for Excellent Young Scholars of China. He was a recipient of the 2016 IEEE Asia-Pacific Outstanding Young Researcher Award. He serves as an Associate Editor for the IEEE Transactions on Dependable and Secure Computing and the IEEE Transactions on Information Forensics and Security.

Song Wang (M’02–SM’13) received the Ph.D. degree in electrical and computer engineering from the University of Illinois at Urbana–Champaign (UIUC) in 2002. From 1998 to 2002, he was a Research Assistant with the Image Formation and Processing Group, Beckman Institute, UIUC. In 2002, he joined the Department of Computer Science and Engineering, University of South Carolina, where he is currently a Professor. His research interests include computer vision, medical image processing, and machine learning. He is a Senior Member of the IEEE Computer Society. He is currently serving as the Publicity/Web Portal Chair for the Technical Committee of Pattern Analysis and Machine Intelligence, IEEE Computer Society, and as an Associate Editor for Pattern Recognition Letters.

