Pavement Crack Detection Using Partially Accurate Ground Truth Based On GANs
Abstract— Fully convolutional networks are a powerful tool for per-pixel semantic segmentation/detection. However, they are problematic when coping with crack detection using partially accurate ground truths (GTs): the network may easily converge to a state that treats all pixels as background (BG) and still achieves a very good loss, named the “All Black” phenomenon, due to the unavailability of accurate GTs and the data imbalance. To tackle this problem, we propose crack-patch-only (CPO) supervised generative adversarial learning for end-to-end training, which forces the network to always produce crack-GT images while preserving both crack and BG-image translation abilities by feeding a larger-size crack image into an asymmetric U-shape generator to overcome the “All Black” issue. The proposed approach is validated using four crack datasets and achieves state-of-the-art performance compared with recently published works in efficiency and accuracy.

Index Terms— Pavement crack detection, fully convolutional networks, generative adversarial learning, partially accurate ground truths.

I. INTRODUCTION

It is known that deep learning is a data-driven approach which relies heavily on training data with accurate GTs. Due to the domain sensitivity (i.e., the performance of a “well-trained” network may decrease when utilizing datasets obtained from different road sections and/or during different periods), it is necessary to manually mark the GTs to re-train the models for new pavement crack detection tasks. In industry, the pavement images are captured using a camera mounted on top of a vehicle running on the road. Under such a setting, most cracks are very thin and crack boundaries are vague, which makes the annotation of pixel-level GTs very difficult. Instead of the labor-intensive per-pixel crack annotation, marking the cracks as 1-pixel curves is more feasible and preferable in practice because of its simplicity and low labor cost; such a GT is named a labor-light GT. However, such GTs may not match the cracks accurately at pixel level; i.e., they are partially accurate GTs, and that makes the loss computation inaccurate. Moreover, as a long-narrow target, a crack can only occupy a very small area in a full image. Since patch-wise training is equivalent to
Authorized licensed use limited to: University of Melbourne. Downloaded on May 16,2020 at 10:32:14 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
an intensity-difference measuring function to find an optimal threshold for crack segmentation; however, the robustness was poor and the method failed easily when working on different datasets. Many works introduced crack-linking methods to enhance crack continuity [12]–[17]. However, these methods did not solve the problem well and usually produced intolerable false positives by linking noise together. In addition, Tsai et al. [18] performed a comprehensive study on the performance of six low-level image segmentation algorithms, and Abdel-Qader et al. [19] discussed different edge detectors, including Sobel, Canny, and fast Haar transformation [20]. The rule-based approaches are easy to implement; however, they are sensitive to noise, which results in poor generalizability.

Machine learning-based methods have attracted increasing attention during the past two decades. These methods perform crack detection in two steps: feature extraction and pattern classification. Cheng et al. [21] and Oliveira and Correia [22] utilized the mean and variance of an image block as the features to train classifiers for pavement crack detection. However, the good performance relied heavily on complex post-processing. Hu et al. [23] and Gavilan et al. [24] utilized textural information to set up the feature vectors and employed a support vector machine (SVM) for the classification. However, they could not handle the problem well when processing images with complex pavement textures. Zalama et al. [25] employed Gabor filters for feature extraction and AdaBoosting [25] for crack identification; and Shi et al. [3] combined multi-channel information to set up the feature vector, and employed random structured forest [26] for crack-token mapping. These methods tried to solve the problem by extracting hand-crafted features and training a classifier to discriminate cracks from the noisy BG; however, they did not address the issue well because the hand-crafted feature descriptors usually calculated statistics locally and lacked a good global view, even when the statistics from different locations were combined. Thus, they could not represent the global structural pattern well, which was important for discriminating cracks from the noisy textures.

As one of the most important branches of machine learning, deep learning has achieved great success during the past ten years, and it is the most promising way to solve challenging object detection problems, including pavement crack detection. Initially, deep learning-based object detection methods relied on window-sliding or region-proposal; these methods tried to find a bounding box for each possible object in an image. R-CNN (region-based convolutional neural networks) [27] was the early work which utilized selective search [28] to generate candidate regions, and then sent the regions into a CNN for classification. Based on R-CNN, Cha et al. [29] designed a convolutional network for pavement crack detection which worked in window-sliding mode. Zhang et al. [4] employed a CNN for pre-classification which removed most of the noise areas before performing crack and sealed crack detection. The problems of these methods were: (1) the window-sliding-based strategy was impractical due to the huge time complexity, especially when processing large images [5]; (2) traditional region-proposal methods [28] were unable to select good candidate regions from the noisy pavement images, and were also inefficient because a great number of candidate regions had to be processed for a full-size image. Zhang et al. [30] employed parallel processing to improve the computational efficiency of region-based methods; however, the computational resource costs were high. Zhang et al. [5] addressed the issue by generalizing a classification network to an end-to-end detection network with FCN. FCN is a one-stage pixel-level semantic segmentation method without window-sliding. Recently, Yang et al. [7] employed FCN for pixel-level crack detection and achieved good results on concrete-wall images and pavement images with clear cracks; however, it failed to detect thin cracks. Moreover, the method relied on accurate pixel-level GTs, which were labor-intensive and often infeasible under industrial settings. In addition, deep learning-based crack detection articles have continued to appear. Chen and Jahanshahi [31] and Park et al. [32] proposed NB-CNN and patch-CNN for crack detection, respectively. Tong et al. [33] utilized a deep convolutional neural network (DCNN) for crack length estimation. Hoang et al. [34] employed a CNN and an edge detector for crack recognition. Gopalakrishnan et al. [35] performed pavement distress detection with a DCNN. Zou et al. [36] introduced a DCNN for crack detection with hierarchical feature learning. The methods were either based on the traditional classification network with fully connected layers, which could only handle fixed input-size images, or based on the FCN architecture, which relied on accurate, labor-intensive GTs.

In this paper, we propose CrackGAN for pavement crack detection with the following contributions: (1) it solves a practical and essential problem, the “All Black” issue, existing in deep learning-based pixel-level crack detection methods; (2) it proposes the crack-patch-only (CPO) supervised adversarial learning and the asymmetric U-Net architecture to perform the end-to-end training; (3) the network can be trained with partially accurate GTs generated by a labor-light method, which can reduce the workload of preparing GTs significantly; (4) furthermore, it solves the data imbalance problem as a byproduct of the proposed approach. Moreover, even though the network is trained with small image patches and partially accurate GTs, it can deal with full-size images and achieve great performance.

The rest of the paper is organized as follows: Section II discusses the related works. Section III introduces the proposed method. Section IV describes the evaluation metrics and the experimental results. Finally, the conclusion is provided.

II. RELATED WORKS

In this section, we discuss the techniques related to the proposed method.

A. Generative Adversarial Networks

Goodfellow et al. [37] proposed the generative adversarial network (GAN), which could be trained to generate real-like images by conducting a max-min two-player game. Based on GAN, Mirza and Osindero [38] proposed conditional GAN
ZHANG et al.: CrackGAN: PAVEMENT CRACK DETECTION USING PARTIALLY ACCURATE GTs 3
Fig. 2. “All Black” issue encountered when using an FCN-based method for pixel-level crack detection: (a) the industrial pavement crack image; (b) the dilated GT-image utilized in the training; and (c) the detection result with the “well-trained” U-Net (see Fig. 3).

Fig. 3. The loss and accuracy curves when training a regular U-Net using industrial pavement images with partially accurate GTs.

Fig. 4. Pre-training a one-class DC-GAN with augmented GTs based on CPO-supervision. The real crack-GT data are augmented with manually marked “crack” curves.

used in this work are originally designed for crack patch sampling [4] which are highly biased, and that makes the problem more serious. The network simply classified all the pixels as BG, and still achieved quite a “good” accuracy (since BG pixels dominate the whole images); that was the “All Black” issue. As shown in Figs. 2 and 3, during training, the loss decreases rapidly and approaches a very low value; however, the detection results are all black (i.e., all BG). Moreover, it is worth mentioning that other FCN architectures also encounter this problem; U-Net is taken here as an example. Since the GTs are only partially accurate, existing approaches for solving data imbalance cannot work here.

B. CPO-Supervision and One-Class Discriminator

Regular FCN-based methods may only produce all-black images as the detection results [5]. To address this problem, we add a new constraint, the generative adversarial loss, to regularize the objective function, which makes the network always generate a crack-GT detection result; accordingly, the training data are prepared with crack patches only (i.e., CPO-supervision), without involving any non-crack patches and “all black” patches [51]. As shown in Fig. 1, the adversarial loss is provided by a one-class discriminator obtained via pre-training the DC-GAN [39] only with crack-GT-like patches. It is well known that the DC-GAN can generate real-like images from random noise by conducting the training with a max-min two-player game, in which a generator is used to generate real-like images and a discriminator is used to distinguish between real and fake images. As verified

where x is the image from the real data (crack-GT-like patches) with distribution $p_d(x)$; z is the noise vector generated randomly from the Gaussian distribution $p_d(z)$; and D is the discriminator and G is the generator, set up with convolutional and deconvolutional kernels, respectively. In practice, whether a sample is real or fake depends on the data setting. In accordance with the CPO-supervision, only crack-GT patches are contained in the real image-set for training the DC-GAN. With such a setting, the discriminator will only recognize a crack-GT patch as real and treat an all-black patch as fake, which prevents the network from generating an all-black (fake) image as the detection result, thus overcoming the “All Black” issue. Such a discriminator is named a one-class discriminator. In the implementation, the crack-GT data are further augmented by manually marking a number of “crack” curves and sampling the patches accordingly, as indicated in Fig. 4.

In Fig. 5, after training, the discriminator of the well-trained DC-GAN is concatenated to the end of the asymmetric U-Net generator to provide the adversarial loss for end-to-end training. Since the output of the generator serves as a fake image, the adversarial loss is:

$$L_{adv} = -\mathbb{E}_{x \in I}\,[\log D(G(x))] \qquad (4)$$

Here, different from pre-training the DC-GAN in Eq. (2), x is the crack patch, and I is the training set containing crack patches only; G is set up with the asymmetric U-Net architecture illustrated in Fig. 5, and D is the pre-trained one-class discriminator.

C. Asymmetric U-Net Generator

In subsection III-B, we introduced the CPO-supervision and generative adversarial learning to force the network to always
Fig. 9. Grid search board with HD-scores utilized to determine the optimal parameters λ and dilation scale. The optimal parameters are determined according to the best HD-score. Testing results of the three main failure cases are also presented in the figure, including the “All Black” result, the over-dilated result, and the nonsense patterns which overlooked the pixel-level loss.

Fig. 10. Illustration of pixel-level mismatching: (a) a crack image; (b) detection result overlapped with the GT-image dilated once using a disk structure with radius 3; and (c) detection result overlapped with the GT-image dilated four times. The transparent areas represent the dilated GT-cracks with different dilation scales.

low-level feature maps of a classification network trained with crack and non-crack patches [4]. The classification network is configured by adding a fully connected layer at the end of the encoding part (bottleneck) of the asymmetric U-Net, and the output dimension is 2, representing crack and non-crack with labels 0 and 1. The training samples are crack and non-crack patches of 256 × 256. It shows that the network extracted the same crack pattern as in the original image, i.e., the network is able to learn useful information from the weakly supervised information, crack/non-crack image labels only. Then the well-trained parameters are used to initialize the encoding part of the generator for the end-to-end training; the other settings are the same as for the DC-GAN except for replacing the generator with the asymmetric U-Net and changing the objective function according to Eq. (6).

b) Parameter selection: The parameters, λ in Eq. (6) and the dilation scale, are determined by grid search. From the grid search board in Fig. 9, the dilation scale (number of dilation times with the disk structure) should be between 3 and 7, and λ should be between 0.15 and 0.4 for an effective detection. A dilation scale less than 3 will cause the “All Black” issue due to the pixel-level mismatching and the data imbalance, while too large a dilation (>8) could not provide meaningful crack location information and would produce useless output. In addition, a larger λ tends to fail the task, but a small λ (0.15 < λ < 0.4) can work well, which indicates that the adversarial loss is very important for successful training under the industrial setting. It can be observed from the grid board that the HD-score is either a good one (>80) or a very small one (0 or 1), which indeed represents the two different model statuses, well-trained or failed. However, the causes of the failures could be grouped into three main cases: over-dilated, the “All Black” problem, or the useless output pattern which overlooks the pixel-level loss with a small λ, as indicated in Fig. 9.

IV. EXPERIMENTS

A. Dataset and Metrics

CFD [3], the CrackGAN dataset (CGD) collected by the authors, and dataset [17] are utilized for evaluation. CFD contains 118 pavement crack images (480 × 320 pixels each) obtained by people standing on the road using an iPhone; the ground truths are carefully marked at pixel level, which is labor-intensive. The image quality is high and the background is smooth and clean. CGD is a dataset with 400 pavement crack images (2048 × 4096 pixels each) collected by the authors using a line-scan industrial camera mounted on the top of a vehicle running at 100 km/h; the camera scans a 4.096-meter-wide road surface and produces a pavement image of 2048 × 4096 pixels for every 2048 line-scans (i.e., 1 pixel represents a 1 × 1 mm² area). Most of the cracks are thin, and sometimes even hard for a human to recognize. Furthermore, it is infeasible to obtain accurate GTs at pixel level; thus, the cracks are represented by 1-pixel curves roughly marked by the engineers in a labor-light way, and these are named partially accurate GTs. However, such GTs may not match the true crack locations accurately, and processing them is much more challenging. The proposed algorithm can achieve the best results using both “accurate” and “partially accurate” datasets, which demonstrates its robustness as well. For CFD and CGD, the data are augmented following [4] to facilitate the training, and the training-test ratio is 2:1. Dataset [17] has industrial images from five different capture systems: Aigle-RN has 38 images with annotations, ESAR has 15 images with annotations, and LCMS has 5, LRIS has 3 and Tempest has 7 images with annotations, respectively. The GTs are marked at pixel level. To the best of our knowledge, it is the only public pavement crack dataset from industry; since it contains relatively few images, they are used for testing only.

Different from most object detection tasks [55], the intersection over union (IOU) is not suitable for evaluating crack detection algorithms [56]. As shown in Fig. 10, a crack, as a long-narrow target, only occupies a very small area, and the image consists mainly of BG pixels. Given that precise pixel-level GTs are difficult to obtain, it is impossible to obtain the accurate intersection area. As shown in Fig. 10 (b) and Fig. 10 (c), it is obvious that the detection results are very good; however, the IOU values are very low, 0.13 and 0.2, respectively. Following [56], we employ the Hausdorff distance to evaluate the crack localization accuracy. For two sets of points A and B, the Hausdorff distance can be calculated with:

$$H(A, B) = \max[h(A, B), h(B, A)] \qquad (7)$$

where

$$h(A, B) = \max_{a \in A} \min_{b \in B} \|a - b\| \qquad (8)$$
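As an illustration of Eqs. (7) and (8), the following is a minimal NumPy sketch of the plain Hausdorff distance between two point sets. It assumes the symmetric form max[h(A, B), h(B, A)] (matching the form used later for the penalized distance) and Euclidean distance between pixel coordinates; the function names and the toy point sets are hypothetical and are not taken from the authors' implementation.

```python
import numpy as np

def directed_hausdorff(A, B):
    """h(A, B): largest distance from a point in A to its nearest point in B."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Pairwise Euclidean distances, shape (len(A), len(B)).
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return float(d.min(axis=1).max())

def hausdorff(A, B):
    """H(A, B): maximum of the two directed distances."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

# Toy example (hypothetical coordinates): one GT point lies far from the detection.
detected = [(0, 0), (0, 1), (0, 2)]
ground_truth = [(0, 0), (0, 1), (3, 4)]
print(hausdorff(detected, ground_truth))
```

The dense pairwise distance matrix keeps the sketch short but needs memory proportional to |A| × |B|, so for full-size images a nearest-neighbor structure or a distance transform would likely be a better fit.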
TABLE I
QUANTITATIVE EVALUATIONS ON CFD
Fig. 11. Region-based evaluation: (a) the original crack image; (b) illustration
of counting the crack and non-crack regions. The squares with label “1s” are
the crack regions, and with label “0s” are BG-regions.
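The region-based counting illustrated in Fig. 11 can be sketched as follows. This is a minimal NumPy example under stated assumptions: a square is labeled “1” when it contains at least one nonzero pixel, the image is cropped to a whole number of squares, and a zero denominator yields a score of 0.0. The function names and the synthetic masks are hypothetical; the 50-pixel patch size follows the setting described in this section.

```python
import numpy as np

def region_labels(mask, patch=50):
    """Label each patch x patch square True if it contains any nonzero pixel."""
    mask = np.asarray(mask) > 0
    h, w = mask.shape
    rows, cols = h // patch, w // patch
    # Crop to whole squares, then reduce each square with any().
    blocks = mask[:rows * patch, :cols * patch].reshape(rows, patch, cols, patch)
    return blocks.any(axis=(1, 3))

def region_scores(detection, gt, patch=50):
    """Region-based precision, recall and F1 from per-square labels."""
    det = region_labels(detection, patch)
    ref = region_labels(gt, patch)
    tp = int(np.sum(det & ref))
    fp = int(np.sum(det & ~ref))
    fn = int(np.sum(~det & ref))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Synthetic 400 x 400 example: the detection is shifted a few pixels from the GT curve.
gt = np.zeros((400, 400), dtype=np.uint8)
gt[25, 0:200] = 1          # GT curve covers squares 0..3 of the first row
det = np.zeros((400, 400), dtype=np.uint8)
det[30, 50:250] = 1        # detection covers squares 1..4 of the first row
print(region_scores(det, gt))
```

Because the labels are assigned per square, a detection that is offset from a 1-pixel GT curve by a few pixels still counts as a true positive as long as both fall in the same square, which is what makes this measure tolerant of partially accurate GTs.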
The penalty is defined as:

$$h_p(A, B) = \frac{1}{|A|} \sum_{a \in A} \mathrm{sat}_u \min_{b \in B} \|a - b\| \qquad (9)$$

Here, parameter u is the upper limit of the saturation function sat, which is used to directly get rid of the false positives that are far away from the GTs. Instead of setting u as 1/5 of the image width [56], it is set to 50 pixels to emphasize the localization accuracy by eliminating the influence of possible noise from the large BG areas. With A the detected crack set and B the GT set, the overall score is:

$$\mathrm{score}_{BH}(A, B) = 100 - \frac{BH(A, B)}{u} \times 100 \qquad (10)$$

where

$$BH(A, B) = \max[h_p(A, B), h_p(B, A)] \qquad (11)$$

The Hausdorff distance score (HD-score) can reflect the overall crack localization accuracy, and it is insensitive to the foreground-background imbalance inherent in long-narrow object detection.

In addition, the region-based precision rate (p-rate) and recall rate (r-rate) are used for evaluation, which can measure the false-detection severity and the missed-detection severity, respectively. In Fig. 11, a pavement image of 400 × 400 pixels is divided into small image patches (50 × 50 pixels); if there is a crack detected in a patch, it is marked as “1” and counted as positive. In the same way, for GT images, if there is a marked curve in a patch, it is a crack patch. Then the region-based true positives (TP), false positives (FP) and false negatives (FN) can be obtained by counting the corresponding squares, and further be used to calculate the region-based precision and recall rates:

$$P_{region} = \frac{TP_{region}}{TP_{region} + FP_{region}} \qquad (12)$$

$$R_{region} = \frac{TP_{region}}{TP_{region} + FN_{region}} \qquad (13)$$

Then the region-based F1 score can be computed as:

$$F1_{region} = \frac{2 \cdot P_{region} \cdot R_{region}}{P_{region} + R_{region}} \qquad (14)$$

B. Overall Performance

The comparisons are performed on CFD [3], CGD, and dataset [17] to justify the state-of-the-art performance. Since some of the papers only provided the final detection results, the PR-curves are plotted only for the methods with available source code.

1) CFD: The proposed method is compared with CrackIT-v1 [57], MFCD [58], CrackForest [3], [29], FCN-VGG [7], Pix2pix GAN (with U-Net as the generator) [40], and DeepCrack [36] on CFD; the related results are shown in Fig. 12, Fig. 13 (a), and Table I. CrackIT introduced the traditional mean and standard deviation (STD) for crack patch selection, and utilized some post-processing for pixel-based crack detection. However, the features with mean and STD are not able to select the crack patches well, especially when the cracks are thin; thus, the false negative rate is high, and it cannot detect any cracks in the second and third sample images from Fig. 12. MFCD developed a complex path verification algorithm to link candidate crack seeds for the detection; however, it might also connect the false positives and generate fake cracks. As shown in Fig. 12, it produces much noise in the third image with a non-smooth background. CrackForest employed integral channel information with 3 colors, 2 magnitudes and 8 orientations for feature extraction and applied random forest for crack token mapping; the histogram difference between crack and non-crack regions was used for noise removal. As shown in Fig. 12, it achieves very good results on the images whose backgrounds are smooth and clean. However, the performance deteriorates when processing the industrial images, as shown in Figs. 14 and 15. [29] was a patch-level crack detection method which trained a deep classification network for crack and non-crack patch classification; it could not provide accurate crack locations, as shown in Fig. 12. FCN-VGG was a pixel-level crack detection method for which accurate pixel-level GTs were needed to train the FCN-based network end-to-end. Similar to the results reported in the original papers, it failed when detecting thin cracks. DeepCrack achieves very good results on CFD due to the multi-scale hierarchical fusion; however, the training relied on accurate GTs and the method would fail easily when the GTs are biased. Pix2pix GAN [40] was an image-to-image translation network with U-Net as the generator, which originally introduced generative adversarial learning for image style translation. However, as discussed in section III-C, the discriminator treats both crack and non-crack as real which can immediately weaken the
Fig. 12. Comparison of the detection results on CFD using different methods. From top to bottom are: original images, GT images, results of CrackIT,
results of MFCD, results of CrackForest, results of [29], results of FCN-VGG, results of DeepCrack, results of Pix2pix GAN, and results of CrackGAN,
respectively.
crack-patch generation ability, which makes the network similar to the regular U-Net; therefore, it achieves results similar to those of FCN-based methods, and also encounters the “All Black” problem, as shown in Figs. 14 and 15. CrackGAN introduces CPO-supervision and the asymmetric U-Net architecture to build the one-class discriminator for generative adversarial
Fig. 14. Comparison of the detection results on CGD. From top to bottom are: original images, GT images, results of CrackIT, results of CrackForest, results
of [29], results of FCN-VGG, results of DeepCrack-1, results of DeepCrack-2, results of Pix2pix GAN, and results of CrackGAN, respectively.
memory and twelve i7 cores. The deep learning methods are implemented on the same computer but run on an Nvidia 1080Ti GPU with PyTorch. [29] takes 10.2 seconds because it is based on window-sliding. FCN [7], DeepCrack, Pix2pix GAN, and CrackGAN take much less time due to the FCN architecture; moreover, CrackGAN
Fig. 15. Comparison of the detection results on dataset [17]. From top to bottom are: original images, GT images, results of CrackIT, results of CrackForest,
results of FCN-VGG, results of Pix2pix GAN, results of MPS [17], and results of CrackGAN, respectively.
TABLE IV
COMPARISONS OF COMPUTATIONAL EFFICIENCY
takes much less time (i.e., 1.6 seconds) because it cuts off the last de-convolutional layer for the asymmetric U-Net design.

Fig. 16. Crack detection on concrete pavement images and concrete wall images: (a) and (b) are concrete wall images; (c) and (d) are concrete pavement images; (e)-(h) are the detection results by CrackGAN.

V. CONCLUSION

In this work, we propose a novel deep generative adversarial network, named CrackGAN, for pavement crack detection. The method solves a practical and essential problem, the “All Black” issue, existing in FCN-based pixel-level crack detection when using partially accurate GTs. More importantly, the network can solve crack detection tasks in a labor-light way. It can reduce the workload of preparing GTs significantly, and offers a new idea for object detection/segmentation using partially accurate GTs. Moreover, the method solves the data imbalance problem as a byproduct of the proposed approach. In addition, the network is trained with small image patches, but can deal with images of any size. The experiments demonstrate the effectiveness and superiority of the proposed method, and the proposed approach achieves state-of-the-art performance compared with recently published works.

The theoretical analysis of a neuron’s property concerning the receptive field can be employed to explain many phenomena in deep learning, such as the boundary vagueness in semantic segmentation [6], the blurriness of images generated with GAN [40], [41], etc. We believe that the analysis of each neuron’s property discussed in this paper could become a routine for designing effective neural networks in the future.

REFERENCES

[1] N. F. Hawks and T. P. Teng, “Distress identification manual for the long-term pavement performance project,” Nat. Acad. Sci., Washington, DC, USA, Tech. Rep. SHRP-P-338, 2014.
[2] L. Zhang, F. Yang, Y. Daniel Zhang, and Y. J. Zhu, “Road crack detection using deep convolutional neural network,” in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2016, pp. 3708–3712.
[3] Y. Shi, L. Cui, Z. Qi, F. Meng, and Z. S. Chen, “Automatic road crack detection using random structured forests,” IEEE Trans. Intell. Transp. Syst., vol. 17, no. 12, pp. 3434–3445, Dec. 2016.
[4] K. Zhang, H. D. Cheng, and B. Zhang, “Unified approach to pavement crack and sealed crack detection using preclassification based on transfer learning,” J. Comput. Civil Eng., vol. 32, no. 2, Mar. 2018, Art. no. 04018001.
[5] K. Zhang, H.-D. Cheng, and S. Gai, “Efficient dense-dilation network for pavement cracks detection with large input image size,” in Proc. 21st Int. Conf. Intell. Transp. Syst. (ITSC), Maui, HI, USA, Nov. 2018, pp. 884–889.
[6] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Boston, MA, USA, Jun. 2015, pp. 3431–3440.
[7] X. Yang, H. Li, Y. Yu, X. Luo, T. Huang, and X. Yang, “Automatic
[17] R. Amhaz, S. Chambon, J. Idier, and V. Baltazart, “Automatic crack detection on two-dimensional pavement images: An algorithm based on minimal path selection,” IEEE Trans. Intell. Transp. Syst., vol. 17, no. 10, pp. 2718–2729, Oct. 2016.
[18] Y.-C. Tsai, V. Kaul, and R. M. Mersereau, “Critical assessment of pavement distress segmentation methods,” J. Transp. Eng., vol. 136, no. 1, pp. 11–19, Jan. 2010.
[19] I. Abdel-Qader, O. Abudayyeh, and M. E. Kelly, “Analysis of edge-detection techniques for crack identification in bridges,” J. Comput. Civil Eng., vol. 17, no. 4, pp. 255–263, Oct. 2003.
[20] R. C. Gonzalez, R. E. Woods, and S. L. Eddins, Digital Image Processing Using MATLAB, Boston, MA, USA: Addison-Wesley, 2009.
[21] H. D. Cheng, J. Wang, Y. G. Hu, C. Glazier, X. J. Shi, and X. W. Chen, “Novel approach to pavement cracking detection based on neural network,” Transp. Res. Rec., J. Transp. Res. Board, vol. 1764, no. 1, pp. 119–127, Jan. 2001.
[22] H. Oliveira and P. L. Correia, “Automatic road crack detection and characterization,” IEEE Trans. Intell. Transp. Syst., vol. 14, no. 1, pp. 155–168, Mar. 2013.
[23] Y. Hu, C.-X. Zhao, and H.-N. Wang, “Automatic pavement crack detection using texture and shape descriptors,” IETE Tech. Rev., vol. 27, no. 5, pp. 398–405, 2010.
[24] M. Gavilán et al., “Adaptive road crack detection system by pavement classification,” Sensors, vol. 11, no. 10, pp. 9628–9657, 2011.
[25] E. Zalama, J. Gómez-García-Bermejo, R. Medina, and J. Llamas, “Road crack detection using visual features extracted by Gabor filters,” Comput.-Aided Civil Infrastruct. Eng., vol. 29, no. 5, pp. 342–358, May 2014.
[26] P. Dollar and C. L. Zitnick, “Structured forests for fast edge detection,” in Proc. IEEE Int. Conf. Comput. Vis., Dec. 2013, pp. 1841–1848.
[27] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2014, pp. 580–587.
[28] J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, and A. W. M. Smeulders, “Selective search for object recognition,” Int. J. Comput. Vis., vol. 104, no. 2, pp. 154–171, Sep. 2013.
[29] Y.-J. Cha, W. Choi, and O. Büyüköztürk, “Deep learning-based crack damage detection using convolutional neural networks,” Comput.-Aided Civil Infrastruct. Eng., vol. 32, no. 5, pp. 361–378, May 2017.
[30] A. Zhang et al., “Automated pixel-level pavement crack detection on 3D asphalt surfaces using a deep-learning network,” Comput.-Aided Civil Infrastruct. Eng., vol. 32, no. 10, pp. 805–819, 2017.
[31] F. C. Chen and M. R. Jahanshahi, “NB-CNN: Deep learning-based crack detection using convolutional neural network and Naïve Bayes data fusion,” IEEE Trans. Ind. Electron., vol. 65, no. 5, pp. 4392–4400,
pixel-level crack detection and measurement using fully convolutional
May 2018.
network,” Comput.-Aided Civil Infrastruct. Eng., vol. 33, no. 12,
pp. 1090–1109, Dec. 2018. [32] S. Park, S. Bang, H. Kim, and H. Kim, “Patch-based crack detection
in black box images using convolutional neural networks,” J. Comput.
[8] H. D. Cheng, J. Chen, C. Glazier, and Y. Hu, “Novel approach to
Civil Eng., vol. 33, no. 3, May 2019, Art. no. 04019017.
pavement cracking detection based on fuzzy set theory,” J. Comput.
Civil Eng., vol. 13, no. 4, pp. 270–280, 1999. [33] Z. Tong, J. Gao, Z. Han, and Z. Wang, “Recognition of asphalt pavement
[9] K. Wang, Q. Li, and W. Gong, “Wavelet-based pavement distress image crack length using deep convolutional neural networks,” Road Mater.
edge detection with a trous algorithm,” Transp. Res. Rec., vol. 2024, Pavement Des., vol. 19, no. 6, pp. 1334–1349, Aug. 2018.
no. 1, pp. 24–32, 2000. [34] H. Nhat-Duc, Q.-L. Nguyen, and V.-D. Tran, “Automatic recognition
[10] H. Oliveira and P. L. Correia, “Automatic road crack segmentation using of asphalt pavement cracks using Metaheuristic optimized edge detec-
entropy and image dynamic thresholding,” in Proc. 17th Eur. Signal tion algorithms and convolution neural network,” Automat. Construct.,
Process. Conf., Glasgow, U.K., Aug. 2009, pp. 622–626. vol. 94, pp. 203–213, Oct. 2018.
[11] Q. Zou, Y. Cao, Q. Li, Q. Mao, and S. Wang, “CrackTree: Automatic [35] K. Gopalakrishnan, S. K. Khaitan, A. Choudhary, and A. Agrawal,
crack detection from pavement images,” Pattern Recognit. Lett., vol. 33, “Deep convolutional neural networks with transfer learning for com-
no. 3, pp. 227–238, Feb. 2012. puter vision-based data-driven pavement distress detection,” Construct.
[12] Y. Huang, “Automatic inspection of pavement cracking distress,” J. Elec- Building Mater., vol. 157, pp. 322–330, Dec. 2017.
tron. Imag., vol. 15, no. 1, Jan. 2006, Art. no. 013017. [36] Q. Zou, Z. Zhang, Q. Li, X. Qi, Q. Wang, and S. Wang, “DeepCrack:
[13] G. Li, S. He, Y. Ju, and K. Du, “Long-distance precision inspection Learning hierarchical convolutional features for crack detection,” IEEE
method for bridge cracks with image processing,” Automat. Construct., Trans. Image Process., vol. 28, no. 3, pp. 1498–1512, Mar. 2019.
vol. 41, pp. 83–95, May 2014. [37] I. J. Goodfellow et al., “Generative adversarial nets,” in Proc. NIPS,
[14] J. Chen, M. Su, R. Cao, S. Hu, and C. Lu, “A self organizing map 2014, pp. 2672–2680.
optimization based image recognition and processing model for bridge [38] M. Mirza and S. Osindero, “Conditional generative adver-
crack inspection,” Automat. Construct., vol. 73, pp. 58–66, Jan. 2007. sarial nets,” 2014, arXiv:1411.1784. [Online]. Available:
[15] L. Wu, S. Mokhtari, A. Nazef, B. Nam, and H.-B. Yun, “Improvement of http://arxiv.org/abs/1411.1784
crack-detection accuracy using a novel crack defragmentation technique [39] A. Radford, L. Metz, and S. Chintala, “Unsupervised representation
in image-based road assessment,” J. Comput. Civil Eng., vol. 30, no. 1, learning with deep convolutional generative adversarial networks,” 2015,
Jan. 2016, Art. no. 04014118. arXiv:1511.06434. [Online]. Available: https://arxiv.org/abs/1511.06434
[16] T. S. Nguyen, S. Begot, F. Duculty, and M. Avila, “Free-form anisotropy: [40] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation
A new method for crack detection on pavement surface images,” in Proc. with conditional adversarial networks,” in Proc. IEEE Conf. Comput. Vis.
18th IEEE Int. Conf. Image Process., Sep. 2011, pp. 1069–1072. Pattern Recognit. (CVPR), Jul. 2017, pp. 1125–1134.
[41] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired Image-to-Image translation using cycle-consistent adversarial networks,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 2223–2232.
[42] S. Jialin Pan and Q. Yang, “A survey on transfer learning,” IEEE Trans. Knowl. Data Eng., vol. 22, no. 10, pp. 1345–1359, Oct. 2010.
[43] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, “How transferable are features in deep neural networks?” in Proc. NIPS, Montreal, QC, Canada, 2014, pp. 3320–3328.
[44] M. Oquab, L. Bottou, I. Laptev, and J. Sivic, “Learning and transferring mid-level image representations using convolutional neural networks,” in Proc. CVPR, New York, NY, USA, Jun. 2014, pp. 1717–1724.
[45] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in Proc. CVPR, Jun. 2009, pp. 248–255.
[46] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. NIPS, 2012, pp. 1097–1105.
[47] L. C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “Semantic image segmentation with deep convolutional nets and fully connected CRFs,” 2014, arXiv:1412.7062. [Online]. Available: https://arxiv.org/abs/1412.7062
[48] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent., 2015, pp. 234–241.
[49] S. Xie and Z. Tu, “Holistically-nested edge detection,” in Proc. ICCV, Dec. 2015, pp. 1395–1403.
[50] F. Yu and V. Koltun, “Multi-scale context aggregation by dilated convolutions,” in Proc. ICLR, 2016, pp. 1–13.
[51] K. Zhang, Y. Zhang, and H. D. Cheng, “Self-supervised structure learning for crack detection based on cycle-consistent generative adversarial networks,” J. Comput. Civil Eng., vol. 34, no. 3, May 2020, Art. no. 04020004.
[52] L. Tran, X. Yin, and X. Liu, “Disentangled representation learning GAN for pose-invariant face recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 1415–1424.
[53] V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in Proc. ICML, 2010, pp. 1–8.
[54] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” 2014, arXiv:1412.6980. [Online]. Available: http://arxiv.org/abs/1412.6980
[55] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2011 Results. Accessed: 2018. [Online]. Available: http://www.pascalnetwork.org/challenges/VOC/voc2011
[56] Y.-C. Tsai and A. Chatterjee, “Comprehensive, quantitative crack detection algorithm performance evaluation system,” J. Comput. Civil Eng., vol. 31, no. 5, Sep. 2017, Art. no. 04017047.
[57] H. Oliveira and P. L. Correia, “CrackIT—An image processing toolbox for crack detection and characterization,” in Proc. IEEE Int. Conf. Image Process. (ICIP), Paris, France, Oct. 2014, pp. 798–802.
[58] H. Li, D. Song, Y. Liu, and B. Li, “Automatic pavement crack detection by multi-scale image fusion,” IEEE Trans. Intell. Transp. Syst., vol. 20, no. 6, pp. 2025–2036, Jun. 2019, doi: 10.1109/TITS.2018.2856928.

Kaige Zhang (Member, IEEE) received the B.S. degree in electronic engineering from the Harbin Institute of Technology, China, in 2011, the M.S. degree in signal and information processing from Harbin Engineering University, China, in 2014, and the Ph.D. degree in computer science from Utah State University, USA, in 2019. His research interests include computer vision, machine learning, and their applications to intelligent transportation systems, precision agriculture, and biomedical data analytics.

Yingtao Zhang (Member, IEEE) received the M.S. degree in computer science and the Ph.D. degree in pattern recognition and intelligence system from the Harbin Institute of Technology, Harbin, China, in 2004 and 2010, respectively. She is currently an Associate Professor with the School of Computer Science and Technology, Harbin Institute of Technology. Her research interests include pattern recognition, computer vision, and medical image processing.

Heng-Da Cheng (Life Senior Member, IEEE) received the Ph.D. degree in electrical engineering from Purdue University, West Lafayette, IN, in 1985, under the supervision of Prof. K. S. Fu.
He is currently a Full Professor with the Department of Computer Science, and an Adjunct Full Professor with the Department of Electrical Engineering, Utah State University, Logan, UT. He is also an Adjunct Professor and a Ph.D. Supervisor with the Harbin Institute of Technology. He is also a Guest Professor with the Institute of Remote Sensing Application, Chinese Academy of Sciences, Wuhan University, and Shantou University, and a Visiting Professor with Northern Jiaotong University, Huazhong Science and Technology University, and Huanan Normal University. He has authored over 350 technical articles. He is also the Co-Editor of the book entitled Pattern Recognition: Algorithms, Architectures, and Applications (World Scientific Publishing Company, 1991). His research interests include image processing, pattern recognition, computer vision, artificial intelligence, medical information processing, fuzzy logic, genetic algorithms, neural networks, parallel processing, parallel algorithms, and VLSI architectures.
Dr. Cheng was the General Chair of the Eighth JCIS in 2005, the Ninth JCIS in 2006, the Tenth JCIS in 2007, and the 11th Joint Conference on Information Sciences (JCIS) in 2008. He has served as a Program Committee Member and the Session Chair for many conferences, and as a Reviewer for many scientific journals and conferences. He has been listed in Who’s Who in the World, Who’s Who in America, and Who’s Who in Communications and Media. He is also an Associate Editor of Pattern Recognition, Information Sciences, and New Mathematics and Natural Computation.