
Zhang 2016

The document proposes a deep convolutional neural network approach for detecting cracks in pavement images. A convolutional neural network is trained on labeled image patches to classify patches as containing cracks or not. The network architecture and training process are described. Evaluation on a dataset of over 500 pavement images shows the deep learning method provides superior crack detection compared to existing hand-crafted feature methods.


ROAD CRACK DETECTION USING DEEP CONVOLUTIONAL NEURAL NETWORK

Lei Zhang, Fan Yang, Yimin Daniel Zhang, and Ying Julie Zhu

Department of Electrical and Computer Engineering, Temple University, Philadelphia, PA 19122, USA

ABSTRACT

Automatic detection of pavement cracks is an important task in transportation maintenance for driving safety assurance. However, it remains a challenging task due to the intensity inhomogeneity of cracks and the complexity of the background, e.g., the low contrast with the surrounding pavement and possible shadows with similar intensity. Inspired by recent success in applying deep learning to computer vision and medical problems, a deep-learning based method for crack detection is proposed in this paper. A supervised deep convolutional neural network is trained to classify each image patch in the collected images. Quantitative evaluation conducted on a data set of 500 images of size 3264 × 2448, collected by a low-cost smart phone, demonstrates that the learned deep features with the proposed deep learning framework provide superior crack detection performance when compared with features extracted with existing hand-crafted methods.

Index Terms: Deep learning, convolutional neural networks, road crack detection, road survey

1. INTRODUCTION

Keeping roads in good condition is vital to safe driving and is an important task of both state and local transportation maintenance departments. One important component of this task is to monitor the degradation of road conditions, which is labor intensive and requires domain expertise. Recently, computer vision and machine learning techniques have been successfully applied to automate road surface survey [1–5]. In this work, we focus on detecting cracks on the pavement surface, because they represent the most prevalent type of road damage and exhibit strong texture cues. The large body of recent literature on crack detection and characterization of pavement surface distresses clearly demonstrates an increasing interest in this research area [3, 4, 6, 7].

The traditional framework for crack detection designs a variety of gradient features for each image pixel, which are followed by a binary classifier to determine whether an image pixel contains a crack or not. A local binary patterns (LBP) based algorithm for crack detection is developed in [8], whereas a crack detection method using the Gabor filter is proposed in [3]. In [4], an automatic crack detection method based on the tree structure, referred to as CrackTree, is introduced. A fully integrated system for crack detection and characterization is proposed in [9], and a comprehensive set of image processing algorithms for detection and characterization of road pavement surface crack distresses is introduced in [5]. Although hand-crafted features are widely used and support top-ranking algorithms on well-acquired data sets [4, 5, 10], it is important to note that they are not discriminative enough to differentiate cracks from complex background in low-level image cues.

On the other hand, the impressive performance achieved in many medical imaging and computer vision tasks has showcased the effectiveness of deep features learned by deep neural networks [11–16], which are likely to replace the conventional hand-crafted features [17]. Restricted Boltzmann machines (RBMs), autoencoders and their variants are popular for unsupervised deep learning when the number of labelled examples is small, while deep convolutional neural networks (ConvNets) are popular for feature learning and supervised classification [17]. Such promising results motivate the application of deep learning techniques to crack detection problems.

Successful application of deep learning techniques to crack detection relies on discriminative and representative deep features. In this paper, we develop a novel crack detection method in which the discriminative features are learned directly from raw image patches using the ConvNets. To the best of our knowledge, this work is the first attempt to bridge the gap between deep convolutional neural networks and transportation research. The proposed approach differs from recent works on crack detection in the following four important aspects: 1) The proposed approach leverages deep learning based detectors instead of filter-based detectors as in [3]; 2) It does not make any assumption about the geometry of the pavement as required in [10]; 3) We use discriminative features, which are automatically learned from images, rather than hand-crafted features [8, 10]; 4) Unlike existing methods that require specific optical devices [5, 9], the proposed approach is successfully applied to images with complex background that are collected using a low-cost smart phone.

2. PROPOSED METHOD

Given a pavement image, the objective of a crack detection problem is to determine whether a specific pixel is a part of a crack. To solve this problem, the proposed solution is based on a ConvNet, which is trained on square image patches with given ground truth information, for the classification of patches with and without cracks. For notational convenience, crack and non-crack patches are also referred to as positive and negative patches, respectively. In this paper, a patch whose center is itself a crack pixel, or is within the close vicinity of a crack pixel, is considered as a positive patch. Otherwise, this patch is considered as a negative patch.

2.1. Data preparation

A data set of more than 500 pavement pictures of size 3264 × 2448 was collected at the Temple University campus by using a smart phone as the data sensor. Each image is annotated by multiple annotators. In this study, to achieve a good compromise between computational cost and accuracy of the detection results [12, 13], each sample is a 3-channel (RGB) 99×99 pixel image patch generated by the sampling strategy described in the following steps:

1. A patch whose center is within f = 5 pixels of the crack centroid is regarded as a positive patch; otherwise it is considered as a negative patch.
2. To reduce the similarity between training samples, the overlap of two positive patches $P_1$ and $P_2$, expressed as $O = \mathrm{area}(P_1 \cap P_2)/\mathrm{area}(P_1 \cup P_2)$, should be kept at a low level. In this study, we choose the distance between the centers of two adjacent patches to be d = 0.75w, where w is the width of a patch. For the negative patches, two adjacent patches should have no overlap.
3. Given a patch center c, each candidate patch is rotated around c by a random angle $\alpha \in [0^\circ, 360^\circ]$. This plays an important role in increasing the number of crack samples because crack patches make up only a small proportion of the collected images.

Out of the samples generated by the above steps, 640,000 samples are used as the training set, 160,000 samples are used as the validation set for cross-validation when training the ConvNets, and 200,000 samples are used as the testing samples. The numbers of crack and non-crack patches are set to be equal in all three data sets.

2.2. ConvNet Architecture

The architecture of the ConvNet is illustrated in Fig. 1, where conv, mp, and fc represent convolutional, max-pooling and fully-connected layers, respectively. In general, the ConvNet is considered as a hierarchical feature extractor, which extracts features of different abstract levels and maps raw pixel intensities of the crack patch into a feature vector by several fully connected layers. All parameters are jointly optimized through minimization of the misclassification error over the training set via the back propagation method [18].

Fig. 1: Illustration of the architecture of the proposed ConvNet.

All convolutional filter kernel elements are trained from the data in a supervised fashion by learning from the labeled set of examples introduced in Section 2.1. In each convolutional layer, the ConvNet performs max-pooling operations in order to summarize feature responses across neighboring pixels. Such operations allow the ConvNet to learn features that are spatially invariant, i.e., they do not change with respect to the location of objects in the images. Finally, fully-connected layers are used for classification. Due to the mutually exclusive property of the underlying crack detection problem (crack or non-crack), a softmax layer is used as the last layer of the ConvNets to compute the probability of each class given an input patch.

Let $S = \{x^{(i)}, y^{(i)}\}$ be a training set that contains m image patches, where $x^{(i)}$ is the i-th image patch and $y^{(i)} \in \{0, 1\}$ is the corresponding class label. If $y^{(i)} = 1$, then $x^{(i)}$ is a positive patch; otherwise $x^{(i)}$ is a negative patch. Let $z_j^{(i)}$ be the output of unit j in the last layer for $x^{(i)}$. Then, the probability that the label $y^{(i)}$ of $x^{(i)}$ is j can be calculated by

$$p\left(y^{(i)} = j \mid z_j^{(i)}\right) = \frac{e^{z_j^{(i)}}}{\sum_{l=1}^{k} e^{z_l^{(i)}}}, \tag{1}$$

and the corresponding cost function is given by

$$J = -\frac{1}{m}\left[\sum_{i=1}^{m}\sum_{j=1}^{k} 1\left\{y^{(i)} = j\right\} \log \frac{e^{z_j^{(i)}}}{\sum_{l=1}^{k} e^{z_l^{(i)}}}\right] \tag{2}$$

where k = 2, m is the total number of the patches, and $1\{\cdot\}$ stands for the indicator function.
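
As a minimal sketch of Eqs. (1) and (2), the following pure-Python snippet computes the softmax probabilities and the cost for a handful of patches. The last-layer outputs and labels are made-up values, the class index is 0-based here (the equations count j from 1), and subtracting the maximum before exponentiating is a common numerical safeguard rather than something stated in the paper.

```python
import math


def softmax(z):
    """Eq. (1): map the k last-layer outputs of one patch to class probabilities."""
    shift = max(z)  # assumption: standard stabilisation, does not change the result
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cost(outputs, labels):
    """Eq. (2): mean negative log-probability of the true class over m patches.

    outputs: m lists, each with the k last-layer outputs of one patch.
    labels:  m class indices in range(k) (0-based, an illustrative choice).
    """
    m = len(outputs)
    log_likelihood = 0.0
    for z, y in zip(outputs, labels):
        # The indicator 1{y = j} keeps only the term of the true class.
        log_likelihood += math.log(softmax(z)[y])
    return -log_likelihood / m


# Hypothetical outputs for three patches with k = 2 classes.
outputs = [[0.0, 0.0], [2.0, -1.0], [-0.5, 1.5]]
labels = [0, 0, 1]
probs = [softmax(z) for z in outputs]
J = cost(outputs, labels)
```

With k = 2 the softmax reduces to a logistic function of the difference between the two outputs, which is why equal outputs give a probability of 0.5 for each class.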

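The overlap measure O of step 2 in Section 2.1 can be checked with a short calculation. The sketch below assumes two axis-aligned square patches whose centers are shifted along one image axis only and ignores the random rotation of step 3; both are simplifications made here for illustration. The helper name `overlap_ratio` is hypothetical.

```python
def overlap_ratio(w, dx, dy=0.0):
    """O = area(P1 ∩ P2) / area(P1 ∪ P2) for two axis-aligned w-by-w patches
    whose centers differ by (dx, dy)."""
    inter_w = max(0.0, w - abs(dx))  # width of the shared strip, zero if disjoint
    inter_h = max(0.0, w - abs(dy))
    inter = inter_w * inter_h
    union = 2.0 * w * w - inter
    return inter / union


w = 99                 # patch width used in the paper
d = 0.75 * w           # spacing between adjacent positive patch centers
O_positive = overlap_ratio(w, d)
O_negative = overlap_ratio(w, w)  # assumed spacing of one full width for negatives
```

Under these assumptions the shared strip is 0.25w wide, so O = 0.25 / 1.75 = 1/7, i.e. roughly 14% overlap between adjacent positive patches, while a spacing of at least w gives the zero overlap required for negative patches.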

2.3. ConvNet Training

In training the ConvNet, the aim is to increase the variation of the training data and to avoid overfitting to the training data set. The dropout method is used between two fully connected layers to reduce overfitting by preventing complex co-adaptations on training data [19]. The output of each neuron is set to zero with a probability of 0.5.

The training of the ConvNet is accelerated by graphics processing units (GPUs). Further speed-ups are achieved by using rectified linear units (ReLU) as the activation function [14], which is more effective than the hyperbolic tangent function $\tanh(x)$ and the sigmoid function $(1+e^{-x})^{-1}$ used in traditional neuron models, in both training and evaluation phases. The ConvNets are trained using the stochastic gradient descent (SGD) method with a batch size of 48 examples, momentum of 0.9, and weight decay of 0.0005. Fewer than 20 epochs are needed to reach a minimum on the validation set.

2.4. Processing a Testing Image

To process a testing image, the ConvNet assigns to each point in the image, taken as a patch center, a probability of being crack or non-crack. This procedure yields a probability map. Inspired by the method proposed in [11], the probability of a point can be calculated by averaging the probabilities $\{P_1, ..., P_N\}$ of the patches generated by randomly rotating the patch around its center pixel c, i.e.,

$$p\left(c \mid \{P_1(c), ..., P_N(c)\}\right) = \frac{1}{N}\sum_{i=1}^{N} P_i(c), \tag{3}$$

where $P_i(c)$ is the classification probability of the ConvNet computed for the i-th individual patch, and N is set to 5 for computational efficiency. The ConvNet has a higher number of degrees of freedom and thus tends to exhibit a large variance and a small bias [13]. In addition, the number of crack patches in an image is far smaller than that of background patches, whereas the training set contains equal numbers of both (Section 2.1). This mismatch makes the ConvNet likely to overestimate the crack probability. Therefore, an appropriate threshold has to be used. Define the precision and recall as

$$P = \frac{\text{true positive}}{\text{true positive} + \text{false positive}}, \tag{4}$$

$$R = \frac{\text{true positive}}{\text{true positive} + \text{false negative}}. \tag{5}$$

Then, the F1 score is expressed as

$$F_1 = \frac{2PR}{P + R}. \tag{6}$$

The threshold used to re-estimate the final probability is determined such that it yields the largest F1 score on the validation data set [13]. In this study, the threshold t is set to 0.64, at which the F1 score is maximized.

3. EXPERIMENTAL EVALUATION

All experiments are performed using an Intel(R) Xeon(R) E3-1241 V3 @ 3.5GHz CPU with 8 GB RAM and NVidia Quadro K220 GPU. The ConvNet was constructed via the Caffe [20] framework and trained by using 5-fold cross-validation. The proposed method is compared against the support vector machine (SVM) and the Boosting methods. The SVM is trained with LIBSVM [21] and the Gaussian radial basis function (RBF) kernel is used with C and γ determined using 5-fold cross-validation. The Boosting method [22], composed of 100 weak classifiers with a maximum depth of 5, is trained via the OpenCV toolkit. The parameters with the minimal test error in 5-fold cross-validation are used for comparison. The features for training the SVM and the Boosting are based on the color and texture of each patch, which are associated with a binary label indicating the presence or absence of cracked pavement. The feature vector is 93-dimensional, and is composed of color elements, histograms of textons and the LBP descriptor within the patch. The detailed description of the feature vector is shown in Table 1. Some of the features are adopted from [23] and [10]. Different from [10], the geometry information is not considered in this work, since we aim to provide a crack detection method without specific geometry information.

Table 1: Hand-crafted features of image patches

Feature description     Number
Mean RGB                3
HSV for mean RGB        3
Hue histogram           5
Saturation histogram    3
LBP                     59
Texton histogram        20

The receiver operating characteristic (ROC) curves are shown in Fig. 2 and a summary of the statistics is given in Table 2. It is clear from these results that the ConvNet outperforms the other two detectors.

Fig. 2: ROC curves.

Table 2: Performance comparison of different methods

Method      Precision   Recall   F1 score
SVM         0.8112      0.6734   0.7359
Boosting    0.7360      0.7587   0.7472
ConvNets    0.8696      0.9251   0.8965

Figs. 3 and 4 show the images, together with the respective probability of correct classification, of selected patches that are correctly classified only by the proposed method based on ConvNet. These results demonstrate that the discriminative features learned from the ConvNet outperform the hand-crafted features in describing complex patch context.

Fig. 3: Detection of crack: test probabilities of the ConvNet for being crack. TP denotes true positive.

Fig. 4: Detection of non-crack: test probabilities of the ConvNet for being non-crack. TN denotes true negative.

We further compare the proposed method with the SVM and the Boosting methods using images of size 300×300. Cracks are detected by the trained ConvNet, SVM and Boosting method on a sliding window with a step of 1 pixel. If a window lies partly outside of the image boundary, the missing pixels are synthesized by mirroring. Fig. 5 shows the crack detection results for three different scenes. For each scene, each row shows the original image with crack, the ground truth, the probability maps generated by the SVM and the Boosting methods, and that generated by the ConvNet. The pixels in green and in blue denote the crack and the non-crack, respectively, and a higher brightness means a higher confidence. The SVM cannot distinguish the crack from the background, and some of the cracks have been misclassified. Compared to the SVM, the Boosting method can detect the cracks with a higher accuracy. However, some of the background patches are classified as cracks, resulting in isolated green parts in Fig. 5. In contrast to these two methods, the proposed method provides superior performance in correctly distinguishing crack patches from background ones.

Fig. 5: Probability maps for (a) Scene 1, (b) Scene 2 and (c) Scene 3. Columns: Original, Ground truth, SVM, Boosting, Proposed.

4. CONCLUSIONS

We proposed an automatic detection method based on deep convolutional neural networks in which the features are automatically learned from manually annotated image patches acquired by a low-cost sensor, i.e., a smart phone. To the best of our knowledge, this is the first study that applies a deep-learning based method to the road crack detection problem. In the future, we will optimize the proposed detection method and build an integrated low-cost system for real-time road crack detection.

5. ACKNOWLEDGEMENTS

The first author would like to thank Dr. Wangmeng Zuo and Dr. Feng Li from Harbin Institute of Technology for their helpful discussions.
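
The test-time procedure of Section 2.4 (averaging over N rotated patches as in Eq. (3), then choosing the threshold with the largest F1 score from Eqs. (4) to (6)) can be sketched in plain Python as follows. The per-rotation probabilities, the ground-truth labels and the grid of candidate thresholds are invented for illustration; the paper does not state how the candidate thresholds were enumerated.

```python
def average_probability(patch_probs):
    """Eq. (3): mean of the N per-rotation crack probabilities P_i(c)."""
    return sum(patch_probs) / len(patch_probs)


def f1_at_threshold(probs, truths, t):
    """Eqs. (4)-(6) under the decision rule 'crack if probability >= t'."""
    tp = sum(1 for p, y in zip(probs, truths) if p >= t and y == 1)
    fp = sum(1 for p, y in zip(probs, truths) if p >= t and y == 0)
    fn = sum(1 for p, y in zip(probs, truths) if p < t and y == 1)
    if tp == 0:
        return 0.0  # no true positives: report F1 as zero instead of dividing by zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(probs, truths, candidates):
    """Return the candidate with the largest F1 (the first one on ties)."""
    return max(candidates, key=lambda t: f1_at_threshold(probs, truths, t))


# Made-up validation points, each with N = 5 rotated-patch probabilities.
rotated = [
    [0.90, 0.80, 0.95, 0.85, 0.90],  # crack
    [0.70, 0.60, 0.65, 0.70, 0.60],  # crack
    [0.60, 0.50, 0.55, 0.60, 0.50],  # background that scores fairly high
    [0.20, 0.10, 0.15, 0.20, 0.10],  # background
]
truths = [1, 1, 0, 0]
probs = [average_probability(r) for r in rotated]
candidates = [i / 100 for i in range(1, 100)]  # hypothetical grid 0.01 .. 0.99
t_best = best_threshold(probs, truths, candidates)
```

In this toy data a threshold near 0.5 would still accept the high-scoring background point, so the F1-optimal threshold lands above it, which mirrors the paper's choice of a threshold larger than 0.5 (t = 0.64).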

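The sliding-window evaluation of Section 3 synthesizes pixels outside the image by mirroring. The paper does not say which mirroring convention is used, so the sketch below assumes reflection about the border pixel without repeating it; `mirror_index` and `extract_patch` are hypothetical helper names, and the toy image stands in for the 300×300 test images and 99×99 windows.

```python
def mirror_index(i, n):
    """Map any integer index onto [0, n) by reflecting at the borders
    (border pixel not repeated; one common convention, assumed here)."""
    if n == 1:
        return 0
    period = 2 * (n - 1)
    i %= period
    return i if i < n else period - i


def extract_patch(image, row, col, size):
    """Return the size-by-size window centered at (row, col).

    image: list of rows (single channel for brevity); size must be odd.
    Pixels that fall outside the image are taken from mirrored positions.
    """
    height, width = len(image), len(image[0])
    half = size // 2
    return [
        [image[mirror_index(r, height)][mirror_index(c, width)]
         for c in range(col - half, col + half + 1)]
        for r in range(row - half, row + half + 1)
    ]


# Toy 3x4 image; a step of 1 pixel yields one window per pixel.
image = [[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11]]
corner = extract_patch(image, 0, 0, 3)
windows = [extract_patch(image, r, c, 3) for r in range(3) for c in range(4)]
```

Because every pixel gets its own window, the probability map has the same size as the input image, at the cost of one ConvNet evaluation per pixel (times N if the rotation averaging of Eq. (3) is applied).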

6. REFERENCES

[1] A. Jahangiri, H.A. Rakha, and T.A. Dingus, “Adopting machine learning methods to predict red-light running violations,” in Proceedings of IEEE International Conference on Intelligent Transportation Systems, Sept. 2015, pp. 650–655.
[2] A. Jahangiri and H.A. Rakha, “Applying machine learning techniques to transportation mode recognition using mobile phone sensor data,” IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 5, pp. 2406–2417, 2015.
[3] M. Salman, S. Mathavan, K. Kamal, and M. Rahman, “Pavement crack detection using the Gabor filter,” in Proceedings of IEEE International Conference on Intelligent Transportation Systems, Oct. 2013, pp. 2039–2044.
[4] Q. Zou, Y. Cao, Q. Li, Q. Mao, and S. Wang, “Cracktree: Automatic crack detection from pavement images,” Pattern Recognition Letters, vol. 33, no. 3, pp. 227–238, 2012.
[5] H. Oliveira and P.L. Correia, “Crackit-an image processing toolbox for crack detection and characterization,” in Proceedings of IEEE International Conference on Image Processing, Oct. 2014, pp. 798–802.
[6] S. Mathavan, K. Kamal, and M. Rahman, “A review of three-dimensional imaging technologies for pavement distress detection and measurements,” IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 5, pp. 2353–2362, 2015.
[7] R. Medina, J. Llamas, E. Zalama, and J. Gomez-Garcia-Bermejo, “Enhanced automatic detection of road surface cracks by combining 2d/3d image processing techniques,” in Proceedings of IEEE International Conference on Image Processing, 2014, pp. 778–782.
[8] Y. Hu and C. Zhao, “A local binary pattern based methods for pavement crack detection,” Journal of Pattern Recognition Research, vol. 5, no. 1, pp. 140–147, 2010.
[9] H. Oliveira and P. L. Correia, “Automatic road crack detection and characterization,” IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 1, pp. 155–168, 2013.
[10] S. Varadharajan, S. Jose, K. Sharma, L. Wander, and C. Mertz, “Vision for road inspection,” in Proceedings of 2014 IEEE Winter Conference on Applications of Computer Vision, 2014, pp. 115–122.
[11] H. Roth, L. Lu, J. Liu, J. Yao, A. Seff, C. Kevin, L. Kim, and R. Summers, “Improving computer-aided detection using convolutional neural networks and random view aggregation,” IEEE Transactions on Medical Imaging, 2015.
[12] D. Ciresan, A. Giusti, L. M. Gambardella, and J. Schmidhuber, “Deep neural networks segment neuronal membranes in electron microscopy images,” in Advances in Neural Information Processing Systems, 2012, pp. 2843–2851.
[13] D. C. Cireşan, A. Giusti, L. M. Gambardella, and J. Schmidhuber, “Mitosis detection in breast cancer histology images with deep neural networks,” in Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2013, pp. 411–418.
[14] A. Krizhevsky, I. Sutskever, and G. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[15] Y. Zhang, K. Sohn, R. Villegas, G. Pan, and H. Lee, “Improving object detection with deep convolutional networks via Bayesian optimization and structured prediction,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 249–258.
[16] J.J. Kivinen, C. K. Williams, and N. Heess, “Visual boundary prediction: A deep neural prediction network and quality dissection,” in Proceedings of International Conference on Artificial Intelligence and Statistics, 2014, pp. 512–521.
[17] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[18] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[19] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
[20] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, “Caffe: Convolutional architecture for fast feature embedding,” arXiv preprint arXiv:1408.5093, 2014.
[21] C. Chang and C. Lin, “Libsvm: A library for support vector machines,” ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, pp. 27, 2011.
[22] Y. Freund and R. Schapire, “A short introduction to boosting,” Journal-Japanese Society For Artificial Intelligence, vol. 14, no. 771-780, pp. 1612, 1999.
[23] D. Hoiem, A. Efros, and M. Hebert, “Geometric context from a single image,” in Proceedings of International Conference on Computer Vision, 2005, vol. 1, pp. 654–661.


