Lei Zhang, Fan Yang, Yimin Daniel Zhang, and Ying Julie Zhu
Department of Electrical and Computer Engineering, Temple University, Philadelphia, PA 19122, USA
crack. To solve this problem, the proposed solution is based on a ConvNet, which is trained on square image patches with given ground-truth information, to classify patches with and without cracks. For notational convenience, crack and non-crack patches are also referred to as positive and negative patches, respectively. In this paper, a patch whose center is itself a crack pixel, or lies within the close vicinity of a crack pixel, is considered a positive patch; otherwise, the patch is considered a negative patch.
2.1. Data preparation training set via the back propagation method [18].
Data set with more than 500 pavement pictures of size 3264 All convolutional filter kernel elements are trained from
× 2448 are collected at the Temple University campus by us- the data in a supervised fashion by learning from the labeled
ing a smart phone as the data sensor. Each image is anno- set of examples introduced in Section 2.1. In each convolu-
tated by multiple annotators. In this study, to achieve a good tional layer, the ConvNet performs max-pooling operations in
compromise between computational cost and accuracy of the order to summarize feature responses across neighboring pix-
detection results [12, 13], each sample is a 3-channel (RGB) els. Such operations allow the ConvNet to learn features that
99×99 pixel image patch generated by the sampling strategy are spatially invariant, i.e., they do not change with respect to
described in the following steps: the location of objects in the images. Finally, fully-connected
1. A patch whose center is within f = 5 pixels of the crack centroid is regarded as a positive patch; otherwise, it is considered a negative patch.

2. To reduce the similarity between training samples, the overlap of two positive patches P1 and P2, expressed as O = area(P1 ∩ P2)/area(P1 ∪ P2), should be kept at a low level. In this study, we choose the distance between the centers of two adjacent patches to be d = 0.75w, where w is the width of a patch (see the sketch after this list). For the negative patches, two adjacent patches should have no overlap.

3. Given a patch center c, each candidate patch is rotated around c by a random angle α ∈ [0°, 360°]. This plays an important role in increasing the number of crack samples, because crack patches constitute only a small proportion of the collected images.
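To make the overlap criterion of step 2 concrete, the following minimal Python sketch (the helper name and the axis-aligned-patch assumption are ours, not from the paper) computes O for two w × w patches and confirms that a center spacing of d = 0.75w keeps the overlap near 0.14:

```python
def patch_overlap(c1, c2, w):
    """Overlap O = area(P1 ∩ P2) / area(P1 ∪ P2) for two axis-aligned
    w x w patches centered at c1 and c2 (hypothetical helper)."""
    ix = max(0.0, w - abs(c1[0] - c2[0]))  # intersection width
    iy = max(0.0, w - abs(c1[1] - c2[1]))  # intersection height
    inter = ix * iy
    union = 2.0 * w * w - inter
    return inter / union

w = 99.0
# Two adjacent positive patches spaced d = 0.75w apart (step 2):
print(patch_overlap((0, 0), (0.75 * w, 0), w))  # 0.25/1.75 ~= 0.143
# Adjacent negative patches spaced at least w apart have no overlap:
print(patch_overlap((0, 0), (w, 0), w))         # 0.0
```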
Out of the samples generated by the above steps, 640,000 are used as the training set, 160,000 as the validation set for cross-validation when training the ConvNets, and 200,000 as the testing set. The numbers of crack and non-crack patches are set equal in all three data sets.
2.2. ConvNet Architecture

The architecture of the ConvNet is illustrated in Fig. 1, where conv, mp, and fc represent convolutional, max-pooling, and fully-connected layers, respectively. In general, the ConvNet is considered a hierarchical feature extractor, which extracts features at different levels of abstraction and maps the raw pixel intensities of the crack patch into a feature vector through several fully connected layers. All parameters are jointly optimized by minimizing the misclassification error over the training set via the back-propagation method [18].

All convolutional filter kernel elements are trained from the data in a supervised fashion by learning from the labeled set of examples introduced in Section 2.1. After each convolutional layer, the ConvNet performs max-pooling operations in order to summarize feature responses across neighboring pixels. Such operations allow the ConvNet to learn features that are spatially invariant, i.e., they do not change with the location of objects in the images. Finally, fully-connected layers are used for classification. Because the two classes of the underlying crack detection problem (crack or non-crack) are mutually exclusive, a softmax layer is used as the last layer of the ConvNet to compute the probability of each class given an input patch.
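Fig. 1 is not reproduced here, so the concrete channel counts and kernel sizes below are illustrative assumptions rather than the paper's configuration; the sketch only mirrors the conv → mp → fc → softmax pattern described above, using PyTorch for illustration (the authors used Caffe [20]):

```python
import torch
import torch.nn as nn

# A minimal conv/mp/fc stack for 99x99 RGB patches and two classes
# (crack / non-crack). Layer sizes are assumed, not the paper's.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=5),   # conv1: 3x99x99  -> 16x95x95
    nn.ReLU(),
    nn.MaxPool2d(2),                   # mp1:   16x95x95 -> 16x47x47
    nn.Conv2d(16, 32, kernel_size=5),  # conv2: 16x47x47 -> 32x43x43
    nn.ReLU(),
    nn.MaxPool2d(2),                   # mp2:   32x43x43 -> 32x21x21
    nn.Flatten(),
    nn.Linear(32 * 21 * 21, 64),       # fc1
    nn.ReLU(),
    nn.Linear(64, 2),                  # fc2: class scores z_1, z_2
)

patch = torch.randn(1, 3, 99, 99)           # one RGB patch
probs = torch.softmax(model(patch), dim=1)  # softmax layer, cf. Eq. (1)
print(probs)                                # P(non-crack), P(crack)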
Consider a training set S = {(x^{(i)}, y^{(i)})} containing m image patches, where x^{(i)} is the i-th image patch and y^{(i)} ∈ {0, 1} is the corresponding class label: if y^{(i)} = 1, then x^{(i)} is a positive patch; otherwise, x^{(i)} is a negative patch. Let z_j^{(i)} be the output of unit j in the last layer for x^{(i)}. Then, the probability that the label y^{(i)} of x^{(i)} equals j can be calculated as

\[
p\bigl(y^{(i)} = j \mid \mathbf{z}^{(i)}\bigr) = \frac{e^{z_j^{(i)}}}{\sum_{l=1}^{k} e^{z_l^{(i)}}}, \tag{1}
\]

and the corresponding cost function is given by

\[
J = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{k} \mathbf{1}\bigl\{y^{(i)} = j\bigr\} \log \frac{e^{z_j^{(i)}}}{\sum_{l=1}^{k} e^{z_l^{(i)}}}, \tag{2}
\]

where k = 2, m is the total number of patches, and 1{·} denotes the indicator function.
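As a quick numerical check of Eqs. (1) and (2), the following sketch (our own illustration; the max-subtraction is a standard numerical-stability trick, not something the paper specifies) evaluates the softmax probabilities and the cost J for a toy batch with k = 2:

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax, Eq. (1); subtracting the row max is for stability."""
    E = np.exp(Z - Z.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def cost(Z, y):
    """Cost J of Eq. (2): the indicator picks out the true class, so J is
    the mean negative log-probability assigned to the correct label."""
    m = len(y)
    P = softmax(Z)
    return -np.log(P[np.arange(m), y]).mean()

# Toy last-layer outputs z^(i) for m = 2 patches, k = 2 classes.
Z = np.array([[2.0, -1.0],   # confident non-crack
              [0.5,  1.5]])  # leaning crack
y = np.array([0, 1])         # true labels (0-based here)
print(softmax(Z))  # per-class probabilities
print(cost(Z, y))  # ~0.18
```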
2.3. ConvNet Training

Training the ConvNet requires care to avoid overfitting to the training data set; increasing the variation of the training data (e.g., the random rotations in Section 2.1) helps in this respect. In addition, the dropout method is used between the two fully connected layers to reduce overfitting by preventing complex co-adaptations on the training data [19]: the output of each neuron is set to zero with a probability of 0.5.
The training of the ConvNet is accelerated by graphics processing units (GPUs). Further speed-ups are achieved by using rectified linear units (ReLU) as the activation function [14], which are more effective, in both the training and evaluation phases, than the hyperbolic tangent tanh(x) and the sigmoid (1 + e^{-x})^{-1} used in traditional neuron models. The ConvNets are trained using the stochastic gradient descent (SGD) method with a batch size of 48 examples, a momentum of 0.9, and a weight decay of 0.0005. Fewer than 20 epochs are needed to reach a minimum on the validation set.

Table 1: Hand-crafted features of image patches

Feature Description      Number
Mean RGB                      3
HSV for mean RGB              3
Hue histogram                 5
Saturation histogram          3
LBP                          59
Texton histogram             20
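A minimal sketch of the training configuration described above, again in PyTorch rather than the authors' Caffe setup [20]; the batch size, momentum, weight decay, dropout probability, and ReLU come from the text, while the learning rate and the tiny stand-in model are our assumptions:

```python
import torch
import torch.nn as nn

# Stand-in model: ReLU activations and dropout (p = 0.5) between
# the two fully connected layers, as described in Section 2.3.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 99 * 99, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(64, 2),
)

# SGD with momentum 0.9 and weight decay 0.0005; the learning
# rate is not given in the paper (assumed here).
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=0.0005)
loss_fn = nn.CrossEntropyLoss()  # log-softmax + Eq. (2) in one module

# One SGD step on a dummy batch of 48 patches.
x = torch.randn(48, 3, 99, 99)
y = torch.randint(0, 2, (48,))
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
print(loss.item())
```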
Table 2: Performance comparison of different methods
4. CONCLUSIONS
6. REFERENCES

[1] A. Jahangiri, H. A. Rakha, and T. A. Dingus, "Adopting machine learning methods to predict red-light running violations," in Proceedings of IEEE International Conference on Intelligent Transportation Systems, Sept. 2015, pp. 650–655.

[2] A. Jahangiri and H. A. Rakha, "Applying machine learning techniques to transportation mode recognition using mobile phone sensor data," IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 5, pp. 2406–2417, 2015.

[3] M. Salman, S. Mathavan, K. Kamal, and M. Rahman, "Pavement crack detection using the Gabor filter," in Proceedings of IEEE International Conference on Intelligent Transportation Systems, Oct. 2013, pp. 2039–2044.

[4] Q. Zou, Y. Cao, Q. Li, Q. Mao, and S. Wang, "CrackTree: Automatic crack detection from pavement images," Pattern Recognition Letters, vol. 33, no. 3, pp. 227–238, 2012.

[5] H. Oliveira and P. L. Correia, "CrackIT: An image processing toolbox for crack detection and characterization," in Proceedings of IEEE International Conference on Image Processing, Oct. 2014, pp. 798–802.

[6] S. Mathavan, K. Kamal, and M. Rahman, "A review of three-dimensional imaging technologies for pavement distress detection and measurements," IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 5, pp. 2353–2362, 2015.

[7] R. Medina, J. Llamas, E. Zalama, and J. Gomez-Garcia-Bermejo, "Enhanced automatic detection of road surface cracks by combining 2D/3D image processing techniques," in Proceedings of IEEE International Conference on Image Processing, 2014, pp. 778–782.

[8] Y. Hu and C. Zhao, "A local binary pattern based method for pavement crack detection," Journal of Pattern Recognition Research, vol. 5, no. 1, pp. 140–147, 2010.

[9] H. Oliveira and P. L. Correia, "Automatic road crack detection and characterization," IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 1, pp. 155–168, 2013.

[10] S. Varadharajan, S. Jose, K. Sharma, L. Wander, and C. Mertz, "Vision for road inspection," in Proceedings of IEEE Winter Conference on Applications of Computer Vision, 2014, pp. 115–122.

[11] H. Roth, L. Lu, J. Liu, J. Yao, A. Seff, K. Cheng, L. Kim, and R. Summers, "Improving computer-aided detection using convolutional neural networks and random view aggregation," IEEE Transactions on Medical Imaging, 2015.

[12] D. Ciresan, A. Giusti, L. M. Gambardella, and J. Schmidhuber, "Deep neural networks segment neuronal membranes in electron microscopy images," in Advances in Neural Information Processing Systems, 2012, pp. 2843–2851.

[13] D. C. Cireşan, A. Giusti, L. M. Gambardella, and J. Schmidhuber, "Mitosis detection in breast cancer histology images with deep neural networks," in Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2013, pp. 411–418.

[14] A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.

[15] Y. Zhang, K. Sohn, R. Villegas, G. Pan, and H. Lee, "Improving object detection with deep convolutional networks via Bayesian optimization and structured prediction," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 249–258.

[16] J. J. Kivinen, C. K. Williams, and N. Heess, "Visual boundary prediction: A deep neural prediction network and quality dissection," in Proceedings of International Conference on Artificial Intelligence and Statistics, 2014, pp. 512–521.

[17] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.

[18] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.

[19] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A simple way to prevent neural networks from overfitting," The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.

[20] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, "Caffe: Convolutional architecture for fast feature embedding," arXiv preprint arXiv:1408.5093, 2014.

[21] C. Chang and C. Lin, "LIBSVM: A library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, p. 27, 2011.

[22] Y. Freund and R. Schapire, "A short introduction to boosting," Journal of Japanese Society for Artificial Intelligence, vol. 14, no. 5, pp. 771–780, 1999.

[23] D. Hoiem, A. Efros, and M. Hebert, "Geometric context from a single image," in Proceedings of International Conference on Computer Vision, 2005, vol. 1, pp. 654–661.