Multi-Task Enhanced Dam Crack Image Detection Based on Faster R-CNN (2019)
Abstract—To improve the detection accuracy for multiple small targets with the Faster R-CNN model, we propose a Multi-task Enhanced dam crack image detection method based on Faster R-CNN (ME-Faster R-CNN) that adapts to dam cracks of different lengths under different lighting environments. To solve the problem of insufficient dam crack samples, transfer learning methods are used to assist network training and data enhancement. In ME-Faster R-CNN, a ResNet-50 network first extracts features from the original images to obtain the feature map. Then, the feature map is fed into the multi-task enhanced RPN module, which generates candidate regions using anchor boxes of appropriate size and dimension. Finally, the feature map and candidate regions are processed to detect the dam cracks. Experimental results demonstrate that ME-Faster R-CNN with transfer learning obtains 82.52% average IoU and 80.08% mean average precision (mAP). Compared with the Faster R-CNN detection method with the same parameters, the average IoU and mAP increase by 1.06% and 1.56%, respectively.

Keywords-crack image detection; faster R-CNN; multi-task detection; dam safety

I. INTRODUCTION

China has built over 98,000 reservoir dams [1]. Approximately 95% of these dams have been in operation for over 20 years. Under the impact of human activities and concrete structures, various defects such as cracks, leakage, calcium compound, and deformation have occurred. These defects threaten the structural health of the dam. Cracks are one of the major defects affecting safe dam operation.

Faster R-CNN has the best comprehensive performance among the target detection algorithms based on region convolutional neural networks [2]. However, it shows low detection precision for multiple targets (a) and small targets (b), as shown in Fig. 1. To solve these problems, this paper puts forward a multi-task enhanced dam crack image detection method based on Faster R-CNN that accommodates cracks of varying lengths under different lighting environments. Compared to traditional target detection algorithms, ME-Faster R-CNN achieves outstanding precision in detecting multiple and small targets, and improves the efficiency of dam crack detection. It is a target detection framework that can truly realize end-to-end detection.

Figure 1. Test results of Faster R-CNN. (a) Multiple targets. (b) Small targets.

The main contributions of this paper are as follows:
• ME-Faster R-CNN is proposed to deal with the low detection precision for multiple small targets and with insufficient samples through a multi-task enhanced RPN module.
• The appropriate sizes and dimensions of anchor boxes are selected to enhance the local search capability of the RPN.
• The experiments demonstrate that the proposed ME-Faster R-CNN is superior in detection accuracy and improves the efficiency of dam crack detection.

This paper is organized as follows: Section II discusses the related works; Section III presents the ME-Faster R-CNN model structure; Section IV performs a series of comparative experiments and analyses the results; Section V draws the conclusion.

II. RELATED WORK

CNN-based target detection algorithms [3] can be divided into two categories: those based on region proposals and those based on regression.
RPN module behind Conv3_x of ResNet-50. Its receptive field is 146×146, so it is able to detect small targets. Then the second RPN module, with a receptive field of 229×229, is added behind Conv4_x of ResNet-50; it is used to detect big targets. Finally, the last RPN, after Conv5_x of ResNet-50, outputs the result of the generation process.

Given that each RPN outputs a separate ROI array, an ROI-Merge Layer is proposed that accepts the separate ROI arrays and outputs a single array aggregating the effective regions. To avoid repeated ROIs, Non-Maximum Suppression (NMS) [10] is adopted in our work. Among candidate regions output by different RPNs, if the IoU at corresponding positions is greater than 0.7, we consider the two ROIs to be the same. Specifically, the candidate regions output by the three RPNs carry proposal scores, and each score corresponds to the probability of being the target. At a given location, we select the candidate region with the highest score. If the IoU between the other two candidate regions at the corresponding location and the selected one is greater than 0.7, they are considered the same ROI, and only the candidate with the highest score is output by the ROI-Merge Layer at that location. After NMS, the 100 ROIs with the highest scores are selected. Hence, the ROI-Merge Layer only needs a hyper-parameter adjustment to control the number of ROIs.
[Figure: (a) Original RPN module: Input, Conv1_x to Conv5_x, RPN (619×619), ROI, ROI Pooling Layer, FC6, FC7, FC8, outputs Bbox and Class Score. Second panel (caption not recovered): Input, RPN1 (146×146), RPN2 (229×229), RPN3 (619×619), ROI-Merge Layer, ROI, ROI Pooling Layer, FC6, FC7, FC8, Class Score.]
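
The ROI-Merge procedure described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each RPN returns a list of (box, score) pairs with boxes given as (x1, y1, x2, y2), pools the three lists, and applies greedy NMS with the 0.7 IoU threshold and the top-100 cut-off mentioned in the text. The names iou, merge_rois, iou_thresh and top_k are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), an assumed layout
ROI = Tuple[Box, float]                  # (box, proposal score)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_rois(rpn_outputs: List[List[ROI]],
               iou_thresh: float = 0.7,
               top_k: int = 100) -> List[ROI]:
    """Pool the ROIs of several RPNs and keep one ROI per overlapping group.

    Greedy NMS: visit candidates by descending score and drop any candidate
    whose IoU with an already kept ROI exceeds iou_thresh.
    """
    candidates = sorted(
        (roi for output in rpn_outputs for roi in output),
        key=lambda roi: roi[1],
        reverse=True,
    )
    kept: List[ROI] = []
    for box, score in candidates:
        if all(iou(box, kept_box) <= iou_thresh for kept_box, _ in kept):
            kept.append((box, score))
        if len(kept) == top_k:
            break
    return kept


# Hypothetical proposals from three RPNs; the second RPN duplicates the
# first ROI of the first RPN (IoU of about 0.91), so it is suppressed.
rpn1 = [((0, 0, 10, 10), 0.9), ((100, 100, 120, 120), 0.6)]
rpn2 = [((0, 0, 10, 11), 0.8)]
rpn3 = [((50, 50, 90, 90), 0.7)]
merged = merge_rois([rpn1, rpn2, rpn3])
print([score for _, score in merged])  # [0.9, 0.7, 0.6]
```

In this sketch the number of output ROIs is controlled only by top_k, which mirrors the statement that the ROI-Merge Layer needs a single hyper-parameter to control the quantity of ROIs.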
Google Images to construct an auxiliary dataset. The dataset contains 8,135 images covering cracks from three fields: concrete wall cracks (a), bridge cracks (b), and dam cracks (c), as shown in Fig. 4.

than that of ResNet-101, it can effectively reduce the number of network weight parameters and accelerate model training. Experimental results show that ResNet networks can be used to extract deeper image features, which helps improve the recognition accuracy of crack images.
TABLE II. ACCURACY OF DIFFERENT BASELINE NETWORK MODELS
TABLE III. COMPARISON OF DIFFERENT TARGET DETECTION ALGORITHMS
Target Detection Algorithm   Average IoU (%)   Recall (%)   Precision (%)   mAP (%)
SSD                          81.11             68.25        79.63           66.64
YOLO V2                      73.96             69.39        79.07           76.83
Faster R-CNN                 81.21             67.06        82.45           78.52
ME-Faster R-CNN              82.52             65.63        83.51           80.08
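
For readers unfamiliar with the metrics in Table III, the following Python sketch shows one common way to compute box IoU, precision and recall for a single image. It is illustrative only: the paper does not state its matching rule, so the greedy one-to-one matching and the 0.5 IoU threshold are assumptions, and the names box_iou, match_detections and precision_recall are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), an assumed layout


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def match_detections(preds: List[Box], truths: List[Box],
                     iou_thresh: float = 0.5) -> Tuple[int, int, int]:
    """Greedy one-to-one matching; returns (TP, FP, FN).

    iou_thresh = 0.5 is an assumed value, not taken from the paper.
    """
    unmatched = list(truths)
    tp = 0
    for pred in preds:
        best = max(unmatched, key=lambda t: box_iou(pred, t), default=None)
        if best is not None and box_iou(pred, best) >= iou_thresh:
            unmatched.remove(best)  # each ground-truth box is matched once
            tp += 1
    return tp, len(preds) - tp, len(unmatched)


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical example: two ground-truth cracks, two predicted boxes.
truths = [(0, 0, 10, 10), (20, 20, 40, 40)]
preds = [(0, 0, 10, 12), (50, 50, 60, 60)]
tp, fp, fn = match_detections(preds, truths)
print(precision_recall(tp, fp, fn))  # (0.5, 0.5)
```

An average IoU or mAP figure such as those in the table would additionally require averaging over images and score thresholds, which this sketch does not attempt.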
In addition, the size of the crack also influences precision. Therefore, we divide the dam crack images into three groups according to crack size. The first group contains 100 samples whose sizes are within the range of [0, 50] pixels. The second group contains 100 samples whose sizes are within the range of [50, 200] pixels. The third group contains 100 samples whose sizes exceed 200 pixels. The recognition precision of each target detection algorithm on cracks of different sizes is shown in Fig. 6.

As shown in Fig. 6, the overall precision of Faster R-CNN is better than that of SSD and YOLO V2. All the algorithms perform well in detecting large cracks but poorly in detecting small ones, especially SSD and YOLO V2. Faster R-CNN is comparable with ME-Faster R-CNN in large crack detection, while in the detection of small cracks ME-Faster R-CNN is more accurate than Faster R-CNN. In conclusion, ME-Faster R-CNN not only maintains good precision in the detection of large targets, but also obtains good results for small targets, which are more difficult to detect.

[Figure: plot residue; y-axis: mAP (%); legend: mAP(small), mAP(medium), mAP(large), Overall mAP.]

HNKJ13_H17_04; the Fundamental Research Funds for the Central Universities under Grant No. 2017B20914. The authors are grateful to the reviewers for their comments, which greatly improved the quality of the paper.

REFERENCES

[1] South of Jiangsu. China’s 200 meter high dam is dense, and the safety risk should not be ignored [EB/OL]. Available: http://www.thepaper.cn/newsDetail_forward_1858088
[2] S Q Ren, K M He, R Girshick, and J Sun. “Faster R-CNN: Towards real-time object detection with region proposal networks,” International Conference on Neural Information Processing Systems, vol. 1, Jun. 2015, pp. 91-99, doi: 10.1109/TPAMI.2016.2577031.
[3] D Gerber, S Meier, and W Kellermann. “Efficient target activity detection based on recurrent neural networks,” Hands-free Speech Communications & Microphone Arrays, Dec. 2016, doi: 10.1109/HSCMA.2017.7895559.
[4] R Girshick, J Donahue, T Darrell, and J Malik. “Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation,” Proceedings of the IEEE conference on computer vision and pattern recognition, vol. 1, Jun. 2014, pp. 580-587, doi: 10.1109/CVPR.2014.81.
[5] R Girshick. “Fast r-cnn,” Proceedings of the IEEE international conference on computer vision, vol. 1, Apr. 2015, pp. 1440-1448, doi: 10.1109/CVPR.2014.81.