2019 IEEE 4th International Conference on Image, Vision and Computing

Multi-task Enhanced Dam Crack Image Detection Based on Faster R-CNN

Jianghong Tang, Yingchi Mao, Jing Wang, Longbao Wang
College of Computer and Information
Hohai University
Nanjing, China
e-mail: [email protected]

Abstract: To improve the detection accuracy of the Faster R-CNN model for multiple small targets, we propose a Multi-task Enhanced dam crack image detection method based on Faster R-CNN (ME-Faster R-CNN) that adapts to dam cracks of different lengths under different lighting conditions. To address the shortage of dam crack samples, transfer learning is used to assist network training and data enhancement. In ME-Faster R-CNN, a ResNet-50 network first extracts features from the original image to obtain a feature map. The feature map is then fed into a multi-task enhanced RPN module, which generates candidate regions using anchor boxes of appropriate sizes and dimensions. Finally, the feature map and the candidate regions are processed to detect the dam cracks. Experimental results show that ME-Faster R-CNN with transfer learning achieves an average IoU of 82.52% and a mean average precision (mAP) of 80.08%. Compared with the Faster R-CNN detection method with the same parameters, the average IoU and mAP increase by 1.06% and 1.56%, respectively.

Keywords: crack image detection; faster R-CNN; multi-task detection; dam safety

I. INTRODUCTION
China has built over 98,000 reservoir dams [1], and approximately 95% of them have been in operation for more than 20 years. Under the impact of human activities and the ageing of concrete structures, various defects such as cracks, leakage, calcium compound, and deformation have occurred. These defects threaten the structural health of the dam. Cracks are one of the major defects affecting safe dam operation.
Faster R-CNN has the best overall performance among the target detection algorithms based on region convolutional neural networks [2]. However, its detection precision is low for multiple targets (a) and small targets (b), as shown in Fig. 1. To solve these problems, this paper puts forward a multi-task enhanced dam crack image detection method based on Faster R-CNN that handles cracks of varying lengths under different lighting conditions. Compared to traditional target detection algorithms, ME-Faster R-CNN achieves higher precision in detecting multiple and small targets, and improves the efficiency of dam crack detection. It is a target detection framework that is trained and run end-to-end.

Figure 1. Test results of Faster R-CNN. (a) Multiple targets. (b) Small targets.

The main contributions of this paper are as follows:
• ME-Faster R-CNN is proposed to deal with the low detection precision for multiple small targets and insufficient samples through a multi-task enhanced RPN module.
• Appropriate sizes and dimensions of anchor boxes are selected to enhance the local search capability of the RPN.
• Experiments demonstrate that the proposed ME-Faster R-CNN is superior in detection accuracy and improves the efficiency of dam crack detection.
This paper is organized as follows: Section II discusses the related work; Section III presents the ME-Faster R-CNN model structure; Section IV performs a series of comparative experiments and analyses the results; Section V draws the conclusion.

II. RELATED WORK
CNN-based target detection algorithms [3] can be divided into two categories: those based on region proposals and those based on regression.

978-1-7281-2325-7/19/$31.00 ©2019 IEEE


The main algorithms for target detection based on region proposals are R-CNN [4], Fast R-CNN [5] and Faster R-CNN. R-CNN was the first model to apply a neural network to target detection. It follows the traditional target detection steps, but replaces the traditional feature extraction method with convolution when extracting the features of candidate boxes. Fast R-CNN combines the classification of candidate boxes and position regression into one network by using a multi-task loss function. It no longer needs step-by-step training, and does not need a large amount of memory to store the feature data generated during training. Compared with R-CNN, it improves the training speed. The biggest difference between Faster R-CNN and Fast R-CNN is the Region Proposal Network (RPN) introduced by Faster R-CNN. The RPN replaces selective search [6] for suggesting candidate regions, which greatly improves the speed of generating candidate boxes. The main algorithms for target detection based on regression are SSD [7] and YOLO V2 [8]. Neither SSD nor YOLO has a region proposal stage, which greatly improves their detection speed. However, their accuracy is insufficient for detecting multiple and small targets. In Section IV, we select SSD, YOLO V2, Faster R-CNN and ME-Faster R-CNN as the target detection networks for comparison experiments.

III. ME-FASTER R-CNN
To solve the problem of the low accuracy of Faster R-CNN for multiple and small target detection, we propose a multi-task enhanced crack image detection model based on Faster R-CNN. The improvements that ME-Faster R-CNN makes to the Faster R-CNN model are shown in Fig. 2.

Figure 2. ME-Faster R-CNN framework. Basic framework: data input (image); feature extraction (CNN); feature fusion and generation of candidate regions (RPN); detection processing (ROI pooling, FC layer, regression, SVM). Improvements: ResNet-50 deep residual network; multi-task enhanced RPN model; changed size of anchor box.

The detection process of ME-Faster R-CNN is divided into three main steps: feature extraction; feature fusion and generation of candidate regions; and detection processing. We present the three steps in Part A, Part B and Part C.

A. Feature Extraction
ResNet-50 [9] is selected as the feature extractor for dam crack images. ResNet-50 converts each picture into a feature map and sends it to the Region Proposal Network. The structure of the ResNet-50 network is shown in Table I.

TABLE I. STRUCTURE OF RESNET-50

Layer name   Output size   50-layer
Conv1_x      112×112       7×7, 64, stride 2
Conv2_x      56×56         3×3 max pool, stride 2; [1×1, 64; 3×3, 64; 1×1, 256] × 3
Conv3_x      28×28         [1×1, 128; 3×3, 128; 1×1, 512] × 4
Conv4_x      14×14         [1×1, 256; 3×3, 256; 1×1, 1024] × 6
Conv5_x      7×7           [1×1, 512; 3×3, 512; 1×1, 2048] × 3
             1×1           Average pool, 1000-d, Softmax
FLOPs        3.8×10^9

B. Feature Fusion and Generation of Candidate Regions
In this part, a multi-task enhanced RPN module is proposed to increase the search capability of Faster R-CNN. We feed the acquired feature map into the multi-task enhanced RPN module, whose anchor box sizes and dimensions are adjusted to raise detection precision, and the module generates the candidate bounding boxes.
• Multi-task enhanced RPN module:
In Faster R-CNN, there is initially just one RPN, which uses the feature map of the last convolutional layer as input [2]. We call it the original RPN; its structure is shown in Fig. 3(a). The input image is 224×224 pixels, but the receptive field of the original RPN is far greater than that. Thus, only a few typical crack features can be obtained. However, the cracks in an image vary in size and proportion. If the detected region is too big, other cracks around the candidate box may be regarded as noise. In contrast, if the detected region is too small, the RPN is incapable of generating an ROI. Hence, the original RPN cannot detect cracks of different sizes and proportions.
To solve this problem, ME-Faster R-CNN uses a multi-task enhanced RPN module, whose structure is shown in Fig. 3(b). This method uses three RPN modules on top of ResNet-50 to generate ROIs, so that feature maps of different sizes are exploited. Specifically, we add the first RPN module after Conv3_x of ResNet-50. Its receptive field is 146×146, so it is able to detect small targets. We then add the second RPN module, with a receptive field of 229×229, after Conv4_x of ResNet-50. It is used to detect big targets. Finally, the last RPN, after Conv5_x of ResNet-50, outputs the result of the generation process.
Since each RPN outputs a separate ROI array, an ROI-Merge Layer is proposed that accepts the separate ROI arrays and outputs a single array aggregating the effective regions. To avoid repeated ROIs, Non-Maximum Suppression (NMS) [10] is adopted in our work. Among the candidate regions output by different RPNs, if the IoU at corresponding positions is greater than 0.7, we consider the two ROIs to be the same. Specifically, each candidate region output by the three RPNs carries a proposal score, which corresponds to the probability of it being a target. At a given location, we select the candidate region with the highest score. If the IoU between either of the other two candidate regions at the corresponding location and the selected one is greater than 0.7, they are considered to be the same ROI, and only the one with the highest score is output by the ROI-Merge Layer at that location. After NMS, the 100 highest-scoring ROIs are selected. Hence, the number of ROIs can be controlled by adjusting a single hyper-parameter of the ROI-Merge Layer.
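The merging rule of the ROI-Merge Layer can be illustrated with a minimal sketch. It assumes boxes in (x1, y1, x2, y2) corner format and a plain greedy NMS over the pooled proposals; the 0.7 IoU threshold and the limit of 100 ROIs come from the text, while the function names `iou` and `merge_rois` are hypothetical and not taken from the authors' implementation.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2); corner format is an assumption
ScoredBox = Tuple[Box, float]            # (box, proposal score)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_rois(rpn_outputs: List[List[ScoredBox]],
               iou_thresh: float = 0.7,
               top_k: int = 100) -> List[ScoredBox]:
    """Pool the proposals of several RPNs and keep the best non-duplicates.

    Proposals are visited from highest to lowest score; a proposal is
    dropped when its IoU with an already kept one exceeds iou_thresh.
    """
    pooled = [roi for rois in rpn_outputs for roi in rois]
    pooled.sort(key=lambda roi: roi[1], reverse=True)
    kept: List[ScoredBox] = []
    for box, score in pooled:
        if len(kept) >= top_k:
            break
        if all(iou(box, kept_box) <= iou_thresh for kept_box, _ in kept):
            kept.append((box, score))
    return kept


# Hypothetical proposals from the three RPNs (made-up numbers for illustration).
rpn1 = [((10, 10, 60, 60), 0.90)]
rpn2 = [((12, 11, 61, 62), 0.80), ((200, 200, 400, 300), 0.70)]
rpn3 = [((205, 198, 398, 305), 0.95)]
for box, score in merge_rois([rpn1, rpn2, rpn3]):
    print(box, score)
```

In this toy input, the two overlapping pairs each collapse to their higher-scoring member, so two ROIs remain. Lowering `top_k` is the single adjustment that limits how many ROIs reach the detection stage.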
Figure 3. Original RPN module and multi-task enhanced RPN module.
(a) Original RPN module. Labels: Input; Conv1_x to Conv5_x; RPN (619×619); ROI; ROI Pooling Layer; FC6, FC7, FC8; Class Score; Bbox.
(b) Multi-task enhanced RPN module. Labels: Input; Conv1_x to Conv5_x; RPN1 (146×146), RPN2 (229×229), RPN3 (619×619); ROI-Merge Layer; ROI; ROI Pooling Layer; FC6, FC7, FC8; Class Score; Bbox.
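The anchor design used by the RPN modules in this section (four base sizes of 50, 200, 350 and 500 pixels, each at ratios 1:1, 1:2 and 2:1) can be enumerated with a short sketch. Two points are assumptions rather than statements from the paper: the ratio is read as width to height, and the area of each anchor is held at the square of its base size, as in the standard Faster R-CNN anchor scheme. The names `anchor_shapes` and `anchors_at` are hypothetical.

```python
import math

BASE_SIZES = (50, 200, 350, 500)   # pixels, from the text
ASPECT_RATIOS = (1.0, 0.5, 2.0)    # width / height for 1:1, 1:2, 2:1 (assumed reading)


def anchor_shapes(base_sizes=BASE_SIZES, ratios=ASPECT_RATIOS):
    """Return one (width, height) pair per size and ratio combination.

    Assumption: each anchor keeps the area size * size, so only its
    shape changes with the ratio.
    """
    shapes = []
    for size in base_sizes:
        for ratio in ratios:
            width = size * math.sqrt(ratio)
            height = size / math.sqrt(ratio)
            shapes.append((width, height))
    return shapes


def anchors_at(cx, cy, shapes):
    """Centre every shape on the anchor point (cx, cy) as (x1, y1, x2, y2)."""
    return [(cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2) for w, h in shapes]


shapes = anchor_shapes()
print(len(shapes))  # number of anchor boxes per anchor point
for width, height in shapes:
    print(f"{width:7.1f} x {height:7.1f}")
```

Each anchor point on the feature map receives all of these boxes, and the RPN scores every one of them for the presence of a crack.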

• Improving the size and dimensions of anchor box for RPN module:
Faster R-CNN feeds the feature map into the RPN module for feature fusion and generation of candidate regions. Each pixel on the feature map serves as an anchor point, and several anchor boxes of different sizes and proportions are placed at each anchor point. Since anchor boxes of different dimensions have unbalanced search capability, this paper designs anchor boxes with dimensions of 50×50, 200×200, 350×350 and 500×500. Among these, 50×50 and 200×200 are suitable for small crack detection, while 350×350 and 500×500 are adopted for large crack detection. Each of these four dimensions is scaled to length-to-width ratios of 1:1, 1:2 and 2:1, giving a total of 12 anchor boxes to be evaluated by the RPN. These anchor boxes are selected in a fixed sequence during prediction. The RPN predicts whether a target exists or not: if the IoU between an anchor box and a ground truth box is greater than 0.7, the box is considered a candidate; otherwise it is not.

C. Detection Processing
Finally, the feature map and the candidate regions are sent to the ROI pooling layer, the FC layers, and then a boundary regressor. Since the dam crack classifier only has to judge whether there is a crack or not, the softmax classifier of Faster R-CNN is replaced with an SVM classifier in this paper. The ROI pooling layer converts inputs of different sizes into outputs of a fixed length. The boundary regressor determines the location of the candidate bounding box. The SVM classifier determines whether the candidate box is a target or not.

IV. EXPERIMENTS AND RESULTS

A. Dataset
The original data are derived from the measured data of a super-high arch dam. Since only a few dam crack images are available, training the network on such a small sample would lead to overfitting. Therefore, this paper introduces transfer learning [11] into dam crack detection in order to build a good model from a small dataset. We collect labeled crack images from daily dam monitoring and Google Images to construct an auxiliary dataset. The dataset contains 8,135 images covering cracks from three domains: concrete wall cracks (a), bridge cracks (b) and dam cracks (c), as shown in Fig. 4.

Figure 4. Crack images in three domains. (a) Concrete wall crack. (b) Bridge crack. (c) Dam crack.

B. Results and Analysis
• Visual comparative analysis
This part displays three randomly chosen groups of results obtained by applying Faster R-CNN and ME-Faster R-CNN to crack images. Fig. 5(a), 5(b) and 5(c) compare the detection results of ME-Faster R-CNN with those of Faster R-CNN. Take Fig. 5(a) as an example. The original crack image contains an obvious crack in the center and a short crack underneath it. Faster R-CNN detects the obvious crack but fails to detect the other accurately. ME-Faster R-CNN not only improves the IoU, but also locates both the long and the short crack more accurately. In conclusion, under the same experimental conditions, ME-Faster R-CNN improves detection precision and obtains good results on small targets and multiple targets.

Figure 5. Visual contrast analysis. Panels (a) first set, (b) second set and (c) third set each show the original picture, the Faster R-CNN result and the ME-Faster R-CNN result.

• Comparative analysis of different baseline networks
This part adopts ME-Faster R-CNN as the target detection network, with ZF Network, VGG-16, ResNet-101 and ResNet-50 in turn as the baseline network. The output of ZF Network is 6×6, while that of the other three baseline networks is 7×7. Correspondingly, the output size of the ROI pooling layer is set to 6×6 and 7×7, respectively.
Table II shows that the mAP of ZF-Net and VGG-16 reaches 66.51% and 71.9%, respectively, after training on the same dataset. The ResNet networks reach more than 79%, so the detection precision is improved with ResNet. Although the detection precision of ResNet-50 is slightly lower than that of ResNet-101, ResNet-50 effectively reduces the number of network weight parameters and accelerates model training. The experimental results show that ResNet networks can extract deeper image features, which helps improve the recognition accuracy of crack images.

TABLE II. ACCURACY OF DIFFERENT BASELINE NETWORK MODELS

#   Baseline Network Model   mAP (%)
1   ZF-Net                   66.51
2   VGG-16                   71.9
3   ResNet-50                79.2
4   ResNet-101               80.6

• Analysis of different target detection algorithms
ResNet-50 is selected as the baseline network, and SSD, YOLO V2, Faster R-CNN and ME-Faster R-CNN are used in turn as the target detection network.
Cracks can be analyzed further only if their location is detected precisely. Mean Average Precision (mAP) [12] and Intersection over Union (IoU) [13] are therefore selected as the evaluation criteria in our experiments. Table III shows the average IoU, recall, precision and mAP obtained by the different target detection algorithms. The average IoU of ME-Faster R-CNN is the highest, indicating that ME-Faster R-CNN locates cracks more precisely. Its mAP is also the highest, reaching 80.08%, indicating that the overall performance of ME-Faster R-CNN is superior.
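To make the evaluation criteria concrete, the following sketch shows one common way to obtain precision, recall and average IoU for a single image. The paper does not state its matching rule, so the greedy one-to-one matching and the 0.5 IoU matching threshold are assumptions, and `evaluate` is a hypothetical helper. mAP, which additionally averages precision over recall levels, is not computed here.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def evaluate(detections, ground_truths, match_thresh=0.5):
    """Return (precision, recall, mean IoU of matched detections).

    detections: list of (box, score); ground_truths: list of boxes.
    Detections are matched greedily, highest score first, each to the
    still unmatched ground-truth box it overlaps most (assumed rule).
    """
    unmatched = list(ground_truths)
    matched_ious = []
    for box, _score in sorted(detections, key=lambda d: d[1], reverse=True):
        if not unmatched:
            break
        best = max(unmatched, key=lambda gt: iou(box, gt))
        overlap = iou(box, best)
        if overlap >= match_thresh:
            matched_ious.append(overlap)
            unmatched.remove(best)
    true_pos = len(matched_ious)
    precision = true_pos / len(detections) if detections else 0.0
    recall = true_pos / len(ground_truths) if ground_truths else 0.0
    mean_iou = sum(matched_ious) / true_pos if true_pos else 0.0
    return precision, recall, mean_iou


# Made-up example: two labelled cracks, one good detection, one false alarm.
truth = [(0, 0, 100, 100), (200, 200, 300, 300)]
found = [((0, 0, 100, 90), 0.9), ((500, 500, 600, 600), 0.6)]
print(evaluate(found, truth))
```

In this example, one of two detections is correct and one of two cracks is found, so precision and recall are both one half, and the single matched pair determines the mean IoU.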

TABLE III. COMPARISON OF DIFFERENT TARGET DETECTION ALGORITHMS

Target Detection Algorithm   Average IoU (%)   Recall (%)   Precision (%)   mAP (%)
SSD                          81.11             68.25        79.63           66.64
YOLO V2                      73.96             69.39        79.07           76.83
Faster R-CNN                 81.21             67.06        82.45           78.52
ME-Faster R-CNN              82.52             65.63        83.51           80.08
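The differences between two rows of Table III can be read off by subtraction. The sketch below does this for ME-Faster R-CNN against Faster R-CNN; the values are copied from Table III, while the dictionary layout and the helper name `gain` are illustrative choices.

```python
# Values copied from Table III (all in percent).
TABLE_III = {
    "SSD":             {"iou": 81.11, "recall": 68.25, "precision": 79.63, "map": 66.64},
    "YOLO V2":         {"iou": 73.96, "recall": 69.39, "precision": 79.07, "map": 76.83},
    "Faster R-CNN":    {"iou": 81.21, "recall": 67.06, "precision": 82.45, "map": 78.52},
    "ME-Faster R-CNN": {"iou": 82.52, "recall": 65.63, "precision": 83.51, "map": 80.08},
}


def gain(table, method, baseline):
    """Difference in percentage points between two methods, per metric."""
    return {
        metric: round(table[method][metric] - table[baseline][metric], 2)
        for metric in table[method]
    }


for metric, delta in gain(TABLE_III, "ME-Faster R-CNN", "Faster R-CNN").items():
    print(f"{metric:10s} {delta:+.2f}")
```

On these figures, ME-Faster R-CNN gains 1.31 points in average IoU, 1.06 in precision and 1.56 in mAP over Faster R-CNN, while its recall is 1.43 points lower.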

In addition, the size of a crack also influences precision. Therefore, we divide the dam crack images into three groups according to crack size. The first group contains 100 samples whose sizes are within the range [0, 50]. The second group contains 100 samples whose sizes are within the range [50, 200]. The third group contains 100 samples whose sizes exceed 200 pixels. The recognition precision of each target detection algorithm on cracks of different sizes is shown in Fig. 6.
As shown in Fig. 6, the overall precision of Faster R-CNN is better than that of SSD and YOLO V2. All the algorithms perform well in detecting large cracks but poorly, especially SSD and YOLO V2, in detecting small ones. Faster R-CNN is comparable with ME-Faster R-CNN in large crack detection, whereas in the detection of small cracks ME-Faster R-CNN is more accurate than Faster R-CNN. In conclusion, ME-Faster R-CNN not only maintains good precision on large targets, but also obtains good results on small targets, which are more difficult to detect.

Figure 6. Accuracy of crack images with different sizes. Bar chart of mAP (%) for SSD, YOLO V2, Faster R-CNN and ME-Faster R-CNN; series: mAP(small), mAP(medium), mAP(large) and overall mAP.

V. CONCLUSION
In this paper, ME-Faster R-CNN is proposed to solve the problem of the low detection precision of Faster R-CNN when dealing with multiple and small targets. ME-Faster R-CNN feeds the image to a ResNet-50 network to extract features, and increases the search capability of Faster R-CNN by using a multi-task enhanced RPN module and improving the size of the anchor boxes. The experimental results show that ME-Faster R-CNN has better precision in detecting multiple and small targets, and improves the efficiency of dam crack detection.

ACKNOWLEDGMENT
This study was supported by the National Key Technology Research and Development Program of the Ministry of Science and Technology of China under Grant No. 2018YFC0407105, 2016YFC0400910; the Key Technology Project of China Huaneng Group under Grant No. HNKJ13_H17_04; and the Fundamental Research Funds for the Central Universities under Grant No. 2017B20914. The authors are grateful to the reviewers for their comments, which greatly improved the quality of the paper.

REFERENCES
[1] South of Jiangsu. China’s 200 meter high dam is dense, and the safety risk should not be ignored [EB/OL]. Available: http://www.thepaper.cn/newsDetail_forward_1858088
[2] S Q Ren, K M He, R Girshick, and J Sun. “Faster R-CNN: Towards real-time object detection with region proposal networks,” International Conference on Neural Information Processing Systems, vol. 1, Jun. 2015, pp. 91-99, doi: 10.1109/TPAMI.2016.2577031.
[3] D Gerber, S Meier, and W Kellermann. “Efficient target activity detection based on recurrent neural networks,” Hands-free Speech Communications & Microphone Arrays, Dec. 2016, doi: 10.1109/HSCMA.2017.7895559.
[4] R Girshick, J Donahue, T Darrell, and J Malik. “Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation,” Proceedings of the IEEE conference on computer vision and pattern recognition, vol. 1, Jun. 2014, pp. 580-587, doi: 10.1109/CVPR.2014.81.
[5] R Girshick. “Fast r-cnn,” Proceedings of the IEEE international conference on computer vision, vol. 1, Apr. 2015, pp. 1440-1448, doi: 10.1109/CVPR.2014.81.
[6] J R R Uijlings, K E A van de Sande, T Gevers, and A W M Smeulders. “Selective search for object recognition,” International Journal of Computer Vision, Mar 2013, vol. 104, pp. 154-171, doi: 10.1007/s11263-013-0620-5.
[7] W Liu, D Anguelov, D Erhan, et al. “SSD: Single shot multibox detector,” European Conference on Computer Vision, vol. 9905, Oct. 2016, pp. 21-37, doi: 10.1007/978-3-319-46448-0_2.
[8] J Redmon, and A Farhadi. “YOLO9000: Better, Faster, Stronger,” Proceedings of the IEEE conference on computer vision and pattern recognition, Nov. 2017, pp. 7263-7271, doi: 10.1109/CVPR.2017.690.
[9] K M He, X Y Zhang, S Q Ren, and J Sun. “Deep residual learning for image recognition,” Proceedings of the IEEE conference on computer vision and pattern recognition, vol. 1, Jun. 2016, pp. 770-778, doi: 10.1109/CVPR.2016.90.
[10] N Alexander, and V G Luc. “Efficient non-maximum suppression,” 18th International Conference on Pattern Recognition, vol. 3, Aug. 2006, pp. 850-855, doi: 10.1109/ICPR.2006.479.
[11] F Z Zhuang, P Luo, Q He, and Z Z Shi. “Research progress of transfer learning,” Journal of Software, vol. 26, Mar. 2015, pp. 26-39, doi: 10.13328/j.cnki.jos.004631.
[12] S Akcay, M E Kundegorski, C G Willcocks, and T Breckon. “Using deep convolutional neural network architectures for object classification and detection within x-ray baggage security imagery,” IEEE Transactions on Information Forensics and Security, vol. 13, Sep. 2018, pp. 2203-2215, doi: 10.1109/TIFS.2018.2812196.
[13] M A Rahman, and W Yang. “Optimizing intersection-over-union in deep neural networks for image segmentation,” International Symposium on Visual Computing, Dec. 2016, pp. 234-244, doi: 10.1007/978-3-319-50835-1_22.
