Received December 17, 2019, accepted December 29, 2019, date of publication January 6, 2020, date of current version January 14, 2020.
Digital Object Identifier 10.1109/ACCESS.2020.2964055
Vehicle-Damage-Detection Segmentation
Algorithm Based on Improved Mask RCNN
QINGHUI ZHANG1,2, XIANING CHANG2, AND SHANFENG BIAN2
1 Key Laboratory of Grain Information Processing and Control, Ministry of Education, Henan University of Technology, Zhengzhou 450001, China
2 College of Information Science and Engineering, Henan University of Technology, Zhengzhou 450001, China
ABSTRACT Traffic congestion caused by vehicular accidents seriously affects normal travel, so accurate and effective mitigation measures and methods must be studied. To resolve traffic-accident compensation problems quickly, a vehicle-damage-detection segmentation algorithm based on transfer learning and an improved mask regional convolutional neural network (Mask RCNN) is proposed in this paper. The experiment first collects car-damage pictures for preprocessing and uses Labelme to create dataset labels, and the dataset is divided into training sets and test sets. The residual network (ResNet) is optimized, and feature extraction is performed in combination with a Feature Pyramid Network (FPN). Then, the proportions and thresholds of the anchors in the region proposal network (RPN) are adjusted. The spatial information of the feature map is preserved by bilinear interpolation in RoIAlign, and different weights are introduced in the loss function for targets of different scales. Finally, the results of training and testing on a self-made dedicated dataset show that the improved Mask RCNN achieves a better Average Precision (AP) value, detection accuracy, and masking accuracy, and improves the efficiency of solving traffic-accident compensation problems.
FIGURE 3. ResNet structure and improved ResNetV2 structure.

Because the sizes of the car-damage regions in the images differ, a single convolutional neural network cannot extract all of the image attributes well. Therefore, a backbone combining the ResNet50 structure with an FPN feature pyramid network is used in this paper. FPN [21] uses a top-down hierarchy with lateral connections to build a network feature pyramid from a single-scale input, which solves the multi-scale problem of extracting target objects from images. This structure has strong robustness and adaptability, and requires fewer parameters.

To further improve the detection accuracy, the backbone network structure is improved and the order of the layers is adjusted, as shown in Figure 3. The right-hand part of each diagram is called the ‘‘residual’’ branch and the left-hand part the ‘‘identity’’ branch. The value on the ‘‘identity’’ branch should not be changed arbitrarily: the input and output must be kept consistent, otherwise information transmission is disturbed and the back-propagation of the loss is hindered. Adjusting the order of the layers on the ‘‘residual’’ branch gives the improved ResNet structure two advantages. First, back-propagation basically meets the requirements, and information transmission is unimpeded. Second, the batch normalization (BN) layer acts as a pre-activation, where ‘‘pre’’ is relative to the weight (conv) layer; this enhances the regularization of the model and yields better generalization performance.
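As an illustration only (the paper does not publish its code), the layer reordering described above corresponds to the pre-activation residual block of ResNetV2, sketched minimally in PyTorch below: BN and ReLU are moved before each convolution on the ‘‘residual’’ branch, while the ‘‘identity’’ branch passes through unchanged.

import torch
import torch.nn as nn

class PreActBlock(nn.Module):
    """Pre-activation residual block (ResNetV2-style): BN and ReLU
    are applied *before* each convolution on the 'residual' branch,
    while the 'identity' branch is left untouched."""
    def __init__(self, channels):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)

    def forward(self, x):
        # residual branch: BN -> ReLU -> conv, applied twice
        out = self.conv1(torch.relu(self.bn1(x)))
        out = self.conv2(torch.relu(self.bn2(out)))
        # identity branch is added unchanged, keeping information
        # transmission and loss back-propagation unimpeded
        return x + out

Because the addition on the identity branch is never followed by a normalization or activation here, gradients flow directly to earlier blocks, which is the property the text attributes to the improved structure.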
C. RPN MODEL IMPROVEMENT
In this paper, the Feature Pyramid Network structure is adopted, and the images are made into different sizes to generate features of corresponding sizes. The shallow features can distinguish simple, large targets, and the deep features can distinguish small targets. The feature maps of different sizes generated by the FPN are input into the RPN [22], and the RPN can then extract the RoI features from different levels of the feature pyramid according to the size of the target object. This simple change to the network structure greatly improves the detection performance for small objects without substantially increasing the amount of calculation, achieving an excellent improvement in accuracy and speed.

FIGURE 4. ROI generated by the original RPN and the improved RPN.

The RPN is equivalent to a sliding-window-based class-agnostic target detector built on the structure of a convolutional neural network. The sliding-window scan produces anchor frames (anchors): a proposal region can generate a large number of anchors of different sizes and aspect ratios, and they overlap so as to cover as much of the image as possible. The overlap (IoU) between the proposal region and the desired region directly affects the classification effect. To adapt to more damaged car areas, the algorithm adjusts the anchor scales to {32 × 32, 64 × 64, 128 × 128, 256 × 256, 512 × 512} and changes the anchor aspect ratios to {1:2, 1:1, 3:1}, as shown in Figure 4. The so-called IoU is the coverage between the predicted box and the ground-truth box; its value equals the intersection of the two boxes divided by their union. In this paper, the IoU threshold is set to 0.8; that is, when the overlap ratio between the area corresponding to an anchor frame and the real target area is greater than 0.8, the anchor is treated as foreground; when the overlap ratio is less than 0.2, it is treated as background; anchors between the two values are discarded. This reduces the amount of computation underlying the model and saves time, and the improved RPN produces fewer RoIs, which, in turn, increases the efficiency of the model.
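A minimal NumPy sketch (ours, not the paper's implementation) of how such anchors can be generated and labeled with the 0.8/0.2 thresholds; the scales, ratios, and thresholds are exactly the values given above.

import numpy as np

SCALES = [32, 64, 128, 256, 512]      # anchor sizes from the paper
RATIOS = [(1, 2), (1, 1), (3, 1)]     # aspect ratios (w:h) from the paper

def make_anchors(cx, cy):
    """Generate all scale/ratio anchors (x1, y1, x2, y2) centered at (cx, cy)."""
    boxes = []
    for s in SCALES:
        for rw, rh in RATIOS:
            # keep the anchor area s*s while applying the aspect ratio
            w = s * np.sqrt(rw / rh)
            h = s * np.sqrt(rh / rw)
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return np.array(boxes)

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def label_anchor(anchor, gt_box, hi=0.8, lo=0.2):
    """Foreground above 0.8 IoU, background below 0.2, else discarded (-1)."""
    v = iou(anchor, gt_box)
    return 1 if v > hi else (0 if v < lo else -1)

Raising the foreground threshold from the common 0.7 to 0.8 and discarding the 0.2-0.8 band is what shrinks the RoI set and the computation, as the text claims.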
D. RoIAlign MODEL
In the Mask RCNN network structure, the mask branch must determine whether a given pixel is part of the target, so the accuracy must be at the pixel level. After the original image is heavily convolved and pooled, its size has changed; if pixel-level segmentation is performed directly, the target object in the image cannot be accurately positioned. Mask RCNN therefore improves on Faster RCNN by replacing the RoI Pooling layer with the region-of-interest alignment layer (RoIAlign). The bilinear interpolation [23] method preserves the spatial information of the feature map, which largely eliminates the error caused by the two quantizations of the feature map in the RoI Pooling layer and solves the problem of regional mismatch of the image object. Pixel-level detection and segmentation can thus be achieved.
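To make the quantization point concrete, here is a minimal sketch (an illustration, not the paper's code) of the bilinear interpolation RoIAlign uses to sample a feature map at non-integer coordinates instead of rounding them:

import numpy as np

def bilinear_sample(feat, x, y):
    """Sample feature map `feat` (H x W) at real-valued (x, y).
    RoIAlign uses this instead of rounding the coordinates, so no
    spatial information is lost to quantization."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1 = min(x0 + 1, feat.shape[1] - 1)
    y1 = min(y0 + 1, feat.shape[0] - 1)
    dx, dy = x - x0, y - y0
    # weighted average of the four surrounding feature values
    return (feat[y0, x0] * (1 - dx) * (1 - dy) +
            feat[y0, x1] * dx * (1 - dy) +
            feat[y1, x0] * (1 - dx) * dy +
            feat[y1, x1] * dx * dy)

RoI Pooling instead rounds twice (once when mapping the RoI to feature coordinates, once when splitting it into bins), which is the ''two quantizations'' error mentioned above.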
The network is first pre-trained on large public datasets, and then the trained weight files are migrated to the dedicated dataset collected in this article for training and fine-tuning of the network parameters. This allows the convolutional neural network to achieve good results on small datasets, thereby alleviating the problem of insufficient data sources. Ideally, a comparison of successful transfer learning with training from scratch is shown in Figure 6.
It can be seen that transfer learning brings three advantages. First, the initial performance of the model is higher. Second, the rate of performance improvement during training is greater. Third, the final performance of the trained model is better.
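As an illustration only (the paper does not specify its implementation), fine-tuning a COCO-pretrained Mask RCNN in torchvision follows the pattern below; the choice of num_classes = 2 (background plus damaged area) is our assumption, not a detail from the paper.

import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

# Start from weights pre-trained on the large COCO dataset ...
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)

# ... then replace the box and mask heads for the small dedicated
# dataset (hypothetically 2 classes: background + damaged area).
num_classes = 2
in_feat = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_feat, num_classes)
in_feat_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_feat_mask, 256, num_classes)

# Optionally freeze the backbone so only the new heads are fine-tuned.
for p in model.backbone.parameters():
    p.requires_grad = False

Keeping the pre-trained backbone frozen (or training it at a lower learning rate) is what gives the higher starting performance and faster improvement described above.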
B. BUILDING A DATASET
The main research object of this paper is pictures of scratched vehicles. The experiment collected 2,000 damaged-vehicle images (1,600 for training and 400 for testing) from online downloads and daily photographs.
The main steps in acquiring a dedicated dataset for detecting vehicle scratches in complex environments include the following two parts.
1) Image collection: Images of damaged vehicles at different angles and of different sizes in different scenes are photographed or downloaded from the Internet. Because the downloaded images vary in size, and the samples for Mask RCNN must be normalized to a uniform size, a script is used to normalize the images to 1024 × 1024 pixels, with the insufficient portions filled with 0 (a sketch of such a script follows this list).
2) Image processing: The collected images are annotated with the labeling tool Labelme and divided into training sets and test sets. The specific steps in this process are the following.
First, a folder ‘‘datasets’’ is created, and then two subfolders, ‘‘train’’ and ‘‘val’’, are created for storing training samples and test samples. The images in each folder correspond to a .json annotation information file with the same name. The labeling interface is shown in Figure 7.
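The normalization script itself is not given in the paper; a minimal equivalent using Pillow might look like this (the 1024 × 1024 target size and zero padding are taken from the text, everything else is our assumption):

from PIL import Image

TARGET = 1024  # normalized side length used in the paper

def normalize(path_in, path_out):
    """Scale the longer side to 1024 px and zero-fill the remainder,
    preserving the original aspect ratio."""
    img = Image.open(path_in).convert("RGB")
    scale = TARGET / max(img.size)
    resized = img.resize((round(img.width * scale), round(img.height * scale)))
    canvas = Image.new("RGB", (TARGET, TARGET), (0, 0, 0))  # zero padding
    canvas.paste(resized, ((TARGET - resized.width) // 2,
                           (TARGET - resized.height) // 2))
    canvas.save(path_out)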
C. EXPERIMENTAL ENVIRONMENT
See Table 1.

D. PARAMETER SETTINGS
See Table 2.

TABLE 2. Experimental parameter table.

E. EVALUATION INDEX
The evaluation index of the experimental results consists of two aspects: detection performance and segmentation performance. In this experiment, the P-R curve and the AP value were used to evaluate the target-detection performance, and the mean intersection over union (MIoU) and the running speed were used to evaluate the image-segmentation performance.

P = TP / (TP + FP),    R = TP / (TP + FN)    (5)

where TP is the number of positive samples correctly classified, FP is the number of negative samples incorrectly marked as positive, and FN is the number of positive samples incorrectly marked as negative.
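For concreteness, precision and recall from Eq. (5), and the AP as the area under the P-R curve, can be computed as in this sketch (an illustration; the experiments may use a different AP convention):

import numpy as np

def precision_recall(tp, fp, fn):
    """Eq. (5): P = TP/(TP+FP), R = TP/(TP+FN)."""
    return tp / (tp + fp), tp / (tp + fn)

def average_precision(precisions, recalls):
    """AP as the area under the P-R curve, integrated over recall
    with the trapezoidal rule."""
    order = np.argsort(recalls)
    return float(np.trapz(np.asarray(precisions)[order],
                          np.asarray(recalls)[order]))

# Example: 80 true positives, 20 false positives, 10 false negatives
p, r = precision_recall(80, 20, 10)   # p = 0.8, r = 0.888...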
In future work, further research will be carried out to increase the size of the dataset, collect more car-damage images under different weather conditions and different levels of illumination, enhance the data, improve the edge-contour enhancement of the images, and make the masking of the damaged areas of the car more accurate.
REFERENCES
[1] R. Girshick, J. Donahue, T. Darrell, and J. Malik, ‘‘Rich feature hierarchies for accurate object detection and semantic segmentation,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2014, pp. 580–587.
[2] R. Girshick, ‘‘Fast R-CNN,’’ in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 1440–1448.
[3] S. Ren, K. He, R. Girshick, and J. Sun, ‘‘Faster R-CNN: Towards real-time object detection with region proposal networks,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, Jun. 2017, doi: 10.1109/TPAMI.2016.2577031.
[4] W. Liu, D. Anguelov, and D. Erhan, ‘‘SSD: Single shot multibox detector,’’ in Proc. Eur. Conf. Comput. Vis. (ECCV), 2016, pp. 21–37.
[5] K. He, X. Zhang, S. Ren, and J. Sun, ‘‘Deep residual learning for image recognition,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778.
[6] K. He, G. Gkioxari, P. Dollár, and R. Girshick, ‘‘Mask R-CNN,’’ in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 2980–2988.
[7] N. Kumar and R. Verma, ‘‘A multi-organ nucleus segmentation challenge,’’ IEEE Trans. Med. Imag., vol. 11, no. 1, pp. 34–39, Oct. 2019, doi: 10.1109/TMI.2019.2947628.
[8] A. K. Jaiswal, P. Tiwari, S. Kumar, D. Gupta, A. Khanna, and J. J. Rodrigues, ‘‘Identifying pneumonia in chest X-rays: A deep learning approach,’’ Measurement, vol. 145, pp. 511–518, Oct. 2019, doi: 10.1016/j.measurement.2019.05.076.
[9] P. Pinheiro and R. Collobert, ‘‘Learning to segment object candidates,’’ in Proc. Adv. Neural Inf. Process. Syst., 2015, pp. 1990–1998.
[10] W. Tang, H.-L. Liu, L. Chen, K. C. Tan, and Y.-M. Cheung, ‘‘Fast hypervolume approximation scheme based on a segmentation strategy,’’ Inf. Sci., vol. 509, pp. 320–342, Jan. 2020, doi: 10.1016/j.ins.2019.02.054.
[11] Y. Li, H. Qi, J. Dai, X. Ji, and Y. Wei, ‘‘Fully convolutional instance-aware semantic segmentation,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 4438–4446.
[12] X. Rong, C. Yi, and Y. Tian, ‘‘Unambiguous scene text segmentation with referring expression comprehension,’’ IEEE Trans. Image Process., vol. 29, pp. 591–601, Jul. 2019, doi: 10.1109/TIP.2019.2930176.
[13] Y. L. Qiao, M. Truman, and S. Sukkarieh, ‘‘Cattle segmentation and contour extraction based on mask R-CNN for precision livestock farming,’’ Comput. Electron. Agricult., vol. 165, Oct. 2019, Art. no. 104958, doi: 10.1016/j.compag.2019.104958.
[14] C. Shuhong, Z. Shijun, and Z. Dianfan, ‘‘Water quality monitoring method based on feedback self correcting dense connected convolution network,’’ Neurocomputing, vol. 349, pp. 301–313, Jul. 2019, doi: 10.1016/j.neucom.2019.03.023.
[15] J. Yang, L. Ji, X. Geng, X. Yang, and Y. Zhao, ‘‘Building detection in high spatial resolution remote sensing imagery with the U-rotation detection network,’’ Int. J. Remote Sens., vol. 40, no. 15, pp. 6036–6058, Aug. 2019, doi: 10.1080/01431161.2019.1587200.
[16] E. K. Wang, X. Zhang, L. Pan, C. Cheng, A. Dimitrakopoulou-Strauss, Y. Li, and N. Zhe, ‘‘Multi-path dilated residual network for nuclei segmentation and detection,’’ Cells, vol. 8, no. 5, p. 499, May 2019, doi: 10.3390/cells8050499.
[17] X. Lin, S. Zhu, and J. Zhang, ‘‘Rice planthopper image classification method based on transfer learning and mask R-CNN,’’ Trans. Chin. Soc. Agricult. Mach., vol. 13, no. 4, pp. 181–184, Dec. 2019.
[18] G. Wang and S. Liang, ‘‘Ship object detection based on mask RCNN,’’ Radio Eng., 2018, pp. 947–952.
[19] J. Shi, Y. Zhou, and Q. Zhang, ‘‘Service robot item recognition system based on improved mask RCNN and Kinect,’’ Appl. Res. Comput., Jun. 2019, pp. 1–9.
[20] J. Li and W. He, ‘‘Building target detection algorithm based on mask RCNN,’’ Sci. Surv. Mapping, Apr. 2019, pp. 1–13.
[21] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, ‘‘Feature pyramid networks for object detection,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 320–329.
[22] X. Zhang, J. Zou, X. Ming, K. He, and J. Sun, ‘‘Efficient and accurate approximations of nonlinear convolutional networks,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 1984–1992.
[23] S. Wang and K. Yang, ‘‘An image scaling algorithm based on bilinear interpolation with VC++,’’ Techn. Autom. Appl., 2008, pp. 168–176.
[24] A. Mathew, J. Mathew, M. Govind, and A. Mooppan, ‘‘An improved transfer learning approach for intrusion detection,’’ Procedia Comput. Sci., vol. 115, pp. 251–257, Jan. 2017.
[25] G. Han, J. Su, and C. Zhang, ‘‘A method based on multi-convolution layers joint and generative adversarial networks for vehicle detection,’’ KSII Trans. Internet Inf. Syst., 2019, pp. 1795–1811.
[26] Y. Yu, K. Zhang, L. Yang, and D. Zhang, ‘‘Fruit detection for strawberry harvesting robot in non-structural environment based on mask-RCNN,’’ Comput. Electron. Agricult., vol. 163, Aug. 2019, Art. no. 104846.
[27] Y. Liu, P. Zhang, Q. Song, A. Li, P. Zhang, and Z. Gui, ‘‘Automatic segmentation of cervical nuclei based on deep learning and a conditional random field,’’ IEEE Access, vol. 6, pp. 53709–53721, 2018.

QINGHUI ZHANG received the B.S.E. degree from the College of Fire Control, Zhengzhou Institute of Anti-Aircraft, in 1996, the M.E. degree in navigation guidance and control from the Ordnance Engineering College, Shijiazhuang, in 2003, and the Ph.D. degree from the Beijing Institute of Technology, Beijing, China, in 2006. He is currently a Professor with the College of Information Science and Engineering, Henan University of Technology. His research interests include artificial intelligence information processing and embedded systems.

XIANING CHANG received the B.S.E. degree from the College of Science, Henan Agricultural University, in 2018. She is currently pursuing the master's degree in computer technology with the College of Information Science and Engineering, Henan University of Technology, China. Her research interests include artificial intelligence information processing, road scene target detection, and deep learning.

SHANFENG BIAN received the B.S.E. degree from the College of Science, Huanghuai University, in 2017. He is currently pursuing the master's degree in signal and information processing with the College of Information Science and Engineering, Henan University of Technology, China. His research interests include intelligent information processing and embedded systems, vehicle detection, and deep learning.