Vehicle-Damage-Detection Segmentation Algorithm Based on Improved Mask RCNN

Qinghui Zhang¹,², Xianing Chang², and Shanfeng Bian²

¹Key Laboratory of Grain Information Processing and Control (Henan University of Technology), Ministry of Education, Zhengzhou 450001, China
²College of Information Science and Engineering, Henan University of Technology, Zhengzhou 450001, China
ABSTRACT Traffic congestion due to vehicular accidents seriously affects normal travel, and accurate and effective mitigating measures and methods must be studied. To resolve traffic-accident compensation problems quickly, a vehicle-damage-detection segmentation algorithm based on transfer learning and an improved mask regional convolutional neural network (Mask RCNN) is proposed in this paper. The experiment first collects car-damage pictures for preprocessing and uses Labelme to make dataset labels, which are divided into training and test sets. The residual network (ResNet) is optimized, and feature extraction is performed in combination with a feature pyramid network (FPN). Then, the proportions and thresholds of the anchors in the region proposal network (RPN) are adjusted. The spatial information of the feature map is preserved by bilinear interpolation in RoIAlign, and different weights are introduced in the loss function for targets of different scales. Finally, the results of training and testing on a self-made dedicated dataset show that the improved Mask RCNN achieves a better average precision (AP) value, detection accuracy, and masking accuracy, and improves the efficiency of resolving traffic-accident compensation problems.
widely used in agriculture [14], construction [15], medical image segmentation [16], and other fields. Lin et al. [17] used Mask RCNN to classify rice planthoppers and realized effective, rapid discrimination between rice planthoppers and non-rice planthoppers, achieving an average recognition accuracy of 0.923. Wang et al. [18] applied Mask RCNN to ship-target detection and showed that Mask RCNN performs better on closely aligned targets and multi-scale targets. Shi et al. [19] applied Mask RCNN to an existing home-service-robot platform to obtain the category, location, and item-mask information of the target, and obtained an mAP value of 85%. Li et al. [20] proposed a building-target detection algorithm based on Mask RCNN that achieves an accuracy of 94.6% in remote sensing images of different scenes. The Mask RCNN algorithm is thus applied in a very wide range of fields, but it has not yet been used for automobile damage detection.

This paper uses the Mask RCNN algorithm to detect and segment damaged automobile areas in traffic accidents, which has important research value and broad application scenarios in the field of transportation. Because car-damage detection and segmentation are complex, existing methods suffer from low detection and segmentation accuracy and slow detection speed. This paper improves the model's network structure by reducing the number of layers in the residual network and adjusting its internal structure, which strengthens the regularization of the model and enhances its generalization ability, and then adjusts the parameters of the anchor boxes and the loss function to improve the accuracy of car-damage detection and segmentation. The improved Mask RCNN is applied to automobile damage detection, and a model based on it is proposed for detecting and segmenting the damaged area of a vehicle in an accident. Photos can be taken by both parties to the accident and uploaded for assessment, and insurance companies can also use this model to process claims quickly.

II. Car-damage-detection algorithm framework
The vehicle-damage-detection and segmentation system based on the Mask RCNN model designed in this paper is shown in Figure 1. The labeled images are divided into a training set and a test set; the data are sent to the Mask RCNN for feature extraction, classification prediction, and segmentation masking, and the car-damage-detection result is output.

A. Mask RCNN algorithm
Mask RCNN is an instance segmentation framework extended from Faster RCNN. It is divided into two stages: the first stage scans the image and generates proposals, and the second classifies the proposals and generates the bounding boxes and masks. The network structure of the Mask RCNN algorithm is shown in Figure 2.

FIGURE 2. Mask RCNN network framework model

The algorithm flow is the following.
(1) Input the image to be processed into a pre-trained ResNet50+FPN network to extract features and obtain the corresponding feature maps.
(2) The feature maps yield a large number of candidate frames (i.e., regions of interest, ROIs) through the RPN; a softmax classifier then performs binary classification of foreground and background, frame regression is used to obtain more accurate candidate-frame positions, and part of the ROIs are filtered out by non-maximum suppression.
(3) The feature maps and the remaining ROIs are sent to the RoIAlign layer, so that each ROI generates a fixed-size feature map.
(4) Finally, the flow passes through two branches: one branch enters the fully connected layers for object classification and frame regression, and the other enters the fully convolutional network (FCN) for pixel-level segmentation.
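To make the four-step flow concrete, the following minimal sketch runs an off-the-shelf Mask RCNN (torchvision's ResNet50+FPN reference implementation, not the authors' model) on a single photo; the file name and the 0.5 score threshold are illustrative assumptions.

```python
# Minimal sketch of the four-step flow using torchvision's reference
# Mask RCNN (ResNet50+FPN), not the authors' implementation. The image
# path and the 0.5 score threshold are illustrative assumptions.
import torch
import torchvision
from torchvision.transforms import functional as F
from PIL import Image

model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()  # inference: features -> RPN proposals -> RoIAlign -> heads

image = Image.open("damaged_car.jpg").convert("RGB")  # hypothetical input
x = F.to_tensor(image)  # HWC uint8 -> CHW float in [0, 1]

with torch.no_grad():
    # Steps (1)-(4) run internally: ResNet50+FPN features, RPN with NMS,
    # RoIAlign, then the parallel box/class and mask branches.
    pred = model([x])[0]

keep = pred["scores"] > 0.5
boxes = pred["boxes"][keep]    # (N, 4) candidate-frame positions
masks = pred["masks"][keep]    # (N, 1, H, W) soft masks in [0, 1]
print(boxes.shape, masks.shape)
```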
backbone structure of ResNet50 combined with an FPN feature pyramid network is used in this paper. FPN [21] uses a top-down hierarchy with lateral connections to build a feature pyramid from a single-scale input, which solves the multi-scale problem of extracting target objects from images. This structure has strong robustness and adaptability and requires fewer parameters.

To further improve the detection accuracy, the backbone network structure is improved and the order of the layers is adjusted, as shown in Figure 3. The right-hand part of each diagram in the figure is called the "residual" branch and the left-hand part the "identity" branch. The value of the "identity" branch should not be altered lightly: the input and output must be kept consistent, since otherwise information transmission is disturbed and the propagation of the loss is hindered. With the order of the layers on the "residual" branch adjusted, the improved ResNet structure has two advantages. First, back-propagation satisfies the basic requirements, and information is transmitted unimpeded. Second, the BN layer acts as a pre-activation ("pre" being relative to the weight (conv) layer), which strengthens the regularization of the model and yields better generalization performance.

FIGURE 3. ResNet structure and improved ResNetV2 structure
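The layer reordering described above corresponds to the pre-activation design of ResNetV2, in which BN and ReLU precede each convolution and the identity branch is left untouched. The following PyTorch sketch illustrates that ordering; the channel sizes are illustrative and not taken from the paper.

```python
# A pre-activation (improved ResNetV2-style) bottleneck sketch in
# PyTorch: BN and ReLU act before each convolution on the residual
# branch, and the identity branch is left unchanged. Channel sizes
# are illustrative, not values from the paper.
import torch.nn as nn

class PreActBottleneck(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        mid = channels // 4
        self.residual = nn.Sequential(
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),   # pre-activation
            nn.Conv2d(channels, mid, kernel_size=1, bias=False),
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, kernel_size=1, bias=False),
        )

    def forward(self, x):
        # Identity branch passes through untouched, so information and
        # gradients flow unimpeded through the skip connection.
        return x + self.residual(x)
```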
C. RPN model improvement
In this paper, the feature pyramid network structure is adopted, and the images are scaled to different sizes to generate features of corresponding sizes. The shallow features can distinguish simple, large targets, and the deep features can distinguish small targets. The feature maps of different sizes generated by the FPN are input into the RPN [22], and the RPN can then extract the RoI features from different levels of the feature pyramid according to the size of the target object. The network structure thus changes simply, without substantially increasing the amount of computation, greatly improving the detection performance for small objects and achieving an excellent improvement in accuracy and speed.

RPN is equivalent to a sliding-window-based class-agnostic target detector built on the structure of a convolutional neural network. The sliding-frame scan produces anchor frames (anchors). A proposed region can generate a large number of anchors of different sizes and aspect ratios, and they overlap to cover as much of the image as possible; the overlap (IoU) between the proposed region and the desired region directly affects the classification effect. To adapt to more damaged car areas, the algorithm adjusts the anchor scales to {32×32, 64×64, 128×128, 256×256, 512×512}, and the anchor aspect ratios are changed to {1:2, 1:1, 3:1}, as shown in Figure 4. The IoU is the coverage between the predicted box and the real box, and its value equals the intersection of the two boxes divided by their union. In this paper, the IoU threshold is set to 0.8; that is, when the overlap ratio between the area corresponding to an anchor frame and the real target area is greater than 0.8, the anchor is foreground; when the overlap ratio is less than 0.2, it is background; anchors between the two values are discarded. This reduces the amount of computation underlying the model, saves time, and the improved RPN produces fewer ROIs, which in turn increases the efficiency of the model.

FIGURE 4. ROI generated by the original RPN and the improved RPN
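Under the assumption of a torchvision-style implementation, the adjusted anchor scales, aspect ratios, and foreground/background IoU thresholds described above can be wired into the RPN as in the following sketch (the constructor arguments shown are torchvision's, not the authors' code):

```python
# Sketch of the adjusted RPN settings in torchvision terms: one anchor
# scale per FPN level, aspect ratios {1:2, 1:1, 3:1}, and the 0.8 / 0.2
# foreground/background IoU thresholds described above.
import torchvision
from torchvision.models.detection.rpn import AnchorGenerator

anchor_generator = AnchorGenerator(
    sizes=((32,), (64,), (128,), (256,), (512,)),  # {32^2 ... 512^2}
    aspect_ratios=((0.5, 1.0, 3.0),) * 5,          # {1:2, 1:1, 3:1}
)

model = torchvision.models.detection.maskrcnn_resnet50_fpn(
    pretrained=True,
    rpn_anchor_generator=anchor_generator,
    rpn_fg_iou_thresh=0.8,  # anchor is foreground above this IoU
    rpn_bg_iou_thresh=0.2,  # background below this; in between is dropped
)
```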
D. RoIAlign model
In the Mask RCNN network structure, the mask branch must determine whether a given pixel is part of the target, and the accuracy must be at the pixel level. After the original image has been heavily convolved and pooled, its size has changed, so when pixel-level segmentation is performed directly, the target object in the image cannot be accurately positioned. Mask RCNN therefore improves on Faster RCNN by replacing the RoI Pooling layer with the interest-region alignment layer (RoIAlign). The bilinear interpolation [23] method preserves the spatial information of the feature map, which largely eliminates the error caused by the two quantizations of the feature map in the RoI Pooling layer and solves the problem of regional mismatch of the image object. Pixel-level detection and segmentation can thus be achieved.

The interest-region alignment layer RoIAlign differs from RoI pooling in that it eliminates the quantization operation: it quantizes neither the ROI boundary nor the units, but instead uses bilinear interpolation to calculate the exact positions of the sampling points in each unit, retaining their decimal parts, and then applies a maximum-pooling or average-pooling operation to output the final fixed-size RoI.
As shown in Fig. 5, the blue dotted lines form the 5×5 feature map after convolution, and the solid lines mark the small feature block corresponding to the ROI in the feature map; RoIAlign maintains the floating-point boundary without quantization. First, the feature block is divided into 2×2 units (the unit boundaries are not quantized), and each unit is then divided into four small blocks, whose center points are taken as four coordinate positions, shown as blue dots in the figure. Then the values at the four positions are calculated by bilinear interpolation, and finally a maximum-pooling or average-pooling operation is performed to obtain the 2×2 feature map.

FIGURE 5. RoIAlign schematic
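The bilinear-interpolation step that RoIAlign applies at each non-quantized sampling point can be sketched as follows; this is an illustration of the computation only, while production code would call the built-in torchvision.ops.roi_align operator.

```python
# Illustration of the bilinear interpolation RoIAlign applies at each
# non-quantized sampling point; production code would instead call
# torchvision.ops.roi_align(features, boxes, output_size, sampling_ratio=...).
import torch

def bilinear_sample(feature: torch.Tensor, y: float, x: float) -> torch.Tensor:
    """Sample a (C, H, W) feature map at float coordinates (y, x)."""
    y0, x0 = int(y), int(x)                          # integer neighbors
    y1 = min(y0 + 1, feature.shape[1] - 1)
    x1 = min(x0 + 1, feature.shape[2] - 1)
    dy, dx = y - y0, x - x0                          # fractional parts kept
    return ((1 - dy) * (1 - dx) * feature[:, y0, x0]
            + (1 - dy) * dx * feature[:, y0, x1]
            + dy * (1 - dx) * feature[:, y1, x0]
            + dy * dx * feature[:, y1, x1])

feat = torch.rand(256, 5, 5)                    # toy 5x5 feature map
print(bilinear_sample(feat, 2.37, 1.84).shape)  # torch.Size([256])
```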
E. Improvement of loss function
The multitask loss function of Mask RCNN is

$L = L_{cls} + L_{box} + L_{mask}$ (1)

This has the same form as the loss function in the Faster RCNN model, whose terms represent the classification error and the detection error, respectively, with the mask term added. The mask branch and the class-prediction branch are decoupled, and a binary mask is predicted independently for each category, without relying on the prediction results of the classification branch. The loss function in Faster RCNN is

$L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)$ (2)

In the above formula, $i$ is the index of an anchor box in the mini-batch; $N_{cls}$ and $N_{reg}$ are the normalization terms of the classification and regression losses (the mini-batch size and the number of anchor locations, respectively); $p_i$ is the predicted probability that anchor $i$ is an object; $p_i^*$ is 0 if the anchor box is negative and 1 if it is positive; $t_i$ denotes the four parameterized coordinates of the predicted candidate box; $t_i^*$ denotes the four parameterized coordinates of the ground-truth region; $L_{cls}$ and $L_{reg}$ are the classification loss and the regression loss, respectively; and $\lambda$ is the balance coefficient used to control the proportion of the two loss terms.

In Faster RCNN, the hyperparameter $\lambda = 10$ is introduced to balance the classification loss and the regression loss, and large-scale and small-scale targets share this single parameter. The error function of the class-prediction branch in Mask RCNN can be calculated by the following formula:

$L(p, u, t^u, v) = L_{cls}(p, u) + \lambda [u \geq 1] L_{loc}(t^u, v)$ (3)

where $p$ is the predicted class, $u$ the ground-truth class, $t^u$ the predicted bounding box for class $u$, and $v$ the ground-truth bounding box.

If the hyperparameter $\lambda = 10$ is retained in Mask RCNN, the following phenomenon appears. When high-level semantic information is introduced into the low-level features, small-scale targets gain noticeably in accuracy while large-scale targets do not; when more low-level feature information is introduced into or preserved in the high-level features, large-scale targets gain noticeably while small-scale targets do not. The boxes of large targets are actually more accurate, but their position drift is more serious, so the low-level information that benefits localization is needed, which contributes to the improvement of large targets in the mAP indicator. The positions of small targets are comparatively accurate, but the judgment of their semantic information is relatively weak, so high-level semantic information is needed to assist discrimination, which contributes to the improvement of small targets in the mAP indicator. In summary, for large targets the focus is on optimizing the location information, while for small targets the focus is on optimizing the category prediction. That is, for targets of different scales, different weights should be introduced into the loss function to improve the detection accuracy of the detection branches.
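The paper does not state the weight values; as a hedged sketch of the idea, the detection-branch losses can be weighted by target scale, with small targets emphasizing the classification term and large targets the localization term. The area threshold and the weight values below are illustrative assumptions, not the paper's settings.

```python
# Hedged sketch of scale-dependent loss weighting: the paper states only
# that different scales should get different weights; the 96x96 area
# split and the weight values here are illustrative assumptions.
import torch

SMALL_AREA = 96 * 96  # boxes below this area are treated as small targets

def weighted_detection_loss(loss_cls, loss_box, gt_boxes):
    """loss_cls, loss_box: per-RoI loss vectors; gt_boxes: (N, 4) xyxy."""
    areas = (gt_boxes[:, 2] - gt_boxes[:, 0]) * (gt_boxes[:, 3] - gt_boxes[:, 1])
    small = areas < SMALL_AREA
    # Small targets: emphasize category prediction; large targets:
    # emphasize localization, per the analysis above.
    w_cls = torch.where(small, torch.tensor(2.0), torch.tensor(1.0))
    w_box = torch.where(small, torch.tensor(1.0), torch.tensor(2.0))
    return (w_cls * loss_cls + w_box * loss_box).mean()
```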
III. Experimental results and analysis
To reduce the number of steps in making dataset labels and to improve the detection accuracy of car-damage images, transfer learning and Mask RCNN are used in this paper to process and detect images showing damage.

A. Transfer learning
Deep learning requires a significant amount of data, but in most cases it is difficult to find enough training data for a specific problem within a certain range. To solve this problem, transfer learning [24] is used. Transfer learning involves a source domain and a target domain, defined as

$D^{(s)} = \{\mathcal{X}, P(X)\}, \quad D^{(t)} = \{\mathcal{X}, P(X)\}$ (4)

where $D^{(s)}$ is the source domain, $D^{(t)}$ the target domain, $\mathcal{X}$ the feature space, and $P(X)$ the marginal probability distribution, with $X = \{x_1, \ldots, x_n\} \in \mathcal{X}$.

It can be seen from the above formula that transfer learning transfers the model parameters already trained in the source domain to a new model in the target domain to help train the new model.
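A common way to realize this parameter transfer for Mask RCNN, sketched here under the assumption of COCO-pretrained torchvision weights and a two-class (background plus damage) target domain, is to keep the backbone and replace only the prediction heads; this follows the standard torchvision fine-tuning recipe, not the authors' code.

```python
# Sketch of the parameter transfer with torchvision's Mask RCNN: keep the
# COCO-pretrained backbone/RPN (source domain) and re-initialize only the
# heads for the two-class target domain (background + damaged area).
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
num_classes = 2  # background + car damage

in_feat = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_feat, num_classes)

in_feat_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_feat_mask, 256, num_classes)
```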
C. Experimental environment

TABLE I. Experimental environment information
$\mathrm{MIoU} = \frac{1}{k+1} \sum_{i=0}^{k} \frac{p_{ii}}{\sum_{j=0}^{k} p_{ij} + \sum_{j=0}^{k} p_{ji} - p_{ii}}$ (6)

where $k$ is the total number of output classes of the model, $p_{ij}$ represents the number of pixels that belong to category $i$ but are misjudged as category $j$, $p_{ii}$ indicates the number of pixels correctly classified, and $p_{ij}$ and $p_{ji}$ represent misclassified pixels.
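Equation (6) can be computed directly from a pixel-level confusion matrix, as in the following sketch (the toy two-class matrix is illustrative):

```python
# Computing Eq. (6) from a pixel-level confusion matrix, where
# conf[i, j] counts pixels of class i predicted as class j; the toy
# two-class matrix (background vs. damage) is illustrative.
import numpy as np

def mean_iou(conf: np.ndarray) -> float:
    tp = np.diag(conf)              # p_ii: correctly classified pixels
    fn = conf.sum(axis=1) - tp      # sum_j p_ij - p_ii
    fp = conf.sum(axis=0) - tp      # sum_j p_ji - p_ii
    iou = tp / (tp + fn + fp + 1e-9)
    return float(iou.mean())

conf = np.array([[50, 2],
                 [3, 45]], dtype=float)
print(mean_iou(conf))  # mean of per-class IoU
```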
F. Experimental results and analysis
In order to study the detection performance of the improved algorithm on the car-damage dataset, it is compared with the advanced Mask RCNN detection algorithm. Figure 8 shows the P-R curves obtained with the two algorithms. The area under each P-R curve is then obtained by integration, which gives the average precision (AP) of each algorithm for car-damage detection; the result is shown in Figure 9. The AP computation is sketched in code below.
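As a sketch of the integration step just described, the AP can be approximated numerically as the area under sampled P-R points (the points below are illustrative, not the paper's measurements):

```python
# Sketch of the AP computation described above: numerically integrate
# the area under the P-R curve. The sample points are illustrative.
import numpy as np

def average_precision(recall, precision) -> float:
    r = np.asarray(recall)
    p = np.asarray(precision)
    order = np.argsort(r)            # integrate over increasing recall
    return float(np.trapz(p[order], r[order]))

recall = [0.0, 0.2, 0.5, 0.8, 1.0]
precision = [1.0, 0.95, 0.90, 0.70, 0.50]
print(average_precision(recall, precision))
```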
TABLE 3. Comparison of test accuracy and running speed (fps denotes frames per second)

Algorithm            Detection accuracy (%)   Mask accuracy (MIoU) (%)   Running speed (fps)
Mask RCNN                   94.53                     81.25                    4.26
Improved Mask RCNN          96.68                     83.14                    4.78

As can be seen from Table 3, compared with the original Mask RCNN, the improved Mask RCNN increases the detection accuracy by 2.15%, the mask accuracy by 1.89%, and the running speed by 0.52 fps. The improved algorithm therefore not only improves accuracy but also speeds up detection, giving it better performance and higher applicability to the damaged areas of automobiles.

To verify the accuracy and reliability of the improved Mask RCNN for automobile damage detection, experiments were conducted on images under the following conditions: normal illumination, weak illumination, close distance, multiple damage, strong exposure, and insignificant damage. These conditions map to images (a)-(f), respectively, in the original Mask RCNN algorithm and the improved Mask RCNN, and the test results are shown in Table 4 and Table 5. The rectangular box indicates the detected target position, the number on the rectangular frame the probability of belonging to the damaged area of the car, and the binary mask the approximate outline of the damaged area of the car.
high, but the mask instance segmentation cannot be completely correct, and some areas in which the damage is not obvious cannot be segmented. In future work, data expansion can be carried out to increase the size of the dataset: more car-damage images can be collected under different weather conditions and different levels of illumination, the data can be augmented, and the edge-contour enhancement of images can be improved to make the masking of the damaged areas of the car more accurate.
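One possible realization of this future-work direction, assuming torchvision transforms (the parameter values are illustrative), is an illumination- and sharpness-oriented augmentation pipeline:

```python
# One possible augmentation pipeline for the future-work direction above
# (illumination changes plus sharpness enhancement); parameter values are
# illustrative. Note that geometric transforms such as flips must also be
# applied to the boxes and masks in a detection dataset.
from torchvision import transforms

augment = transforms.Compose([
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.2),
    transforms.RandomAdjustSharpness(sharpness_factor=2.0, p=0.5),
    transforms.RandomHorizontalFlip(p=0.5),
])
```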
REFERENCES
[1] R. Girshick, J. Donahue, T. Darrell, et al., "Rich feature hierarchies for accurate object detection and semantic segmentation," IEEE Conference on Computer Vision and Pattern Recognition, vol. 13, no. 1, pp. 580-587, Jan. 2014.
[2] R. Girshick, "Fast R-CNN," in Proc. IEEE International Conference on Computer Vision, Dec. 2015, pp. 1440-1448.
[3] S. Ren, K. He, R. Girshick, et al., "Faster R-CNN: Towards real-time object detection with region proposal networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1137-1149, Jun. 1, 2017, DOI: 10.1109/TPAMI.2016.2577031.
[4] W. Liu, D. Anguelov, D. Erhan, et al., "SSD: Single shot multibox detector," IEEE European Conference on Computer Vision, Jun. 2016, pp. 21-37.
[5] K. He, X. Zhang, S. Ren, et al., "Deep residual learning for image recognition," Computer Vision and Pattern Recognition, Dec. 2016, pp. 770-778.
[6] K. He, G. Georgia, D. Piotr, et al., "Mask R-CNN," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 2980-2988.
[7] N. Kumar, R. Verma, et al., "A multi-organ nucleus segmentation challenge," IEEE Transactions on Medical Imaging, vol. ED-11, no. 1, pp. 34-39, Oct. 2019, DOI: 10.1109/TMI.2019.2947628.
[8] A. K. Jaiswal, P. Tiwari, et al., "Identifying pneumonia in chest X-rays: A deep learning approach," Measurement, vol. 145, pp. 511-518, Oct. 2019, DOI: 10.1016/j.measurement.2019.05.076.
[9] P. O. Pinheiro, R. Collobert, et al., "Learning to segment object candidates," Advances in Neural Information Processing Systems, pp. 1990-1998.
[10] W. S. Tang, H. L. Liu, et al., "Fast hypervolume approximation scheme based on a segmentation strategy," Information Sciences, vol. 509, pp. 320-342, Jan. 2020, DOI: 10.1016/j.ins.2019.02.054.
[11] Y. Li, H. Z. Qi, et al., "Fully convolutional instance-aware semantic segmentation," Computer Vision and Pattern Recognition, Apr. 2017, pp. 4438-4446.
[12] X. J. Rong, C. C. Yi, et al., "Unambiguous scene text segmentation with referring expression comprehension," IEEE Transactions on Image Processing, vol. 29, pp. 591-601, 2020, DOI: 10.1109/TIP.2019.2930176.
[13] Y. L. Qiao, M. Truman, S. Sukkarieh, "Cattle segmentation and contour extraction based on Mask R-CNN for precision livestock farming," Computers and Electronics in Agriculture, pp. 165-173, Dec. 20, 2019, DOI: 10.1016/j.compag.2019.104958.
[14] S. H. Cheng, S. J. Zhang, D. F. Zhang, "Water quality monitoring method based on feedback self-correcting dense connected convolution network," Neurocomputing, vol. 349, pp. 301-313, Jul. 2019, DOI: 10.1016/j.neucom.2019.03.023.
[15] J. R. Yang, L. Y. Ji, et al., "Building detection in high spatial resolution remote sensing imagery with the U-Rotation Detection Network," International Journal of Remote Sensing, vol. 40, pp. 6036-6048, Aug. 2019, DOI: 10.1080/01431161.2019.1587200.
[16] E. K. Wang, X. Zhang, et al., "Multi-path dilated residual network for nuclei segmentation and detection," Cells, vol. 8, pp. 109-120, May 2019, DOI: 10.3390/cells8050499.
[17] X. Lin, S. Zhu, J. Zhang, et al., "Rice planthopper image classification method based on transfer learning and Mask R-CNN," Transactions of the Chinese Society for Agricultural Machinery, vol. 13, no. 4, pp. 181-184, Dec. 2019.
[18] G. Wang, S. Liang, et al., "Ship object detection based on Mask RCNN," Radio Engineering, 2018, pp. 947-952.
[19] J. Shi, Y. Zhou, Q. Zhang, "Service robot item recognition system based on improved Mask RCNN and Kinect," Application Research of Computers, Jun. 2019, pp. 1-9.
[20] J. Li, W. He, et al., "Building target detection algorithm based on Mask RCNN," Science of Surveying and Mapping, Apr. 2019, pp. 1-13.
[21] Y. Lin, R. Girshick, K. He, et al., "Feature pyramid networks for object detection," Computer Vision and Pattern Recognition, Jun. 2016, pp. 320-329.
[22] X. Zhang, J. Zou, X. Ming, et al., "Efficient and accurate approximations of nonlinear convolutional networks," IEEE Conference on Computer Vision and Pattern Recognition, Oct. 2015, pp. 1984-1992.
[23] S. Wang, K. Yang, "An image scaling algorithm based on bilinear interpolation with VC++," Techniques of Automation and Applications, 2008, pp. 168-176.
[24] A. Mathew, J. Mathew, M. Govind, et al., "An improved transfer learning approach for intrusion detection," Procedia Computer Science, 2017, pp. 251-257.
[25] G. Han, J. Su, C. Zhang, "A method based on multi-convolution layers joint and generative adversarial networks for vehicle detection," KSII Transactions on Internet and Information Systems, 2019, pp. 1795-1811.
[26] Y. Yu, K. L. Zhang, Y. Li, et al., "Fruit detection for strawberry harvesting robot in non-structural environment based on Mask-RCNN," Computers and Electronics in Agriculture, 2019, pp. 163-172.
[27] Y. Liu, P. Zhang, et al., "Automatic segmentation of cervical nuclei based on deep learning and a conditional random field," IEEE Access, 2018, pp. 53790-53721.

Qinghui Zhang received the B.S.E. from the College of Fire Control, Zhengzhou Institute of Anti-aircraft, Henan, the M.E. in Navigation Guidance and Control from Ordnance Engineering College, Shijiazhuang, and the Ph.D. from Beijing Institute of Technology, Beijing, P.R. China, in 1996, 2003, and 2006, respectively. He is currently a professor with the College of Information Science and Engineering, Henan University of Technology. His research interests include artificial intelligence information processing and embedded systems.

Xianing Chang received the B.S.E. from the College of Science, Henan Agricultural University, Henan, in 2018. She is currently pursuing the master's degree in computer technology with the College of Information Science and Engineering, Henan University of Technology, Henan, P.R. China. Her research interests include artificial intelligence information processing, road-scene target detection, and deep learning.