
Vehicle-Damage-Detection Segmentation Algorithm Based On Improved Mask RCNN

This article proposes a vehicle damage detection segmentation algorithm based on an improved Mask RCNN model. The algorithm collects car damage images, labels the data set, and divides it into training and test sets. It then optimizes the ResNet, performs feature extraction with FPN, adjusts the RPN's anchor proportion and threshold, uses ROIAlign to preserve spatial information, and introduces different weights in the loss function for different scale targets. Testing shows the improved Mask RCNN has better AP, detection accuracy, and masking accuracy, improving efficiency for resolving traffic accident compensation issues.

This article has been accepted for publication in a future issue of this journal, but has not been

fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2020.2964055, IEEE Access


Vehicle-Damage-Detection Segmentation
Algorithm Based on Improved Mask RCNN
Qinghui Zhang1,2, Xianing Chang2, and Shanfeng Bian2
1 Key Laboratory of Grain Information Processing and Control (Henan University of Technology), Ministry of Education, Zhengzhou 450001, China
2 College of Information Science and Engineering, Henan University of Technology, Zhengzhou 450001, China

Corresponding author: Qinghui Zhang (e-mail: [email protected]).


This work is supported by the National Natural Science Foundation of China (No.U1404617).

ABSTRACT Traffic congestion due to vehicular accidents seriously affects normal travel, and accurate
and effective mitigating measures and methods must be studied. To resolve traffic accident compensation
problems quickly, a vehicle-damage-detection segmentation algorithm based on transfer learning and an
improved mask regional convolutional neural network (Mask RCNN) is proposed in this paper. The
experiment first collects car damage pictures for preprocessing and uses Labelme to make data set labels,
which are divided into training sets and test sets. The residual network (ResNet) is optimized, and feature
extraction is performed in combination with Feature Pyramid Network (FPN). Then, the proportion and
threshold of the Anchor in the region proposal network (RPN) are adjusted. The spatial information of the
feature map is preserved by bilinear interpolation in ROIAlign, and different weights are introduced in the
loss function for different-scale targets. Finally, the results of self-made dedicated dataset training and
testing show that the improved Mask RCNN has better Average Precision (AP) value, detection accuracy
and masking accuracy, and improves the efficiency of solving traffic accident compensation problems.

INDEX TERMS Mask RCNN, vehicle-damage-detection, loss function, detection accuracy

I. INTRODUCTION

Object detection is one of the main research topics in computer vision. Its goal is to determine, at the instance level, the category and location of each object of interest in an image. Currently the most popular object detection algorithms include RCNN[1], Fast RCNN[2], Faster RCNN[3] and SSD[4]. However, these frameworks require a large amount of training data and cannot achieve end-to-end detection. The localization ability of the detection box is limited, and as the number of convolution layers increases during feature extraction, vanishing or exploding gradients often occur. To address these drawbacks, He Kaiming et al. proposed the residual network (ResNet)[5][25], which uses residual modules to help the model converge and accelerates the training of the neural network. Combined with the detection model Mask RCNN[6][26][27], it realizes object detection and segmentation and greatly improves detection accuracy. Mask RCNN is the first deep learning model that combines object detection and segmentation in one network[7]. It can handle challenging instance segmentation tasks: it not only accurately segments individuals in different categories, but also labels each pixel in the image to distinguish different individuals in the same category[8].

Most current instance segmentation algorithms are based on candidate regions. Pinheiro et al.[9] proposed the DeepMask segmentation model, which outputs predicted candidate masks for the instances appearing in the input image in order to segment each instance object, but its boundary segmentation accuracy is low[10]. Li et al.[11] proposed the first end-to-end instance segmentation framework, fully convolutional instance segmentation (FCIS). By improving the position-sensitive score map, FCIS predicts both the bounding box and the instance segmentation, but it can only roughly detect the boundary of each instance object when processing overlapping object instances[12]. He et al.[6] proposed the Mask RCNN framework, which produces relatively fine instance segmentation results among existing segmentation algorithms[13].

Compared with traditional object detection methods, Mask RCNN not only greatly improves detection accuracy, but also has great advantages in small-target detection. It is


This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.

widely used in agriculture[14], construction[15], medical image segmentation[16] and other fields. Lin et al.[17] used Mask RCNN to classify rice planthoppers and achieved effective and rapid identification of rice planthoppers and non-rice planthoppers, with an average recognition accuracy of 0.923. Wang et al.[18] applied Mask RCNN to ship-target detection, showing that Mask RCNN performs better on closely aligned targets and multi-scale targets. Shi et al.[19] applied Mask RCNN on an existing home-service-robot platform to obtain the category, location, and item-mask information of the target, and obtained a mAP of 85%. Li et al.[20] proposed a building target detection algorithm based on Mask RCNN; in remote sensing images of different scenes, the detection of building targets achieves an accuracy of 94.6%. Mask RCNN has a very wide range of applications, but it has not yet been used in the field of automobile damage detection.

This paper uses the Mask RCNN algorithm to detect and segment damaged areas of automobiles in traffic accidents, which has important research value and broad application scenarios in the field of transportation. Because car damage detection and segmentation are complex, problems such as low detection and segmentation accuracy and slow detection speed arise. This paper improves the model's network structure by reducing the number of layers in the residual network and adjusting its internal structure to strengthen the regularization of the model and enhance its generalization ability, and then adjusts the parameters of the anchor box and the loss function to improve the accuracy of car damage detection and segmentation. The improved Mask RCNN is applied to automobile damage detection, and a model based on it is proposed for detecting and segmenting the damaged area of a vehicle in an accident. Photos can be taken by both parties to the accident and uploaded for assessment. Insurance companies can also use this model to process claims quickly.

II. Car-damage-detection algorithm framework

The vehicle-damage-detection and segmentation system based on the Mask RCNN model designed in this paper is shown in Figure 1.

FIGURE 1. Car-damage-detection segmentation system framework

It can be seen from the figure that images of the damaged parts of cars are selected and collected as required, and the data are annotated with the LabelMe annotation tool to make a dataset in the .json format, which is divided into a training set and a test set. The data are sent to the Mask RCNN for feature extraction, classification prediction, and segmentation masking, and the car-damage-detection result is output.

A. Mask RCNN algorithm

Mask RCNN is an instance segmentation framework extended from Faster RCNN. It is divided into two stages: the first stage scans the image and generates proposals, and the second classifies the proposals and generates the bounding boxes and masks. The network structure block diagram of the Mask RCNN algorithm is shown in Figure 2.

FIGURE 2. Mask RCNN network framework model

The algorithm flow is the following.
(1) Input the image to be processed into a pre-trained ResNet50+FPN network model to extract features and obtain the corresponding feature maps.
(2) The feature maps are passed through the RPN to obtain a large number of candidate boxes (i.e., regions of interest, or ROIs). A softmax classifier then performs binary classification of foreground and background, box regression yields more accurate candidate-box positions, and non-maximum suppression filters out part of the ROIs.
(3) The feature maps and the remaining ROIs are sent to the RoIAlign layer, so that each ROI generates a fixed-size feature map.
(4) Finally, the flow goes through two branches: one enters the fully connected layers for object classification and box regression, and the other enters the fully convolutional network (FCN) for pixel segmentation.

B. Backbone network structure improvement

Generally, the backbone network of Mask RCNN adopts ResNet101, that is, a network with 101 layers, but too many layers greatly reduce the speed of the network. The car-damage category trained in this paper is relatively simple and places lower requirements on network depth; thus, to further improve the running speed of the algorithm, this paper uses ResNet50.

Because the size of the car damage in the images varies, a single convolutional neural network cannot extract all the image attributes well. Therefore, the


backbone structure of ResNet50 and a feature pyramid network (FPN) is used in this paper. FPN[21] uses a top-down hierarchy with lateral connections to build a network feature pyramid from a single-scale input, which solves the multi-scale problem of extracting target objects in images. This structure has strong robustness and adaptability, and requires fewer parameters.

To further improve the detection accuracy, the backbone network structure is improved and the order of the layers is adjusted, as shown in Figure 3. The right-hand part of each diagram in the figure is called the "residual" branch and the left-hand part the "identity" branch. The value on the "identity" branch should not be altered: its input and output must be kept consistent, otherwise information transmission is affected and the decrease of the loss is hindered. The order of the layers on the "residual" branch is therefore adjusted instead. The improved ResNet structure has two advantages. First, back-propagation basically meets the requirements, and information transmission is unimpeded. Second, the BN layer acts as a pre-activation, where "pre" is relative to the weight (conv) layer. This enhances the regularization of the model and gives better generalization performance.

FIGURE 3. ResNet structure and Improved ResNetV2 structure

C. RPN model improvement

In this paper, the Feature Pyramid Network structure is adopted, and the images are made into different sizes to generate features corresponding to different sizes. The shallow features can distinguish simple large targets and the deep features can distinguish small targets. The different-size feature maps generated by the FPN are input into the RPN[22], and the RPN can then extract the RoI features from different levels of the feature pyramid according to the size of the target object. Thereby, a simple change to the network structure, without substantially increasing the amount of calculation, greatly improves the detection performance for small objects and achieves excellent improvement in accuracy and speed.

RPN is equivalent to a sliding-window-based class-agnostic object detector built on a convolutional neural network. Scanning with the sliding window produces anchor boxes. A proposed region can generate a large number of anchors of different sizes and aspect ratios, and they overlap to cover as much of the image as possible. The overlap between the proposed region and the desired region, measured as intersection over union (IoU), directly affects the classification result. To adapt to more kinds of damaged car areas, the algorithm adjusts the anchor scales to {32×32, 64×64, 128×128, 256×256, 512×512} and changes the anchor aspect ratios to {1:2, 1:1, 3:1}, as shown in Figure 4. IoU is the overlap between the predicted box and the ground-truth box; its value equals the area of the intersection of the two boxes divided by the area of their union. In this paper, the IoU threshold is set to 0.8: when the overlap ratio between the area of an anchor box and the ground-truth target area is greater than 0.8, the anchor is foreground; when it is less than 0.2, the anchor is background; anchors between the two values are discarded. This reduces the amount of computation in the model and saves time, and the improved RPN produces fewer ROIs, which in turn increases the efficiency of the model.

FIGURE 4. ROI generated by the original RPN and the improved RPN

D. RoIAlign model

In the Mask RCNN network structure, the mask branch must determine whether a given pixel is part of the target, and the accuracy must be at the pixel level. After the original image is heavily convolved and pooled, its size has changed. If pixel-level segmentation is performed directly, the target object cannot be accurately positioned. Mask RCNN therefore improves on Faster RCNN by replacing the RoI Pooling layer with the region-of-interest alignment layer (RoIAlign). Its bilinear interpolation[23] preserves the spatial information on the feature map, which largely removes the error caused by the two quantizations of the feature map in the RoI Pooling layer and solves the problem of regional mismatch of the image object. Pixel-level detection and segmentation can thus be achieved.

RoIAlign differs from RoI Pooling in that it eliminates the quantization operation: it does not quantize the ROI boundary or the units, but uses bilinear interpolation to calculate the exact


position of the sample points in each unit, retaining the fractional coordinates, and then uses max pooling or average pooling to output the final fixed-size RoI. As shown in Fig. 5, the blue dotted grid is the 5×5 feature map after convolution, the solid line is the feature block corresponding to the ROI in the feature map, and RoIAlign keeps the floating-point boundary without quantization. First, the feature block is divided into 2×2 units (the unit boundaries are not quantized), and each unit is then divided into four small blocks whose center points are taken as four sampling positions, shown as blue dots in the figure. The values at the four positions are then calculated by bilinear interpolation, and finally max pooling or average pooling is performed to obtain a feature map of size 2×2.

FIGURE 5. RoIAlign schematic

E. Improvement of loss function

The multitask loss function of Mask RCNN is

$$L = L_{cls} + L_{box} + L_{mask} \tag{1}$$

In the above equation, $L_{cls}$ and $L_{box}$ are the same as in the loss function of the Faster RCNN model and represent the classification error and the detection error, respectively. The mask branch and the class prediction branch are decoupled, and a binary mask is independently predicted for each category, without relying on the prediction results of the classification branch. The loss function in Faster RCNN is

$$L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}}\sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}}\sum_i p_i^* L_{reg}(t_i, t_i^*) \tag{2}$$

In the above formula, $i$ is the index of an anchor box in the mini-batch; $N_{cls}$ and $N_{reg}$ are the normalization terms of the classification and regression losses (in Faster RCNN, the mini-batch size and the number of anchor locations, respectively); $p_i$ is the predicted probability that anchor $i$ is an object; $p_i^*$ is 0 if the anchor box is negative and 1 if it is positive; $t_i$ denotes the 4 parameterized coordinates of the predicted candidate box; $t_i^*$ denotes the 4 parameterized coordinates of the ground-truth box; $L_{cls}$ and $L_{reg}$ are the classification loss and the regression loss, respectively; and $\lambda$ is the balance coefficient, which controls the proportion of the two loss terms.

In Faster RCNN, a hyperparameter $\lambda = 10$ is introduced to balance the classification loss and the regression loss, and large-scale and small-scale targets share this one parameter.

The error function of the class prediction branch in Mask RCNN can be calculated by the following formula:

$$L(p, u, t^u, v) = L_{cls}(p, u) + \lambda [u \ge 1] L_{loc}(t^u, v) \tag{3}$$

where $p$ is the predicted class, $u$ is the ground-truth (GT) class, $t^u$ is the predicted bounding box for class $u$, and $v$ is the GT bounding box.

If the hyperparameter $\lambda = 10$ is still used in Mask RCNN, the following phenomenon occurs. When high-level semantic information is introduced into the low-level features, small-scale targets gain noticeably while large-scale targets do not. When more low-level feature information is introduced or maintained in the high-level features, large-scale targets gain noticeably while small-scale targets do not. The box of a large target is in fact relatively accurate, but its position drift is more serious, so low-level information that helps localization is needed, which improves the mAP for large targets. The approximate position of a small target is relatively accurate, but its semantic information is judged weakly, so high-level semantic information is needed to assist discrimination, which improves the mAP for small targets. To summarize, for large targets the focus is on optimizing location information, and for small targets the focus is on optimizing category prediction. That is, for targets of different scales, different weights should be introduced in the loss function to improve the accuracy of the detection branches.

III. Experimental results and analysis

To reduce the number of steps in making dataset labels and to improve the detection accuracy of car-damage images, transfer learning and Mask RCNN are used in this paper to process and detect images showing damage.

A. Transfer learning

Deep learning requires a significant amount of data, but in most cases it is difficult to find enough training data for a specific problem within a certain range. Transfer learning[24] is used to solve this problem.

Transfer learning includes a source domain and a target domain, defined as

$$D(s) = \{\mathcal{X}, P(X)\}, \quad D(t) = \{\mathcal{X}, P(X)\} \tag{4}$$

where $D(s)$ is the source domain, $D(t)$ the target domain, $\mathcal{X}$ the feature space, and $P(X)$ the marginal probability distribution, with $X = \{x_1, \ldots, x_n\} \in \mathcal{X}$.

It can be seen from the above formula that transfer learning transfers the model parameters already trained in the source domain to a new model in the target domain to help train the new model. Considering that


most image data share similar basic features, such as color and pattern, in this paper pre-training is first done on the large COCO dataset, and the trained weight files are then migrated to the dedicated dataset collected in this article for training, fine-tuning the network parameters. This allows the convolutional neural network to achieve good results on a small dataset, thereby alleviating the problem of insufficient data sources.

Figure 6 shows an idealized comparison of successful transfer learning with starting from scratch.
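The weight-migration step described above can be sketched as follows. This is only a minimal illustration, assuming the weights are held in plain dictionaries of NumPy arrays; the function name `transfer_weights`, the layer names, and the shapes are hypothetical and are not taken from the paper. Layers whose shapes depend on the number of classes are left at their fresh initialization, and all other layers are copied from the pre-trained source before fine-tuning.

```python
import numpy as np


def transfer_weights(source, target, exclude=()):
    """Copy pre-trained arrays from `source` into a copy of `target`.

    Layers listed in `exclude`, missing from `source`, or whose shapes
    differ keep their freshly initialized target values.
    Returns (new_weights, names_of_copied_layers).
    """
    new_weights = {name: w.copy() for name, w in target.items()}
    copied = []
    for name, w in target.items():
        if name in exclude or name not in source:
            continue
        if source[name].shape != w.shape:
            continue  # e.g. a class-dependent head: keep the fresh values
        new_weights[name] = source[name].copy()
        copied.append(name)
    return new_weights, copied


# Hypothetical layer names and shapes, for illustration only.
rng = np.random.default_rng(0)
coco = {
    "conv1": rng.normal(size=(7, 7, 3, 64)),         # generic low-level filters
    "class_logits": rng.normal(size=(1024, 81)),     # head sized for the source task
}
fresh = {
    "conv1": np.zeros((7, 7, 3, 64)),
    "class_logits": np.zeros((1024, 2)),             # NUM_CLASSES = 2 in this paper
}
weights, copied = transfer_weights(coco, fresh)
print(copied)
```

Skipping layers by shape mismatch is one possible design choice; an explicit `exclude` list gives the same effect when the head shapes happen to match but the head should still be retrained.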

FIGURE 6. Comparison between Transfer Learning and Starting From Scratch

It can be seen that using transfer learning brings three advantages. First, the initial performance of the model is higher. Second, the performance of the model improves faster during training. Third, the final performance of the trained model is better.

B. Building a dataset

The main research object of this paper is pictures of scratched vehicles. The experiment collected 2,000 damaged-vehicle images (1,600 for the training set and 400 for the test set) from online downloads and daily photographs.

Acquiring a dedicated dataset for detecting vehicle scratches in complex environments includes the following two parts.
(1) Image collection: Images of damaged vehicles at different angles and of different sizes in different scenes are photographed and downloaded from the Internet. Because the downloaded images vary in size and the samples for Mask RCNN must be normalized to a uniform size, a script is used to normalize the images to 1024×1024 pixels, and the missing portions are filled with 0.
(2) Image processing: The collected images are annotated using the tool Labelme and divided into a training set and a test set. The specific steps are the following. First, a folder "datasets" is created, and then two subfolders, "train" and "val", are created for storing training samples and test samples. Each image in each folder corresponds to a .json annotation file with the same name. The labeling interface is shown in Figure 7.

FIGURE 7. Labelme-marked car-damage image

C. Experimental environment

TABLE I
EXPERIMENTAL ENVIRONMENT INFORMATION TABLE

| Attribute name | Attribute value |
|---|---|
| TensorFlow version | 1.14.0 |
| Keras version | 2.2.5 |
| RAM | 31.3 GB |
| Processor | Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz × 8 |
| Graphics | GeForce GTX 1080/PCIe/SSE2 |
| Operating system version | Ubuntu 16.04, 64-bit |

D. Parameter settings

TABLE 2
EXPERIMENTAL PART PARAMETER TABLE

| Parameter | Value |
|---|---|
| LEARNING_RATE | 0.001 |
| LEARNING_MOMENTUM | 0.9 |
| WEIGHT_DECAY | 0.0001 |
| DETECTION_MIN_CONFIDENCE | 0.8 |
| STEPS_PER_EPOCH | 100 |
| NUM_CLASSES | 2 |
| MASK_POOL_SIZE | 14 |
| POOL_SIZE | 7 |
| VALIDATION_STEPS | 50 |

E. Evaluation index

The evaluation of the experimental results covers two aspects: detection performance and segmentation performance. In this experiment, the P-R curve and the AP value were used to evaluate detection performance, and the mean intersection over union (MIoU) and the running speed were used to evaluate image segmentation performance.

$$P = \frac{TP}{TP + FP}, \quad R = \frac{TP}{TP + FN} \tag{5}$$

where TP is the number of positive samples correctly classified, FP is the number of negative samples incorrectly marked as positive, and FN is the number of positive samples incorrectly marked as negative. P is the precision and R is the recall. A P-R curve is drawn from the prediction results of the network model on the test set, and the average precision (AP) of the model is then obtained as the area under the P-R curve. The larger the AP value, the better the detection performance.
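Equation (5) and the area-under-curve definition of AP can be illustrated with a short sketch. It assumes that each detection has already been matched against the ground truth and flagged as a true or false positive, and it integrates the P-R curve with the trapezoidal rule; the paper states only that the area is obtained by integration, so the integration rule, the function names, and the sample scores below are assumptions made for illustration.

```python
def precision_recall(tp, fp, fn):
    """Equation (5): P = TP / (TP + FP), R = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def pr_curve(scores, is_true_positive, num_ground_truth):
    """Lower the confidence threshold step by step and collect (R, P) points."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        p, r = precision_recall(tp, fp, num_ground_truth - tp)
        precisions.append(p)
        recalls.append(r)
    return recalls, precisions


def average_precision(recalls, precisions):
    """Area under the P-R curve (trapezoidal rule, starting at recall 0)."""
    if not recalls:
        return 0.0
    area = 0.0
    prev_r = 0.0
    prev_p = precisions[0]
    for r, p in zip(recalls, precisions):
        area += (r - prev_r) * (p + prev_p) / 2.0
        prev_r, prev_p = r, p
    return area


# Hypothetical detections: confidence scores and whether each hit a real damage region.
scores = [0.9, 0.8, 0.7, 0.6]
hits = [True, False, True, True]
recalls, precisions = pr_curve(scores, hits, num_ground_truth=4)
ap = average_precision(recalls, precisions)
print(round(ap, 4))
```

Detection benchmarks often use an interpolated precision envelope instead of the raw trapezoid; the plain trapezoid is used here only because it is the most direct reading of "area under the P-R curve".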


$$MIoU = \frac{1}{k+1}\sum_{i=0}^{k}\frac{p_{ii}}{\sum_{j=0}^{k}p_{ij} + \sum_{j=0}^{k}p_{ji} - p_{ii}} \tag{6}$$

where $k$ is the total number of output classes in the model, $p_{ij}$ is the number of pixels that belong to category $i$ but have been misjudged as category $j$, and $p_{ii}$ is the number of pixels correctly classified; $p_{ij}$ and $p_{ji}$ (for $i \ne j$) are misclassified pixels.

F. Experimental results and analysis

To study the detection performance of the improved algorithm on the car damage dataset, it is compared with the advanced detection algorithm Mask RCNN. Figure 8 shows the P-R curves obtained using the two algorithms. The area under each P-R curve is then obtained by integration, giving the average precision (AP value) of the two algorithms for car damage detection; the result is shown in Figure 9.

FIGURE 8. P-R curve

FIGURE 9. AP values of the two algorithms

It can be seen from Fig. 8 and Fig. 9 that the improved Mask RCNN algorithm significantly improves detection performance compared with the Mask RCNN algorithm. As can be seen from Figure 9, the AP value of Mask RCNN is 0.75 and the AP value of the improved detection algorithm is 0.83, which is 0.08 higher than that of the advanced object detection algorithm Mask RCNN.

TABLE 3
COMPARISON OF TEST RESULTS ACCURACY AND TIME (FPS DENOTES FRAME PER SECOND)

| Algorithm | Detection accuracy (%) | Mask accuracy (MIoU) (%) | Running speed (fps) |
|---|---|---|---|
| Mask RCNN | 94.53 | 81.25 | 4.26 |
| Improved Mask RCNN | 96.68 | 83.14 | 4.78 |

As can be seen from Table 3, compared with Mask RCNN, the improved Mask RCNN improves the detection accuracy by 2.15 percentage points, the mask accuracy by 1.89 percentage points, and the running speed by 0.52 fps. The improved algorithm thus not only improves accuracy but also speeds up detection, has better performance, and is more applicable to detecting the damaged areas of automobiles.

To verify the accuracy and reliability of the improved Mask RCNN for automobile damage detection, experiments were conducted on images under the following conditions: normal illumination, weak illumination, close distance, multiple damage, strong exposure, and insignificant damage. These conditions correspond to images (a) to (f), respectively, which were tested with the original Mask RCNN algorithm and the improved Mask RCNN; the test results are shown in Table 4 and Table 5. In the result images, the rectangular box indicates the detected target position, the number on the box is the probability of belonging to the damaged area of the car, and the binary mask is the approximate outline of the damaged area.
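Returning to the mask-accuracy metric, the MIoU of Equation (6) can be computed directly from a pixel-level confusion matrix. The sketch below is a minimal illustration, not the authors' evaluation script: the function name `mean_iou` and the pixel counts are hypothetical, and a class that appears in neither the prediction nor the ground truth is assumed to contribute an IoU of 0.

```python
def mean_iou(confusion):
    """Equation (6).

    confusion[i][j] is the number of pixels of true class i predicted as
    class j, so confusion[i][i] is p_ii.
    """
    n = len(confusion)  # number of classes, k + 1 in Equation (6)
    total = 0.0
    for i in range(n):
        p_ii = confusion[i][i]
        row_sum = sum(confusion[i])                       # sum over j of p_ij
        col_sum = sum(confusion[j][i] for j in range(n))  # sum over j of p_ji
        union = row_sum + col_sum - p_ii
        total += p_ii / union if union else 0.0           # assumption: empty class counts as 0
    return total / n if n else 0.0


# Hypothetical pixel counts for two classes (background, damage).
confusion = [
    [50, 10],  # true background: 50 correct, 10 predicted as damage
    [5, 35],   # true damage: 5 predicted as background, 35 correct
]
miou = mean_iou(confusion)
print(round(miou, 4))
```

With only two classes, as in the NUM_CLASSES = 2 setting of Table 2, the result is simply the average of the background IoU and the damage IoU.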


FIGURE 10. Vehicle damage detection result based on Mask RCNN algorithm. Panels: (a) Normal light, (b) Weak light, (c) Close distance, (d) Multiple damage, (e) Strong exposure, (f) Minor injury.

Statistics on the car damage detection results of the Mask RCNN algorithm above are shown in the following table:

TABLE 4
STATISTICAL TABLE OF AUTOMOBILE DAMAGE DETECTION RESULTS BASED ON MASK RCNN

| Detect picture | a | b | c | d | e | f |
|---|---|---|---|---|---|---|
| Lab environment | Normal light | Weak light | Close distance | Multiple damage | Strong exposure | Minor injury |
| Target quantity detected | 1 | 1 | 1 | 3 | 0 | 1 |
| Detection accuracy | 0.983 | 0.911 | 0.906 | 0.977, 0.960, 0.977 | 0 | 0.968 |

FIGURE 11. Car damage detection results based on the improved Mask RCNN algorithm. Panels: (a) Normal light, (b) Weak light, (c) Close distance, (d) Multiple damage, (e) Strong exposure, (f) Minor injury.

The car damage detection results of the improved Mask RCNN algorithm in the above figure are summarized in the following table:

TABLE 5
STATISTICAL TABLE OF AUTOMOBILE DAMAGE DETECTION RESULTS BASED ON IMPROVED MASK RCNN

| Detect picture | a | b | c | d | e | f |
|---|---|---|---|---|---|---|
| Lab environment | Normal light | Weak light | Close distance | Multiple damage | Strong exposure | Minor injury |
| Target quantity detected | 1 | 1 | 1 | 4 | 1 | 2 |
| Detection accuracy | 0.995 | 0.952 | 0.986 | 0.972, 0.974, 0.977, 0.905 | 0.947 | 0.992, 0.933 |

Comparing Figure 10 with Figure 11 and Table 4 with Table 5 shows that the improved Mask RCNN improves on the missed detections and low accuracy of the original. The improved algorithm thus shows strong robustness and adaptability for vehicle-damage detection. The comparison of experimental results further shows that the original Mask RCNN has difficulty detecting the damaged area of a vehicle under high exposure. Areas in which the damage is not obvious are also difficult to detect, but the improved Mask RCNN performs considerably better in these cases.

IV. Conclusions

In this paper, a deep-learning-based detection algorithm for vehicle-damage detection is used to deal with the compensation problem in traffic accidents. After testing and improvement, the proposed vehicle-damage-detection method based on transfer learning and the improved Mask RCNN is more universal and adapts better to various aspects of car-damage images. The algorithm achieved good detection results in different scenarios: regardless of the strength of the light, multiple damaged areas on a car, or a scene with overly high exposure, the fitting effect is better and the robustness is strong.

Although the robust Mask RCNN algorithm is adopted in this paper, improves on the original algorithm, and obtains good experimental results, some aspects have yet to be studied. For example, the detection accuracy is very

VOLUME XX, 20 9

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2020.2964055, IEEE Access
Author Name: Preparation of Papers for IEEE Access (February 2017)

high, but the mask instance segmentation cannot be [19] J. Shi, Y. Zhou, Q. Zhang, “Service robot item recognition system
based on improved Mask RCNN and Kinect,”Application Research of
completely correct, and some areas in which the damage is Computers,Jun. 2019, pp.1-9.
not obvious cannot be segmented. In future work, data [20] J. Li, W. He, et al.,”Building target detection algorithm based on
expansion can be carried out to increase the size of the Mask RCNN,”Science of Surveying and Mapping, Apr. 2019, pp. 1-13.
dataset, collect more car damage images under different [21] Y. Lin, R. Girshick, K. He, et al.,”Feature pyramid networks for
object detection,”Computer Vision and Pattern Recognition, Jun. 2016,
weather conditions and different levels of illumination,
pp.320-329.
enhance the data, improve the edge-contour enhancement [22] X. Zhang, J. Zou, X. Ming, et al.,”Efficient and accurate
of images, and make the masking of the damaged areas of approximations of nonlinear convolutional networks,”IEEE conference on
the car more accurate. computer vision and pattern recognition, Oct. 2015, pp.1984-1992.
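The data expansion mentioned as future work could, for instance, include synthetic variation of illumination. The following is a minimal sketch under stated assumptions, not the authors' pipeline: images are small grayscale grids of 8-bit values, exposure is simulated with a simple multiplicative gain, and the function names and the gain values are hypothetical. A real pipeline would more likely use an array or imaging library.

```python
def adjust_exposure(image, gain):
    """Scale every pixel by `gain` and clip to the 8-bit range [0, 255]."""
    return [[min(255, max(0, round(p * gain))) for p in row] for row in image]


def flip_horizontal(grid):
    """Mirror an image or a mask left to right."""
    return [row[::-1] for row in grid]


def augment(image, mask, gains=(0.5, 1.0, 1.8)):
    """Yield (image, mask) pairs for several exposure levels and both orientations.

    The gain values are illustrative assumptions: 0.5 mimics weak light,
    1.8 mimics strong exposure.
    """
    for gain in gains:
        exposed = adjust_exposure(image, gain)
        # Exposure changes pixel values only, so the mask is reused unchanged.
        yield exposed, mask
        # A flip changes geometry, so the mask must be flipped with the image.
        yield flip_horizontal(exposed), flip_horizontal(mask)


# Toy 2x3 grayscale image and its binary damage mask (hypothetical data).
image = [[10, 100, 200],
         [20, 120, 250]]
mask = [[0, 1, 1],
        [0, 0, 1]]

samples = list(augment(image, mask))
print(len(samples))
```

Each original image yields six training samples here. The point of the design is that photometric changes leave the segmentation mask untouched, while geometric changes must be applied to the image and the mask together so that the labels stay aligned.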
Qinghui Zhang received the B.S.E. degree from the College of Fire Control, Zhengzhou Institute of Anti-aircraft, Henan, the M.E. degree in navigation guidance and control from Ordnance Engineering College, Shijiazhuang, and the Ph.D. degree from Beijing Institute of Technology, Beijing, P.R. China, in 1996, 2003, and 2006, respectively. He is now a professor with the College of Information Science and Engineering, Henan University of Technology. His research interests include artificial intelligence information processing and embedded systems.

Xianing Chang received the B.S.E. degree from the College of Science, Henan Agricultural University, Henan, in 2018. She is pursuing a master's degree in computer technology at the College of Information Science and Engineering, Henan University of Technology, Henan, P.R. China. Her research interests include artificial intelligence information processing, road scene target detection, and deep learning.


Shanfeng Bian received the B.S.E. degree from the College of Science, HuangHuai University, Henan, in 2017. He is pursuing a master's degree in signal and information processing at the College of Information Science and Engineering, Henan University of Technology, Henan, P.R. China. His research interests include intelligent information processing and embedded systems, vehicle detection, and deep learning.
