Automatic Damaged Vehicle Estimator Using Enhanced Deep Learning Algorithm
Keywords: Vehicle damage assessment; Deep learning; Instance segmentation; Mask R-CNN; Object detection; Transfer learning

Abstract

Claim leakage costs insurance companies millions of dollars each year because of the disparity between the amount insurance businesses pay out and the actual amount that should be reimbursed. As a result, processing claims to identify and classify automobile damage is time-consuming and costly for insurance providers. In this paper, we used an improved Mask R-CNN method, which offers significant research benefits for object detection, to automatically detect, identify, and categorize car damage sites in traffic incidents. To detect and label an image of a damaged vehicle, we used a combination of deep learning, transfer learning, Mask R-CNN, and instance segmentation. In addition, a web-based automatic claim estimator can accept photographs from the user and determine the position and degree of the damage automatically. Furthermore, three different pre-trained models, namely Inception ResNetV2, VGG-16, and VGG-19, were used to aid quick convergence. Finally, comparative performance assessments employ several evaluation measures, such as precision, recall, F1 score, accuracy, loss function, and confusion matrices, based on the three pre-trained models. The empirical results reveal that the proposed method not only recognizes damaged vehicles but also locates them and determines their severity level, which accomplishes the study's objective of automatically locating and classifying car damage. According to the data, employing Mask R-CNN with pre-trained Inception ResNetV2 outperforms the other models in all detection, localization, and severity-damage performance categories.
(Unmanned Aerial Vehicles) for destruction measurement was given in Attari et al. (2017): an amalgamated deep learning pipeline was introduced to detect structures, followed by fine-grained damage categorization for those structures. A road damage detection and street monitoring system was created in Alfarrarjeh et al. (2018), in which the pre-trained You Only Look Once (YOLO) model was used to recognize several road damage categories as identifiable objects in images. The method is fast, but its accuracy and precision are substantially worse. Furthermore, a few challenges are associated with applying DL models to image classification, since the models must be trained on a large number of images to obtain very high performance and precision. First, it is difficult to obtain a large collection of publicly available damaged-automobile images. Second, given a large amount of training data, DL requires substantial processing resources to train the model, and the training procedure takes time. Third, a DL model such as a CNN has a large number of parameters and hyperparameters that must be tuned.

In this paper, we use CNN, transfer learning, and an enhanced Mask R-CNN approach to detect, locate, and categorize severity levels of car damage through an automatic detector. Unlike earlier research, our method not only finds damage but also automatically determines its extent, visualizes it on the vehicle's pictures, and specifies the severity of the damage. We blended DL, transfer learning, Mask R-CNN, and instance segmentation to identify and categorize an image of a damaged vehicle. Moreover, we developed a web application for automatic detection and classification estimation using photographs of the damaged vehicle. To enable fast convergence, three different pre-trained models, namely Inception ResNetV2 (Szegedy et al., 2017), VGG-16, and VGG-19 (Simonyan & Zisserman, 2014), were used. Finally, comparative performance assessments of precision, recall, F1 score, accuracy, loss function, and confusion matrices are presented for the three pre-trained models. The improved Mask R-CNN automatically detects car damage areas in traffic accidents, which has significant research value and a broad range of application scenarios in object identification and transportation. Our point-by-point contributions are as follows:

• Using images of damaged vehicles, we propose a novel model combining DL, transfer learning, Mask R-CNN, and instance segmentation to recognize, categorize, and assess the severity levels of accident damage.
• A comparative analysis is performed using precision, recall, F1 score, accuracy, confusion matrices, and loss functions for three different pre-trained models, i.e., VGG-16, VGG-19, and Inception ResNetV2.
• The findings show that the pre-trained Inception ResNetV2 model beat the VGG-16 and VGG-19 models in all detection, localization, and severity-damage performance categories.
• Finally, we developed a web-based automatic claim estimator using the best pre-trained model integrated with the enhanced Mask R-CNN. The claim estimator can accept photographs from the user at a traffic incident site and determine the position and degree of the damage automatically.

The rest of the paper is structured as follows: Section 2 covers related research. Section 3 introduces the framework for automobile damage identification, including datasets and methodology. Section 4 presents a performance evaluation and analysis of the experimental results. Section 5 presents the web-based automatic claim estimator. Section 6 concludes.

2. Related work

Vehicle recognition, segmentation, and tracking have become popular research topics because of the many difficult issues involved, such as detection speed, live movement, occlusion, and scaling (Sudha & Priyadarshini, 2020). Researchers have investigated a variety of computer vision and image-processing technologies for vehicle recognition, segmentation, and damage detection. Some of these works are discussed in the following subsections.

2.1. Vehicle detection using traditional image processing approaches

Image processing often involves numerous transformation and feature-extraction steps, together with analytical techniques investigated for applications such as vehicle detection and vehicle damage assessment. One popular method pipelines HOG (Histogram of Oriented Gradients) features with the k-nearest neighbor (KNN) algorithm (Putra et al., 2020), attaining an accuracy of 84%, and with a linear Support Vector Machine (SVM), reaching a 2.57% error in color space, showing that video compression can produce several misleading cues. The research in Son et al. (2019) proposed a correlation filter to detect vehicles, applying a Canny transform followed by a Hough transform to extract patches and lanes, then correlation analysis, reaching an accuracy of 96.4%. Yuan et al. (2019) used an improved least-squares-error method to locate the target vehicle; HAAR-like feature analysis with an AdaBoost classifier was then performed for detection verification and comparison against traditional approaches. However, their loss graph revealed a gap between the testing loss and the training loss, indicating an overfitting problem. Furthermore, the studies in Choudhury et al. (2017); Haselhoff and Kummert (2009); Al Mamun and Deb (2019) examined HAAR features, followed by additional feature extractors such as the Kalman filter and triangle features with machine learning classifiers.

2.2. Vehicle detection using deep learning approaches

AlexNet's (Krizhevsky et al., 2017) entry in the ImageNet competition established CNNs for object detection. Following the trend on the PASCAL VOC 2012 dataset, tools such as R-FCN, R-CNN (Ren et al., 2015), and Fast R-CNN (Girshick et al., 2014; Girshick & Fast, 2015) combined unique convolution-layer configurations with fully connected layers to find region proposals via selective search, but this proved sluggish. The RPN (Region Proposal Network) was then proposed, alongside single-stage detector methods such as the various YOLO versions (Redmon et al., 2015; Redmon & Farhadi, 2018). The major aim of the single-stage detectors was to improve detection speed by creating bounding boxes and anchor boxes in a single step for the intended item, but they sacrificed precision. An upgraded YOLOv3-tiny in Gong et al. (2020) was used to evaluate thermal captures for vehicle detection, achieving an mAP of 78.77% (Kaushik et al., 2020). Segmentation methods, including Mask R-CNN and Fast R-CNN, have also been applied to vehicle detection. The research in Mittal et al. (2020) presented an upgraded R-CNN model that detects vehicles quickly and avoids duplicate detections, with an F1-score of 85.7% for heavy vehicles and 75% for light vehicles. A pre-trained Single Shot MultiBox Detector (SSD) model (Liu et al., 2016), implemented with the Caffe framework, achieved 81.2% mAP on a combination of visible and infrared pictures with custom capture. The research in Wang et al. (2019) detects vehicles using a LiDAR point cloud and the YOLOv3 algorithm, achieving an accuracy of 70.58% on the KITTI dataset.

On the other hand, the research in Zhao et al. (2019) wanted to attain object detection in nighttime images without ground-truth annotations for the target captures, so they adopted unsupervised training with GANs (Generative Adversarial Networks) for image-to-image translation, transforming annotated daylight pictures into nighttime ones without changing the car positions.
Then, rather than employing merely one of the aforesaid sets, a Faster R-CNN model was trained on the amalgamated dataset of images, resulting in a high mAP of 88%. As a consequence, when compared to the current literature, our suggested blend model using the Inception ResNetV2 pre-trained model obtained a superior and automated outcome, with a precision of 89.13%, a hit rate of 0.98, a sensitivity of 0.91, and an accuracy of 92%. Other than computer vision approaches, some sensor-based methods also exist, such as in An in-Vehicle System and Method for During Accident Detection without being Fixed to Vehicle (2022); System and Method for During Crash Accident Detection and Notification (2022). Furthermore, to enhance the automobile business, various new automated ways of detecting car damage are being developed (Patil et al., 2017; Deep Learning-Based Car Damage Classification & Detection for Automotive Industry, 2022; van Ruitenbeek & Bhulai, 2022; Digital Transformation in Car Insurance Industry: Streamline Recognition of Car Damage Assessment, 2022; Madheswari et al., 2022; Why Estimating Car Damage with Machine Learning Is Hard, 2022; Ahmad et al., 2022; Automatic vehicle damage detection with images, 2022). Moreover, an automated deep learning system for driver assistance to prevent road incidents has also been proposed, as in the work of Jaikishore et al. (Neelam Jaikishore et al., 2022).

3. Proposed methodology

Our suggested framework for an automatic damage claim estimator is separated into several steps, which the following subsections describe in detail.

3.1. Damage detection and classification

This work proposes a technique for detecting, recognizing, localizing, and categorizing vehicle damage using several images of the damaged vehicle. To obtain the desired outcomes, deep learning CNNs and transfer learning approaches based on several pre-trained backbone models are used. Fig. 1 depicts our suggested technique's workflow.

Fig. 1. A flowchart illustrating the establishment of pipelines for analyzing vehicle damage.

3.1.2. Objectives of the study

In this section, we explain our target tasks, which are organized into four steps (Kyu & Woraratpanya, 2020). The four tasks and their outcomes are as follows:

• Task 1: Recognize the image: is it a car or not?
• Task 2: Spot the damaged parts of the car: is it a damaged car or not?
• Task 3: Categorize the damaged spot positions: is the damaged part at the front, back, or side of the car?
• Task 4: Classify the severity of the car's damaged portion: is the damage mild, moderate, or severe?

3.1.3. Systemic flow of the system

We employ the transfer-learning experimental configuration shown in Fig. 1, in which the pre-trained model is used as a feature extractor, as in Patil et al. (2017). Our framework consists of 4 phases and 3 models, each based on one of three different datasets:

• Dataset 1: includes two classes for completing Task 2, damaged or undamaged car.
• Dataset 2: includes three classes for completing Task 3, damage located at the front, side, or rear.
• Dataset 3: includes three classes for completing Task 4, damage that is minor, moderate, or severe.

Then we select one of the pre-trained models: VGG-16, VGG-19 (Simonyan & Zisserman, 2014), or Inception ResNetV2 (Szegedy et al., 2017). More precisely, the first phase determines whether or not an automobile exists, based on input data that is either uploaded as an image or drawn from existing databases. In the second phase, after selecting and testing the model, we create model 1 using Dataset 1 to determine whether or not the automobile is damaged. In the third phase, a model trained with Dataset 2, named model 2, is used to locate the damaged portion of the automobile. Finally, in the last phase, we estimate the severity of the damaged portion, which is handled by model 3 using Dataset 3. Fig. 1 depicts the flowchart of these phases and their details; it summarizes the overall working of the system, i.e., detecting the car's defects, locating them, and classifying and categorizing the severity of the damage. A sketch of how the three datasets map onto these models is shown below.
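As an illustrative sketch (not the authors' released code), the three datasets could be wired to Keras data generators as follows; the directory names data1/, data2/, and data3/ are hypothetical placeholders for Datasets 1-3:

```python
# Minimal sketch of the three dataset splits described above, assuming
# hypothetical folder names (data1/, data2/, data3/) for Datasets 1-3.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

IMG_SIZE = (224, 224)  # input size expected by the VGG backbones

gen = ImageDataGenerator(rescale=1.0 / 255, validation_split=0.2)

# Dataset 1: damaged vs. undamaged (Task 2)
damage_train = gen.flow_from_directory(
    "data1", target_size=IMG_SIZE, class_mode="binary", subset="training")

# Dataset 2: front / side / rear damage location (Task 3)
location_train = gen.flow_from_directory(
    "data2", target_size=IMG_SIZE, class_mode="categorical", subset="training")

# Dataset 3: minor / moderate / severe damage (Task 4)
severity_train = gen.flow_from_directory(
    "data3", target_size=IMG_SIZE, class_mode="categorical", subset="training")
```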
Fig. 2 depicts test images from each class. Because of the strong similarity between classes, the underlying task is non-trivial. And because the damage covers only a limited portion of the image, the task becomes even more challenging.
To improve performance and limit the overfitting problem, as illustrated in Fig. 3, we train the last two neural network layers and freeze all other weights, avoiding long training times. To further improve the model's performance and reduce overfitting, regularization is applied. Because transfer learning with CNNs is less time-consuming in the training phase, we used this strategy to obtain the best parameters in a shorter amount of time. We use three different pre-trained models, VGG-16, VGG-19 (Simonyan & Zisserman, 2014), and Inception-ResNetV2 (Szegedy et al., 2017); details of these pre-trained models are given in the backbone section below.
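A minimal Keras sketch of this freeze-and-fine-tune strategy, assuming a VGG-16 backbone and an illustrative binary damaged/undamaged head (the head size, dropout rate, and learning rate are our assumptions, not values reported above):

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Pre-trained ImageNet backbone, without its original classification head.
backbone = VGG16(weights="imagenet", include_top=False,
                 input_shape=(224, 224, 3))
backbone.trainable = False  # freeze all pre-trained convolutional weights

# Only the newly added layers below are trained, which keeps training short.
model = models.Sequential([
    backbone,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),                     # regularization against overfitting
    layers.Dense(1, activation="sigmoid"),   # e.g. damaged vs. undamaged
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="binary_crossentropy", metrics=["accuracy"])
```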
3.2. Damage feature extraction and prediction

In this regard, we use the improved Mask R-CNN together with object identification, classification, and segmentation approaches to find and delineate damage in vehicle images (He et al., 2017). Mask R-CNN is an improvement of Faster R-CNN (Ren et al., 2015) in which a third branch is added to output the object mask. Moreover, a RoIAlign procedure is used instead of RoI pooling to create instance segmentation masks with better accuracy. As illustrated in Fig. 5, Mask R-CNN includes a fully connected neural network for classification and bounding-box prediction, as well as a fully convolutional network for extracting instance segmentation masks.

3.2.1. Backbone networks

In the Mask R-CNN model, the backbone network is a CNN used for feature extraction: low-level attributes are retrieved from the initial layers and higher-level features are extracted at subsequent levels. As images pass through the backbone network, they are processed into feature maps. The backbone is usually pre-trained on ImageNet (Russakovsky et al., 2015; Krizhevsky et al., 2017) and therefore serves as a feature extractor that yields a feature-map representation of the input. In this paper, we used three different backbone models, VGG-16, VGG-19 (Simonyan & Zisserman, 2014), and Inception-ResNetV2 (Szegedy et al., 2017), to extract different feature maps from the input images. The details of the pre-trained models are given in the following subsections.

A) VGG-16:

VGG-16 is a network proposed by Simonyan and Zisserman (2014). It comprises three fully connected layers and 13 convolutional layers with Rectified Linear Unit (ReLU) activation. The network uses very small convolution filters, such as 3 × 3 and 2 × 2, and has 138 million parameters. VGG-16 attains a 92.7% top-five test accuracy on ImageNet and took first and second places in the 2014 ILSVRC challenge on the object localization and image classification tasks, with error rates of 25.32% and 7.32%, respectively. As a result, VGG-16 is one of the most widely used architectures for object detection.

B) VGG-19:

Simonyan and Zisserman (2014) also built a deeper version of VGG-16, known as VGG-19: a 19-layer network composed of three fully connected layers and sixteen convolutional layers with ReLU activation. The structure of VGG-19 is as follows (Simonyan & Zisserman, 2014):

• The first and second convolution hidden layers use 64 feature kernel filters.
• The third and fourth convolution hidden layers use 128 feature kernel filters; the output decreases by a factor of four, from the 224×224×3 input to a 56×56×128 output.
• The fifth to eighth convolution hidden layers use 256 feature kernel filters.
• The ninth to sixteenth convolution hidden layers use 512 feature kernel filters.
• The seventeenth and eighteenth layers are fully connected hidden layers with 4096 nodes and ReLU.
• The nineteenth (final) layer has 1000 nodes with softmax.

We used the pre-trained VGG-16 and VGG-19 models, previously trained on massive benchmark datasets such as ImageNet (Russakovsky et al., 2015), for image categorization; their weights serve as our backbone via transfer learning, reducing the training time of our models.

C) Inception-ResNet-V2:

The Inception-ResNet-V2 model (Szegedy et al., 2017) is an adaptation of the ResNet-V2 model, a convolutional neural network that classifies images into 1000 object categories. It builds on the Inception family of architectures, is 164 layers deep, and is trained on over a million images from the ImageNet database (Russakovsky et al., 2015), which yields a lower error rate. The network is built from "Inception cells" in which convolutions at several scales are applied and their results accumulated. The Inception cell captures cross-channel correlations while disregarding spatial dimensions using 1 × 1 convolutions, complemented by cross-channel and cross-spatial correlations using 5 × 5 and 3 × 3 filters; the outputs then pass through dimensionality reduction back to 1 × 1 convolutions.

3.2.2. RPN (Region Proposal Network)

The RPN is a CNN that takes feature maps from the backbone network as inputs and predicts whether an anchor covers foreground or background. The RPN takes the feature maps of varied sizes generated by the Feature Pyramid Network (FPN) and extracts RoI features from the feature-pyramid level matching the target object's size. As a result, the network's structure adapts without increasing the amount of processing, dramatically improving the identification of smaller objects while maintaining speed and accuracy. A sliding window traversing the maps creates a collection of anchors with differing aspect ratios and scales, used to forecast whether an object is in the foreground or background. Because the anchors overlap, we employ the Non-Maximum Suppression (NMS) approach with a 0.7 Intersection over Union (IoU) criterion to decrease redundancy.
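A minimal NumPy sketch of this NMS step with the 0.7 IoU criterion, assuming anchor boxes in (x1, y1, x2, y2) form (a greedy reference implementation, not the library routine actually used):

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, each as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold=0.7):
    """Keep the highest-scoring anchors, dropping overlaps above the threshold."""
    order = np.argsort(scores)[::-1]   # indices, best score first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Retain only anchors that do not overlap the kept one too strongly.
        order = rest[iou(boxes[best], boxes[rest]) < iou_threshold]
    return keep
```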
3.2.3. Region of Interest alignment (RoIAlign)

The RoIs may have varied dimensions because of the RPN bounding-box refinement process. For RoIAlign to establish a correct mask in Mask R-CNN, the RoI features must be brought to a uniform size. Faster R-CNN uses the RoIPool technique, which discretizes the feature map and introduces misalignments between the RoIs and the extracted features; He et al. (2017) instead introduced the RoIAlign technique, which uses bilinear interpolation (Wang & Yang, 2008) to compute the correct feature values so they can be pooled. Mask R-CNN thus enhances Faster R-CNN by replacing the RoI-Pooling layer with the region-of-interest alignment layer (RoIAlign); spatial information is preserved on the feature map by applying bilinear interpolation. RoIAlign differs from RoIPool in that it bypasses the quantization phase and never quantizes the RoI boundary. Rather, it calculates the exact position of the sample points in every unit by bilinear interpolation, retaining the decimal values, and generates the final fixed-size RoI using average pooling or maximum pooling. In Fig. 4, the blue solid line shows the 5 × 5 post-convolution feature map, and the red line shows the smaller feature block that corresponds to the RoI on the feature map, as shown in the diagram.
The small block is first divided into two smaller ones with a less quantized boundary, and then further divided into four small blocks, this time with no quantized boundary. The center point of each block is treated as four coordinate points, shown by the blue dots. After that, the values at all positions are calculated using bilinear interpolation, followed by an average pooling or maximum pooling operation (Zhang et al., 2020).
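The sampling just described can be sketched as follows, assuming a single-channel feature map; the four fixed sample offsets per bin mirror the four points marked by the blue dots (an illustrative sketch, not the paper's implementation):

```python
import numpy as np

def bilinear_sample(feature_map, y, x):
    """Bilinearly interpolate a 2-D feature map at a fractional (y, x)."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1 = min(y0 + 1, feature_map.shape[0] - 1)
    x1 = min(x0 + 1, feature_map.shape[1] - 1)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * feature_map[y0, x0] + dx * feature_map[y0, x1]
    bottom = (1 - dx) * feature_map[y1, x0] + dx * feature_map[y1, x1]
    return (1 - dy) * top + dy * bottom

def roi_align_bin(feature_map, y_center, x_center):
    """Average-pool four regularly spaced sample points inside one RoI bin."""
    offsets = [(-0.25, -0.25), (-0.25, 0.25), (0.25, -0.25), (0.25, 0.25)]
    samples = [bilinear_sample(feature_map, y_center + dy, x_center + dx)
               for dy, dx in offsets]
    return np.mean(samples)  # use max(samples) for maximum pooling
```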
3.2.4. Mask R-CNN and loss function improvement

He et al. (2017) extend Faster R-CNN by adding a branch to each region of interest (RoI) to predict segmentation masks; the result is called Mask R-CNN. A small fully convolutional network (FCN) is added to each RoI as a mask branch, predicting a pixel-by-pixel segmentation mask. Mask R-CNN thereby extends Faster R-CNN to pixel-level image segmentation, as described in He et al. (2017). The concept is to separate the classification prediction from the pixel-level masking function: in addition to the existing classification and localization branches of the Faster R-CNN architecture, a third branch is introduced to predict an object mask. The mask branch, shown in the last stages of Fig. 5, is a thin FCN applied to each RoI that predicts a pixel-by-pixel segmentation mask. The FCN generates masks (binary masks in our case) within objects' bounding boxes by classifying the region pixel by pixel (one unique object of interest per mask). Mask R-CNN is a simple, uncomplicated addition to Faster R-CNN and operates at 5 frames per second. Because pixel-level segmentation requires much finer-grained alignment than bounding boxes, Mask R-CNN enhances the RoI pooling layer (the "RoIAlign layer") to map RoIs to the original image areas more easily and accurately. It leverages an architecture similar to Faster R-CNN for object detection, but uses RoI alignment instead of RoI pooling, so that pixel-level correspondence within RoIs is maintained and data loss is prevented.

The RPN, which searches all FPN levels from top to bottom and suggests regions that may contain objects, is also employed. It uses anchors, a set of boxes with predefined positions and sizes relative to the input; individual anchors are assigned ground-truth classes and bounding boxes. The RPN generates two outputs for each anchor: the anchor class and the bounding-box parameters. The anchor class may be either foreground or background. Faster R-CNN's RoIPool module differs from Mask R-CNN's: according to the Mask R-CNN developers, RoIPool's feature-map regions were somewhat misaligned with the original picture regions, which contributes to mistakes, since pixel-level image segmentation demands accuracy. RoIAlign tackles this problem: the feature map is sampled at various positions, and bilinear interpolation is then applied.

Instance segmentation is a complex problem that combines two independent image tasks, object detection and semantic segmentation (Lin et al., 2014). Faster R-CNN and Mask R-CNN employ
an extra branch for mask prediction, parallel to the existing two branches, as illustrated in the last stages of Fig. 5. Mask R-CNN's multi-task loss includes classification, segmentation-mask, and bounding-box regression losses. The category prediction is tied to the classification branch, and the classification loss is computed against the ground-truth class. The Mask R-CNN loss function is defined as a multi-task loss (He et al., 2017) on each sampled RoI:

$L = L_{cls} + L_{box} + L_{mask}$ (1)

where $L$ stands for the total training loss, $L_{cls}$ for the classification loss, $L_{box}$ for the bounding-box loss, and $L_{mask}$ for the mask loss. We can expand the first two terms of Eq. (1) as follows:

$L_{cls} + L_{box} = L_{cls}(p, u) + \lambda \, [u \geq 1] \, L_{loc}(t^u, v)$ (2)

where $u$ is the ground-truth class label of each training RoI and $v$ is its ground-truth bounding-box regression target; $t^u = (t_x^u, t_y^u, t_w^u, t_h^u)$ specifies a scale-invariant translation and log-space height/width shift relative to class $u$; $p = (p_0, p_1, \ldots, p_N)$ represents the probability distribution over $N + 1$ categories; and $[u \geq 1]$ denotes the Iverson bracket indicator function, which evaluates to 1 when $u \geq 1$ and to 0 otherwise. For bounding-box regression, the loss is defined as

$L_{loc}(t^u, v) = \sum_{i \in \{x, y, w, h\}} \mathrm{smooth}_{L1}(t_i^u - v_i)$ (3)

in which

$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5 x^2 & \text{if } |x| < 1 \\ |x| - 0.5 & \text{otherwise} \end{cases}$ (4)

Eq. (4) eliminates the sensitivity to unbounded regression targets. The hyper-parameter $\lambda$ in Eq. (2) controls the balance between the two task losses; $\lambda = 1$ is used in most experiments. $L_{mask}$ is calculated as the average binary cross-entropy over all pixels of the RoI:

$L_{mask} = -\frac{1}{m^2} \sum_{1 \leq i, j \leq m} \left[ y_{ij} \log(a_{ij}) + (1 - y_{ij}) \log(1 - a_{ij}) \right]$ (5)

$y_{ij} = \frac{1}{1 + e^{-x_i}}$ (6)

$a_{ij} = \frac{1}{1 + e^{-b_i}}$ (7)

where $y_{ij}$ is the label of cell $(i, j)$ in the true mask for the region of size $m \times m$, and $a_{ij}$ is the predicted value of the same cell in the mask learned for the ground-truth class; $x_i$ and $b_i$ are the predicted and true values of the $i$-th component in the positive RoI, respectively (Girshick, 2015). Summarizing the core points: the emphasis is on optimizing location information for large targets and category prediction for small targets; different weights must be incorporated into the loss function to improve the detection accuracy of the branches across target scales.
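A NumPy sketch of Eqs. (1)-(5); the clipping constant is an implementation convenience, and the classification term is taken as given (an illustrative reference, not the training code used in the experiments):

```python
import numpy as np

def smooth_l1(x):
    """Eq. (4): quadratic near zero, linear for |x| >= 1."""
    x = np.abs(x)
    return np.where(x < 1, 0.5 * x ** 2, x - 0.5)

def box_loss(t_u, v):
    """Eq. (3): smooth-L1 summed over the (x, y, w, h) offsets."""
    return np.sum(smooth_l1(np.asarray(t_u) - np.asarray(v)))

def mask_loss(y_true, y_pred, eps=1e-7):
    """Eq. (5): average binary cross-entropy over an m x m RoI mask."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred)
                    + (1 - y_true) * np.log(1 - y_pred))

def total_loss(cls_loss, t_u, v, y_true, y_pred, u, lam=1.0):
    """Eqs. (1)-(2): multi-task loss; box term only for foreground (u >= 1)."""
    return cls_loss + lam * (u >= 1) * box_loss(t_u, v) \
        + mask_loss(y_true, y_pred)
```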
3.2.5. Model architecture layer-by-layer

The suggested model's architecture is made up of three major elements. The first and second modules determine whether the provided image contains a car and whether the car is damaged. This is done with pre-trained convolutional neural networks (CNNs): VGG16, VGG19, and Inception-ResNetV2. The first layers of VGG16, VGG19, and Inception-ResNetV2 extract visual features, producing refined and downscaled feature maps known as activation maps. These features are fed to fully connected layers (also known as dense layers) followed by a sigmoid activation to determine whether or not the car is damaged. Once damage is detected, we pass the image to the Mask R-CNN to localize the damaged region in the form of both a bounding box and a segmentation mask. The working of Mask R-CNN differs from that of a traditional image classification model. As Fig. 5 shows, Mask R-CNN first extracts features into activation maps, and the model is then partitioned into two sub-modules: the region proposal network and RoI alignment. The RPN (Region Proposal Network) is executed once per image to provide a set of region proposals, which are feature-map regions that contain the object; in our case, these objects are damaged areas of cars. The classifier predicts bounding boxes as well as an object class for every region proposed in stage 1. Each proposed region can be of varying size, but the fully connected layers in the network require fixed-size vectors to generate forecasts, so the size of these proposed regions is fixed by either the RoIPool or the RoIAlign technique. The RoIAlign layer's result is then passed into the mask head, which has two convolution layers and creates a mask for every RoI, segmenting a damaged-car image pixel by pixel. This module localizes the given car image's damage as front, side, or rear. Subsequently, in the last step, a CNN model based on pre-trained VGG16, VGG19, and Inception-ResNetV2 is trained to classify the severity of the damage as mild, moderate, or severe. Hence, as a practical application of deep learning, all of these modules are linked to provide a completely automated automotive damage estimate. A simple pseudo-code of the proposed framework is given below.
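The following pseudo-code sketch summarizes the cascade; car_classifier, damage_classifier, mask_rcnn, and severity_classifier are placeholders standing in for the trained models described above, not the authors' actual object names:

```python
def estimate_claim(image):
    """Sketch of the full cascade; the model objects are illustrative stand-ins."""
    # Gate 1: is there a car in the image at all?
    if not car_classifier.predict(image):
        return "no car detected"
    # Gate 2 / model 1: is the car damaged (Dataset 1)?
    if not damage_classifier.predict(image):
        return "no damage detected"
    # Model 2 + Mask R-CNN: localize damage with box and mask (Dataset 2)
    boxes, masks, location = mask_rcnn.detect(image)   # front / side / rear
    # Model 3: classify severity of the damaged region (Dataset 3)
    severity = severity_classifier.predict(image)      # minor / moderate / severe
    return location, severity, boxes, masks
```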
4. Results and discussions

In this part, we examine our findings and provide appropriate explanations and analysis. The assessment measures used to evaluate the proposed work are also described in this section. In addition, all parameters of the model are fine-tuned to their best values.

4.1. Evaluation metrics

Table 1 illustrates the parameters of actual and predicted classifications; the most common object detection and classification metrics for evaluating model performance are the following.

Table 1
Parameters of actual and prediction classifications.

Total Population (P + N)    Positive Prediction (PP)    Negative Prediction (NP)
Actual Positive (P)         TP (True Positive)          FN (False Negative)
Actual Negative (N)         FP (False Positive)         TN (True Negative)

A) Intersection over Union (IoU): the ratio of the intersection area between the predicted segmentation map A and the ground-truth map B to the union of both A and B; its range is [0, 1].

$IoU = \frac{A \cap B}{A \cup B}$ (8)

The mean IoU is defined as the average IoU over all classes.

B) Precision, Recall, F1 score, and accuracy: these can be specified for each class as well as at the aggregate level, as follows:

$P = \frac{TP}{TP + FP}$ (9)

In the formula, FP is the number of negative samples detected as positive (Zhao et al., 2019). The sensitivity, hit rate, or recall rate is computed using Eq. (10):

$R = \frac{TP}{TP + FN}$ (10)
In the above equation, TP is the number of positive samples correctly detected, and FN is the number of positive samples detected as negative (Zhao et al., 2019). The accuracy is computed using Eq. (11):

$A = \frac{TP + TN}{TP + FP + TN + FN}$ (11)

Similarly, the F1-score is given by:

$F1 = 2 \times \frac{P \times R}{P + R}$ (12)

The F1-score is the harmonic mean of precision and recall, i.e., a combined measure of both, as shown in Eq. (12).
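For concreteness, Eqs. (8)-(12) can be computed directly from the Table 1 counts, as in the following sketch (the example counts at the bottom are illustrative, not experimental values):

```python
import numpy as np

def detection_metrics(tp, fp, tn, fn):
    """Precision, recall, F1, and accuracy per Eqs. (9)-(12)."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)                       # sensitivity / hit rate
    f1 = 2 * p * r / (p + r)
    acc = (tp + tn) / (tp + fp + tn + fn)
    return p, r, f1, acc

def mask_iou(mask_a, mask_b):
    """Eq. (8) for two binary masks given as NumPy boolean arrays."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union

# Example: 89 true positives, 11 false positives, 85 true negatives,
# and 15 false negatives (illustrative numbers only).
print(detection_metrics(89, 11, 85, 15))
```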
4.2. Performance evaluation and analysis of experimental results

In this section, we evaluate the proposed deep learning technique's capacity to detect, identify, categorize, and portray vehicle damage. We use three distinct pre-trained backbone models, Inception-ResNetV2, VGG-16, and VGG-19, feed identical training and testing data to each, and train for 100 epochs.

4.2.1. Quantitative performance of the proposed deep learning model

Table 2 shows the differences in damage identification, location, and severity categorization among the three pre-trained models.

Table 2
Performance of damage detection, location, and severity classification.

                     Damage Detection                    Damage Location                     Damage Severity
Model                Prec. (%)  Recall  F1    Acc. (%)   Prec. (%)  Recall  F1    Acc. (%)   Prec. (%)  Recall  F1    Acc. (%)
VGG-16               83.5       0.97    0.89  90.4       74.33      0.76    0.74  83         65.66      0.67    0.66  77
VGG-19               86.09      0.95    0.90  91.09      70.66      0.74    0.71  82         66.33      0.67    0.66  78
Inception-ResNetV2   89.13      0.98    0.91  92         80.4       0.80    0.78  85         67.12      0.70    0.70  80

The proposed damage claim estimator model classifies, detects, and visualizes damaged vehicles. We used four distinct measures to assess the performance of the different transfer-learning models: precision, recall (sensitivity), F1-score (harmonic mean), and accuracy; the higher these criteria, the better the model. As Table 2 shows, the pre-trained Inception-ResNetV2 outperformed both VGG-16 and VGG-19 in all categories of detection, localization, and severity-damage performance. The Inception-ResNetV2 accuracies are 92% for detection, 85% for localization, and 80% for severity. Damage localization is more efficient with the Inception-ResNetV2 pre-trained model, which has a precision of 80.4% compared with 74.33% and 70.66% for VGG-16 and VGG-19, respectively. Furthermore, the VGG-16 and VGG-19 models perform poorly in damage severity classification accuracy, with just 77% and 78% compared with 80% for the Inception-ResNetV2 model. In all tasks, the Inception-ResNetV2 results beat the other two models.
Fig. 7. Evolution of Loss Function for (i) VGG16, (ii) VGG19, and (iii) Inception-ResNetV2.
4.2.2. Confusion matrices

We also calculated the confusion matrices for the three different pre-trained models, (a) VGG-16, (b) VGG-19, and (c) Inception-ResNetV2, as shown in Fig. 6. The performance evaluation of the various transfer-learning models in this paper uses three metrics: precision, recall, and F1-score; the model with the higher metrics is the best one. The confusion matrices summarize the predicted versus actual values for each class.
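A small sketch of how such a confusion matrix can be produced with scikit-learn; the label vectors here are synthetic stand-ins, not the paper's test data:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels for the 3-class location task (0=front, 1=side, 2=rear).
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 0, 1, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0, 2, 0, 1, 2])

cm = confusion_matrix(y_true, y_pred)   # rows: actual, columns: predicted
print(cm)

# Per-class precision and recall read directly off the matrix.
precision = cm.diagonal() / cm.sum(axis=0)
recall = cm.diagonal() / cm.sum(axis=1)
```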
Table 3
Comparison with existing research studies (columns: No., existing works, model used, detection performance, localization performance, severity performance).
Fig. 9. Car Damage Assessment Website (a) Front Page (b) Final Page Result.
5. Web-based automatic claim estimator

For the web-based claim estimator, we employed our upgraded Mask R-CNN with the pre-trained Inception ResNetV2 backbone, which gave the best results. Fig. 9(a) and (b) illustrate the web-based automatic estimator's front page and final result page. In addition, Fig. 10 shows the working results of the automatic claim estimator that we designed and implemented. The model must pass several tests: the first verifies that the image truly shows a vehicle, and the second verifies that the vehicle is damaged. These preliminary inspections must be completed before the analysis begins. After all the gate checks pass, the damage check begins: the model estimates the damage's location, such as front, side, or rear, as well as its severity, such as mild, moderate, or severe.

The system architecture of the automatic claim estimator comprises the following steps:

• Input: The user first submits an image of the damaged car.
• Gate 1: Determines whether or not the uploaded image contains a car.
• Gate 2: Checks that the automobile in the provided image is actually damaged, to screen out fraudulent claims.
• Damage Location: The image is run through a pre-trained model to determine whether the damage is on the front, rear, or side.
• Damage Severity: The image is run through pre-trained models to determine whether the damage is mild, moderate, or severe.
• Results: The results are sent to the user and a third party. Fig. 10 depicts the various testing results of our automatic claim estimator.

The proposed model is developed using TensorFlow and Keras, a deep learning library; the NumPy library for scientific numerical calculations; and Scikit-learn, which provides machine learning algorithm tools. Furthermore, the PyCharm IDE and Jupyter Notebooks are used as development platforms for the Python code and web application.
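As a rough sketch of how such a web front end could hand an uploaded photograph to the model cascade (the Flask route, field name, and the estimate_claim() helper from the earlier pseudo-code are illustrative assumptions, not the paper's published code):

```python
import io
import numpy as np
from flask import Flask, request, jsonify
from PIL import Image

app = Flask(__name__)

@app.route("/estimate", methods=["POST"])
def estimate():
    # Read the uploaded photo from the form field "photo".
    file = request.files["photo"]
    image = Image.open(io.BytesIO(file.read())).convert("RGB")
    image = np.asarray(image.resize((224, 224))) / 255.0
    # Run the gate checks and models 1-3 (see the pseudo-code sketch above).
    result = estimate_claim(image)
    return jsonify({"result": str(result)})

if __name__ == "__main__":
    app.run()
```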
Appendix
Acronym  Meaning
AV  Autonomous Vehicle
BB  Bounding Box
BN  Batch Normalization
CNN  Convolutional Neural Network
DCN  Deformable Convolutional Network
DL  Deep Learning
DSOD  Deeply Supervised Object Detector
DSSD  Deconvolutional Single Shot Detector
FC  Fully Connected
FPN  Feature Pyramid Network
GAN  Generative Adversarial Network
GPU  Graphics Processing Unit
ILSVRC  ImageNet Large Scale Visual Recognition Challenge
KNN  k-Nearest Neighbors algorithm
ML-FPN  Multi-Level Feature Pyramid Network
MR-CNN  Multi-Region CNN
MS COCO  Microsoft Common Objects in Context
MSE  Mean Squared Error
NAS  Neural Architecture Search
Pascal VOC  PASCAL Visual Object Classes
R-CNN  Region-Based Convolutional Neural Network
ReLU  Rectified Linear Unit
RGB-D  Red, Green, Blue-Depth
RoI  Region of Interest
RPN  Region Proposal Network
SGD  Stochastic Gradient Descent
SOD  Salient Object Detection
SPP layer  Spatial Pyramid Pooling layer
SSD  Single Shot Detector
SVD  Singular Value Decomposition
SVM  Support Vector Machine
VGG  Visual Geometry Group
VHR  Very High Resolution
YOLO  You Only Look Once
References

Abdulla, W. (2017). Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow.
Ahmad, A. B., Saibi, H., Belkacem, A. N., & Tsuji, T. (2022). Vehicle auto-classification using machine learning algorithms based on seismic fingerprinting. Computers, 11, 148.
Al Mamun, M. A., & Deb, K. (2019). An approach for recognizing vehicle based on appearance. In 2019 International Conference on Computer, Communication, Chemical, Materials and Electronic Engineering (IC4ME2) (pp. 1–4).
Alfarrarjeh, A., Trivedi, D., Kim, S. H., & Shahabi, C. (2018). A deep learning approach for road damage detection from smartphone images. In 2018 IEEE International Conference on Big Data (Big Data) (pp. 5201–5204).
An in-Vehicle System and Method for During Accident Detection without being Fixed to Vehicle. (2022). Available online: https://www.researchgate.net/publication/342233982_An_in-Vehicle_System_and_Method_for_During_Accident_Detection_without_being_Fixed_to_Vehicle.
Attari, N., Ofli, F., Awad, M., Lucas, J., & Chawla, S. (2017). Nazr-CNN: Fine-grained classification of UAV imagery for damage assessment. In 2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA) (pp. 50–59).
Automatic vehicle damage detection with images. (2022). Available online: https://repositorio-aberto.up.pt/bitstream/10216/107814/2/219929.pdf. Accessed Nov. 2022.
Bhatt, C., Kumar, I., Vijayakumar, V., Singh, K. U., & Kumar, A. (2021). The state of the art of deep learning models in medical science and their challenges. Multimedia Systems, 27, 599–613.
Choudhury, S., Chattopadhyay, S. P., & Hazra, T. K. (2017). Vehicle detection and counting using haar feature-based classifier. In 2017 8th Annual Industrial Automation and Electromechanical Engineering Conference (IEMECON) (pp. 106–109).
Chua, A. C., Mercado, C. R. B., Pin, J. P. R., Tan, A. K. T., Tinhay, J. B. L., & Dadios, E. P. (2021). Damage identification of selected car parts using image classification and deep learning. In 2021 IEEE 13th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment, and Management (HNICEM) (pp. 1–5).
Deep Learning-Based Car Damage Classification and Detection for Automotive Industry. (2022). Available online: https://kili-technology.com/data-labeling/computer-vision/image-annotation/deep-learning-based-car-damage-classification-and-detection-for-automotive-industry. Accessed Nov. 2022.
Dhieb, N., Ghazzai, H., Besbes, H., & Massoud, Y. (2019a). Extreme gradient boosting machine learning algorithm for safe auto insurance operations. In 2019 IEEE International Conference on Vehicular Electronics and Safety (ICVES) (pp. 1–5).
Dhieb, N., Ghazzai, H., Besbes, H., & Massoud, Y. (2019b). A very deep transfer learning model for vehicle damage detection and localization. In 2019 31st International Conference on Microelectronics (ICM) (pp. 158–161).
Digital Transformation in Car Insurance Industry: Streamline Recognition of Car Damage Assessment. (2022). Available online: https://www.altamira.ai/ai-powered-car-damage-assessment/. Accessed Nov. 2022.
Dwivedi, M., Malik, H. S., Omkar, S., Monis, E. B., Khanna, B., & Samal, S. R. (2021). Deep learning-based car damage classification and detection. In Advances in Artificial Intelligence and Data Engineering (pp. 207–221). Springer.
Girshick, R. (2015). Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (pp. 1440–1448).
Girshick, R., & Fast, R. (2015). In IEEE International Conference on Computer Vision (pp. 7–13).
Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 580–587).
Gomes, C., Jin, Z., & Yang, H. (2021). Insurance fraud detection with unsupervised deep learning. Journal of Risk and Insurance, 88, 591–624.
Gong, J., Zhao, J., Li, F., & Zhang, H. (2020). Vehicle detection in thermal images with an improved yolov3-tiny. In 2020 IEEE International Conference on Power, Intelligent Computing and Systems (ICPICS) (pp. 253–256).
Harshani, W. R., & Vidanage, K. (2017). Image processing based severity and cost prediction of damages in the vehicle body: A computational intelligence approach. In 2017 National Information Technology Conference (NITC) (pp. 18–21).
Haselhoff, A., & Kummert, A. (2009). A vehicle detection system based on haar and triangle features. In 2009 IEEE Intelligent Vehicles Symposium (pp. 261–266).
He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2961–2969).
Imaam, F., Subasinghe, A., Kasthuriarachchi, H., Fernando, S., Haddela, P., & Pemadasa, N. (2021). Moderate automobile accident claim process automation using machine learning. In 2021 International Conference on Computer Communication and Informatics (ICCCI) (pp. 1–6).
Károly, A. I., Galambos, P., Kuti, J., & Rudas, I. J. (2020). Deep learning in robotics: Survey on model structures and training strategies. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 51, 266–279.
Karande, K. P. M. K. S. (2022). Deep learning based car damage classification. TCS Innovation Labs, Pune, India.
Kaushik, S., Raman, A., & Rao, K. R. (2020). Leveraging computer vision for emergency vehicle detection: Implementation and analysis. In 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT) (pp. 1–6).
Kim, B., & Lee, J. (2019). A video-based fire detection using deep learning models. Applied Sciences, 9, 2862.
Kim, J.-M., Yum, S.-G., Park, H., & Bae, J. (2022). Strategic framework for natural disaster risk mitigation using deep learning and cost-benefit analysis. Natural Hazards and Earth System Sciences, 22, 2131–2144.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2017). ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60, 84–90.
Kyu, P. M., & Woraratpanya, K. (2020). Car damage detection and classification. In Proceedings of the 11th International Conference on Advances in Information Technology (pp. 1–6).
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., & Ramanan, D. (2014). Microsoft COCO: Common objects in context. In European Conference on Computer Vision (pp. 740–755).
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., & Fu, C.-Y. (2016). SSD: Single shot multibox detector. In European Conference on Computer Vision (pp. 21–37).
Madheswari, A. N., Haripriya, J., Kiruthika, G., & Meyammai, R. (2022). Exterior vehicular damage detection using deep learning.
Mittal, U., Potnuru, R., & Chawla, P. (2020). Vehicle detection and classification using improved faster region based convolution neural network. In 2020 8th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO) (pp. 511–514).
Neelam Jaikishore, C., Podaturpet Arunkumar, G., Jagannathan Srinath, A., Vamsi, H., Srinivasan, K., & Ramesh, R. K. (2022). Implementation of deep learning algorithm on a custom dataset for advanced driver assistance systems applications. Applied Sciences, 12, 8927.
Patil, K., Kulkarni, M., Sriraman, A., & Karande, S. (2017). Deep learning based car damage classification. In 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA) (pp. 50–54).
Putra, F. A. I. A., Utaminingrum, F., & Mahmudy, W. F. (2020). HOG feature extraction and KNN classification for detecting vehicle in the highway. IJCCS (Indonesian Journal of Computing and Cybernetics Systems), 14, 231–242.
Redmon, J., & Farhadi, A. (2018). YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767.
Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2015). You only look once: Unified, real-time object detection. arXiv preprint arXiv:1506.02640.
Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, 28.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., & Ma, S. (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115, 211–252.
Sharma, A., Verma, A., & Gupta, D. (2019). Preventing car damage using CNN and computer vision. International Journal of Innovative Technology and Exploring Engineering (IJITEE), 9, 1–5.
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
Singh, R., Ayyar, M. P., Pavan, T. V. S., Gosain, S., & Shah, R. R. (2019). Automating car insurance claims using deep learning techniques. In 2019 IEEE Fifth International Conference on Multimedia Big Data (BigMM) (pp. 199–207).
Son, C., Park, S., Lee, J., & Paik, J. (2019). Context aware vehicle detection using correlation filter. In 2019 IEEE International Conference on Consumer Electronics (ICCE) (pp. 1–2).
Sudha, D., & Priyadarshini, J. (2020). An intelligent multiple vehicle detection and tracking using modified vibe algorithm and deep learning algorithm. Soft Computing, 24, 17417–17429.
Sünderhauf, N., Brock, O., Scheirer, W., Hadsell, R., Fox, D., & Leitner, J. (2018). The limits and potentials of deep learning for robotics. The International Journal of Robotics Research, 37, 405–420.
Supraja, K., & Saritha, S. (2017). Robust fuzzy rule based technique to detect frauds in vehicle insurance. In 2017 International Conference on Energy, Communication, Data Analytics and Soft Computing (ICECDS) (pp. 3734–3739).
System and Method for During Crash Accident Detection and Notification. (2022). Available online: https://www.researchgate.net/publication/342233879_System_and_Method_for_During_Crash_Accident_Detection_and_Notification.
Szegedy, C., Ioffe, S., Vanhoucke, V., & Alemi, A. A. (2017). Inception-v4, Inception-ResNet and the impact of residual connections on learning. In Thirty-First AAAI Conference on Artificial Intelligence.
van Ruitenbeek, R., & Bhulai, S. (2022). Convolutional neural networks for vehicle damage detection. Machine Learning with Applications, Article 100332.
Wang, H., Lou, X., Cai, Y., Li, Y., & Chen, L. (2019). Real-time vehicle detection algorithm based on vision and lidar point cloud fusion. Journal of Sensors, 2019.
Wang, S., & Yang, K. (2008). An image scaling algorithm based on bilinear interpolation with VC++. Techniques of Automation and Applications, 27, 44–45.
Wang, Y., & Xu, W. (2018). Leveraging deep learning with LDA-based text analytics to detect automobile insurance fraud. Decision Support Systems, 105, 87–95.
Waqas, U., Akram, N., Kim, S., Lee, D., & Jeon, J. (2020). Vehicle damage classification and fraudulent image detection including moiré effect using deep learning. In 2020 IEEE Canadian Conference on Electrical and Computer Engineering (CCECE) (pp. 1–5).
Wassel, M. (2018). Property Casualty: Deterring claims leakage in the digital age. Cognizant Insurance Practice, Tech. Rep.
Why Estimating Car Damage with Machine Learning Is Hard. (2022). Available online: https://www.altoros.com/blog/automating-car-damage-estimation-for-insurance-with-machine-learning/. Accessed Nov. 2022.
Yuan, C., Huo, C., Tong, Z., Men, G., & Wang, Y. (2019). Research on vehicle detection algorithm of driver assistance system based on vision. In 2019 Chinese Control and Decision Conference (CCDC) (pp. 1024–1027).
Zhang, Q., Chang, X., & Bian, S. B. (2020). Vehicle-damage-detection segmentation algorithm based on improved mask RCNN. IEEE Access, 8, 6997–7004.
Zhao, K., Ren, X., Kong, Z., & Liu, M. (2019). Object detection on remote sensing images using deep learning: An improved single shot multibox detector method. Journal of Electronic Imaging, 28, Article 033026.