0% found this document useful (0 votes)
54 views14 pages

Automatic Damaged Vehicle Estimator Using Enhanced Deep Learning Algorithm

Uploaded by

Second Space
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
54 views14 pages

Automatic Damaged Vehicle Estimator Using Enhanced Deep Learning Algorithm

Uploaded by

Second Space
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 14

Intelligent Systems with Applications 18 (2023) 200192

Contents lists available at ScienceDirect

Intelligent Systems with Applications


journal homepage: www.journals.elsevier.com/intelligent-systems-with-applications

Automatic damaged vehicle estimator using enhanced deep


learning algorithm
Jihad Qaddour *, Syeda Ayesha Siddiqa
School of Information Technology, Illinois state University, Normal, IL, USA

A R T I C L E I N F O A B S T R A C T

Keywords: Claim leakage costs insurance companies millions of dollars each year because of the disparity between the cost
Vehicle damage assessment spent by allowance businesses and the accurate quantity that must be reimbursed. As a result, processing claims
Deep learning for identifying and classifying automobile damages takes time and is costly for insurance providers. In this paper,
Instance segmentation, Mask R-CNN, Object
we used an improved Mask R-CNN method, which has a significant research benefit of object detection, to
detection, Transfer learning
automatically detect, identify, and categorize car damage sites in traffic incidents. To detect and label an image
of a damaged vehicle, we used a combination of deep learning, transfer learning, Mask R-CNN, and instance
segmentation. In addition, a web-based automatic claim estimator can accept photographs from the user and
determine the position and degree of the damage automatically. Furthermore, three different pre-trained models,
namely inception ResNetV2, VGG-16, and VGG-19, were used to aid quick convergence. Finally, comparative
performance assessments employ several evaluation measures such as precision, recall, F1 score, accuracy, loss
function, and confusion matrices based on the three pre-trained models. The empirical results reveal that the
proposed method not only recognizes damaged vehicles but also locates them and determines their severity level
which accomplishes the study’s objective of automatically locating and classifying car damage. According to the
data, employing Mask-RCNN with pre-trained Inception ResNetV2 outperforms the other models in all detection,
localization, and severity-damaged performance categories.

1. Introduction claim process to provide visual examination and validation (Imaam


et al., 2021).
Processing claims for identifying and classifying automotive dam­ As indicated in Supraja and Saritha (2017), AI has been successful in
ages is time-consuming as well as costly for insurance providers. proving the efficacy level to detect fraud for claims of alleged collusion.
Furthermore, poor decision-making, fraud, and processing cost insur­ On the other hand, a few researchers worked on building services related
ance firms millions of dollars. As a result, they are the first to invest in AI to automated visual recognition to give different solutions to insurance
(Artificial Intelligence), increased automation, and other technologies. companies to find and detect vehicle damages. The study presented in
(N. Dhieb et al., 2019). Moreover, the disparity between the cost spends Patil et al. (2017) used the method of deep learning to find damages to
by allowance companies and the accurate cost that must be paid is called cars (N. Dhieb et al., 2019). Convolutional Neural Networks (CNNs),
claims leakage (Wassel, 2018), which costs vehicle insurance companies transfer information conditioned on Visual Geometry Group-16
a lot of money and results in poor customer service. AI and DL (Deep (VGG16) representation, as well as complexity auto encoder condi­
Learning) algorithms are designed to assist in numerous domains, such tioned on pre-training framework developed by fine-tuning are three
as robotics (Sünderhauf et al., 2018; Károly et al., 2020), medical sci­ approaches to this problem. Despite this, the prior study is confined to
ence (Bhatt et al., 2021), and computer vision (Kim & Lee, 2019). Many identifying vehicle damage without providing additional specifics.
deep learning tools are also made to help respond to several insurance Furthermore, because it is particularly sensitive to overfitting, it is un­
sector difficulties (Singh et al., 2019), such as data analysis (Wang & Xu, able to assess the damage severity or precisely locate it (Simonyan &
2018), fraud discovery (Gomes et al., 2021), risk mitigation (Kim et al., Zisserman, 2014). Other fields have looked at damage detection and
2022) and automate the claim processes (Waqas et al., 2020). DL tech­ visualization. For example, the solution of a deep learning pipeline for
nologies also can be used to combat claims leakage and automate the delicate architecture photos categorization captured by UAVs

* Corresponding author.
E-mail addresses: [email protected] (J. Qaddour), [email protected] (S.A. Siddiqa).

https://doi.org/10.1016/j.iswa.2023.200192
Received 22 October 2022; Received in revised form 7 January 2023; Accepted 29 January 2023
Available online 8 March 2023
2667-3053/© 2023 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
J. Qaddour and S.A. Siddiqa Intelligent Systems with Applications 18 (2023) 200192

(Unmanned Aerial Vehicles) for desecration measurement was given in 2. Related work
Attari et al. (2017). An amalgamated deep learning pipeline was intro­
duced to detect structures followed by categorization of fine-grained Due to the extreme multiple issues connected with vehicle recogni­
damage for such structures. A road damage detection and street moni­ tion, segmentation, and tracking’s, such as detection speed, live move­
toring system were created in Alfarrarjeh et al. (2018), in which a ment, occlusion, and scaling have become popular research topics
pre-trained model called You Only Look Once (YOLO) was used to (Sudha & Priyadarshini, 2020). The researchers investigated a variety of
recognize several road damage categories as identifiable objects in im­ computer vision and image processing-based technologies to achieve
ages using a pre-trained model YOLO. However, the accuracy and pre­ vehicle recognition, segmentation, and damage detection. Some of these
cision are substantially worse, but it is a fast method. Furthermore, there works are discussed in the following subsections:
are a few challenges that are associated with the application of DL
models when it comes to image classification since the models must be 2.1. Vehicle detection using traditional image processing approaches
trained on a large number of images in order to obtain very high per­
formance and precision. To begin with, it is difficult to obtain a large Image processing often includes numerous transformation and
collection of publicly available damaged automobile images. Second, extraction features, as well as an analytical technique that is investi­
given a large amount of training data, DL will require a lot of processing gated for various applications such as vehicle detection and vehicle
resources to train the model. A model’s training procedure takes time as damage. One of the popular methods includes HOG (Histogram of Ori­
well. Third, a DL model such as CNN will have a large number of pa­ ented Gradient) that is pipelined with the algorithm of k-nearest
rameters and hyperparameters that must be tuned. neighbor (KNN) (Putra et al., 2020) to attain an accuracy level of 84%
In this paper, we use CNN, transfer learning, and approaches of with linear Support Vector Machine (SVM) to get a 2.57% error in color
enhanced Mask-R-CNN to detect, locate, and categorize severity levels of space of bars showing that compression in videos can result in several
car damage through the automatic detector. Unlike earlier research, our misleading guidelines. The research in Son et al. (2019) proposed a filter
method not only finds damage, yet automatically determines its ex­ of correlation to detect vehicles by utilization of Canny transform first
tremity and envisages it on the vehicle’s pictures and mostly specifies followed by a Hough transform to know extract patches and lanes, then
the severity of vehicle damage levels. We blended DL, transfer learning, the analysis of correlation to get an accuracy level of 96.4% (Yuan et al.,
Mask R-CNN, and instance segmentation to identify and categorize an 2019) which is used the improved version of classes square error to
image of a damaged vehicle. Moreover, we developed a web application know the position of the targeted vehicle. Following that, feature
for automatic detection and classification estimation using photographs analysis similar to HAAR with Adaboost Classifier was performed for
of the damaged vehicle. Moreover, to enable fast convergence, three detection verification and comparison to traditional approaches. How­
different pre-trained models namely inception ResNetV2 (Szegedy et al., ever, in their analysis, the loss graph revealed a gap between the testing
2017), VGG-16, and VGG-19 (Simonyan & Zisserman, 2014) were used. loss and the training loss, indicating that there is an overfitting problem.
Finally, employing the three pre-trained models, comparative perfor­ Furthermore, in the studies in Choudhury et al. (2017); Haselhoff and
mance assessments for precision, recall, F1 score, accuracy, loss func­ Kummert (2009); Al Mamun and Deb (2019), HAAR features were
tion, and confusion matrices are presented. Furthermore, we used studied, followed by additional feature extractors such as the Kalman
improved Mask R-CNN to automatically detect car damage areas in filter and triangle features using machine learning classifiers.
traffic accidents, which is an important research value and has a broad
range of application scenarios in the disciplines of object identification 2.2. Vehicle detection using deep learning approaches
and transportation. Therefore, we integrate transfer learning, Mask
R-CNN, and instance segmentation of approach for better feature The AlexNet (Krizhevsky et al., 2017) addition to the competition of
learning and damage identification to locate, identify and classify an ImageNet permitted the usage of CNN to detect objects. Following the
image of the damaged vehicle. Following are our point-by-point trend on the dataset of PASCAL VOC 2012, various tools like R-FCN,
contributions: R-CNN (Ren et al., 2015), and Fast R-CNN (Girshick et al., 2014; Gir­
shick & Fast, 2015) used a mixture of unique convolution layer config­
• Using images of damaged vehicles, we suggested a novel model uration with packed layers to find area proposals by utilization of
combining blended DL, transfer learning, Mask R-CNN, and instance selective research, but it proved to be sluggish. Thus, RPN (Region
segmentation to recognize, categorize, and assess the severity levels Proposal Network) was proposed to respond to the single-staged de­
of accidents. tector methods like several YOLO versions (Redmon et al., 2015) and
• Comparative analysis is also performed using precision, recall, F1 (Redmon & Farhadi, 2018). Furthermore, the major aim was to improve
score, accuracy, confusion matrices, and loss functions using three the speed of detection by the creation of bounding boxes and anchor
different pre-trained models, i.e., VGG-16, VGG-19, and Inception boxes in one step for the intended item, but they sacrificed precision
ResNetV2. during detection. Upgraded YOLOV3 in Gong et al. (2020) was used to
• The findings show that the pre-trained Inception ResNetV2 model evaluate thermal captures for the detection of vehicles, getting an mAP
beat the VGG-16 and VGG-19 models in all detection, localization, of 78.77% (Kaushik et al., 2020). Furthermore, segmentation methods
and severity damage performance categories. include Mask R-CNN and Fast-RCNN vehicle detections. The research in
• Finally, we developed the web-based automatic claim estimator Mittal et al. (2020) gave an upgraded R-CNN model that was fast to
using the best pre-trained model integrated with enhanced Mask R- detect vehicles and ignore the duplicate detection issues with an 85.7%
CNN. The claim estimator can accept photographs from the user in a F1-score for heavy vehicles and the percentage for light vehicles was
traffic site incident and determine the position and degree of the 75%. Usage of a pre-trained Single Shot MultiBox detector (SSD) model
damage automatically. (Liu et al., 2016) that was implemented using the Caffe Frameworks,
which achieved 81.2% mAP for a combination of visible and infrared
The rest of the paper structure is presented as follows; Section 2 fo­ pictures followed by custom capture. The research in Wang et al. (2019)
cuses on related research. Section 3 introduces the framework algorithm explains ways to detect vehicles by using the point cloud of the LiDAR
for automobile damage identification, which includes a broad range of sensor and the algorithm of YOLO V3 with an accuracy level of 70.58%.
concerns such as datasets and methodology. In Section 4, we present a Moreover, the (Wang et al., 2019) study showed an accuracy of 70.58%
performance evaluation and analysis of the experimental result. In for the detection of vehicles using a LiDAR sensor and the algorithm of
Section 5, the web-based automatic claim estimator is presented. Section YOLO V3 on the dataset of KITTI (Wang et al., 2019).
6 finally presents the conclusion. On the other hand, the research in Zhao et al. (2019) wanted to attain

2
J. Qaddour and S.A. Siddiqa Intelligent Systems with Applications 18 (2023) 200192

Fig. 1. A flowchart illustrating the establishment of pipelines for analyzing vehicle damage.

the detection of objects in images of nighttime without having annota­ 3.1.2. Objectives of the study
tion of ground truth for targeted captures so they considered the method We explain our target tasks, which are organized into four steps (Kyu
of unsupervised training using GANs (Generative Adversarial networks) & Woraratpanya, 2020), in this section. The four tasks and their out­
for image-to-image translation to transform annotated daylight pictures comes are presented as follows:
to the nighttime ones without changes in the car positions. Then, rather
than employing merely one of the aforesaid sets, a model of Faster • Task 1: Recognize the image, is it a car or not?
R-CNN is considered to train using an amalgamated data set of images • Task 2: Spot the damaged parts of the car, is it a damaged car or not?
that result in a high mAP of 88% result. As a consequence, when • Task 3: Categorize the damaged spot positions, is the damaged part
compared to the current literature, our suggested blend model using the from the front, back, or side of the car?
Inception ResNetV2 pre-trained model obtained a superior and auto­ • Task 4: classify the severity of the car’s damaged portion, is car
mated outcome of Precision of 89.13, hit rate of 0.98, the sensitivity of damage’s part mild, moderate, or severe?
0.91, and accuracy of 92%. Other than computer vision approaches,
there also exists some sensor-based methods such as in 3.1.3. Systemic flow of the system
An_in-Vehicle_System_and_Method_for_Dur­ The exchange learning test configuration is shown in Fig. 1 is what
ing_Accident_Detection_without_being_Fixed_to_Vehicle (2022); Sys­ we employ. As a highlight extractor, the pre-prepared model is used in
tem_and_Method_for_During_Cra­ Patil et al. (2017). Our framework consists of 4 phases and 3 models,
sh_Accident_Detection_and_Notification (2022). Furthermore, in order each of which is based on one of three different datasets:
to further enhance the automobile business, various new automated
ways for detecting car damages are being developed (Patil et al., 2017; • Dataset 1: includes two classes for completing task 2, damaged or
Deep Learning-Based Car Damage Classification & Detection for Auto­ undamaged car.
motive Industry, 2022; van Ruitenbeek & Bhulai, 2022; Digital Trans­ • Dataset 2: includes three classes for completing task 3, damaged
formation in Car Insurance Industry: Streamline Recognition of Car location is from the front, side, or rear.
Damage Assessment, 2022; Madheswari et al., 2022; Why Estimating • Dataset 3: includes three classes for completing task 4, the damage is
Car Damage with Machine Learning Is Hard, Available Online, 2022; minor, moderate, or severe
Ahmad et al., 2022; Automatic vehicle damage detection with images,
2022). Moreover, for diver assistance to prevent road incidents an Then we select one of the pre-trained models; VGG-16, VGG-19
automated system using deep learning is also proposed such as in the (Simonyan & Zisserman, 2014), or Inception ResNetV2 (Szegedy et al.,
work of Jaikishore et al. (Neelam Jaikishore et al., 2022). 2017). More precisely, the first phase determines whether or not an
automobile exists. Then an automobile is based on input data that is
3. Proposed methodology either uploaded as an image or derived from existing databases. In the
second phase, after selecting and testing the model, we create model 1
Our suggested framework for an automatic claim damaged estimator using dataset 1 to determine whether or not the automobile is damaged.
is separated into several steps. The next subsections go through in During the third step, the generated and trained model named model 2
further detail: with dataset 2 will be utilized to locate the damaged portion of the
automobile. Finally, in the final step, we estimate the severity of the
damaged portion of the automobile, which is referred to as model 3
3.1. Damaged detection and classification using dataset 3. Fig. 1 depicts the above flowchart of the above phases
and their details. The flowchart diagram will assist in analyzing the
This work proposes a technique for detecting, recognizing, local­ overview of the system working i-e car’s defects, locating them, classi­
izing, and categorizing vehicle damage using several images of the fying, and categorizing the severity of the damages.
damaged vehicle. To obtain the desired outcomes, deep learning CNN
and transfer learning approaches from several pre-trained backbone
models are used. Fig. 1 depicts our suggested technique workflow:

3
J. Qaddour and S.A. Siddiqa Intelligent Systems with Applications 18 (2023) 200192

examples, subjective changes (between 20 and 80◦ ) were combined with


level flip alterations. The dataset was randomly divided between 80%
and 20% with 80% being used for training and 20% for testing.

3.1.6. Formulation of damage levels


For a car damaged level, there are three scenarios to consider. Ac­
cording to Libertymutual.com (an insurance website), damages are clas­
sified as follows (Harshani & Vidanage, 2017).

• Minor damage: such as a scratch on a headlight or a small dent in a


car’s bonnet.
• Moderate Damage: big dents in a car’s bonnet/hood, fender, or
door.
• Severe damage: includes broken axes, total damaged parts, crooked
or warped frames, and the destruction of a Car’s airbags.

Fig. 2 depicts the test images from each class. Because of the great
similarities between classes, the underlying task is non-immaterial.
Because the mischief only covers a limited portion of the image, the
new strategy work becomes much more dynamically unpleasant.

3.1.7. Transfer learning


Transfer learning is based on the concept of using previously gained
knowledge to clarify similar work rapidly and/or finer. It is among the
Fig. 2. Illustrations of many types of vehicles, both damaged and not damaged.
most successful strategies for tiny, marked datasets, in which a pre-
There are lines running throughout that indicate the many types of damage.
Bumper dents, various forms of scrapes, glass breakage, and a damaged head­ trained model extracts different features for the specified image
light are all possible. whereas ensuring a low risk of overfitting. We use three different pre-
trained backbone models, VGG-16, VGG-19 (Simonyan & Zisserman,
3.1.4. Datasets 2014), and Inception ResNetV2 (Szegedy et al., 2017), which are pub­
We individually collect images available on ImageNet from our im­ licly available among many others. We utilized the above three
mediate community due to the scarcity of available databases for pre-trained models to extract features and use the pre-trained model’s
damaged automotive datasets. In this research, we used two datasets: weights to be applied to the target task of our representation. Traditional
the first comprises photographs of cars and other objects, and the second machine learning techniques focus on learning and training individual
includes pictures of damaged as well as non-damaged vehicles. tasks from the beginning while conveying informatory extracts attri­
The second dataset contains the severity of damage into three cate­ butes and admissible knowledge from source assignments and implying
gories: “significant” which relates to a total out the damaged vehicle, that to an objective task. When the source and target domains are
“moderate” relating to big or dents or scratches, and “minor” to small similar, knowledge transfer would enhance the performance of the tar­
dents or scratches. Fig. 2 depicts a sample of our training and testing geted task. Therefore, the pre-trained model’s classes are considered
dataset of various types of damaged and non-damaged vehicles that we source domains while the tasks are output-targeted damages to be
collected. Moreover, we also focus on the augmentation of data to detected with their location, classification, and severity levels.
enlarge synthetically and alter the data set to relax its tolerance and
improve its performance to the problem of overfitting at the time of 3.1.8. Model parameter settings
training. We tend to apply flipping transformations, zooming, dimen­ Two significant issues must be solved to successfully detect, localize,
sional shift, and random rotation to vary the data generated (N. Dhieb and classify car damages from images (Abdulla, 2017): i) the high
et al., 2019). inter-class similarity, and ii) the image pose and its orientation. We
make use of a feature extractor i.e., a pre-trained model by another
3.1.5. Data augmentation neural network to classify and detect the images which are damaged. We
It has been discovered that increasing the dataset with relatively will employ a CNN which is composed of 572 layers and trained on more
modified photographs increases the classifier’s speculative execution. As than a million ImageNet images i.e., Inception ResnetV2 (Wang & Xu,
a consequence, the dataset was artificially increased. On numerous 2018). To achieve our goal, we append two neural networks, Softmax,
and pooling layers, as well as a drop-out layer to increase performance

Fig. 3. Model architecture for Damage detection and categorization.

4
J. Qaddour and S.A. Siddiqa Intelligent Systems with Applications 18 (2023) 200192

and limit an overfitting problem, as illustrated in Fig. 3. We train the last • Nine to sixteen convolution hidden layer uses 512 feature kernel
two neural network layers and freeze all other weights to avoid filter.
long-time training. To improve the model’s performance and reduce • Seventeen and eighteen fully connected hidden layers have 4096
overfitting, factors such as regularization are used. As techniques like nodes with ReLU.
transfer learning and CNN are less-time consuming in the phase of • Layer nineteen (the last convolution layer) has 1000 nodes with
training, hence we used this learning strategy to get the best parameters softmax.
in a shorter amount of time. We proposed three different pre-trained
models VGG-16, VGG-19 (Simonyan & Zisserman, 2014), and We used both pre-trained VGG-16 and VGG-19 models, which
Inception-ResnetV2 (Szegedy et al., 2017), detail of the pre-trained appeared to have been previously trained on massive benchmark data
model are given in the coming backbone section. sets such as ImageNet (Russakovsky et al., 2015), as a pre-trained model
to perform image categorization tasks and possess the accurate weights
to be implied as our backbone model via transfer learning to reduce
3.2. Destructed functionalization and anticipation training time in our models.
C) Inception-ResNet-V2:
In this regard, we proposed the improved Mask R-CNN and material Inception-ResNet-V2 [19]model is an adaptation of the ResNet-V2
identification, classification, and dismemberment approaches to find model, which is a convolutional neural network classify images into
and portray damage in vehicle images. (He et al., 2017). Mask R-CNN is 1000 object categories. Architectures that are built on the inception
an improvement of Faster R-CNN (Ren et al., 2015), in which a third family architectures and have 164 layers deep, and are trained on
branch is added to output the object mask. Moreover, a RoIAlign pro­ million images using ImageNet database (Russakovsky et al., 2015), will
cedure is used instead of RoI pooling to create instance segmentation result in lower error rate and can achieve more complexity on both
masks for better accuracy. As illustrated in Fig. 5, Mask R-CNN includes dimensional and avenue-wise. This standard, on the other hand, is made
a completely connected neural network for classification and up of an object known as an "Inception cell" into which an array of
bounding-box prediction, as well as another fully connected CNN for complexity is done at various stages and finally accumulates the result.
extracting instance segmentation masks. The initiation site carries out interconnect association while disregard­
ing contagious dimensions using the 1 × 1 convolution, which is
3.2.1. Backbone networks directed by cross-channel and cross-spatial correlations using the 5 × 5
In the Mask R-CNN model, the backbone network is a CNN that is as well as 3 × 3 filters. Then, overall layers pass into aspect reduction to
utilized as attribute withdrawal, with few attributes retrieved from the result in 1 × 1 convolution. This model has been trained on over a
constitutional surface and craved features uprooted from ensuing levels. million images from the ImageNet dataset.
Additionally, when images go across the backbone network, they are
processed to extract feature maps. It is usually pre-trained on ImageNet 3.2.2. RPN (region proposal networks)
(Russakovsky et al., 2015) (Krizhevsky et al., 2017). Therefore, the The RPN is known as CNN in which properties of the backbone
backbone is used as a feature extractor, which gives you a feature map network are used as inputs to predict the location of an anchor (fore­
representation of the input. In this paper, we used three different ground or background). The RPN takes the feature maps of varied sizes
backbone models including VGG-16, VGG-19 (Simonyan & Zisserman, generated by the Feature Pyramid Network (FPN) and extracts the fea­
2014), and Inception-ResNetV2 [19]to extract different feature maps tures of ROIs from different feature pyramid levels relating to the target
from the input images. The details of the pre-trained model are given in object size. As a result, the sample network’s structure adapts without
the following subsections: increasing the amount of processing, dramatically improving the iden­
tification of smaller objects while maintaining speed and accuracy. In
A) VGG16: this situation, a sliding window traversing the maps creates a collection
of anchorages with differentiating proportions and scales that can be
VGG-16 is known as a network that was proposed by Simonyan used to forecast whether an object is in the background or front. We
(Simonyan & Zisserman, 2014). It is comprised of three connected layers employ the Non-Maximum Suppression (NMS) approach with a 0.7
and 13 convolutional layers with activation of Rectified Linear Unit Intersection over Union (IOU) criteria to decrease redundancy since the
(ReLU). The network uses filters of very small convolution such as 3 × 3 anchors overlap.
and 2 × 2 which further includes parameters of 138 million. Model of
VGG-16 attains 92.7% belonging to the top five accuracies of test on a 3.2.3. Region of interest alignment (RoI align)
dataset of ImageNet and awarded in 2014 ILSVRC challenge with first The RoI section may possess varied dimensions due to the RPN
and second places on tasks related to image classification and object bounding box refinement process. RoI attributes must be associated to
localization with the error ratings of 25.32% and 7.32% respectively. As hold a similar dimension as that of the RoIAlign aiming to establish a
a result, VGG-16 is one of the most widely utilized architectures for correct mask with Mask R-CNN. Faster R-CNN utilizes the RoIPool
object detection. technique to discretize the attribute map and establishes misalignments
B) VGG19: among the ROIs as well as withdrawal attributes, whereas He et al. (He
Simonyan (Simonyan & Zisserman, 2014), upgraded the deeper et al., 2017) introduced the RoIAlign Technique, which utilizes bi-linear
version of VGG-16 which is known as VGG-19 based on the previous interpolation (Wang & Yang, 2008) to analyze the correct worth of at­
version as a 19-layer that was further composed of three connected tributes and so could be accumulated. As a result, Mask R-CNN is
layers and sixteen convolutional layers with activation of ReLU. Some of enhanced using Faster R-CNN, and the ROI-Pooling layer is transformed
the structures of VGG-19 are (Simonyan & Zisserman, 2014): into the interest-region alignment layer (RoIAlign). The spatial infor­
mation is kept on the feature map by applying bi-linear interpolation.
• First and second convolution hidden layers use 64 feature kernel RoIAlign is an alignment layer of an interesting area that differs from
filters RoIPool in that it bypasses the quantization phase and is unable to
• The third and fourth convolution hidden layers uses 124 feature quantize the RoI border. Rather, it does the calculation of the sample
kernel filters and the output decreases by four times from the input of points’ exact position in every unit utilizing bi-linear interpolation while
224×224×3 to the output of 56×56×128. retaining the decimal and generating the RoI of the last fixed size using
• The fifth to eighth convolutions hidden layer uses a 256-feature the procedure of average pooling and maximum pooling. In Fig. 4, the
kernel filter. blue solid line shows the 5 × 5 after convolution feature map, and the

5
J. Qaddour and S.A. Siddiqa Intelligent Systems with Applications 18 (2023) 200192

Fig. 4. RoI Alignment Schematic (Zhang et al., 2020).

Fig. 5. Mask R-CNN Damage detection and classification Model Architecture.

red line shows the feature smaller block that corresponds to the feature unique object of interest). Mask R-CNN is a simple, uncomplicated
map RoI as shown in the diagram. The small blockage is categorized into addition to Faster R-CNN, which operates at 5 frames per second. Due to
two tiny ones with less quantized boundary then it was further catego­ the fact that pixel segmentation requires a much finer-grained alignment
rized into four small blocks and this time with no quantized boundary. than bounding boxes, the Mask R-CNN enhances the RoI pooling layer
The point of the center is considered as 4 coordinate points as shown by (named "RoI Align layer") to make it easier and more accurate to map
the blue dots. After that, the values of all positions are calculated using RoI to the original image areas. It leverages a similar architecture to
bi-linear interpolation using the average pooling or maximum pooling Faster R-CNN for object detection. Mask R-CNN leverages RoI alignment
operation (Zhang et al., 2020). as opposed to RoI pooling to enable pixels to maintain ROIs and prevent
data loss.
3.2.4. Mask RCNN and loss function improvement The RPN which searches all FPNs from top to bottom and suggests
He et al. (He et al., 2017) extend Faster RCNN by adding a branch for regions that may contain artifacts is also employed. It employs anchors,
each region of interest (RoI) to predict segmentation masks, which is which are a set of predefined position boxes and sizes based on the input
called Mask R-CNN. A tiny fully convolutional network (FCN) is added individual anchors are given ground-truth groups and bounding boxes.
to each region of interest (RoI) as a mask branch, anticipating a RPN generates 2 outputs for each anchor: anchor type and bounding box
pixel-by-pixel Segmentation mask [66]. Mask R-CNN extends Faster parameters. The anchor class might be either foreground or shadow
R-CNN to pixel-level image segmentation, as described in He et al. class. Faster R-CNN has a module called RoI Pooling, which is different
(2017). The concept is to separate the categorization prediction and from Mask R-CNN. RoIPool’s feature map regions were somewhat mis­
pixel-level masking functions. In addition to the existing classification aligned with the original picture regions, according to the Mask R-CNN
and localization branches, a third branch based on the architecture of developers. This contributes to mistakes since image segmentation at the
Faster R-CNN was introduced to predict an object mask. The mask pixel level of the image demands accuracy. RoIAlign was used to tackle
branch, as shown in the last graphs of Fig. 5, is a thin, fully connected this problem, in which the function map was sampled at various posi­
network (FCN) that is added to each RoI and predicts a pixel-by-pixel tions, and then a bilinear interpolation was applied.
segmentation mask. Semantic segmentation using fully convolutional Instance segmentation is a complex problem that involves combining
n/w (FCN). FCN generates masks (binary masks in our case) around two independent image tasks, such as object detection and semantic
objects of bounding boxes by classifying each field pixel by pixel (a segmentation (Lin et al., 2014). Faster R-CNN and Mask R-CNN employ

6
J. Qaddour and S.A. Siddiqa Intelligent Systems with Applications 18 (2023) 200192

an extra branch for mask prediction that is parallel to the current two Table 1
branches, as illustrated in the last graphs in Fig. 5. Mask-RCNN’s Parameters of actual and prediction classifications.
multi-task losses include classification, segmentation mask branch loss, Prediction Classification
and bounding box regression losses. The category prediction is tied to
Total Population Positive Negative
the branch of classification, but the classification loss is linked to the (P + N) Prediction (PP) Prediction (NP)
class ground truth. Mask R-CNN loss’s function is defined as multi-task Actual Positive (P) TP (True Positive) FN (False
loss (He et al., 2017) on each sampled RoI as: Classification Negative)
Negative (N) FP (False TN (True negative)
L = Lcls + Lbox + Lmask (1) Positive)

L stands for total training loss, Lcls for classification loss, Lbox for
bounding-box loss, and Lmask for mask loss, we can expand the first two and a segmentation mask. The working of MASK-RCNN is different from
terms of Eq. (1) as follow: that of a traditional image classification model. For example, if we
observe Fig. 5, the MASK-RCNN first extracts the features to result in
Lcls + Lbox = Lcls (p, u) + λ [u ≥ 1] Lloc (tu , v) (2) activation maps, and later on, the model is partitioned into two sub-
modules that is region-proposal network and ROI alignment. The pur­
Where u is the label of each training RoI with a ground-truth class and v pose of RPN (Region proposal network) is to be executed once per image
is a label of each RoI with a ground truth bounding regression target; tu to provide a set of region proposals. Region proposals are feature map
= (txu , tyu , twu , thu ), specifies a scale-invariant translation and log-space regions that include the object. These objects might be damaged areas of
height/width shift relative to u class; p = (p0 , p1 , …, pn ) represents cars in our case. The classifier predicts bounding boxes as well as object
the probability distribution over N + 1 categories; [u ≥ 1] denotes the class for every suggested region generated in stage 1. Each suggested
Iverson bracket indicator function that evaluates to 1 when u ≥ 1 and region can be of varying sizes, but fully linked layers in networks
0 otherwise. For bounding-box regression, the loss is defined as: required a set of size vectors to generate forecasts. The size of these
∑ ( ) proposed regions is determined through either the RoI pool or the
Lloc (tu , u) = smoothL1 tiu − vi (3)
i∈x,y,w,h RoIAlign technique. The RoIAlign layer’s result is therefore passed
further into the Mask head that has two convolution layers. It creates a
In which, mask for every RoI, segmenting a damaged car image pixel by pixel. This
{
0.5x2 if |x| < 1 module localizes the given car image’s damage as front, side, or rear.
smoothL1 (x) = (4) Subsequently, in the last step, the CNN model is trained to classify the
|x| − 0.5 otherwise
severity of the damage as mild, moderate, or severe. This CNN model is
If the regression targets are unbounded, Eq. (4) eliminates the based on pre-trained VGG16, VGG19, and Inception-ResNetV2. Hence,
sensitivity. The hyper-parameter λ in Eq. (2) controls the balance be­ as an excellent practical use of deep learning, all of these modules are
tween the two task losses. λ = 1 is used in most experiments. The Lmask is linked to providing a completely automated automotive damage esti­
calculated by taking the average cross-entropy of all pixels on the RoI, as mate. A simple pseudo code of proposed framework is also given above.
shown below:
1 ∑ [ ) ( ) ( ) 4. Results and discussions
Lmask = yij log (aij + 1 − yij log 1 − aij ] (5)
m2 1≤i,j≥m
In this part, we examine our findings and provide appropriate ex­
1 planations and analysis. Furthermore, the assessment measures used to
yij = xi
(6) evaluate the proposed work are also described in this section. In addi­
(1 + e− )
tion, all parameters of the model are fine-tuned to their best values.
1
aij = (7)
(1 + e− bi ) 4.1. Evaluation metrics
Where yij is the label of a cell (i, j) in the true mask for the region of
Table 1 illustrates the parameters of actual and prediction classifi­
size mxm; aij is the predicted value of the same cell in the mask learned
cations and the most common metrics of object detection and classifi­
for the ground-truth class N. Moreover, xi and bi are the predicted and
cation to evaluate model performance are the following:
true values of the ith component in the positive RoI respectively (Gir­
A) Intersection over Union (IoU): It is the measure of intersection
shick, 2015). Summarizing the core endpoints, the emphasis is placed on
area among the predicted map A of predicted segmentation and the map
the optimization of the information of location for large targets. The
B ground truth divided by the total (union) of both A and B, and the
focus is on the optimization of category prediction for small targets.
range is between [0 and 1].
Different weights must be incorporated to improve the detection accu­
racy of the detection branches for scale targets in the loss function. A∩B
IoU = (8)
A∪B
3.2.5. Model architecture layer-by-layer The Mean of IoU is defined as the average IoU over all classes.
The suggested model’s architecture is made up of three major ele­ B) Precision, Recall, F1 score, and accuracy: They can be specified
ments. The first and second modules are intended to determine whether for each class as well as at the aggregate level in the following ways:
or not the image provided contains cars and whether or not the car is
damaged. This is possible due to pre-trained convolutional neural net­ P=
TP
(9)
works (CNN) made up of VGG16, VGG19, and Inception-ResNetV2. The TP + FP
first layers of VGG16, VGG19, and Inception-ResNetV2 extract visual In the formula, FP is the number of +ve samples detected as negative
features producing refined and downscaled feature maps known as samples (Zhao et al., 2019). The Sensitivity, Hit rate, or Recall rate is
activation maps. These features are transmitted to fully connected layers computed using Eq. (10):
(also known as dense layers) followed by sigmoid activation to deter­
TP
mine whether or not the car is damaged. Following the detection of R= (10)
damage in the car, we transmit the imag e to the MASK-RCNN to localize TP + FN
the region where the car is damaged in the form of both a bounding box In the above equation, TP relates to the number of +ve samples that

7
J. Qaddour and S.A. Siddiqa Intelligent Systems with Applications 18 (2023) 200192

Table 2
Performance of Damage Severity Classification.
Damage Detection Performances Damage Location Performances Damage Severity Performances
Metric Precision Recall F1- Accuracy Precision Recall F1- Accuracy Precision Recall F1- Accuracy
(%) Score (%) (%) Score (%) (%) Score (%)

VGG-16 83.5 0.97 0.89 90.4 74.33 0.76 0.74 83 65.66 0.67 0.66 77
VGG-19 86.09 0.95 0.90 91.09 70.66 0.74 0.71 82 66.33 0.67 0.66 78
Inception- 89.13 0.98 0.91 92 80.4 0.80 0.78 85 67.12 0.70 0.70 80
ResnetV2

Fig. 6. Confusion Matrices for VGG16, VGG19 and Inception-ResNetV2.

are correctly tested, and FN relates to the number of − ve samples tested Inception-ResNetV2, VGG-16, and VGG19, and feed the identical
as the +ve ones (Zhao et al., 2019). The accuracy is computed using Eq. training and testing data to the pre-trained models, which will use 100
(11). epochs.
TP + TN
A= (11) 4.2.1. Quantitative performance of proposed deep learning model
TP + FP + TN + FN
Table 2 shows the differences in damage identification, location, and
Similarly, F1-Score is given by: severity categorization between the three pre-trained models. The pro­
PxR posed claim damaged estimator model has been introduced to classify,
F1 − Score = 2 × (12) detect, and visualize damaged vehicles. Furthermore, we used four
P+R
distinct measures to assess the performance of different transfer learning
F1-Score is called harmonic mean (average value), which is the models: precision, recall (sensitivity), F1-score (harmonic mean), and
combined measure of both precision and recall as shown in equation accuracy. The higher those criteria are, the better our model is. As it is
(12). shown in Table 2, we find that the pre-trained model Inception-
ResNetV2 outperformed both VGG-16 and VGG-19 in all categories, of
4.2. Performance evaluation and analysis of experimental results detection, localization, and severity-damaged performances. The
Inception-ResNetV2 accuracies for detection are 92%, localization is
We evaluate the proposed deep learning technique’s capacity to 85%, and severity damage is 80%. Damage localization is more efficient
detect, identify, categorize, and portray vehicle damage in this section. using the Intercept-ResNetV2 pre-trained model, which has a precision
In this research, we will use three distinct backbone pre-trained models, of 80.4% compared to other VGG-16, and VGG-19 of 74.33% and

8
J. Qaddour and S.A. Siddiqa Intelligent Systems with Applications 18 (2023) 200192

Fig. 7. Evolution of Loss Function for (i) VGG16, (ii) VGG19, and (iii) Inception-ResNetV2.

70.66% respectively. Furthermore, the VGG-16 and VGG-19 model 4.2.2. Confusion matrices
performs poorly in damage severity classification accuracy with just We also calculated the confusion matrices using the three different
77%, and 78% accuracies compared to 80% accuracy for the pre-trained models (a) VGG-16, (b) VGG-19, and (c) Incept ResNetV2 as
theiInception-ResNetV2 model. In all tasks, the Inception-ResNetV2 shown in Fig. 6. The performance evaluation of the various transfer
results beat the other two models. learning models used in this paper uses three different metrics: Preci­
sion, Recall, and F1-score. The one with the higher metrics is the best
model. The confusion matrices conclude the normal predicted values of

9
J. Qaddour and S.A. Siddiqa Intelligent Systems with Applications 18 (2023) 200192

4.2.5. Results comparison with existing methods


In this section, we have compared the performance of the proposed
enhanced mask-RCNN for automated vehicle estimation with some
existing studies to show that the proposed has a significant contribution
worth in the literature. As a result, Table 3 summarizes the comparative
analysis. It is observed from above Table 3, that in some studies only a
single stage is focused for example, in Chua et al. (2021), the authors
only carried out the detection of vehicles either in damaged or
non-damaged class. Similarly, in Karande (2022), authors designed the
ensemble approach in which class probability estimates are produced
from numerous pre-trained models for every training image. The ulti­
mate choice class is determined by taking the weighted mean of the class
posteriors. However, they skip the next phase i-e localization of damage
Fig. 8. Localization and Visualization Examples. as well as the severity of the damaged part. In Singh et al. (2019), there is
only one task is performed i-e localization and authors achieved 40%
every class, helping in the significant performance of the proposed mAP with their proposed methods including Damage MRCNN, PANet,
model. We can see from Table 2 and the results below that the Inception and Ensemble technique. In Sharma et al. (2019), the findings of all
ResNetV2 outperforms the other two models. It is more efficient when it phases are combined, and they achieve an overall accuracy of 88%. This
comes to damage localization with a precision of 80.4%. The perfor­ is good, but it is difficult to determine which stage performance is
mance results of the model are validated using the confusion matrix mediocre and which is outstanding, resulting in inadequate analysis.
provided in Fig. 6, which summarizes the normalized values predicted Moreover, in Kyu and Woraratpanya (2020), performance is accessed
for each of those classes. Inception ResNetV2 has residual connections with all stages and their model shows 95.22% detection performance i-e
which allow shortcuts in the model to train neural networks without the given image is damaged or not. Nevertheless, the performance of
having overfitting problems which will in turn result in better localization and severity is much low in comparison with enhanced
performance. mask-RCNN with backbone Inception-ResNetv2. The suggested model
achieves 85% and 80% localization and severity prediction accuracy,
4.2.3. Accuracy and loss functions respectively, demonstrating that the technique provided in this work is
During the stages of evaluation and training, the performance of the far more suited for this application. Second, rather than focusing on
enhanced Mask R-CNN model is assessed using a multi-tasking loss single stages, it is a comprehensive framework. So because the perfor­
function that is composed of segmentation mask losses, localization, and mance of all phases is dependent on each other. For example, if auto­
classification. The classification and bounding box losses are similar as motive damage is incorrectly localized, the severity of the damage to the
presented in Ren et al. (2015). The segmented mask loss is explained as car may also be incorrectly classified. Another significant advantage of
the loss of average binary cross entropy that incorporates the kth mask if the proposed technique is that it conducts localization not only in the
the region is linked with the truth class of kth ground due to the form of a bounding box but also in the form of segmentation, which is
competition among classes for the production of the mask. Fig. 7 shows considerably more precise. The mask-RCNN contains an additional
the changing patterns of the accuracy and the loss function of the three branch or layers that are used to forecast segmentation images in which
pre-trained models, respectively. A model is better if it has a low loss pixel-by-pixel classification is conducted. In contrast to (Dwivedi et al.,
value. 2021), where the YOLO model is employed for localization, the sug­
We may be certain that the given model is better at dealing with the gested approach accomplishes both localization and segmentation. As a
damages since the functions of loss computed on the validation and result of the above table comparison and discussion, we may conclude
training datasets are around zero. Fig. 7 represents, a) the VGG-16 loss that the proposed work is more fully functional since it provided an
function and accuracy level, b) the VGG-19 loss function and accuracy, adequate automated system for car damage assessment from beginning
and c) represents the loss function and accuracy level for Inception­ to end as well as a workable solution in the form of practical web-based
ResNetV2. Moreover, the graphs show that the inceptionResNetV2 application because end users are not very familiar with backend and a
outperforms the other two pre-trained models VGG-16 and VGG-19. user-friendly and complete automated system is more obligated for
them.
4.2.4. Damaged vehicles localization and visualization
Fig. 8 depicts a sample of damage localization instances from our 4.2.6. Web-based automatic claim estimator
suggested AI-based methodology. The model is capable of detecting The most basic method of automating a system is to build a web-
damage in a range of positions and orientations, as well as various types based CNN model that can receive images from the user and assess the
and levels of vehicle damage. location and severity of the damage. To develop and test our model, we

Table 3
Comparison with existing research studies.
No# Existing works Model Used Detection Localization Severity
performance performance performance

1. (Kyu & Woraratpanya, VGG16 and VGG19 95.22% 76.48% 57.89%


2020)
2. (Karande, 2022) Ensemble technique 89.53% ⨯ ⨯
3. (Dwivedi et al., 2021) AlexNet, VGG19, InceptionV3, MobileNet, ResNet50 and 96.39% 77.8% (mAP) ⨯
YOLO
4. (Singh et al., 2019) Damage MRCNN, PANet, and Ensemble ⨯ 40% (mAP) ⨯
5. (Sharma et al., 2019) Simple CNN 88%(combined result) 88% (combined result) 88% combined result)
6. (Chua et al., 2021) Simple CNN 80% ⨯ ⨯
Our Method VGG16, VGG19 and Inception-ResNetV2, Mask-RCNN 92% 85% 80%

⨯ Means that step is not performed.

10
J. Qaddour and S.A. Siddiqa Intelligent Systems with Applications 18 (2023) 200192

Fig. 9. Car Damage Assessment Website (a) Front Page (b) Final Page Result.

employed our upgraded Mask R-CNN with pre-trained Incept ResNetV2 • Gate 1: This determines whether or not the image uploaded contains
results. Fig. 9(a) and (b) illustrate the web-based automatic estimator a car.
front page. In addition, Fig. 10 shows the working results of the auto­ • Gate 2: Check that the provided image of the automobile is not
matic claim estimator that we designed and implemented. The model damaged to avoid fraudulent claims.
must pass several tests, the first of which verifies that the image is true of • Damage Location: The image is compared to a pre-trained model to
a vehicle, and the second of which verifies that it is damaged. These are determine if the damage is on the front, rear, or side.
the preliminary inspections that must be completed before the analysis • Damage Severity: The image is compared to pre-trained models to
begins. The damage check will begin after all the gate checks have been determine if the damage is mild, moderate, or severe.
verified. The model will estimate the damage’s location, such as front, • Results: The results are sent to the user and a third party. Fig. 10
side, or rear, as well as its severity degree, such as mild, moderate, or depicts the various testing results of our automatic claim estimator.
severe.
The system architecture of the automatic claim estimator is The proposed model is developed using Tensor flow, Keras, which is
comprised of the following steps: a deep learning library. The Numpy library for scientific numerical
calculations and Scikit–learn that consists of machine learning algo­
• Input: The user first submits an image of the damaged car. rithms tools. Furthermore, PyCharm IDE and Jupyter Notebooks are
used as Development platforms for python codes and Web applications

11
J. Qaddour and S.A. Siddiqa Intelligent Systems with Applications 18 (2023) 200192

Fig. 10. Detection results of proposed vehicle damage estimator.

where we create Anaconda Virtual Environments i-e python virtual


Algorithm: 1
application environments.
Pseudo code of proposed vehicle estimation framework.

5. Conclusion and future work 1. Input: xi ∈ X (i-e set of images of cars)


2. Output: Car detection O1 , Damaged detection O2 , Damaged localization/
segmentation O3 , Severity prediction O4
In this study, we suggested a unique framework model for deter­ 3. O1 (Car detection) ←xi ∈ X
mining and estimating vehicle damage, as well as the level of severity. 4. model1 ←Load pre-trained model weights←func(VGG16, VGG19, Inception −
Insurance firms are interested in using this strategy to reduce claims ResNetV2)
leakage and save time and money. Specifically, we proposed an 5. label(car, not car)←model1 (xi )
6 if “car”:
improved Mask R-CNN method, which has a significant research benefit 7. O2 (Damaged detection) ←xi
and a wide range of application scenarios in the disciplines of object 8. model2 ←Load pre-trained model weights←func(VGG16, VGG19, Inception −
detection, recognition, and transportation. In order to find, identify, and ResNetV2)
categorize an image of a damaged automobile. We blend deep learning, 9. label (damaged, not damaged)←model2 (xi )
10. if “damaged”:
transfer learning, Mask R-CNN, and instance segmentation. Moreover, to
11. O3 (Damaged localization/segmentation) ←xi
enable fast convergence with improved performance, three different 12. model3 ←Load pre-trained model weights←func(MASK − RCNN)
pre-trained models namely, Inception ResNetV2, VGG-16, and VGG-19 13. label(localized segmented image, (front, back side))←model3 (xi )
were used. In addition, we created a web application for automated 14. if ‘localized”:
identification and estimating by feeding images of damaged automobiles 15. O4 (Severity prediction) ←xi
16. model4 ←Load pre-trained model weights←func(VGG16, VGG19, Inception −
into an automatic claim estimator, which outputs the location of the ResNetV2)
damage and estimates its severity levels. 17. label(mild, moderate, severe)←model4 (xi )
Moreover, we have made a comparative analysis of performance 18. end if
metrics, i.e., Precision, Recall, F1 Score, confusion matrices, and Accu­ 19. end if
20. end if
racy and loss function between the three pre-trained models i.e., VGG16,
VGG19, and Inception-ResNetV2 as shown in Table 2, Figs. 7, and 8 to
identify the best performance and best pre-trained model to use in our be used as a future extension research in further work. Furthermore,
model; All results showed that the Inception-ResNetv2 stands out to be alternative pre-trained models and detection algorithms other than
the best as is shown in the previous Figures and Tables. The experi­ MASK R-CNN can also be a good future direction (Algorithm 1).
mental study, accompanied by a significant comparative analysis,
showed that the approach proposed by this paper is promising and Declaration of Competing Interest
contains real novelties in the field of automatic object recognition and
classification. The overall outcomes show that the suggested framework The authors declare that they have no known competing financial
model not only detects damaged vehicles but also automatically locates interests or personal relationships that could have appeared to influence
them and estimates their severity level. This method demonstrates a the work reported in this paper.
valuable value for the insurance industry in combating claims leakage
issues. Data availability
The main limitation of the model is the lack of integrated databases for part pricing, part availability, and make/model vehicle specifications. The integration of this model with such databases of part prices and vehicle make/model specifications will be pursued as a future extension of this work. Furthermore, alternative pre-trained models and detection algorithms other than Mask R-CNN can also be a good future direction.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

No data was used for the research described in the article.
Appendix

Acronym Meaning
DL Deep Learning
AV Autonomous Vehicle
CNN Convolutional Neural Network
DSSD Deconvolutional Single Shot Detector
GAN Generative Adversarial Network
ML-FPN Multi-level Feature Pyramid Network
MSE Mean Squared Error
R-CNN Region-Based Convolutional Neural Network
RoI Region of Interest
SOD Salient Object Detection
SVD Singular Value Decomposition
VHR Very High Resolution
BB Bounding Box
DCN Deformable Convolutional Network
FC Fully Connected
GPU Graphics Processing Unit
MR-CNN Multi-Region CNN
NAS Neural Architecture Search
ReLU Rectified Linear Unit
RPN Region Proposal Network
SPP Layer Spatial Pyramid Pooling Layer
SVM Support Vector Machine
YOLO You Only Look Once
BN Batch Normalization
DSOD Deeply Supervised Object Detectors
FPN Feature Pyramid Network
ILSVRC ImageNet Large Scale Visual Recognition Challenge
MS COCO Microsoft Common Objects in Context
Pascal VOC PASCAL Visual Object Classes
RGB-D Red, Green, Blue-Depth
SGD Stochastic Gradient Descent
SSD Single Shot Detector
VGG Visual Geometry Group
KNN k-Nearest Neighbors Algorithm

References

Abdulla, W. (2017). Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow.
Ahmad, A. B., Saibi, H., Belkacem, A. N., & Tsuji, T. (2022). Vehicle auto-classification using machine learning algorithms based on seismic fingerprinting. Computers, 11, 148.
Al Mamun, M. A., & Deb, K. (2019). An approach for recognizing vehicle based on appearance. In 2019 International Conference on Computer, Communication, Chemical, Materials and Electronic Engineering (IC4ME2) (pp. 1–4).
Alfarrarjeh, A., Trivedi, D., Kim, S. H., & Shahabi, C. (2018). A deep learning approach for road damage detection from smartphone images. In 2018 IEEE International Conference on Big Data (Big Data) (pp. 5201–5204).
An in-vehicle system and method for during accident detection without being fixed to vehicle. (2022). Available online: https://www.researchgate.net/publication/342233982_An_in-Vehicle_System_and_Method_for_During_Accident_Detection_without_being_Fixed_to_Vehicle.
Attari, N., Ofli, F., Awad, M., Lucas, J., & Chawla, S. (2017). Nazr-CNN: Fine-grained classification of UAV imagery for damage assessment. In 2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA) (pp. 50–59).
Automatic vehicle damage detection with images. Available online: https://repositorio-aberto.up.pt/bitstream/10216/107814/2/219929.pdf. Accessed Nov. 2022.
Bhatt, C., Kumar, I., Vijayakumar, V., Singh, K. U., & Kumar, A. (2021). The state of the art of deep learning models in medical science and their challenges. Multimedia Systems, 27, 599–613.
Choudhury, S., Chattopadhyay, S. P., & Hazra, T. K. (2017). Vehicle detection and counting using haar feature-based classifier. In 2017 8th Annual Industrial Automation and Electromechanical Engineering Conference (IEMECON) (pp. 106–109).
Chua, A. C., Mercado, C. R. B., Pin, J. P. R., Tan, A. K. T., Tinhay, J. B. L., & Dadios, E. P. (2021). Damage identification of selected car parts using image classification and deep learning. In 2021 IEEE 13th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment, and Management (HNICEM) (pp. 1–5).
Deep learning-based car damage classification and detection for automotive industry. (2022). Available online: https://kili-technology.com/data-labeling/computer-vision/image-annotation/deep-learning-based-car-damage-classification-and-detection-for-automotive-industry. Accessed Nov. 2022.
Dhieb, N., Ghazzai, H., Besbes, H., & Massoud, Y. (2019a). Extreme gradient boosting machine learning algorithm for safe auto insurance operations. In 2019 IEEE International Conference on Vehicular Electronics and Safety (ICVES) (pp. 1–5).
Dhieb, N., Ghazzai, H., Besbes, H., & Massoud, Y. (2019b). A very deep transfer learning model for vehicle damage detection and localization. In 2019 31st International Conference on Microelectronics (ICM) (pp. 158–161).
Digital transformation in car insurance industry: Streamline recognition of car damage assessment. Available online: https://www.altamira.ai/ai-powered-car-damage-assessment/. Accessed Nov. 2022.
Dwivedi, M., Malik, H. S., Omkar, S., Monis, E. B., Khanna, B., & Samal, S. R. (2021). Deep learning-based car damage classification and detection. In Advances in Artificial Intelligence and Data Engineering (pp. 207–221). Springer.
Girshick, R. (2015). Fast R-CNN. In IEEE Int. Conf. Comput. Vis. (pp. 7–13).
Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 580–587).
Girshick, R. (2015). Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (pp. 1440–1448).
Gomes, C., Jin, Z., & Yang, H. (2021). Insurance fraud detection with unsupervised deep learning. Journal of Risk and Insurance, 88, 591–624.
Gong, J., Zhao, J., Li, F., & Zhang, H. (2020). Vehicle detection in thermal images with an improved YOLOv3-tiny. In 2020 IEEE International Conference on Power, Intelligent Computing and Systems (ICPICS) (pp. 253–256).
Harshani, W. R., & Vidanage, K. (2017). Image processing based severity and cost prediction of damages in the vehicle body: A computational intelligence approach. In 2017 National Information Technology Conference (NITC) (pp. 18–21).
Haselhoff, A., & Kummert, A. (2009). A vehicle detection system based on Haar and triangle features. In 2009 IEEE Intelligent Vehicles Symposium (pp. 261–266).
He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2961–2969).
Imaam, F., Subasinghe, A., Kasthuriarachchi, H., Fernando, S., Haddela, P., & Pemadasa, N. (2021). Moderate automobile accident claim process automation using machine learning. In 2021 International Conference on Computer Communication and Informatics (ICCCI) (pp. 1–6).

Károly, A. I., Galambos, P., Kuti, J., & Rudas, I. J. (2020). Deep learning in robotics: Survey on model structures and training strategies. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 51, 266–279.
Karande, K. P. M. K. S. (2022). Deep learning based car damage classification. TCS Innovation Labs, Pune, India.
Kaushik, S., Raman, A., & Rao, K. R. (2020). Leveraging computer vision for emergency vehicle detection-implementation and analysis. In 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT) (pp. 1–6).
Kim, B., & Lee, J. (2019). A video-based fire detection using deep learning models. Applied Sciences, 9, 2862.
Kim, J.-M., Yum, S.-G., Park, H., & Bae, J. (2022). Strategic framework for natural disaster risk mitigation using deep learning and cost-benefit analysis. Natural Hazards and Earth System Sciences, 22, 2131–2144.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2017). ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60, 84–90.
Kyu, P. M., & Woraratpanya, K. (2020). Car damage detection and classification. In Proceedings of the 11th International Conference on Advances in Information Technology (pp. 1–6).
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., & Ramanan, D. (2014). Microsoft COCO: Common objects in context. In European Conference on Computer Vision (pp. 740–755).
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., & Fu, C.-Y. (2016). SSD: Single shot multibox detector. In European Conference on Computer Vision (pp. 21–37).
Madheswari, A. N., Haripriya, J., Kiruthika, G., & Meyammai, R. (2022). Exterior vehicular damage detection using deep learning.
Mittal, U., Potnuru, R., & Chawla, P. (2020). Vehicle detection and classification using improved faster region based convolution neural network. In 2020 8th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO) (pp. 511–514).
Neelam Jaikishore, C., Podaturpet Arunkumar, G., Jagannathan Srinath, A., Vamsi, H., Srinivasan, K., & Ramesh, R. K. (2022). Implementation of deep learning algorithm on a custom dataset for advanced driver assistance systems applications. Applied Sciences, 12, 8927.
Patil, K., Kulkarni, M., Sriraman, A., & Karande, S. (2017). Deep learning based car damage classification. In 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA) (pp. 50–54).
Putra, F. A. I. A., Utaminingrum, F., & Mahmudy, W. F. (2020). HOG feature extraction and KNN classification for detecting vehicle in the highway. IJCCS (Indonesian Journal of Computing and Cybernetics Systems), 14, 231–242.
Redmon, J., & Farhadi, A. (2018). YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767.
Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2015). You only look once: Unified, real-time object detection. arXiv preprint arXiv:1506.02640.
Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, 28.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., & Ma, S. (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115, 211–252.
Sünderhauf, N., Brock, O., Scheirer, W., Hadsell, R., Fox, D., & Leitner, J. (2018). The limits and potentials of deep learning for robotics. The International Journal of Robotics Research, 37, 405–420.
Sharma, A., Verma, A., & Gupta, D. (2019). Preventing car damage using CNN and computer vision. International Journal of Innovative Technology and Exploring Engineering (IJITEE), 9, 1–5.
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
Singh, R., Ayyar, M. P., Pavan, T. V. S., Gosain, S., & Shah, R. R. (2019). Automating car insurance claims using deep learning techniques. In 2019 IEEE Fifth International Conference on Multimedia Big Data (BigMM) (pp. 199–207).
Son, C., Park, S., Lee, J., & Paik, J. (2019). Context aware vehicle detection using correlation filter. In 2019 IEEE International Conference on Consumer Electronics (ICCE) (pp. 1–2).
Sudha, D., & Priyadarshini, J. (2020). An intelligent multiple vehicle detection and tracking using modified vibe algorithm and deep learning algorithm. Soft Computing, 24, 17417–17429.
Supraja, K., & Saritha, S. (2017). Robust fuzzy rule based technique to detect frauds in vehicle insurance. In 2017 International Conference on Energy, Communication, Data Analytics and Soft Computing (ICECDS) (pp. 3734–3739).
System and method for during crash accident detection and notification. (2022). Available online: https://www.researchgate.net/publication/342233879_System_and_Method_for_During_Crash_Accident_Detection_and_Notification.
Szegedy, C., Ioffe, S., Vanhoucke, V., & Alemi, A. A. (2017). Inception-v4, Inception-ResNet and the impact of residual connections on learning. In Thirty-First AAAI Conference on Artificial Intelligence.
van Ruitenbeek, R., & Bhulai, S. (2022). Convolutional neural networks for vehicle damage detection. Machine Learning with Applications, Article 100332.
Wang, Y., & Xu, W. (2018). Leveraging deep learning with LDA-based text analytics to detect automobile insurance fraud. Decision Support Systems, 105, 87–95.
Wang, S., & Yang, K. (2008). An image scaling algorithm based on bilinear interpolation with VC++. Techniques of Automation and Applications, 27, 44–45.
Wang, H., Lou, X., Cai, Y., Li, Y., & Chen, L. (2019). Real-time vehicle detection algorithm based on vision and lidar point cloud fusion. Journal of Sensors, 2019.
Waqas, U., Akram, N., Kim, S., Lee, D., & Jeon, J. (2020). Vehicle damage classification and fraudulent image detection including moiré effect using deep learning. In 2020 IEEE Canadian Conference on Electrical and Computer Engineering (CCECE) (pp. 1–5).
Wassel, M. (2018). Property casualty: Deterring claims leakage in the digital age. Cognizant Insurance Practice, Tech. Rep.
Why estimating car damage with machine learning is hard. Available online: https://www.altoros.com/blog/automating-car-damage-estimation-for-insurance-with-machine-learning/. Accessed Nov. 2022.
Yuan, C., Huo, C., Tong, Z., Men, G., & Wang, Y. (2019). Research on vehicle detection algorithm of driver assistance system based on vision. In 2019 Chinese Control and Decision Conference (CCDC) (pp. 1024–1027).
Zhang, Q., Chang, X., & Bian, S. B. (2020). Vehicle-damage-detection segmentation algorithm based on improved Mask RCNN. IEEE Access, 8, 6997–7004.
Zhao, K., Ren, X., Kong, Z., & Liu, M. (2019). Object detection on remote sensing images using deep learning: An improved single shot multibox detector method. Journal of Electronic Imaging, 28, Article 033026.
