Automatic Vehicle Detection System in Different Environment Conditions Using Fast R-CNN
https://doi.org/10.1007/s11042-022-12347-8
Nitika Arora 1 · Yogesh Kumar 2 · Rashmi Karkra 1 · Munish Kumar 3
Abstract
Vehicle detection and classification is a challenging task in the field of traffic management and surveillance. With the rapid increase in the number of vehicles on roads, streets, and highways, the Intelligent Transport System (ITS) requirement has become inevitable. Vehicle detection and recognition systems have their roots embedded in ITS. It has been observed that different researchers have done much work on day-mode detection using machine and deep learning techniques. However, most of them faced difficulty in detection due to inadequate data, low illumination conditions, misclassification caused by long shadows of vehicles, and testing on static frames. Night-vision detection likewise suffers from low illumination conditions. The proposed work focuses on detecting moving vehicles in both day and night mode using a region-based deep learning technique called the fast region-based convolutional neural network (Fast R-CNN). The proposed work achieved promising results in situations such as detection in the presence of long shadows, cloudy weather, and dense traffic during day vision, and pioneers results in night-mode conditions. Four evaluation parameters
were used to test the system’s efficiency: Recall, Accuracy, Precision, and Processing time. The proposed work achieved an overall average computation time of 0.59 s. The overall average Recall, Accuracy, and Precision of vehicle detection in day and night mode were 98.44%, 94.20%, and 90%, respectively.

* Munish Kumar
[email protected]

Nitika Arora
[email protected]

Yogesh Kumar
[email protected]

Rashmi Karkra
[email protected]

1 Department of Computer Science & Engineering, Chandigarh Engineering College, Landran, Mohali, India
2 Department of Computer Engineering, Indus University, Ahmedabad, Gujarat, India
3 Department of Computational Sciences, Maharaja Ranjit Singh Punjab Technical University, Bathinda, Punjab, India

Multimedia Tools and Applications (2022) 81:18715–18735
Keywords Intelligent transport system · Deep learning · Fast R-CNN · Vehicle detection · Traffic management
1 Introduction
In recent times, road traffic has increased tenfold compared to former times. Road safety has become a primary concern for every individual, as accidents have severe repercussions on the general health and welfare of the public. According to one study, on average at least one person dies every minute in a vehicle crash or collision. Vehicle detection and recognition is therefore an onerous task in traffic management, as it demands special attention and technology for the efficient management of vehicles. Vehicle recognition and classification is the process of determining the potential location of a moving vehicle on the road. This knowledge is used to classify different types of vehicles and analyze the flow rate. The Intelligent Transport System (ITS) (Shaheen and Finson 2013 [25]) is an eminent application of vehicle detection and classification: it identifies moving vehicles on the road to analyze the flow rate and then accurately classify different objects. Lately, building automatic onboard driver assistance systems that warn drivers about possible collisions and clashes has received immense attention [20, 23]. It is predicted that hospital bills, damaged property, and other accident costs will add up to 1–3% of the world’s gross domestic product.
Many researchers have proposed different methodologies, using different input sources, to detect vehicles in day and night vision. The topic has become an active research area among automotive manufacturers, suppliers, and universities, driven by goals such as reducing injury and accident severity and enabling pre-crash sensing. Vision-based vehicle detection has gained immense popularity in the past few years, mainly for three reasons [11, 16, 29]:
1. Loss caused to human lives and property due to accidents during day and night vision
(Juric and Loncaric 2014 [10]).
2. Availability of high-tech and sustainable computer vision technologies.
3. Advancement in the development of high-speed processors.
Due to the deteriorating quality of roads and infrastructure (Tang et al. 2015 [30]), traffic congestion has also hindered efficient traffic management. It therefore becomes necessary to interpret data from different environments using inputs (videos, images, etc.) and then convert it into a format suitable for classification. At the same time, with the evolving tech-savvy generation in the automation industry, the Intelligent Transport System (ITS) [4] appears to be a feasible solution for vehicle identification and management. This service can be accomplished using static sources (sensors, cameras, etc.) (Soetedjo and Somawirata 2018 [26]) mounted on a street light or building, from which frames are extracted and flow-rate analysis can be done. The most common active sensors for vehicle detection include radar-based, laser-based, and acoustic-based systems [12]. ITS is well known for its numerous applications, chief among them vehicle counting, occlusion prevention, parking management, and detection of road accidents. Developing a
machine vision system that classifies vehicles in a complicated real-time atmosphere is computationally challenging. Researchers have attempted to improve the capability and robustness of machine vision systems. The research work carried out is presented in the following sections. The first section briefly introduces the research topic, its applications, and the related challenges. A detailed literature survey is presented in the second section, covering different automatic vehicle detection systems, their learning techniques, and applications. The third section highlights the problems and challenges of vehicle detection systems and formulates the problem definition. The fourth section outlines the novelty and contributions of the research. The fifth section presents the proposed methodology for an automatic vehicle detection system, and the results are reported in the sixth section. A comparative analysis is presented in the seventh section, and the eighth section gives the conclusion and future work.
2 Related work
Many researchers have toiled relentlessly to achieve excellent vehicle detection and classifi-
cation results in both day and night mode. However, a tremendous amount of data is being
generated each day from real-time traffic inputs. In order to handle such a massive amount of
data, new technologies are collaborated with computer vision methodologies to harness better
accuracy from the system. Machine learning and deep learning are the two areas that drive
autonomous vehicles, driverless cars, and many pattern recognition applications. The section
below discusses the profound study done on Day and Night vision vehicle detection by
different researchers using different deep learning and machine learning techniques. The
majority of the methods for vehicle detection reported in the survey adhere to two steps: 1)
Hypothesis Generation (HG), where the potential location of the vehicle is framed, and 2)
Hypothesis Verification (HV), which verifies the presence of vehicle (Tu and Du 2018 [31]).
Also, the sections below discuss the feature extraction and classification techniques employed
so far by distinct researchers in their areas of vehicle detection. Recent advances in technology have helped reduce human and manual effort. ML algorithms are first trained on a training dataset and then validated on new test data. Algorithms such as Support Vector Machine (SVM), AdaBoost, Gaussian Mixture Model (GMM), Linear Discriminant Analysis (LDA), and K-Nearest Neighbor (KNN) are the most widely implemented machine learning techniques in the domain of vehicle detection. SVM and AdaBoost in particular have shown tremendous results in detecting and classifying vehicles in both day and night mode. In [13], results were obtained for three classes of vehicles using an SVM classifier in night mode, achieving 95.82% accuracy; a combination of feature extraction techniques was used to obtain features from the dataset. Similarly, [35] implemented an SVM classifier to achieve efficient vehicle detection and classification during night vision; that study used thresholding for segmentation and a Gabor filter for feature extraction on frontal car images, improving system accuracy. [7]
suggested a method for automatic traffic data collection to improve detection and classification
accuracy. The proposed method integrated existing computer vision techniques with thermal
video sensors. Histogram of gradient (HOG) was used as a feature extraction technique, and
Support Vector Machine (SVM) was used as a classifier for classifying vehicles from a
thermal camera. The proposed system achieved improved accuracy during Night time.
Another nighttime vehicle detection technique, based on the AdaBoost classifier, was implemented in (Satzoda and Trivedi 2016 [24]). The experiment was performed under ambient lighting conditions on a massive dataset using Haar-like feature extraction, and the methodology obtained 98% accuracy. Accurate results have also been achieved using machine learning techniques during day vision. A combination of AdaBoost and SVM classifiers was implemented in [33], achieving 97.96% accuracy with Haar and HOG feature extraction techniques. Another system was suggested by [9] using AdaBoost; efficient results were obtained using LBP and Haar feature extraction techniques.
Background subtraction has proved to be an efficient technique for separating foreground from background regions; [1] and [15] used background subtraction for efficient detection of vehicles and achieved 99% and 97.3% accuracy, respectively. Techniques like the Scale-Invariant Feature Transform (SIFT), combined with different preprocessing techniques, enhance the quality of vehicle detection. In (Manzoor and Morgan 2017 [18]), SIFT with histogram equalization was implemented to segment the desired area efficiently, obtaining an accuracy of 89% on the NTOUMMR dataset.
(Razakarivony and Jurie 2016 [22]) proposed a vehicle detection system for aerial imagery. The study created a new database of aerial images containing vehicles. A combination of Histogram of Oriented Gradients (HOG), local binary pattern (LBP), and local ternary pattern (LTP) features was employed for feature extraction from the aerial images, and a support vector machine (SVM) was used as the classifier.
The Gaussian Mixture Model is an unsupervised learning technique that works well with unlabelled data and computes the probability of the potential location of vehicles. In this field, [8] proposed a novel vehicle detection and tracking approach. The study used ROI segmentation followed by blob analysis for efficient detection of the desired area. A Kalman filter was used for object tracking, and a Gaussian mixture model was used for training the data for efficient classification. The proposed methodology achieved 97.22% accuracy for detecting light traffic vehicles and 79.63% for heavyweight traffic. Deep learning techniques process data with more abstraction and give efficient results, and many researchers have proposed vehicle detection systems using deep learning in both day and night mode. Neural networks have gained much importance due to their capability of computing results like a human brain. [14] presented a nighttime vehicle detection system
inspired by the retinal mechanism of natural visual processing, modeling adaptive feedback from horizontal cells. A convolutional neural network was first used to extract features, which were then classified with an SVM, achieving a detection rate of 95.95%. The stated methodology was observed to be robust in detecting vehicles of different sizes and shapes. Morphological operations and deep learning techniques have also been combined for extracting essential and discriminative features from vehicle datasets: [3] and [21] used morphological operations followed by neural network classifiers for efficient detection and classification of vehicles during day vision. In [3], background subtraction was implemented to properly distinguish foreground from background regions; ANN was then used and compared with naïve Bayes and KNN techniques, and the study achieved an overall accuracy of 99.8%. The LIDAR and vision fusion approach was
implemented in [6] to detect cars on the road efficiently; the study implemented a deep learning framework and achieved 89.4% detection accuracy during day vision. Another novel approach for vehicle detection and identification was suggested by [2]. That study was conducted in both day and night vision; CNN was used as the classifier and achieved 98.4% accuracy in detecting moving vehicles. Similarly, [5] proposed a vehicle detection system for day vision using a CNN deep learning technique; the experiment used appearance-based feature extraction and showed improved results.
3 Research challenges
ITS-based systems are being employed in different areas of traffic planning and security for
extracting meaningful information. However, achieving the target of detecting the moving
vehicle on roads requires robust mechanisms and algorithms that possess efficient extraction
techniques. A sturdy video-based tracking system must be flexible to adapt to the dynamics in
the environment. Sadly, there still exist vulnerabilities such as handling low-illumination situations, shuddering cameras, noise, less diverse data, and inefficient detection of vehicles at night. Detecting vehicles in the daytime is also strenuous, as long shadows caused by sunlight can lead to misclassification or occlusion. It is observed that current research has used small, less diverse input data. Night-vision detection, in turn, lacks proper illumination, making it difficult for the classifier to detect efficiently. The proposed work aims to build an efficient vehicle detection and classification system that overcomes the challenges faced by other researchers in both day and night modes.
Monocular vision-based vehicle detection systems derive the ROI using parameters such as shape, motion, color, texture, and region matching. Many authors have used novel machine learning and deep learning techniques for vehicle detection and classification. Although they have achieved efficient and accurate results, there is still room for improvement, as every environmental condition requires sturdy techniques. The list below illustrates research gaps that different researchers faced in both day and night vision.
a) It was found that existing systems were not efficient in detecting vehicles in different
environmental conditions, such as low illumination. This decreases the system’s overall
productivity.
b) Major work reported in the survey employed video frames that were not adequate in size
to decide the overall performance of an algorithm. In order to get a proper hold on an
algorithm, sufficient training samples should be provided to avoid misclassification while
detecting vehicles possessing the same shape and features.
c) Most of the researchers faced ambiguity in classifying vehicles that possess the same
features as cars and vans. This created the problem of misclassification and had much
impact on computing accuracy.
d) It was observed that current studies were primarily conducted on static frames of vehicles taken from only one camera; evaluation was not performed on real-time traffic video.
e) Evaluation of detection results in the daytime has also been a strenuous task as factors like
vehicle shadows increase the misclassification rate.
f) Often the placement of the camera can create different results. Images captured from
different angles create a significant impact on the detection of vehicles both day and night.
g) Static and dynamic cameras produce different results. Static images captured from a fixed camera are relatively easy to detect and classify vehicles in, whereas dynamic cameras record footage of vehicles in motion, and most existing work is done on static images. To detect vehicles in motion, R-CNN or Fast R-CNN methodologies are mostly employed, as they are robust and give promising results.
Table 1 covers current work and the limitations faced during the research. Although high accuracy has been achieved during day vision, there is still room for improvement. The major fallbacks observed were the use of less diverse frames for detection, occlusion, misclassification, inefficient performance in low-illumination conditions, and consideration of only a single road in the frame. For example, in [1], the proposed method employed supervised learning and achieved 99.3% accuracy; however, testing was performed on statically captured images, and the method was unable to detect heavyweight vehicles. Likewise, [15] performed vehicle detection in day vision using AdaBoost and SVM as classifiers and achieved good accuracy on static camera images, but the method later proved unable to perform well on real-time video sequences. Detection of vehicles in cloudy weather is challenging, as it lacks proper illumination, and systems are sometimes unable to identify a vehicle properly; this scenario was seen in the study proposed by [21]. This calls for a new deep learning methodology for motion-based vehicle detection and classification.
Inadequate and improper training samples can have a massive influence on accuracy. This was observed in [13] at night using SVM as a classifier. Likewise, results achieved by [5] using CNN could not correctly distinguish vehicles with similar features, such as vans and cars. [3] conducted its study in both day and night vision using an Artificial Neural Network (ANN) classifier; however, the methodology could not achieve real-time accuracy in adverse lighting conditions. The table below summarizes significant fallbacks in day and night vision detection.
Ref. | Mode | Technique | Limitation
[1] | Day | Supervised learning technique | Misclassification occurred in detecting heavyweight vehicles such as cargo trucks; testing was performed on static images containing vehicles.
[3] | Both day and night | Artificial neural network | The proposed method could not achieve real-time accuracy under dynamic changes in illumination conditions.
[5] | Day | Convolutional neural network | Ambiguity in classifying vehicles exhibiting the same features, like cars and vans.
[15] | Day | AdaBoost and support vector machine | Evaluation was done on static frames, not on real-time video sequences.
[21] | Day | Supervised learning technique | Low efficiency in detecting vehicles in cloudy weather.
[13] | Night | Supervised learning technique | Images used in the dataset were obscure because of improper illumination.
4 Contribution outline
As observed in existing studies, handcrafted features are effective for vehicle detection and classification only to a certain extent, as the efficacy of such a system depends on extensive domain knowledge. DL models, on the other hand, are end-to-end and capable of learning features automatically; however, they require complex network architectures and long computation times to achieve better classification results. This study aims to combine the advantages of both handcrafted features and DL models without increasing the complexity of the DL architecture. The study incorporates handcrafted features whose efficacy has been verified by existing studies into a Fast R-CNN model that lacked the potential to recognize local features, enabling the model to determine other effective features.
Implementing a large amount of data collected under various environments (day and night) to detect and classify different vehicles is the primary aim of this study. Accordingly, the study has multiple objectives:
a) The proposed research method integrates the abstract form of CNN that is fast region-
based CNN (Fast R-CNN). Compared to state-of-the-art work, the study is unique as it is
not only designed for a single detection mode. The evaluation is done in both day and
night mode. For day mode, the study is presented under different environments such as
cloudy weather, dense traffic, and the presence of long shadows.
b) Deep learning methods are sensitive to external image noise, which strongly influences detection performance. To avoid imbalance and to amplify the features, foreground detection of vehicles in the video frame is done using a foreground detector trained by a GMM, with morphological operators applied to reduce noise. These steps are performed by sub-networks of the Fast R-CNN architecture. The study integrates a Kalman filter to achieve the vehicle-tracking objective by estimating the position vector. The last phase implements the detection and labeling of vehicles using Fast R-CNN.
c) As no state-of-the-art works provide diverse data for validation, we created a dataset of 3975 images under different conditions and annotated them for vehicle detection and classification. This gives an accurate picture for quantifying Fast R-CNN appropriately and is helpful for research purposes.
d) Furthermore, the proposed system provides good accuracy in detecting vehicles in both night and day mode, which adds novelty.
The proposed approach includes datasets prepared by downloading videos from different
YouTube channels followed by vehicle detection and then classification through fast R-CNN.
5 Research methodology
The proposed study of vehicle detection and classification uses fast Region-based CNN
technology for efficient object detection on the road. The system is based on automatic
detection and classification of moving vehicles on the road by employing a real-time video
of front-side vehicles. The study has been conducted using four online YouTube videos comprising a total of 3975 frames, containing a series of challenging scenes such as occlusion by other objects, day and night modes, and cloudy weather. The video for cloudy weather consists of 345 frames of 640 × 360 pixels. The second video was captured in night mode, which consists
of 1232 frames of 426 × 240 pixels. Two videos of a total of 2086 frames of 640 × 360 pixels
were chosen for researching day mode. Hence, the dataset provides a rich test scene. Fast R-CNN has rarely been used for vehicle detection and classification, which adds novelty to the present article. The region-based deep learning system Fast R-CNN has been used for the automatic classification of vehicles in different weather conditions. As stated in the objectives, the method aims at developing a system that is less complex and provides the desired results in less computation time. Existing methods have used less diverse data, relying on features that cannot define the overall applicability of a system with such precision. In the proposed method, the authors adopted the methodology below to obtain a high-accuracy system; with data taken in different modes, it better demonstrates the system’s quality in tackling different illumination conditions.
Figure 1 annotates the techniques used at each phase of vehicle detection. The proposed
flow of work follows a sequential process for detecting vehicles in the different modes and is
based on dynamic video testing of vehicles.
Step 3: Position feature estimation, followed by feature extraction in the dynamic video, is done by applying a Kalman filter:
i) Repeatedly computing two things, the measurement model and the state prediction, to estimate the position vector of the vehicle in the frame.
ii) Estimating noise and removing it from the tracks to increase efficiency.
Step 4: Saving the outputs for further processing. A snapshot is displayed to show the
gathered outputs.
Step 5: Training and building of model using initial features derived using fast R-CNN.
Step 6: Uploading of preprocessed outputs from steps 3 and 4 to obtain the Region of interest
using integrated Fast R-CNN architecture.
Step 7: Performing Blob analysis and classification of detected vehicles using the SoftMax
layer.
Step 8: Performance evaluation under different sets of conditions using four parameters: accuracy, processing time, precision, and recall.
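The steps above can be sketched as a control-flow skeleton. Every function below is a trivial hypothetical stand-in for the corresponding component (Kalman prediction, Fast R-CNN detection, SoftMax classification); the thresholds, data shapes, and names are invented for illustration only and do not reflect the actual implementation.

```python
# Illustrative skeleton of Steps 3-8: each stage is a toy stand-in so the
# overall control flow is runnable; none of this is the paper's real model.

def estimate_position(track):            # Step 3: Kalman-style prediction
    return (track["x"] + track["vx"], track["y"] + track["vy"])

def detect_vehicles(frame, predicted):   # Steps 5-6: detection stand-in
    # Toy rule: keep blobs whose centre lies near a predicted track position.
    return [b for b in frame["blobs"]
            if any(abs(b[0] - px) <= 5 and abs(b[1] - py) <= 5
                   for px, py in predicted)]

def classify(blob):                      # Step 7: classification stand-in
    return "heavy" if blob[2] > 1000 else "light"   # blob = (x, y, area)

def run_pipeline(frames, tracks):
    labelled = []
    for frame in frames:
        predicted = [estimate_position(t) for t in tracks]
        for blob in detect_vehicles(frame, predicted):
            labelled.append((blob, classify(blob)))  # Step 8 evaluates these
    return labelled

tracks = [{"x": 100, "y": 50, "vx": 4, "vy": 0}]
frames = [{"blobs": [(105, 50, 1800), (300, 200, 400)]}]
print(run_pipeline(frames, tracks))  # [((105, 50, 1800), 'heavy')]
```

The skeleton only shows how tracking predictions feed re-detection and how detections feed classification; the real system replaces each stand-in with the GMM foreground detector, Kalman filter, and Fast R-CNN stages described below.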
The algorithm manifests the working of the proposed system, showing how it detects vehicles from the video. The proposed system uses Fast R-CNN, which brings novelty to vehicle detection systems. Each technique has been chosen in light of the limitations of existing work and is explained in the following sections.
The vehicle detection system is designed for increasing productivity, safety, and environmen-
tal effectiveness for traffic moving on the road. In order to achieve this, the system must
process every video frame carefully without compromising any critical factor. The major
limitation of current work is that different lighting conditions have not been taken into account,
reducing the efficiency of a model. Data preparation is a significant step in order to achieve
high accuracy. The proposed system uses a dynamic video sequence as input, taken in both day and night mode, which is preprocessed using two techniques to maintain balance and amplify the detection results: foreground detection trained by a GMM, and morphological operations for noise removal, described in the sections below. To enhance the robustness of the algorithm and to mitigate the imbalance between foreground and background images, the preprocessing techniques are integrated into the model network. Our model thus differs from other deep-learning-based vehicle tracking and detection models, which have no preprocessing or tracking part. Data can be acquired from different online sources. The system conducts vehicle detection from the front angle. The foreground detector trained by the Gaussian mixture model draws an analogy between true-color (RGB) and grayscale video. The proposed method uses three Gaussian mixtures concerning vehicle, road, and
shadow for each background pixel. This is done to ascertain whether the individual pixels are
part of the foreground or background. Further, this technique computes a foreground mask for
determining the desired area. The method considers the probability of the current pixel value, which is given by the following equation:

P(X_t) = Σ_{i=1}^{N} ω_{i,t} · η(X_t; μ_{i,t}, σ_{i,t})    (1)

where ω_{i,t}, μ_{i,t}, and σ_{i,t} are the weight, mean, and standard deviation of the i-th Gaussian mixture at time t, and N is the number of mixtures. The standard deviation, weight, and mean are revised for every new pixel. The
threshold is applied to obtain the background image. In Fig. 2, clear vehicle images with
appropriate illumination and sharpness have been obtained under different illumination
conditions.
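The per-pixel decision behind Eq. (1) can be sketched as follows. This is a minimal, illustrative stand-in rather than the paper's implementation: a pixel is labelled background when it matches one of the dominant Gaussians, in the spirit of classical GMM background modelling. The function name, the 2.5-sigma matching rule, the background-weight threshold, and all parameter values are assumptions.

```python
# Toy per-pixel GMM-style foreground test: each background pixel keeps K
# weighted Gaussians (here K=3, intuitively road / shadow / vehicle).
# All thresholds and parameters below are illustrative assumptions.

def classify_pixel(value, mixtures, match_sigmas=2.5, bg_weight=0.7):
    """Return 'background' if value matches a high-weight Gaussian, else 'foreground'.

    mixtures: list of (weight, mean, std), sorted by weight descending.
    """
    cumulative = 0.0
    for weight, mean, std in mixtures:
        matched = abs(value - mean) <= match_sigmas * std
        # Only Gaussians within the leading bg_weight mass count as background.
        if matched and cumulative < bg_weight:
            return "background"
        cumulative += weight
    return "foreground"

# Road pixels cluster near 120, shadow near 60; a bright headlight pixel (250)
# matches no dominant background Gaussian and is flagged as foreground.
model = [(0.6, 120.0, 10.0), (0.3, 60.0, 8.0), (0.1, 200.0, 20.0)]
print(classify_pixel(118.0, model))  # background
print(classify_pixel(250.0, model))  # foreground
```

In the actual system this decision is made per pixel per frame, and the weights, means, and deviations are updated online as Eq. (1) describes.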
Fig. 2 Process of foreground detection in different traffic scenes: a) in the presence of long shadows, b) cloudy weather, c) night vision

Image distortions and noise can greatly affect the performance of the system. Therefore, to remove this noise and enhance image quality, a morphological operation is applied using a structuring element. It fills open holes and produces a high-quality image, as illustrated in Fig. 3. The operator is applied to every frame obtained under the different illumination conditions.
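The hole-filling operation described above can be sketched as a morphological closing (dilation followed by erosion). This pure-Python toy is illustrative only; the 3×3 structuring element is an assumption, since the paper does not state the element's size.

```python
# Toy morphological closing on a binary mask: dilation then erosion with an
# assumed 3x3 structuring element. Fills small holes inside vehicle blobs.

def dilate(img):
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Pixel becomes 1 if any in-bounds 3x3 neighbour is 1.
            out[i][j] = int(any(
                img[i + di][j + dj]
                for di in (-1, 0, 1) for dj in (-1, 0, 1)
                if 0 <= i + di < h and 0 <= j + dj < w))
    return out

def erode(img):
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Pixel stays 1 only if every in-bounds 3x3 neighbour is 1
            # (out-of-bounds neighbours are ignored, i.e. treated as 1).
            out[i][j] = int(all(
                img[i + di][j + dj]
                for di in (-1, 0, 1) for dj in (-1, 0, 1)
                if 0 <= i + di < h and 0 <= j + dj < w))
    return out

def close_holes(img):
    return erode(dilate(img))

# A vehicle blob with a one-pixel hole in the middle gets filled:
mask = [[1, 1, 1],
        [1, 0, 1],
        [1, 1, 1]]
print(close_holes(mask))  # [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
```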
Fig. 3 Illustration of clean foreground images for different modes: 1) long shadows, 2) cloudy weather, and 3) night vision

There is another factor that is highly important for detecting and classifying moving vehicles: features. Many features can be derived, such as shape, size, color, texture, and region. Different feature extraction techniques have been reported for vehicle detection and classification, such as the Kalman filter, SIFT, SURF, Haar, and many more. Traditional
methods of extracting features, especially those based on image formation, have encountered various challenges in complex environments. The study is based on the detection of moving vehicles on the road. The proposed approach integrates a Kalman filter to give a better idea of the position where re-detection should focus. This feature extraction technique is integrated into the model to determine the position feature of the moving vehicle, achieving the vehicle-tracking target using the equations represented in Eqs. (2) and (3). It helps reduce the number of false positives by improving the re-detection rate. The linear recursive motion model of the Kalman filter helps target the position of the vehicle in the next frame. It works by repeatedly computing two things, the measurement model and the state prediction, to estimate the position vector of the vehicle in the frame, as depicted in Fig. 4.
The measurement model and process are defined by the following equations:

x_{t+1} = f_t x_t + w_t    (2)

y_t = H_t x_t + v_t    (3)

where x_t and y_t are the state and measurement vectors at time t, f_t is the transition matrix at time t, H_t is the measurement matrix at time t, w_t is the process noise, and v_t is the measurement noise. In the proposed method, velocity is considered constant. The center point (x, y) and area A of the detected vehicle are the variables included in the Kalman filter. The initial state matrix is defined as:

[x, y, A, v_x, v_y, v_A]    (4)

where v_x and v_y are the velocities of movement in the x and y directions respectively, and v_A is the rate of change of the size of the vehicle. After defining the state matrix, we construct the transition matrix (f_t) and measurement model (H_t), shown in Eqs. (5) and (6) respectively, for predicting the next state:

      | 1  0  dt  0  |
f_t = | 0  1  0   dt |    (5)
      | 0  0  1   0  |
      | 0  0  0   1  |

H_t = | 0  0  1  0 |    (6)
      | 0  0  0  1 |
Estimating the center point (x, y) and area A helps decide the position and size of the vehicle to be re-detected.
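The prediction step above can be sketched as follows, assuming a 4-D state [x, y, v_x, v_y] that matches the 4×4 transition matrix printed in Eq. (5). The function names and the dt value are illustrative, and the process noise w_t is omitted for brevity.

```python
# Minimal sketch of the constant-velocity Kalman prediction x_{t+1} = f_t x_t
# from Eq. (2), using the 4x4 transition matrix of Eq. (5). Illustrative only:
# process noise is omitted and a 4-D state [x, y, vx, vy] is assumed.

def mat_vec(M, v):
    # Plain matrix-vector product.
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def predict(state, dt=1.0):
    """One prediction step: new position = old position + velocity * dt."""
    F = [[1, 0, dt, 0],
         [0, 1, 0, dt],
         [0, 0, 1, 0],
         [0, 0, 0, 1]]
    return mat_vec(F, state)

# A vehicle at (10, 20) moving 3 px/frame right and 1 px/frame down:
state = [10.0, 20.0, 3.0, 1.0]
print(predict(state))  # [13.0, 21.0, 3.0, 1.0]
```

The predicted position (13, 21) tells the detector where re-detection should focus in the next frame, which is exactly how the filter reduces false positives here.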
Many authors have widely used deep learning techniques in their reported work. CNN is the most representative deep learning model. Compared to shallow traditional models, deeper architectures provide exponentially enhanced capability. Each layer of a CNN consists of a 3D matrix known as a feature map. The network takes an input image and pools out regions of interest, to which it applies different transformations such as filtering and pooling; final responses are obtained in the last layer with an activation function. There are two families of generic object detection methods. One is based on region proposals and includes the Spatial Pyramid Pooling (SPP) net [32], R-CNN, Fast R-CNN, Mask R-CNN, FPN, and many more. The second is based on regression/classification and includes algorithms like Graph CNN, YOLO, YOLOv2 [27], and YOLOv3. The proposed method uses a region-proposal-based method called Fast R-CNN, which matches the mechanism of the human brain to a large extent. Fast R-CNN overcomes limitations of CNN and R-CNN in computation speed, prediction, storage, and performance.
The Fast R-CNN method is trained independently in MATLAB using the Deep Learning Toolbox. The architecture of Fast R-CNN is exhibited in Fig. 5. It reads a new frame from the video dataset and produces a feature map. A fixed-length feature vector is then extracted by the RoI pooling layer, which divides the video frame into multiple windows using a selective approach. These windows are treated as independent image frames and are sent to a sequence of fully connected (FC) layers that detect the presence of a vehicle and analyze the noise in the image. The output layer has two branches. One branch generates softmax probabilities for the detected vehicles; the other determines the size of each detected vehicle and generates the bounding boxes that capture and visually represent them in the evaluation phase. All parameters are optimized via a multi-task loss associated with the output layer to enhance performance. The region-proposal stage uses a greedy algorithm that recursively combines similar regions to produce one output image.
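The fixed-length feature extraction performed by the RoI pooling layer can be illustrated with a minimal max-pooling sketch over a 2-D feature map. This is a generic reconstruction of RoI pooling, not the paper's MATLAB code; the single-channel map and the 2 × 2 output grid are simplifying assumptions.

```python
import numpy as np

def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool one region of interest to a fixed out_h x out_w grid.

    feature_map: (H, W) array; roi: (r0, c0, r1, c1) half-open box.
    The RoI is split into a fixed grid so the output length is the same
    regardless of the RoI's size (the point of RoI pooling).
    Assumes the RoI is at least out_h x out_w pixels.
    """
    r0, c0, r1, c1 = roi
    region = feature_map[r0:r1, c0:c1]
    # Grid cell boundaries, spread as evenly as integer edges allow
    h_edges = np.linspace(0, region.shape[0], out_h + 1).astype(int)
    w_edges = np.linspace(0, region.shape[1], out_w + 1).astype(int)
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            cell = region[h_edges[i]:h_edges[i + 1], w_edges[j]:w_edges[j + 1]]
            out[i, j] = cell.max()
    return out
```

Flattening the pooled grid gives the fixed-length vector that feeds the FC layers, whatever the original proposal size.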
The purpose of the multi-task loss function L is to train the two sibling branches of the output layer. It consists of two parts: the classification loss ($L_{cls}$) and the bounding-box regression loss ($L_{loc}$). The loss function is defined as:

$$L(p, u, t^u, t) = L_{cls}(p, u) + \lambda [u \ge 1] L_{loc}(t^u, t) \quad (7)$$
where $L_{cls}(p, u) = -\log p_u$ calculates the log loss for the ground-truth class u, and p is the discrete probability distribution over the N + 1 outputs of the last FC layer. $L_{loc}$ calculates the smoothed L1 loss, shown in Eq. (8), between the predicted bounding-box offsets $t^u = (t^u_x, t^u_y, t^u_w, t^u_h)$ and the target offsets $t = (t_x, t_y, t_w, t_h)$, where x, y denote the box-center coordinates and w, h the width and height, respectively. The indicator $[u \ge 1]$ omits all unnecessary and background RoIs in the frame. The smoothed L1 loss discards outliers and reduces sensitivity to them, which increases the system's overall performance. It is defined as:
$$L_{loc}(t^u, t) = \sum_{i \in \{x, y, w, h\}} \mathrm{smooth}_{L1}(t^u_i - t_i) \quad (8)$$
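Equations (7) and (8) can be sketched directly. The function names are illustrative, and λ = 1 is the common default assumed here (the paper does not state its value).

```python
import numpy as np

def smooth_l1(d):
    """Eq. (8)'s per-coordinate smoothed L1: quadratic near 0, linear beyond."""
    d = np.abs(d)
    return np.where(d < 1.0, 0.5 * d * d, d - 0.5)

def multitask_loss(p, u, t_u, t_star, lambda_=1.0):
    """Eq. (7): classification log-loss plus box regression for u >= 1.

    p: class probabilities from the softmax branch; u: ground-truth class
    (0 = background); t_u, t_star: predicted and target 4-d box offsets.
    """
    l_cls = -np.log(p[u])  # L_cls(p, u) = -log p_u
    l_loc = smooth_l1(np.asarray(t_u, dtype=float)
                      - np.asarray(t_star, dtype=float)).sum()
    # The Iverson bracket [u >= 1] drops the box term for background RoIs
    return l_cls + (lambda_ * l_loc if u >= 1 else 0.0)
```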
Finally, after optimizing the multi-task loss function, the last step labels the vehicles into three classes (small, medium, and large) on the basis of the size of the bounding box formed around each vehicle, as shown in the sections below.
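The size-based labeling step can be sketched as a simple thresholding of bounding-box area. The pixel-area thresholds below are hypothetical placeholders, since the paper does not state its cut-off values.

```python
def label_by_size(w, h, small_max=4000, large_min=20000):
    """Label a detection by bounding-box area (w * h in pixels).

    small_max and large_min are illustrative thresholds, not values
    from the paper.
    """
    area = w * h
    if area < small_max:
        return "small"
    if area < large_min:
        return "medium"
    return "large"
```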
The proposed method integrates the use of Fast R-CNN, which is a robust region-based deep
learning method that creates a significant impact on system performance. There are multiple ways of acquiring input data: through various sensors, using cameras, capturing images with mobile phones, taking pre-existing YouTube videos, using prepared datasets from repositories such as UCI and Kaggle, or taking images directly from Google. For the proposed system, the authors found it appropriate to create custom datasets using videos from YouTube. The proposed system emphasizes detecting vehicles under various lighting conditions such as bright sun, cloudy weather, and night. Most of the existing reported work was done in day mode under healthy brightness conditions. Our proposed method is novel in that it analyzes detection results under night vision and cloudy weather, as less work has been proposed in
low-illumination situations. The entire research work has been conducted in MATLAB, which allows rapid implementation: MATLAB is a high-performance environment combining computation, visualization, and programming. The running platform is a Windows 10 64-bit operating system with Intel HD Graphics 520, 8 GB DDR3 memory, and an Intel Core i5-6300U processor. The study uses traffic-scene video input in day and night mode, containing 2743 frames of day video and 1232 frames of night video, respectively. The performance evaluation uses four parameters: accuracy, precision, recall, and processing time. Accuracy is defined as the number of correct predictions divided by the total number of predictions made. Precision is defined as the number of true positives divided by the number of examples predicted as belonging to
some class. Recall is defined as the number of true-positive examples divided by the total number of examples that actually belong to that class. Processing time is defined as the time taken to compute the desired results. The method is designed to detect frontal vehicles. The sections below describe the accomplishments of the proposed method in different environments.
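The four evaluation parameters defined above can be computed from raw detection counts as follows; the counts in the usage example are illustrative, not the paper's results.

```python
def detection_metrics(tp, fp, fn, tn=0):
    """Accuracy, precision, and recall from detection counts.

    tp/fp/fn/tn: true positives, false positives, false negatives,
    true negatives. tn defaults to 0, as detection tasks often have
    no well-defined true-negative count.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total        # correct predictions / all predictions
    precision = tp / (tp + fp)          # TP / predicted positives
    recall = tp / (tp + fn)             # TP / actual positives
    return accuracy, precision, recall
```

For example, 90 correctly detected vehicles, 10 false detections, and 2 misses give a precision of 0.90 and a recall of about 0.978.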
As observed in the literature, vehicle shadows, dense traffic, and low-illumination conditions can create problems when detecting vehicles in day mode. The proposed method overcomes the limitations of detecting vehicles in dense traffic and in the presence of long shadows, as shown in Fig. 6(a) and (b), respectively. The figure depicts precise detection of vehicles undisturbed by the long shadows cast by sunlight. The system has also proven better at detecting vehicles under traffic congestion.
Our model has good robustness as compared to other methods. The proposed system
also includes the detection of vehicles in cloudy weather. As compared to a typical day,
cloudy weather has less illumination, and less work has been reported in the context of detection in low-illumination conditions. However, the present study covers this area as well. Our study has achieved good results in detecting vehicles, as shown
in Fig. 7. The yellow bounding box represents the detected vehicles and the classification
labels as small, medium, and large. The study took 0.62 s to compute the results. Also,
the proposed method achieved 94% accuracy in detecting vehicles in video containing
345 frames of 640 × 360, as shown in Fig. 7. The system obtained 98% recall and 90%
precision in the detection of vehicles.
Fast R-CNN is known for computing fast and accurate results. Processing time is the time the system takes to compute results, and it should be low for a system to produce the desired results promptly. A line-graph analysis is presented of the processing time taken by Fast R-CNN in computing the desired results during daytime; the graph covers all three cases discussed in day mode. It shows that our model precisely detects multiple target vehicles in less processing time (Fig. 8).
Fig. 6 Labeled detection results: (a) in the presence of traffic congestion, (b) detection of vehicles in the presence of long shadows.
18730 Multimedia Tools and Applications (2022) 81:18715–18735
The table below presents the overall performance achieved in Day Mode detection of vehicles
using four metrics stated above. The authors have computed the evaluation metrics for each
case in day mode, and the results are shown in Table 2.
Table 2 shows the performance evaluation of Fast R-CNN classification in day mode; the data in the table are the results of the performance metrics used. Very few systems have considered factors such as low illumination, traffic congestion, and vehicle shadows, which clearly shows that the proposed Fast R-CNN approach can also detect vehicles in adverse situations. For a system to be efficient and robust, processing time should be low while accuracy, recall, and precision should be high. The bar-graph representation of the day-mode performance metrics in Fig. 9 indicates that processing time is low for each case in the proposed system, and all other metrics are high.

Table 2 Performance analysis of different evaluation metrics used in day-vision detection
Night-vision vehicle detection is still a struggle, as reported in the current work. The proposed methodology is novel in that it is also designed and trained to detect moving vehicles in night mode. Low illumination causes several challenges in vehicle localization, such as misclassification and increased computation time. The present method uses a night video containing 1232 frames of 426 × 240 to evaluate the performance. A visual representation is shown in Fig. 10.
Figure 10 shows the detection of vehicles in extremely dark conditions, which validates the
applicability of the proposed method. The study achieved 94.35% accuracy in the detection of
vehicles. Also, recall, precision, and computation time measured were 99.24%, 90%, and
0.39 s, respectively.
6.3 Overall average performance analysis of proposed system in day and night mode
This section draws an overall analogy between the performance metrics in both day and night
mode. Results of Day mode performance metrics have been averaged for all three cases.
Table 3 shows the percentage of performance metrics used in the proposed methodology.
For most problems, systems have been designed chiefly for a single mode, mainly day mode; it is comparatively easy to conduct research in good illumination conditions. Exceptionally few systems have been designed for multiple modes (day and night). The proposed research integrates the methodology for both day and night vision. A general-purpose system has been constructed using the region-based Fast R-CNN model, which is rarely used in the vehicle detection area. The data in all the above tables indicate that the performance of the proposed system is not only better for day mode but also covers low-illumination conditions such as night and cloudy weather. The bar graph of the overall scenario clearly shows that, for every mode of vehicle detection, the applied Fast R-CNN classifier yields the best results for the proposed mechanism. The overall average accuracy achieved is 94.06% in day mode and 94.35% in night mode; the average accuracy of the overall model is 94.20%. Therefore, the objectives of achieving efficient detection in different illumination conditions, in dense traffic, and in the presence of long shadows have been accomplished (Fig. 11).
Table 4 compares the results achieved by the proposed vehicle detection system with other state-of-the-art detection methods such as CNN, Single-Shot multi-box Detector (SSD), neural networks, Faster R-CNN, YOLOv3, SPP net, and R-CNN. The present research study comprises both day- and night-mode results for vehicle detection and classification in different cases, which adds a certain novelty to the study of vehicle detection systems.
Table 4 shows the comparative analysis of results achieved by different state-of-the-art deep learning methods against the proposed Fast R-CNN approach. It can be seen that our model achieves good results compared with the others; the proposed model improves the detection results by overcoming the different challenges faced by other researchers. In [34], promising results were achieved using another region-based approach, Faster R-CNN; however, those results were obtained using a single mode of detection. Similarly, in [32] a combination of YOLOv3 and SPP net was integrated, but the results were obtained in day mode only, which reduces the generalization of the model.

Table 3 Overall comparative analysis of performance metrics in Day and Night mode
Mode of Detection | Processing Time (sec) | Accuracy (%) | Recall (%) | Precision (%)

The single-shot multi-detector
technique was adopted by [36] in both day and night mode, but the detection accuracy was
reduced due to the absence of a batch normalization layer. Similarly, in [19] the author used the Faster R-CNN approach, but that model took longer to produce results than the proposed approach. Other limitations observed in prior work, such as less diverse datasets and misclassification in low illumination, are also improved upon by the proposed method. This comparison shows that the proposed approach can detect multiple vehicles even in challenging traffic scenes.
Automatic vehicle detection is a critical application of the Intelligent Transport System (ITS). With increasing traffic, the deployment of ITS technologies on roads and highways has become enormously necessary, and such systems are of immense significance to planning and security bodies. The study's main objective was to survey the existing work in different modes and to improve the results by overcoming its demerits. The study used Fast R-CNN, a region-based deep learning technique known for detecting moving objects with low computation time. The proposed work is implemented in MATLAB using the Deep Learning Toolbox on videos taken under different traffic scenes such as cloudy weather, bright sun, and night mode. The methodology works well with complex scenes such as dense traffic, long shadows, and low-illumination conditions including cloudy weather and night-mode detection. To capture rich and discriminative information, the system uses foreground detection with morphological operators. Since the proposed work detects moving vehicles, the position of each vehicle is estimated using the Kalman filter. The system also labels vehicles according to the size of the bounding box formed. The performance of the automatic vehicle detection system is evaluated in terms of accuracy, precision, recall, and processing time. The future may bring more challenging and diverse publicly available datasets with different types of vehicles per image; future work will therefore explore approaches that can handle real-time detection of every type of vehicle in challenging environmental conditions, thereby improving the performance metrics of the model.
Declarations
Informed consent All the authors agreed to this submission.
Conflict of interest The authors declare that they have no conflict of interest in this work.
References
1. Aqel S, Hmimid A, Sabri MA, Aarab A (2017) Road traffic: Vehicle detection and classification. Intell Syst
Comput Vision (ISCV)
2. Billones RK, Bandala AA, Lim LA, Culaba AB, Vicerra RR, Sybingco E, Dadios EP (2018) Vehicle-
Pedestrian Classification with Road Context Recognition Using Convolutional Neural Networks. IEEE 10th
Int Conf Humanoid, Nanotechnol, Inform Technol, Commun Control, Environ Manag (HNICEM)
3. Charouh Z, Ghogho M, Guennoun Z (2019) Improved Background Subtraction-based Moving Vehicle
Detection by Optimizing Morphological Operations using Machine Learning. IEEE Int Symp Innov Intell
Syst Appl (INISTA)
4. Chen Z, Ellis T, Velastin SA (2011) Vehicle type categorization: A comparison of classification schemes.
14th Int IEEE Conf Intell Trans Syst (ITSC)
5. Chen L, Ye F, Ruan Y, Fan H, Chen Q (2018) An algorithm for highway vehicle detection based on
convolutional neural network. EURASIP J Image Video Process
6. Du X, Ang MH, Rus D (2017) Car detection for autonomous vehicle: LIDAR and vision fusion approach
through deep learning framework. IEEE/RSJ Int Conf Intelligent Robots and Systems (IROS)
7. Fu T, Stipancic J, Zangenehpour S, Miranda-Moreno L, Saunier N (2017) Automatic Traffic Data
Collection under Varying Lighting and Temperature Conditions in Multimodal Environments: Thermal
versus Visible Spectrum Video-Based Systems. J Advan Trans:1–15
8. Indrabayu BRY, Areni IS, Prayogi AA (2016) Vehicle detection and tracking using Gaussian Mixture
Model and Kalman Filter. Int Conf Comput Intell Cybern
9. Jabri S, Saidallah M, Alaoui AEBE, Fergougui AE (2018) Moving Vehicle Detection Using Haar-like, LBP
and a Machine Learning Adaboost Algorithm. IEEE Int Conf Image Process, Appl Syst (IPAS)
10. Juric D, Loncaric S (2014) A method for on-road night-time vehicle headlight detection and tracking. Int
Conf Connec Vehicles Expo (ICCVE)
11. Jin L, Chen M, Jiang Y, Xia H (2018) Multi-Traffic Scene Perception Based on Supervised Learning. IEEE
Access 6:4287–4296
12. Kim MS, Liu Z, Kang DJ (2016) On road vehicle detection by learning hard samples and filtering false
alarms from shadow features. J Mechanical Sci Technol 30(6):2783–2791
13. Kuang H, Chen L, Chan LLH, Cheung RCC, Yan H (2019) Feature Selection Based on Tensor
Decomposition and Object Proposal for Night-Time Multiclass Vehicle Detection. IEEE Trans Syst,
Man, Cybern: Syst 49(1):71–80
14. Kuang H, Zhang X, Li Y-J, Chan LLH, Yan H (2016) Nighttime Vehicle Detection Based on Bio-Inspired
Image Enhancement and Weighted Score-Level Feature Fusion. IEEE Trans Intell Trans Syst 18(4):927–
936
15. Kul S, Eken S, Sayar A (2017) Distributed and collaborative real-time vehicle detection and classification
over the video streams. Int J Advanc Robotic Syst 14(4):172,988,141,772,078
16. Kumar Y, Kaur K, Singh G (2020) Machine Learning Aspects and its Applications Towards Different
Research Areas. 2020 International Conference on Computation, Automation and Knowledge Management
(ICCAKM), 150–156.
17. Manana M, Tu C, Owolawi PA (2018) Preprocessed faster R-CNN for vehicle detection. 2018 Int Conf
Intell Innov Comp Appl (ICONIC)
18. Manzoor MA, Morgan Y (2017) Vehicle Make and Model classification system using bag of SIFT features.
IEEE 7th Ann Comput Comm Workshop Conf (CCWC)
19. Nguyen H (2019) Improving faster r-cnn framework for fast vehicle detection. Math Probl Eng 2019:1–11
20. Oliveira M, Santos V, Sappa AD (2015) Multimodal inverse perspective mapping. Inform Fusion 24:108–
121
21. Pawar B, Humbe VT, Kundnani L (2017) Morphology based moving vehicle detection. Int Conf Big Data
Anal Comput Intell (ICBDAC)
22. Razakarivony S, Jurie F (2016) Vehicle detection in aerial imagery: A small target detection benchmark. J
Visual Comm Image Repres 34:187–203
23. Sakhare KV, Tewari T, Vyas V (2019) Review of Vehicle Detection Systems in Advanced Driver Assistant
Systems. Arch Comp Methods Eng 27(2):591–610
24. Satzoda RK, Trivedi MM (2016) Looking at Vehicles in the Night: Detection and Dynamics of Rear Lights.
IEEE Trans Intell Trans Syst 20(12):4297–4307
25. Shaheen SA, Finson R (2013) Intelligent Transportation Systems. Reference Module in Earth Systems and
Environmental Sciences.
26. Soetedjo A, Somawirata IK (2018) Improving On-Road Vehicle Detection Performance by Combining
Detection and Tracking Techniques. 3rd Asia-Pacific Conf Intell Robot Syst (ACIRS)
27. Song H, Liang H, Li H, Dai Z, Yun X (2019) Vision-based vehicle detection and counting system using
deep learning in highway scenes. Eur Trans Res Rev 11(1)
28. Suhao L, Jinzhao L, Guoquan L, Tong B, Huiqian W, Yu P (2018) Vehicle type detection based on deep
learning in traffic scene. Procedia Comput Sci 131:564–572
29. Sun Z, Bebis G, Miller R (2006) On-road vehicle detection: A review. IEEE Trans Patt Anal Mach Intell
28(5):694–711
30. Tang Y, Zhang C, Gu R, Li P, Yang B (2015) Vehicle detection and recognition for intelligent traffic
surveillance system. Multimedia Tools Appl 76(4):5817–5832
31. Tu C, Du S (2018) A Hough Space Feature for Vehicle Detection. Advan Visual Comput Lecture Notes
Comput Sci:147–156
32. Wang X, Wang S, Cao J, Wang Y (2020) Data-driven based tiny-yolov3 method for front vehicle detection inducing spp-net. IEEE Access 8:110227–110236
33. Wei Y, Tian Q, Guo J, Huang W, Cao J (2019) Multi-vehicle detection algorithm through combining Harr
and HOG features. Math Comp Simul 155:130–145
34. Yang B, Zhang Y, Cao J, Zou L (2018) On road vehicle detection using an improved faster R-CNN
framework with small-size region up-scaling strategy. Image Video Technol:241–253
35. Zhang R-H, You F, Chen F, He W-Q (2018) Vehicle Detection Method for Intelligent Vehicle at Night Time Based on Video and Laser Information. Int J Patt Recog Artif Intell 32(04):1850009
36. Zhang F, Li C, Yang F (2019) Vehicle detection in urban traffic surveillance images based on convolutional
neural networks with Feature concatenation. Sensors 19(3):594
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps
and institutional affiliations.