Deep Learning-Based Object Detection and Classification for Autonomous Vehicles in Different Weather Conditions
ABSTRACT The rapid development of self-driving vehicles requires the integration of sophisticated sensing systems to efficiently address the various obstacles posed by road traffic. While several datasets are available to support object detection in autonomous vehicles, it is crucial to carefully evaluate the suitability of these datasets for the different weather conditions encountered across the globe. In response to this requirement, we present a novel dataset named the Canadian Vehicle Dataset (CVD), together with deep learning models trained on it. The CVD comprises street-level videos recorded by Thales Canada using high-quality cameras mounted on a vehicle in the Canadian province of Quebec. The recordings were made during daytime and nighttime, capturing hazy, snowy, rainy, gloomy, and sunny conditions. A total of 10000 images of vehicles and other road assets were extracted from the collected videos, of which 8388 images were annotated with 27766 generated labels across 11 different classes. We analyzed the performance of the YOLOv8 model trained on the existing RoboFlow dataset and compared it with the model trained on the RoboFlow dataset expanded with the proposed weather-specific CVD. Improved final accuracy values of 73.26%, 72.84%, and 73.47% (precision/recall/mAP) were reported upon adding the proposed dataset. The model trained on this diverse dataset exhibits heightened robustness and proves highly beneficial for both autonomous and conventional vehicle operations, making it applicable not only in Canada but also in other countries with comparable weather conditions.
INDEX TERMS Autonomous vehicles, convolutional neural networks, intelligent transportation, object
detector, surveillance, YOLOv8.
I. INTRODUCTION
The implementation of recent artificial intelligence (AI) applications, such as self-driving vehicles, intelligent surveillance systems, and advanced urban infrastructures, can potentially contribute to the development of sustainable smart cities and communities [1]. The utilization of highly accurate real-time road object identification algorithms can significantly enhance automated driving systems' capabilities in effectively managing traffic flow and improving overall safety [2]. In order to effectively perceive and comprehend their surroundings, autonomous vehicles rely on a combination of essential sensory components. These include cameras, which capture visual information, the Global Navigation Satellite System (GNSS) for precise positioning and navigation, as well as range sensors such as radar or LiDAR. These range sensors enable the vehicle to measure distances and accurately detect objects in its vicinity. By integrating these crucial technologies, autonomous cars are able to interpret their environment with a high degree of accuracy and make informed decisions accordingly. The utilization of this system necessitates the implementation of sophisticated perception, fusion, and planning algorithms [3].
To fully comprehend images, we must classify them and estimate the concepts and placements of the objects they contain, a task also referred to as object detection [4]. Smart cities require object detection in conventional traffic and autonomous vehicle environments [3]. It can extract accurate traffic data for image analysis and traffic flow control. This information includes vehicle counts, trajectories, tracking locations, flow, classification, traffic density, velocity, lane changes, and license plate identification [5].
Multiple object detectors can also detect pedestrians, diverse vehicle types, individuals, designated lanes, traffic signals, earthworks, drainage systems, safety barriers, signage, and lanes, as well as grasslands, shrubs, and trees [6]. Real-time object recognition and categorization from image/video data lays the groundwork for several analytical characteristics, such as population or traffic volume over time [7].
Autonomous driving (AD) heavily relies on deep learning (DL). Deep neural networks outperform standard machine learning (ML) approaches in smart autonomous or self-driving automobiles, smart tracking, and smart city-based infrastructure [5].
Deep learning, a subfield of machine learning inspired by the structure and function of the human brain, has emerged as a powerful technique for addressing complex problems that are challenging to model using traditional statistical approaches [8]. Deep neural networks, such as the Convolutional Neural Network (CNN), have been widely employed in computer vision to recognize and categorize various components within images [9]. Algorithms can identify and classify objects such as street signs, automobiles, people, and other items.
One of the notable advantages of CNN is its ability to autonomously identify significant features without the need for human intervention following the training process. Numerous CNN architectures that exhibit a remarkable balance between high accuracy and efficient processing have been developed [10]. The You Only Look Once (YOLO) model, as described in [11], was developed with the primary objective of enhancing the efficiency of visual object classification and location computations.
The convolutional network employed in this study exhibits the ability to perceive and identify visual elements directly. The proposed approach involves the utilization of multiple feature maps with varying resolutions to account for objects of different sizes. This is achieved by aggregating predictions from these feature maps, enabling a more comprehensive analysis. The details of this methodology can be found in reference [12]. The accuracy and speed of YOLO have been significantly enhanced with the introduction of advanced algorithms such as YOLOv3, YOLOv5, and YOLOv8. YOLO serves the purpose of object identification, classification, and localization within images and videos [13].
Problems arise in object detection in autonomous environments: conditions such as changes in hue and excessive rain or snow might affect object detection in autonomous or typical situations [14]. Both driverless automobiles and human drivers encounter difficulties when it comes to accurately predicting traffic conditions, especially when there are dynamic weather conditions like snowstorms, fog, rain, and sunny weather [15]. Accurately identifying objects, especially in road environments, is a challenging process that often leads to incorrect determinations. Inaccuracies can have significant consequences, especially when it comes to identifying vehicles and other objects on the road. The decision-making process involves using prediction-based models that have been learned previously [16].
In all these cases, drivers or autonomous cars need pre-alerts to change lanes, save time, and avoid risks. Other object detection systems can forecast traffic and send drivers or autonomous cars signals or warnings [17].
In [18], the authors enhanced the YOLOv5 deep learning neural network architecture to create an improved object detector for drones and self-driving cars. By merging three datasets (HDrone, VisDrone, and KITTI), they outperformed previous approaches in detecting objects of varied sizes and achieved state-of-the-art results. Reference [19] developed a YOLOX-based network model for multi-scale item identification in complex situations. They used a CBAM-G module in the network backbone to enhance semantic information with an object-contextual feature fusion module. The model outperformed alternatives in detection and had a 2.46% mAP improvement over the original model on the KITTI dataset.
The issue of foggy weather in autonomous driving is addressed by a novel domain adaptive object identification approach, as discussed in [20]. The study's authors employed image- and object-level adaptation techniques and a unique adversarial gradient reversal layer to identify and extract challenging samples effectively. The results obtained in this study demonstrated the effectiveness and accuracy of the employed methodology.
Researchers examined the intrinsic fault tolerance of camera-based object detection (CBOD) methods [21] through various approximations. Despite the utilization of lower precision arithmetic and the occasional occurrence of errors, the level of accuracy achieved was found to be within a margin of 1% when compared to the established baseline. Additional dimensions of error tolerance encompass the utilization of LiDAR and radar-based sensors, which have the potential to mitigate the intricacy of hardware systems.
The comprehensive investigation of the dataset revealed several components associated with the research criteria. The dataset did not include all weather conditions. To address this issue, we integrated various images from RoboFlow's open-source annotations and custom-generated Canadian vehicle-based annotations. The compilation of Canadian weather images included a variety of scenes, ranging from bright and sunny days to dreary and foggy conditions, as well as rainy and snowy landscapes. The collection also featured both daytime and nocturnal shots. This study used the YOLO technique to focus on 2D object recognition using camera sensor data. YOLOv8 [22] is an updated version of the YOLO approach for object detection in autonomous driving. This approach has been further developed and expanded upon by several researchers.
RoboFlow and other datasets for training models only cover generic traffic and road conditions, not changing weather. Addressing multiple weather concerns requires training a model for different weather circumstances [23]. In industrialized countries like Canada, harsh winters and shifting weather (snowstorms and rain), as well as summer and winter precipitation, are common and unexpected. The new model is trained to recognize and categorize numerous item classes accurately under these challenging object identification circumstances in bad weather.
Error-free performance requires high-quality, diverse data from real-world everyday settings. Autonomous driving (AD) data focused on temporal thinking and 360° vision may ignore variety and long-term capacities [24]. To address this issue, we propose a Canadian Vehicle Dataset (CVD) for AD. It is a vast, diversified multimodal picture collection from Quebec, Canada, collected over one year under different weather conditions. CVD applies to traffic sign identification, semantic and instance segmentation, and road categorization.
This study used the deep learning-based YOLOv8 algorithm to identify and detect automobiles in vigilance camera recordings under snowy, sunny, rainy, foggy, and nocturnal conditions. Our model is weather- and location-specific. This study lays the basis for a globally uniform prediction-based trained model for road item identification and categorization. It is extremely useful in typical and autonomous situations. The primary contribution of the proposed study is based on the model performance analysis assessment results:
• A heterogeneous dataset of 10000 images extracted from videos captured from a vehicle-mounted camera in Quebec, Canada, is proposed.
• This study analyzes the applicability of the Canadian approach for identifying and categorizing road objects.
• Transfer learning is used to train the model on two vehicle datasets to improve object recognition accuracy.
• A comparison of model performance on the existing and mixed datasets (the proposed weather-specific dataset and the existing dataset) is presented.
The present study is structured in the following manner: the dataset and technique are presented in Section II. Section III shows the pre-trained algorithm's performance, followed by transfer learning detection findings. Section IV presents quantitative indicators, statistical results, and visualization graphs to evaluate the algorithm's performance. The study closes with suggestions in Section V.
II. DATASET AND METHODOLOGY
The present section provides a comprehensive overview of the datasets utilized in the study, as well as an in-depth discussion of the model training procedure. The findings derived from the evaluation of the model are systematically presented and organized into distinct subsections. The initial focus of this discussion pertains to the performance of pre-trained algorithms. Next, we outline the procedures involved in annotating data and training the model.
The testing and validation process utilizing simulated datasets has been successfully concluded, and the algorithm's performance has been thoroughly assessed through the application of diverse quantitative metrics. The initial segment of this section provides an overview of the methodology employed in this study, as well as the pre-existing vehicle dataset. Subsequently, a detailed description of the proposed dataset utilized in this research is presented.
The study was conducted in a systematic manner, ensuring a logical progression from the beginning to the end:
1. The existing RoboFlow dataset is utilized, and a baseline model is trained. The model performs well on the RoboFlow dataset; however, when tested for varying weather conditions in Canada using a subset of the proposed CVD dataset, the performance of the RoboFlow model degrades significantly. This results in the need for a new model to suit the requirements of autonomous vehicles in varying weather conditions in Canada or other countries.
2. To address the aforementioned need, we propose using the CVD dataset along with the existing RoboFlow dataset and training a new model with improved robustness.
In this work, the model is first trained using the RoboFlow dataset utilizing the weights of YOLOv8 pre-trained on the MSCOCO dataset. Next, transfer learning is applied to train YOLOv8 using a combination of the RoboFlow and CVD datasets. The additional training enhances the vehicle detection system's accuracy. We have chosen the YOLOv8 model as it is a lightweight model that effectively reduces around 40% of the parameters and 50% of the computation compared to previously existing real-time object detection models, achieving improved detection accuracy and increased inference speed [4].
First, we extracted images from the street-level recordings captured by RGB cameras installed by Thales Canada on the vehicle's windshield in Quebec. The Canadian Vehicle Dataset (CVD) comprises ten thousand images. We labeled 8388 images for 11 distinct classes and then combined them with the publicly available RoboFlow dataset. The study then entails training and evaluating a Deep Convolutional Neural Network (DCNN) model for detecting and classifying objects under different weather conditions.
In this study, we investigate the viability of YOLO-based approaches by recognizing and classifying vehicles and other road assets in real-time images using deep learning. The YOLO family of algorithms is a first-order object identification technique that integrates the localization of numerous objects using anchor boxes. The YOLO family of algorithms has had eight iterations released so far.
We are motivated to choose the latest YOLO version (YOLOv8) as the detection model due to its smaller architecture, high confidence scores in its detection targets, and much faster detection abilities than the older families of this model. These capabilities make the YOLOv8 algorithm a better choice when compared to previous vehicle detection algorithms. We propose highly accurate vehicle detection in real time with model parameter optimization.
We have trained our models on the RoboFlow and mixed (RoboFlow + CVD) datasets in bad weather conditions, which are further adjusted to be used in congested traffic conditions. We compared the efficiency of our trained versions against the existing publicly available RoboFlow dataset.
This study focuses on detecting and classifying vehicles under diverse traffic and adverse weather conditions, including rain, sunlight, haze, nighttime, and snowfall. We gathered an extensive CVD in difficult weather conditions to improve image accuracy from our local traffic patterns and employed transfer learning on YOLOv8-based trained models.
The knowledge that is already present in our local datasets can be used in a transfer learning strategy [25]. A real-time image is the system's input, and its output is a bounding box for every object in the image, coupled with the class of each object in the box.
Rapid and accurate vehicle recognition and categorization are needed for ITS-based applications. Small distances between vehicles on the road and interference from image frames holding vehicle images make it difficult to identify various vehicles abruptly and precisely. As a result, our proposed technique offers a useful perspective on locating automobiles in congested settings.
A. VEHICLE DATASET
1) ROBOFLOW VEHICLE DATASET
RoboFlow introduced the self-driving vehicle dataset, which several researchers have used to generate novel techniques for road asset detection. We used this dataset in our study as it is open-sourced and widely accessible. The RoboFlow Self-Driving Car Dataset has image dimensions of 512 × 512 × 3, with 97,942 annotations and 11 classes: car (64,399 labels), pedestrian (10,806), biker (1,864), trafficLight-Red (6,870), trafficLight-Yellow (272), trafficLight-Green (5,465), trafficLight-RedLeft (1,751), trafficLight-YellowLeft (14), trafficLight-GreenLeft (310), truck (3,623), and trafficLight (2,568). Preprocessing techniques such as auto-orientation (discarding EXIF rotations and standardizing pixel ordering) and adaptive equalization were applied to the data. The RoboFlow dataset did not undergo any data augmentation.
2) THE PROPOSED CANADIAN VEHICLE DATASET (CVD)
Datasets such as RoboFlow that are accessible to the public show less diversity in lighting or weather conditions, driving scenarios, and geographical coverage. Additionally, these datasets have limited annotations in terms of both tasks and range. These issues can lead to overly specialized solutions, which may not generalize to the full operational design domain of real-world AD systems. Our prepared dataset, referred to as the Canadian Vehicle Dataset (CVD), consists of road images from the province of Quebec in Canada.
The dataset includes 10000 images covering 11 different classes; therefore, the proposed CVD is robust and heterogeneous. CVD contains images extracted from street-level videos from Thales Inc. Canada. The surveillance videos were recorded during the day and night, capturing various weather scenarios, including snow, rain, fog, and gloomy and sunny days. The videos were captured using high-quality cameras mounted on a car in the province of Quebec, Canada.
To ensure consistency across all datasets, all images were manually annotated using labeling software and resized to 512 × 512. Here are the key points of difference:
• The RoboFlow vehicle dataset contains images of normal traffic and road conditions that are neither environment-specific nor weather-specific. Therefore, RoboFlow is a more generalized case.
• The new images collected from Quebec, Canada streets are more heterogeneous, as this dataset was collected in changing environments and weather situations such as snow, fog, rain, sun, and nighttime. A model trained on such datasets is more robust, highly useful in autonomous vehicle driving, and can be applied to countries with similar weather conditions.
Table 1 compares the datasets used in this study (the RoboFlow and proposed Canadian Vehicle datasets) based on different criteria.
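To make the combined setup concrete, the snippet below writes a minimal YOLO-style dataset configuration covering these 11 classes. It is only an illustrative sketch: the file name, directory layout, and class ordering are assumptions for demonstration, not artifacts reported by the study.

from pathlib import Path

# The 11 classes shared by the RoboFlow and CVD datasets, as listed above.
CLASSES = [
    "biker", "car", "pedestrian", "trafficLight", "trafficLight-Green",
    "trafficLight-GreenLeft", "trafficLight-Red", "trafficLight-RedLeft",
    "trafficLight-Yellow", "trafficLight-YellowLeft", "truck",
]

def write_dataset_config(root: str, out: str = "cvd_roboflow.yaml") -> None:
    """Write a minimal YOLO dataset config for the merged image pool."""
    lines = [
        f"path: {root}",        # dataset root (train/val folders beneath it)
        "train: images/train",  # 90% of the merged pool (see Section III)
        "val: images/val",      # remaining 10%
        f"nc: {len(CLASSES)}",
        "names:",
    ]
    lines += [f"  {i}: {name}" for i, name in enumerate(CLASSES)]
    Path(out).write_text("\n".join(lines) + "\n")

write_dataset_config("/data/cvd_plus_roboflow")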
TABLE 1. Comparison of the RoboFlow and proposed Canadian Vehicle datasets based on different criteria.
B. ACQUISITION SYSTEM
Figure 1 shows the framework of the proposed system. In this framework, a vehicle was initially equipped with cameras and sensors to gather real-time data for an AI-driven autonomous vehicle in a diverse Canadian scenario, as depicted in Figure 1. The collected data was transmitted to the cloud or an on-premises data center using wireless or cellular communication technology. This step involved collecting raw data from the various sensors installed on the vehicle. This data was then preprocessed to make it easily accessible and usable for the stakeholders. After completing the data preparation task, a web-based repository was created and implemented on Laval University's cloud-based server. This repository serves as a platform for authorized users to access the dataset, and it will be regularly updated and maintained as required.
Furthermore, in this study, we exclusively used videos captured by RGB cameras for object detection. The main goal was to use a deep learning model on RGB videos for object detection while driving a car. Additionally, we wanted to evaluate the effectiveness of the trained model by preparing training datasets for various weather conditions.
C. DATA PREPARATION AND PREPROCESSING
Since object detection in varying weather conditions is the main challenge of our project, highly accurate and specific data collection and preparation is a very challenging and time-consuming task. Therefore, the most crucial step is data collection and preparation. In this work, we used video data (captured from a camera placed on a car driven on the street) collected from Canadian streets in different weather conditions (prepared explicitly for our study). The car traveled at an average speed of approximately 40 km/h.
After data collection, we extracted images from the videos using Python scripts at a frame rate between 2 fps (minimum) and 10 fps (maximum). In data preprocessing, all photos were resized to 512 × 512 pixels to maintain uniformity and stored in JPG format. No other preprocessing steps were applied manually.
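As a concrete illustration of this step, the following is a minimal sketch of such an extraction script using OpenCV. The file names, output layout, and sampling helper are hypothetical, not the study's actual code; the 512 × 512 resize and JPG output follow the preprocessing described above.

import cv2
from pathlib import Path

def extract_frames(video_path: str, out_dir: str, target_fps: float = 2.0) -> int:
    """Sample a video at roughly target_fps, resize frames, save as JPEG."""
    cap = cv2.VideoCapture(video_path)
    native_fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if metadata is missing
    step = max(1, round(native_fps / target_fps))   # keep every `step`-th frame
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    saved = idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frame = cv2.resize(frame, (512, 512))   # uniform size used in the paper
            cv2.imwrite(f"{out_dir}/frame_{saved:06d}.jpg", frame)
            saved += 1
        idx += 1
    cap.release()
    return saved

# e.g. extract_frames("quebec_snow.mp4", "frames/snow", target_fps=2.0)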
1) DATA ANNOTATION
Data annotation plays a key role in ensuring the accurate functioning of numerous machine learning models. The aforementioned study outlines the foundational steps necessary for instructing a deep neural network to accurately identify and differentiate objects among a diverse range of input images [25].
The process of annotating objects in images is labor-intensive in nature, and it involves significant time commitment since it necessitates the initial manual evaluation of the entire dataset on a screen.
FIGURE 1. A vehicle, equipped with cameras and sensors to collect real-time data for AI-driven AV in a Canadian heterogeneous scenario.
Subsequently, all identified classes were annotated by enclosing them within bounding boxes and assigning the appropriate label. Additionally, an image should be annotated only once for a particular scenario; on receiving more than one image for the same scenario, the image should be skipped for annotation, since keeping repetitive images may cause overfitting of the model.
A total of 10000 images were extracted from the video captured in conditions such as sunny, light rain, snow, overcast, and fog. Of these, 8388 images were chosen for inclusion in this study. We used labeling software for image annotation; the 8388 annotated images, with their corresponding 27766 generated labels across 11 different classes, are presented in Table 2. The XML format was used to hold the class labels and bounding box coordinates, which were represented by four decimal numbers (xmin, ymin, xmax, ymax), identical to the PASCAL VOC format. Subsequently, the data was converted into the TFRecord file format, in accordance with the specifications of the TensorFlow Object Detection API. Example images for labeling different types of objects in various weather scenarios are shown in Figure 2.
As we can see from the statistics (Table 2), the CVD data could be more balanced. Still, it is sufficient for our training and experimentation due to the high number of instances in each class except the biker, trafficLight-Yellow, and trafficLight-YellowLeft classes.
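Because the labels were stored as PASCAL VOC-style XML while YOLO-family trainers typically consume normalized (class, x-center, y-center, width, height) text labels, a conversion step between box formats is implied. The sketch below shows one plausible version of that parsing step; it is illustrative only (the study reports exporting to TFRecord via the TensorFlow Object Detection API), and the helper name and 512 × 512 defaults are assumptions following the preprocessing above.

import xml.etree.ElementTree as ET

def voc_to_yolo_lines(xml_path, class_names, img_w=512, img_h=512):
    """Return 'class_id x_center y_center width height' lines (all in [0, 1])."""
    root = ET.parse(xml_path).getroot()
    lines = []
    for obj in root.iter("object"):
        cls_id = class_names.index(obj.findtext("name"))
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (float(box.findtext(t))
                                  for t in ("xmin", "ymin", "xmax", "ymax"))
        xc = (xmin + xmax) / 2.0 / img_w   # normalized box center
        yc = (ymin + ymax) / 2.0 / img_h
        w = (xmax - xmin) / img_w          # normalized box size
        h = (ymax - ymin) / img_h
        lines.append(f"{cls_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")
    return lines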
TABLE 2. Selected class types and instances per class of the total Canadian Vehicle Dataset (CVD).
We expected that combining CVD with other existing datasets (RoboFlow in our case) could help in representing these classes better while training the requisite neural network model. Furthermore, it has the potential to generate a balanced representation of classes and to train highly effective models.
III. EXPERIMENTAL SETUP
The presented study includes the training and evaluation of the YOLOv8 network model, considering diverse weather conditions by using distinct combinations of test and train datasets. This section presents the development of an optimized deep convolutional neural network using RGB image data for the automatic classification of the selected class types (Table 2) to identify different types of objects while driving a car.
We utilized pure modeling and mixed modeling approaches in model training. In pure modeling, a model is first trained and tested only on a dataset captured from the same country. The model underwent training and testing on two distinct datasets, namely RoboFlow consisting of 29800 images and RoboFlow + CVD comprising 38215 images, in order to determine the suitability of integrating two different image databases.
Figure 3 demonstrates the proposed deep neural network-based vehicle detection system. In training, we first train a YOLOv8 model on the publicly available RoboFlow dataset. The training is initialized from pre-trained COCO weights. The training runs for 300 epochs with a batch size of 64. The default hyperparameters for the training include an initial learning rate of 0.01, a momentum of 0.937 (SGD momentum/Adam beta1), and a weight decay of 0.0005. A mixed modeling case involves mixing the local data of the target country (Canada in our case, with the Canadian Vehicle Dataset) with openly accessible data to train the models, i.e., CVD + RoboFlow.
Further, the YOLOv8 model performance was tested using these two databases (RoboFlow: 29800 images; RoboFlow + CVD: 38215 images) in different weather scenarios. For both cases, training is performed on 90% of the dataset and testing on 10%. Therefore, for the RoboFlow case (Training = 26820, Validation = 2980) and for the mixed case (RoboFlow plus CVD) (Training = 34394, Validation = 3821), images were used.
Training of the algorithm across these two different vehicle datasets using transfer learning is analyzed, and we expected an improvement in the performance of object detection in different weather scenarios. The proposed experiments are in line with previous experiments performed rigorously in the field of automatic road inspection [26], [27], [28], in which a dataset available from some countries is extended by a small amount of data from other countries to better suit the target domains. The Python programming language was used with the following PC specifications:
• CPU: Intel Core i9-12900F
• GPU: NVIDIA® GeForce RTX™ 4090, 24 GB GDDR6X
• RAM: 64 GB DDR5, 4800 MHz
• Storage: 2TB NVMe SSD
• Operating System: Windows 11 Home
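Combining the two-stage schedule and the hyperparameters listed above, a minimal training sketch with the ultralytics package could look as follows. This is a hedged illustration, not the study's actual script: the model variant (yolov8s), dataset YAML names, and checkpoint path are assumptions, while the epochs, batch size, and optimizer settings are those reported in this section.

from ultralytics import YOLO

# Hyperparameters reported above (300 epochs, batch 64, SGD settings).
common = dict(
    epochs=300,
    batch=64,
    imgsz=512,           # images were resized to 512 x 512
    optimizer="SGD",
    lr0=0.01,            # initial learning rate
    momentum=0.937,      # SGD momentum / Adam beta1
    weight_decay=0.0005,
)

# Stage 1: baseline model, initialized from COCO-pretrained weights.
baseline = YOLO("yolov8s.pt")  # model variant assumed for illustration
baseline.train(data="roboflow.yaml", **common)

# Stage 2: transfer learning on the mixed (RoboFlow + CVD) pool, starting
# from the stage-1 checkpoint (default ultralytics save location assumed).
mixed = YOLO("runs/detect/train/weights/best.pt")
mixed.train(data="cvd_roboflow.yaml", **common)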
FIGURE 3. Proposed YOLOv8 based deep learning model architecture to detect objects in self-driving/autonomous vehicles.
In addition, we used Google Colab with a Tesla T4 GPU, which has a total memory capacity of 15109 MB, for training our models. The PyTorch deep learning framework is employed for the execution of the model algorithm. The model is trained using SGD as the optimization function. During the trials, we employed the original data augmentation techniques of the YOLOv8 algorithm. The effectiveness of the optimized model for classifying objects in various weather conditions was evaluated using the RGB videos that were gathered (Figure 1). Ultimately, we performed a thorough evaluation of the effectiveness of the YOLOv8 model on both the RoboFlow and mixed-case datasets.
Figure 3 showcases the YOLOv8 trained model, which utilizes a transfer learning approach and optimized hyperparameters to provide automatic object recognition and categorization. Table 2 displays the composition of each database, which includes samples from 11 distinct classes. Furthermore, the hyperparameters (learning rate, epochs, mini-batch size, and momentum) were fine-tuned to enhance performance. The efficacy of the created models was assessed by comparing the performance of YOLOv8 on the two separate benchmarks.
IV. RESULTS
A. EVALUATION PARAMETERS
The model performance was measured using a number of accuracy measure indices such as recall, precision, F1-score, class loss, and mean average precision (mAP) [29]. The evaluation of the classification model relies on these parameters. All the indices frequently depend on the parameters of the confusion matrix, which encompass true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) [30]. In contrast to the correct predictions, false positives (FP) and false negatives (FN) are the results in which the model makes inaccurate predictions for the positive and negative classes, respectively. Further, these parameters are calculated using Eqs. (5.1)–(5.3). The representations of true positives, false positives, and false negatives are illustrated below:
1) True Positive (TP): occurs when a class is accurately identified in the ground truth, and both the label and the bounding box of the instance are correctly predicted with an Intersection over Union (IoU) > 0.5.
2) False Positive (FP): occurs when the model predicts a class at a certain position inside an image, but an instance of that class is not actually present in the ground truth for the image. This also includes the case in which the predicted label does not match the actual label.
3) False Negative (FN): refers to a situation in which a certain class is actually present in the ground truth, but the model fails to accurately forecast either the right label or the bounding box of the instance.
Recall measures the proportion of accurately predicted features relative to the total number of features in the true class, encompassing both true positives and false negatives:

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{5.1}$$

Precision is a metric that quantifies the proportion of accurately predicted features, namely the true positives, relative to the overall number of predicted features, which includes both true positives and false positives:

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{5.2}$$

Precision and recall are inversely related, meaning that an increase in one measure typically leads to a decrease in
FIGURE 5. Plots of the precision, recall, mAP (0.5), and class loss over 300 training epochs for (a) the RoboFlow dataset and (b) the combined (CVD and RoboFlow) datasets.
TABLE 3. Performance of the YOLOv8 model in the case of the RoboFlow dataset and the combined (CVD + RoboFlow) dataset.
is limited. Still, the improvement reported in the other classes is substantial. For example, CVD enhanced the prediction of cars (2212 instances in CVD) in varying weather situations, with an improvement of 28% in precision and 17.5% in recall. Likewise, a gain of 38.5% precision and 57.6% recall was reported in detecting trucks (116 instances in CVD) in different weather conditions. Further, although the precision for detecting and classifying pedestrians (333 instances in CVD) decreased by 2.2%, the recall improved by 49.1%.
FIGURE 6. Results of the DCNN model trained on RoboFlow and on the combined RoboFlow and CVD datasets in different weather scenarios (snowy, night, rainy and sunny, daytime, gloomy, hazy).
Similarly, for the traffic lights (various subclasses), detection has mostly been improved substantially by adding the CVD dataset for training the model, except for trafficLight-Yellow and trafficLight-YellowLeft, where the impact may not be assessed due to the significantly smaller number of instances.
For the classes trafficLight and trafficLight-GreenLeft, we conducted a visual analysis to confirm the inability of the model trained using only RoboFlow data to detect these classes. Autonomous vehicle controllers in urban areas face a significant challenge in perceiving traffic lights. Urban driving introduces intricate scenarios with complex interactions involving traffic controls, vehicles, pedestrians, and more. The difficulty is heightened when it comes to traffic lights, which pose a formidable computer vision challenge due to varying lighting, view distances, and weather conditions.
Our model addresses this challenge by detecting traffic lights, distinguishing between red, yellow, and green states in the input raw images at each timestep. The model's training is tailored to different weather scenarios, and the inclusion of specific classes for the three traffic light colors enhances its overall performance. This strategic approach of incorporating three distinct classes for the traffic light colors is a crucial adaptation for navigating varying weather conditions during autonomous vehicle operation.
The robustness of our trained model is evident, particularly when combining the CVD data with the RoboFlow dataset. This resilience is consistent with findings from previous studies that focused on traffic light detection in urban settings for autonomous vehicles, reinforcing the effectiveness of our approach [33], [34], [35].
The visual analysis presented in Section IV-C confirmed that the RoboFlow model could not detect any instances of these classes in the weather conditions considered for capturing the data from Canadian roads. The addition of CVD to the training set
results in an improvement in the detection ability of the model for these classes, as reported in Table 3.
In summary, the detection system has seen significant improvements in precision and recall for various object classes, particularly for cars and "traffic-light" and its subclasses. A similar improvement was reported in the mAP values. However, certain classes with a smaller number of available instances might require attention to enhance the system's overall performance [36], [37].
C. VISUAL ANALYSIS
The observation reveals that the RoboFlow model exhibits limitations in detecting specific objects, whereas the implementation of a mixed modeling approach demonstrates improved accuracy in detecting said objects. The visualization results of the models trained on RoboFlow and on the integrated CVD datasets in various weather conditions are displayed in Figure 6.
After the training phase is over, the algorithm is tested again using the same test images discussed earlier. The experimental results indicate a significant improvement in the detection outcomes following the extra training and transfer learning. The results of the visualization demonstrate that the mixed model exhibits a high level of efficacy in the detection and classification of vehicles and other objects across various weather conditions, in contrast to the RoboFlow case. The model successfully detects road objects in different weather conditions (snowy, night, rainy, and sunny) at various locations (close or distant), as depicted in the sample images of Figure 6.
Results of the model trained on RoboFlow and the model trained on the mixed datasets in snowy conditions, as well as in a snowstorm with gloomy conditions, are shown in Figures 6a, 6b, and 6h. It is clear that trafficLight, trafficLight-Red, and trafficLight-Green, as well as other objects such as cars and pedestrians, are effectively detected by the proposed mixed model. In the other case, these objects are not detected in the snowstorm with gloomy conditions.
In Figures 6c and 6j, the trained algorithm is tested on many objects in night conditions, and the mixed model effectively detects all objects with high confidence scores compared to the algorithm tested on RoboFlow. It is also clear from Figures 6f (rainy) and 6i (gloomy) that the mixed model effectively detects the road objects in these weather scenarios, while the model trained on RoboFlow can detect only a few objects. Similarly, the mixed model efficiently detected other objects in snowy and gloomy conditions (Figure 6b).
In Figures 6e and 6h, it becomes apparent that the trained model exhibits a notable degree of precision in its ability to identify and classify objects under sunny conditions. In all the findings, it is observed that a reduced number of road objects are detected when the algorithm is exclusively evaluated using the RoboFlow dataset. The detection algorithm has superior accuracy in comparison to RoboFlow when tested on the combined RoboFlow and CVD datasets.
V. CONCLUSION
This study presents a comprehensive dataset comprising 8388 annotated images encompassing diverse vehicles. In total, there are 27766 labels distributed among 11 distinct classes. The dataset was collected in various meteorological conditions within the province of Quebec, located in Canada. The present study has successfully showcased the application of deep neural networks for road item detection in the specific domain of smart cities and communities.
The evaluation of the deep learning-based object detection and classification models encompasses various weather conditions, thereby assessing their performance across diverse environmental contexts. Transfer learning, in combination with the YOLOv8 algorithm, was employed in this study to address the task of detecting road objects in challenging weather conditions.
The experiments illustrate how combining datasets containing normal and varying weather scenarios can lead to developing an efficient road object detection model tailored to a specific country. The experimental results showed that the YOLOv8 algorithm achieved an overall accuracy of 91% for car identification, 80.7% for pedestrian identification, and 86.9% for traffic light-green identification, with the mean average precision (mAP) evaluated at an IoU threshold of 0.5.
The research presented here has potential future applications in object detection for autonomous vehicles under different weather conditions. In addition, the proposed generalized hybrid model can detect and classify vehicles in other countries with similar weather conditions.
This study establishes the foundation for developing a universally applicable and standardized predictive model to effectively identify and categorize road objects. The findings of this study have significant implications for various contexts, including both regular scenarios and autonomous environments.
Overall, the study underscores the substantial improvement in model performance when trained on mixed datasets, encompassing diverse day and nighttime scenarios and variable weather conditions in Quebec, Canada, as compared to traditional datasets. However, the summary still lacks an explicit analysis of whether the improved model performance meets the needs outlined by driving regulations for autonomous vehicles. It is noteworthy that the current regulatory landscape may not explicitly define the requirements for autonomous vehicles. Despite this, the study lays the foundational steps for developing a comprehensive pipeline of trustworthy AI tailored for autonomous vehicles, indicating a promising trajectory in addressing future regulatory considerations.
A. FUTURE WORK AND RECOMMENDATIONS
The present study aims to explore the application of pre-existing data and models in developing vehicle object detection and classification models adaptable to countries with diverse weather conditions. The data utilized for this research was obtained in Canada. In subsequent iterations, there is potential for further development of the aforementioned prototype to establish a single standardized model that can be universally implemented or, at the very least, applied to a cohort of countries sharing similar weather conditions.
Furthermore, it is important to note that this study holds significant value as a fundamental reference point. Its findings can facilitate the replication of experiments by obtaining supplementary images from a wide range of countries and diverse seasonal conditions. This approach aims to improve the depiction of individual classes and strengthen the overall resilience of the detection system across all categories of items. One potential avenue for augmenting coverage and expediting response time involves the integration of a vehicle detection system on mobile devices, alongside the deployment of vehicle recorders on various municipally operated vehicles, encompassing a range of transportation modes such as conventional automobiles, public transit vehicles, and waste management trucks, among others.
In future research endeavors, it is recommended to undertake a thorough assessment of the accuracy of the optimized model through a comparative analysis with contemporary deep learning models that are considered to be at the forefront of the field. Implementing this approach would allow us to determine the most attainable degree of precision. The model that has been presented exhibits the potential for expansion in order to accommodate the distinctive weather conditions observed in developing and less developed nations.
ACKNOWLEDGMENT
Thales, Canada, provided the road videos to conduct this research. The pictures in the Data Acquisition Section are courtesy of the Laval University LSVN Laboratory. The authors are thankful for their kind support.
REFERENCES
[1] A. R. Javed, F. Shahzad, S. U. Rehman, Y. B. Zikria, I. Razzak, Z. Jalil, and G. Xu, "Future smart cities: Requirements, emerging technologies, applications, challenges, and future aspects," Cities, vol. 129, Oct. 2022, Art. no. 103794.
[2] X. Tang, Z. Zhang, and Y. Qin, "On-road object detection and tracking based on radar and vision fusion: A review," IEEE Intell. Transp. Syst. Mag., vol. 14, no. 5, pp. 103–128, Sep. 2022.
[3] I. Ahmed, G. Jeon, A. Chehri, and M. M. Hassan, "Adapting Gaussian YOLOv3 with transfer learning for overhead view human detection in smart cities and societies," Sustain. Cities Soc., vol. 70, Jul. 2021, Art. no. 102908.
[4] D. Feng, A. Harakeh, S. L. Waslander, and K. Dietmayer, "A review and comparative study on probabilistic object detection in autonomous driving," IEEE Trans. Intell. Transp. Syst., vol. 23, no. 8, pp. 9961–9980, Aug. 2022.
[5] E. Akleman, "Deep learning," Computer, vol. 53, no. 9, p. 17, Sep. 2020, doi: 10.1109/MC.2020.3004171.
[6] P. Rajaji and S. Rahul, "Detection of lane and speed breaker warning system for autonomous vehicles using machine learning algorithm," in Proc. 3rd Int. Conf. Intell. Comput. Instrum. Control Technol. (ICICICT), 2022, pp. 401–406.
[7] A. Chehri and P. Fortier, "Wireless positioning and tracking for Internet of Things in heavy snow regions," in Proc. Hum. Centred Intell. Syst. Conf. (KES-HCIS), Cham, Switzerland: Springer, 2021, pp. 395–404.
[8] M. Hassaballah and A. I. Awad, Deep Learning in Computer Vision: Principles and Applications. Boca Raton, FL, USA: CRC Press, 2020.
[9] S. Rani, D. Ghai, and S. Kumar, "Object detection and recognition using contour based edge detection and fast R-CNN," Multimedia Tools Appl., vol. 81, pp. 42183–42207, Dec. 2022.
[10] A. Vennelakanti, S. Shreya, R. Rajendran, D. Sarkar, D. Muddegowda, and P. Hanagal, "Traffic sign detection and recognition using a CNN ensemble," in Proc. IEEE Int. Conf. Consum. Electron. (ICCE), 2019, pp. 1–4.
[11] M. Haris and A. Glowacz, "Road object detection: A comparative study of deep learning-based algorithms," Electronics, vol. 10, no. 16, p. 1932, Aug. 2021.
[12] S. Shalu, S. Rathee, A. Yadav, P. Kherwa, and R. Gandhi, "An intelligent lane and obstacle detection using YOLO algorithm," Int. J. Intell. Syst. Appl., vol. 12, no. 3s, pp. 641–648, Nov. 2023.
[13] B. T. Nugraha and S.-F. Su, "Towards self-driving car using convolutional neural network and road lane detector," in Proc. 2nd Int. Conf. Automat. Cogn. Sci. Opt. Micro Electro-Mechanical Syst. Inf. Technol. (ICACOMIT), 2017, pp. 65–69.
[14] V. Arthi, R. Murugeswari, and P. Nagaraj, "Object detection of autonomous vehicles under adverse weather conditions," in Proc. Int. Conf. Data Sci. Agents Artif. Intell. (ICDSAAI), vol. 1, 2022, pp. 1–8.
[15] T. Sharma, B. Debaque, N. Duclos, A. Chehri, B. Kinder, and P. Fortier, "Deep learning-based object detection and scene perception under bad weather conditions," Electronics, vol. 11, p. 563, 2022.
[16] F. Leon and M. Gavrilescu, "A review of tracking, prediction and decision making methods for autonomous driving," 2019, arXiv:1909.07707.
[17] T. Sharma, A. Chehri, and P. Fortier, "Communication trends, research challenges in autonomous driving and different paradigms of object detection," in Proc. Int. KES Conf. Hum. Centred Intell. Syst., Cham, Switzerland: Springer, 2023, pp. 57–66.
[18] Y. Chen, W. Zheng, Y. Zhao, T. H. Song, and H. Shin, "DW-YOLO: An efficient object detector for drones and self-driving vehicles," Arab. J. Sci. Eng., vol. 48, pp. 1427–1436, 2023.
[19] S. Wu, Y. Yan, and W. Wang, "CF-YOLOX: An autonomous driving detection model for multi-scale object detection," Sensors, vol. 23, p. 3794, 2023.
[20] J. Li, R. Xu, J. Ma, Q. Zou, J. Ma, and H. Yu, "Domain adaptive object detection for autonomous driving under foggy weather," in Proc. IEEE/CVF Winter Conf. Appl. Comput. Vis., 2023, pp. 612–622.
[21] M. Caro, H. Tabani, J. Abella, F. Moll, E. Morancho, R. Canal, J. Altet, A. Calomarde, F. J. Cazorla, and A. Rubio, "An automotive case study on the limits of approximation for object detection," J. Syst. Archit., vol. 138, May 2023, Art. no. 102872.
[22] J. Terven and D. Cordova-Esparza, "A comprehensive review of YOLO: From YOLOv1 to YOLOv8 and beyond," 2023, arXiv:2304.00501.
[23] A. Farid, F. Hussain, K. Khan, M. Shahzad, U. Khan, and Z. Mahmood, "A fast and accurate real-time vehicle detection method using deep learning for unconstrained environments," Appl. Sci., vol. 13, p. 3059, 2023.
[24] H. Yin and C. Berger, "When to use what data set for your self-driving car algorithm: An overview of publicly available driving datasets," in Proc. IEEE 20th Int. Conf. Intell. Transp. Syst. (ITSC), 2017, pp. 1–8.
[25] X. Wu, D. Sahoo, and S. C. H. Hoi, "Recent advances in deep learning for object detection," Neurocomputing, vol. 396, pp. 39–64, 2020.
[26] D. Arya, H. Maeda, S. K. Ghosh, D. Toshniwal, H. Omata, T. Kashiyama, and Y. Sekimoto, "Global road damage detection: State-of-the-art solutions," in Proc. IEEE Int. Conf. Big Data, 2020, pp. 5533–5539.
[27] D. Arya, H. Maeda, S. K. Ghosh, D. Toshniwal, A. A. Mraz, T. Kashiyama, and Y. Sekimoto, "Deep learning-based road damage detection and classification for multiple countries," Autom. Constr., vol. 132, 2021, Art. no. 103935, doi: 10.1016/j.autcon.2021.103935.
[28] D. Arya, H. Maeda, S. K. Ghosh, D. Toshniwal, H. Omata, H. Kashiyama, T. Sekimoto, and Y. Chen, "Crowdsensing-based road damage detection challenge," in Proc. IEEE Int. Conf. Big Data, Jan. 2022, pp. 6378–6386.
[29] W. Farag, "Multiple road-objects detection and tracking for autonomous driving," J. Eng. Res., vol. 10, no. 1A, pp. 237–262, 2022.
[30] R. Padilla, S. L. Netto, and E. A. B. Da Silva, "A survey on performance metrics for object-detection algorithms," in Proc. Int. Conf. Syst. Signals Image Process. (IWSSIP), 2020, pp. 237–242.
[31] S. Wu, J. Yang, X. Wang, and X. Li, "IoU-balanced loss functions for single-stage object detection," Pattern Recognit. Lett., vol. 156, pp. 96–103, Apr. 2022.
[32] J. E. Hoffmann, H. G. Tosso, M. M. D. Santos, J. F. Justo, A. W. Malik, and A. U. Rahman, "Real-time adaptive object detection and tracking for autonomous vehicles," IEEE Trans. Intell. Veh., vol. 6, no. 1, pp. 450–459, Nov. 2020.
[33] Z. Ouyang, J. Niu, Y. Liu, and M. Guizani, "Deep CNN-based real-time traffic light detector for self-driving vehicles," IEEE Trans. Mobile Comput., vol. 19, pp. 300–313, Jan. 2020, doi: 10.1109/TMC.2019.2892451.
[34] L. C. Possatti, R. Guidolini, V. B. Cardoso, R. F. Berriel, T. M. Paixão, C. Badue, A. F. De Souza, and T. Oliveira-Santos, "Traffic light recognition using deep learning and prior maps for autonomous cars," in Proc. Int. Joint Conf. Neural Netw. (IJCNN), 2019, pp. 1–8.
[35] K. Wang, Y. Wang, B. Liu, and J. Chen, "Quantification of uncertainty and its applications to complex domain for autonomous vehicles perception system," IEEE Trans. Instrum. Meas., vol. 72, pp. 1–17, 2023, doi: 10.1109/TIM.2023.3256459.
[36] X. Wang, K. Li, and A. Chehri, "Multi-sensor fusion technology for 3D object detection in autonomous driving: A review," IEEE Trans. Intell. Transp. Syst., Sep. 2023, doi: 10.1109/TITS.2023.3317372.
[37] I. Ahmed, G. Jeon, and A. Chehri, "A smart IoT enabled end-to-end 3D object detection system for autonomous vehicles," IEEE Trans. Intell. Transp. Syst., vol. 24, no. 11, pp. 1–10, Nov. 2022, doi: 10.1109/TITS.2022.3210490.
ABDELLAH CHEHRI (Senior Member, IEEE) received the master's degree from University Nice-Sophia Antipolis-Eurecom, France, and the Ph.D. degree from Laval University, QC, Canada. He is currently an Associate Professor with the Department of Mathematics and Computer Science, Royal Military College of Canada (RMC), Kingston, ON, Canada. He is the coauthor of more than 250 peer-reviewed publications in established journals and conference proceedings sponsored by established publishers, such as IEEE, ACM, Elsevier, and Springer. He is a member of the IEEE Communication Society, the IEEE Vehicular Technology Society (VTS), and the IEEE Photonics Society. He has served on roughly 30 conference and workshop program committees. In addition, he served as a guest/associate editor for several well-reputed journals.
ISSOUF FOFANA (Senior Member, IEEE) received the degree in electro-mechanical engineering from The University of Abidjan, Côte d'Ivoire, in 1991, and the master's and Ph.D. degrees from École Centrale de Lyon, France, in 1993 and 1996, respectively. He was a Postdoctoral Researcher in Lyon, in 1997. He was with the Schering Institute of High Voltage Engineering Techniques, University of Hannover, Germany, from 1998 to 2000. He was a fellow of the Alexander von Humboldt Stiftung, from November 1997 to August 1999. He joined Université du Québec à Chicoutimi (UQAC), QC, Canada, as an Associate Researcher, in 2000, where he is currently a Professor. He also holds the position of the Canada Research Chair of Insulating Liquids and Mixed Dielectrics for Electrotechnology (ISOLIME). He is also with the Research Chair on the Aging of Power Network Infrastructure (ViAHT) and the Director of the MODELE Laboratory and the International Research Centre on Atmospheric Icing and Power Network Engineering (CenGivre), UQAC. He has authored or coauthored over 280 scientific publications, two book chapters, and one textbook. He has edited two books and holds three patents. He is an accredited Professional Engineer in the Province of Quebec and a fellow of IET. He is currently a member of the DEIS AdCom and the international scientific committees of some IEEE DEIS-sponsored or technically sponsored conferences (ICDL, CEIDP, ICHVE, and CATCON). He is a member of the ASTM D27 Committee.