Object Detection For Self Driving Car in Complex Traffic Scenarios
1 Introduction
The incorporation of artificial intelligence (AI) technology is causing the automotive sector to evolve quickly [1,2]. Worldwide demand for AI in cars is predicted to grow at an average annual rate of 36.15%, reaching a projected USD 6.6 billion in sales of automotive AI technology by 2025 [3,4]. Object detection is a vital component of self-driving cars: it locates and classifies the objects in front of the vehicle and in its surroundings so that the car can make efficient driving decisions. In recent years computer vision has advanced considerably, and object detection for autonomous cars has improved with it. YOLO models have shown great performance compared to other detection models [5]. Speed is crucial for real-time applications, and YOLO models, being single-stage detection systems, outperform two-stage detection systems such as R-CNN and Faster R-CNN in object localization and classification speed [6]. Much research has been conducted on object detection for self-driving cars, but most of it used datasets that do not contain very complex data. It is extremely difficult to assure safety and dependability in extreme corner cases for self-driving cars; this motivates the use of the
© The Authors, published by EDP Sciences. This is an open access article distributed under the terms of the Creative Commons
Attribution License 4.0 (https://creativecommons.org/licenses/by/4.0/).
MATEC Web of Conferences 393, 04002 (2024) https://doi.org/10.1051/matecconf/202439304002
STAAAR-2023
IDD dataset [7], which contains very complex and diverse road scene scenarios. The dataset used here consists of complex traffic scenes from roads in and around the Indian cities of Hyderabad and Bangalore. The labeled object classes are car, person, bicycle, bus, traffic sign, traffic light, truck, autorickshaw, animal, motorcycle, and rider. By training the most recent YOLO algorithms on this unconstrained, complicated-environment dataset and comparing their accuracy, this paper makes significant advances in the field of object localization and classification for self-driving cars.
Object localization and classification are very important parts of vision-based self-driving cars. In recent years, various deep learning methods have emerged that detect objects in traffic road scenes with higher accuracy and speed. Below, we discuss some recent papers focused on object detection for self-driving cars.
The authors of [8] suggested an improved YOLOv4 model to identify ten different types of objects, together with a model to forecast whether or not pedestrians would cross the street. For object detection they used the BDD100k dataset, while for pedestrian pose estimation they collected and validated data themselves. By optimizing the YOLOv4 architecture, they reduced model complexity to achieve higher speed and accuracy.
The work in [9] improved the capability of the YOLOv5 object detector for detecting tiny objects, specifically in the context of autonomous racing. The researchers propose architectural modifications to the YOLOv5 model, including changes to the backbone, neck, and other elements. The authors suggest further research and testing with different datasets to validate and refine the proposed techniques. Overall, the research advances autonomous vehicle vision systems by offering insights into enhancing tiny-object recognition in the YOLOv5 model.
A novel hybrid model for object detection and tracking in autonomous cars is presented in [10]. It combines a tracking model based on Kalman filters with a two-stage object detector based on Faster R-CNN. Extensive experiments on the KITTI dataset showcased superior accuracy and speed compared to existing methods, which is crucial for real-time object detection and vehicle safety. It is important to note, however, that the intricacy of the design can require a large amount of processing power.
The authors of [11] emphasized the importance of testing self-driving cars specifically on Indian roads, which require a specialized dataset due to the unique, unstructured complexities found in India. Consequently, they developed such a dataset. Within it, the authors employed Faster R-CNN for object detection and conducted an evaluation. Various metrics, such as classifier accuracy, RPN classifier loss, RPN regression loss, detector classifier loss, and detector regression loss, were used to evaluate the model's performance after it had been trained on 1200 images.
In this study, an enhancement of object detection for self-driving cars based on the YOLOv8 model is proposed for complex traffic environments, and a comparison with other YOLO models is carried out. Fig. 1 shows the architecture of YOLOv8.
2 Methodology
The subsections below discuss data collection and preprocessing, as well as the strategies for training, validation, and testing, alongside their corresponding evaluation metrics.
All training and evaluation were done with the Python programming language. The deep learning-based YOLOv8 model was trained on the unstructured complex traffic dataset through transfer learning. The YOLOv8 model detects objects faster and more accurately than previous YOLO models, which aided the detection of objects in a wide variety of traffic circumstances, such as those on Indian roads. This work uses the YOLOv8 implementation provided by Ultralytics [13] for training, model saving, validation, and testing. Ultralytics provides five base YOLOv8 models of increasing size: yolov8n.pt (nano), yolov8s.pt (small), yolov8m.pt (medium), yolov8l.pt (large), and yolov8x.pt (extra-large). These base models were used for training, validation, and testing: around 6510 images were used for training, 1860 for validation, and 630 for testing. Fig. 3 shows the workflow of this paper.
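The transfer-learning workflow described above can be sketched with the Ultralytics Python API [13]. This is a minimal illustration, not the paper's exact configuration: the dataset YAML path, epoch count, and image size are placeholder assumptions.

```python
# Sketch of fine-tuning a pretrained YOLOv8 base model on a custom
# traffic dataset via the Ultralytics API. Paths and hyperparameters
# are illustrative placeholders.
from ultralytics import YOLO

# Start from a COCO-pretrained base model (here the extra-large variant).
model = YOLO("yolov8x.pt")

# Fine-tune on the custom dataset; "data.yaml" lists the train/val/test
# image folders and the 11 class names.
model.train(data="data.yaml", epochs=100, imgsz=1280)

# Validate the best checkpoint, reporting mAP0.5 and mAP0.5-0.95.
metrics = model.val()

# Run inference on held-out test images.
results = model.predict("test_images/", imgsz=1280)
```

The same steps apply to the smaller variants by substituting yolov8n.pt, yolov8s.pt, yolov8m.pt, or yolov8l.pt as the base weights.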
Determining the extent to which predicted bounding boxes overlap with the ground-truth boxes, and hence whether a detection counts as successful, is a crucial step in evaluating object detection algorithms. The Intersection over Union (IoU) is used for this; mAP0.5 denotes the mean average precision at an IoU threshold of 0.5, meaning a detection is deemed successful if the predicted box overlaps the ground truth by more than 50%. Higher thresholds require the bounding box to be localized with greater accuracy. All evaluations were based on mAP0.5, mAP0.5-0.95, the precision-recall curve, and other object detection metrics. The standard formulas for IoU, Recall, and Precision are:

IoU = Area of Overlap / Area of Union
Recall = TP / (TP + FN)
Precision = TP / (TP + FP)

where TP, FP, and FN denote true positives, false positives, and false negatives, respectively.
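As a concrete illustration of these metrics, the following is a small pure-Python sketch of IoU for axis-aligned boxes in (x1, y1, x2, y2) form, together with precision and recall; the box coordinates are made-up example values, not taken from the paper's dataset.

```python
# IoU, precision, and recall for object detection evaluation.

def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # overlap area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def precision(tp, fp):
    """Fraction of detections that are correct."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of ground-truth objects that are detected."""
    return tp / (tp + fn)

# Example: a detection counts as a true positive at mAP0.5 when IoU > 0.5.
pred, truth = (10, 10, 60, 60), (20, 20, 70, 70)
print(iou(pred, truth))        # overlap 1600 / union 3400, about 0.47
print(iou(pred, truth) > 0.5)  # False: below the 0.5 threshold
```

At mAP0.5-0.95 the average precision is averaged over IoU thresholds from 0.5 to 0.95, so the same prediction can count as correct at the looser thresholds and incorrect at the stricter ones.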
3 Experimental Results
In this section, the evaluation of various object localization and classification models used for detecting objects in complex traffic scenarios is presented. These object detection models were trained, validated, and tested on a complex dataset of 9700 labeled images. The dataset contains labels for various object types: person, traffic light, motorcycle, bicycle, traffic sign, animal, rider, bus, car, and truck. The mean average precision of the different models is presented in Table 1. Furthermore, Fig. 4 displays the precision-recall curves of the trained models; the YOLOv8x (1280x1280), YOLOv8x, YOLOv8l, and YOLOv8m models are shown in (a), (b), (c), and (d). The precision-recall curve displays the balance between precision and recall at various thresholds: high precision corresponds to a low false positive rate, while high recall corresponds to a low false negative rate, so a large area under the curve indicates both high recall and high precision. The precision-recall curve of the YOLOv8x (1280x1280) trained model shows better results than those of the other models. Fig. 5 shows the validation results of the trained models, with YOLOv8x (1280x1280) and YOLOv8x (640x640) displayed in (a) and (b); the YOLOv8x (1280x1280) model in Fig. 5(a) shows good accuracy. Fig. 6 displays the sample input images (1280x1280) used for testing, and Fig. 7 displays the objects that the YOLOv8x model detected. The YOLOv8x model demonstrates suboptimal performance on underrepresented classes, despite delivering strong results for the other categories. When the input image resolution was increased to 1280x1280, the YOLOv8x model performed approximately 14% better than at 640x640 resolution, as seen in Table 1.
Fig. 4. Precision-recall curves of different YOLOv8 models after validation: a) YOLOv8x (image size 1280x1280), b) YOLOv8x (image size 640x640), c) YOLOv8l (image size 640x640), d) YOLOv8m (image size 640x640).
4 Conclusion
Object detection is a critical component of self-driving cars, but it can be difficult in a
complex, varied traffic environment. This paper suggests using the YOLOv8 model to
improve object detection in unstructured complex traffic scenarios. YOLOv8x (1280x1280)
trained model demonstrated superior performance over other models. By training the model
on a larger dataset with nearly identical and higher instances of each class, model
performance can be improved. despite delivering strong results for other categories. When
the input image resolution was increased to 1280x1280, the YOLOv8x model performed
approximately 14% better than 640x640 resolution images as seen in Table 1.
References
1. H. Mankodiya, D. Jadav, R. Gupta, S. Tanwar, W.C. Hong, R. Sharma, "OD-XAI:
Explainable AI-Based Semantic Object Detection for Autonomous Vehicles," Appl.
Sci. 12, 5310 (2022).
2. S.A. Khan, H. Lim, "Novel Fuzzy Logic Scheme for Push-Based Critical Data
Broadcast Mitigation in VNDN," Sensors 22, 8078 (2022).
3. A. N. Bhavana, M.M. Kodabagi, "Exploring the Current State of Road Lane Detection:
A Comprehensive Survey," Int. J. Hum.-Comput. Interact. 2, 40-46 (2023).
4. S.A. Khan, H. Lim, "Push-Based Forwarding Scheme Using Fuzzy Logic to Mitigate
the Broadcasting Storm Effect in VNDN," in Proceedings of the Artificial Intelligence
and Mobile Services–AIMS 2022: 11th International Conference, Held as Part of the
Services Conference Federation, SCF, Honolulu, HI, USA, December 10–14 (2022),
pp. 3–17. Springer, Berlin/Heidelberg, Germany.
5. J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified,
Real-Time Object Detection," in 2016 IEEE Conference on Computer Vision and
Pattern Recognition (CVPR, 2016), pp. 779-788.
6. S.A. Khan, H.J. Lee, H. Lim, "Enhancing Object Detection in Self-Driving Cars Using
a Hybrid Approach," Electronics 12, 2768 (2023).
7. G. Varma, A. Subramanian, A. Namboodiri, M. Chandraker, and C.V. Jawahar, "IDD:
A Dataset for Exploring Problems of Autonomous Navigation in Unconstrained
Environments," in IEEE Winter Conference on Applications of Computer Vision
(WACV, 2019).
8. Y. Li, H. Wang, L.M. Dang, D. Han, H. Moon, T. Nguyen, "A Deep Learning-Based
Hybrid Framework for Object Detection and Recognition in Autonomous Driving,"
IEEE Access, vol. 8, (2020).
9. A. Benjumea, I. Teeti, F. Cuzzolin, and A. Bradley, "YOLO-Z: Improving Small
Object Detection in YOLOv5 for Autonomous Vehicles," (2021).
10. Y. Kortli, S. Gabsi, Y. Lew Yan Voon, J. Lew, M. Jridi, M. Maher, M. Marzougui, and
M. Atri, "Deep Embedded Hybrid CNN-LSTM Network for Lane Detection on
NVIDIA Jetson Xavier NX," Knowledge-Based Systems, (2022).
11. G.N.V.V. Satya Sai Srinath Namburi, Athul Zac Joseph, S. Umamaheswaran, Ch.
Lakshmi Priyanka, Malavika Nair M, and Praveen Sankaran, "NITCAD - Developing
an Object Detection, Classification, and Stereo Vision Dataset for Autonomous
Navigation in Indian Roads," Procedia Computer Science, (2020).
12. J. Solawetz and F. Francesco, "What is YOLOv8? The Ultimate Guide," (2023).
Available online: https://blog.roboflow.com/whats-new-in-yolov8/.
13. G. Jocher and AyushExel, "YOLO by Ultralytics," (2023). Available online:
https://docs.ultralytics.com/.