Project Report (Group 9)
Bachelor of Technology
By
Signature of Supervisor(s)
MR. KAPIL KUMAR
CSE DEPARTMENT
COER UNIVERSITY
April, 2024
Declaration
I declare that this written submission represents my ideas in my own words and
where others' ideas or words have been included, I have adequately cited and
referenced the original sources. I also declare that I have adhered to all principles of
academic honesty and integrity and have not misrepresented or fabricated or falsified
any idea/data/fact/source in my submission. I understand that any violation of the
above will be cause for disciplinary action by the Institute and can also evoke penal
action from the sources which have thus not been properly cited or from whom
proper permission has not been taken when needed.
Acknowledgement
We would like to convey our heartfelt gratitude to Mr. Kapil Kumar, our mentor, for his
invaluable advice and assistance in completing our project. He was there to assist us at every
step of the way, and his motivation is what enabled us to accomplish our task effectively. We
would also like to thank all of the other supporting personnel who assisted us by supplying the
essential equipment, without which we would not have been able to
perform efficiently on this project. We would also like to thank the College of
Engineering Roorkee (COER University) for accepting our project in our desired field of expertise. We would also like
to thank our friends and parents for their support and encouragement as we worked on this
project.
Also, this project would not have been completed without the help of its readers, including
ourselves, and the various websites on the internet that helped us a lot in writing and understanding
this project.
Thank you
CONTENTS
I. Introduction
II. Review of Literature
III. Report on present investigation
IV. Result and Discussion
V. Summary and Conclusion
VI. Appendix
VII. References
1. INTRODUCTION
1.1 APPLICATION OF OBJECT DETECTION IN AUTONOMOUS VEHICLES
1.1.1. Pedestrian Detection
1.1.2. Vehicle Detection
1.1.3. Obstacle Detection
1.1.4. Traffic Sign recognition
CHAPTER 1 INTRODUCTION
Object detection with a deep learning model proceeds in the following steps:
1. First, an image is passed through a convolutional neural network (CNN) to extract
its features.
2. The extracted features are then passed through a series of fully connected layers that
predict the object probabilities and bounding box coordinates.
3. The image is divided into a grid of cells, and each grid cell is responsible for
predicting a set of bounding boxes and object probabilities.
4. A post-processing algorithm (non-maximum suppression) then removes overlapping boxes
and keeps the box with the highest probability for each object.
5. The final output is a set of predicted bounding boxes.
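The post-processing step that removes overlapping boxes can be sketched as follows. This is a minimal illustration of greedy non-maximum suppression, not the code used inside any particular detector; the corner-format boxes, the function names and the 0.5 overlap threshold are assumptions made for the example.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), corner format (assumed)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes in corner format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes: List[Box], scores: List[float],
                        iou_threshold: float = 0.5) -> List[int]:
    """Return the indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with a box already kept.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 300, 300)]
    scores = [0.90, 0.75, 0.60]
    print(non_max_suppression(boxes, scores))  # prints [0, 2]
```

In the example, the second box overlaps the first almost completely and has a lower score, so it is dropped, while the third box does not overlap anything and is kept.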
1.1 Application of object detection in autonomous vehicles:
1. Pedestrian Detection
Object detection is used to detect pedestrians on the road, which increases
safety.
2. Vehicle Detection
By detecting other vehicles, an autonomous vehicle can maintain a safe distance and
navigate through traffic.
3. Obstacle Detection
Detecting obstacles such as road construction helps to avoid accidents.
4. Traffic Sign Recognition
Detecting traffic signs and speed limits allows the vehicle to follow traffic rules and regulations.
CHAPTER 2 REVIEW OF LITERATURE
Many traditional models have been used for object detection, but their
limitations led to new models being proposed to improve speed and accuracy.
Some of the traditional models are:
1. HOG
2. R-CNN
3. Fast R-CNN
4. Faster R-CNN
5. YOLO
2.1 HOG
The Histogram of Oriented Gradients (HOG) was introduced in 1986 and is one of the oldest methods for object
detection. It was not popular at first; it became popular in 2005, when it began to be used for
many computer vision tasks. HOG extracts the features of an image in order to detect
objects.
The following steps describe how HOG works:
1. First, compute the gradient at every pixel and divide the image
into cells of 8x8 pixels.
2. Each cell contains 64 gradient vectors. Split their orientations into angular bins and compute a
histogram for the cell; this reduces the 64 vectors to 9 values.
3. Once each cell has a 9-value histogram, group the cells into overlapping
blocks.
4. Finally, form the feature blocks, normalize the feature vector of each block, and
collect all the normalized vectors to obtain the HOG features.
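The HOG steps above can be sketched in plain Python. This is a simplified illustration under stated assumptions: gradients are taken with central differences, each gradient votes for a single bin (without the interpolation between neighbouring bins that full implementations use), and a block is simply a list of cells. The function names are hypothetical.

```python
import math
from typing import List

Cell = List[List[float]]  # grey-level values of one 8x8 cell


def cell_histogram(cell: Cell, n_bins: int = 9) -> List[float]:
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell."""
    rows, cols = len(cell), len(cell[0])
    bin_width = 180.0 / n_bins
    hist = [0.0] * n_bins
    for y in range(rows):
        for x in range(cols):
            # Central differences, clamping the indices at the cell border.
            gx = cell[y][min(x + 1, cols - 1)] - cell[y][max(x - 1, 0)]
            gy = cell[min(y + 1, rows - 1)][x] - cell[max(y - 1, 0)][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist


def l2_normalize(vector: List[float], eps: float = 1e-6) -> List[float]:
    """Scale a vector to unit length; eps avoids division by zero."""
    norm = math.sqrt(sum(v * v for v in vector) + eps * eps)
    return [v / norm for v in vector]


def block_descriptor(cells: List[Cell]) -> List[float]:
    """Concatenate the histograms of the cells in one block and normalize them."""
    features: List[float] = []
    for cell in cells:
        features.extend(cell_histogram(cell))
    return l2_normalize(features)


if __name__ == "__main__":
    # A cell whose brightness increases from left to right.
    ramp = [[float(x) for x in range(8)] for _ in range(8)]
    print(cell_histogram(ramp))
    print(len(block_descriptor([ramp] * 4)))  # a 2x2 block gives 4 * 9 values
```

For a cell whose brightness only changes horizontally, all of the gradient magnitude falls into the first orientation bin, which shows how the 64 gradient vectors of a cell collapse into 9 values.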
LIMITATIONS –
• It is time-consuming.
• Its computational complexity is very high.
2.2 R-CNN
R-CNN was introduced in 2014. This model removes many of the issues present in HOG. It
extracts about 2000 region proposals from each image using the selective search algorithm, which
selects the most significant regions.
The following steps describe how R-CNN works:
1. First, the selective search algorithm selects the important region
proposals by generating multiple sub-segments of the image.
2. Once selective search is complete, the features of each proposal are extracted
with a pre-trained convolutional neural network.
3. Finally, predictions are made for the image: a
classification model predicts the class of each proposed region, and a regression model refines its
bounding box.
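The bounding box correction in the last step can be illustrated with a short sketch. It assumes the parameterization described in the R-CNN paper, where the regression model outputs four offsets (tx, ty, tw, th) relative to a proposal given by its centre and size; the function name and the example numbers are made up for illustration.

```python
import math
from typing import Tuple

CenterBox = Tuple[float, float, float, float]  # (centre x, centre y, width, height)


def apply_box_deltas(proposal: CenterBox,
                     deltas: Tuple[float, float, float, float]) -> CenterBox:
    """Refine a region proposal with predicted regression offsets."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = deltas
    cx = px + pw * tx        # shift the centre by a fraction of the width
    cy = py + ph * ty        # shift the centre by a fraction of the height
    w = pw * math.exp(tw)    # scale the width (exp keeps it positive)
    h = ph * math.exp(th)    # scale the height
    return (cx, cy, w, h)


if __name__ == "__main__":
    proposal = (50.0, 50.0, 20.0, 40.0)
    print(apply_box_deltas(proposal, (0.1, -0.25, 0.0, 0.0)))
```

With zero offsets the proposal is returned unchanged, so the regression model only has to learn small corrections to the proposals that selective search produces.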
LIMITATIONS –
2.3 Fast R-CNN
Fast R-CNN was introduced in 2015. R-CNN passes each region proposal through the
CNN one at a time, and selective search generates about 2000 region proposals per image, so
training with R-CNN is very complex and expensive. Fast R-CNN was introduced to remove this
problem: it takes the whole image as input to the CNN once,
instead of processing 2000 region proposals separately.
LIMITATIONS –
2.4 Faster R-CNN
Faster R-CNN was introduced in 2015. Fast R-CNN was proposed to remove the issues of R-CNN,
but it has issues of its own, and Faster R-CNN was introduced to remove them.
Fast R-CNN still uses the selective
search algorithm to compute the region proposals; Faster R-CNN replaces it with
a region proposal network (RPN). The region proposal network
reduces the marginal cost of computing proposals to about 10 ms per image. It consists of a
convolutional network that produces a feature map of the image. At
each position of the feature map there are multiple anchors (reference boxes centred on the sliding window, each with its own
scale and aspect ratio). These anchors are passed to a classification layer and a regression layer,
which classify the object and localize the bounding box.
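The anchors can be pictured with a small sketch that generates the reference boxes for one sliding-window position. The three scales and three aspect ratios used here follow the defaults reported in the Faster R-CNN paper and are assumptions for illustration, as is the function name.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def make_anchors(cx: float, cy: float,
                 scales: Sequence[float] = (128, 256, 512),
                 ratios: Sequence[float] = (0.5, 1.0, 2.0)) -> List[Box]:
    """Generate len(scales) * len(ratios) anchors centred on (cx, cy)."""
    anchors: List[Box] = []
    for s in scales:
        for r in ratios:
            # Keep the area close to s*s while setting height / width = r.
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


if __name__ == "__main__":
    for box in make_anchors(300.0, 200.0):
        print([round(v, 1) for v in box])
```

Each of the nine anchors is then scored by the classification layer and adjusted by the regression layer.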
LIMITATION –
• It may not be fast enough for real-time applications because of its multi-stage process.
2.5 YOLO
YOLO (You Only Look Once) was introduced by Joseph Redmon, Santosh Divvala, Ross
Girshick, and Ali Farhadi in 2016. It is a real-time object
detection algorithm that uses deep learning to detect objects in images and videos. YOLO
processes one image or video frame at a time and predicts the location and class
of the objects in the frame in a single pass. It uses a convolutional neural network (CNN) to extract features from
the input image and treats detection as a regression problem, predicting the bounding boxes
and class probabilities of the objects directly. YOLO is known for its speed and accuracy,
making it a popular choice for real-time object detection applications.
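The grid-cell idea described in the introduction can be made concrete with a small sketch: the cell that contains the centre of an object is the one responsible for predicting it. The 7 x 7 grid follows the original YOLO paper and is an assumption here (later versions such as YOLOv8 organise their predictions differently); the function name is hypothetical.

```python
from typing import Tuple


def responsible_cell(cx: float, cy: float, img_w: float, img_h: float,
                     s: int = 7) -> Tuple[int, int]:
    """Return (row, col) of the grid cell that contains the box centre."""
    # Scale the centre to grid units and clamp points on the far border.
    col = min(int(cx / img_w * s), s - 1)
    row = min(int(cy / img_h * s), s - 1)
    return row, col


if __name__ == "__main__":
    # Centre of a 640x480 image falls in the middle cell of a 7x7 grid.
    print(responsible_cell(320, 240, 640, 480))  # prints (3, 3)
```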
CHAPTER 3 REPORT ON THE PRESENT INVESTIGATION
We use the YOLOv8 model for object detection to increase speed and accuracy.
The steps we followed to complete the project are described below.
3.1 Methodology
The first step is data collection: gathering images of vehicles from online sources such as Pexels and Pixabay.
These platforms offer a wide variety of high-quality images that can be used for training an
object detection model. It is important to collect a diverse set of images that cover different
types of vehicles, backgrounds, lighting conditions, and angles so that the model
generalizes well to real-world scenarios.
Labeling is the process of marking images with bounding boxes that indicate the location of
objects (in this case, vehicles) within the image. We used labelImg, a tool commonly
used for this purpose. It allows users to open images in a graphical interface and draw
bounding boxes around objects. The labeled images are then saved along with XML files
that contain the coordinates of the bounding boxes and the corresponding
object classes.
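The XML files written by labelImg follow the Pascal VOC layout, while YOLO training tools usually expect one text line per object in the form "class x_center y_center width height", with all values normalized by the image size. The sketch below shows one way such a conversion could look; it is an illustration and not the script used in this project, and the class list and function name are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import List


def voc_to_yolo(xml_text: str, class_names: List[str]) -> List[str]:
    """Convert one Pascal VOC annotation to YOLO-format label lines."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = float(size.find("width").text)
    img_h = float(size.find("height").text)
    lines: List[str] = []
    for obj in root.iter("object"):
        class_id = class_names.index(obj.find("name").text)
        box = obj.find("bndbox")
        xmin = float(box.find("xmin").text)
        ymin = float(box.find("ymin").text)
        xmax = float(box.find("xmax").text)
        ymax = float(box.find("ymax").text)
        # Normalize the centre and the size by the image dimensions.
        xc = (xmin + xmax) / 2.0 / img_w
        yc = (ymin + ymax) / 2.0 / img_h
        w = (xmax - xmin) / img_w
        h = (ymax - ymin) / img_h
        lines.append(f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")
    return lines


if __name__ == "__main__":
    sample = """<annotation>
      <size><width>640</width><height>480</height><depth>3</depth></size>
      <object><name>car</name>
        <bndbox><xmin>100</xmin><ymin>120</ymin><xmax>300</xmax><ymax>360</ymax></bndbox>
      </object>
    </annotation>"""
    print(voc_to_yolo(sample, ["car", "truck"]))
```

labelImg can also save annotations directly in YOLO format, in which case no conversion is needed.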
The labeled images are typically divided into three subsets: training, validation, and testing.
The training set is used to train the model, the validation set is used to evaluate the model's
performance during training and to tune it, and the testing set is used to assess
its generalization ability on images it has not seen.
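A reproducible split can be sketched as follows. The 70/20/10 proportions, the fixed seed and the function name are assumptions chosen for illustration; the report does not state the proportions that were actually used.

```python
import random
from typing import Dict, List, Sequence


def split_dataset(items: Sequence[str], train: float = 0.7, val: float = 0.2,
                  seed: int = 42) -> Dict[str, List[str]]:
    """Shuffle the items and split them into train, val and test subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * val)
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],  # whatever remains
    }


if __name__ == "__main__":
    images = [f"img_{i:03d}.jpg" for i in range(100)]
    parts = split_dataset(images)
    print({name: len(files) for name, files in parts.items()})
```

Shuffling before splitting avoids putting all images from one source or one vehicle type into the same subset.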
Before training the object detection model, it's necessary to have the appropriate software and
libraries installed. In this case, the ultralytics package needs to be installed using pip. This
package provides implementations of various deep learning models, including YOLOv8,
which is a popular architecture for object detection tasks.
Training the object detection model involves feeding the labeled images into the YOLOv8
architecture and adjusting the model's parameters to minimize the difference between the
predicted bounding boxes and the ground truth bounding boxes.
The main.py file contains the code for configuring the training process, including specifying
hyperparameters such as learning rate, batch size, and number of epochs. During training, the
model learns to recognize vehicles in the images and predict their bounding boxes.
The output of this step is a trained model file named yolov8n.pt, which contains the learned
weights and parameters of the model.
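A training configuration of this kind might look like the sketch below. It is not the project's main.py, which is not reproduced in this report. The dataset file name data.yaml and all hyperparameter values are assumptions; the argument names (data, epochs, batch, lr0, imgsz) are those accepted by the train method of the ultralytics package, which is imported only when training is actually started.

```python
from typing import Any, Dict


def build_train_args(data_yaml: str, epochs: int = 50, batch: int = 16,
                     lr0: float = 0.01, imgsz: int = 640) -> Dict[str, Any]:
    """Collect and sanity-check the hyperparameters passed to training."""
    if epochs <= 0 or batch <= 0 or imgsz <= 0 or lr0 <= 0:
        raise ValueError("epochs, batch, imgsz and lr0 must be positive")
    return {"data": data_yaml, "epochs": epochs, "batch": batch,
            "lr0": lr0, "imgsz": imgsz}


def train(weights: str = "yolov8n.pt", **train_args: Any) -> Any:
    """Start training from the given weights (requires: pip install ultralytics)."""
    from ultralytics import YOLO  # imported lazily so the helper above works without it

    model = YOLO(weights)  # load the starting weights
    # By default the library writes the trained weights into its run
    # directory (for example as best.pt and last.pt).
    return model.train(**train_args)
```

Training would then be started with train(**build_train_args("data.yaml")), where data.yaml is a hypothetical file that lists the image folders and the class names.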
Once the model is trained, it can be used to detect vehicles in new images or videos. In this
case, the model is applied to a video file (test2.mp4) to identify vehicles.
The yolo command is used to perform object detection using the trained YOLOv8 model.
Parameters such as the model file (yolov8n.pt), confidence threshold (conf), and source file
(test2.mp4) are specified to customize the detection process.
The output of this step is typically a new video file or images with bounding boxes drawn
around the detected vehicles, providing visual confirmation of the model's performance.
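The same detection can also be run from Python instead of the yolo command. The sketch below assumes the ultralytics package is installed and uses the file names mentioned above; the confidence threshold of 0.5 is an assumed value, since the threshold used in the project is not stated. The small helper shows what the threshold does: detections whose score is below it are discarded.

```python
from typing import Any, List, Tuple

Detection = Tuple[str, float]  # (class label, confidence score)


def keep_confident(detections: List[Detection], conf: float = 0.5) -> List[Detection]:
    """Keep only the detections whose confidence reaches the threshold."""
    return [d for d in detections if d[1] >= conf]


def detect_video(model_path: str = "yolov8n.pt", source: str = "test2.mp4",
                 conf: float = 0.5) -> Any:
    """Run detection on a video and save an annotated copy."""
    from ultralytics import YOLO  # requires: pip install ultralytics

    model = YOLO(model_path)
    # save=True writes the frames with bounding boxes to the run directory.
    return model.predict(source=source, conf=conf, save=True)


if __name__ == "__main__":
    sample = [("car", 0.91), ("truck", 0.42), ("car", 0.67)]
    print(keep_confident(sample))  # prints [('car', 0.91), ('car', 0.67)]
```

A higher threshold gives fewer but more reliable boxes, while a lower threshold shows more boxes at the risk of false detections.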
Python plays a crucial role in the application of YOLO (You Only Look Once), a real-time
object detection system, due to its versatility, extensive library support, and ease of use.
Python is often used to integrate, customize, and extend the YOLO codebase, allowing
developers to tailor the model to specific use cases. It is also employed for data preprocessing
and augmentation, leveraging libraries such as NumPy, OpenCV, and PIL for tasks like
resizing, cropping, and applying transformations to images. Furthermore, Python's popular
deep learning libraries such as TensorFlow and PyTorch are commonly used for training,
fine-tuning, and inference with YOLO models, while visualization libraries like matplotlib
and seaborn aid in model analysis and performance visualization. The rich Python community
and availability of resources further make it an attractive choice for working with
YOLO.
3.2.2 YOLOv8 model
The YOLOv8 model, an advanced iteration of the YOLO (You Only Look Once) series, is
widely employed for real-time object detection across diverse applications due to its
exceptional features and capabilities. Its real-time detection capability makes it well-suited
for applications such as autonomous vehicles, surveillance systems, and robotics where rapid
and accurate object detection is essential. The model's versatility extends across various
domains including industrial automation, retail analytics, security systems, medical imaging,
and sports analytics. Moreover, YOLOv8 is known for its precision in accurately localizing
objects within images, an important feature for applications such as medical imaging and
quality control in manufacturing. Its scalability and efficiency make it suitable for
deployment in both high-powered server environments and resource-constrained edge devices,
contributing to its wide applicability.
Deep learning plays a critical role in object detection by leveraging complex neural network
architectures to automatically extract features from images, train object detection models, and
significantly enhance detection accuracy compared to traditional computer vision methods.
Through techniques like convolutional neural networks, deep learning models can efficiently
learn hierarchical representations of data, enabling precise localization of objects and the
ability to detect a wide range of object classes across diverse domains such as healthcare,
autonomous driving, and surveillance.
CHAPTER 4 RESULT AND DISCUSSION
YOLO is a popular deep learning model for object detection in autonomous vehicles. In
a study that compared the performance of Faster R-CNN and YOLO on the COCO dataset,
YOLO achieved a higher speed (45 frames per second) with accuracy comparable to that of Faster
R-CNN.
(Fig 1)
Fig 1 shows the detected cars and truck, each labeled with the confidence score of the detection.
(Fig 2)
Fig 2 shows the detected cars, each labeled with the confidence score of the detection.
CHAPTER 5 SUMMARY AND CONCLUSIONS
This project, “Object detection in autonomous vehicles using deep learning”, focuses on
detecting objects such as cars, bikes, and trucks in real time with the YOLOv8 model.
Autonomous vehicles are vehicles that operate without a driver and aim to offer better safety and comfort to
passengers. The two most crucial factors for autonomous cars are the safety of their driving
and their ability to avoid causing traffic accidents, which
involve the functional safety of the vehicle's systems and devices. Object detection is a critical component in enabling
autonomous vehicles to perceive and interact with their environment. In recent years, deep
learning-based approaches have shown significant improvements in object detection accuracy
and speed. We propose a method for object detection in autonomous vehicles using the YOLOv8
model. Our approach achieves high accuracy and speed, making it suitable for real-time
applications in autonomous vehicles. In this project we first collected images of vehicles from
online sources such as Pexels and Pixabay. We then labeled the images, marking them
with bounding boxes that indicate the location of objects (in this case, vehicles) within the
image. After that we chose the YOLOv8 model for object detection, because
it can detect objects in real time and its speed
and accuracy are high. We then trained the model. Once the model is trained,
it can detect vehicles in new images or videos.
The purpose of our project is to detect objects for autonomous vehicles using deep learning
with the YOLOv8 model. With the YOLOv8 algorithm we are able to detect the objects in images
correctly, and the predicted bounding boxes and class probabilities are
displayed together, but we are not able to achieve 100% accuracy.
CHAPTER 7 REFERENCES
1. YouTube: https://youtu.be/m9fH9OWn8YM?si=m6_nKzeRK7RmE1C5
2. Deep learning module: https://docs.ultralytics.com/quickstart/#conda-docker-image
3. https://ieeexplore.ieee.org/document/9633965
4. Muhammad Azriyahya, “Deep Learning for Object Identification in LiDAR for
Autonomous Vehicles,” 2020 IEEE 10th International Conference on System
Engineering and Technology (ICSET), 9 November 2020, Shah Alam, Malaysia.
5. Ruturaj Kulkarni, “Traffic Light Detection and Recognition for Self-Driving Vehicles
using Deep Learning,” 2018 IEEE Fourth International Conference on Computing,
Communication, Control, and Automation (ICCUBEA).