Automatic License Plate Recognition Using YOLOv8
Project Report Submitted in Partial Fulfilment of the Requirements for the Degree of Master of Computer Applications
Submitted by
Raushan Kumar (Roll No.: 2022PGCACA016)
This is to certify that the report entitled "Automatic License Plate Recognition
using YOLOv8" is a bonafide record of the project done by Raushan Kumar (Roll
No.: 2022PGCACA016) under my supervision, in partial fulfillment of the
requirements for the award of the degree of Master of Computer Applications in
Computer Science and Engineering from National Institute of Technology
Jamshedpur.
Department seal
DECLARATION
I certify that the work contained in this report is original and has been done by me
under the guidance of my supervisor(s). The work has not been submitted to any
other Institute for any degree. I have followed the guidelines provided by the
Institute in preparing the report. I have conformed to the norms and guidelines given
in the Ethical Code of Conduct of the Institute. Whenever I have used materials
(data, theoretical analysis, figures, and text) from other sources, I have given due
credit to them by citing them in the text of the report and giving their details in the
references. Further, I have taken permission from the copyright owners of the
sources, whenever necessary.
ABSTRACT
This project, “ALPR: Automatic License Plate Recognition,” presents an intelligent system capable
of automatically detecting and recognizing vehicle license plates from images. Utilizing YOLOv8 (You
Only Look Once version 8) for real-time object detection and EasyOCR for accurate text recognition,
the application streamlines the process of license plate identification. Users simply upload an image of a
vehicle (e.g., a car), and the system automatically detects the license plate, extracts the characters, and
returns the plate number in text format. This recognized text is then securely stored for future use or
analysis.
Technology is playing a crucial role in transforming various sectors, and transportation management is
no exception. With the rapidly increasing number of vehicles globally, challenges such as traffic
congestion and unorganized parking have become common. In many regions, especially in developing
countries, there is no proper infrastructure for parking and traffic monitoring, and vehicles are often parked
haphazardly, which worsens congestion and makes enforcement difficult.
The integration of ALPR technology can be a game-changer in creating smart traffic and parking
solutions. By automating license plate recognition, authorities can better monitor vehicle movement,
enforce parking regulations, and enhance security. This project represents a foundational step toward
implementing smarter, more efficient urban mobility systems through the use of computer vision and
AI-driven automated surveillance.
Table of Contents
ABSTRACT
LIST OF ABBREVIATIONS
1. INTRODUCTION
1.1 Purpose
1.2 Features
1.3 Technologies Used
1.4 Execution Workflow
1.5 Specific Goals
1.6 Expected Outcomes
2. LITERATURE REVIEW
2.1 Object Detection Techniques
LIST OF ABBREVIATIONS
ALPR – Automatic License Plate Recognition
YOLO – You Only Look Once
OCR – Optical Character Recognition
CNN – Convolutional Neural Network
IoU – Intersection over Union
NMS – Non-Maximum Suppression
mAP – Mean Average Precision
SPP – Spatial Pyramid Pooling
SPPF – Spatial Pyramid Pooling Fast
CSPNet – Cross Stage Partial Network
FC – Fully Connected
GPU – Graphics Processing Unit
FPS – Frames Per Second
1. INTRODUCTION
The majority of road accidents and rule violations, such as speeding, ignoring traffic signals, neglecting safety
equipment in vehicles (such as seatbelts and helmets), and improper overtaking occur due to distraction,
lack of concentration, or unawareness. In rural areas, these situations often arise mainly due to a lack of
awareness about the consequences of violating traffic rules. Road safety and driver behavior while
driving are largely influenced by the efficiency of the fine management system, as one of the most
effective ways to shape human behavior is by imposing fines immediately when rules are broken.
Our application, FineScan Pro, primarily focuses on utilizing an object detection module, YOLOv8.
YOLOv8 is a state-of-the-art framework designed to detect objects within images or video frames.
YOLO has a wide range of applications in autonomous vehicles and image recognition. It surpasses
conventional detection pipelines by dividing an image into a grid and processing the entire image in a
single forward pass. This approach delivers exceptional performance in real-time applications, where
high accuracy and fast processing are crucial.
FineScan Pro is not just a reporting tool but a comprehensive system that promotes accountability and
transparency in traffic management. The application is further enhanced with features such as real-time
location tracking, payment gateway integration, and a fine appeals system, making it a holistic
solution for traffic violation management.
As the world shifts toward smart solutions for everyday challenges, FineScan Pro stands as a testament
to the power of technology in enhancing public services. This project explores the intersection of mobile
application development, machine learning, and cloud hosting to build a robust, user-friendly, and
efficient system for managing and reducing traffic violations.
Manual vehicle identification is slow, error-prone, and inefficient for handling the growing number of
vehicles on roads. There is a need for an automated system that can accurately detect and recognize
license plates from images or video in real-time. This project aims to develop an Automatic License
Plate Recognition (ALPR) system using YOLOv8 to improve traffic monitoring, rule enforcement, and
vehicle tracking efficiency.
1.1 Purpose
The main purpose of this project is to build a smart technology for the future. The number of vehicles
on the road is increasing every day, which leads to accidents, crime, and traffic congestion. Our plan is
to build one smart system that can detect a car's number plate and recognize the number automatically.
With this capability we can improve traffic and parking solutions, help prevent crime, and automate
traffic and parking management in a much smarter way. This project implements the first step of that
vision: number plate recognition.
1.2 Features
License Plate Detection
Uses YOLOv8 for fast and accurate detection of vehicle license plates from images.
License Plate Text Recognition
Utilizes OCR (such as EasyOCR or Pytesseract) to extract and convert license plate text from
detected regions.
Data Augmentation
The model incorporates data augmentation techniques (such as rotation and scaling) to enhance
generalization and improve performance across diverse datasets.
Real-Time Processing
Capable of detecting and recognizing plates from a live video or real-time camera feed (if
applicable).
Image Upload Support
Users can upload images of vehicles to detect and recognize license plates automatically.
High Accuracy and Speed
Achieves real-time detection with optimized accuracy using YOLOv8's deep learning
architecture.
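To make the detection-plus-OCR flow described above concrete, the following is a minimal sketch of the pipeline. It assumes the ultralytics and easyocr packages are installed; the weights path and image name are placeholders for the trained model and an uploaded vehicle image.

from ultralytics import YOLO
import easyocr
import cv2

# Placeholder paths: trained plate-detector weights and an uploaded vehicle image.
model = YOLO("runs/detect/train/weights/best.pt")
reader = easyocr.Reader(["en"])

def recognize_plates(image_path):
    image = cv2.imread(image_path)
    result = model(image)[0]                              # YOLOv8 detection on a single image
    plates = []
    for x1, y1, x2, y2 in result.boxes.xyxy.cpu().numpy().astype(int):
        crop = image[y1:y2, x1:x2]                        # crop the detected plate region
        text = " ".join(reader.readtext(crop, detail=0))  # OCR on the cropped plate
        plates.append(text)
    return plates

print(recognize_plates("car.jpg"))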
1.3 Technologies Used
Programming Language: Python
Description:
Python is employed as the primary programming language for developing the license plate
recognition project. Renowned for its simplicity and flexibility, Python provides a robust
environment for implementing machine learning and deep learning models. Its extensive
library ecosystem ensures efficient handling of data processing, image manipulation, and
neural network implementation.
Significance:
Python’s support for machine learning frameworks and libraries like TensorFlow, Keras,
and OpenCV accelerates the development of complex models. Its clear syntax and strong
community support make it ideal for prototyping and building production-ready
applications.
OpenCV
Description:
OpenCV is utilized for preprocessing the vehicle images before they are fed into the detection
model. Tasks such as image resizing, normalization, thresholding, and noise reduction
using Gaussian blur are performed with OpenCV. These operations ensure that the input
images are clean and uniformly formatted for accurate model predictions.
Significance:
By enhancing the quality of input images, OpenCV plays a critical role in increasing the
model's ability to detect patterns, ultimately improving detection and recognition accuracy.
Pytesseract
Pytesseract is an Optical Character Recognition (OCR) tool used to extract text from the detected
license plates. After the license plate is localized by YOLOv8, Pytesseract recognizes and extracts the
alphanumeric characters on the plate, providing a full License Plate Recognition (LPR) system when combined
with YOLOv8 for detection.
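As an illustration of this step, the snippet below is a minimal sketch, assuming Pytesseract and the Tesseract engine are installed; plate_crop stands for a plate region already cropped out by the detector.

import cv2
import pytesseract

def read_plate_text(plate_crop):
    # Convert to grayscale and binarize (Otsu) so Tesseract sees clean characters.
    gray = cv2.cvtColor(plate_crop, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # --psm 7 tells Tesseract to treat the crop as a single line of text.
    return pytesseract.image_to_string(binary, config="--psm 7").strip()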
3. Google Colab
Description:
Google Colab offers a cloud-based platform for running Python code and Jupyter
notebooks. With free access to GPUs and TPUs, it enables efficient training of
computationally intensive CNN models. Google Colab supports seamless integration with
libraries like TensorFlow and OpenCV and provides an interactive environment for
experimenting with deep learning workflows.
Significance:
The ability to access powerful hardware without local infrastructure makes Google Colab
an invaluable resource for researchers and developers. Its collaborative features and real-
time sharing options enhance productivity and experimentation.
Centralized storage enhances collaboration among team members and ensures that data and
models remain accessible and organized. It also acts as a backup to prevent data loss during
development.
1.4 Execution Workflow
• Dataset Collection:
The first step is to collect a dataset of vehicle images in which each license plate is
labeled with bounding box coordinates. The dataset should represent a variety of
vehicle types, license plate designs, and environmental conditions (e.g., lighting,
camera angles, and background clutter) to ensure the model generalizes well. The
images can be collected from public datasets, or custom data may be collected using
cameras in various real-world settings like highways or parking lots. It is essential to
have a large and diverse dataset to train the YOLOv8 model effectively.
• Preprocessing:
Once the dataset is collected, it is preprocessed to ensure the data is in a suitable format for
model training. Image resizing is done to standardize the input size, ensuring that all images
have consistent dimensions. Normalization is applied to adjust the pixel values, usually
scaling them to a range between 0 and 1 to improve training efficiency. Data augmentation
techniques are then employed, including random rotations, flips, scaling, and brightness
adjustments, to artificially expand the dataset. Augmentation helps the model become more
robust by introducing variability in the images, making it more capable of handling various
real-world conditions such as varying light levels, plate designs, and vehicle angles.
• Model Training:
In this step, the YOLOv8 model is trained on the preprocessed dataset. YOLOv8 is a deep
learning object detection model known for its speed and accuracy, making it ideal for real-
time applications like license plate detection. The model is trained using PyTorch, leveraging
its capabilities to perform efficient tensor operations and dynamic graph computation. During
training, the model learns to identify license plates by adjusting its internal weights through
backpropagation and optimization techniques. Hyperparameters such as the learning rate,
batch size, and number of epochs are fine-tuned to achieve the best results. The training
process involves the model iterating over the dataset multiple times, progressively improving
its ability to detect license plates.
• Evaluation:
After training, the model’s performance is evaluated using various metrics to ensure its
accuracy and effectiveness. Precision measures how many of the detected license plates were
correct, while recall assesses how many actual plates were detected by the model. mAP
(mean Average Precision) is a more comprehensive metric that considers both precision and
recall and is commonly used in object detection tasks to evaluate the overall performance of
the model. These metrics provide insights into the strengths and weaknesses of the model,
helping identify areas for improvement and guiding further fine-tuning of the model to
achieve optimal detection accuracy.
• Testing:
Following evaluation, the trained model is tested on a separate set of test images that were
not included in the training or validation sets. The goal of testing is to verify the model’s
ability to generalize to new, unseen data. The model’s detection capabilities are tested under
real-world conditions to ensure it performs well in diverse scenarios such as different
lighting conditions, varying angles, and occlusions. The results from testing help confirm
whether the model is ready for deployment or if additional improvements are necessary.
• Deployment:
Once the model achieves satisfactory performance on test images, it is ready for deployment
in real-time applications. The trained YOLOv8 model is integrated into a software system that
can detect license plates in live video feeds or images from traffic cameras, parking lot
surveillance, or toll booths. Real-time detection is crucial for applications like automatic toll
collection and parking management, where the system must identify license plates with
minimal delay. The deployment step may involve optimizing the model for speed and
efficiency, ensuring that it can handle a high volume of images or video streams without
significant lag.
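As a sketch of this deployment step, the loop below runs a trained detector over a live video source. It assumes the ultralytics and OpenCV packages; the weights path and camera index are placeholders.

import cv2
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")   # placeholder path to the trained weights
cap = cv2.VideoCapture(0)                           # 0 = default camera; a file path or RTSP URL also works

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    result = model(frame, verbose=False)[0]   # detect plates in the current frame
    cv2.imshow("ALPR", result.plot())         # draw the detections on the frame and display it
    if cv2.waitKey(1) & 0xFF == ord("q"):     # press q to quit
        break

cap.release()
cv2.destroyAllWindows()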
1.6 Expected Outcomes
The License Plate Detection System built using YOLOv8 and Pytesseract is expected to achieve several key
outcomes that align with the objectives of this project. These outcomes will demonstrate the system’s
effectiveness, scalability, and potential for real-world applications. The expected outcomes are as follows:
• High Accuracy in License Plate Detection:
The system should be capable of accurately detecting license plates in a wide variety of vehicle images.
This includes handling images with varying conditions, such as different lighting, angles, and vehicle
types. With the YOLOv8 model's capability for fast and precise object detection, the expected outcome
is a high detection rate of license plates across test datasets, as well as a low false-positive rate.
• Robustness to Environmental Variability:
Thanks to the application of data augmentation techniques, the model is expected to be highly robust to
environmental changes. Whether it’s day or night, sunny or rainy, the model should be able to detect
license plates under different lighting conditions, in cluttered backgrounds, and from various camera
angles. This would make the system reliable in real-world conditions, where such variables are
common.
• Real-Time License Plate Recognition (LPR):
One of the primary outcomes is that the model should be optimized for real-time performance. This will
ensure that the system can process live video feeds or images quickly enough for applications such as
toll collection or parking lot management. Real-time processing would allow the system to detect and
recognize license plates without significant delays, providing immediate results for use in automated
systems.
• Effective Optical Character Recognition (OCR):
After detecting the license plate, the Pytesseract OCR tool will be used to extract the characters on the
plate. The expected outcome is that the system should reliably extract the alphanumeric characters from
the plate and convert them into readable text. This feature will be essential for real-world applications
that require both detection and recognition of the license plate number, such as vehicle tracking, toll
processing, or access control in gated areas.
• Comprehensive Evaluation Metrics:
The system’s effectiveness will be measured using evaluation metrics such as precision, recall, and mAP
(mean Average Precision). The expected outcome is that the model should achieve high scores in these
metrics, demonstrating its ability to both correctly detect license plates and minimize false detections.
Precision will ensure that the detections are mostly accurate, recall will confirm that the model is
detecting as many license plates as possible, and mAP will provide an overall assessment of the
system’s performance.
• Scalability and Flexibility:
Another key outcome is that the model will be flexible enough to adapt to a variety of use cases and
environments. It should work seamlessly on different datasets, such as urban or rural environments, with
a range of vehicle types, license plate designs, and weather conditions. This scalability will make the
system suitable for integration into larger automated systems, including toll booths, parking
management systems, and surveillance systems, where it can handle high volumes of vehicle data.
• Deployment for Real-World Applications:
The final expected outcome is that the system will be ready for deployment in real-world environments.
By integrating the trained YOLOv8 model into an application for real-time license plate detection, the
system should be able to be easily deployed in environments such as highways, parking lots, or at toll
gates. This would streamline operations like automated toll collection, access control for parking lots,
and monitoring for security purposes.
• Improved Model Performance Over Time:
As the system is deployed and used in different environments, the model is expected to improve over
time through continuous training and fine-tuning. This would be an important outcome, as the system
would become more effective in recognizing license plates across a wider variety of scenarios and
vehicle types.
3. SYSTEM REQUIREMENTS
Hardware Requirements:
Component: Graphics Card (GPU)
Minimum Requirement: NVIDIA GPU with 4 GB VRAM (e.g., GTX 1050 Ti)
Recommended Requirement: NVIDIA RTX 2060/3060 or higher with CUDA support
Software Requirements:
Software Component Description
Operating System Windows 10 / Linux (Ubuntu 20.04 recommended) / macOS
Python Version 3.8 or above
PyTorch Deep learning framework for training YOLOv8
Ultralytics YOLOv8 YOLOv8 model for object detection (pip install ultralytics)
OpenCV For image processing (pip install opencv-python)
Pytesseract OCR library to extract text from detected license plates
Tesseract-OCR Engine Backend OCR engine (needs to be installed separately)
Jupyter Notebook / Google Colab For development, training, and experimentation
CUDA Toolkit (if using GPU) Required for GPU acceleration with PyTorch
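A quick way to confirm that the environment above is set up correctly is a small check script (a sketch; the package names follow the table, and the yolov8n.pt weights are downloaded automatically by Ultralytics on first use).

import torch
import cv2
import pytesseract
from ultralytics import YOLO

print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("OpenCV:", cv2.__version__)
print("Tesseract engine:", pytesseract.get_tesseract_version())  # raises an error if Tesseract-OCR is not installed
model = YOLO("yolov8n.pt")                                       # downloads the nano weights on first use
print("Ultralytics YOLOv8 model loaded OK")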
2. CONVOLUTIONAL NEURAL NETWORK (CNN)
The basic idea of the Convolutional Neural Network was introduced by Kunihiko Fukushima in
the 1980s. Convolutional Neural Networks (ConvNets or CNNs) are a category of Neural
Networks that have proven very effective in areas such as image recognition and
classification. Computer vision techniques are dominated by convolutional neural networks
because of their accuracy in image classification. CNN is a class of deep, feed-forward
artificial neural networks (where connections between nodes do not form a cycle) & use a
variation of multi-layer perceptrons designed to require minimal pre-processing. ConvNet
architectures make the explicit assumption that the inputs are images, which allows us to
encode certain properties into the architecture. These then make the forward function more
efficient to implement and vastly reduce the number of parameters in the network. ConvNets
are made up of neurons that have learnable weights and biases. Each neuron receives some
inputs, performs a dot product and optionally follows it with a non-linearity.
The whole network still expresses a single differentiable score function: from the raw image
pixels on one end to class scores at the other. And they have a loss function on the last layer.
In figure 5 above, the left side represents a regular three-layer neural network. On
the other hand, the right side of the figure represents a CNN, which arranges its neurons in
three dimensions (width, height, depth).
Every layer of a CNN transforms the 3D input volume to a 3D output volume of neuron
activations. In this example, the red input layer holds the image, so its width and height
would be the dimensions of the image, and the depth would be three (Red, Green, Blue
channels).
There are three elements that enter into the convolution operation:
• Input image: It is the image that is given as an input.
• Feature detector: The feature detector is often referred to as a "kernel" or a "filter".
Typically a 3×3, 5×5, or 7×7 matrix is used as a feature detector.
• Feature map: The feature map is also known as an activation map. It is called feature
map because it is also a mapping of where a certain kind of feature is found in the
image.
Let the input image size be N × N, the filter size be F × F, the number of strides be S, and the
amount of zero padding be P. Then the output image size will be:

$$\text{Output size} = \frac{N - F + 2P}{S} + 1$$

For example, a 224 × 224 input with a 3 × 3 filter, stride 1, and padding 1 gives an output of
(224 − 3 + 2·1)/1 + 1 = 224, i.e. the spatial size is preserved.
The iteration of the forward pass and back-propagation will continue until the network
finds the desired output.
• Convolutional Layer: The Convolutional layer is the core block of the Convolutional
Neural Network. It has some special properties. It does most of the computational
heavy lifting. The CONV layer's parameters consist of a set of learnable filters. Every
filter is small spatially (along width and height), but extends through the full depth
of the input volume.
For example, a typical filter on the first layer of a ConvNet might have size 5×5×3
(i.e. 5 pixels in width and height, and 3 because images have a depth of 3, the color channels).
During the forward pass, each filter is convolved across the width and height of
the input volume and dot products are computed between the entries of the filter and
the input at any position. As the filters are slid over the width and height of the input
volume, a 2-dimensional activation map is produced that gives the responses
of that filter at every spatial position. Intuitively, the network will learn filters that activate
when they see some type of visual feature such as an edge of some orientation.
These activation maps are stacked along the depth dimension and the output volume
is produced.
Figure 7: Convolution operation of a CNN
– Stride: Stride is the number of pixels by which the filter moves over the
input image. When the stride is 1, the filters move one pixel at a time; for example, a
2×2 filter moves along a 4×4 input through its width and height one step at a time when the
stride is 1.
– Zero Padding: Sometimes the input image is padded with zeros around its border;
this is called zero-padding. Zero padding allows us to control the spatial size of the
output volume. If we don't use zero-padding, some information from the
edges can be lost.
Figure 10 illustrates the scenario of zero-padding of an input.
• Pooling Layer: The task of pooling is done by summarizing the sub-regions of the input
using operations such as taking the average, the maximum, or the minimum value of
each sub-region. These operations are called pooling functions.
We have used Max Pooling to reduce the computational cost.
– Pooling Layer Arithmetic:
Pooling layer works by sliding the window or filter
across the input.
Let, Spatial Extent = f
Stride = s
Input window size = w
Then the output size from the pooling layer will be:

$$\text{Output size} = \frac{w - f}{s} + 1$$

For example, a 4 × 4 input pooled with a 2 × 2 window (f = 2) and stride s = 2 gives a 2 × 2 output.
• Fully-Connected Layer:
In a fully connected layer, every neuron is connected to every neuron of the previous layer,
as in a standard neural network. Its activation is likewise computed by a matrix multiplication
with its weights followed by a bias offset. Usually, the output of a fully connected layer is a
column vector.
Figure 12: Before (left) and after (right) applying the activation function
– Tanh/Hyperbolic Tangent:
This function is similar to the sigmoid function. It bounds all real numbers to the range [-1,
1]. The tanh function is mainly used for classification between two classes.
– ReLU (Rectified Linear Unit): ReLU passes positive inputs through unchanged and outputs
zero otherwise. However, it introduces the Dying ReLU problem: when inputs approach zero
or are negative, the gradient of the function becomes zero, so the network cannot perform
back-propagation through those units and cannot learn.
– Leaky ReLU:
It is an improved version of ReLU that solves the Dying ReLU problem. For a
positive input it works like ReLU, and for a negative input the value is
multiplied by a very small constant (e.g. 0.01). The mathematical representation of
this function is:

$$f(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha x & \text{otherwise} \end{cases}$$

where $\alpha$ is the small constant.
– Mish: Mish is a smooth, non-monotonic activation function defined as f(x) = x · tanh(softplus(x)).
Characteristics:
• Smooth and Differentiable: Like Swish, Mish is also smooth and avoids abrupt transitions in
gradient flow.
• Non-monotonic: Mish allows small negative values, unlike ReLU, enabling richer information
propagation.
• Stronger Regularization: Mish tends to act as a natural regularizer, preventing overfitting and
enhancing generalization.
Benefits in Deep Learning:
• Outperforms ReLU, Swish, and other functions in object detection and image classification
benchmarks.
• More stable during training and capable of deeper network convergence.
• Used in recent object detectors, notably YOLOv4, where it improves detection accuracy and model
robustness (YOLOv8 itself uses the related SiLU/Swish activation in its Conv blocks).
Basic Terminology in YOLO
Bounding Box:
A bounding box, in essence, is a rectangle that surrounds an object and specifies its position, class (e.g. car,
person), and confidence (how likely the object is to be at that location). A bounding box can be specified
either by its top-left and bottom-right corner coordinates or by its center coordinates together with its
width and height.
IoU
IoU (Intersection over Union) is a measure of the extent of overlap between two boxes. The greater the
region of overlap, the greater the IoU.
If the IoU is close to 1, the predicted box overlaps the object almost perfectly.
If the IoU is close to 0, the model did not predict the object's coordinates at all.
NMS
Non-maximum suppression is a technique used mainly in object detection that aims at selecting the best
bounding box out of a set of overlapping boxes.
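The two ideas can be written out directly; the following is a small illustrative sketch, with boxes given as (x1, y1, x2, y2) corner coordinates.

import numpy as np

def iou(a, b):
    # Intersection over Union of two boxes given as (x1, y1, x2, y2).
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    # Keep the highest-scoring box, drop boxes that overlap it too much, repeat.
    order = list(np.argsort(scores)[::-1])
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep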
Cross Stage Partial Network (CSPNet)
CSPNet splits the feature map of a stage into two parts, passes one part through the stage's convolutional
blocks, and then merges the two parts again, which reduces duplicated gradient information and
computation while preserving accuracy.
Spatial Pyramid Pooling (SPP)
• During classification tasks, the output feature map is flattened and directed to the FC layer for further
softmax operation. However, to use the FC layer, we have to fix the size of the input image during
training, which hinders detecting objects at different scales and aspect ratios.
• To solve this issue, the final output feature map undergoes channel-wise pooling over different sizes of
spatial bins. If the input feature map dimensions are 512×100×100 (C×H×W) and the spatial bins are 1×1,
2×2, and 4×4, then SPP generates 512, 4×512, and 16×512 1-D vectors, which are then concatenated and
fed into the FC layer.
YOLOv8 is the latest version [23-24] of the YOLO (You Only Look Once) models. The YOLO models are
popular for their accuracy and compact size. It is a state-of-the-art model that could be trained on any powerful
or low-end hardware. Alternatively, they can also be trained and deployed on the cloud. The first YOLO
model was introduced in a C repository called Darknet in 2015 by Joseph Redmon [15] while he was working
on his PhD at the University of Washington. It has since been developed by the community through subsequent
versions.
YOLOv8 is developed by Ultralytics, a team known for its innovative YOLOv5 model [20]. It was introduced
on January 10th, 2023. YOLOv8 is used to detect objects in images, classify images, and distinguish objects
from each other. Ultralytics has made numerous enhancements to YOLOv8, making it better and more user-
friendly than YOLOv5. It is an advanced model that improves upon the success of YOLOv5 by incorporating
modifications that enhance its power and user-friendliness in various computer vision tasks. These
enhancements include a modified backbone network, an anchor-free detection head, and a new loss function.
Furthermore, it provides built-in support for image classification tasks. YOLOv8 is distinctive in that it delivers
unmatched speed and accuracy performance while maintaining a streamlined design that makes it suitable for
different applications and easy to adapt to various hardware platforms.
Architecture of YOLOv8
As of the current writing, there is no published paper yet on YOLOv8, so detailed insights into the research
techniques and ablation studies conducted during its development are unavailable. However, an analysis of the
YOLOv8 repository [24] and its documentation [23] over its predecessor YOLOv5 [20], reveals several key
features and architectural improvements.
Architecture components
The YOLOv8 architecture is composed of two major parts, namely the backbone and the head, both of which
use a fully convolutional neural network.
YOLOv8 Variants
All of these models belong to the YOLOv8 family; each variant offers a different trade-off between accuracy,
speed, and model size. The variants are distinguished by the values of parameters such as
depth_multiple (d), width_multiple (w), and max_channels (mc).
depth_multiple(d):
depth_multiple parameter determines how many Bottleneck Blocks are used in the C2f block. This scales the
number of layers in the network. A value less than 1 reduces the depth (fewer layers), making the model smaller
and faster but potentially less accurate. Conversely, a value greater than 1 increases the depth (more layers),
leading to a larger and potentially more accurate model but slower to run.
width_multiple (w):
This scales the number of channels in the convolutional layers. A value less than 1 thins the network (fewer
channels), resulting in a smaller and faster model but potentially sacrificing some accuracy. On the other hand,
a value greater than 1 widens the network (more channels), creating a larger and potentially more accurate
model but requiring more processing power.
max_channels (mc):
This parameter sets an upper limit on the number of channels allowed in the network. It is a safety measure to
prevent the model from becoming too wide (too many channels) especially when width_multiple is set high.
This can help control the model size and prevent overfitting.
Types of YOLOv8:
• n: smallest model, fastest inference but lowest accuracy
• s: small model, good balance of speed and accuracy
• m: medium model, higher accuracy than small models with moderate inference speed
• l: large model, higher accuracy than the medium model but slower inference
• x: extra-large model, best accuracy for resource-intensive applications
Conv Block
It is the most basic block in the architecture, consisting of a Conv2d layer, a BatchNorm2d layer,
and SiLU activation function.
Conv2d Layer: Convolution is a mathematical operation that involves sliding a small matrix (called a
kernel or filter) over the input data, performing element-wise multiplication, and summing the results to
produce a feature map. The “2D” in Conv2D refers to the fact that the convolution is applied in two
spatial dimensions, typically height and width.
• k: Number of filters or kernels. It represents the depth of the output volume, and each filter is
responsible for detecting different features in the input.
• s: Stride. It is the step size at which the filter/kernel slides over the input. A larger stride reduces the
spatial dimensions of the output volume.
• p: Padding. Padding is the additional border of zeros added to the input on each side. It helps preserve
spatial information and can be used to control the spatial dimensions of the output volume.
• c: Number of channels in the input. For example, in an RGB image, c would be 3 (one channel for each
color: red, green, and blue).
• SiLU Activation Function: SiLU, which stands for Sigmoid Linear Unit, is an activation function used
in neural networks. It is also known as the Swish activation function.
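A minimal PyTorch sketch of this Conv block, an illustrative re-implementation following the description above rather than the Ultralytics source, is:

import torch.nn as nn

class ConvBlock(nn.Module):
    # Conv2d -> BatchNorm2d -> SiLU, as described above.
    def __init__(self, c_in, c_out, k=3, s=1, p=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, kernel_size=k, stride=s, padding=p, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))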
Bottleneck Block
The bottleneck block consists of Conv blocks with an optional shortcut connection. If shortcut=true,
the shortcut is applied in the bottleneck block; otherwise the input is simply passed through the two
Conv blocks in series.
Shortcut Connection: The shortcut connection, also known as a skip connection or residual connection,
is a direct connection that bypasses one or more layers in the network. It allows the gradient to flow
more easily through the network during training, addressing the vanishing gradient problem and making
it easier for the model to learn.
In the specific context of a bottleneck block, the shortcut connection allows the model to bypass the
convolutional blocks if necessary. This way, the model can choose to use the identity mapping provided
by the shortcut, making it easier to learn the identity function when needed. The inclusion of a shortcut
connection enhances the ability of the model to learn complex representations and improves the training
of deep CNNs by mitigating the vanishing gradient problem.
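Reusing the ConvBlock sketch above, the bottleneck block with its optional shortcut can be sketched as follows (again an illustration of the description, not the library code):

import torch.nn as nn

class Bottleneck(nn.Module):
    # Two Conv blocks with an optional residual (shortcut) connection.
    def __init__(self, channels, shortcut=True):
        super().__init__()
        self.cv1 = ConvBlock(channels, channels, k=3, s=1, p=1)
        self.cv2 = ConvBlock(channels, channels, k=3, s=1, p=1)
        self.add = shortcut

    def forward(self, x):
        y = self.cv2(self.cv1(x))
        return x + y if self.add else y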
What is the vanishing gradient problem?
The vanishing gradient problem is a challenge that arises during the training of deep neural networks,
particularly in architectures with many layers. It occurs when the gradients of the loss function
with respect to the parameters (weights) of the network become extremely small as they are backpropagated
from the output layer to the input layer during training.
C2f Block
The C2f block starts with a convolutional block whose resulting feature map is split into two parts. One part
goes through the Bottleneck blocks, whereas the other goes directly to the Concat block. The number
of Bottleneck blocks used in the C2f block is defined by the depth_multiple parameter of the model. At the end,
the feature maps from the Bottleneck blocks and the split feature map are concatenated and fed into a final
convolutional block.
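Building on the ConvBlock and Bottleneck sketches above, the split-process-concatenate idea of the C2f block can be illustrated roughly as follows (a simplification of the actual Ultralytics implementation):

import torch
import torch.nn as nn

class C2f(nn.Module):
    # Split after an initial Conv block, pass one half through n Bottlenecks,
    # concatenate all intermediate maps, and fuse with a final Conv block.
    def __init__(self, c_in, c_out, n=1, shortcut=True):
        super().__init__()
        self.cv1 = ConvBlock(c_in, c_out, k=1, s=1, p=0)
        self.m = nn.ModuleList(Bottleneck(c_out // 2, shortcut) for _ in range(n))
        self.cv2 = ConvBlock((n + 2) * (c_out // 2), c_out, k=1, s=1, p=0)

    def forward(self, x):
        y = list(self.cv1(x).chunk(2, dim=1))   # split the feature map into two halves
        for m in self.m:
            y.append(m(y[-1]))                  # each Bottleneck refines the latest half
        return self.cv2(torch.cat(y, dim=1))    # concatenate and fuse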
Spatial Pyramid Pooling Fast (SPPF) Block:
The SPPF Block consists of a convolutional block followed by three MaxPool2d layers. Every resulting feature
map from the MaxPool2d layer is then concatenated at the end and fed to a convolutional block.
The basic idea behind Spatial Pyramid Pooling is to divide the input image into a grid and pool features from
each grid cell independently, allowing the network to handle images of different sizes effectively.
In essence, Spatial Pyramid Pooling enables neural networks to work with images of different resolutions by
capturing multi-scale information through pooling operations at different levels of granularity. This can be
particularly useful in tasks such as object recognition, where objects may appear at different scales within an
image
While SPP offers advantages, it can be computationally expensive. SPP-Fast addresses this by using a simpler
pooling scheme. Instead of using multiple pooling levels with different kernel sizes, SPP-Fast might use a
single fixed-size kernel for pooling, reducing the number of computations needed. SPP-Fast offers a trade-off
between accuracy and speed.
MaxPool2d Layer: Pooling layers are used to downsample the spatial dimensions of the input volume,
reducing the computational complexity of the network and extracting dominant features. Max pooling is a
specific type of pooling operation where, for each region in the input tensor, only the maximum value is
retained, and the other values are discarded.
In the case of MaxPool2d, the pooling is applied in both the height and width dimensions of the input tensor.
The layer is defined by specifying parameters such as the size of the pooling kernel and the stride. The kernel
size determines the spatial extent of each pooling region, and the stride determines the step size between
successive pooling regions.
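Putting the pieces together, the SPPF block described above (one Conv block, three successive MaxPool2d layers, concatenation of the four feature maps, and a final Conv block) can be sketched with the same building blocks:

import torch
import torch.nn as nn

class SPPF(nn.Module):
    def __init__(self, c_in, c_out, k=5):
        super().__init__()
        c_hidden = c_in // 2
        self.cv1 = ConvBlock(c_in, c_hidden, k=1, s=1, p=0)
        self.pool = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)
        self.cv2 = ConvBlock(c_hidden * 4, c_out, k=1, s=1, p=0)

    def forward(self, x):
        x = self.cv1(x)
        p1 = self.pool(x)    # three successive poolings approximate pooling at
        p2 = self.pool(p1)   # several spatial scales with a single kernel size
        p3 = self.pool(p2)
        return self.cv2(torch.cat([x, p1, p2, p3], dim=1))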
Detect Block
Detect Block is responsible for the detection of the objects. Unlike in previous versions of YOLO, YOLOv8 is
an anchor-free model which means it predicts directly the center of an object instead of the offset from a known
anchor box. Anchor-free detection reduces the number of box predictions, which speeds up complicated post-
processing steps that sift through candidate detections after inference.
The Detect Block contains two tracks. The first track is for bounding box predictions and the second track is
for class predictions. Both tracks contain two convolutional blocks followed by a single Conv2d layer, whose
outputs are used to compute the Bounding Box Loss and the Class Loss respectively.
YOLOv8 Architecture consists of three main sections: Backbone, Neck, and Head.
Backbone is the deep learning architecture that acts as a feature extractor of the inputted image.
Neck combines the features acquired from the various layers of the Backbone module.
Head predicts the classes and the bounding box of the objects which is the final output produced by the object
detection model.
Backbone Section:
In Block 0, the processing starts with the input image size of 640 x 640 x 3 which is fed to the convolutional
block with kernel size 3, stride 2, and padding 1. The spatial resolution is reduced when stride= 2 is used. The
convolutional block produces the feature map of 320 x 320 because the kernel moves in 2-pixel increments.
To obtain the output channels of the convolutional block, the following formula is used:
output channels = min(64, mc) × w
Here,
64 is the base output channel
mc is the max_channel
w is the width_multiple
For example, if we are using the “n” variant YOLOv8 model then our final output channel becomes =
min(64,1024)*0.25 = 64*0.25 = 16
Likewise, this operation is calculated in every convolutional block present in the architecture.
Block 2, is a C2f block that contains two parameters i.e. shortcut and n. Here, the shortcut is the boolean
parameter that denotes if the Bottleneck block utilizes the shortcut or not. If the value of the shortcut= true then
the bottleneck block inside the C2f block utilizes the shortcut else it doesn’t.
Here, n determines how many bottleneck blocks are used inside the C2f block. In the case of Block 2, n is given
by:
n= 3*d
where d= depth_multiple
For example, if we are using the "n" variant of the YOLOv8 model, whose depth_multiple is 0.33, then the
number of bottleneck blocks used inside the C2f block becomes n = 3 × 0.33 = 0.99, i.e. 1
bottleneck block is used.
In the C2f block the resolution of the feature map and the output channel is unchanged.
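These two scaling rules can be checked with a few lines of arithmetic. The sketch below uses the "n"-variant values quoted in the text (d = 0.33, w = 0.25, mc = 1024); other variants would substitute their own factors.

def conv_out_channels(base_channels, w=0.25, mc=1024):
    # output channels of a convolutional block: min(base, max_channels) * width_multiple
    return int(min(base_channels, mc) * w)

def c2f_bottleneck_count(n_base, d=0.33):
    # number of Bottleneck blocks inside a C2f block: n_base * depth_multiple, rounded, at least 1
    return max(round(n_base * d), 1)

print(conv_out_channels(64))    # Block 0: min(64, 1024) * 0.25 = 16
print(c2f_bottleneck_count(3))  # Block 2: 3 * 0.33 = 0.99 -> 1 bottleneck block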
In Block 9, the SPPF Block is used after the last convolution layer of the C2f block in the Backbone.
The main function of the SPPF block is to generate the fixed feature representation of the object in various sizes
in an image without resizing the image or introducing spatial information loss.
The neck section is responsible for upsampling the feature map and combining the features acquired from the
various layers of the Backbone section.
The upsample layer present in the Neck section simply doubles the spatial resolution of the feature map
without changing the number of output channels.
The Concat block adds up the output channels of the blocks being concatenated without any change in
resolution.
The head section is responsible for predicting the classes and the bounding box of the objects which is the final
output produced by the object detection model.
The first Detect block in the Head section specializes in detecting small objects that are inputted from the C2f
block present in Block 15.
The second Detect block in the Head section specializes in detecting medium-sized objects which is inputted
from the C2f block present in Block 18.
The third Detect block in the Head section specializes in detecting large objects and takes its input from the
C2f block present in Block 21.
Conclusion
In conclusion, YOLOv8, an evolution of the YOLO family, redefines object detection with its anchor-free
architecture, balancing speed and accuracy across various model variants. Utilizing convolutional and
bottleneck blocks, alongside innovative features like Spatial Pyramid Pooling Fast, YOLOv8 efficiently
processes images for real-time detection. Its backbone, neck, and head sections synergize to extract features,
upsample, and predict classes and bounding boxes. With its versatility, YOLOv8 offers a range of models
catering to diverse needs, from rapid inference to high accuracy. Overall, YOLOv8 represents a pinnacle in
object detection, empowering applications with unparalleled performance and user-friendliness.
The YOLOv8 framework can be used to perform computer vision tasks such as detection, segmentation,
classification, and pose estimation. It comes with pre-trained models for each task. The pretrained models for
detection, Segmentation and Pose are pretrained on the COCO dataset [25-26], while Classification models are
pretrained on the ImageNet dataset. YOLOv8 introduces scaled versions such as YOLOv8n (nano), YOLOv8s
(small), YOLOv8m (medium), YOLOv8l (large), and YOLOv8x (extra large). These versions provide
different model sizes and capabilities, catering to various requirements and use scenarios. For segmentation,
classification, and pose estimation, these scaled versions use the suffixes -seg, -cls, and -pose
respectively. These tasks don't require additional commands or scripts for making masks or contours or for
classifying images. With a well-labelled and sufficiently large dataset, the accuracy can be high. Using a GPU
rather than a CPU is also recommended for the training process to further enhance performance by decreasing
computation time. YOLOv8 offers multiple modes that can be used either through a command line interface
(CLI) or through Python scripting, allowing users to perform different tasks based on their specific needs and
requirements. These modes are:
Train. This mode is used to train a custom model on a dataset with specified hyperparameters. During the
training process, YOLOv8 employs adaptive techniques to optimize the learning rate and balance the loss
function. This leads to enhanced model performance.
Val. This mode is used to evaluate a trained model on a validation set to measure its accuracy and
generalization performance. This mode can help in tuning the hyperparameters of the model for improved
performance.
Predict. This mode is used to make predictions using a trained model on new images or videos. The model is
loaded from a checkpoint file, and users can input images or videos for inference. The model predicts object
classes and locations in the input file.
Export. This mode is used to convert a trained model to a format suitable for deployment in other software
applications or hardware devices. This mode is useful for deploying the model in production environments.
Commonly used YOLOv8 export formats are PyTorch, TorchScript, TensorRT, CoreML, and PaddlePaddle.
Track. This mode is used to perform real-time object tracking in live video streams. The model is loaded from
a checkpoint file and can be used for applications like surveillance systems or self-driving cars.
Benchmark. This mode is used to profile the performance of different export formats in terms of speed and
accuracy. It provides information on the size of the exported format, mAP50-95 metrics for object detection,
segmentation, and pose, or accuracy_top5 metrics for classification, as well as inference time per image. This
enables users to select the most suitable export format for their particular use case, considering their
requirements for speed and accuracy.
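For reference, the modes above map directly onto the Ultralytics Python API. A minimal sketch, in which the dataset YAML path and arguments are placeholders, looks like this:

from ultralytics import YOLO

model = YOLO("yolov8n.pt")                                        # pretrained nano model
model.train(data="license_plates.yaml", epochs=100, imgsz=640)    # Train mode
metrics = model.val()                                             # Val mode: precision, recall, mAP
results = model.predict("test_images/car.jpg", conf=0.25)         # Predict mode
model.export(format="torchscript")                                # Export mode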
Performance Evaluation
Speed A100 TensorRT. This refers to the speed of the object detection model when running on an A100 GPU
(Graphics Processing Unit) using TensorRT, which is an optimization library developed by NVIDIA for deep
learning inference. Similar to Speed CPU ONNX, the speed can be measured in terms of inference time or
frames per second (FPS). Higher values for Speed A100 TensorRT indicate faster inference times on a
powerful GPU, which can be beneficial for applications that require high throughput or real-time processing.
Latency A100 TensorRT FP16 (ms/img). This refers to the latency or inference time of the object detection
model when running on an NVIDIA A100 GPU with TensorRT optimization, using the FP16 (half-precision
floating point) data type. It indicates how much time the model takes to process a single image, typically
measured in milliseconds per image (ms/img). Lower values indicate faster inference times, which are desirable
for real-time or low-latency applications.
Params (M). Params (M) refers to the number of model parameters in millions. It represents the size of the
model, and generally larger models tend to have more capacity for learning complex patterns but may also
require more computational resources for training and inference.
FLOPs (B). FLOPs (B) stands for floating-point operations, in billions. It is a measure of the
computational complexity of the model, indicating how many floating-point operations the model performs
for a single inference. Lower FLOPs (B) values indicate less computational complexity and can be
desirable for resource-constrained environments, while higher values indicate more computational complexity
and may require more powerful hardware for efficient inference.
Performance of YOLOv8 on COCO. The COCO val2017 dataset [28-29] is a commonly used benchmark
dataset for evaluating object detection models. It consists of a large collection of more than 5000 diverse images
with 80 object categories, and it provides annotations for object instances, object categories, and other relevant
information. The dataset is an Industry-Standard benchmark for object detection performance and for
comparing the accuracy and speed of different object detection models
The training and results were sourced from the Ultralytics GitHub repository [24]. All scaled versions of
YOLOv8, along with previous versions of YOLO, i.e. YOLOv5 and YOLOv7, were trained on COCO. Here mAPval values
are for single-model single-scale on the COCO val2017 dataset and Speed is averaged over COCO val images
using an Amazon EC2 P4d instance.
Performance of YOLOv8 on RF100.
The Roboflow 100 (RF100) dataset [30-31] is a diverse, multi-domain benchmark comprising 100 datasets
created by using over 90,000 public datasets and 60 million public images from the Roboflow Universe, a web
application for computer vision practitioners. The dataset aims to provide a more comprehensive evaluation
of object detection models by offering a wide range of real-life domains, including satellite, microscopic, and
gaming images. With RF100, researchers can test their models' generalizability on semantically diverse
data.
YOLOv8 is evaluated on the RF100 benchmark alongside YOLOv5 and YOLOv7. mAP@0.5 is a specific
version of the mAP metric that measures the average precision of a model at an IoU threshold of
0.5. In other words, it measures how well the model detects objects when the predicted box overlaps
the ground-truth box by at least 50%. The process and results were sourced from the Roboflow blog [32].
Small versions of each model are trained for a total of 100 epochs. To minimize the effect of random
initialization and ensure reproducibility, each experiment is run using a single seed.
3. PROPOSED METHODOLOGY
Automatic License Plate Detection using YOLOv8 aims to build a smart and efficient system capable of
recognizing license plates in real time. It combines object detection with Optical Character Recognition (OCR)
to facilitate automated vehicle identification, which can further be integrated into mobile or web-based
applications for traffic law enforcement, smart parking, and toll management systems. The network architecture
used for the detection of number plates in this project is YOLOv8 (You Only Look Once). The process involves
training on a dataset of images containing vehicles with annotated bounding boxes around their
number plates.
The administrative interface also empowers the administrators to manage driver's licences, vehicle records, and
fine details. Additional features include the capability to modify fines, add new licences and vehicles, ensuring
the system remains adaptable to changing regulatory requirements.
Our project's goal is to make it easier for citizens to report traffic offenses by offering an easy-to-use interface
that makes it simple to submit facts and supporting documentation. The application uses machine learning
models to automate license plate recognition and vehicle identification, providing traffic enforcement personnel
with a quick and precise decision-making process.
The proposed methodology incorporates modern tools and frameworks including YOLOv8, PyTorch,
Pytesseract, and Python, structured as follows:
Optical Character Recognition (OCR)
After the number plate detection stage performed by YOLOv8, the text must be extracted, which is done by an
OCR engine. The OCR engine [11] processes the detected number plate region and outputs the characters that
were recognized.
These normalized values ensure that the bounding box coordinates are scaled between 0 and 1 relative
to the image dimensions, which is the format required for YOLO training.
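For illustration, the conversion from corner coordinates to this normalized YOLO label format can be written as the small helper below (a sketch; the example box and image size are made up):

def voc_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    # Convert corner coordinates to normalized (xc, yc, w, h) as required by YOLO labels.
    xc = (xmin + xmax) / 2.0 / img_w
    yc = (ymin + ymax) / 2.0 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return xc, yc, w, h

# Example: a plate box (150, 200, 400, 260) in a 1280 x 720 image
print(voc_to_yolo(150, 200, 400, 260, 1280, 720))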
Bounding Box Regression Loss
This term minimizes the difference between the predicted and actual bounding box coordinates. It can be
expressed as:

$$L_{bbox} = \sum_{i=1}^{S} \sum_{j=1}^{B} \mathbb{1}_{ij}^{obj} \left[ (x - \hat{x})^2 + (y - \hat{y})^2 + \left(\sqrt{w} - \sqrt{\hat{w}}\right)^2 + \left(\sqrt{h} - \sqrt{\hat{h}}\right)^2 \right]$$

where $x, y, w, h$ are the ground-truth bounding box center and dimensions, $\hat{x}, \hat{y}, \hat{w}, \hat{h}$ are the predicted bounding
box center and dimensions, $S$ is the number of grid cells, $B$ is the number of bounding boxes per cell, and $\mathbb{1}_{ij}^{obj}$ is an
indicator function that is 1 if the object is present in the $i$-th grid cell and 0 otherwise.
Confidence (Objectness) Loss
This term penalizes incorrect objectness predictions; following the standard YOLO formulation it can be written as:

$$L_{conf} = \sum_{i=1}^{S} \sum_{j=1}^{B} \left[ \mathbb{1}_{ij}^{obj} (c_i - \hat{c}_i)^2 + \mathbb{1}_{ij}^{no\_obj} (c_i - \hat{c}_i)^2 \right]$$

where $c_i$ is the ground-truth confidence score (1 if the object is present, 0 otherwise), $\hat{c}_i$ is the predicted
confidence score, and $\mathbb{1}_{ij}^{no\_obj}$ is 1 if no object is present in the grid cell and 0 otherwise.
Classification Loss
For each bounding box, the model predicts the probability of the object belonging to a particular
class (e.g., car, motorcycle, license plate). The classification loss is calculated as the cross-entropy between the
true and predicted class distributions:

$$L_{class} = -\sum_{i=1}^{S} \mathbb{1}_{i}^{obj} \sum_{c=1}^{C} p_i(c) \, \log \hat{p}_i(c)$$

where $p_i(c)$ is the ground-truth probability for class $c$, $\hat{p}_i(c)$ is the predicted probability for class $c$,
and $C$ is the total number of classes. The total loss $\mathcal{L}$ for the YOLO model is the sum of these three components:

$$\mathcal{L} = \lambda_{bbox} L_{bbox} + \lambda_{conf} L_{conf} + \lambda_{class} L_{class}$$

where $\lambda_{bbox}, \lambda_{conf}, \lambda_{class}$ are weighting factors for the respective loss terms.
Character Recognition
Assuming a standard greedy decoding formulation for the OCR stage, the recognized character sequence can be written as:

$$\hat{c}_t = \arg\max_{c} P(c_t = c \mid I; \theta), \qquad t = 1, \dots, T$$

where $c_t$ is the character at time step $t$, $\theta$ are the parameters of the OCR model, $I$ is the cropped plate image, and $T$ is the
total number of characters in the recognized sequence. The sequence is generated by selecting the most probable
character at each step $t$, based on the features extracted from the image.
EXPERIMENTAL RESULTS AND EVALUATION
3.1 Dataset
This dataset consists of images of car license plates, paired with their corresponding annotations in YOLO
format. It is designed for training and evaluating models focused on detecting car license plates in images.
The dataset was derived from the Car License Plate Detection dataset on Kaggle and has been split into
training and testing subsets.
• Label: The class of the object (for this dataset, it will always be 0, representing car license plates).
• Xc, Yc: Center coordinates of the bounding box, normalized to the width and height of the image.
• W, H: Width and height of the bounding box, also normalized.
Implementation platform
The proposed methodology is implemented and tested on the specified dataset using Google
Colab, a cloud-based platform that supports Python 3.9.16. The computations are performed
on a laptop equipped with a 12th Gen Intel(R) Core(TM) i5-1235U processor, operating at
1,300 MHz, featuring 10 cores and 12 logical processors. The operating system used is
Microsoft Windows 11.
2. Accuracy: It is the most popular performance metric, measuring how
often the classifier produces the correct prediction. Mathematically, accuracy is defined as
the ratio of the number of correctly predicted images to the total number of images, and it
can be written as:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

4. Recall: It is the fraction of the relevant images that are successfully retrieved. For this project,
recall is the ratio of the number of license plates that are correctly detected to the number of
plates that should have been detected. Sensitivity, Hit Rate, and True Positive Rate are
other names for recall. The lower the number of false negatives (plates that are missed),
the higher the recall:

$$\text{Recall} = \frac{TP}{TP + FN}$$
5. F-Score: It is the harmonic mean of Precision and Recall and is a measure of
test accuracy. The F-score reaches its best value at 1 (100% precision and recall) and its worst
value at 0. The F-score is defined as:

$$F = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
6. Specificity: Specificity is the True Negative Rate (TNR) of the model and a
statistical measure of a binary classification test. Since plate detection can be framed as a binary
decision (plate or non-plate region), specificity can also be used for performance evaluation. It
is the ratio of the number of non-plate regions that are correctly rejected (TN) to the
number of regions that are classified or misclassified as non-plate (TN + FP). The lower
the false positives (FP), the higher the specificity or selectivity:

$$\text{Specificity} = \frac{TN}{TN + FP}$$
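The metrics defined above follow directly from the raw counts of true and false positives and negatives; a small illustrative helper:

def detection_metrics(tp, fp, fn, tn):
    # Compute the evaluation metrics defined above from raw counts.
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    specificity = tn / (tn + fp)
    return accuracy, precision, recall, f_score, specificity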
• Recall (0.96): Reflects the ability to identify true positives out of all actual positives.
• mAP50 (0.91459): Indicates the mean average precision at an IoU threshold of 0.5, which remains high.
4. CONCLUSION AND SCOPE FOR FUTURE WORK
This study demonstrated the effectiveness of a YOLOv8-based system for Automatic License Plate Recognition
(ALPR) on resource-constrained devices, achieving high detection accuracy and real-time performance. Among
the YOLOv8 variants, YOLOv8-n balanced precision and speed, achieving a mean Average Precision (mAP) of
92% and maintaining 36 FPS on the UFPR-ALPR dataset, making it suitable for mobile and embedded systems.
Image preprocessing techniques, such as sharpening, further improved recognition rates in challenging
environments. At the same time, comparisons with state-of-the-art Optical Character Recognition (OCR)
methods highlighted the system's robustness and adaptability across diverse datasets. Despite its success,
challenges remain in generalizing to uncommon license plate formats and extreme conditions, which can be
addressed by expanding training datasets, optimizing lightweight architectures, and exploring advanced transfer
learning.
FUTURE WORK
The current system works only with still images; in the future we will add video functionality so that
users can upload videos as well.
In the future we will also add live tracking, so that the system can detect a vehicle's number plate
directly from a live CCTV camera feed and record it.
The current system detects number plates only on cars; in the future we will extend it to all types of
vehicles.
We also plan to experiment with alternative detection approaches (for example, classical Haar cascade
detectors or TensorFlow object detection models) and compare their plate-detection accuracy against the
current YOLOv8-based detector.
The system currently runs only on the desktop; we have planned an Android application and will
implement it in the future.
Currently we are using dummy data for the vehicle owner's information; despite our efforts, no
government API is available that provides the real data.
REFERENCES
[1] R. Antar, S. Alghamdi, J. Alotaibi, and M. Alghamdi, "Automatic
Number Plate Recognition of Saudi License Car Plates," Engineering,
Technology & Applied Science Research, vol. 12, no. 2, pp. 8266–8272,
Apr. 2022, https://doi.org/10.48084/etasr.4727.
[2] M. S. Sheikh, J. Liang, and W. Wang, "A Survey of Security Services,
Attacks, and Applications for Vehicular Ad Hoc Networks (VANETs),"
Sensors, vol. 19, no. 16, Aug. 2019, Art. No. 3589,
https://doi.org/10.3390/s19163589.
[3] Y. Yang, Y. Xiao, Z. Chen, D. Tang, Z. Li, and Z. Li, "FCBTYOLO: A
Lightweight and High-Performance Fine Grain Detection Strategy for
Rice Pests," IEEE Access, vol. 11, pp. 101286–101295, 2023,
https://doi.org/10.1109/ACCESS.2023.3314697.
[4] M. Salemdeeb and S. Erturk, "Multi-national and Multi-language
License Plate Detection using Convolutional Neural Networks,"
Engineering, Technology & Applied Science Research, vol. 10, no. 4,
pp. 5979–5985, Aug. 2020, https://doi.org/10.48084/etasr.3573.
[5] C. Henry, S. Y. Ahn, and S.-W. Lee, "Multinational License Plate
Recognition Using Generalized Character Sequence Detection," IEEE
Access, vol. 8, pp. 35185–35199, 2020, https://doi.org/10.1109/
ACCESS.2020.2974973.
[6] H. Shi and D. Zhao, "License Plate Recognition System Based on
Improved YOLOv5 and GRU," IEEE Access, vol. 11, pp. 10429–10439,
2023, https://doi.org/10.1109/ACCESS.2023.3240439.
[7] X. Li, X. Wang, C. Qu, J. Song, H. Li, and Y. Xi, "Vehicle pose
estimation by parking AGV based on RGBD camera," in 2024 9th
International Conference on Automation, Control and Robotics
Engineering (CACRE), Jeju Island, Korea, Republic of, Jul. 2024, pp.
289–294, https://doi.org/10.1109/CACRE62362.2024.10635074.
[8] T. Mustafa and M. Karabatak, "Real Time Car Model and Plate
Detection System by Using Deep Learning Architectures," IEEE Access,
vol. 12, pp. 107616–107630, 2024, https://doi.org/10.1109/
ACCESS.2024.3430857.
[9] A. Tourani, A. Shahbahrami, S. Soroori, S. Khazaee, and C. Y. Suen, "A
Robust Deep Learning Approach for Automatic Iranian Vehicle License
Plate Detection and Recognition for Surveillance Systems," IEEE
Access, vol. 8, pp. 201317–201330, 2020, https://doi.org/10.1109/
ACCESS.2020.3035992.
[10] M. S. Beratoğlu and B. U. Töreyіn, "Vehicle License Plate Detector in
Compressed Domain," IEEE Access, vol. 9, pp. 95087–95096, 2021,
https://doi.org/10.1109/ACCESS.2021.3092938.
[11] K. Sangsuwan and M. Ekpanyapong, "Video-Based Vehicle Speed
Estimation Using Speed Measurement Metrics," IEEE Access, vol. 12,
pp. 4845–4858, 2024, https://doi.org/10.1109/ACCESS.2024.3350381.
[12] W. Ali, G. Wang, K. Ullah, M. Salman, and S. Ali, "Substation Danger
Sign Detection and Recognition using Convolutional Neural Networks,"
Engineering, Technology & Applied Science Research, vol. 13, no. 1,
pp. 10051–10059, Feb. 2023, https://doi.org/10.48084/etasr.5476.
[13] D. Habeeb, A. H. Alhassani, L. N. Abdullah, C. S. Der, and L. K. Q.
Alasadi, "Advancements and Challenges: A Comprehensive Review of
GAN-based Models for the Mitigation of Small Dataset and Texture
Sticking Issues in Fake License Plate Recognition," Engineering,
Technology & Applied Science Research, vol. 14, no. 6, pp. 18401–
18408, Dec. 2024, https://doi.org/10.48084/etasr.8870.
[14] S. Luo and J. Liu, "Research on Car License Plate Recognition Based on
Improved YOLOv5m and LPRNet," IEEE Access, vol. 10, pp. 93692–
93700, 2022, https://doi.org/10.1109/ACCESS.2022.3203388.
[15] Z. Xu et al., "Towards End-to-End License Plate Detection and
Recognition: A Large Dataset and Baseline," in Computer Vision –
ECCV 2018, vol. 11217, V. Ferrari, M. Hebert, C. Sminchisescu, and Y.
Weiss, Eds. Cham: Springer International Publishing, 2018, pp. 261–
277, https://doi.org/10.1007/978-3-030-01261-8_16.
[16] R. Laroca et al., "A Robust Real-Time Automatic License Plate
Recognition Based on the YOLO Detector," in 2018 International Joint
Conference on Neural Networks (IJCNN), Rio de Janeiro, Brazil, Jul.
2018, pp. 1–10, https://doi.org/10.1109/IJCNN.2018.8489629.
[17] R. Laroca, E. Cardoso, D. Lucio, V. Estevam, and D. Menotti, "On the
Cross-dataset Generalization in License Plate Recognition:," in
Proceedings of the 17th International Joint Conference on Computer
Vision, Imaging and Computer Graphics Theory and Applications,
Online Streaming, 2022, pp. 166–178, https://doi.org/
10.5220/0010846800003124.
[18] R. Laroca, A. B. Araujo, L. A. Zanlorensi, E. C. De Almeida, and D.
Menotti, "Towards Image-Based Automatic Meter Reading in
Unconstrained Scenarios: A Robust and Efficient Approach," IEEE
Access, vol. 9, pp. 67569–67584, 2021,
https://doi.org/10.1109/ACCESS.2021.3077415.
[19] S. M. Silva and C. R. Jung, "Real-time license plate detection and
recognition using deep convolutional neural networks," Journal of
Visual Communication and Image Representation, vol. 71, Aug. 2020,
Art. no. 102773, https://doi.org/10.1016/j.jvcir.2020.102773.
[20] F. Borisyuk, A. Gordo, and V. Sivakumar, "Rosetta: Large Scale System
for Text Detection and Recognition in Images," in Proceedings of the
24th ACM SIGKDD International Conference on Knowledge Discovery
& Data Mining, London United Kingdom, Jul. 2018, pp. 71–79,
https://doi.org/10.1145/3219819.3219861.
[21] G. R. Gonçalves, M. A. Diniz, R. Laroca, D. Menotti, and W. R.
Schwartz, "Multi-task Learning for Low-Resolution License Plate
Recognition," in Progress in Pattern Recognition, Image Analysis,
Computer Vision, and Applications, vol. 11896, I. Nyström, Y.
Hernández Heredia, and V. Milián Núñez, Eds. Cham: Springer
International Publishing, 2019, pp. 251–261